Jan 31, 2007

Can One Person Provision and Manage 400 Servers in an Afternoon?

With the latest release of Collage, affectionately codenamed “Starbuck,” we have significantly increased scalability: a single control node can now manage 400+ servers. As part of the standard system test effort for Collage, we routinely scale to ~450 app nodes managed by a single Collage control node. And this control node is a run-of-the-mill pizza-box server, typically with 2 CPUs and 4 GB of RAM. You can get one of these from CDW for less than $4,000, sometimes as cheap as $2,500.

Just the other day, I had to give a demo to our EVP of Sales, Jim Flatley, to show him the new release’s capabilities. So I decided to set up a system with 400 app nodes, just to show off our new stuff. BTW, app node is short for “application node,” which for us means a physical machine or virtual machine (VM) under Collage management. My System Test team normally scales to 400+ app nodes under management as part of the dev-test environment. (In fact, Mukund likes to show off a system he’s been running with 465 app nodes. It’s kind of chest-thumping, but it’s still pretty cool.)

So I started off with a system with 40 physical servers, all bare metal. I imported 3 different VMware ESX images: one with 24 VMs per host, one with 18 VMs per host, and another with 12 VMs per host. I created three tiers, one for each image, and let Collage allocate servers as VM hosts so that each VM would have 500 MB of RAM. (This is easy to do: just set each tier’s attributes to specify memory requirements.) I also used the NetInstall feature, which is new in the latest Collage release. With it, the application image for the ESX host is installed locally on the app node.
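To make the memory arithmetic concrete, here is a rough sketch of how much guest memory each kind of host needs at 500 MB per VM. The image names and the function are made up for illustration and are not Collage’s actual tier attributes or API. The numbers also ignore hypervisor overhead, so a real host would need somewhat more.

```python
# Hypothetical illustration only: these names are not Collage's API.
VM_MEMORY_MB = 500  # per-VM memory target mentioned in the post

# VMs per host for each of the three imported ESX images (labels are made up).
IMAGE_DENSITIES = {"esx-24": 24, "esx-18": 18, "esx-12": 12}


def host_memory_needed_mb(vms_per_host: int, vm_memory_mb: int = VM_MEMORY_MB) -> int:
    """Guest memory a host must supply, ignoring hypervisor overhead."""
    return vms_per_host * vm_memory_mb


# Minimum guest memory per host, keyed by image label.
requirements = {
    name: host_memory_needed_mb(density)
    for name, density in IMAGE_DENSITIES.items()
}

for name, needed_mb in requirements.items():
    print(f"{name}: at least {needed_mb} MB of RAM for guests")
```

So the 24-VM image needs a host with roughly 12 GB available for guests, the 18-VM image about 9 GB, and the 12-VM image about 6 GB.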

Once the ESX hosts were activated, Collage discovered and inventoried all the VMs, so I had 400 VMs in the free pool. I then imported some application images (a combination of ELAS 4 and Windows images) and created tiers with these. Some were set up as Web farms, others as computational tiers. I then activated tiers: first a 50-node tier, then another 50-node tier, then a 100-node tier. I also created some tiers that used physical machines, just for fun. At one point, I was simultaneously activating 200 app nodes. Collage automatically throttles the activations so that app nodes activate in batches.
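For readers curious what throttling into batches looks like, here is a minimal sketch of the general idea. It is not Collage’s implementation: the batch size of 25, the node names, and the activate callback are all assumptions chosen for illustration.

```python
from typing import Callable, Iterator, List


def batches(nodes: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size nodes."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(nodes), batch_size):
        yield nodes[start:start + batch_size]


def activate_throttled(
    nodes: List[str],
    batch_size: int,
    activate: Callable[[str], None],
) -> int:
    """Activate nodes one batch at a time; return the number of batches run."""
    completed = 0
    for batch in batches(nodes, batch_size):
        # A real system would start the whole batch and wait for it to finish
        # before moving on; this sketch simply calls activate() in order.
        for node in batch:
            activate(node)
        completed += 1
    return completed


# 200 pending activations, as in the demo; names and batch size are assumed.
nodes = [f"vm-{i:03d}" for i in range(200)]
activated: List[str] = []
batch_count = activate_throttled(nodes, batch_size=25, activate=activated.append)
print(f"Activated {len(activated)} nodes in {batch_count} batches")
```

The point of batching is that a request for 200 simultaneous activations never puts more than one batch of work on the control node and network at a time.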

Three hours later, I had configured an environment with 400 servers under management, all provisioned from bare metal by Collage. I had a mixture of physical machines, virtual machines, Linux applications and Windows applications. And by the way, I’m not a sys admin. I’m an engineering director at Cassatt. I don’t set up servers for a living. But with our product, I can set up and manage my own data center.

And BTW, the demo was a success. I’m pretty sure Jim was impressed. He was smiling at the end of the meeting.