Cassatt Collage goes beyond provisioning. Collage continuously monitors the applications and servers in your data center. It polls each server for a heartbeat, using the standard OS-level and application-level monitors available in Linux, Windows and Solaris. You can also introduce your own custom monitors, such as a customized agent or a script running inside a database, and have Collage watch those as well. When a server fails, Collage replaces it with a new one from the free pool and boots the same application or service on the new server, all within minutes and without losing any data.
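To make the replacement flow concrete, here is a minimal Python sketch of the bookkeeping involved: the failed server is set aside in a maintenance pool, a spare is taken from the free pool, and the same application is assigned to it. This is not Collage's code or API. The `Pools` class, the `replace_failed` function and the blade names are hypothetical, and the real boot step is reduced to a dictionary update.

```python
from dataclasses import dataclass, field


@dataclass
class Pools:
    """Hypothetical model of the pools; not a Collage data structure."""
    free: list[str] = field(default_factory=list)         # spare servers
    maintenance: list[str] = field(default_factory=list)  # quarantined servers
    running: dict[str, str] = field(default_factory=dict)  # server -> application


def replace_failed(pools: Pools, failed: str) -> str:
    """Quarantine a failed server and move its application to a spare."""
    # Remember which application the failed server was running.
    app = pools.running.pop(failed)
    # Set the failed server aside so it is not allocated again.
    pools.maintenance.append(failed)
    if not pools.free:
        raise LookupError(f"no free server available to run {app!r}")
    # Take the first spare and assign the same application to it.
    # (A real system would boot the application image here.)
    replacement = pools.free.pop(0)
    pools.running[replacement] = app
    return replacement


pools = Pools(free=["blade-7", "blade-8"], running={"blade-1": "apache"})
new_server = replace_failed(pools, "blade-1")
```

After the call, `blade-1` sits in the maintenance pool and `blade-7` runs the application, while `blade-8` remains free for the next failure.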
For each application, you specify the monitoring parameters. You can define which monitors to use, the polling interval, and how many retries to allow. In a smaller configuration with fewer than 100 servers, I like to monitor SNMP at 30-second intervals with 3 retries. For larger configurations with ~400 servers, I would increase the monitoring interval to 60 seconds. For Apache servers, I add an HTTP monitor on a system URL embedded in my web app so that I can confirm that the Apache service is running.
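The parameters above can be sketched in a few lines of Python. This is an illustration only, not Collage configuration syntax: `MonitorPolicy`, `snmp_policy`, `http_probe` and `is_failed` are hypothetical names. It also makes two assumptions the text does not state: that "3 retries" means one initial poll plus three more attempts spaced one interval apart, and that the switch from 30 to 60 seconds happens at 100 servers.

```python
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MonitorPolicy:
    """Hypothetical per-application monitoring parameters."""
    monitor: str
    interval_seconds: int
    retries: int


def snmp_policy(server_count: int) -> MonitorPolicy:
    """Pick the SNMP polling interval from the size of the configuration."""
    # 30 s for small sites and 60 s for large ones come from the text;
    # the cut-over at 100 servers is an assumption.
    interval = 30 if server_count < 100 else 60
    return MonitorPolicy("snmp", interval, retries=3)


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # Covers connection failures and HTTP error statuses.
        return False


def is_failed(probe: Callable[[], bool], policy: MonitorPolicy,
              sleep: Callable[[float], None] = time.sleep) -> bool:
    """Declare failure only if the first poll and every retry fail."""
    attempts = 1 + policy.retries
    for attempt in range(attempts):
        if probe():
            return False
        if attempt < attempts - 1:
            sleep(policy.interval_seconds)  # wait one interval before retrying
    return True
```

Under these assumptions, a dead server in a small configuration is declared failed about 90 seconds after the first missed poll (three retries at 30 seconds each), so a longer interval trades polling load for slower detection.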
As part of our standard customer demo, I like to pull a blade server that’s running an application and watch Collage respond to the failure. I’ll let the customer pull the blade out of the chassis in the lab; blinking lights, fan noise and 10 racks of servers always add a little extra to the demo. By the time we return to the conference room (with the failed blade in hand), Collage has quarantined the failed blade in a maintenance pool, allocated a new server from the free pool and booted it into the same application. All of this takes only 3 minutes from bare metal to a running application on a new server. This demo is always memorable and illustrates High Availability in a very simple manner. (Seeing is believing.) I once gave this demo to some visiting executives from TCS. A year later, I met them again in San Jose, and they still remembered the demo.