Vinay Pai: 2007

Dec 9, 2007

Server Marketshare from Enck and Claunch at Gartner

At the Gartner Conference, John Enck and Carl Claunch (both Gartner analysts) presented some interesting trends in server operating systems. Based on worldwide server shipment revenue, two operating systems are growing in marketshare:

Windows: $19 billion in 2007
Linux: $8.6 billion

UNIX marketshare is declining, but was still $16 billion in 2007. The total server shipment revenue in 2007 was $54 billion. Among mainframes, IBM z-Series is growing, while other vendors are decreasing.

Based on a poll of the audience (~300 members),

40% were reducing the number of server operating systems at their enterprise.
19% were increasing
41% were remaining steady

Based on another audience poll for use of blade architectures in the data center:

13% are researching blades
13% are testing blades
54% have deployed blades
20% have no plans for blades

Some other interesting highlights from this presentation:

Itanium is viable for only the HP-UX platform but not for other operating systems.
80% of all x86 servers deployed in data centers run Windows.
Solaris, AIX and HP-UX are stable UNIX platforms. Other UNIX platforms are dying.
Gartner sees a resurgence of physical appliances. Appliances must integrate with existing monitoring and management tools.

Dec 7, 2007

Leadership Lessons from Commander Mike Abrashoff

At the Gartner Conference, one of the keynotes was from Mike Abrashoff, a former Navy Commander. During his talk, which was titled "It's Your Ship!," he focused on leadership and how you can change the performance of an organization. He used anecdotes from his command of the USS Benfold.

When he took command of the ship, he found that performance suffered from:

Infighting among the 5 departments on the ship
Training was cut, due to budget reasons
People would complain about things not under their control
Tradition: Things were done as they had always been done

Abrashoff worked for William Perry, whom he described as "excellent without being arrogant." He modeled his own leadership after the way Perry treated people.

He spoke about how to instill a sense of urgency. Each month, a different division would "be in the spotlight."

Abrashoff would routinely walk around the ship and talk with sailors. He would ask:

What do you like most?
What do you like the least?
What can we improve, within budget?

Abrashoff instilled a culture where anyone on the ship, regardless of rank, could make a suggestion. Here are some of the suggestions that he implemented:

Change the bolts on the ship from iron to stainless steel. Instead of painting the ship every 2 months, the ship could be painted every 10 months. This best-practice was then implemented throughout the US Navy.
Thursday evenings, the crew would gather on the flight deck and listen to jazz and watch the sun set. This improved morale and was relatively cheap to implement.

Abrashoff believes in setting limits and letting people take action within those limits. He also believes in recruiting people every day, even when they're already on board. During his command, the Benfold became one of the top ships in the US Navy based on different performance and productivity criteria. It all starts with leadership!

Gartner's Infrastructure Operations Maturity Model

At Gartner's Data Center Conference, Donna Scott and Jay Pultz, both Gartner VP Analysts, announced the new Gartner Infrastructure and Operations Maturity Model. There are 6 levels for an organization:

0: Survival
1: Awareness
2: Committed: sufficient resources available (people, capital)
3: Proactive
4: Service-Aligned (SLA’s defined as IT services)
5: Business Partner

From a poll of the audience (~2000 attendees), 60% are at levels 1 and 2, 7% are at level 0 (survival), 19% are level 3 (proactive) and only 1% are level 5 (business partner).

Level 2: focuses on customer satisfaction. Organizations invest in project management, incident management and service support.

There are three components to the IO Maturity Model: People, process and technology. Technology changes are the easiest to implement, followed by process changes and finally followed by people changes (e.g., training and having the appropriate staff).

Technology changes needed for the different levels:

1: Asset management system in place
2: Consolidation, standards in place
3: Automated failover and architecture in place
4: End-to-end service levels
5: Dynamically change the infrastructure

Levels 2 and below are not sustainable levels. By the end of 2012,

35% of large enterprises will be at level 3 (proactive), compared with 25% in 2007.
14% will be at service-aligned, compared with 9% today

Bittman Talks about Data Center Sprawl

Okay, this post is a little late, but hey, it was hard to get work done after hours in Vegas. We had booth duty until 8:30pm at the show, followed by a long dinner. I'm not much of a gambler, but I did drop $40 in slot machines, hoping for the big payout. So I didn't break the bank, and I'm back to blogging.

Here are some highlights from Tom Bittman's opening keynote at the Gartner Conference:

Data center sprawl (physical and virtual) can be managed by creating pools of resources. Automation of the Real Time Infrastructure will be the trend of 2010-2016. Virtualization is becoming a commodity. By the end of 2008, the hypervisor will become free. By the end of 2009, there will be 4 million x86 VM’s. Managing those VM's will be the challenge, and where most vendors should focus their efforts.

Real Time Infrastructure

Resources are shared.
The interface is business policy and SLA’s.
Provides agility to applications and services.
Inputs are service requirements, servers and storage. The outputs are IT services.

CMDB’s must be used with well-defined process. Organizations must make changes to culture and process for CMDB’s to be effective.

Run Book Automation: workflow of operations and process. This is not really technology.

Virtualization enables alternative delivery models:

Cloud computing / grid
Software appliances
Containers
Infrastructure as a service

Power and cooling are problems. Demand drives an increase in energy requirements. Virtualization solves a short-term, tactical problem. However, virtualization increases the long-term demand for energy, since the barrier to entry for deploying new servers (as VM’s) is reduced.

Over the next few days, I'll try to post about the other sessions I attended. Stay tuned.

Nov 27, 2007

Tuesday at the Gartner Data Center Conference

I just survived day one at the Gartner Data Center Conference, and I'm taking a small break before booth duty at the Cassatt booth. This is my first year at the conference, and I'm joining a veteran crew of Cassatt folks-- or alumni as Gartner calls them. This year, the conference is held in the MGM Grand Conference Center, which is a 10-minute walk from the MGM Grand hotel. There are nearly 2,000 attendees at the event, but the crowd is quite different from the JavaOne crowd I used to hang with. There are quite a few "suits" in the audience, and the sessions are quite different from JavaOne sessions and BOF's (Birds of a Feather-- an informal tech talk usually held in the evening).

I attended several sessions today:

Keynote: The Future of Infrastructure and Operations by Tom Bittman (VP Distinguished Analyst, Gartner)
Keynote: The Gartner Infrastructure and Operations Maturity Model by Donna Scott and Jay Pultz (both are VP Distinguished Analysts, Gartner)
Keynote: It's Your Ship! by Michael Abrashoff (author and former Navy Commander)
Certain and Uncertain Futures of Server Technology by Carl Claunch (VP Distinguished Analyst, Gartner) and John Enck (Research VP, Gartner)
A vendor session by Greg Ratcliff (Manager, Liebert Monitoring)
Two vendor sessions by Ivan Passos (Director of Product Management, Avocent): one on their new VM support in DS View 3 and the other on their MergePoint appliance

Later tonight, I'll post some entries with my notes from these different talks. I've got to don my green Cassatt bowling shirt and get ready for booth duty. There will be a small contingent of us in the booth, flanking an MCC that has the latest Cassatt Active Response software. Let's hope we wow the crowd! It is Vegas after all.

Nov 17, 2007

Come and Get It! Standard Edition is Here

On Nov 16, 2007 at 8:59pm PT, we had our GA build of Cassatt Active Response, Standard Edition. That's when I got the final e-mail from Sudhrity Mondal, the new QE manager, that he and Chuck Brunson had finished their testing of Standard Edition. Build 9352 is now our GA build for the new Policy Manager, and we now have our GA release of the Standard Edition-- on Nov 16, as planned. (In all three US time zones, no less.) Whoo, hoo!

Standard Edition introduces several new components:

The Policy Manager, a new UI for entering policies for power-managing your servers.

The Scheduler, which schedules these policies and graphically shows which policy is in effect.

The Report Manager, which provides different reports for showing how much your different teams are spending on power. Yes, it's a charge-back report on operating costs for your server usage.

Looking back on the past few months, it's definitely been a pretty intense effort getting to this point. And here are the folks who made it all possible: Alan McClellan, Barbara McKercher, Bill Minto, Bob Hendrich, Chuck Brunson, Craig Vosburgh, Dave Resch, Dorothy Vernon, Jim Engquist, Jason Haugland, James Urquhart, Jo Pelkey, Jon Nordby, Ken Oestreich, Kevin Werner, Kirk Fjeldheim, Linda Finnegan, Lynn Still, Mark Emeis, Martha Dumler, MaryAnn Zhang, Melinda Sorber, Spencer Smith and Sudhrity Mondal. I hope everyone's taking the weekend off!

Nov 14, 2007

From Capitol Hill to Sand Hill (Road)

The latest Silicon Valley development revolves around our newest Venture Capitalist-- Vice President Al Gore. Earlier this week, Kleiner Perkins announced that Al Gore has joined the VC firm as their newest partner. Gore will focus on new clean-technology investments, continuing Kleiner's new investments in green technology.

I guess you could see this coming. When Gore accepted his Nobel Peace Prize, he accepted in downtown Palo Alto. As a Palo Alto resident, I remember the crowds lining up to see the Vice President. School was out that day (for other reasons), and many students even ventured downtown to see Mr. Green.

Nov 11, 2007

Managing VM's Is No Easy Task

A recent cover story in Network World magazine talks about the difficulty in managing virtualized environments. Virtualization is everywhere-- especially in dev/test environments. VM's are easy to create and set up, but that rapid proliferation of VM's introduces new challenges in managing these virtualized environments.

Now, don't get me wrong. I like VM's. They're cool, and the learning curve is not steep. If you can set up a development environment on a physical server, there's almost nothing else you need to do with a VM. However, what happens when you have 100's or 1000's of VM's running around?

According to Network World, there are several things to watch for:

Consistency and standardization (patch-levels on your apps and O/S) become a bigger issue when managing VM's alongside physical machines.
Since VM's are easy to deploy (just create 'em as you need 'em), there is a tendency to have too many VM's. Controlling virtual server sprawl requires the same processes and auditing that would be used to control deployments of physical servers.
The standard management tools that ship with VMware or Xen are not sufficient to manage large-scale VM deployments.
The problems of managing physical servers don't disappear in the virtual world-- they multiply and become obscured by the intangible boundaries between systems.

It was a long article, but a very interesting read. You should check it out for yourself.

If you've been following my blog, you'll know that I've posted often about creating VM environments. Back in January, I wrote about provisioning 400 VM's in an afternoon with Collage and XVM. Our new product line, Cassatt Active Response, integrates the Collage and XVM products.

Cassatt Active Response Premium Edition provides the ability to create and manage VM environments from VMware and Xen. Active Response will provision the hypervisor, create the VM's and then deploy applications to those VM's. With Linux applications, you can create a heterogeneous environment with applications deployed to VM's and physical machines. You can also migrate a Linux app running on a VM to a physical machine, in case you realize that you need the additional horsepower from a dedicated, physical server.

Cassatt Active Response allows you to manage physical machines and virtual machines, by integrating the features of the former XVM product. We've also reduced the price point with Premium Edition. Check it out for yourself, and get those VM's under control!

Nov 5, 2007

Say Hello to My Little Friend, Cassatt Active Response

My recent blog entries have all focused on power management and some of the inefficiencies in today's engineering labs and departmental server rooms. During the past few weeks, I've been talking with customers, and a recurring theme has been the desire to get control of the power and cooling costs.

One customer in Arizona has maxed out their data center, and they're not allowed to pull any more power into their facility. Even though electricity is inexpensive in Arizona, this customer has a power problem. They would like to reduce electrical consumption by 20% so that they can deploy new applications. However, they're not in a position to change out their servers or their infrastructure. They just want an easy way to power down one set of servers (and applications) so they can deploy another set of new applications (and their corresponding servers). Many of these servers are needed for cyclical applications, such as batch and ERP, that need more capacity at the end of the month. So managing which servers are powered on/off can allow this customer to deploy new applications without building a new data center.

In Silicon Valley, a large product-development company has hundreds of engineering labs and departmental server rooms. These labs are teeming with the devices they need to develop and test their product. These labs are used primarily for weekly builds and occasional patches for customer escalations. And after you count all the devices in these "little labs," the grand total is more than 200,000 devices. Now that's a large electric bill!

Cassatt's new product line, Cassatt Active Response, provides some solutions that are easy to implement in these engineering environments. The new product line consists of four product editions packaged for different audiences.

Standard Edition provides energy efficiency by allowing you to manage power consumption and set policies in your environment.
Premium Edition provides increased energy efficiency and application resiliency by allowing you to pool resources and manage application workloads.
Data Center Edition provides increased energy efficiency, high application availability, and server workload management in production environments.
Enterprise Edition will allow you to manage all data center resources across your enterprise.

Premium Edition and Data Center Edition map to the previous Collage & XVM products. Both of these editions are available today. Enterprise Edition is a new product that we are still developing, and it will be available in late 2008.

Standard Edition is a new product that combines new technologies, such as Active Power Management, with Collage technologies, such as policy management and a rules engine. Standard Edition introduces a new policy manager that allows different teams to set up power-management policies for the servers that they use. Standard Edition is easy to deploy in engineering environments, since you don't need to change the applications or O/S on the servers that will be power-managed. And you can see immediate savings. To calculate the savings in your environment, check out the ROI calculator.

And yes, we fly our own airplane within Cassatt. (I prefer that to "eating your own dog food.") We have three engineering labs with close to 500 servers. The servers in these labs are all managed with Cassatt Active Response. Servers are powered on only when they're part of a dev/test cycle. And our instantaneous power consumption is well below our rated capacity in each lab.

Oct 12, 2007

Should You Plant a Tree for Every Server in Your Lab?

Last week, I traveled to Colorado Springs to spend some time with my team. There were several things I noticed during my visit. The clean Colorado air was far less polluted than the air in familiar Silicon Valley. There were no brown "smog rings" around Pike's Peak, unlike those I see around Mt. Hamilton from my office window. I heard from Spen that the local electric utility is planning to roll out a demand-curtailment program, even though electricity is pretty cheap in Colorado. (By the way, that's my Colorado team in the picture. We're in Garden of the Gods, and that's Pike's Peak in the background.)

In Silicon Valley, the local utility, Pacific Gas & Electric, has several demand-curtailment programs to help combat the excess demand on the electric grid during warm summer months. If your data center participates in these programs, PG&E will (1) provide lower rates for electricity during peak periods, (2) provide rebates and (3) most importantly, guarantee that your data center will remain operational (i.e., no brown-outs). In September, Cassatt announced the Active Power Management technology. Watch this space for some exciting product announcements in the very near future.

This morning, I was reading a recent interview with Gartner's Rakesh Kumar. Some interesting highlights:

Data centers account for 25% of global carbon emissions from IT and communications technology.
40% of the emissions are from PC's and monitors.
The data-center emissions are rising more rapidly than other sources.
Gartner released a research advisory Monday in which "Green IT" tops their list of industry issues.

In the past few weeks, I've been talking with several customers in Silicon Valley, and they all have expressed similar concerns. One company has three engineering sites in Silicon Valley, Massachusetts and India. All three are located in geographies with expensive electricity, but that's also where they get engineering talent. Their engineering labs are spending $1 million / year on electricity. We're talking with them about Active Power Management and how we can provide some quick savings by powering off unneeded servers and networking devices according to schedule-based policies. Customers are very receptive, since several are planning to roll out new conservation programs in the coming months.
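To make "schedule-based policies" a little more concrete, here is a minimal Python sketch. It is not how Active Power Management is implemented; the weekday 7am-7pm window and the function names are made up purely for illustration.

```python
from datetime import datetime, time

# Hypothetical policy (not a Cassatt default): lab servers are powered on
# only on weekdays between 07:00 and 19:00, and powered off otherwise.
WORKDAYS = {0, 1, 2, 3, 4}   # Monday=0 ... Friday=4
ON_FROM = time(7, 0)
ON_UNTIL = time(19, 0)


def should_be_powered_on(now: datetime) -> bool:
    """Return True if the hypothetical schedule says the server should be on."""
    return now.weekday() in WORKDAYS and ON_FROM <= now.time() < ON_UNTIL


def powered_on_fraction() -> float:
    """Fraction of the week the servers are on under this schedule."""
    hours_on_per_day = ON_UNTIL.hour - ON_FROM.hour
    return len(WORKDAYS) * hours_on_per_day / (7 * 24)


if __name__ == "__main__":
    print(should_be_powered_on(datetime(2007, 10, 12, 10, 0)))  # a Friday morning
    print(f"Servers are on {powered_on_fraction():.0%} of the week")
```

Under this made-up schedule the servers are on for 60 of the 168 hours in a week. How much of the electric bill that actually saves depends on how many servers can follow the schedule and on what they draw when idle, so treat the fraction as a rough upper bound rather than a forecast.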

So there are some easy ways to manage the electric bill in your organization by starting with the development and test environments you have. You don't need to buy carbon offsets or plant a tree. You can address the source of the problem with a low-cost solution that's also simple-to-implement.

Sep 3, 2007

Yes, It's Still Safe to Power Off and Power On That Server

After my previous post on the reliability of power supplies, I decided to see what our Cassatt experiences can tell us about server reliability. Within my department, I have engineering labs located in three locations-- Colorado Springs, Minneapolis and San Jose-- and about 500 servers in total.

Mukund and I decided to look at the data from 123 servers located in San Jose. These servers are used by Mukund's team for System Test activities. His team has developed over 700 automated tests that are used to qualify our Cassatt product suite. As part of the test run, servers are routinely power-cycled. We physically pull power from the servers at the start of each test run. All the nodes are on managed Power Distribution Units (APC's and Baytechs), and the automated tests power down the outlets from the PDU before running the tests. This has been in place since 2004.

For the 123 servers that were analyzed, not a single power-supply or disk drive failed during the past two years.

Here are the server counts in the study:

26 IBM HS-20 blades
8 HP DL380 G4
45 HP DL360 G4
8 HP DL360 G3
6 HP DL140
3 HP DL385
5 Sun SPARC
1 IBM x345
15 Dell 1850
6 Dell 2650

During the past 5 months, the power supplies on these 123 servers were power-cycled 18,826 times. That's an average of once per day per server. As part of the system testing, these servers were also power-cycled repeatedly by using their power controller. The power operations from the power controller generate stress on the server's internal components, such as the motherboard and disk drives, but the power supply remains connected to A/C power. These power operations from the power controller are not counted in the 18,826 figure cited earlier.
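As a quick sanity check on the once-per-day figure, the arithmetic can be sketched in Python. The 152-day length for "5 months" is an assumption; the exact start and end dates aren't given here.

```python
# Sanity check of the power-cycle average for the San Jose study.
SERVERS = 123            # servers in the study
POWER_CYCLES = 18_826    # PDU-level power cycles over the period
DAYS = 152               # assumption: "5 months" taken as roughly 152 days

cycles_per_server = POWER_CYCLES / SERVERS
cycles_per_server_per_day = cycles_per_server / DAYS

print(f"{cycles_per_server:.1f} power cycles per server")
print(f"{cycles_per_server_per_day:.2f} power cycles per server per day")
```

That works out to roughly 153 cycles per server, or about one per server per day.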

In a future posting, Mukund and I will provide more details on these additional power operations. We will also provide data from the servers in our other engineering labs.

So if you're still afraid to power down that server, don't worry! Power supplies and hard drives are very reliable these days. From several different studies, we've seen that power supplies hold up quite well under (and are even designed for) power cycling.

Aug 28, 2007

Don't Worry, It's Safe to Power off that Server and Power It on Again

There's a lot of superstition out there regarding data center best practices, and there is some amount of voodoo when it comes to powering down servers. Will the server come up when you power it on? Will the power supply fail?

This morning, I spoke with Mukesh Khattar who has studied the failure rates for the 30,000 servers in his data center. During the past three years, only 1 power supply failed. That's a failure rate of 0.001% / year. (Yeah, that's five-nines, baby!)

Mukesh is also looking at reliability rates for servers with dual power supplies and single power supplies. A power supply is designed to run at 80%-90% load, which is the case in a server with a single power supply. When you have dual power supplies in a server, each supply only runs at 40%-50% load. Since these power supplies are running below their optimum load level, they operate less efficiently and consequently generate more heat.

In our conversation, I started thinking about what you're spending for that additional reliability. With a Dell 2950, that second power supply costs $299. For 1000 servers, you've just spent $299k for those second power supplies. I'm not even counting the additional operating expense for the electricity and cooling costs. During the 3-year depreciation for those servers, only 0.06 power supplies will fail. That's right, not even 1 server out of that 1000 is expected to fail in 3 years due to a power-supply failure.
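Here is a rough Python sketch of that back-of-the-envelope math. It assumes the 0.06 figure counts both supplies in each of the 1000 servers, and it treats the observed rate (1 failure across 30,000 servers in 3 years) as a per-supply annual rate, which is a simplification.

```python
# Observed data from Mukesh's data center.
OBSERVED_FAILURES = 1
OBSERVED_SERVERS = 30_000
OBSERVED_YEARS = 3

# Simplification: treat this as the annual failure rate of a single supply.
annual_failure_rate = OBSERVED_FAILURES / (OBSERVED_SERVERS * OBSERVED_YEARS)

# The hypothetical fleet from the paragraph above.
FLEET_SERVERS = 1_000
SUPPLIES_PER_SERVER = 2          # assumption: dual supplies in every server
SECOND_SUPPLY_COST_USD = 299
DEPRECIATION_YEARS = 3

extra_capital_usd = FLEET_SERVERS * SECOND_SUPPLY_COST_USD
expected_failures = (
    annual_failure_rate * FLEET_SERVERS * SUPPLIES_PER_SERVER * DEPRECIATION_YEARS
)

print(f"Annual failure rate: {annual_failure_rate:.4%}")
print(f"Extra capital for second supplies: ${extra_capital_usd:,}")
print(f"Expected supply failures over {DEPRECIATION_YEARS} years: {expected_failures:.2f}")
```

With the unrounded rate the expectation comes out near 0.07; rounding the rate down to 0.001% per year gives the 0.06 quoted above. Either way, it's far less than one failed supply for $299k of extra hardware.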

So, a few take-aways:

Don't over-provision your server hardware for failures that are extremely unlikely. You're going to drive up your capital expenses and your operating expense for a failure that's unlikely to ever occur.
Power off those unused servers. When you power them back on, they will come back on. Throw away your garlic cloves and salt shakers. It will be okay. Look at the data, it will set you free.
Take a different approach to high availability. Instead of trying to bullet-proof your hardware to prevent a failure, think about a graceful way to recover from a hardware failure.

Aug 27, 2007

What's Your Carbon Footprint?

Today's Mercury News had a great (and simple) worksheet that allows you to measure how you are personally contributing to global warming. There are a total of 5 inputs:

miles driven per year
gas mileage of your vehicle
average electric usage per month (in kWh)
average natural gas usage per month (in therms)
miles flown each year

I used the worksheet and calculated my family's carbon footprint and came up with the following. My family's carbon footprint is 50,156 lbs of CO2 per year-- or 12,539 lbs of CO2/year per person.

The averages provided by the Mercury News are:

Bay Area: 25,102 lbs of CO2/year
California: 26,301
Nationwide: 35,967

I don't know if comparing my family's per-individual amortization to the averages provided is fair. However, I included both cars and the family vacations instead of just my personal transportation and travels. Hmm, I wonder if I should purchase CO2 offset credits? (And I'm actually serious.) One thing that I have done at home is to replace all my incandescent light bulbs with Compact Fluorescent Lights (CFL's).
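For the curious, here is a small Python sketch of the comparison. The family size of 4 is inferred from the two totals above, and whether the published averages are per person or per household is unclear, so the script prints both ratios rather than picking one.

```python
FAMILY_TOTAL_LBS = 50_156   # lbs of CO2 per year for the whole family
FAMILY_SIZE = 4             # assumption: inferred from 50,156 / 12,539

per_person_lbs = FAMILY_TOTAL_LBS / FAMILY_SIZE

# Averages quoted from the Mercury News worksheet (lbs of CO2 per year).
averages_lbs = {
    "Bay Area": 25_102,
    "California": 26_301,
    "Nationwide": 35_967,
}

print(f"Per person: {per_person_lbs:,.0f} lbs of CO2/year")
for region, average in averages_lbs.items():
    per_person_ratio = per_person_lbs / average
    household_ratio = FAMILY_TOTAL_LBS / average
    print(
        f"{region}: per-person figure is {per_person_ratio:.0%} of the average; "
        f"household total is {household_ratio:.0%}"
    )
```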

Anyway, take a look at the Mercury News and their nifty calculator for yourself.

Aug 23, 2007

What Did Your Organization Do This Summer?

With the dog days of summer winding down, most kids will be starting (or have already started) the new school year. As part of the traditional back-to-school ritual, there's always a "what did you do during summer vacation?" session during the first week of class. Kids get to brag about all the new and cool things they did during the past several weeks. One of my son's classmates is the son of a famous CEO, and this kid always has the best stories for the summer. My son and I play golf at the local muni, whereas my son's friend plays Sharon Heights, Stanford and Pebble. (sigh)

I was attending a vendor's seminar the other day, and the speaker was asking the audience of IT personnel about new initiatives at their companies. There's lots of press out there about new trends in the industry: virtualization, server consolidation, power management, data center management...

With summer coming to a close, take a moment to reflect on what your organization did this summer. What initiatives do you have in your organization? How are you advancing your data-center management practices? What are you doing to help top-line revenue growth or improve operating expenses? Are there simple things you can address in short order? (The famous CEO I alluded to earlier has a favorite expression: "low-hanging watermelon.")

Maybe you've already got one or two initiatives in place that are staffed and budgeted. Great! And if not, maybe it's a good time to go after some low-hanging watermelon...

Back to Virtual Reality

After a 1-week vacation in Houston, I'm back in Silicon Valley. I was briefly back in the Central time zone last week during my 3-day visit to our Minneapolis office. Earlier today, I had lunch with a former colleague from Sun Microsystems, and we were catching up on old times. Arun is now a technology evangelist in the Java group at Sun, and we were chatting about various Web 2.0 development technologies and the growth of these new communities. We eventually got around to talking blogs and comparing blog statistics. (Hey, if you blog, you know you track your blog statistics, even if you don't admit it.) Anyway, Arun's daily blog traffic is 10x my monthly blog traffic, which is pretty impressive. If you're a hardcore developer, you should check out his blog. My hat's off to you, Arun!

Aug 5, 2007

Welcome to the Boomtown

This week, I'm literally and figuratively thousands of miles away from Silicon Valley. I'm visiting my parents in Houston, Texas, which is also my old hometown (at least from sixth grade onwards). Houston has been-- and still is-- the oil capital of the world. They even say oil differently in Houston. It's pronounced ahwl-- and there's plenty of it here. Gas is much cheaper (by almost $0.90/gallon) than in the San Francisco Bay Area. Even my kids noticed the difference at the pump! And I haven't seen a single Prius in two days. I bet Toyota's best-selling car in Texas is their Tundra Crew Cab. In Texas, the Prius might do as a golf cart, but not as your vehicle for getting around town. After all, it was already 80 degrees at the crack of dawn, and it warmed up to a balmy 94 degrees by noon. I wonder how long the air conditioner in a Prius runs from the battery? So, no surprise that the Bay Area's best-selling car is practically a no-show in Houston.

Although I'm the canonical Silicon Valley geek who lives on the leading edge of technology, my parents tend to be a little more towards the center of the bell curve when it comes to technology adoption. (Well, actually, they're somewhere in the trailing edge of the bell curve, but they do read my blog.) Earlier this year, they converted to broadband (DSL) from a dial-up Internet connection. My Dad even set up the DSL modem himself! They also have an HDTV and Dish network (satellite). And my Dad doesn't subscribe to a print newspaper. He gets his news from a variety of news sources on the Internet. This is pretty cool! This was a nice data point reaffirming my own beliefs in broadband, the Web and how technology is transforming everyday lives. Now, I just need to see when my Dad starts blogging...

Aug 1, 2007

Appliances in the Data Center

Recently, Google has been generating some new buzz with its search appliances. Google offers a shrink-wrapped version of its search engine in two different packages-- a 1-U rack-mounted server or a 2-U rack-mounted server. The Google Mini starts at $1,995 for a 50,000-document version and scales up to $8,995 for a version that searches up to 300,000 documents. Once you cross above the 300,000 document limit of the Mini, you step up to the pricier Google Search Appliance, which starts at $30k.

The appliances are manufactured by Dell and are distributed by Ingram. Dell touts the multi-colored Google appliance in its print ads and on its website. From the look of Google's appliance, I'm guessing it's a Dell 1950 under the hood of the Mini and a Dell 2950 under the hood of the larger version. Dell's new 1900 series are much better than their 1800 series. We have four generations of Dell hardware in our lab. From our experience, the 1800 series had some quality and reliability issues not found in their previous generations. From our recent experiences, Dell appears to have corrected these problems with their latest generation of servers.

By controlling the hardware environment, installation is greatly simplified, and the application's performance and reliability become more predictable. It will be interesting to see how the Google appliance fares.

Jul 29, 2007

Rob's Most Excellent Tape

In 2006, we were discussing ways to distribute small software add-ons that complement our current products. Rob Gingell, our CTO, talked about the "utility tape" that accompanied Solaris and other UNIX operating systems. The top utility was never part of the operating system by default, but it was found on "the tape." However, everyone used top, and top eventually found its way into the standard operating system distro.

So, the RMET, or Rob's Most Excellent Tape, was born! Now, we don't actually use a tape to distribute it, but the name was kinda catchy. (And I think Rob liked it.)

One of the first things that found its way to the tape was the Scripting SDK (SSDK). We needed a way to extend Collage functionality with small scripts written in a UNIX shell or Windows batch program. So, one day Mukund wrote the SSDK. It hid the complexities of the Collage Web Services interface and exposed a small and simple set of commands. The SSDK is now a standard product offering and even has its own documentation on InfoCentral.

The Java dashboard is another item that found its way into the SSDK. Last year, we were giving a demonstration to the executive staff of a large company, and we decided that we needed a high-level dashboard rather than the detailed web UI. So, Mukund developed a Java Swing app that used the Collage Web Services interface to expose a high-level dashboard that showed how servers were allocated to different application tiers. His dashboard also showed servers powering up and down. Recently, the Java dashboard found a new audience with our Sales Engineers, and it's now installed in a partner's executive briefing center in New York.

And a few months ago, Jason became a new contributor to the RMET. He developed a Reporting application that extracts data from the Collage database into an external data warehouse. Jason then used Active Grid to develop a reporting application that allows you to track server usage by cost centers. The reporting tool allows you to track depreciation and operating costs (e.g., electricity) by the cost centers you define. Since Collage allows you to allocate servers to applications on a dynamic basis, you can adjust the capacity allocated to each application. So, now you can have a centrally managed IT pool and charge back to different groups for their actual usage.

The RMET provides a convenient way to introduce these add-on functionalities. And one day some of these add-ons might find their way into the standard product suite-- just like the top you know and love!

Jul 8, 2007

Introducing My New Blog

With this blog, I've focused primarily on technology topics, with a particular emphasis on data-center issues. I'll continue to keep that same focus here. However, I've started a new blog that will focus on one of my hobbies. Most of my colleagues and friends know about my interest in the game of golf. (Okay, it's probably more than just an interest.) Well, along the way, I've managed to teach both of my kids the game, and my 8-year-old is my regular playing partner.

My new blog, "Golf with Your Kids," will be a collection of experiences, resources and tips for enjoying the game of golf with your kids. Take a look at my introductory posting. And if you're a golfer, please check back on a regular basis. Fore!

Jul 3, 2007

What's in Your Head (of Tree)?

Head of Tree. Tip of Tree. It's your source code repository, where you check in your latest and greatest source code. Could you ship it to a customer at a moment's notice? Actually, would you ship it to a customer? I understand your hesitation. The latest code still needs to be tested. You have to shake out the bugs, and then beat the release into shape. And then after a stabilization period, you're ready to release-- probably weeks to months later, right?

At Cassatt, we switched to a new development process in PD at the start of the year. Head of Tree, or HoT as we affectionately call it, is always shippable. Asynchronous projects are used as the vehicle for developing new functionality or fixing bugs. Each project has a well-defined scope and is completely self-contained. When a project is launched, its completion criteria-- the required functionality, expected quality criteria, associated application payloads and documentation-- are all defined up-front. Only when the project meets all of its completion criteria can it integrate to head of tree.

In this new model, a project cannot integrate if it introduces new bugs or regressions. So, all projects must integrate with zero bugs. As new functionality is introduced, the overall quality level of the head of tree is maintained. Since many projects introduce new automated tests in addition to new functionality, the overall quality of head of tree can increase over time.

Transitioning to this new process was a significant cultural change for the Product Development organization-- and it wasn't easy. Starting in the last quarter of 2006, the PD management team defined the new process and worked through the details. We flattened the organization and centralized decision-making to one Product Team. And there were many small bumps and hiccups along the way. At times, we were still working "the old way" in the new model, and we had to make a conscious effort to change. At times it felt like we were working more slowly in the new model, but we found that there were fewer missteps and almost no "steps backward" to move forward. And by slowing down, we took the time to review architectural changes to the system and understand their impacts. With each new project, we got better and more efficient at defining requirements, planning, developing and testing. Along the way, we developed new metrics and reporting tools to monitor the projects and the head of tree.

It's now been six months since we transitioned to the new model, and we have seen several tangible benefits.

New functionality can be developed and released to customers quickly. There is no need to "wait for the rest of the release" to complete.
Projects can be sized according to complexity and/or effort involved. A "large" project can complete in 3-4 months, and a "small" project can complete in 1-2 months. Simple bug fixes can be handled in a lightweight manner, requiring only a few days.
Requirements are defined and understood before starting development. We have even decided to not proceed with some projects because the scope was not appropriate or the ROI was not sufficient.
Development has become more predictable. We maintain an integration schedule for the next several months and let our Field know when to expect new functionality. The upcoming integration schedule is reviewed weekly with Sales and Professional Services.

In a future posting, I'll provide more details about the process and also talk about some of the metrics we use to monitor progress. Until then, have a happy Fourth of July, and stay safe!

Jun 11, 2007

To DHCP, or not to DHCP: That is the Question.

My apologies in advance (or belated in this case) to William Shakespeare. This morning, I met with a customer and had a rather in-depth whiteboard session with two of their senior architects. We were discussing how to use Collage in a production deployment. I usually start with our standard technical presentation (gotta love PowerPoint), but I quickly find myself drawn to the whiteboard (or "grease board" as one of our Sales reps is fond of saying).

All technical environments (with IT as no exception) have their own systems of best practices, dogmas and religious beliefs. DHCP versus fixed IP addressing for servers falls somewhere in between dogma and religion. However, as you start moving towards utility computing, your data center can take on a more dynamic persona. For example, applications (and servers) could be provisioned as they are needed to respond to increasing workloads. Servers can also be re-purposed during the day, as your data center takes on different application profiles. A given server could be an e-mail server in the morning, a web server in the afternoon and a business intelligence server at night.

Cassatt Collage allows you to manage more static, traditional data centers and also allows you to manage a more dynamic, utility-computing environment. In order to repurpose a server, Collage takes advantage of DHCP. A server's IP address is assigned by Collage's DHCP service. However, Collage allows you to control precisely what IP addresses are allocated to your applications.

Let's take the following example of a typical three-tier application:

Web tier: up to 20 servers with IP addresses 10.20.120.40 - 10.20.120.59
App server tier: up to 10 servers with IP addresses 10.20.120.60 - 10.20.120.69
Database tier: up to 5 servers with IP addresses 10.20.120.70 - 10.20.120.74

When you create these tiers in Collage, you can specify the IP addresses available to each tier. In this manner, you have precise control over your IP address space and how different applications map to your network topology. The Collage Network Virtualization Service (NVS) also allows you to specify a VLAN or network segment for each application tier.

When servers are allocated to an application tier at run time, each server is given an IP address by the Collage DHCP service. Even though these servers take advantage of dynamic IP addressing, their IP addresses can be constrained ahead of time. If you are taking advantage of NVS, Collage can create a new VLAN for you and will automatically program the layer-2 switches in your data center. This allows you precise control of how servers are mapped into your environment. A particular server's application stack and network identification, however, are determined dynamically when that server is allocated to an application tier.
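To make the idea of constrained dynamic addressing concrete, here is a minimal Python sketch of per-tier address pools, using the three ranges from the example as 10.20.120.40-59, 10.20.120.60-69 and 10.20.120.70-74. The TierPool class and its allocate/release methods are hypothetical names made up for illustration; this is not Collage's actual interface or its DHCP implementation.

```python
import ipaddress


def ip_range(first, last):
    """Return every IPv4 address from first to last, inclusive."""
    start = int(ipaddress.IPv4Address(first))
    end = int(ipaddress.IPv4Address(last))
    if end < start:
        raise ValueError("range end precedes range start")
    return [ipaddress.IPv4Address(n) for n in range(start, end + 1)]


class TierPool:
    """Hypothetical per-tier address pool (illustration only, not Collage's API)."""

    def __init__(self, name, first, last):
        self.name = name
        self.free = ip_range(first, last)  # addresses not yet handed out
        self.leases = {}                   # server id -> leased address

    def allocate(self, server_id):
        """Give the server the lowest free address in this tier's range."""
        if server_id in self.leases:
            return self.leases[server_id]
        if not self.free:
            raise RuntimeError(f"tier {self.name} has no free addresses")
        address = self.free.pop(0)
        self.leases[server_id] = address
        return address

    def release(self, server_id):
        """Return the server's address to the pool when it leaves the tier."""
        address = self.leases.pop(server_id)
        self.free.append(address)
        self.free.sort()


# The three tiers from the example above.
tiers = {
    "web": TierPool("web", "10.20.120.40", "10.20.120.59"),
    "app": TierPool("app", "10.20.120.60", "10.20.120.69"),
    "db": TierPool("db", "10.20.120.70", "10.20.120.74"),
}
```

A server that joins the web tier would get an address with tiers["web"].allocate("server-17") and give it back with release when it is repurposed. The address is assigned dynamically, but it can only come from the range you planned for that tier.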

Jun 3, 2007

Some Fun in the Sun

Despite scattered thunderstorms (or more appropriately, scattered sunshine) during my three-day visit to Minneapolis, I did manage to get some sunshine during my visit. I was pleasantly surprised to find a brand new Toyota Solara convertible waiting for me at the Hertz gold canopy. I had reserved an intermediate car, but there was a shiny, red Solara (with only 7 miles on the odometer) waiting for me.

On Thursday evening, Martha, Linda, Jason, Luis and I hopped into the Solara and ventured to Babalu, a new dining hot spot in downtown Minneapolis. The food was great, and the drive was great too. We managed to avoid downpours, and the weather was a pleasant 70 degrees. On the way back, we took the scenic route through St. Paul and checked out the Cathedral of St. Paul. The cathedral marked its 100th anniversary this weekend, and there was a historic lighting of the exterior during my visit. A local doctor arranged for a lighting company to light the exterior for two days, and he footed the bill for this event. Pretty neat!

After winding our way along the Mississippi River, we finally arrived back at the Cassatt office in Mendota Heights. It was a nice evening-- with good company, good food and a nice drive in the cool evening air.

May 31, 2007

It's 11:30 PM. Do You Know What Your Servers Are Doing?

My guess is that your servers are still powered on, but probably not doing anything. I'm writing this post from my hotel room, after a nice dinner in downtown Minneapolis with my engineering team. After a relaxing, long weekend, I hopped a short flight up to the land of 10,000 lakes to spend some time with my engineering team in our Cassatt office in Mendota Heights, Minnesota.

Earlier in the afternoon, Jason showed me his latest updates to the Collage Reporting application. The Collage Reporting application collects statistics on how your managed servers and services are used and builds a data warehouse with this information. The Reporting application provides metering capabilities by delivering reports that detail:

Which resources (e.g., servers) are allocated to different applications or departments.
Utilization of these resources in CPU-hours.

These reports allow your IT department to charge the different lines of business for their actual usage of data center resources. Jason and I talked about some enhancements. What if you could enter the depreciation and operating cost (i.e., power and HVAC) of each server? Now, you get an actual operating cost for your business applications!
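Here is a rough Python sketch of what that chargeback arithmetic could look like. The record layout, server names, cost centers and hourly rates are all made up for illustration; they are not taken from the Collage Reporting application.

```python
from collections import defaultdict


def chargeback(usage, cost_per_hour):
    """Total the cost of server usage for each cost center.

    usage         -- list of (cost_center, server, hours) records
    cost_per_hour -- server -> hourly cost (depreciation + power + HVAC)
    """
    totals = defaultdict(float)
    for cost_center, server, hours in usage:
        totals[cost_center] += hours * cost_per_hour[server]
    return dict(totals)


# Hypothetical hourly rates and one day of usage records.
cost_per_hour = {"big-db-1": 0.50, "small-web-1": 0.25}
usage = [
    ("marketing", "big-db-1", 8),
    ("finance", "small-web-1", 16),
    ("finance", "big-db-1", 4),
]

bill = chargeback(usage, cost_per_hour)
# marketing: 8 * 0.50 = 4.0; finance: 16 * 0.25 + 4 * 0.50 = 6.0
```

Because each server carries its own hourly rate, a group that borrows a heavy-duty server for a few hours pays more than one that runs a small server all day, which is the point of charging for actual usage.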

I'll provide more details (and a screen shot) in a future post. It's almost midnight in Central Time, and I think I'm going to call it a night...

May 28, 2007

A Whale of a Weekend

Unless you've been living under a rock for the past three weeks, you've probably heard of Delta and Dawn-- the mother and daughter humpback whales who have been swimming in the Sacramento River. On Saturday, I decided to go see the whales for myself. So, the wife, kids and I piled into the car and let the trusty navigation system guide us to Rio Vista, home of the Rio Vista Bridge that you've been seeing on TV. The 90-mile drive from Palo Alto to Rio Vista was quite scenic. However, once you passed the furthest edge of the Bay Area, the scenery and towns started to look more like east Texas than California-- flat plains, small towns, old pick-up trucks and two-lane roads.

We finally ended up at Rio Vista and found a small landing at the base of the bridge, where an NBC-11 news van had been parked since morning. There was also a small crowd of 40 people, trying to catch a glimpse of the whales. As it turns out, we had missed the whales. They had swum north past the bridge in the morning. So we headed north along the west bank of the river to the ferry crossing and joined a small gathering of 50 or so families, all trying to catch a glimpse of the whales. It was a warm, sunny day and the perfect weather for a whale sighting. The local Cal Trans employee informed us that the whales were 30 miles north, close to Sacramento.

Once again, we headed north along the river, towards Sacramento. At times, the river got quite narrow, and we wondered how these whales ever got this far inland. After 20 miles or so, we gave up on our whale-watching quest, and we decided to head back home to the Bay Area.

Today's news, however, was quite encouraging. The whales have turned south, crossed under the Rio Vista bridge, and are headed towards the open Ocean. Hurray for Delta and Dawn!

May 19, 2007

SaaS Making Inroads in the Consumer Space

Software as a Service (SaaS) is the latest trend in the software development industry. Everyone wants to deliver their software as a hosted service that you rent rather than purchase and install on your own servers. The high-flying Salesforce.com is the poster-child of SaaS. Even SAP is wrapping itself in SaaS clothing these days. (See my previous posts).

There are some interesting trends taking place in the consumer space. This year, tax season has been rather good to Intuit, despite a few hiccups in last-minute online filings. Intuit just released their quarterly earnings, and they had their first $1 billion quarter-- $1.15 billion to be exact. GAAP net income came in at $367 million, which is not too shabby. (Details)

Apart from the nice revenue, there are some interesting trends in Intuit's Turbo Tax unit sales for this tax year:

Intuit sold 6,942,000 copies of its shrink-wrapped Turbo Tax software that you install on your desktop. This was a 2% drop from last year's sales of shrink-wrapped Turbo Tax.
Unit sales of Turbo Tax for the web increased by 16% to 6,042,000.
Another 1,422,000 users filed for free with Turbo Tax for the web, a 3% increase over last year's numbers.
Altogether, more people (51.8%) filed with the hosted (web-based) version of Turbo Tax rather than the version installed on your desktop.
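For anyone who wants to check the math, the 51.8% figure follows directly from the three unit counts above. A few lines of Python (the variable names are mine):

```python
# Unit counts quoted above for this tax year.
desktop_units = 6_942_000    # shrink-wrapped Turbo Tax
web_paid_units = 6_042_000   # paid Turbo Tax for the web
web_free_units = 1_422_000   # free filers on Turbo Tax for the web

hosted = web_paid_units + web_free_units
total = hosted + desktop_units
hosted_share = hosted / total

print(f"Hosted filers: {hosted:,} of {total:,} ({hosted_share:.1%})")
# prints: Hosted filers: 7,464,000 of 14,406,000 (51.8%)
```

Note that the hosted total counts both paid and free web filers. Counting paid web units alone, the desktop version is still ahead.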

For years, most people have been using a hosted e-mail provider (e.g., Yahoo, Google or Hotmail) for their personal e-mail. The latest sales figures from Turbo Tax show that consumers are comfortable with using hosted applications for their financial data. It will be interesting to see how Google does with its Google documents-- a suite of hosted office applications.

May 18, 2007

City Hall Is Going Green

Sixteen cities around the world will receive financing from major banks to make their government buildings more energy efficient. Citi, Deutsche Bank, JP Morgan Chase, UBS and ABN Amro have each committed $1 billion to finance upgrades in lighting, cooling, heating, roofing and other environmental improvements. This initiative was announced Wednesday at the C40 Large Cities Climate Summit in New York. Houston, New York, Chicago and London are among the cities participating.

What's interesting is how these cities will pay for the improvements. The banks are providing loans to finance the improvements. The city governments believe that the energy savings will exceed the financing costs. In other words, the energy improvements will pay for themselves, which is pretty cool. (Pun sort-of intended)

Check out all the details in the article in Wired magazine.

May 15, 2007

Hasso Plattner Speaks about SAP's A1S Software

To continue a theme from my recent posting, I found some more details about SAP's A1S product from a recent article in Sand Hill. You've probably seen the SAP TV commercials-- the yellow ones with customers who are pleasantly surprised to learn about SAP's offerings for small-to-medium-sized business (SMB) customers. SAP's A1S product, which has been under development for the past 3 years, aims to deliver solutions for the SMB space.

Some additional insight into their product:

3,000 SAP employees are working on A1S.
A1S is a split from their previous source base and delivers a new user interface. (I've worked with the current SAP client, which is a locally installed, fat client.) The new user interface (UI) is web-based and makes extensive use of Service-Oriented Architectures (SOA).
In a departure from their rigid past, the new UI can be customized to the different industry verticals.
A1S will emphasize a hosted deployment model. In fact, Hasso references successes from Google and Salesforce.com with a hosted deployment model. Customers wary of a hosted model will still be able to install the software on their on-site servers.

May 9, 2007

SAP Talks about Its New A1S Software

Today's Mercury News has a very timely article about SAP's A1S Software. Hasso Plattner, their chairman, talked about A1S at the Software 2007 conference that's taking place this week in Santa Clara, just a few miles from our Cassatt headquarters.

Some highlights on A1S:

A1S will be a hosted, web-based offering of SAP's famous (or infamous depending on your point-of-view) ERP software
A1S has been under development for the past 3 years and is expected to release in 2008.
Hasso touts the "Software as a Service" (SaaS) model and how A1S will be SAP's foray into this space.
A1S was announced and discussed last month at SAP's Sapphire conference for its key customers and partners.

Very interesting article. Take a read for yourself.

May 4, 2007

Dispelling Myths in the Data Center

To manage a data center, you need security policies, operating procedures, best practices and run books. Unfortunately, there's also a collection of myths and superstitions that tend to accumulate over time. One of these concerns the impact of powering a server up/down on that server's failure rate. It's time to dispel that myth.

The reality: servers and their internal components are designed to be resilient to power operations. Powering servers on and off does not increase their failure rate. Most server hardware released in the past 4 years has been designed for power operations. Servers from HP, Dell, IBM and Sun all ship with power controllers that allow you to power them on/off remotely. All the internal components are designed with power management in mind: solid-state power supplies, small-diameter hard drives that can spin up/down very quickly, efficient use of VLSI and custom ASICs, redundant on-board network interface cards.

Think about your own laptop. Do you power it off or suspend it at night when you're not using it? Do you have power-management enabled on your laptop so the hard drive spins down after a period of inactivity?

Guess what, servers are also designed to be powered off when they're not needed and powered on only when you need them. It's just that most data center applications are not designed with power management in mind. Data centers are provisioned for peak load, whereas the average load is significantly lower.

What if you could power on servers only as they are needed to respond to increasing load? Cassatt has power management solutions for your data center that provide that missing power-management capability. These same power management solutions also allow you to reduce power consumption based on time of day or demand-reduction events from your power company.

We Believe!

I realize this is a bit off topic from my normal postings, but last night's Warriors game was just amazing! The Golden State Warriors trounced the Dallas Mavericks 111-86! The Warriors took over in the third quarter, and the game was never close after that. It was a great game to watch, that is if you're from the Bay Area. So, the eighth-seeded Warriors close out the series 4-2 and advance to the second round. For more details on the game, check out the Mercury News.

Apr 30, 2007

Insights from Bill Janeway at Warburg Pincus

Here's an interesting article in Sandhill.com from Bill Janeway at Warburg Pincus. Bill is one of our board members at Cassatt, and he previously led the investment in BEA Systems. The article gives you a 40-year perspective on the Venture Capital community and how it's changed over the years.

Some interesting points on what it takes these days to put together a start-up that is seeking venture capital:

Begin with a first-class, seasoned management team
Find a market undergoing demand growth where some form of disruption is creating a space for a new business
Buy the components of the business – technology, distribution, customer base - that you can; only build what you have to (which usually does include building innovative technology)

Bill also talks about what's required these days to IPO. For those of us at Cassatt, we've heard these same principles from our CEO, Bill Coleman, in various town halls. The article makes for an interesting read. Take a look for yourself.

Apr 29, 2007

Do You Fly Your Own Airplane?

One of the best tests for a software product is using it for your own needs-- in other words, flying your own airplane. By using your product in an internal production environment, you can shake out bugs and reliability issues and make a better product for your paying customers. Running in a real environment helps you find issues that wouldn't turn up in test scenarios.

At Cassatt, we started using Collage to run our IT systems and development build systems in early 2005. We have three development sites, and the IT systems in each site are run on Collage. A common set of application images is developed in San Jose and pushed out to the other sites, which have no local IT staff.

Product builds, however, are at the core of a development organization. If you can't check in code and build your product, everything comes to a screeching halt. In Product Development, we use CruiseControl (builds), BitKeeper (source-code management) and Bugzilla (defect tracking). All three applications are managed in a Collage environment. Spencer Smith, one of our developers in Colorado, wears a second hat-- release engineer or buildmeister.

In 2007, we changed our development process in PD so that we now have about a dozen projects proceeding in parallel. Each project requires its own repository and build. The build system was revamped to include a web application that allows project leads to configure new projects and to set up their own build schedules.

This new Unified Build Service (UBS) is a LAMP application that simply runs as an application tier within a Collage environment. This same Collage environment includes other application tiers for the builder nodes, which are part of the CruiseControl system. CruiseControl also runs as another application within this Collage environment. Spen just finished UBS last week, and now all project builds have transitioned to UBS. So, Collage is built using Collage. Pretty neat, huh?

Apr 18, 2007

Out of Capacity on Tax Day

Like many tax payers, I waited until the last weekend to finish my taxes, but I filed one day early, on April 16. I used Intuit's Turbo Tax, but I still filed the old-fashioned way-- by mail. As it turns out, many procrastinators filing electronically on April 17 were in for a nasty surprise. Intuit ran out of server capacity, and many people's returns could not be processed in time. Luckily, the IRS extended the deadline to April 19 for folks who had trouble filing electronically. Read the details in today's Mercury News.

Some highlights:

Intuit processed more than 1 million transactions on April 17. This was double the number of electronic filings from the previous year.
Once the system reached capacity, many filers were simply turned away.

I bet Intuit had other servers sitting idle, dedicated to other applications while their tax-processing servers were maxed out. Unfortunately, there was no way to tap into that excess capacity. These types of workload spikes happen in many industries, and sometimes the traffic patterns are very predictable-- news properties during the week, music and sports properties during the weekend.

What if you could shift capacity around your data center as demand requires? What if you could tap into those lower-priority applications and harvest their capacity for higher-priority applications? How much would that be worth to your business' top-line revenue?

Apr 11, 2007

How to Optimize Power Consumption of Your Data Center

There's a lot of talk these days about "green data centers" and reducing power consumption. Cassatt Collage has several features that can be used to optimize the power consumption of your data center.

Collage dynamically allocates servers to applications as they are needed. So, if you have a farm of web servers, you can allocate capacity as demand increases rather than provisioning all the servers at once. Servers that are unused remain in a free pool, and Collage powers them off. (Check out an older post for more details.)

Collage includes a pretty powerful rules engine, sometimes called the brain, and sometimes called Jerry's brain (named after its inventor). The rules engine allocates servers using a least-cost algorithm so that servers are optimally allocated based on application needs. For example, you might have a database tier that needs dual-CPU, 4 GB servers, and an Apache tier that can use any old server. Collage will allocate your heavy-duty servers to the database tier and the lightweight servers to the Apache tier.

Using the Collage rules engine and power management technologies, you can have servers allocated based on their power consumption. For example, virtual machines could be allocated first-- since they don't require additional power-- followed by 1-U servers, then 2-U servers and then finally those power-hungry 4-U servers. For each class of server, you simply add additional attributes to the hardware inventory maintained by Collage. Add one attribute for your 1-U servers, two attributes for your 2-U servers and four attributes for your 4-U servers.

When servers are allocated to an application tier, Collage will automatically allocate the most power-efficient server that is available. And when that application no longer requires the same service capacity, extra servers are automatically returned to the free pool and powered off. Voila! Your data center now optimizes its own power consumption based on application needs. Pretty cool, huh?
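The power-aware ordering described above can be sketched in a few lines of Python. This is only an illustration of picking the cheapest available servers first. The POWER_COST weights, the Server class and the allocate/release helpers are assumptions of mine, not the Collage rules engine or its attribute scheme.

```python
from dataclasses import dataclass

# Hypothetical relative power costs per server class, following the
# ordering above: virtual machines first, then 1-U, 2-U and 4-U servers.
POWER_COST = {"vm": 0, "1U": 1, "2U": 2, "4U": 4}


@dataclass
class Server:
    name: str
    kind: str
    powered_on: bool = False


def allocate(free_pool, count):
    """Take the `count` most power-efficient servers from the free pool."""
    if count > len(free_pool):
        raise ValueError("not enough free servers")
    # sorted() is stable, so servers of the same class keep their pool order.
    chosen = sorted(free_pool, key=lambda s: POWER_COST[s.kind])[:count]
    for server in chosen:
        free_pool.remove(server)
        server.powered_on = True
    return chosen


def release(free_pool, servers):
    """Power servers off and return them to the free pool."""
    for server in servers:
        server.powered_on = False
        free_pool.append(server)


free_pool = [
    Server("big-1", "4U"),
    Server("vm-1", "vm"),
    Server("pizza-1", "1U"),
    Server("mid-1", "2U"),
]
tier = allocate(free_pool, 2)  # picks vm-1 and pizza-1
```

When the tier shrinks, release(free_pool, tier) powers those servers off and puts them back in the pool. The power-hungry 4-U box only gets used once everything cheaper is taken.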

Apr 8, 2007

Interesting Silicon Valley News

Today's Mercury News has two interesting articles about the tech rebound that has been occurring in Silicon Valley. The headline, "The Tech Rebound," talks about the rebound of the economy in Silicon Valley after the painful dot-com bust. Some interesting stats for the % change from 2005 to 2006:

Household income is up 6.2%, compared to a nationwide average of 4.9%.
50,000 new jobs were created in the past two years-- not quite the 200,000 jobs lost in the dot-com crash. Employment is still up 2.9% (1.8% nationwide).
Home prices are up 4% (1.4% nationwide). The median home price in Santa Clara County rose to $749,000.
Auto sales are up 2.6%, whereas they're down 2.6% nationwide.

We locals use the traffic on the major freeways as a gut-check for the economy. The good news is that traffic has increased, which means the economy has definitely picked up!

Another article talks about Google's impending brain-drain as their earliest employees approach the fourth year of vesting for their pre-IPO stock. Now that's a nice problem to have as an employee!

Check out both articles in the Merc for an interesting read!

Apr 6, 2007

Avez-Vous Un Blog?

A few weeks ago, Alan McClellan had set up a Google analytics profile for my blog. (Thanks, Alan!) Recently, I hit a personal milestone with my blog. I now have readers from every continent, which is pretty cool. So, welcome to my new international audience. I'll try to keep this space current and informative. Maybe I'll be brave and try to set up automatic translation of my blog (traduction en francais). Yes, I did live in Paris for three years during an international assignment, and I did love the experience.

Mar 30, 2007

SAP Uses Second Life As Well

The other night, while I was trying to find Cory's blog, I ran across an interesting blog from a guy at SAP. Turns out we were both at the same talk in Palo Alto. (See my previous post for more info). SAP is using Second Life for online training and demos. For details, read Mario's blog. Pretty neat.

Mar 28, 2007

Virtual Worlds Hosted in Data Centers

A few weeks ago, I went to a talk by Linden Lab's CTO, Cory Ondrejka. His background leading up to Second Life is quite interesting. He started off developing simulators for the military, then moved to writing arcade games, and then finally ended up creating Second Life. Pretty interesting fellow, and a good speaker as well. For those of you not at the talk, here's an interesting article in Information Week about Second Life (SL).

SL runs on dual-CPU x86 servers with Linux and MySQL. A 16-acre "plot" in SL runs on a single core on one of these servers. A human (or company) in the real world can purchase a 16-acre plot for an initial cost of US$1900 and a monthly payment of US$300. SL is essentially a hosting company for virtual real estate and virtual goods that are traded in the real world. As an example, the SL city of Amsterdam sold for US$50,000 on eBay. (Check it out)

Here are some metrics on SL:

Bandwidth: 10 GB/second outbound
Storage: 40 TB of user data
Scalability: 100,000,000 SQL queries/day (using MySQL)

Cory also claims that SL is one of the largest (or the largest) deployments of MySQL. And SL is adding 1-2 racks of servers each week to keep up with demand. Now those are some pretty impressive stats.

Welcome, Connie Weiss!

One of my friends, Connie Weiss, joined the blogosphere. Check out her blog for a welcome change of pace. Although I don't have the variety of pets that Connie does, I do have four parakeets at home!

Data Centers are Hot Again, Thanks to Web 2.0

Yesterday's San Jose Mercury News describes how data centers are "hot again" due to traffic from the growing popularity of social networking Web sites, such as YouTube and Facebook. Demand for data centers is increasing, as companies such as Apple, Yahoo and Kaiser are purchasing or leasing new data centers to keep up with demand for their applications.

Some interesting excerpts from the article:

While prime office space in downtown San Jose costs about $2 a square foot per month, space in a data center rents from $15 to $30 a square foot per month.
In the past two years, one million square feet of space in Silicon Valley data centers was purchased or leased, more than all the space taken off the market from 2001 through 2004.

The article mentions Apple's 2006 purchase of a data center in Newark, California. I wonder if that's the location that serves my daughter's iTunes purchases-- just a short hop across the Dumbarton bridge.

Mar 26, 2007

The Most Energy-Efficient Server Is One That's Powered Off

"Green data centers" and energy-efficient servers are the latest buzz in IT and government, as it turns out. Information Week's March 12 issue features green data center technologies as the cover story. The online version doesn't have the same catchy Gordon Gekko-esque "Green is Good" title, but it's still a good read.

Today's data centers are built like offshore platforms from the oil and gas industry. Every 100 years, there's a monster wave, usually 50 feet high, from a massive hurricane or storm. So, offshore platforms are built to withstand this 100-year wave. The platform is taller than the wave's crest, and the platform is built to withstand the force of the wave.

Data centers are typically provisioned to meet peak demand, much the same way as oil platforms are engineered for that 100-year wave. Servers are provisioned to handle peak applications loads or traffic from major events, such as the quarterly sales promotion or special news event. On a typical day, however, utilization is much lower, but you still have all of those servers humming away, using electricity and generating heat.

What if you could power off all of those unused servers? Not only would you save electricity, you would also reduce your cooling costs. Could you determine which servers are idle and then shut those down? And could you do this automatically? How fast could you respond to changes in your environment? If load increases, could you power on the additional servers you need?
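
To make those questions concrete, here is a minimal Python sketch of the decision itself: given the utilization of each running server, which ones could be shut down, and how many standby servers should be powered on when load climbs. This is an illustration only, not how Collage works. The Server class, the plan_power_actions function and the 60% target utilization are all assumptions made up for the example, and it assumes load can be moved freely between servers.

```python
import math
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    powered_on: bool
    utilization: float  # fraction of capacity in use, 0.0 to 1.0


def plan_power_actions(servers, target_utilization=0.6):
    """Return (names_to_power_off, names_to_power_on).

    The 0.6 target is an arbitrary illustrative value, not a recommendation.
    """
    running = [s for s in servers if s.powered_on]
    standby = [s for s in servers if not s.powered_on]
    total_load = sum(s.utilization for s in running)
    # Servers needed to carry the current load at the target utilization.
    needed = max(1, math.ceil(total_load / target_utilization))
    if needed < len(running):
        # Too many servers are on: shut down the least-busy ones first.
        idle_first = sorted(running, key=lambda s: s.utilization)
        surplus = len(running) - needed
        return [s.name for s in idle_first[:surplus]], []
    # Otherwise power on as many standby servers as the shortfall requires.
    shortfall = needed - len(running)
    return [], [s.name for s in standby[:shortfall]]
```

Running a plan like this on a schedule (or whenever a monitor reports a change) would cover the "do this automatically" question; how fast you can respond then depends mostly on how long a server takes to boot.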

These are tough demands for your data center, but these changes can reduce your energy costs. The local utility, PG&E, is providing rebates to companies who can reduce their overall energy consumption or reduce their energy consumption on-demand in order to prevent rolling black-outs. We're participating in a new initiative by the Silicon Valley Leadership Group for energy-efficient data centers. As part of this initiative, we're showing how Cassatt Collage can be used to implement these use cases to reduce your overall power costs.

Back on the home front, we have several hundred servers in our system test lab. Since they're managed by Collage, only a third of the servers are usually powered on at a given time. When we do large scale-out tests, Collage fires up most of the servers in the lab. However, we usually schedule those scale-out tests for the weekends. Our lab A/C consists of three different units, each of which turns on when there is demand. One unit is constantly running; the second one kicks in during parts of the day; the third one only kicks in during heat waves or scale-out tests. Pretty cool, huh? (Pun sort-of intended)

Mar 25, 2007

Getting "Out There"

A friend of a friend (funny how that works) gave me some pointers to get my blog plugged into the broader "blogosphere." I just created a profile on technorati.com and am "posting claim" to my blog there.
Technorati Profile

Mar 22, 2007

Would You Please Get Those VM’s Under Control!

With all the recent controversy in financial markets and politics, it’s time for us geeks to stir up a little controversy of our own. A recent article in the Register talks about the challenges in managing Virtual Machines and licensing issues associated with the ability to deploy new (virtual) servers at will. A recent blog at Server Virtualization captured some of the opinions regarding the topic of how closely you should control the VM’s in your environment and whether or not developers can/should “hide a VM under their desk.”

Instead of finding new ways to circumvent your IT department, what if your IT department could provide you access to the VM’s you need, when you need them? When you’re done with your VM’s, you could give them back so that others can use them. Since there’s no physical asset, borrowing a VM should be easy. What if your IT department could go one step further and provide your VM’s freshly installed with the O/S and applications you need? This is a win-win situation. The IT department can still keep tabs on server resources and software licenses. And developers get access to the environments they need in their dev/test cycles. Cassatt Collage can help you set up this dynamic VM environment. (Read more)

At Cassatt, our IT director, Kirk, is the one who’s on the hook for Sarbanes-Oxley compliance. I really don’t want to see Kirk go to jail for failure to comply with software licenses-- one of the possible consequences of not getting a handle on how many VM's you have in your enterprise. After all, Kirk is a nice guy, and he does bring farm-fresh eggs for Karen, our CFO. So even though I’m a developer at heart, I do my part to make sure that Kirk stays out of the slammer, Karen gets her eggs, and I get my paychecks. So don't you think it might be worthwhile to get those VM's under control?

Mar 16, 2007

What Can You Do with a Web Services Interface?

Last year, we were working with a large software company and using Collage to provision and manage this company’s application suite. We presented our results to the VP sponsoring our project. We showed several use cases for provisioning new server instances within minutes and automatically responding to server failure. He was impressed with the functionality, but he requested that we provide a high-level “dashboard” application that could integrate with his company’s application suite. The VP designed this new User Interface (UI) on the whiteboard at the end of the meeting, and the UI looked nothing like our current UI (of course).

After returning to our office, I tapped Mukund, who wears many hats at Cassatt. I needed him to write a new UI, drawing on his prior experiences as a Java Swing developer. He used the Collage Web Services interface to build a Java Swing app that provided a high-level view of the application domain. He also included controls to increase service levels of the managed application.

A week-and-a-half later, we went back to the VP and showed him the new app that he had designed on the whiteboard. The VP was quite pleased-- surprised, in fact-- that we were able to turn this around so quickly. Later, we had a chance to present our results (and the Swing app) to their executive management team. We’re still working with that customer, and we’re still using the Web Services interface to provide custom functionality in their environment.

Mar 12, 2007

What Should You Do if a Server Fails? Absolutely Nothing.

A server failure can be disruptive to your business and your personal life. Servers tend to fail at the worst times— late at night, over the weekend, or in the middle of an important demo. Disk drives fail, motherboards burn out and software crashes; these things happen. To fix the problem, someone could reboot a server or reinstall software on a new server. What if your data center could automatically recover by performing these actions for you?

Cassatt Collage goes beyond provisioning. Collage constantly monitors the applications and servers in your data center. Collage polls for a heartbeat from each server—using standard OS-level and application-level monitors available in Linux, Windows and Solaris. You can also introduce your own custom monitors, such as a customized agent or a script running inside a database, and have Collage monitor that as well. When a server has failed, Collage will replace the server with a new one from the free pool and boot the same application/service on the new server—all of this within minutes and without losing any data.

For each application, you specify the monitoring parameters. You can define which monitors to use, the polling interval, and how many retries you should allow. In a smaller configuration with fewer than 100 servers, I like to monitor SNMP at 30-second intervals with 3 retries. For larger configurations with ~400 servers, I would increase the monitoring interval to 60 seconds. For Apache servers, I add an HTTP monitor on a system URL embedded in my web app so that I can ensure that the Apache service is running.
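
The interplay of polling interval and retries is easier to see in code. The Python sketch below is a generic illustration, not the Collage monitoring implementation: the function names are made up, "3 retries" is read as one initial poll plus three more attempts, and the cutoff of 100 servers for switching intervals is an assumption (the text only gives "fewer than 100" and "~400" as examples).

```python
import time


def server_has_failed(check, retries=3, interval_seconds=30, sleep=time.sleep):
    """Return True only if `check` fails on the first poll and on every retry.

    `check` is any zero-argument callable that returns True when the server
    answers: an SNMP get, an HTTP request to a status URL, a custom script.
    `sleep` is injectable so the logic can be exercised without waiting.
    """
    for attempt in range(retries + 1):
        if check():
            return False  # one good answer is enough to call it healthy
        if attempt < retries:
            sleep(interval_seconds)  # wait one polling interval, then retry
    return True


def polling_interval(server_count):
    """Pick 30 s for small setups and 60 s for large ones (assumed cutoff: 100)."""
    return 30 if server_count < 100 else 60
```

With these settings a dead server is declared failed about 90 seconds after the first missed poll (three sleeps of 30 seconds), which is the trade-off to keep in mind: fewer retries mean faster failover but more false alarms.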

As part of our standard customer demo, I like to pull a blade server that’s running an application and watch Collage respond to the failure. I’ll let the customer pull the blade out of the chassis in the lab; blinking lights, fan noise and 10 racks of servers always add a little extra to the demo. By the time we return to the conference room (with the failed blade in hand), Collage has quarantined the failed blade in a maintenance pool, allocated a new server from the free pool and booted this server to the same application. All of this takes only 3 minutes from bare metal to a running application on a new server. This demo is always very memorable and illustrates High Availability in a very simple manner. (Seeing is believing.) I once gave this demo to some visiting executives from TCS. A year later, I met them again in San Jose, and they still remembered the demo.
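
The bookkeeping behind that demo (quarantine the failed blade, take a spare from the free pool, put it to work in the same tier) can be sketched in a few lines of Python. This is a simplified illustration with plain lists standing in for the pools; the function name is made up, and the step of booting the application image on the replacement is only indicated by a comment.

```python
def replace_failed_server(failed, tier, free_pool, maintenance_pool):
    """Move `failed` to the maintenance pool and return its replacement.

    Returns None when the free pool is empty, in which case the tier simply
    runs one server short until a spare becomes available.
    """
    tier.remove(failed)
    maintenance_pool.append(failed)  # quarantine for later inspection
    if not free_pool:
        return None
    replacement = free_pool.pop(0)
    tier.append(replacement)  # a real system would boot the tier's image here
    return replacement
```

The maintenance pool matters as much as the free pool: without it, a flaky blade would drift back into circulation and fail again under the next application.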

Mar 7, 2007

Business Lessons from “Buff Bill”

Two years ago, I put together a fun video for our Sales kick-off event. I wanted to illustrate several use cases and the derived benefits from using Cassatt Collage to manage your data center. The video featured Cassatt employees and Bill Coleman, our CEO, as “Buff Bill,” head of Buff Bill’s Boards and Bikes. The video was definitely fun to make and somewhat funny (as in, “don’t quit your day job” kind of funny). In the video, Buff Bill used Collage to:

Quickly integrate the IT systems of a recent acquisition.
Shift the distribution of computing resources between different web portals based on seasonal demand.
Respond quickly to increased workload by ramping up capacity as needed and in advance of new sales promotions.

The video featured Mukund as the poor sys admin who had to miss a Blink182 concert to bring new systems online-- but that was before he started managing his data center with Collage. After Mukund started managing his systems with Collage, he had way more free time—enough time to party with his friends and even take in a Linkin Park concert with Buff Bill.

Although we didn’t win any awards in the Sundance Film Festival (no surprises there), Buff Bill tackled some of the same challenges facing our customers today. One of our recent customers has achieved the top spot in their industry, and their only means to grow top-line revenue is through acquisitions. Their biggest headache has been integrating the IT systems of the acquired company. They are planning to use Cassatt Collage to manage all their IT systems so that they can quickly integrate newly acquired companies.

Another large customer faces predictable traffic spikes at different times during the week. Certain web properties are accessed almost exclusively on the weekend, whereas other business-related web properties are most active during the weekdays. We’re talking with them about using Collage to shift resources dynamically between their different web properties so that they can reduce their CapEx and OpEx.
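
As a rough illustration of what "shifting resources" means in practice, the Python sketch below divides a shared pool of servers among web properties in proportion to their expected demand, so the same pool can be re-split for weekdays and weekends. The function name and the demand numbers are made up for the example; it is not a description of how Collage allocates servers, and it assumes the total demand is greater than zero.

```python
def split_servers(total_servers, expected_demand):
    """Divide a shared pool in proportion to each property's expected demand.

    `expected_demand` maps a property name to any positive weight, for
    example forecast requests per second.
    """
    total_demand = sum(expected_demand.values())
    shares = {
        name: total_servers * demand / total_demand
        for name, demand in expected_demand.items()
    }
    allocation = {name: int(share) for name, share in shares.items()}
    # Hand leftover servers to the properties with the largest fractional share.
    leftover = total_servers - sum(allocation.values())
    by_remainder = sorted(
        shares, key=lambda name: shares[name] - allocation[name], reverse=True
    )
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation
```

Calling it once with weekday weights and once with weekend weights gives the two layouts to switch between. The savings come from sizing the pool for the larger of the two combined totals rather than for the sum of each property's individual peak.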

One of these days I might post the Buff Bill video on YouTube. (I’ve been forbidden by our CFO, Karen. That almost sounds like a challenge to me, but I do like my paychecks.) In the meantime, you should check out Bill's thoughts on utility computing and what data centers may look like in the future. Recently, Bill was interviewed by Laurianne McLaughlin, the technical editor of CIO Magazine. Bill talked about a future where phone companies will run your data center. You can read about it in Laurianne’s blog. It’s a very interesting read, even though it's not as funny as my video.

Mar 1, 2007

Blog Tag

Okay, so I just got tagged a few days ago by Ken Oestreich. It’s taken me a while to post a list of five things about me that you probably don’t know, but here goes:

When I was in high school, I sold my first computer program and accompanying article to an Apple II enthusiast magazine—Nibble Magazine. I sold them 3 articles and programs in total, but they only published one of them. It was assembly code and an accompanying bitmapped font set to display text with high-resolution graphics on the Apple II+.
In 1989, I started my own three-person software company (Victory Software) to write role-playing games for the Apple IIgs. We managed to sell a few thousand copies of three different games. However, we shut down the company a few years later after Apple killed the IIgs line in favor of the color Macs. All of this was in the pre-Internet era, but it’s amazing what historical references you can find on the web these days.
I don’t write game software anymore, but I have taken up golf. A few years ago, my daughter and I were playing at Blackberry Farm in Cupertino. The pro shop told us that Juli Inkster was playing two holes ahead of us. We did manage to see Juli and her kids from across the lake, but we never did get close enough for an autograph. My golf swing and short game are nothing to write about (even in a blog), but I have been a pretty good swing coach for my kids. My kids and I never saw anyone else famous on the golf course, but my son is a big Tiger fan. He wears Tiger’s Sunday red-and-black combination when we play on Sundays.
When I worked in Rochester, Minnesota in the 80’s, I actually placed fifth in a city road-race. I ran just under 11 minutes for a two-mile road race— nothing great, but good enough for a small town of 60,000. I still have my plaque somewhere in the garage. The race was in early May, but it was 30 degrees outside and snowing! During the last 100 yards, I got passed by a teenager, which was a little disheartening. Back in my youth, I never did manage to break the 5-minute-mile barrier, but I did get within 5 seconds. I still run these days, but it’s more like 7:30 miles. :-)
I have a US patent. It’s patent #6,683,553. I got it when I worked in the Java Web Services group at Sun. The patent belongs to Sun, but I still have the plaque on my desk.

Okay, so now it’s my turn to tag 5 folks. Unfortunately, I’m a relatively new blogger, and I don’t have a bunch of blogging friends. So, I’ll tag Floyd Strimling and Rob Gingell. And I’ll put out 3 requests to folks who would be good bloggers. (I’ll just nudge them privately.)

Feb 6, 2007

Sometimes an API is the Right UI

Graphical User Interfaces (GUI’s) are great for client-side applications. Everyone has their favorite software application or consumer device, and the usability of the GUI usually makes a huge difference in user satisfaction. Well, what’s the right user interface for data center infrastructure? What if you want to embed platform functionality into your own application?

There’s a funny ad from IBM, where a poor sys admin has to monitor dozens of monitors—one monitoring console per data center application. The ad talks about “silo’ed applications” and not being able to tap into data.

My company’s product, Cassatt Collage, delivers a platform for provisioning and managing a host of data center resources—servers, virtual machines and networks. Collage monitors applications and maintains service levels, so you don’t need to watch a monitor. Under the covers, Collage is a J2EE application. The Collage UI is a web-based UI that is built with JSP and Servlets. Collage also provides several programmatic interfaces so that you can respond to events and drive Collage functionality from the command line or your own applications.

Collage provides a Web Services interface (affectionately— but incorrectly— dubbed the WSDL interface) that gives you access to all the same functionality available from the GUI. You can create new application tiers, allocate resources (servers) to those application tiers, respond to events, increase or decrease service levels— basically anything you could do by clicking through the UI.

Remember Mukund? (Mr. 465 VM’s from my last posting). About a year ago, Mukund had created a Scripting SDK (SSDK) that used the Collage web services interface but provided a much simpler interface for use in scripting languages. The SSDK is intended to allow our professional services folks to integrate Collage with a customer’s applications. Since Mukund developed the SSDK in Java, I decided to poke under the covers and use the underlying Java functionality. (I’m not much of a script guy, but I do like simple API’s that I can call from Java.)

As part of the system test effort for the last Collage release, we developed several reliability and performance tests that measure Collage performance at extreme scale and over long time intervals with thousands of user actions. In order to automate these performance tests, I used the Java internals of the SSDK to create application tiers, allocate servers, activate servers, deactivate servers, deallocate tiers and delete tiers. We call this a “tier life cycle.” I also used JWebUnit to measure UI performance at different times and different deployment sizes. We use these tests as part of the standard system test cycle to characterize Collage performance at scale and ensure that performance levels are sustained over time—simulating how Collage will perform in your data center after a year’s worth of activity. I also wrote some reliability tests that create dozens of threads that basically pound a Collage system with different tier actions.
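
For readers who want a feel for the shape of such a test, here is a small Python sketch of a tier life cycle driven from many threads at once. It is deliberately not the SSDK (which is Java, and whose API is not shown here): FakeController is an in-memory stand-in invented for this example, and the step names simply mirror the actions listed above.

```python
import threading

# The steps of one "tier life cycle", in the order described in the text.
LIFE_CYCLE = [
    "create_tier",
    "allocate_servers",
    "activate_servers",
    "deactivate_servers",
    "deallocate_tier",
    "delete_tier",
]


class FakeController:
    """In-memory stand-in for a management API (hypothetical, not the SSDK)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.history = {}  # tier name -> list of steps applied, in order

    def call(self, step, tier):
        with self._lock:  # the shared dict is touched by many threads
            self.history.setdefault(tier, []).append(step)


def run_life_cycle(controller, tier, cycles):
    """Drive one tier through the full life cycle `cycles` times."""
    for _ in range(cycles):
        for step in LIFE_CYCLE:
            controller.call(step, tier)


def pound(controller, thread_count, cycles):
    """Start one thread per tier and wait for all of them to finish."""
    threads = [
        threading.Thread(target=run_life_cycle, args=(controller, f"tier-{i}", cycles))
        for i in range(thread_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

In a real reliability test the call method would invoke the remote interface and time each request; the check afterwards is the same idea as here, namely that every tier saw every step, in order, no matter how the threads interleaved.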

The SSDK has also proven useful in customer scenarios. Martha is the director for the Applications Engineering team, and she is on the “front lines” with many customer engagements. Martha is a super-star in pretty much any application domain— whether it’s J2EE, ETL, ERP, CRM, BI, DB. Whenever there’s a tough customer problem, Martha’s the one we call. In January, Martha used Collage’s image management capabilities and SSDK to automate the deployment and configuration of a customer’s complex application environment. The customer needs to deploy hundreds of independent Collage environments and wants to do this without any human setup. Each environment can be deployed and configured in only 37 minutes, and the process is completely automated! This is a perfect fit for Collage and the SSDK.

In a future posting, I’ll talk more about the web services interface and my adventures with the Eclipse Web Tools Platform, which works very well with our web services interface BTW. If you want to find out more about the SSDK, check out Cassatt InfoCentral.