Monday, April 28, 2008

A Field Trip to Dell

From 4/15 through 4/17 my BIDMC team visited the Dell facilities in the Austin area for an executive briefing on several areas of Dell's operations and future plans. Today's blog is about their lessons learned.

Data Center Tour
They toured the main Dell data center in Round Rock. This is one of Dell's two Tier III data centers, consolidated from the dispersed 16 data center model they used from 2001-2003. Cooling and power were left the way they had been designed at the time of the acquisition. The system delivers cool air from both the ceiling and the floor. Dell acknowledged that this is not optimal and is planning a revamp. More on the plans under the “Data Center Engineering Lab” section below.

The site is a completely "lights out" facility housing over 7,500 servers and associated storage. They use remote-controlled power strips to control the power cycling of systems. They initially used APC Power Distribution Units, but saw a market opportunity and designed their own switchable PDU product. The site is staffed by a data center manager and security guards.

They have a very stringent access control policy for employees. For an employee to gain access to perform work, there must be an active, approved ticket in their tracking system. The employee swipes their badge, and the guard is presented with any tracking tickets assigned to that employee. Access is granted only if one of those tickets is approved and its time window covers the moment of entry. If there is no match, the employee does not get in.
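The badge check described above amounts to a simple rule, sketched here in Python. This is only an illustration: the `Ticket` fields and the `may_enter` function are hypothetical names, not anything from Dell's tracking system.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Ticket:
    # Hypothetical ticket record; field names are assumptions for illustration.
    employee_id: str
    approved: bool
    window_start: datetime
    window_end: datetime


def may_enter(employee_id, tickets, now):
    """Grant access only if an approved ticket for this employee covers `now`."""
    return any(
        t.employee_id == employee_id
        and t.approved
        and t.window_start <= now <= t.window_end
        for t in tickets
    )
```

An unapproved ticket, or an approved one whose window has not started or has already ended, fails the check in the same way as having no ticket at all.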

They use an automated build procedure for all of their servers with a standard base image. A key component of that build is the Dell OpenManage software agent, which is installed on all servers.

The OpenManage software provides an inventory of physical equipment and running software. This is used for inventory control and for keeping disaster recovery information current. In addition to the software-based inventory, they perform a physical inventory every 3 to 4 months.

Lessons Learned
• BIDMC does not fully exploit the value of the HP Insight Manager/agent we own. Like the Dell management software, it can deliver server power consumption graphs, server temperature data and disaster recovery data on the services and applications running.
• BIDMC would benefit from a more tightly controlled process for hardware deployment, in which data is recorded in a database and compared against the Insight reports and network switch data for cross-inventory validation.
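The cross-inventory validation suggested in the last bullet could be as simple as comparing host lists from the three sources. The sketch below assumes each source can be exported as a list of hostnames; the function name and source labels are made up for illustration and do not reflect any HP Insight export format.

```python
def cross_validate(deployment_db, insight_report, switch_data):
    """Return {hostname: [sources where it is missing]} for every mismatch."""
    # Source labels are hypothetical; each argument is an iterable of hostnames.
    sources = {
        "deployment_db": set(deployment_db),
        "insight_report": set(insight_report),
        "switch_data": set(switch_data),
    }
    all_hosts = set().union(*sources.values())
    discrepancies = {}
    for host in sorted(all_hosts):
        missing = [name for name, hosts in sources.items() if host not in hosts]
        if missing:
            discrepancies[host] = missing
    return discrepancies
```

A host that appears in all three sources is left out of the result, so an empty dictionary means the inventories agree.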

Manufacturing Tour
They toured the manufacturing plant for servers and gaming systems. The manufacturing process is tightly integrated with the ordering system and follows a just-in-time inventory model. The parts for a system arrive by truck 2 hours before they are required. The product used for any particular system build is determined by the order placed.

The quality control infrastructure is extensive. Once a machine is built, it is burned in twice. The first burn-in is brief and takes place at the technician assembly area, where it is confirmed that the system powers up and that the requested BIOS and any preloaded software images are applied. The machine then passes on to a diagnostic rack, where it is hooked up and specific diagnostics are run on all of the hardware.

Quality control is also applied to all of the cosmetics, from the positioning of the tags on cards and the case to the appearance of the assembled product. If even an external inventory or FCC tag is misaligned, the system will be rejected.

They use an interesting system to rank each technician, team, group and plant. Each of these has its own metrics, and the metrics roll up. The more complex the system being built, the more points awarded. The sooner a quality problem is located, the fewer points deducted; the later it is detected, the more points are deducted, progressively down the chain. This gives a strong incentive to catch a problem as early as possible. If a final Q/A engineer catches the problem, the whole plant takes a quality hit.
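To make the incentive concrete, here is a toy version of such a scoring scheme in Python. Dell did not share the actual formula, so the point values, the stage list and the linear escalation are all assumptions chosen only to show the shape of the idea.

```python
# Hypothetical detection stages, ordered from earliest to latest.
STAGES = ["technician", "team", "group", "plant"]


def build_points(complexity):
    """More complex builds earn more points (10 per complexity unit is assumed)."""
    return 10 * complexity


def defect_penalty(stage, base=5):
    """The later the stage where a defect is caught, the larger the deduction."""
    return base * (STAGES.index(stage) + 1)


def net_score(builds, defects):
    """Points for all builds minus deductions for all defects."""
    earned = sum(build_points(c) for c in builds)
    deducted = sum(defect_penalty(s) for s in defects)
    return earned - deducted
```

With these made-up numbers, a defect caught at the technician's bench costs 5 points, while the same defect caught at the plant level costs 20.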

Lessons Learned
• Dell offers the ability to have all servers ordered pre-imaged with an image the customer supplies and controls. This would seem to be a big time saver for BIDMC.
• Pre-loading the image would also be beneficial for BIDMC workstations. A base image could be flashed with a script that runs on initial boot. The script could prompt for the type of system this is to be and hand off to a second script that completes the system initialization. In that model, the system could be unpacked and moved to the user's desk; the tech connecting the system would answer the initial boot question and walk away.
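The first-boot hand-off in the workstation idea might look something like the following sketch. The system types and script names are placeholders invented for illustration; a real deployment would define its own.

```python
# Hypothetical mapping from the tech's answer to a second-stage script.
SECOND_STAGE = {
    "clinical": "setup_clinical.py",
    "administrative": "setup_admin.py",
    "kiosk": "setup_kiosk.py",
}


def choose_second_stage(answer):
    """Map the answer to the first-boot prompt onto the script that finishes setup."""
    system_type = answer.strip().lower()
    if system_type not in SECOND_STAGE:
        valid = ", ".join(sorted(SECOND_STAGE))
        raise ValueError(f"unknown system type {answer!r}; expected one of: {valid}")
    return SECOND_STAGE[system_type]
```

The first-boot script would call `choose_second_stage(input("System type: "))` and then fetch and run the script it returns.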

Data Center Engineering Lab
They met with a representative from the Data Center Engineering Lab to talk about the future of data center design, in particular cooling.

Dell maintains a 1500 sq ft data center in which to test different cooling technologies and techniques. The racks in the test areas are filled with shells of servers that are basically large toasters. They can be controlled to generate specific amounts of heat to simulate data center load.

The most promising cooling technique is hot aisle containment. There are a number of methods to contain the hot aisle and provide cooling. This is the approach Dell will be using in its own data center. Their studies have shown that the technique provides a smaller footprint and more efficient, targeted cooling, which allows the data center ambient temperature to be raised. It also reduces the power used for cooling and allows for higher power/utilization of the servers, thus gaining the power supply efficiencies that come with higher utilization.

Dell Servers/Blades
Dell is about to begin shipping their next generation blade server technology. Their chassis holds 16 blade servers or combinations of blades and storage as needed. The design objectives and features are similar to other vendors. They claim a 7 to 10 year life cycle for the chassis, the ability to mix and match different generations of blades that might be issued during the chassis life, and fully redundant power and embedded switches. Dell also stresses a green theme, highlighting dynamic fan and power supply efficiencies. They have built in some temperature and load sensing to auto-adjust the power consumption and shut down some components.

From a needs perspective, Dell stated that they see the blade server market as solving three problems: 1) footprint, 2) cabling, 3) power issues.

Lessons Learned
• The packaging and engineering of the blade servers is well done. The granularity of control for power and components is a nice touch.
• Dell ranks their applications by server need. Based on the application's needs, the application is placed in a VM, on a blade, or on a standalone server. This requires an upfront assessment of an application before deployment.

• Dell is working to complete their line of services and products to meet enterprise needs. They have made a significant improvement/advancement in their blade servers. They are positioning themselves as leaders in data center design relative to power/cooling.

• Dell has a systems management software package, OpenManage. This provides full management services for Dell and third-party products. The interesting features were the recording, graphing, and reporting of each server's power consumption and operating temperature. Limits/ranges can be set for sending alerts, etc.

• There is an opportunity, through the use of addressable power strips, to apply some logic/intelligence to turn off systems when they are not required. The best opportunity is in the evenings, when portions of clusters or systems that do not need to be operational could be turned off.
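As a rough sketch of the power strip idea in the last bullet, the logic could be a schedule lookup like the one below. The outlet names and service windows are hypothetical, and the actual call that switches an outlet is deliberately left out, since it depends entirely on the PDU vendor's interface.

```python
from datetime import time


def is_required(now, start, end):
    """True if `now` is inside the [start, end) service window.

    Windows may wrap past midnight (e.g. 22:00 to 04:00). A window with
    start == end is treated as never required.
    """
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def outlets_to_power_off(schedule, now):
    """Return the outlets whose systems are outside their service window."""
    return sorted(
        outlet
        for outlet, (start, end) in schedule.items()
        if not is_required(now, start, end)
    )
```

For example, with `{"cluster-a-node3": (time(7), time(19)), "backup-server": (time(22), time(4))}`, only `cluster-a-node3` is a power-off candidate at 23:00, because the backup server's window is active overnight.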

A great visit and many lessons learned about data center management in general. Thanks to Dell for their time.


Unknown said...

Nice read.

I got to see the American Express data center in Phoenix when I was interning for IBM.

I would talk about some of the security they used, but the last thing I want is some AMEX lawyer calling me, citing me for something I wasn't supposed to say. There's no telling what confidentiality agreements I signed back then.

But a big data center, 1 acre per floor, is something to see. I was in charge of rolling out Red Hat Linux builds to 30 or so test machines. I spent a lot of time in the basement shuffling boot floppies.

Jonathan Merrill said...

Not to turn this into an HP vs Dell debate, but having used both vendors' equipment for some time, I am routinely impressed with Dell's advancements and continued focus on the maintenance and support for their infrastructure. Both companies, from what I saw, are staffed with human beings and have made mistakes on items (e.g., shipping snafus, missing parts).

My point of feedback is that our technical people make continual comments about the complexity of HP's engineering. It shouldn't take 15 minutes to install HP server rails (yet it does, to many of our hospital net admins' chagrin). HP's top tools and server agents are also somewhat heavier on memory than competitor products.

I guess, in summary, my perception is Dell is trying harder. That resonated with our team. However, most healthcare vendors we do business with require HP or IBM equipment, which creates some challenges for centrally managing all the equipment from OpenManage.

Excellent article.

None said...

"The direction he sees is the development of a cold plate or cover. In this technique, chilled water is run through the upper or lower cover like a radiator and provides cooling to the entire case."

Just on this quote, if they are looking into building a heat exchanger, does anyone know if they plan to use the heat in any way?

John Halamka said...

Last week, I visited the SAS Institute in Cary, North Carolina. They use the data center to heat their research and development building. The idea of using a data center as a part of a climate control system is a good one.

Unknown said...

I think it really comes down to understanding the DCIM solution that you have deployed in your data center. If you have the right software, you will be able to run your management system far more efficiently and save a considerable amount on energy costs.