Tuesday, October 30, 2007

A Green Approach to Storage

Recently, several folks from the press have asked me about "green" approaches to data storage, since we have 200 terabytes (that's 200,000,000,000,000 bytes) of medical records online. Over the past year, we've begun the journey to reduce the power consumption of storing data.

1. Higher capacity drives. Just a few years ago, our drives held a few dozen gigabytes each, and we had to keep a large number of drives powered up in our data center. Now, higher capacity drives (750 gigabytes each) mean fewer devices consuming less power. We also use Serial Advanced Technology Attachment (SATA) drives, which rotate more slowly and consume less power. Many folks were worried about the performance of these slower rotating drives, so we ran a pilot that put half our users on SATA drives and half on Fibre Channel drives. Then we moved everyone to SATA. No one noticed the change.
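To put "fewer devices" in rough numbers, here is a back-of-the-envelope sketch of how many raw drives it takes to hold 200 TB. It assumes decimal units, takes 36 GB as a hypothetical stand-in for "a few dozen gigabytes," and ignores RAID overhead, hot spares, and free-space headroom, all of which would raise the real counts.

```python
# Rough drive-count comparison for a fixed amount of stored data.
# Assumptions (hypothetical): decimal units, 36 GB "old" drives,
# no RAID parity, spares, or headroom.

TOTAL_BYTES = 200 * 10**12  # 200 TB


def drives_needed(total_bytes: int, drive_bytes: int) -> int:
    """Smallest whole number of drives whose raw capacity covers total_bytes."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return (total_bytes + drive_bytes - 1) // drive_bytes


old_drives = drives_needed(TOTAL_BYTES, 36 * 10**9)   # hypothetical older drive size
new_drives = drives_needed(TOTAL_BYTES, 750 * 10**9)  # 750 GB drives from the post

print(f"36 GB drives:  {old_drives}")
print(f"750 GB drives: {new_drives}")
```

Under these assumptions the count falls from 5,556 drives to 267. Since each spinning drive draws power whether or not it is busy, the drop in device count is where most of the savings comes from.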

2. Reducing the amount of data stored. My storage needs are growing at 25% per year, and despite all my attempts to reduce demand (e.g., deleting all MP3 files on the network every day), demand is hard to control. Recently, we've used de-duplication techniques to help with this. Our email system keeps only one copy of an attachment sent to our 5000 employees, not 5000 copies. Our backup systems de-duplicate files, so only one copy is stored. We've seen a 50% reduction in the space needed for archiving because of this, reducing the number of storage devices and the electricity needed to power them.
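The core idea behind de-duplication can be sketched as a content-addressed store: each payload is identified by a hash of its bytes, so a second copy of identical content only adds a reference, not more data. This is a minimal illustration assuming whole-file de-duplication with SHA-256; the class and method names are hypothetical. It is not how any particular product works, and many commercial systems de-duplicate at the level of smaller chunks within files.

```python
import hashlib


class DedupStore:
    """Hypothetical content-addressed store: identical payloads are kept once."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}  # content digest -> stored bytes
        self.refs: dict[str, str] = {}     # logical name -> content digest

    def put(self, name: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Store the bytes only if this content has not been seen before.
        self.blobs.setdefault(digest, data)
        self.refs[name] = digest
        return digest

    def get(self, name: str) -> bytes:
        return self.blobs[self.refs[name]]

    def logical_bytes(self) -> int:
        """Bytes the users think they are storing."""
        return sum(len(self.blobs[d]) for d in self.refs.values())

    def physical_bytes(self) -> int:
        """Bytes actually kept on disk."""
        return sum(len(b) for b in self.blobs.values())


if __name__ == "__main__":
    store = DedupStore()
    attachment = b"quarterly-report" * 1000  # hypothetical 16,000-byte attachment
    for i in range(5000):                    # same file sent to 5000 mailboxes
        store.put(f"mailbox{i}/report.pdf", attachment)
    print(store.logical_bytes(), store.physical_bytes())
```

Run as a script, this reports 80,000,000 logical bytes against 16,000 physical bytes: one stored copy serves all 5000 mailboxes, which is the email-attachment case described above.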

3. Spin-down and slow-down technologies: turning off or slowing unused drives. Spin-down is controversial. Some believe its benefits are overstated because periodic wake-ups of the disk for integrity checks, etc., may consume more energy and shorten disk life. Many vendors seem to favor slow-down technologies, especially for backup media such as Virtual Tape Libraries (disk emulating tape). Over the long term, solid-state drive (flash memory drive) costs are predicted to drop enough to permit more frequent use. These drives are more energy efficient and have no moving parts, making them easier to manage from an energy perspective.
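The wake-up concern can be made concrete with a simple energy model. Spinning a drive down for an idle gap of t seconds costs standby power times t plus a one-time spin-up energy; leaving it spinning costs idle power times t. Spin-down only pays off when the gap exceeds the break-even point, spin-up energy divided by (idle power minus standby power). The sketch below assumes this simplified model, and all wattages and the spin-up energy are hypothetical placeholders, not measurements. It also ignores the wear cost of extra spin cycles.

```python
# Simplified spin-down break-even model (hypothetical parameters throughout).
# Ignores drive wear from additional start/stop cycles.


def break_even_idle_seconds(idle_watts: float, standby_watts: float,
                            spinup_joules: float) -> float:
    """Idle gap (seconds) above which spinning down uses less energy."""
    if idle_watts <= standby_watts:
        raise ValueError("idle power must exceed standby power")
    # Solve idle_watts * t == standby_watts * t + spinup_joules for t.
    return spinup_joules / (idle_watts - standby_watts)


def spin_down_saves(idle_gap_s: float, idle_watts: float,
                    standby_watts: float, spinup_joules: float) -> bool:
    """True if spinning down for this idle gap saves energy under the model."""
    return idle_gap_s > break_even_idle_seconds(idle_watts, standby_watts,
                                                spinup_joules)


# Hypothetical drive: 8 W idle, 1 W standby, 140 J to spin back up.
IDLE_W, STANDBY_W, SPINUP_J = 8.0, 1.0, 140.0
print(break_even_idle_seconds(IDLE_W, STANDBY_W, SPINUP_J))
```

With these placeholder values the break-even gap is 20 seconds. If integrity checks wake the disk more often than that, spinning down costs more energy than it saves, which is the trade-off skeptics point to.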


Thundertype said...

Another good post, John. Did you run into any major issues during deduplication efforts? What solution did you go with?

John Halamka said...

We are piloting two solutions now - Data Domain and EMC's Avamar. More to follow as we learn from these pilots.