Wednesday, June 17, 2009

Our Storage and Backup Strategy

Over the past year, Harvard Medical School has worked with research, administrative, and educational stakeholders to develop a set of storage policies and technologies that support demand, are achievable in the short term and are affordable.

I recently gave a keynote at Bio-IT World where I described the HMS storage strategy to ensure scalability, high performance, and reliability.

Since that presentation, we've refined our strategy for replication/backup/restoration of data for disaster recovery. In many ways backup is a harder problem to solve and a more expensive project than data storage itself.

Our best thinking (a strawman for now, which we are still reviewing with customers) is outlined on this slide.

For databases and Microsoft Exchange, we're using Data Domain appliances to replace tape, with the following backup schedule: 3 days of checkpoints, 14 days of incremental backups, 60 days of weekly cumulative incremental backups, and 3 months of full backups.

For research data and administrative data, we're creating 1 checkpoint daily for 14 days and 1 checkpoint weekly for 60 days. Data is fully replicated between two datacenters.
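To make the research/administrative checkpoint schedule concrete, here is a rough Python sketch of which checkpoints would still be on hand on a given day. It is only an illustration under stated assumptions, not how our storage system actually implements retention: the function names are hypothetical, and I've assumed that the weekly checkpoint is the one taken on Sunday and that "60 days" means checkpoints younger than 60 days.

```python
from datetime import date, timedelta
from typing import List

DAILY_RETENTION_DAYS = 14    # from the post: 1 checkpoint daily for 14 days
WEEKLY_RETENTION_DAYS = 60   # from the post: 1 checkpoint weekly for 60 days
WEEKLY_WEEKDAY = 6           # assumption: the weekly checkpoint is Sunday's (Monday == 0)


def is_retained(checkpoint_day: date, today: date) -> bool:
    """Return True if a checkpoint taken on checkpoint_day is still kept on today."""
    age = (today - checkpoint_day).days
    if age < 0:
        return False  # a checkpoint cannot come from the future
    if age < DAILY_RETENTION_DAYS:
        return True   # every daily checkpoint is kept for 14 days
    # Older checkpoints survive only if they are the designated weekly one.
    return age < WEEKLY_RETENTION_DAYS and checkpoint_day.weekday() == WEEKLY_WEEKDAY


def retained_checkpoints(today: date) -> List[date]:
    """List the checkpoint dates still available on today, newest first."""
    candidates = (today - timedelta(days=age) for age in range(WEEKLY_RETENTION_DAYS))
    return [day for day in candidates if is_retained(day, today)]


if __name__ == "__main__":
    for day in retained_checkpoints(date(2009, 6, 17)):
        print(day.isoformat())
```

Under these assumptions, a file deleted three weeks ago can only be recovered from a Sunday checkpoint, while anything from the last two weeks can be recovered from the checkpoint taken on the day in question.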

We have chosen replication and checkpoints for three reasons:
1. Costs - We are able to reduce costs significantly by using large quantities of a single, reasonably priced storage system. It's actually cheaper for us to buy a large quantity of a single storage type and replicate our data using simple tools than it is to maintain multiple tiers of storage for short term, backup, and long term storage. The research community is very sensitive to the cost of storage, so we need to balance risk and cost. Replication with checkpoints does that nicely.

2. Technological simplicity - With every different storage type comes a different set of tools and a learning curve. Having one set of storage and one set of tools for our administrative files and research community results in a much easier environment to maintain.

3. Requirements of customers are met - We have asked our customers for their input on the need for file recovery and retention. Using checkpoints and replication meets the majority of their needs. Some departments have asked not to replicate at all, since it is cheaper to rerun an experiment than to replicate the terabytes of data each experiment generates.

Storage is a journey. The needs of BIDMC and the needs of Harvard Medical School are different, so the choice of technologies and the balance between reliability and cost are different. By working with customers and embracing evolving storage technologies, we believe we can meet the demand for storage at a price the institution and our customers are willing to pay.

1 comment:

Traci said...

This post is of much interest to me as I've come to similar conclusions when researching user storage needs at The University of Michigan Medical School. It appears you have reached similar conclusions regarding the partitioning of storage based on usage type (i.e. admin data, database, research, etc.), and providing appropriate backup scenarios for each. In fact, the user interviews I've analyzed here at UM point to exactly this type of structure. Can't wait to hear more on your work in this area!