Wednesday, August 24, 2011

Storage Dreams

As I continue to support the infrastructure requirements of the research faculty of Harvard Medical School (in parallel with the process to find my own successor at HMS),  I have a storage dream.

The scene opens to a researcher logging into "Storage Central", a browser-neutral, operating-system-neutral website that even runs perfectly on an iPad.

After thoughtful analysis of faculty needs, Harvard Medical School will have concluded that there are 3 different directory types and 3 different storage workflows.

Directory types
a.  Massive numbers of small files (e.g. next generation sequencing) that need solid state metadata management (e.g. Isilon 32000X SSD)
b.  Small numbers of really large files (e.g. image processing) that need high I/O throughput (e.g. Isilon 72000X)
c.  Average numbers of average sized files that can use lower performance technologies (e.g. Isilon 72NL)

Storage workflows
a.  Files with a high turnover rate (scratch space) that are created and destroyed daily. No snapshot or archival tier is needed.
b.  Files with a low turnover rate that do not need replication because the data is easy to regenerate. Snapshots are needed to protect the data against drive failure.
c.  Files with a low turnover rate that need to be retained for years due to compliance requirements and the difficulty of regenerating the data. An archival tier is needed (e.g. arrays of inexpensive 2 Terabyte drives).

The researcher sees a visual representation of her storage use in each directory and workflow, both currently and monthly over the past year.   Data on primary storage, snapshots used to protect the data, and archival copies of the data are shown separately.

The researcher oversees several post docs.   By clicking on a link, the researcher can see the storage use of all those she supervises.

Each directory type has a fixed three year cost per terabyte.   Workflows with snapshots or archives have an incremental cost.   These costs are well known and accepted by all the users.

The researcher can set her own quotas for directory types and workflows. A calculation of cost for current storage and total quota is shown. The researcher can type in a grant number or departmental account number to reserve the directory types and workflows she needs.
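
The cost calculation on that screen could be sketched roughly as follows. This is only an illustration: the dollar rates, the key names, and the `Allocation` structure are all hypothetical, since no actual prices or data model have been decided. It assumes a fixed three-year rate per terabyte for each directory type plus a per-terabyte increment for workflows that use snapshots or archives.

```python
from dataclasses import dataclass

# Hypothetical three-year rates in dollars per terabyte (placeholders, not real prices).
BASE_RATE = {
    "small_files_ssd": 3000.0,
    "large_files_throughput": 2000.0,
    "average": 1000.0,
}

# Hypothetical incremental cost per terabyte for each workflow.
WORKFLOW_SURCHARGE = {"scratch": 0.0, "snapshot": 250.0, "archive": 500.0}


@dataclass
class Allocation:
    directory_type: str  # key into BASE_RATE
    workflow: str        # key into WORKFLOW_SURCHARGE
    used_tb: float       # storage currently in use
    quota_tb: float      # quota the researcher has set


def rate_per_tb(allocation):
    """Three-year cost per terabyte: base rate plus workflow increment."""
    return BASE_RATE[allocation.directory_type] + WORKFLOW_SURCHARGE[allocation.workflow]


def cost_summary(allocations):
    """Return (cost of current storage, cost of total quota) in dollars."""
    current = sum(a.used_tb * rate_per_tb(a) for a in allocations)
    quota = sum(a.quota_tb * rate_per_tb(a) for a in allocations)
    return current, quota
```

Showing both numbers side by side lets the researcher see what she pays today and what she has committed to if every quota fills up.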

The departmental administrator oversees many researchers.    She can view the storage use of all her faculty with historical, current, and projected costs shown on screen.

She can discover who is likely to exceed their budget and who is responsible for the largest amount of storage growth over time.
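
One simple way such a report might work is a straight-line projection from each researcher's monthly history. The sketch below is an assumption, not a design decision: the function names are made up, budgets are expressed as the number of terabytes each researcher's funding covers, and a real system might well use a better forecasting method.

```python
def monthly_growth(usage_tb):
    """Average change per month across a list of monthly usage figures (TB)."""
    if len(usage_tb) < 2:
        return 0.0
    return (usage_tb[-1] - usage_tb[0]) / (len(usage_tb) - 1)


def projected_usage(usage_tb, months_ahead):
    """Latest usage plus the average monthly growth carried forward."""
    return usage_tb[-1] + monthly_growth(usage_tb) * months_ahead


def likely_over_budget(history, budget_tb, months_ahead=12):
    """Names of researchers whose projected usage exceeds their budget."""
    return sorted(
        name
        for name, usage in history.items()
        if projected_usage(usage, months_ahead) > budget_tb[name]
    )


def biggest_grower(history):
    """Name of the researcher with the largest absolute growth over the period."""
    return max(history, key=lambda name: history[name][-1] - history[name][0])
```

Here `history` maps a researcher's name to her monthly usage in terabytes, oldest month first.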

An IT storage concierge is assigned to each department to help researchers and administrators move data among directory types and workflows to balance performance and cost.    There is complete transparency between the demand created by the users and the supply provided by the IT department.  

The Dean knows the total costs charged to departments, the IT department, and the school (as overhead components in indirect costs).

The CIO and the infrastructure team receive daily summary reports that forecast growth so that additional storage can be added as necessary, ensuring that each directory type and workflow always has 20% unused capacity. Storage vendors can ship nodes to expand each directory type and workflow within 1 week of receiving a PO, so storage can be expanded just in time without risking over- or under-provisioning.
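
The arithmetic behind such a daily report might look something like this. It is a minimal sketch under stated assumptions: growth is treated as a constant number of terabytes per day, every node adds the same capacity, and the function name and parameters are invented for illustration. The 20% headroom and the one-week lead time come from the paragraph above.

```python
import math


def nodes_to_order(used_tb, capacity_tb, daily_growth_tb, node_tb,
                   lead_days=7, headroom=0.20):
    """How many nodes to order today so that, once they arrive,
    at least `headroom` of total capacity is still unused."""
    # Usage expected by the time a shipment ordered today arrives.
    forecast_used = used_tb + daily_growth_tb * lead_days
    # Capacity needed so the forecast usage fills at most (1 - headroom).
    required_capacity = forecast_used / (1.0 - headroom)
    shortfall = required_capacity - capacity_tb
    if shortfall <= 0:
        return 0
    return math.ceil(shortfall / node_tb)
```

For example, a pool with 78 TB used out of 100 TB, growing 1 TB per day, would need about 106 TB a week from now to keep 20% free, so one 10 TB node would be ordered today.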

The chargeback model is NIH compliant and motivates researchers to maintain files via the easy-to-use move/deletion tools in the web interface.

The research community, school administration, and  IT are deliriously happy.  Storage challenges are a solved problem.    The governance committees have turned their attention to cool applications that advance science instead of infrastructure limitations that impede it.

We're assembling industry experts to work on this dream. My hope is that I can report back in 2012 that the dream is now the Harvard Medical School reality.

1 comment:

Marco Deterink said...

This is really helpful to structure our thinking. The workflows you describe are very similar to the outcome of some discussions we had recently.

Marco Deterink (UMC Utrecht NL)