Yesterday, I participated in a National Library of Medicine Conference called "Long term Preservation and Management of the EHR." Given that the EHR is a legal record, a source of data for clinical care, and a repository of knowledge for clinical research, how do we preserve it for a sufficiently long period of time to maximize value to patient, caretaker, and scientist?
Here are the program details.
I presented an overview of our tiered storage approach to information lifecycle management at BIDMC.
One controversial item was my conclusion that the storage costs per patient to retain data are insignificant.
Here's the calculation. At BIDMC we generate approximately 1 terabyte of clinical text data (structured and unstructured) per year. We generate approximately 19 terabytes of image data per year (radiology, cardiology, pathology, gastrointestinal, pulmonology, Ob/Gyn, etc.). We have approximately 250,000 active patients. 20 terabytes / 250,000 patients = 80 megabytes per patient per year.
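For anyone who wants to check the arithmetic, here is a minimal sketch of the per-patient calculation. The constant and function names are my own for illustration, and it assumes decimal units (1 terabyte = 1,000,000 megabytes), which is what the 80 megabyte figure implies.

```python
# Figures quoted above; decimal units assumed (1 TB = 1,000,000 MB).
TEXT_TB_PER_YEAR = 1
IMAGE_TB_PER_YEAR = 19
ACTIVE_PATIENTS = 250_000
MB_PER_TB = 1_000_000


def mb_per_patient_per_year(terabytes_per_year, patients=ACTIVE_PATIENTS):
    """Average megabytes generated per active patient per year."""
    return terabytes_per_year * MB_PER_TB / patients


text_mb = mb_per_patient_per_year(TEXT_TB_PER_YEAR)
image_mb = mb_per_patient_per_year(IMAGE_TB_PER_YEAR)

print(f"text:   {text_mb:.0f} MB per patient per year")    # 4
print(f"images: {image_mb:.0f} MB per patient per year")   # 76
print(f"total:  {text_mb + image_mb:.0f} MB per patient per year")  # 80
```

The split into 4 megabytes of text and 76 megabytes of images is the one used in the retention math further down.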
There are many kinds of storage and many ways to calculate cost. Rather than specify a vendor or an infrastructure, I'll use storage numbers from a non-BIDMC site for purposes of computation.
The other site offers 2 kinds of storage:
Standard storage, which has a marginal cost of $0.34 per gigabyte added (or $0.68 per gigabyte with replication).
High performance storage, which has a marginal cost of $0.55 per gigabyte added (or $0.89 per gigabyte if it is replicated onto standard storage).
Let's choose high performance replicated storage at $0.89 per gigabyte. In Massachusetts we retain medical records for 15 years and images for 7 years. Let's compute the cost of storing the 80 megabytes per patient per year (4 megabytes of text and 76 megabytes of images) for these regulatory lifetimes.
Text storage = 4 megabytes added per person per year. We'll need to compute the cost of storing old data plus adding new data every year, i.e.,
Year 1 = 4 megabytes
Year 2 = 4 megabytes old + 4 megabytes new
Year 3 = 8 megabytes old + 4 megabytes new
Year 4 = 12 megabytes old + 4 megabytes new
and sum all these costs over 15 years. Let's use the formula for summing the numbers 1 through n, n*(n+1)/2, with n = 15 years and $0.89 per gigabyte for each year the data is stored:
4 megabytes*15*16/2*.89/1000 = 42 cents per patient for the first 15 years
After year 15, we can begin deleting the oldest data, so we'll always have just 15 years of data: 4 megabytes*15*.89/1000 = 5 cents per year thereafter
Image storage = 76 megabytes added per person per year, retained for 7 years
76 megabytes*7*8/2*.89/1000= $1.89 per patient for the first 7 years
After year 7, we can begin deleting the oldest data, so we'll always have just 7 years of data: 76 megabytes*7*.89/1000 = 47 cents per year thereafter
So when we debate the question of storing data for later reuse, keep in mind that the cost per patient is 42 cents for the first 15 years of text and $1.89 for the first 7 years of images.
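The retention math above can be sketched in a few lines. The function names `ramp_up_cost` and `steady_state_cost` are mine, and the sketch assumes, as the arithmetic above does, that the $0.89 rate is charged per gigabyte for each year the data is held and that 1 gigabyte = 1,000 megabytes. It also cross-checks the n*(n+1)/2 shortcut against an explicit year-by-year sum.

```python
RATE_PER_GB_YEAR = 0.89  # dollars; high performance replicated storage
MB_PER_GB = 1000         # decimal units, matching the /1000 above


def ramp_up_cost(mb_added_per_year, retention_years, rate=RATE_PER_GB_YEAR):
    """Dollars per patient over the first `retention_years` years.

    Year k holds k * mb_added_per_year, so the total megabyte-years
    stored is mb_added_per_year * n * (n + 1) / 2.
    """
    n = retention_years
    stored_mb_years = mb_added_per_year * n * (n + 1) / 2
    return stored_mb_years / MB_PER_GB * rate


def steady_state_cost(mb_added_per_year, retention_years, rate=RATE_PER_GB_YEAR):
    """Dollars per patient per year once the oldest data is being deleted."""
    return mb_added_per_year * retention_years / MB_PER_GB * rate


# Cross-check the closed form against the explicit sum for text.
explicit = sum(4 * year for year in range(1, 16)) / MB_PER_GB * RATE_PER_GB_YEAR
assert abs(explicit - ramp_up_cost(4, 15)) < 1e-9

print(f"text, first 15 years:   ${ramp_up_cost(4, 15):.4f}")
print(f"text, each year after:  ${steady_state_cost(4, 15):.4f}")
print(f"images, first 7 years:  ${ramp_up_cost(76, 7):.4f}")
print(f"images, each year after: ${steady_state_cost(76, 7):.4f}")
```

The text figure comes out to about 42.7 cents, so the 42 cents quoted above is truncated rather than rounded. The other three figures match as stated.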
The equivalent of Moore's law applies to storage: continuously decreasing costs and higher density. We'll also have cloud storage options (although only a few public cloud providers offer HIPAA-compliant storage with indemnification for privacy breaches).
In my analysis above, some may question the cost per gigabyte I used. Feel free to multiply it by 10 such that text records could be stored for $4.20 per patient for 15 years. It's still very economical.
In the interest of completeness, let's examine fully loaded cost. At BIDMC, we have multiple storage platforms. About 40% of the cost is depreciation on capital budgets. The rest is staff, software/hardware maintenance, and other operating cost. The average cost among these collective platforms runs $1.27 per GB or $1,270 per TB per year, fully loaded.
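As a rough illustration only, the same retention formula can be rerun with the fully loaded rate. This is an extrapolation, not a BIDMC figure: it assumes the $1.27 per GB per year average can simply be substituted for the marginal rate, and that the 4 megabytes of text and 76 megabytes of images per patient per year still hold. The function name is hypothetical.

```python
FULLY_LOADED_RATE = 1.27  # dollars per GB per year, the fully loaded average
MB_PER_GB = 1000          # decimal units


def ramp_up_cost(mb_added_per_year, retention_years, rate):
    """Dollars per patient over the retention period, with data
    accumulating each year (sum of 1..n is n * (n + 1) / 2)."""
    n = retention_years
    return mb_added_per_year * n * (n + 1) / 2 / MB_PER_GB * rate


text_cost = ramp_up_cost(4, 15, FULLY_LOADED_RATE)    # 15 years of text
image_cost = ramp_up_cost(76, 7, FULLY_LOADED_RATE)   # 7 years of images

print(f"text, first 15 years:  ${text_cost:.2f}")
print(f"images, first 7 years: ${image_cost:.2f}")
```

Under those assumptions the totals come to roughly 61 cents for text and $2.70 for images per patient, which is the same order of magnitude as the marginal-cost estimate.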
Of course, there are other considerations:
1. The definition of the "official medical record" is in flux. The usual process for most diagnostic and treatment modalities is to cull the media so that only the important content is saved. For example, in a sleep study, you would not save uneventful sleep time. If medical/legal issues push us toward saving raw content, especially video, the amount of data per patient is going to rapidly expand.
2. At BIDMC, technologies and vendors have been stable for many years. This makes backward compatibility issues much more manageable. By staying with the same vendors and technologies, we've not been challenged with migrating our clinical data to a new database or vendor.
3. The increased use of multimedia in clinical care may also expand the amount of storage per patient. Voice files (call center, voice mail, raw transcription, and the like) might someday be required to be saved for medical/legal reasons.
4. As data expands, so does the burden of dealing with release of information requests, backup/recovery, disaster replication, testing new versions, and other application life cycle requirements. We seldom operate with just two copies of the data. There are usually two copies locally, sometimes more for high availability, and another copy at our disaster recovery site. We may store additional copies for testing new versions of software, snap backups, and the like.
5. Emerging factors contribute to costs. e-Discovery can expand our overall costs because backups must be retained indefinitely. The "digital footprint" of patient data is changing. Text only is manageable, but the imaging/diagnostic components are ever growing, both in number and in size.
Yes, costs add up over time for large patient populations, but the cost of storing text data is so minimal that we have not deleted a single datum from the electronic health record since I became CIO in 1997 and we have no plans to do so!