Wednesday, April 6, 2011

The Cost of Storing Patient Records

Yesterday, I participated in a National Library of Medicine Conference called "Long term Preservation and Management of the EHR."    Given that the EHR is a legal record, a source of data for clinical care, and a repository of knowledge for clinical research, how do we preserve it for a sufficiently long period of time to maximize value to patient, caretaker, and scientist?

Here are the program details.

I presented an overview of our tiered storage approach to information lifecycle management at BIDMC.

One controversial item was my conclusion that the storage costs per patient to retain data are insignificant.

Here's the calculation. At BIDMC we generate approximately 1 terabyte of clinical text data (structured and unstructured) per year. We generate approximately 19 terabytes of image data per year (radiology, cardiology, pathology, gastroenterology, pulmonology, ob/gyn, etc.). We have approximately 250,000 active patients. 20 terabytes/250,000 = 80 megabytes per patient per year.
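For readers who want to check the arithmetic, here is a minimal Python sketch of the per-patient volume. It assumes decimal units (1 terabyte = 1,000,000 megabytes) and an even spread across all active patients; the function and constant names are mine, for illustration only.

```python
# Annual volumes and patient count quoted in the post.
TEXT_TB_PER_YEAR = 1
IMAGE_TB_PER_YEAR = 19
ACTIVE_PATIENTS = 250_000
MB_PER_TB = 1_000_000  # assumption: decimal units


def mb_per_patient_per_year(terabytes_per_year, patients=ACTIVE_PATIENTS):
    """Spread a yearly data volume evenly across all active patients."""
    return terabytes_per_year * MB_PER_TB / patients


text_mb = mb_per_patient_per_year(TEXT_TB_PER_YEAR)
image_mb = mb_per_patient_per_year(IMAGE_TB_PER_YEAR)
total_mb = text_mb + image_mb

print(f"text: {text_mb} MB, images: {image_mb} MB, total: {total_mb} MB")
```

This gives 4 megabytes of text and 76 megabytes of images per patient per year, 80 megabytes in total.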

There are many kinds of storage and many ways to calculate cost.   Rather than specify a vendor or an infrastructure, I'll use storage numbers from a non-BIDMC site for purposes of computation.

The other site offers 2 kinds of storage:

Standard storage, which has a marginal cost of $0.34 per gigabyte added (or $0.68 per gigabyte with replication).

High performance storage, which has a marginal cost of $0.55 per gigabyte added (or $0.89 per gigabyte if it is replicated onto standard storage).

Let's choose high performance replicated storage at $0.89 per gigabyte. In Massachusetts we retain medical records for 15 years and images for 7 years. Let's compute the cost of storing the 80 megabytes per patient per year (4 megabytes of text and 76 megabytes of images) for these regulatory lifetimes.

Text storage = 4 megabytes added per person per year.    We'll need to compute the cost of storing old data plus adding new data every year i.e.

Year 1 = 4 megabytes
Year 2 = 4 megabytes old + 4 megabytes new
Year 3 = 8 megabytes old + 4 megabytes new
Year 4 = 12 megabytes old + 4 megabytes new

and sum all these costs over 15 years. Let's use the formula for the sum of the first n whole numbers, n*(n+1)/2, with n = 15 years and $0.89 per gigabyte (dividing by 1000 to convert megabytes to gigabytes):

4 megabytes*15*16/2*.89/1000 = 42 cents per patient for the first 15 years

After year 15, we can begin deleting the oldest data, so we'll always have just 15 years of data - 4 megabytes*15*.89/1000= 5 cents per year thereafter

Image storage = 76 megabytes added per person per year, retained for 7 years

76 megabytes*7*8/2*.89/1000= $1.89 per patient for the first 7 years

After year 7, we can begin deleting the oldest data, so we'll always have just 7 years of data - 76 megabytes*7*.89/1000= 47 cents per year thereafter
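The text and image calculations above can be sketched in Python as follows. This is a rough check rather than a costing model: it assumes the $0.89 rate is charged per gigabyte for each year the data is held (which is how the sums above treat it) and that 1 gigabyte = 1000 megabytes. The function names are mine.

```python
RATE_PER_GB_YEAR = 0.89  # replicated high performance storage, in dollars


def ramp_up_cost(mb_per_year, retention_years, rate=RATE_PER_GB_YEAR):
    """Total cost while the archive fills: year k holds k years of data."""
    n = retention_years
    mb_years = mb_per_year * n * (n + 1) / 2  # 1 + 2 + ... + n years of data
    return mb_years / 1000 * rate


def steady_state_cost(mb_per_year, retention_years, rate=RATE_PER_GB_YEAR):
    """Yearly cost once the oldest data is deleted as new data arrives."""
    return mb_per_year * retention_years / 1000 * rate


# Cross-check the closed form against an explicit year-by-year sum.
explicit = sum(4 * year / 1000 * RATE_PER_GB_YEAR for year in range(1, 16))
assert abs(explicit - ramp_up_cost(4, 15)) < 1e-9

print(f"text, first 15 years:  ${ramp_up_cost(4, 15):.4f}")
print(f"text, each year after: ${steady_state_cost(4, 15):.4f}")
print(f"images, first 7 years: ${ramp_up_cost(76, 7):.4f}")
print(f"images, each year after: ${steady_state_cost(76, 7):.4f}")
```

Unrounded, the four figures come to roughly $0.427, $0.053, $1.894, and $0.473 per patient.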

So when we debate the question of storing data for later reuse, keep in mind that the cost per patient is 42 cents for the first 15 years of text and $1.89 for the first 7 years of images.

The equivalent of Moore's law applies to storage - continuously decreasing costs and higher density.   We'll also have cloud storage options (although only a few public cloud providers offer HIPAA compliant storage with indemnification for privacy breaches).

In my analysis above, some may question the cost per gigabyte I used.  Feel free to multiply it by 10 such that text records could be stored for $4.20 per patient for 15 years.   It's still very economical.
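If you'd like to try your own cost per gigabyte, a small sensitivity sketch along these lines may help. It reuses the same ramp-up formula and the same assumptions as the calculation above (rate charged per gigabyte per year held, 1 gigabyte = 1000 megabytes); the multipliers and names are illustrative.

```python
BASE_RATE = 0.89  # dollars per gigabyte, from the calculation above


def ramp_up_cost(mb_per_year, retention_years, rate):
    """Total cost over the retention period as yearly data accumulates."""
    n = retention_years
    return mb_per_year * n * (n + 1) / 2 / 1000 * rate


for multiplier in (1, 10):
    rate = BASE_RATE * multiplier
    text = ramp_up_cost(4, 15, rate)     # 15 years of text
    images = ramp_up_cost(76, 7, rate)   # 7 years of images
    print(f"{multiplier}x rate: text ${text:.2f}, images ${images:.2f}")
```

At ten times the rate, text works out to about $4.27 per patient unrounded, in line with the roughly $4.20 quoted above.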

In the interest of completeness, let's examine fully loaded cost.  At BIDMC, we have multiple storage platforms.  About 40% of the cost is depreciation on capital budgets.   The rest is staff, software/hardware maintenance, and other operating cost.   The average cost among these collective platforms runs $1.27 per GB or $1,270 per TB per year, fully loaded.

Of course, there are other considerations:

1.   The  definition of the "official medical record" is in flux.  The usual process for most diagnostic and treatment modalities is to cull the media so that only the important content is saved.   For example, in a sleep study, you would not save uneventful sleep time.    If medical/legal issues push us toward saving raw content, especially video, the amount of data per patient is going to rapidly expand.

2.  At BIDMC, technologies and vendors have been stable for many years.    This makes backward compatibility issues much more manageable.   By staying with the same vendors and technologies, we've not been challenged with migrating our clinical data to a new database or vendor.

3.   The increased use of multimedia in clinical care may also expand the amount of storage per patient.    Voice files (call center, voice mail, raw transcription, and the like) might someday be required to be saved for medical/legal reasons.

4.  As data expands, so does the burden of dealing with release of information requests, backup/recovery, disaster replication, testing new versions, and other application life cycle requirements.  We seldom operate with just two copies of the data.  There are usually two copies locally, sometimes more for high availability, and another copy at our disaster recovery site.    We may store additional copies for testing new versions of software, snap backups, and the like.

5.  Emerging factors contribute to costs. e-Discovery can expand our overall costs because backups must be retained indefinitely. The "digital footprint" of patient data is changing. Text only is manageable, but the imaging/diagnostic components are ever growing, both in number and in size.

Yes, costs add up over time for large patient populations, but the cost of storing text data is so minimal that we have not deleted a single datum from the electronic health record since I became CIO in 1997 and we have no plans to do so!



6 comments:

Gil Press said...

Great analysis. You may want to mention the opportunity cost: In 1996, digital storage became more cost-effective for storing data than paper (Morris and Truskowski, IBM Systems Journal, Vol. 42, No. 2, 2003).

肖重庆 said...

Dr. Halamka:

Great posts as always.

One question I have is what your strategy is for deciding what kind of structured clinical data can be safely deleted after 15 years without impacting patient care.

It is easy to delete unstructured data based on age criteria, but I don't think that works well for structured data, for two reasons.

1) Structured data are normally linked together.
2) Certain types of clinical data (for example, medications or results) might never be purged, for patient care reasons.

I just wonder what your take on that is.

Thanks
Chong

For example,

Niranjan Sharma said...

Dr. Halamka,

This is a great analysis of an important consideration when we design healthcare delivery management systems and choose storage technologies. Interestingly, PHI size and growth rates pose less of a concern than serving that data to the user on a sub-second scale; hence the cost of serving this data to users is greater than the cost of managing it for later use.

I am a new reader of your blog and now follow it every day :)

Great service of yours for all of us in the healthcare space in general.

Thanks.

Niranjan.

rjag2034 said...

I'm not sure I understand what your cost calculation covers. Does it cover the cost of future migration to new OSes, hardware replacement of aging infrastructure, hardware to read the data and its replacement, software costs and licensing to read the data, cost of indexing, cost of retrieval, security, retention planning, cost of future legal requirements, etc.?

Anyone can store a gigabyte of data, but the secret is to manage it, ensure compliance and access for the life of the record which in this case is usually longer than the life of the patient.

lmoisan said...

What percentage of the total space do the audit trails represent?

R. Hal said...

Like in The Sixth Sense, when I look back in our EHR 10 years, "I see dead people." whom I can't make go away. The reality is that many vendors don't yet support purge strategies for aged, deceased, and inactive data/patients. It remains unclear to me when the business force will push R&D dollars this way, though something will eventually have to give.