Wednesday, March 17, 2010

Purging Files

I was recently asked if we purge older, untouched files from our storage systems.

This is a very tricky question because of the many compliance, medical-legal, and privacy requirements of a healthcare institution.

Short answer: we do not purge data for active employees. Given the number of organizations (4 hospitals, 3 physician organizations, a community health center, etc.), home directories, and department shares we have, it is almost impossible for us to determine centrally within IT what has business value and what is obsolete personal data that should be deleted.

How should organizations approach the complex problem of what data to save and what to delete?

In my opinion, the best way to manage this is to set up storage quotas and increase them as people need more space. The pro - it discourages unbridled storage growth. The con - it does cause additional overhead for the Help Desk and Storage Team, and from a compliance/e-Discovery standpoint it would encourage users to permanently destroy files.

At BIDMC, we have tried desktop archiving and have run into issues with archive software products not supporting all desktop clients equally (works on Windows but not on Mac or Linux). The solution we are now pursuing is to move the older files to the cheapest tier of storage (while still keeping everything we find forever) with relative transparency to the end customer. We use a storage virtualization appliance from F5 (formerly Acopia) to do this.
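
The selection step behind this kind of tiering can be sketched in a few lines of Python. This is only an illustration, assuming "older" means "not modified in the last N days"; the function name `find_stale_files` and the 365-day threshold are hypothetical, and nothing here describes how the F5 appliance actually works.

```python
import os
import time


def find_stale_files(root, max_age_days=365, now=None):
    """Return paths under root not modified within max_age_days.

    Hypothetical helper: the threshold and the use of modification
    time are assumptions for illustration, not a vendor's method.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.stat(path).st_mtime < cutoff:
                    stale.append(path)
            except OSError:
                # File vanished or is unreadable; skip it rather than fail.
                continue
    return sorted(stale)
```

A real tiering product would then relocate these files while leaving the user-visible path unchanged; the sketch only identifies candidates.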

A purging/archiving policy should state that files are archived for x years, after which files should be moved into an extended retention folder, which we will archive and keep. All other folders will be periodically purged of data beyond the stated retention period.
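
To make the rule concrete, here is a minimal sketch of how such a policy might be evaluated per file. It assumes that anything in the extended retention folder is always kept and that everything else is purged once it is older than the retention period; the function names and the 7-year value in the usage note are hypothetical, since the post leaves "x" unspecified.

```python
from datetime import date


def years_ago(today, years):
    """Return the date `years` years before `today`."""
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        # today is Feb 29 and the target year is not a leap year
        return today.replace(year=today.year - years, day=28)


def retention_action(last_modified, today, retention_years, in_extended_folder):
    """Return 'keep' or 'purge' for one file under the assumed rule."""
    if in_extended_folder:
        return "keep"  # extended retention is archived and kept
    if last_modified < years_ago(today, retention_years):
        return "purge"  # beyond the stated retention period
    return "keep"
```

For example, with a hypothetical 7-year period, `retention_action(date(2002, 1, 1), date(2010, 3, 17), 7, False)` returns `"purge"`, while the same file in the extended retention folder returns `"keep"`.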

We have a data retention policy that governs our business records including paper and electronic for clinical, financial and administrative records. These retentions are governed by applicable law, e.g. 20 years for clinical record content. The retention schedules are included as an appendix to the policy.

We have some log content that is overwritten as storage runs out, i.e. first-in, first-out. How long we save log files is dependent on the content involved. For logs related to clinical record access, we save forever.
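
First-in, first-out overwriting is easy to picture with a bounded buffer. The sketch below uses Python's `collections.deque` with a `maxlen`; it caps by entry count, whereas real log storage is bounded by disk space, and the function name `keep_latest` is hypothetical.

```python
from collections import deque


def keep_latest(entries, capacity):
    """Return the entries that survive in a FIFO buffer of given capacity."""
    buffer = deque(maxlen=capacity)
    for entry in entries:
        # When the buffer is full, appending silently drops the oldest entry.
        buffer.append(entry)
    return list(buffer)
```

Calling `keep_latest(["a", "b", "c", "d"], 3)` returns `["b", "c", "d"]`: the oldest entry is the first to go.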

We do delete files and email accounts for terminated employees after a grace period. The grace period is to make sure there is no need for the data by the person's manager and the employee will not return to work at BIDMC or an affiliate. The current grace period is 270 days.
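
The grace period is simple date arithmetic. A minimal sketch, using the 270 days mentioned above; the function names are hypothetical, and the manager review and return-to-work checks are manual steps this does not model.

```python
from datetime import date, timedelta

GRACE_PERIOD_DAYS = 270  # current grace period stated in the post


def deletion_eligible_on(termination_date, grace_days=GRACE_PERIOD_DAYS):
    """Return the first date on which the account's data may be deleted."""
    return termination_date + timedelta(days=grace_days)


def is_past_grace(termination_date, today, grace_days=GRACE_PERIOD_DAYS):
    """True once the grace period has fully elapsed."""
    return today >= deletion_eligible_on(termination_date, grace_days)
```

For an employee terminated on March 17, 2010, `deletion_eligible_on(date(2010, 3, 17))` gives December 12, 2010.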

Periodically, we have a litigation hold involving a subset of our records, primarily Windows files and email. We retain the accounts subject to the lit hold for whatever duration Legal requests.

We are including a capital budget request for next FY for a more robust eDiscovery capability that will allow us to index and search our backup copies of our email and Windows files.

Purging/archiving requires a great deal of thought, senior management/board sponsorship, and rigid enforcement to be effective. With the cost of storage dropping, we will continue to store everything in the short term. However, in the long term this becomes challenging to maintain, so ideally we'll use a combination of quotas and cost-effective tiering of data to balance the need for retention, compliance, and business value.

1 comment:

Ashish Prasad said...

I'd like to offer a philosophical response to your current dilemma.

I tend to like your idea of giving users the responsibility to manage their own quota. If you look at user behaviour on UGC (User Generated Content) web sites such as YouTube, Digg, Delicious, Flickr, Wikipedia, etc., users have the responsibility to manage their own content. They know exactly how to "Title" their content, what sort of "Tags" to add, and so on.

There was a study (I don't remember the details) that says technology adoption (especially web-based) by corporations lags behind consumers. It usually takes over 2-3 years for a mainstream consumer technology to become part and parcel of corporate intranet portals. Remember those CEO's monthly messages in your voice mail; thankfully those things are being replaced by tweets, blogs or status updates in Facebook-type apps.

What I am trying to get at here is that eventually the sense of responsibility will come to corporate users on the matter of content creation, promotion, associations (tagging), archival, etc. It's better to gear them up from now and make them ILM (Information Lifecycle Management) aware.