Tuesday, May 8, 2012

Metadata in an HIE

Last week, the Technology Workgroup of the Massachusetts State HIE Advisory Committee was asked to address an interesting policy and technology question.

When a payload of data (a clinical summary, a public health transaction, a lab result) is sent from provider to provider, what data should be included in the electronic envelope used in the sending process?

Massachusetts uses the Direct protocol, so the payload is encrypted during transport.  The Health Information Service Provider (HISP) cannot read the contents of the message.   All routing information (who the sender is, who the receiver is, when the message was sent, whether there are special privacy restrictions, and so on) must be placed as metadata in an electronic envelope around the payload.

Most metadata is not very controversial.   For example: Beth Israel Deaconess sent a payload to Dr. Smith on May 9th at 8:00am with patient consent.
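To make the envelope idea concrete, here is a minimal sketch of that non-controversial metadata as a Python record. The field names, the example addresses, and the year in the timestamp are my own assumptions for illustration; they are not taken from the Direct specification.

```python
from dataclasses import dataclass, asdict
from datetime import datetime


@dataclass(frozen=True)
class Envelope:
    """Hypothetical routing metadata that travels outside the encrypted payload."""
    sender: str            # assumed Direct-style address of the sending organization
    recipient: str         # assumed Direct-style address of the receiving provider
    sent_at: datetime      # when the payload was sent
    patient_consent: bool  # whether the patient consented to the exchange


# Example values are placeholders, including the year.
envelope = Envelope(
    sender="bidmc@direct.example.org",
    recipient="dr.smith@direct.example.org",
    sent_at=datetime(2012, 5, 9, 8, 0),
    patient_consent=True,
)

# Nothing in this record identifies the patient.
print(asdict(envelope))
```

Everything in this record describes the exchange, not the patient, which is why it raises few concerns.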

However, for auditing purposes, it could be important to send patient identifiers in the envelope.   The HIE might be asked a question like "We sent 10 payloads about John Halamka; can you tell us the time/date and location of delivery?"   For medical/legal, data integrity, and service level guarantees, patient identifiers in the audit trail make HIE operations easier.

However, there are downsides.   The audit trail becomes protected health information, and operators of the HIE now have access to person-identified information.

How could this be problematic?

What if the audit trail itself is breached?   The HIE must then follow HITECH reporting requirements.   The Direct Protocol was designed so that transport intermediaries minimize risk of breach by sending unidentified payloads.

What if someone asks the HIE to provide the date/delivery times of a patient's payloads sent from a substance abuse or psychiatric treatment facility?   The public is likely to have concerns that HIE staff (especially state government operators) have access to audit trails that contain such sensitive details.

Furthermore, applications that will perform novel routing and linking may need more than just limited amounts of person-identified metadata in the envelope to add functionality.   Clinicians on the Tech Workgroup noted that data elements such as visit type (inpatient or outpatient), message purpose (discharge summary, medication summary, admission notification), and author of the message are needed to automate advanced routing functions.   Thus, the recipient organization will likely open the payload after it is securely received to access additional information for processing.

What did we decide?

We elected to remove all human-readable patient identifiers from the audit trail, instead using hashes of data elements such as name and date of birth for auditing purposes.

How will that work?

Suppose my PCP wants to send a clinical summary to a specialist as part of a referral.

We agreed to use a secure hashing algorithm (such as salted SHA-2) to anonymize identifiers.

The hash of John becomes AY#!

The hash of Halamka becomes *iUOP

The hash of my birthday becomes G5^*

If the audit trail is breached or mined by HIE staff, there is no way to know that AY#! *iUOP refers to me.

However, I can ask the HIE to run an audit on AY#! *iUOP G5^* messages to ensure the payloads were delivered.   We get a perfect audit trail that's non-disclosing.
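Here is a minimal sketch of how that could work in Python. It uses HMAC-SHA-256 as one way of combining a secret salt with SHA-2; we have not specified the exact construction here, so treat that choice as an assumption. The salt value, the normalization rules, the audit log layout, and the function names are all hypothetical, and the date of birth is a placeholder.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical secret salt. In practice it would be randomly generated and
# kept away from the audit log and from anyone able to read the log.
SECRET_SALT = b"demo-only-secret-salt"


def normalize(value: str) -> str:
    """Canonicalize an identifier so ' John' and 'JOHN' hash identically."""
    return unicodedata.normalize("NFKC", value).strip().casefold()


def token(value: str, salt: bytes = SECRET_SALT) -> str:
    """Return a hex token for one identifier using HMAC-SHA-256 (a keyed SHA-2)."""
    data = normalize(value).encode("utf-8")
    return hmac.new(salt, data, hashlib.sha256).hexdigest()


def patient_key(first: str, last: str, dob: str) -> tuple:
    """Hash each identifier separately, mirroring the three hashes above."""
    return token(first), token(last), token(dob)


def find_deliveries(audit_log: list, key: tuple) -> list:
    """Return audit entries whose hashed identifiers match the given key."""
    return [entry for entry in audit_log if entry["patient_key"] == key]


# The audit trail stores only tokens plus delivery details (made-up entries).
audit_log = [
    {"patient_key": patient_key("John", "Halamka", "1970-01-01"),
     "delivered_at": "2012-05-09T08:00", "recipient": "specialist"},
    {"patient_key": patient_key("Jane", "Doe", "1980-02-02"),
     "delivered_at": "2012-05-09T09:30", "recipient": "lab"},
]

# An audit request re-derives the tokens and matches on them; no name or
# birth date is ever written to the log.
matches = find_deliveries(audit_log, patient_key(" john", "HALAMKA", "1970-01-01"))
print(len(matches), matches[0]["delivered_at"])
```

The important design point is that the salt stays secret. Names and birth dates come from a small, guessable space, so without a secret salt anyone holding the log could hash a list of candidate names and look for matches.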

Such hashing approaches for anonymous linkage of patient records are very powerful and I recommend you study the work of Jeff Jonas, described in this post and this PowerPoint.  Linking identity among heterogeneous databases will be required for healthcare reform and emerging ACO business intelligence applications.   Doing it without having to disclose the identity of the patient gives us the functionality we need without the risk.

Thus, Massachusetts has decided to use Direct without human-readable, personally identified metadata, instead adopting hashes of personal identifiers in the envelope and audit trail.   The HIE cannot be asked to mine audit trails by anyone but the sender of the messages, and the audits themselves are non-disclosing.

We have broad support for this approach and we'll let you know how it works in production.


Neil Kudler said...

Terrific summary and advance of the discussion we had during our Provider Engagement WG last week. I'm glad we came to the same conclusion and for the same reasons, at least at a high level. As always, you provide the slick tech think. Thanks John.


John Moehrke said...

The hashing method you discuss does obscure, but doesn't secure. Unless there is a secret salt, all you have done is obscure the identifiers. Ultimately the audit log must be managed, including securing it.

John Halamka said...

We agreed to use salted SHA-2, but I left that out of my original post. I've added the details.