Wednesday, April 11, 2012

Clinical Query, I2B2, and QueryHealth

Today I'm presenting an overview of our new clinical trials/clinical research business intelligence system, called Clinical Query, to the BIDMC Chiefs and Vice Presidents.

Here are the slides I'll use.

The principle behind Clinical Query is that investigators will want to ask questions, preliminary to research, that will help them understand the potential statistical power of a clinical trial or the availability of data for clinical research.

What did we do?

We loaded 2.2 million patients (1997 to the present) and 200 million data elements into a repository, ensuring that every data element was mapped to a controlled vocabulary. We then built a web-based query tool capable of navigating 20,000 medical concepts via Boolean (AND/OR) expressions of arbitrary complexity.

Labs were mapped to LOINC codes.

Problems/Encounter Diagnoses were mapped to SNOMED-CT codes.

Medications and Allergies were mapped to RxNorm codes.

Demographics were mapped to the same code sets required for Meaningful Use.

The result is that any authorized user who has completed our institutional HIPAA training can run real-time population queries.

For example, since my wife has breast cancer and has taken ACE inhibitors, maybe I want to study the association between the two and need a cohort of potential subjects. The query took 3 seconds from start to finish and yielded 2421 +/- 3 patients.
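To make the idea of a Boolean cohort query concrete, here is a minimal sketch in Python. It is not how Clinical Query or I2B2 is implemented; the function names, the nested-tuple query format, and the concept codes (placeholders, not real SNOMED-CT or RxNorm codes) are all assumptions chosen for illustration.

```python
def build_index(facts):
    """Map each concept code to the set of patient ids recorded with it."""
    index = {}
    for patient_id, code in facts:
        index.setdefault(code, set()).add(patient_id)
    return index


def evaluate(expr, index):
    """Evaluate a nested AND/OR expression and return matching patient ids.

    A bare string is a concept code; a tuple is (operator, operand, ...).
    """
    if isinstance(expr, str):
        return set(index.get(expr, ()))
    op, *operands = expr
    if not operands:
        raise ValueError("operator needs at least one operand")
    results = [evaluate(operand, index) for operand in operands]
    if op == "AND":
        return set.intersection(*results)
    if op == "OR":
        return set.union(*results)
    raise ValueError(f"unknown operator: {op}")


# Hypothetical facts: (patient id, placeholder concept code).
facts = [
    (1, "DX:breast_cancer"), (1, "RX:lisinopril"),
    (2, "DX:breast_cancer"),
    (3, "RX:enalapril"), (3, "DX:breast_cancer"),
    (4, "RX:lisinopril"),
]

# Breast cancer AND (any of two ACE inhibitors).
query = ("AND", "DX:breast_cancer", ("OR", "RX:lisinopril", "RX:enalapril"))
cohort = evaluate(query, build_index(facts))
print(len(cohort))
```

Because each operand can itself be an expression, the same two operators cover queries of arbitrary depth.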

Why +/- 3?

We never report the exact number. That protects the privacy of individual patients, since I could create a query so arcane that it identifies a single individual. The fictional example I've used in lectures is: "my neighbor has one blue and one green eye. Show me the count of all blue-eyed, green-eyed people taking mental health medications." A count of 1 could be disclosing. By adding a small arbitrary number to every result we ensure that population queries remain ambiguous.
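The blurring of counts can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the uniform integer offset, and the default spread of 3 (taken from the +/- 3 in the example above) are mine, and the actual obfuscation method used by Clinical Query may differ.

```python
import random


def obfuscated_count(true_count, spread=3, rng=None):
    """Return the count shifted by a random integer in [-spread, spread].

    Assumption: a uniform offset; the result is clamped so it never goes negative.
    """
    rng = rng or random.Random()
    offset = rng.randint(-spread, spread)
    return max(0, true_count + offset)


# A reported value for a true count of 2421 falls between 2418 and 2424.
print(obfuscated_count(2421))
```

A production system would likely also need to guard against someone running the same query many times and averaging the results, which this sketch does not attempt.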

The BIDMC and Harvard-wide Institutional Review Boards (IRB) decided that aggregate de-identified queries, preliminary to research, may be done by authorized, trained users without requiring IRB approval.

Additional data extraction that would be used as part of offering a clinical trial or clinical research opportunity to a patient does require IRB approval.

Many novel explorations are possible. For example, 80,000 patients had ischemic heart disease and no history of Vioxx use, while 800 patients had ischemic heart disease and took Vioxx, which was withdrawn from the market in 2004 because of concerns about increased risk of heart attack and stroke with long-term, high-dosage use. Clinical Query can help investigators explore the temporal relationship between the introduction of Vioxx and the frequency of ischemic heart disease.

Clinical Query is based on the I2B2 standards for aggregate query/response of clinical databases. Over 60 hospitals have implemented I2B2 applications, often in support of the Clinical and Translational Science Awards (CTSA).

I've written about the QueryHealth initiative, which is using the HL7 Health Quality Measures Format (HQMF) to query heterogeneous data sources.

What if the QueryHealth initiative could access I2B2-connected data sources, for example enabling pharmacovigilance queries from the FDA to be broadcast across the country?

We can best protect privacy by keeping all our patient-identified data inside our data center and responding to external queries from payers, government agencies, and public health with aggregate numbers. Reporting the number of patients taking Vioxx and the number of patients presenting with chest pain, without submitting patient-identified data to external registries, minimizes risk.

We'll explore the intersection of QueryHealth and I2B2 at BIDMC in the upcoming months and I'll write about that, just as I wrote about our PopHealth lessons learned.

I've written about our efforts to Free the Data and create a learning healthcare system.    Today marks a milestone for enabling our data to be explored with Clinical Query while protecting the security and integrity of our enterprise registries and repositories.


Paea said...

I thoroughly enjoyed your blog post.

Did you know: we can also tag all of your clinical notes using the same controlled vocabularies and include the coded text as part of the search and analysis of the EHR? The text alone is so rich that the Vioxx signal is loud and clear, years ahead of when it was recalled. (See:

I like to tell people it's so easy that: "You take one USB stick, install in 45 min, and call me in the morning." Millions of notes can be automatically coded and indexed in a couple of hours.

Thank you for pushing the envelope and paving the way forward!

-Paea LePendu, PhD
Stanford Biomedical Informatics Research Scientist
National Center for Biomedical Ontology (

Art Vandelay said...

How long did it take to map that much data to all the ontologies? What types of quality assurance do you have for reviewing these mappings, or new mappings, going forward?

John Halamka said...

1) Natural Language Processing (NLP) algorithms are getting better at extracting concepts from clinical notes. Each year, an i2b2 NLP Challenge offers researchers the opportunity to showcase their latest techniques:

2) Clinical Query builds upon the decade of work that BIDMC has done to create its Clinical Data Repository (CDR). About 90% of the CDR already uses controlled vocabularies; however, the information contained within the CDR is spread across dozens of databases. Clinical Query consolidates the coded data into a single set of database tables. Bringing the final 10% of data into Clinical Query is an ongoing challenge--there is data that cannot be easily mapped to the ontologies we are using, and there is a small but continual stream of uncoded data entering the CDR.

Roy Pardee said...

Very cool stuff--thanks for posting.

I'm curious how your users feel about the SNOMED-CTs for diagnoses. Are they pretty comfortable specifying conditions in that nomenclature, or do you get requests for ICD-9s?

The hierarchical arrangement of the 'ontology' in i2b2 is pretty user-friendly, but I have the impression that code searching is not a strength & so I'd guess that familiarity w/the nomenclatures used would be pretty important to getting sensible answers from it.

Craig S. said...


Fascinating system. How are time stamps handled in a query?

Would it be possible using Clinical Query to find indications occurring within x days after drug given, for example?