Thursday, September 9, 2021

Secure Computing Enclaves Move Digital Medicine Forward

By providing a safe, secure environment, novel approaches enable health care innovators to share data without opening the door to snoopers and thieves.

John Halamka, M.D., president, Mayo Clinic Platform, and Paul Cerrato, senior research analyst and communications specialist, Mayo Clinic Platform, wrote this article.

We know that bringing together AI algorithms and data in ways that preserve privacy and intellectual property is one of the keys to delivering the next generation of clinical decision support. But meeting that challenge requires health care innovators to look to other innovators who themselves have created unique cybersecurity solutions. Among these “Think outside the box” solutions are products and services from vendors like TripleBlind, Verily, Beekeeper.AI/Microsoft, Terra, and Nvidia.

The concept of secure computing enclaves has been around for many years. Apple created its secure enclave, a subsystem built into its systems on a chip (SoC), which in turn is “an integrated circuit that incorporates multiple components into a single chip,” including an application processor, secure enclave, and other coprocessors. Apple explains that “The Secure Enclave is isolated from the main processor to provide an extra layer of security and is designed to keep sensitive user data secure even when the Application Processor kernel becomes compromised. It follows the same design principles as the SoC does—a boot ROM to establish a hardware root of trust, an AES [advanced encryption standard] engine for efficient and secure cryptographic operations, and protected memory. Although the Secure Enclave doesn’t include storage, it has a mechanism to store information securely on attached storage separate from the NAND flash storage that’s used by the Application Processor and operating system.” The secure enclave is embedded into the latest versions of Apple’s iPhone, iPad, Mac computers, Apple TV, Apple Watch, and HomePod.

While this security measure provides users with an extra layer of protection, because it’s a hardware-based solution, its uses are limited. With that in mind, several vendors have created software-based enclaves that are more readily adopted by customers. At Mayo Clinic Platform, we are deploying TripleBlind’s services to facilitate sharing data with our many external partners. It allows Mayo Clinic to test its algorithms using another organization’s data without either party losing control of its assets. Similarly, we can test an algorithm from one of our academic or commercial partners with Mayo Clinic data, or test an outside organization’s data with another outside organization’s data.

How is this “magic” performed? Of course, it’s always about the math. TripleBlind allows the use of distributed data that is accessed but never moved or revealed; it always remains one-way encrypted with no decryption possible. TripleBlind’s novel cryptographic approaches can operate on any type of data (structured or unstructured images, text, voice, video), and perform any operation, including training of and inferring from AI and ML algorithms. An organization’s data remains fully encrypted throughout the transaction, which means that a third party never sees the raw data because it is stored behind the data owner organization’s firewall. In fact, there is no decryption key available, ever. When two health care organizations partner to share data, for instance, TripleBlind software de-identifies their data via one-way encryption; then, both partners access each other’s one-way encrypted data through an Application Programming Interface (API). That means each partner can use the other’s data for training an algorithm, for example, which in turn allows them to generate a more generalizable, less biased algorithm. During a recent conversation with Riddhiman Das, CEO for TripleBlind, he explained: “To build robust algorithms, you want to be able to access diverse training data so that your model is accurate and can generalize to many types of data. Historically, health care organizations have had to send their data to one another to accomplish this goal, which creates unacceptable risks. TripleBlind performs one-way encryption from both interacting organizations, and because there is no decryption possible, you cannot reconstruct the data. In addition, the data can only be used by an algorithm for the specific purpose spelled out in the business agreement.”
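
The article does not describe TripleBlind's cryptography, and the sketch below is not it. As a rough illustration of how a result can be computed from two organizations' data without either one revealing its raw values, here is a minimal example of additive secret sharing, a generic textbook technique we have swapped in for illustration. The hospital counts, the modulus, and all function names are assumptions.

```python
import secrets

# Public modulus (an assumption for this sketch); all arithmetic is done mod this value.
MODULUS = 2**61 - 1


def split_into_shares(value, n_parties=2):
    """Split an integer into random-looking shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recombine shares; this only works when all of them are available."""
    return sum(shares) % MODULUS


# Hypothetical case counts held privately by two hospitals (made-up numbers).
hospital_a_cases = 120
hospital_b_cases = 85

a_shares = split_into_shares(hospital_a_cases)
b_shares = split_into_shares(hospital_b_cases)

# Each computing party adds only the shares it holds; a single share is a
# uniformly random number and says nothing about the underlying count.
party_1_partial = (a_shares[0] + b_shares[0]) % MODULUS
party_2_partial = (a_shares[1] + b_shares[1]) % MODULUS

# Only the combined total is revealed when the partial sums are merged.
combined_total = reconstruct([party_1_partial, party_2_partial])
print(combined_total)  # 205
```

The point of the sketch is only that arithmetic can be carried out on masked values; commercial systems layer much more on top of ideas like this, including the usage restrictions Das describes.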

Developing innovative technological services is exciting work, with the potential to reshape the health care ecosystem worldwide. But along with the excitement is the challenge of keeping data safe and secure. Taking advantage of the many secure computing enclaves available on the market allows us to do just that.

Tuesday, August 31, 2021

Breast Cancer Screening: We Can Do Better

The three risk assessment tools now in use fall far short. Using the latest deep learning techniques, investigators are developing more personalized ways to locate women at high risk.

John Halamka, M.D., president, Mayo Clinic Platform, and Paul Cerrato, senior research analyst and communications specialist, Mayo Clinic Platform, wrote this article.

The promise of personalized medicine will eventually allow clinicians to offer individual patients more precise advice on prevention, early detection and treatment. Of course, the operative word is eventually. A closer examination of the screening tools available to detect breast cancer demonstrates that we still have a way to go before we can fulfill that promise. But with the help of better technology, we are getting closer to that realization.

Disease screening is about risk assessment. Researchers collect data on thousands of patients who develop breast cancer, for instance, and discover that the age range, family history and menstruation history of those who develop the disease differ significantly from those who remain free of it. That in turn allows policy makers to create a screening protocol that suggests women of a certain age who have experienced early menarche or late menopause are more likely to develop the malignancy. That risk assessment is consistent with the fact that more reproductive years means more exposure to the hormones that contribute to breast cancer. Similarly, there’s evidence to show that women with first degree relatives with the cancer and those with a history of ovarian cancer or HRT use are at greater risk.

Statistics like this are the basis for several breast cancer risk scoring systems, including the Gail score, the IBIS score, and the BCSC tool. The National Cancer Institute, which uses the Gail model, explains: “The Breast Cancer Risk Assessment Tool allows health professionals to estimate a woman's risk of developing invasive breast cancer over the next 5 years and up to age 90 (lifetime risk). The tool uses a woman’s personal medical and reproductive history and the history of breast cancer among her first-degree relatives (mother, sisters, daughters) to estimate absolute breast cancer risk—her chance or probability of developing invasive breast cancer in a defined age interval.” While the screening tool saves lives, it can also be misleading. If, for example, it finds that a woman has a 1% likelihood of developing breast cancer, what that really means is a large population of women with those specific risk factors has a one in 100 risk of developing the disease. There is no way of knowing what the threat is for any one patient in that group. Similar problems exist for the International Breast Cancer Intervention Study (IBIS) score, based on the Tyrer-Cuzick Model, and the Breast Cancer Surveillance Consortium (BCSC) Risk Calculator. These three assessment tools can give patients a false sense of security if they don’t dive into the details. BCSC, for instance, cannot be applied to women younger than 35 or older than 74, nor does it accurately measure risk for anyone who has previously had ductal carcinoma in situ (DCIS), or had breast augmentation. Similarly, the NCI tool doesn’t accurately estimate risk in women with a BRCA1 or BRCA2 mutation, as well as certain other subgroups.
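
The gap between a group-level estimate and an individual outcome can be made concrete with a little arithmetic. The sketch below assumes a hypothetical cohort of 10,000 women who all share the same risk profile and the same 1% estimated risk; the cohort size and function name are ours, chosen only for illustration.

```python
def expected_cases(cohort_size, risk):
    """Expected number of cases in a group whose members share one risk estimate."""
    return round(cohort_size * risk)


cohort_size = 10_000   # hypothetical cohort with identical risk factors
estimated_risk = 0.01  # the 1% figure used in the text

cases = expected_cases(cohort_size, estimated_risk)
unaffected = cohort_size - cases

print(cases)       # 100 women are expected to develop the disease
print(unaffected)  # 9900 are not

# The score is the same 0.01 for every woman in the cohort, so it cannot
# say which 100 of the 10,000 will be affected.
```

For any one woman the outcome is yes or no, while the score she receives is identical to that of everyone else in her group.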

During a conversation with Tufia Haddad, M.D., a Mayo Clinic medical oncologist with specialty interest in precision medicine in breast cancer and artificial intelligence, she discussed the research she and her colleagues are doing to improve the risk assessment process and identify more high-risk women. Dr. Haddad pointed out that there are numerous obstacles that prevent women from obtaining the best possible risk assessment. Too many women do not have a primary care practitioner who might use a risk tool. And those who do have a PCP are more likely to have an evaluation based on the Breast Cancer Risk Assessment tool (the Gail model). “We prefer the Tyrer-Cuzick model in part because it incorporates more personal information for each individual patient including a detailed family history, a woman’s breast density from her mammogram, as well as her history of atypia or other high risk benign breast disease,” says Dr. Haddad. Unfortunately, the Tyrer-Cuzick method requires many more data elements to assess breast cancer risk, which discourages busy clinicians from using it.

Another obstacle to using any of these risk assessment tools is the fact that they don’t readily fit into the average physician’s clinical workflow. Ideally these tools should seamlessly integrate into the EHR system. Even better would be the incorporation of AI-enhanced algorithms that automate the abstraction of the required data elements from the patient’s record into the assessment tool. For example, the algorithm would flag a family history of breast cancer, increased breast density as determined during a mammogram, as well as hormone replacement therapy and insert those risk factors into the Tyrer-Cuzick tool.
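
To give a feel for what automated abstraction might look like, here is a minimal sketch that pulls three inputs out of a patient record and reports which ones are missing. The record layout, field names, and function name are all hypothetical; a real EHR integration would work against the vendor's actual data model and would need far more careful handling of free text.

```python
def abstract_risk_factors(record):
    """Pull risk-model inputs out of a hypothetical patient record dict."""
    relatives = record.get("family_history", [])
    mammograms = record.get("mammograms", [])
    medications = record.get("medications", [])

    # ISO-format date strings sort correctly as plain text.
    latest = max(mammograms, key=lambda m: m["date"], default=None)

    factors = {
        "first_degree_relative_with_breast_cancer": any(
            r["condition"] == "breast cancer" and r["degree"] == 1
            for r in relatives
        ),
        "breast_density": latest["density"] if latest else None,
        "hormone_replacement_therapy": any(
            m["drug_class"] == "hormone replacement therapy"
            for m in medications
        ),
    }
    # Anything still unknown is surfaced for the clinician to fill in.
    missing = [name for name, value in factors.items() if value is None]
    return factors, missing


example_record = {
    "family_history": [
        {"relation": "mother", "degree": 1, "condition": "breast cancer"}
    ],
    "mammograms": [
        {"date": "2019-05-01", "density": "B"},
        {"date": "2021-06-15", "density": "C"},
    ],
    "medications": [
        {"name": "example drug", "drug_class": "hormone replacement therapy"}
    ],
}

factors, missing = abstract_risk_factors(example_record)
print(factors["breast_density"])  # C (taken from the most recent mammogram)
print(missing)                    # []
```

Returning the list of missing elements alongside the extracted ones keeps the clinician in the loop rather than silently feeding incomplete data into the risk tool.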

Even with this AI-enhanced approach, all of the available risk models fall short because they take a population-based approach, as we mentioned above. Dr. Haddad and her colleagues are looking to make the assessment process more individualized, as are others working in this specialty. That model could incorporate each patient’s previous mammography results, their genetics and benign breast biopsy findings, and much more. Adam Yala and his colleagues at MIT recently developed a mammography-based deep learning model designed to take this more sophisticated approach. Called Mirai, it was trained on a large data set from Massachusetts General Hospital and from facilities in Sweden and Taiwan. The new model generated significantly better results for breast cancer risk prediction than the Tyrer-Cuzick (TC) model.

Breast cancer risk assessment continues to evolve. And with better utilization of existing assessment tools and the assistance of deep learning, we can look forward to better patient outcomes.

Monday, August 23, 2021

Can Social Determinants of Health Predict Your Patient’s Future?

The evidence is mixed but suggests that these overlooked variables have a profound impact on each patient’s journey. 

This article was written by Tim Suther, Nicole Hobbs, Jeff McGinn, Matt Turner with Change Healthcare, John Halamka, MD, MS, president of Mayo Clinic Platform, and Paul Cerrato, senior research analyst and communications specialist, Mayo Clinic Platform.

By one estimate, social determinants of health (SDoH) influence up to 80% of health outcomes. Although reports like this suggest that these social factors have a major impact, thought leaders continue to debate whether they can also enhance the accuracy of predictive models. Resolving that debate is far from simple because the answer depends on the type, source and quality of the data, and the design of the model under consideration.

In general, we derive SDoH from subjective and objective sources. Subjective data includes self-reported or clinician-collected data such as patient reported outcomes, Z codes from ICD-10-CM that report factors that influence health status and interactions with health service providers, and other unstructured EHR data. Objective data includes individual-level and community-level data from government, public and private (and consumer behavior) sources; it’s usually more structured and often derived from national-level datasets.

Unfortunately, the research on the value of SDoH in predictive models varies widely. Some studies report no appreciable differences when SDoH are injected into models, while others report significant enhancements to predictive power. Unsurprisingly, these varying study results depend in part on levels of reliance on traditional clinical models and, most importantly, on the types and sources of SDoH data employed in the studies.

For example, a group from Johns Hopkins Bloomberg School of Public Health demonstrated SDoH predictive models can fail in part due to predictive model design as well as to EHR-level data that is unstructured and collected inconsistently.  They also demonstrated that dependence on data from EHR-derived population health databases for SDoH can be problematic because the data tends to be used as a proxy for individual-level social factors.  The problem lies in the fact that these proxies are often based on assumptions, not evidence. Other research supports the above and showcases the challenges of using SDoH data from sources that traditionally struggle with the comprehensive collection and standardization of these data types.

On a more positive note, several studies and healthcare articles have reported success by relying on objectively collected and/or highly structured and consistent data. For example, one study that used EHR-derived SDoH data sources found that the addition of structured data on median income, unemployment rate, and education from trustworthy non-EHR sources enhanced their model’s health prediction granularity for some of the most vulnerable subgroups of patients. In another study, a collaboration between Stanford, Harvard, and Imperial College London found that adding structured SDoH data from the US Census, along with using machine learning techniques, improved risk prediction model accuracy for hospitalization, death, and costs. They also showed that their models based on SDoH alone, as well as those based on clinical comorbidities alone, could predict health outcomes and costs. Similarly, researchers at The Ohio State University College of Medicine added community-level and consumer behavior data not available in standard EHR data and found it enhanced the study of and impact on obesity prevention. Juhn et al. at Mayo Clinic tapped telephone survey data and appended housing and neighborhood characteristic data from local government sources to create a socioeconomic status index (HOUSES). They first showed that HOUSES correlated well with outcome measures and later showed that HOUSES could even serve as a predictive tool for graft failure in patients.

Patient Level SDoH + Clinical Data = Predictive Power

Incorporating social factors into the healthcare equation can fill gaps at the point of care and can also generate better healthcare predictions, but only when these determinants are patient level and linked to robust clinical data. Change Healthcare, for example, has curated such an integrated national-level dataset, linking billions of historical de-identified distinct medical claims with patient-level social, physical and behavioral determinants of health. One of this dataset’s most important uses is to understand the relative weight of specific patient SDoH factors, in comparison to clinical factors alone, for various therapeutic conditions, including COVID-19. For example, across Change Healthcare’s research, economic stability is repeatedly ranked as the highest or among the highest predictors of the healthcare experience. Despite this realization, most end users, including providers and payers, lack such visibility (or rely on geographic averages that are unhelpful in making accurate predictive models).

Incorporating SDoH data into predictive models holds much promise. Given the relative newness of SDoH data in predictive analytics, along with a lack of data standardization and scale, it’s not surprising to find varying degrees of success in using it to improve predictive health models. But as researchers learn more about the best types and sources of SDoH data to use, along with developing better-suited models for these types of data, we’re likely to see significant advances in healthcare predictive models. By combining the right data with the right models, SDoH are a powerful asset in predictive models of health, outcomes, and potential health disparities.

If you're still with us . . .

Please consider supporting Dr. Steve Parodi, Reed Abelson and me by "voting up" our panel at the upcoming South by Southwest conference in March of 2022. Our proposed panel, "Extending the Stethoscope Into the Home," will dive into a discussion about acute health care for patients in their home and the infrastructures needed to support it. If you are so inclined to vote, please do so here.

Tuesday, August 17, 2021

We Need to Open Up the AI Black Box

To convince physicians and nurses that deep learning algorithms are worth using in everyday practice, developers need to explain how they work in plain clinical English.

Paul Cerrato, senior research analyst and communications specialist, Mayo Clinic Platform, and John Halamka, M.D., president, Mayo Clinic Platform, wrote this article.

AI’s so-called black box refers to the fact that much of the underlying technology behind machine learning-enhanced algorithms is probability/statistics without a human readable explanation. Oftentimes that’s the case because the advanced math or the data science behind the algorithms is too complex for the average user to understand without additional training. Several stakeholders in digital health maintain, however, that this lack of understanding isn’t that important. They argue that as long as an algorithm generates actionable insights, most clinicians don’t really care about what’s “under the hood.” Is that reasoning sound?

Some thought leaders point to the fact that there are many advanced, computer-enhanced diagnostic and therapeutic tools currently in use that physicians don’t fully understand, but nonetheless accept. The CHA2DS2-VASc score, for instance, is used to estimate the likelihood of a patient with non-valvular atrial fibrillation having a stroke. Few clinicians are familiar with the original research or detailed reasoning upon which the calculator is based, but they nonetheless use the tool. Similarly, many physicians use the FRAX score to estimate a patient’s 10-year risk of developing a bone fracture, despite the fact that they have not investigated the underlying math.
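
It is worth noting how simple the arithmetic behind a tool like this is once it is written down. The sketch below adds up the CHA2DS2-VASc points as the score is commonly published (one point each for congestive heart failure, hypertension, diabetes, vascular disease, age 65 to 74, and female sex; two points each for age 75 or older and prior stroke, TIA, or thromboembolism). The function and parameter names are ours, it does not convert the score into a stroke rate, and it is an illustration only, not a clinical tool.

```python
def cha2ds2_vasc(age, female, chf, hypertension, diabetes,
                 stroke_or_tia, vascular_disease):
    """Add up CHA2DS2-VASc points from yes/no risk factors and age."""
    score = 0
    score += 1 if chf else 0               # C: congestive heart failure
    score += 1 if hypertension else 0      # H: hypertension
    if age >= 75:                          # A2: age 75 or older
        score += 2
    elif age >= 65:                        # A: age 65 to 74
        score += 1
    score += 1 if diabetes else 0          # D: diabetes
    score += 2 if stroke_or_tia else 0     # S2: prior stroke/TIA/thromboembolism
    score += 1 if vascular_disease else 0  # V: vascular disease
    score += 1 if female else 0            # Sc: sex category (female)
    return score


# Hypothetical patient: a 72-year-old woman with hypertension and diabetes.
example_score = cha2ds2_vasc(
    age=72, female=True, chf=False, hypertension=True,
    diabetes=True, stroke_or_tia=False, vascular_disease=False,
)
print(example_score)  # 4
```

Every point can be traced back to a named risk factor, which is exactly the kind of traceability that a deep learning model does not offer out of the box.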

It’s important to point out, however, that the stroke risk tool and the FRAX tool both have major endorsements from organizations that physicians respect. The American Heart Association and the American College of Cardiology both recommend the CHA2DS2-VASc score while the National Osteoporosis Foundation supports the use of the FRAX score. That gives physicians confidence in these tools even if they don’t grasp the underlying details. To date, there are no major professional associations recommending specific AI-enabled algorithms to supplement the diagnosis or treatment of disease. The American Diabetes Association did include a passing mention of an AI-based screening tool in its 2020 Standards of Medical Care in Diabetes, stating: “Artificial intelligence systems that detect more than mild diabetic retinopathy and diabetic macular edema authorized for use by the FDA represent an alternative to traditional screening approaches. However, the benefits and optimal utilization of this type of screening have yet to be fully determined.” That can hardly be considered a recommendation.

Given this scenario, most physicians have reason to be skeptical, and surveys bear out that skepticism. A survey of 91 primary care physicians found that understandability of AI is one of the important attributes they want before trusting its recommendations during breast cancer screening. Similarly, a survey of senior specialists in the UK found that understandability was one of their primary concerns about AI. Among New Zealand physicians, 88% were more likely to trust an AI algorithm that produced an understandable explanation of its decisions.

Of course, it may not be possible to fully explain the advanced mathematics used to create machine learning based algorithms. But there are other ways to describe the logic behind these tools that would satisfy clinicians.  As we have mentioned in previous publications and oral presentations, there are tutorials available to simplify machine learning-related systems like neural networks, random forest modeling, clustering, and gradient boosting. Our most recent book contains an entire chapter on this digital toolbox. Similarly, JAMA  has created clinician friendly video tutorials designed to graphically illustrate how deep learning is used in medical image analysis and how such algorithms can be used to help detect lymph node metastases in breast cancer patients. 

These resources require clinicians to take the initiative and learn a few basic AI concepts, but developers and vendors also have an obligation to make their products more transparent. One way to accomplish that goal is through saliency maps and generative adversarial networks. Using such techniques, it’s possible to highlight the specific pixel grouping that a neural network has identified as a trouble spot, which the clinician can then view on a radiograph, for example. Alex DeGrave, with the University of Washington, and his colleagues used this approach to help explain why an algorithm designed to detect COVID-19-related changes in chest X-rays made its recommendations. Amirata Ghorbani and associates from Stanford University have taken a similar approach to help clinicians comprehend the echocardiography recommendations coming from a deep learning system. The researchers trained a convolutional neural network (CNN) on over 2.6 million echocardiogram images from more than 2,800 patients and demonstrated it was capable of identifying enlarged left atria, left ventricular hypertrophy, and several other abnormalities. To open up the black box, Ghorbani et al presented readers with “biologically plausible regions of interest” in the echocardiograms they analyzed so they could see for themselves the reason for the interpretation that the model had arrived at. For instance, if the CNN said it had identified a structure such as a pacemaker lead, it highlighted the pixels it identified as the lead. Similar clinician-friendly images are presented for a severely dilated left atrium and for left ventricular hypertrophy.
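
To show the basic idea behind a saliency map, here is a toy sketch of occlusion sensitivity, one of the simplest ways to build such a map: blank out one pixel at a time and record how much the model's output drops. This is not the method used in either study cited above; the 4x4 "image" and the stand-in scoring function are assumptions made purely for illustration, and a real system would apply the same idea to a trained neural network.

```python
def toy_model_score(image):
    """Stand-in for a trained network: it responds only to the brightness
    of the centre 2x2 region of a 4x4 image. Purely illustrative."""
    return sum(image[r][c] for r in (1, 2) for c in (1, 2))


def occlusion_saliency(image, score_fn, fill=0):
    """Blank out each pixel in turn and record the drop in the model score."""
    baseline = score_fn(image)
    saliency = []
    for r, row in enumerate(image):
        saliency_row = []
        for c in range(len(row)):
            occluded = [list(original_row) for original_row in image]
            occluded[r][c] = fill
            saliency_row.append(baseline - score_fn(occluded))
        saliency.append(saliency_row)
    return saliency


# Made-up 4x4 grayscale image (0-255) with a bright patch in the centre.
image = [
    [10, 10, 10, 10],
    [10, 230, 200, 10],
    [10, 180, 230, 10],
    [10, 10, 10, 10],
]

saliency = occlusion_saliency(image, toy_model_score)

# Pixels whose removal changes the score are the ones the model relied on.
highlighted = [
    (r, c)
    for r, row in enumerate(saliency)
    for c, drop in enumerate(row)
    if drop > 0
]
print(highlighted)  # [(1, 1), (1, 2), (2, 1), (2, 2)]
```

Overlaying the highlighted pixels on the original image gives the clinician a picture of where the model was "looking" when it produced its output.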

Deep learning systems are slowly ushering in a new way to manage diagnosis and treatment, but to bring skeptical clinicians on board, we need to pull the curtain back. In addition to providing evidence that these tools are equitable and clinically effective, practitioners want reasonable explanations to demonstrate that they will do what they claim to do.

Friday, July 30, 2021

The Future Belongs to Digital Pathology

Advances in artificial intelligence are slowly transforming the specialty, much the way radiology is being transformed by similar advances in digital technology.

John Halamka, M.D., president, Mayo Clinic Platform, and Paul Cerrato, senior research analyst and communications specialist, Mayo Clinic Platform, wrote this article.

Any patient who faces a potential cancer diagnosis knows how important an accurate, timely pathology report is.  Similarly, surgeons often require fast pathology results when they are performing a delicate procedure to determine their course of action during an operation. New technological developments are poised to meet the needs of patients and clinicians alike.

AI can improve pathology practice in numerous ways. The right digital tools can automate several repetitive tasks, including the detection of small foci. They can also help improve the staging of many malignancies, make the workflow process more efficient, and help classify images, which in turn gives pathologists a “second set of eyes.” And those “eyes” do not grow tired at the end of a long day or feel stressed out from too much work.

Such capabilities have far-reaching implications. With the right scanning hardware and the proper viewer software, pathologists and technicians can easily view and store whole slide images (WSIs). That view is markedly different from what they see through a microscope, which only allows a narrow field of view. In addition, digitization allows pathologists to mark up WSIs with non-destructive annotations, use the slides as teaching tools, search a laboratory’s archives to make comparisons with images that depict similar cases, give colleagues and patients access to the images, and create predictive models. And if the facility has cloud storage capabilities, it allows clinicians, patients, and pathologists around the world to access the data.

A 2020 prospective trial conducted by University of Michigan and Columbia University investigators illustrates just how profound the impact of AI and ML can be when applied to pathology. Todd Hollon and colleagues point out that intraoperative diagnosis of cancer relies on a “contracting, unevenly distributed pathology workforce.”1 The process can be quite inefficient, requiring a tissue specimen to travel from the OR to a lab, followed by specimen processing, slide preparation by a technician, and a pathologist’s review. At University of Michigan, they are now using stimulated Raman histology, an advanced optical imaging method, along with a convolutional neural network (CNN) to help interpret the images. The machine learning tools were trained to detect 13 histologic categories and include an inference algorithm to help make a diagnosis of brain cancer. Hollon et al conducted a 2-arm, prospective multicenter, non-inferiority trial to compare the CNN results to those of human pathologists. The trial, which evaluated 278 specimens, demonstrated that the machine learning system was as accurate as pathologists’ interpretation (94.6% vs 93.9%). Equally important was the fact that it took under 15 seconds for surgeons to get their results with the AI system, compared to 20-30 minutes with conventional techniques. And that latter estimate does not represent the national average. In some community settings, slides have to be shipped by special courier to labs that are hours away.

Mayo Clinic is among several forward-thinking health systems that are in the process of implementing a variety of digital pathology services. Mayo Clinic has partnered with Google and is leveraging their technology in two ways. The program will extend Mayo Clinic’s comprehensive Longitudinal Patient Record profile with digitized pathology images to better serve and care for patients. And we are exploring new search capabilities to improve digital pathology analytics and AI. The Mayo/Google project is being conducted with the help of Sectra, a digital slide review and image storage and management system. Once proof of concept, system testing, and configuration activities are complete, the digital pathology solution will be introduced gradually to Mayo Clinic departments throughout Rochester, Florida, and Arizona, as well as the Mayo Clinic Health System.

The new digital capabilities taking hold in several pathology labs around the globe are likely to solve several vexing problems facing the specialty. Currently there is a shortage of pathologists worldwide, and in some countries, that shortage is severe. One estimate found there is one pathologist per 1.5 million people in parts of Africa. And China has one fourth the number of pathologists practicing in the U.S., on a per capita basis. Studies predict that the steady decline of the number of pathologists in the U.S. will continue over the next two decades. A lack of subspecialists is likewise a problem. Similarly, there are reports of poor accuracy and reproducibility, with many practitioners making subjective judgements based on a manual estimate of the percentage of positive cells for a biomarker. Finally, there is reason to believe that implementing digital pathology systems will likely improve a health system’s financial return on investment. One study has suggested that it can “improve the efficiency of pathology workloads by 13%.” 2

As we have said several times in these columns, AI and ML are certainly not a panacea, and they will never replace an experienced clinician or pathologist. But taking advantage of the tools generated by AI/ML will have a profound impact on diagnosis and treatment for the next several decades.



1. Hollon T, Pandian B, Adapa A, et al. Near real-time intraoperative brain tumor diagnosis using stimulated Raman histology and deep neural networks. Nat Med. 2020;26:52-58.

2. Ho J, Ahlers SM, Stratman C, et al. Can digital pathology result in cost savings? a financial projection for digital pathology implementation at a large integrated health care organization. J Pathol Inform. 2014;5(1):33; doi: 10.4103/2153-3539.139714.

Wednesday, July 28, 2021

Shift Happens

Dataset shift can thwart the best intentions of algorithm developers and tech-savvy clinicians, but there are solutions.

John Halamka, M.D., president, Mayo Clinic Platform, and Paul Cerrato, senior research analyst and communications specialist, Mayo Clinic Platform, wrote this article.

Generalizability has always been a concern in health care, whether we’re discussing the application of clinical trials or machine-learning based algorithms. A large randomized controlled trial that finds an intensive lifestyle program doesn’t reduce the risk of cardiovascular complications in Type 2 diabetics, for instance, suggests the diet/exercise regimen is not worth recommending to patients. But the question immediately comes to mind: Can that finding be generalized to the entire population of Type 2 patients? As we have pointed out in other publications, subgroup analysis has demonstrated that many patients do, in fact, benefit from such a program.

The same problem exists in health care IT. Several algorithms have been developed to help classify diagnostic images, predict disease complications, and more. A closer look at the datasets upon which these digital tools are based indicates many suffer from dataset shift. In simple English, dataset shift is what happens when the data collected during the development of an algorithm changes over time and is different from the data when the algorithm is eventually implemented. For example, the patient demographics used to create a model may no longer represent the patient population when the algorithm is put into clinical use. This happened when COVID-19 changed the demographic characteristics of patients, making the Epic sepsis prediction tool ineffective.
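
One simple way to watch for this kind of drift is to compare the distribution of each input variable at training time with what the model sees after deployment. The sketch below uses a standardized mean difference for that comparison; the patient values are made up, and the 0.25 threshold and all names are assumptions chosen for illustration rather than an established standard.

```python
import statistics


def standardized_mean_difference(train_values, live_values):
    """Difference in means, expressed in units of the pooled standard deviation."""
    mean_gap = statistics.mean(live_values) - statistics.mean(train_values)
    pooled_sd = (
        (statistics.variance(train_values) + statistics.variance(live_values)) / 2
    ) ** 0.5
    return mean_gap / pooled_sd


def flag_shifted_features(train, live, threshold=0.25):
    """Return the features whose distribution moved by more than the threshold."""
    flagged = {}
    for name in train:
        smd = standardized_mean_difference(train[name], live[name])
        if abs(smd) > threshold:
            flagged[name] = round(smd, 2)
    return flagged


# Made-up data: the deployed population is much younger than the training
# population, while heart rates look about the same.
training_data = {
    "age": [54, 61, 58, 66, 70, 63],
    "heart_rate": [88, 92, 95, 90, 85, 91],
}
deployment_data = {
    "age": [41, 38, 52, 47, 35, 44],
    "heart_rate": [89, 91, 94, 90, 86, 92],
}

flagged = flag_shifted_features(training_data, deployment_data)
print(sorted(flagged))  # ['age']
```

A check like this does not say whether the model's predictions have degraded, only that its inputs no longer look like the data it learned from, which is a cue to investigate.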

Samuel Finlayson, PhD, with Harvard Medical School, and his colleagues described a long list of dataset shift scenarios that can compromise the accuracy and equity of AI-based algorithms, which in turn can compromise patient outcomes and patient safety. Finlayson et al list 14 scenarios, which fall into 3 broad categories: changes in technology; changes in population and setting; and changes in behavior. Examples of ways in which dataset shift can create misleading outputs that send clinicians down the wrong road include:

  • Changes in the X-ray scanner models used
  • Changes in the way diagnostic codes are collected (e.g. using ICD-9 and then switching to ICD-10)
  • Changes in patient population resulting from hospital mergers

Other potential problems to be cognizant of include changes in your facility’s EHR system. Sometimes updates to the system may result in changes in how terms are defined, which in turn can impact predictive algorithms that rely on those definitions. If a term like elevated temperature or fever is changed to pyrexia in one of the EHR drop down menus, for example, it may no longer map to the algorithm that uses elevated temperature as one of the variable definitions to predict sepsis, or any number of common infections. Similarly, if the ML-based model has been trained on a patient dataset for a medical specialty practice or hospital cohort, it’s likely that data will generate misleading outputs when applied to a primary care setting.
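
The fever/pyrexia example can be sketched in a few lines. Here a lookup table translates EHR display terms into the variable names a model expects, and any term the table does not recognize is reported rather than silently dropped. The term list, variable names, and function name are hypothetical and do not come from any real EHR or sepsis model.

```python
# Hypothetical mapping from EHR display terms to model variable names.
TERM_TO_MODEL_VARIABLE = {
    "elevated temperature": "fever",
    "fever": "fever",
    "rapid heart rate": "tachycardia",
}


def map_ehr_terms(ehr_terms, mapping):
    """Translate EHR terms into model inputs and collect anything unmapped."""
    mapped, unmapped = {}, []
    for term in ehr_terms:
        key = term.strip().lower()
        if key in mapping:
            mapped[mapping[key]] = True
        else:
            unmapped.append(term)
    return mapped, unmapped


# Before the EHR update, both findings reach the model.
mapped_before, unmapped_before = map_ehr_terms(
    ["Elevated temperature", "Rapid heart rate"], TERM_TO_MODEL_VARIABLE
)
print(mapped_before)    # {'fever': True, 'tachycardia': True}

# After the drop-down menu is changed to "Pyrexia", the fever signal is lost.
mapped_after, unmapped_after = map_ehr_terms(
    ["Pyrexia", "Rapid heart rate"], TERM_TO_MODEL_VARIABLE
)
print(mapped_after)     # {'tachycardia': True}
print(unmapped_after)   # ['Pyrexia']

# Updating the variable mapping restores the input.
updated_mapping = {**TERM_TO_MODEL_VARIABLE, "pyrexia": "fever"}
mapped_fixed, unmapped_fixed = map_ehr_terms(
    ["Pyrexia", "Rapid heart rate"], updated_mapping
)
print(mapped_fixed)     # {'fever': True, 'tachycardia': True}
```

Logging the unmapped terms is the important design choice here: it turns a silent failure into something an informatics team can see and act on after an EHR update.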

Finlayson et al mention another example to be aware of: changes in the way physicians practice can influence data collection: “Adoption of new order sets, or changes in their timing, can heavily affect predictive model output.” Clearly, problems like this necessitate strong interdisciplinary ties, including an ongoing dialogue between the chief medical officer, clinical department heads, and chief information officer and his or her team. Equally important is the need for clinicians in the trenches to look for subtle changes in practice patterns that can impact the predictive analytics tools currently in place. Many dataset mismatches can be solved by updating variable mapping, retraining or redesigning the algorithm, and multidisciplinary root cause analysis.

While addressing dataset shift issues will improve the effectiveness of your AI-based algorithms, they are only one of many stumbling blocks to contend with. One classic example that demonstrates that computers are still incapable of matching human intelligence is the study that concluded that patients with asthma are less likely to die from pneumonia than those who don’t have asthma. The machine learning tool used to come to that unwarranted conclusion had failed to take into account the fact that many asthmatics often get faster, earlier, more intensive treatment when their condition flares up, which results in a lower mortality rate. Had clinicians acted on the misleading correlation between asthma and fewer deaths from pneumonia, they might have decided asthma patients don’t necessarily need to be hospitalized when they develop pneumonia.

This kind of misdirection is relatively common and emphasizes the fact that ML-enhanced tools sometimes have trouble separating useless “noise” from meaningful signal. Another example worth noting: Some algorithms designed to help detect COVID-19 by analyzing X-rays suffer from this shortcoming. Several of these deep learning algorithms rely on confounding variables instead of focusing on medical pathology, giving clinicians the impression that they are accurately identifying the infection or ruling out its presence. Unbeknownst to their users, the algorithms have been shown to rely on text markers or patient positioning instead of pathology findings.

At Mayo Clinic, we have had to address similar problems. A palliative care model that was trained on data from the Rochester, Minnesota, community, for instance, did not work well in our health system because the severity of patient disease in a tertiary care facility is very different than what’s seen in a local community hospital. Similarly, one of our algorithms broke when a vendor did a point release in its software and changed the format of the results. We also had a vendor run 10 of our known stroke patients through its CT stroke detection algorithm, and the system was able to identify only one of them. The root cause: Mayo Clinic medical physicists have optimized our radiation dose to 25% of industry standards to reduce patients’ exposure, but that changed the signal-to-noise ratio of the images; the vendor’s system wasn’t trained on that ratio and couldn’t find the strokes in those images.

Valentina Bellini, with University of Parma, Parma, Italy, and her colleagues sum up the AI shortcut dilemma in a graphic that illustrates 3 broad problem areas: Poor quality data, ethical and legal issues, and lack of educational programs for clinicians who may be skeptical or uninformed about the value and limitations of AI enhanced algorithms in intensive care settings.

As we have pointed out in other blogs, ML-based algorithms rely on math, not magic. But when reliance on that math overshadows clinicians’ diagnostic experience and common sense, they need to partner with their IT colleagues to find ways to reconcile artificial and human intelligence.

Friday, July 23, 2021

Causality in Medicine: Moving Beyond Correlation in Clinical Practice

A growing body of research suggests it’s time to abandon outdated ideas about how to identify effective medical therapies.

Paul Cerrato, senior research analyst and communications specialist, Mayo Clinic Platform, and John Halamka, M.D., president, Mayo Clinic Platform, wrote this article.

“Correlation is not causation.” It’s a truism that researchers take for granted, and for good reason. The fact that event A is followed by event B doesn’t mean that A caused B. An observational study of 1,000 adults, for example, that found those taking high doses of vitamin C were less likely to develop lung cancer doesn’t prove the nutrient protects against the cancer; it’s always possible that a third factor — a confounding variable — was responsible for both A and B. In other words, patients taking lots of vitamin C may be less likely to get lung cancer because they are more health conscious than the average person, and therefore more likely to avoid smoking, which in turn reduces their risk of the cancer.

As this example illustrates, confounding variables are the possible contributing factors that may mislead us into imagining a cause-and-effect relationship exists when there isn’t one. It’s the reason interventional trials like the randomized controlled trial (RCT) remain a more reliable way to determine causation than observational studies. But it’s important to point out that in clinical medicine, there are many treatment protocols in use that are not supported by RCTs. Similarly, there are many risk factors associated with various diseases but it’s often difficult to know for certain whether these risk factors are actually contributing causes of said diseases. 

While RCTs remain the gold standard in medicine, they can be impractical for a variety of reasons: they are often very expensive to perform; an RCT that exposes patients to a potentially harmful risk factor and compares them to those who aren’t exposed would be unethical; and most trials require many exclusion and inclusion criteria that don’t exist in the everyday practice of medicine. For instance, they usually exclude patients with co-existing conditions, which may distort the study results.

One way to address this problem is by accepting less than perfect evidence and using a reliability scale or continuum to determine which treatments are worth using and which are not. That scale might look something like this, with evidential support growing stronger from left to right along the continuum: 

In the absence of RCTs, it’s feasible to consider using observational studies like case/control and cohort trials to justify using a specific therapy. And while such observational studies may still mislead because some confounding variables have been overlooked, there are epidemiological criteria that strengthen the weight given to these less than perfect studies:

  • A stronger association or correlation between two variables is more suggestive of a cause/effect relationship than a weaker association.

  • Temporality. The alleged effect must follow the suspected cause, not the other way around. It would make no sense to suggest that exposure to Mycobacterium tuberculosis causes TB if all the cases of the infection occurred before patients were exposed to the bacterium.

  • A dose-response relationship exists between alleged cause and effect. For example, if researchers find that a blood lead level of 10 mcg/dl is associated with mild learning disabilities in children, 15 mcg/dl is linked to moderate deficits, and 20 mcg/dl with severe deficits, this gradient strengthens the argument for causality.

  • A biologically plausible mechanism of action linking cause and effect strengthens the argument. In the case of lead poisoning, there is evidence pointing to neurological damage brought on by oxidative stress and a variety of other biochemical mechanisms.

  • Repeatability of the study findings: If the results of one group of investigators are duplicated by independent investigators, that lends further support to the cause/effect relationship.

While adherence to all these criteria suggests causality for observational studies, a statistical approach called causal inference can actually establish causality. The technique, which was spearheaded by Judea Pearl, Ph.D., winner of the 2011 Turing Award, is considered revolutionary by many thought leaders and will likely have profound implications for clinical medicine, and for the role of AI and machine learning. During the recent Mayo Clinic Artificial Intelligence Symposium, Adrian Keister, Ph.D., a senior data science analyst at Mayo Clinic, concluded that causal inference is “possibly the most important advance in the scientific method since the birth of modern statistics — maybe even more important than that.”

Conceptually, causal inference starts with the conversion of word-based statements into mathematical statements, with the help of a few new operators. While that may sound daunting to anyone not well-versed in statistics, it’s not much different than the way we communicate by using the language of arithmetic. A statement like fifteen times five equals seventy-five is converted to 15 x 5 = 75. In this case, x is an operator. The new mathematical language of causal inference might look like this if it were to represent an observational study that evaluated the association between a new drug and an increase in patients’ lifespan: P(L | D), where P is probability, L is lifespan, D is the drug, and | is an operator that means “conditioned on.”

An interventional trial such as an RCT, on the other hand, would be written as: D causes L if P(L | do(D)) > P(L), in which case the do-operator refers to the intervention, i.e., giving the drug in a controlled setting. This formula is a way of saying that D (the drug being tested) causes L (longer life) if the probability of a longer life under the intervention is greater than the probability of a longer life without administering the drug, in other words, the probability in the placebo group, namely P(L).
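To make the difference between conditioning and intervening concrete, here is a small worked example. Everything in it is invented for illustration: a hidden factor H (baseline health) makes patients both more likely to take the drug D and more likely to live longer (L), while the drug itself does nothing. The interventional quantity is computed with the standard back-door adjustment, which assumes H is the only confounder and is fully known.

```python
# Hypothetical probabilities, chosen only for illustration.
P_H = {1: 0.5, 0: 0.5}           # P(H=h): good (1) or poor (0) baseline health
P_D_GIVEN_H = {1: 0.8, 0: 0.2}   # P(D=1 | H=h): healthier patients take the drug more often
# P(L=1 | D=d, H=h): lifespan depends on H only, so the drug has no real effect.
P_L_GIVEN_DH = {(1, 1): 0.9, (0, 1): 0.9, (1, 0): 0.5, (0, 0): 0.5}

def p_d(d, h):
    """P(D=d | H=h)."""
    return P_D_GIVEN_H[h] if d == 1 else 1 - P_D_GIVEN_H[h]

def observational(d):
    """P(L=1 | D=d): look only at who happened to take the drug."""
    num = sum(P_L_GIVEN_DH[(d, h)] * p_d(d, h) * P_H[h] for h in P_H)
    den = sum(p_d(d, h) * P_H[h] for h in P_H)
    return num / den

def interventional(d):
    """P(L=1 | do(D=d)): assign the drug regardless of H (back-door adjustment)."""
    return sum(P_L_GIVEN_DH[(d, h)] * P_H[h] for h in P_H)

print(observational(1), observational(0))    # about 0.82 vs 0.58: looks like a benefit
print(interventional(1), interventional(0))  # about 0.70 vs 0.70: no causal effect
```

In this toy setup the observational comparison suggests the drug extends life, while the do-operator version shows the apparent benefit comes entirely from the confounder.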

This innovative technique also uses causal graphs to show the relationship of a confounding variable to a proposed cause/effect relationship. Using this kind of graph, one can illustrate how the tool applies in a real-world scenario. Consider the relationship between smoking and lung cancer. For decades, statisticians and policy makers argued about whether smoking causes the cancer because all the evidence supporting the link was observational. The graph would look something like this.

Figure 1:

G is the confounding variable (a genetic predisposition, for example), S is smoking, and LC is lung cancer. The implication here is that if a third factor causes persons to smoke and causes cancer, one cannot necessarily conclude that smoking causes lung cancer. What Pearl and his associates discovered was that if an intermediate factor can be identified in the pathway between smoking and cancer, it’s then possible to establish a cause/effect relationship between the 2 with the help of a series of mathematical calculations and a few algebraic rewrite tools. As figure 2 demonstrates, tar deposits in the smokers’ lungs are that intermediate factor.

Figure 2:

For a better understanding of how causal inference works, Judea Pearl’s The Book of Why is worth a closer look. It provides a plain English explanation of causal inference. For a deeper dive, there’s Causal Inference in Statistics: A Primer.
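The intermediate-factor idea described around Figure 2 is known as the front-door adjustment, and it can be checked numerically. The sketch below builds a toy world with invented probabilities in which a hidden gene G influences both smoking S and lung cancer LC, while smoking affects cancer only through tar T. It then estimates P(LC | do(S)) using only the observable variables S, T, and LC, and compares the result with the answer computed with access to G. The numbers and function names are assumptions made for illustration, not estimates from any study.

```python
from itertools import product

# Hypothetical structural model: G -> S, S -> T, T -> LC, G -> LC.
P_G = {0: 0.7, 1: 0.3}                       # P(G=g)
P_S1_GIVEN_G = {0: 0.2, 1: 0.8}              # P(S=1 | G=g)
P_T1_GIVEN_S = {0: 0.05, 1: 0.9}             # P(T=1 | S=s)
P_LC1_GIVEN_TG = {(0, 0): 0.01, (0, 1): 0.05,
                  (1, 0): 0.10, (1, 1): 0.30}  # P(LC=1 | T=t, G=g)

def bern(p_one, value):
    """Probability of `value` for a binary variable with P(1) = p_one."""
    return p_one if value == 1 else 1 - p_one

# Observational joint over (S, T, LC) only; the hidden G is summed out.
observed = {}
for g, s, t, lc in product((0, 1), repeat=4):
    p = (P_G[g] * bern(P_S1_GIVEN_G[g], s) * bern(P_T1_GIVEN_S[s], t)
         * bern(P_LC1_GIVEN_TG[(t, g)], lc))
    observed[(s, t, lc)] = observed.get((s, t, lc), 0.0) + p

def prob(s=None, t=None, lc=None):
    """Marginal probability of the specified values in the observed data."""
    return sum(p for (s_, t_, lc_), p in observed.items()
               if (s is None or s_ == s) and (t is None or t_ == t)
               and (lc is None or lc_ == lc))

def front_door(s):
    """P(LC=1 | do(S=s)) estimated from observational data only."""
    total = 0.0
    for t in (0, 1):
        p_t_given_s = prob(s=s, t=t) / prob(s=s)
        inner = sum(prob(s=s2, t=t, lc=1) / prob(s=s2, t=t) * prob(s=s2)
                    for s2 in (0, 1))
        total += p_t_given_s * inner
    return total

def ground_truth(s):
    """P(LC=1 | do(S=s)) computed with access to the hidden G."""
    return sum(P_G[g] * bern(P_T1_GIVEN_S[s], t) * P_LC1_GIVEN_TG[(t, g)]
               for g in (0, 1) for t in (0, 1))

print(front_door(1), ground_truth(1))
print(front_door(0), ground_truth(0))
```

In this toy model the front-door estimate matches the ground truth even though G never appears in the observed data, which is the point of identifying an intermediate factor such as tar.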

Had causal inference existed in the 1950s and 1960s, the argument by tobacco industry lobbyists would have been refuted, which in turn might have saved many millions of lives. The same approach holds tremendous potential as we begin to apply it to predictive algorithms and other machine-learning based digital tools. 

Monday, July 19, 2021

Taking Down the Fences that Divide Us

Innovation in healthcare requires new ways to think about interdisciplinary solutions.

Paul Cerrato, senior research analyst and communications specialist, Mayo Clinic Platform and John Halamka, M.D., president, Mayo Clinic Platform, wrote this article.

During the 10 years we have worked together, John and I have written often about the power of words like transformation, optimism, cynicism, and misdiagnosis. Another word that needs more attention is “interdisciplinary.” It’s been uttered so many times in science, medicine, and technology that it’s lost much of its impact.  We all give lip service to the idea, but many aren’t willing or able to do the hard work required to make it a reality, and one that fosters innovation and better patient care.

Examples of the dangers of focusing too narrowly on one discipline are all around us. The disconnect between technology and medicine becomes obvious when you take a closer look at the invention of blue light emitting diodes (LEDs), for instance, for which Isamu Akasaki, Hiroshi Amano, and Shuji Nakamura won the Nobel Prize in Physics in 2014. While this technology reinvented the way we light our homes, providing a practical source of bright, energy-saving light, the researchers failed to take into account the health effects of their invention. Had they been encouraged to embrace an interdisciplinary mindset, they might have considered the neurological consequences of being exposed to too much blue light. Certain photoreceptive retinal cells detect blue light, which is plentiful in sunlight. As it turns out, the brain interprets LEDs much like it interprets sunlight, in effect telling us it’s time to wake up, making it difficult to get to sleep.

Problems like this only serve to emphasize what materials scientist Ainissa Ramirez, PhD  discusses in a recent essay: “The culture of research … does not incentivize looking beyond one’s own discipline … Academic silos barricade us from thinking broadly and holistically. In materials science, students are often taught that the key criteria for materials selection are limited to cost, availability, and the ease of manufacturing. The ethical dimension of a materials innovation is generally set aside as an elective class in accredited engineering schools. But thinking about the impacts of one’s work should be neither optional nor an afterthought.”

This is the same problem we face in digital health. Too many data scientists and venture capitalists have invested time and resources into developing impressive algorithms capable of screening for disease and improving its treatment. But some have failed to take a closer look at the data sets upon which these digital tools are built, data sets that misrepresent the populations they are trying to serve. The result has been an ethical dilemma that needs our immediate attention.

Consider the evidence: A large commercially available risk prediction data set used to guide healthcare decisions has been analyzed to find out how equitable it is. The data set was designed to determine which patients require more than the usual attention because of their complex needs. Ziad Obermeyer from the School of Public Health at the University of California, Berkeley, and his colleagues looked at over 43,000 White and about 6,000 Black primary care patients in the data set and discovered that when Blacks were assigned to the same level of risk as Whites by the algorithm based on the data set, they were actually sicker than their White counterparts. How did this racial bias creep into the algorithm? Obermeyer et al explain: “Bias occurs because the algorithm uses health costs as a proxy for health needs. Less money is spent on Black patients who have the same level of need, and the algorithm thus falsely concludes that Black patients are healthier than equally sick White patients.”
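The kind of audit described here boils down to a simple question: at the same algorithm-assigned risk level, do different groups carry the same illness burden? The sketch below shows the shape of that comparison. The records are made up, the use of a chronic-condition count as the measure of health need is an assumption for this example, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Made-up records: (group, algorithm risk decile, number of active chronic conditions).
records = [
    ("White", 9, 3), ("White", 9, 4), ("White", 5, 1), ("White", 5, 2),
    ("Black", 9, 5), ("Black", 9, 6), ("Black", 5, 2), ("Black", 5, 3),
]

def mean_need_by_risk(rows):
    """Average illness burden for each (risk decile, group) pair."""
    buckets = defaultdict(list)
    for group, decile, conditions in rows:
        buckets[(decile, group)].append(conditions)
    return {key: mean(values) for key, values in buckets.items()}

summary = mean_need_by_risk(records)

# A positive gap means one group is sicker than the other at the same risk score.
gap_at_decile_9 = summary[(9, "Black")] - summary[(9, "White")]
print(gap_at_decile_9)
```

A gap near zero at every risk level would suggest the score tracks health need similarly across groups; a consistent gap points to a proxy problem like the one Obermeyer et al describe.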

Similarly, evidence from an Argentinian study that analyzed data from deep neural networks used on publicly available X-ray image datasets intended to help diagnose thoracic diseases revealed inequities. When the investigators compared gender-imbalanced datasets to datasets in which males and females were equally represented, they found that, “with a 25%/75% imbalance ratio, the average performance across all diseases in the minority class is significantly lower than a model trained with a perfectly balanced dataset.” Their analysis concluded that datasets that underrepresent one gender result in biased classifiers, which in turn may lead to misclassification of pathology in the minority group.

These disparities not only re-emphasize the need for technologists, clinicians, and ethicists to work together, they beg the question: How can we fix the problem now? Working from the assumption that any problem this complex needs to be precisely measured before it can be rectified, Mayo Clinic, Duke School of Medicine, and Optum/Change Healthcare are currently analyzing a massive data set with more than 35 billion healthcare events and about 16 billion encounters that are linked to data sets that include social determinants of health. That will enable us to stratify the data by race/ethnicity, income, geolocation, education, and the like. Creating a platform that systematically evaluates commercially available algorithms for fairness and accuracy is another tactic worth considering. Such a platform would create “food label” style data cards that include the essential features of each digital tool, including its input data sources and types, validation protocols, population composition, and performance metrics. There are also several analytical tools specifically designed to detect algorithmic bias, including Google’s TCAV, Audit-AI, and IBM’s AI Fairness 360.
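A “food label” data card could be represented as a small structured record. The sketch below uses the features listed above as its fields; the class name, the completeness check, and the example values are hypothetical and are not taken from any existing platform.

```python
from dataclasses import dataclass, asdict

@dataclass
class AlgorithmDataCard:
    """Hypothetical 'food label' summary for one digital tool."""
    name: str
    input_data_sources: list
    input_data_types: list
    validation_protocol: str
    population_composition: dict
    performance_metrics: dict

    def missing_fields(self):
        """Names of fields left empty, so incomplete cards can be flagged."""
        return [key for key, value in asdict(self).items() if not value]

# Invented example values; the validation protocol is deliberately left blank.
card = AlgorithmDataCard(
    name="example-risk-model",
    input_data_sources=["EHR"],
    input_data_types=["labs", "vitals"],
    validation_protocol="",
    population_composition={"female": 0.52, "male": 0.48},
    performance_metrics={"auroc": 0.80},
)

print(card.missing_fields())
```

Flagging empty fields is one way an evaluation platform could refuse to publish a card until the vendor documents how the tool was validated.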

The fences that divide healthcare can be torn down. It just takes determination and enough craziness to believe it can be done — and lots of hard work.

Monday, July 12, 2021

Identifying the Best De-Identification Protocols

Keeping patient data private remains one of the biggest challenges in healthcare. A recently developed algorithm from nference is helping address the problem.

John Halamka, M.D., president, Mayo Clinic Platform, and Paul Cerrato, senior research analyst and communications specialist, Mayo Clinic Platform, wrote this article.

In the United States, healthcare organizations that manage or store protected health information (PHI) are required by law to keep that data secure and private. Ignoring that law, as spelled out in the HIPAA regulations, has cost several providers and insurers millions of dollars in fines, and serious damage to their reputations. HIPAA offers 2 acceptable ways to keep PHI safe: Certification by a recognized expert and the Safe Harbor approach, which requires organizations to hide 18 identifiers in patient records so that unauthorized users cannot identify patients. At Mayo Clinic, however, we believe we must do more.

In partnership with the data analytics firm nference, we have developed a de-identification approach that takes patient privacy to the next level, using a protocol on EHR clinical notes that includes attention-based deep learning models, rule-based methods, and heuristics. Murugadoss et al explain that “rule-based systems use pattern matching rules, regular expressions, and dictionary and public database look-ups to identify PII [personally identifiable information] elements.” The problem with relying solely on such rules is they miss things, especially in an EHR’s narrative notes, which often use non-standard expressions, including unusual spellings, typographic errors and the like. Such rules also consume a great deal of time to manually create. Similarly, traditional machine learning based systems, which may rely on support vector machines or conditional random fields, have their shortcomings and tend not to remain reliable across data sets.

The ensemble approach used at Mayo includes a next generation algorithm that incorporates natural language processing and machine learning. Upon detection of PHI, the system transforms detected identifiers into plausible, though fictional, surrogates to further obfuscate any leaked identifier. We evaluated the system with a publicly available dataset of 515 notes from the I2B2 2014 de-identification challenge and a dataset of 10,000 notes from Mayo Clinic. We compared our approach with other existing tools considered best-in-class. The results indicated a recall of 0.992 and 0.994 and a precision of 0.979 and 0.967 on the I2B2 and the Mayo Clinic data, respectively.
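For readers less familiar with these metrics: recall is the share of true identifiers the system caught, and precision is the share of flagged items that really were identifiers. In de-identification, every missed identifier is a potential leak, which is why recall tends to get the most attention. The sketch below shows the arithmetic; the counts are invented to produce ratios similar to those reported above and are not the study’s actual counts.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from raw detection counts."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall

# Illustrative counts only: 1,000 real identifiers, 8 missed, 21 false alarms.
precision, recall = precision_recall(
    true_positives=992, false_positives=21, false_negatives=8
)

print(round(precision, 3), round(recall, 3))  # 0.979 0.992
```

Read this way, a recall of 0.992 means roughly 8 of every 1,000 identifiers slip through, which is one reason additional layers of protection still matter.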

While this protocol has many advantages over older systems, it’s only one component of a more comprehensive system used at Mayo to keep patient data private and secure. Experience has shown us that de-identified PHI, once released to the public, can sometimes be re-identified if a bad actor decides to compare these records to other publicly available data sets. There may be obscure variants within the data that humans can interpret as PHI but algorithms will not. For example, a computer algorithm expects phone numbers to be in the form area code, prefix, suffix, e.g., (800) 555-1212. What if a phone number is manually recorded into a note as 80055 51212? A human might dial that number to re-identify the record. Further, we expect dates to be in the form mm/dd/yyyy. What if a date of birth is manually typed into a note as 2104Febr (meaning 02/04/2021)? An algorithm might miss that.
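The phone number and date examples can be reproduced with two simple pattern-matching rules. The sketch below is a deliberately naive illustration of how a rule written for the expected format misses an irregular one; the patterns and sample notes are invented and are not the rules used in the nference system.

```python
import re

# Naive rules that assume the "expected" formats.
PHONE = re.compile(r"\(\d{3}\)\s?\d{3}-\d{4}")   # e.g. (800) 555-1212
DATE = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")      # e.g. 02/04/2021

def find_identifiers(note):
    """Return every phone number or date the naive rules can see."""
    return PHONE.findall(note) + DATE.findall(note)

standard = "Call (800) 555-1212 re: DOB 02/04/2021."
irregular = "Call 80055 51212 re: DOB 2104Febr."

print(find_identifiers(standard))   # ['(800) 555-1212', '02/04/2021']
print(find_identifiers(irregular))  # []
```

Both notes carry the same information for a human reader, but the second passes through the rules untouched, which is the gap that the deep learning models and the data behind glass approach are meant to cover.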

With these risks in mind, Mayo Clinic is using a multi-layered defense referred to as data behind glass. The concept of data behind glass is that the de-identified data is stored in an encrypted container, always under control of Mayo Clinic Cloud. Authorized cloud sub-tenants can be granted access such that their tools can access the de-identified data for algorithm development, but no data can be taken out of the container. This prevents merging the data with other external data sources.

At Mayo Clinic, the patient always comes first, so we have committed to continuously adopt novel technologies that keep information private.

Tuesday, July 6, 2021

Learning from AI’s Failures

A detailed picture of AI’s mistakes is the canvas upon which we create better digital solutions.

John Halamka, M.D., president, Mayo Clinic Platform, and Paul Cerrato, senior research analyst and communications specialist, Mayo Clinic Platform, wrote this article.

We all tend to ignore clichés because we’ve heard them so often, but some clichés are worth repeating. “We learn more from failure than success” comes to mind. While it may be overused, it nonetheless conveys an important truth for anyone involved in digital health. Two types of failures are worth closer scrutiny: algorithms that claim to improve diagnosis or treatment but fall short for lack of evidence or fairness; and failure to convince clinicians in community practice that evidence-based algorithms are worth using.

As we mentioned in an earlier column, a growing number of thought leaders in medicine have criticized the rush to generate AI-based algorithms because many lack the solid scientific foundation required to justify their use in direct patient care. Among the criticisms being leveled at AI developers are concerns about algorithms derived from a dataset that is not validated with a second, external dataset, overreliance on retrospective analysis, lack of generalizability, and various types of bias. A critical look at the hundreds of healthcare-related digital tools that are now coming to market indicates the need for more scrutiny, and the creation of a set of standards to help clinicians and other decision makers separate useful tools from junk science. 

The digital health marketplace is crowded with attention-getting tools. Among 59 FDA-approved medical devices that incorporated some form of machine learning, 49 unique devices were designed to improve clinical decision support, most of which are intended to assist with diagnosis or triage. Some were designed to automatically detect diabetic retinopathy, analyze specific heart sounds, measure ejection fraction and left ventricular volume, and quantify lung nodules and liver lesions, to name just a few. Unfortunately, the evidential support for many recently approved medical devices varies widely.

Among the AI-based algorithms that have attracted attention is one designed to help clinicians predict the onset of sepsis. The Epic Sepsis Model (ESM) has been used on tens of thousands of inpatients to gauge their risk of developing this life-threatening complication. Part of the Epic EHR system, it is a penalized logistic regression model that the vendor has tested on over 400,000 patients in 3 health systems. Unfortunately, because ESM is a proprietary algorithm, there’s a paucity of information available on the software’s inner workings or its long-term performance. Investigators from the University of Michigan just conducted a detailed analysis of the tool among over 27,600 patients and found it wanting. Andrew Wong and his associates found an area under the receiver operating characteristic curve (AUROC) of only 0.63. Their report states: “The ESM identified 183 of 2552 patients with sepsis (7%) who did not receive timely administration of antibiotics, highlighting the low sensitivity of the ESM in comparison with contemporary clinical practice. The ESM also did not identify 1709 patients with sepsis (67%) despite generating alerts for an ESM score of 6 or higher for 6971 of all 38,455 hospitalized patients (18%), thus creating a large burden of alert fatigue.” They go on to discuss the far-reaching implications of their investigation: “The increase and growth in deployment of proprietary models has led to an underbelly of confidential, non–peer-reviewed model performance documents that may not accurately reflect real-world model performance. Owing to the ease of integration within the EHR and loose federal regulations, hundreds of US hospitals have begun using these algorithms.”
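The percentages in the quoted passage follow directly from the counts it gives, and working through them also yields the model’s overall sensitivity. The short calculation below uses only the numbers in the quotation; the variable names are ours, and the sensitivity figure is derived from those counts on the assumption that every sepsis patient the ESM did not miss counts as identified.

```python
# Counts taken from the quoted passage.
sepsis_patients = 2552
missed_by_esm = 1709
caught_without_timely_antibiotics = 183
alerts = 6971
hospitalized_patients = 38455

missed_share = missed_by_esm / sepsis_patients                         # about 0.67
sensitivity = (sepsis_patients - missed_by_esm) / sepsis_patients      # about 0.33
added_value_share = caught_without_timely_antibiotics / sepsis_patients  # about 0.07
alert_rate = alerts / hospitalized_patients                            # about 0.18

print(round(missed_share, 2), round(sensitivity, 2),
      round(added_value_share, 2), round(alert_rate, 2))
```

Put together, roughly one in three sepsis cases was flagged while alerts fired on nearly one in five hospitalized patients, which is the imbalance behind the alert fatigue concern.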

Reports like this only serve to amplify the reservations many clinicians have about trusting AI-based clinical decision support tools. Unfortunately, they tend to make clinicians not just skeptical but cynical about all AI-based tools, which is a missed opportunity to improve patient care. As we pointed out in a recent NEJM Catalyst review, there are several algorithms that are supported by prospective studies, including a growing number of randomized controlled trials.

So how do we get scientifically well-documented digital health tools into clinicians’ hands and convince them to use them? One approach is to develop an evaluation system that impartially reviews all the specs for each product, and generates model cards to provide end users a quick snapshot of their strengths and weaknesses. But that’s only the first step. By way of analogy, consider the success of online stores hosted by Walmart or Amazon. They’ve invested heavily in state of the art supply chains that ensure their products are available from warehouses as customers demand them. But without a delivery service that gets products into customers’ homes quickly and with a minimum of disruption, even the best products will sit on warehouse shelves. The delivery service has to seamlessly integrate into customers’ lives. The product has to show up on time, it has to be the right size garment, in a sturdy box, and so on. Similarly, the best diagnostic and predictive algorithms have to be delivered with careful forethought and insight, which requires design thinking, process improvement, workflow integration, and implementation science.

Ron Li and his colleagues at Stanford University describe this delivery service in detail, emphasizing the need to engage stakeholders from all related disciplines before even starting algorithm development to look for potential barriers to implementation. They also suggest the need for “empathy mapping” to look for potential power inequities among clinician groups who may be required to use these digital tools.  It is easy to forget that implementing any technological innovation must also take into account the social and cultural issues unique to the healthcare ecosystem, and to the individual facility where it is being implemented.

If we are to learn from AI’s failures, we need to evaluate its products and services more carefully and develop them within an interdisciplinary environment that respects all stakeholders.

Monday, June 28, 2021

A Paradigm Shift in Digital Health

Innovation is best scaled when pipelines are replaced with platforms.

John Halamka, M.D., president, Mayo Clinic Platform, and Paul Cerrato, senior research analyst and communications specialist, Mayo Clinic Platform, wrote this article.

The Digital Health Frontier includes cutting edge predictive analytics, machine learning enhanced algorithms and big data analytics. But for these innovations to have their full impact on patient care requires the right strategic and operational foundation. In the past, many technology-focused organizations have relied on a pipeline approach as the foundation to construct innovations and promote growth. But history suggests this approach is less sustainable than a platform approach. A recent article in Harvard Business Review (HBR) sums up the difference: “Platform businesses bring together producers and consumers in high-value exchanges. Their chief assets are information and interactions, which together are also the source of the value they create and their competitive advantage…. Pipeline businesses create value by controlling a linear series of activities — the classic value-chain model. Inputs at one end of the chain (say, materials from suppliers) undergo a series of steps that transform them into an output that’s worth more: the finished product.” While this explanation gets the point across, it’s rather abstract. To really appreciate the advantages of one approach over the other, we need an example or two.

Apple’s handset business follows the pipeline model, making sure there are adequate supplies available to build the device and then overseeing the various other steps to create a finished iPhone, as well as its distribution, sales, and servicing. But when Apple linked the phone to its App Store, the situation changed dramatically, turning the operation into a sustainable platform that connected app developers with iPhone owners. In their HBR article, Geoffrey Parker and Sangeet Paul Choudary explain: “The resource-based view of competition holds that firms gain advantage by controlling scarce and valuable — ideally, inimitable — assets. In a pipeline world, those include tangible assets such as mines and real estate and intangible assets like intellectual property. With platforms, the assets that are hard to copy are the community and the resources its members own and contribute, be they rooms or cars or ideas and information.” Apple’s success and the loss of market share by pipeline-oriented companies like Nokia can be explained by such differences.

Like Apple, John Deere has successfully employed a platform approach. They own not just the physical assets — e.g. tractors and combines — but a vast collection of intellectual property — including APIs and apps to help farmers manage what is now being called precision agriculture. The company links third party providers and producers to their farming customers and reaps the benefits. With all these technological tools in place, farmers now have the ability to more efficiently monitor their equipment with data on fuel consumption, location, machine hours, and engine RPMs; and they can improve crop management with weather prediction data, community pricing and the like. Some of John Deere’s more advanced combines incorporate a grain quality camera, grain loss sensor, a Gen4 display monitor, and remote access to an operations center from inside the cab. For every new connected tractor sold, more data flows into the John Deere Platform, enhancing the value of the platform to partners creating new apps and analytics.

Mayo Clinic Platform (MCP) is taking a similar approach. Instead of creating dozens of pipeline businesses or building an organization chart to support pipeline businesses, we are leveraging external collaborators, network effects, and data flowing back to the Platform, which increases its value for producers of products and consumers of services. Mayo Clinic and Commure, a General Catalyst portfolio health care technology company, have launched Lucem Health to connect data from remote medical devices with AI-enabled algorithms. External collaborators who have partnered with MCP include nference, Medically Home, Kaiser Permanente, and K Health. The strategic approach allows the Mayo Clinic Platform to offer products and services that fall into 4 broad categories of functionality: Gather, Discover, Validate, and Deliver. For example, in the Deliver “bucket” is the combined ECG/algorithm system that was recently validated and published in Nature Medicine. The digital tool was able to detect low ejection fraction, thereby improving the diagnosis of left ventricular systolic dysfunction. While the ECG/algorithm is improving direct patient care at Mayo Clinic, it can also be offered to external partners and embedded in their ECG waveform viewer, which in turn will improve the relationship between a community hospital and its patient population. Similarly, the clinical data analytics tools developed by MCP are being made available to outside partners like K Health, which provides symptom checking and access to virtual visits with a clinician. The data analytics function is helping K Health improve its services to its clientele.

As the HBR article emphasized, the chief assets of a platform are information and interactions, which together are also the source of the value they create and their competitive advantage. Such value and advantages are what will sustain healthcare innovators through the next several decades.

Tuesday, June 15, 2021

When AI Meets SDOH

Artificial intelligence can help identify and address the social determinants of health.

John Halamka, M.D., president, Mayo Clinic Platform, and Paul Cerrato, senior research analyst and communications specialist, Mayo Clinic Platform, wrote this article.

Machine learning is getting better at predicting things. There are now algorithms that improve the detection of diabetic retinopathy, predict the onset of sepsis, and help determine a critically ill patient’s risk of dying. But a piece of wisdom from Warren Buffett comes to mind: “Predicting rain doesn’t matter. Building arks does.” Even the most impressive algorithm is relatively useless if it doesn’t allow us to build better “arks” to address the medical disorder or complications that the digital tool identifies. And building the best healthcare interventions requires that clinicians do more than identify the right signs, symptoms, and biomarkers, whether they be high cholesterol levels, elevated A1c, or a lump in a woman’s breast. It requires that we understand what’s happening in patients’ everyday lives outside the clinic, the so-called social determinants of health (SDOH), and then use that data to inform treatment.

A great deal has been written recently about SDOH. Health professionals are slowly beginning to realize that we cannot “remove health and illness from the social contexts in which they are produced,” according to Simukai Chigudu, Oxford Department of International Development, University of Oxford.1 That raises the questions: What social issues are most likely to influence our patients’ clinical course, and what do we do about them? How can AI help alleviate the impact of these issues?

The Centers for Disease Control and Prevention (CDC) has numerous data sources to help incorporate SDOH into public health initiatives and medical practice. But as the agency points out, moving from data to action is the hard part. CDC has several programs designed to focus clinicians’ attention on key social issues, including socioeconomic status, educational level, and work history. One initiative, for instance, zeros in on the role of EHRs. Its purpose is to support the incorporation and use of structured work information in health IT systems. How might this SDOH element inform a physician’s differential diagnosis? Consider a patient with hypertension who doesn’t respond to a low-sodium diet or antihypertensive medication. Awareness of his 10-year history as a house painter might point the clinician in the direction of lead poisoning, a possible cause of hypertension. Similarly, a nurse practitioner may be at a loss to figure out why a patient with type 2 diabetes has recently seen a spike in her A1c levels. If an EHR system is linked to work history, then when the NP enters the reason for the clinic visit into the EHR field for chief complaint, this might trigger a pop-up box stating that the patient works the night shift and that shift work can affect diabetes control. The system would then provide recommendations on diabetes management among shift workers. The same CDC program is also working on a work information data model, as well as national standards for vocabulary, system interoperability, and instructions for health IT system developers.
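To make the night-shift scenario concrete, here is a minimal sketch of how such a pop-up rule might be expressed. It is an illustration only: the `Patient` record, the `shift_work_alert` function, and the keyword matching on the chief complaint are all hypothetical and do not reflect the CDC data model or any actual EHR vendor's interface.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    """Hypothetical patient record with one structured work-history field."""
    name: str
    diagnoses: set[str]
    works_night_shift: bool


def shift_work_alert(patient: Patient, chief_complaint: str) -> Optional[str]:
    """Return pop-up text when a diabetes-related visit involves a night-shift worker."""
    complaint = chief_complaint.lower()
    # Assumed trigger: the chief complaint mentions A1c or diabetes.
    diabetes_visit = "a1c" in complaint or "diabetes" in complaint
    if (
        diabetes_visit
        and "type 2 diabetes" in patient.diagnoses
        and patient.works_night_shift
    ):
        return (
            "Patient works the night shift. Shift work can affect diabetes "
            "control; review guidance on diabetes management for shift workers."
        )
    return None  # No alert for unrelated visits or day-shift workers.
```

A production system would presumably match on coded diagnoses and standardized occupation vocabularies rather than free-text keywords, which is one reason the national vocabulary standards mentioned above matter.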

At Mayo Clinic, we are also studying the impact of SDOH on health and disease. Young Juhn, MD, MPH, Director of the AI Program and Precision Population Science Lab of the Department of Pediatric & Adolescent Medicine at the Clinic, has studied the effects of socioeconomic status on health since 2006, when his research was supported by the NIH. With that support, he developed and validated a housing-based socioeconomic measure called the Housing Based Index of Socioeconomic Status, or HOUSES index, which is being used in epidemiologic research to help understand health disparities and differences in a variety of health outcomes in both adults and children. The index has enabled researchers to overcome the absence of socioeconomic measures in commonly used data sources (e.g., medical records or administrative data), conduct geospatial analysis in health disparities research, and apply a life course approach.

The HOUSES index is an objective way to measure the individual-level socioeconomic status of patients because it is based on real property data for individual (not aggregated) housing units and is derived from public records. It uses 4 data points: the number of bedrooms in a person’s residence, the number of bathrooms, the square footage of the unit, and the estimated building value of the unit. The index can help target patients who are most at risk of poor health outcomes and inadequate access to health care, demonstrating the real value of adding SDOH into the mix by addressing the limitations of existing SDOH measures. For example, Stevens et al have shown that patients with a higher HOUSES score (quartiles 2-4) had a 53% lower risk of kidney transplant graft failure (adjusted hazard ratio 0.47) when compared to those with the lowest score (quartile 1).2 Dr Juhn and his colleagues have found that HOUSES can predict 44 different health outcomes and behavioral risk factors in both adults and children.
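The sketch below shows one way a composite score could be built from the 4 housing data points and divided into quartiles, and how a hazard ratio of 0.47 translates into a 53% lower risk. The scoring rule used here (standardize each feature and sum the z-scores) is an assumption made for illustration; this article does not describe the exact HOUSES formula. The function names and dictionary keys are likewise hypothetical.

```python
from statistics import mean, pstdev

# The 4 property features named in the article (key names are hypothetical).
FEATURES = ("bedrooms", "bathrooms", "square_feet", "building_value")


def composite_scores(units):
    """Sum each housing unit's z-scores across the four features.

    Assumes every feature varies across units (nonzero standard deviation).
    """
    stats = {}
    for feature in FEATURES:
        values = [unit[feature] for unit in units]
        stats[feature] = (mean(values), pstdev(values))
    return [
        sum((unit[f] - stats[f][0]) / stats[f][1] for f in FEATURES)
        for unit in units
    ]


def quartile(score, all_scores):
    """Return 1 (lowest) through 4 (highest) based on rank within all_scores."""
    rank = sum(s <= score for s in all_scores)  # 1..n when score is in the list
    return min(4, (4 * (rank - 1)) // len(all_scores) + 1)


def percent_lower_risk(hazard_ratio):
    """Convert a hazard ratio below 1 into a percent reduction in risk."""
    return round((1 - hazard_ratio) * 100)
```

Under these assumptions, `percent_lower_risk(0.47)` yields 53, which matches the figure reported for quartiles 2-4 versus quartile 1.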

Of course, clinicians still have to be reasonable in their expectations. Even if an algorithm were outfitted with every conceivable SDOH, it still may not reduce disparities in healthcare. Patients and providers may choose to ignore the recommendations of the improved algorithm, for example, because they believe the recommended diagnostic test is too expensive or unjustified, because it is too difficult for patients to get to the testing facility, or because a patient’s lack of health literacy prevents them from seeing the value of the test.

Despite these shortcomings, SDOH-enhanced algorithms have the potential to improve patient care. While physicians and nurses have gained tremendous insights into health and disease by measuring countless clinical parameters during office visits, it is now clear that this is not enough. The clinical picture generated with these metrics is too often hazy and needs to be supplemented by a long list of social metrics that can influence a patient’s access to care and their long-term outcomes.


 1. Chigudu S. Book: An ironic guide to colonialism in global health. Lancet 2021; 397:1874-1875.

 2. Stevens M, Beebe TJ, Wi CI, et al. HOUSES index as an innovative socioeconomic measure predicts graft failure among kidney transplant recipients. Transplantation 2020; 104:2383-2392.