Thursday, October 21, 2021

Machine Learning Can Make Lab Testing More Precise

An analysis of over 2 billion lab test results suggests a deep learning model can help create personalized reference ranges, which in turn would enable clinicians to monitor health and disease better.

Paul Cerrato, MA, senior research analyst and communications specialist, Mayo Clinic Platform and John Halamka, M.D., president, Mayo Clinic Platform, wrote this article.

Almost every patient has blood drawn to measure a variety of metabolic markers. Typically, test results come back as a numeric or text value accompanied by a reference range that represents normal values. If total serum cholesterol level is below 200 mg/dl or serum thyroid hormone level is 4.5 to 12.0 mcg/dl, clinicians and patients assume all is well. But suppose Helen’s safe zone varies significantly from Mary’s safe zone. If that were the case, it would suggest a one-size-fits-all reference range misrepresents an individual’s health status. That position is supported by studies that found that the distributions of more than half of all lab test results, which rely on standard reference ranges, differ when personal characteristics are considered.1

With these concerns in mind, Israeli investigators from the Weizmann Institute of Science and Tel Aviv Sourasky Medical Center extracted data on 2.1 billion lab measurements from EHRs, taken from 2.8 million adults for 92 different lab tests. Their goal was to create “data-driven reference ranges that consider age, sex, ethnicity, disease status, and other relevant characteristics.”1 To accomplish that goal, they used machine learning and computational modeling to segment patients into different “bins” based on health status, medication intake, and chronic disease.2 That in turn left the team with about half a billion lab results from the initial 2.8 million people, which they used to model a set of reference lab values that more precisely reflected the ranges of healthy persons. Those ranges could then be used to predict patients’ “future lab abnormalities and subsequent disease.”
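To make the idea of a stratified reference range concrete, here is a minimal sketch in Python. It is not the investigators' method, which relied on machine learning models; it simply groups results from a presumed-healthy cohort by sex and age band and takes the central 95% of each group. The function names, the 10-year band width, and the minimum group size are all illustrative assumptions.

```python
from collections import defaultdict
from statistics import quantiles


def age_band(age, width=10):
    """Return the lower bound of the age band, e.g. 47 -> 40 (width is an assumption)."""
    return (age // width) * width


def stratified_reference_ranges(records, min_count=20):
    """Compute a central 95% interval per (sex, age band).

    records: iterable of (sex, age, value) tuples from a presumed-healthy cohort.
    Groups with fewer than min_count results are skipped, because percentile
    estimates from very small groups are unreliable.
    """
    groups = defaultdict(list)
    for sex, age, value in records:
        groups[(sex, age_band(age))].append(value)

    ranges = {}
    for key, values in groups.items():
        if len(values) < min_count:
            continue
        # n=40 yields cut points every 2.5%, so the first and last cut points
        # are the 2.5th and 97.5th percentiles.
        cuts = quantiles(values, n=40, method="inclusive")
        ranges[key] = (cuts[0], cuts[-1])
    return ranges
```

A result would then be judged against the interval for the patient's own group rather than a single population-wide range.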

Taking their investigation one step further, Cohen et al. used their new algorithms to evaluate the risk of specific disorders among healthy individuals. When they looked at anemia cutoffs for hemoglobin and mean corpuscular volume, a measurement of red blood cell size, their newly created risk calculators were able to separate anemic patients into groups at high risk for microcytic and macrocytic anemia from those with a risk no higher than the average nonanemic population. Similar benefits were observed when the researchers applied their models to prediabetes: “…using a personalized risk model, we can improve the classification of patients who are prediabetic and identify patients at risk 2 years earlier compared to classification based merely on current glucose levels.”

William Morice, M.D., Ph.D., chair of the Department of Laboratory Medicine and Pathology (DLMP) at Mayo Clinic and president of Mayo Clinic Laboratories, immediately saw the value of this type of data analysis: “In the ‘era of big data and analytics,’ it is almost unconscionable that we still use ‘normal reference ranges’ that lack contextual data, and possibly statistical power, to guide clinicians in the clinical interpretation of quantitative lab results. I was taught this by Dr. Piero Rinaldo, a medical geneticist in our department and a pioneer in this field, who focuses on its application to screening for inborn errors of metabolism. He has developed an elegant tool that is now used globally for this application, Collaborative Laboratory Integrated Reports (CLIR).”

During a recent conversation with Piero Rinaldo, M.D., Ph.D., he explained that Mayo Clinic has been using a more personalized approach to lab testing since 2015 and stated that “CLIR is a shovel-ready software for the creation of collaborative precision reference ranges.” The web-based application has been used to create several personalized data sets that can improve clinicians’ interpretation of lab test results. It has been deployed by Dr. Rinaldo and his associates to improve the screening of newborns for congenital hypothyroidism.3 The software performs multivariate pattern recognition on lab values collected from 7 programs, including more than 1.9 million lab test results. CLIR is able to integrate covariate-adjusted results of different tests into a set of customized interpretive tools that physicians can use to better distinguish between false positive and true positive test results.


1. Tang A, Oskotsky T, Sirota M. Personalizing routine lab tests with machine learning. Nature Medicine. 2021;27:1510-1517.

2. Cohen N, Schwartzman O, Jaschek R, et al. Personalized lab test models to quantify disease potentials in healthy individuals. Nature Medicine. 2021;27:1582-1591.

3. Rowe AD, Stoway SD, Ahlman H, et al. A Novel Approach to Improve Newborn Screening for Congenital Hypothyroidism by Integrating Covariate-Adjusted Results of Different Tests into CLIR Customized Interpretive Tools. Inter J Neonatal Screening. 2021;7:23.

Wednesday, October 13, 2021

Gastroenterology Embraces Artificial Intelligence

AI and machine learning have the potential to redefine the management of several GI disorders.

John Halamka, M.D., president, Mayo Clinic Platform, and Paul Cerrato, senior research analyst and communications specialist, Mayo Clinic Platform, wrote this article.

Colonoscopy is one of the true success stories in modern medicine. Studies have demonstrated that colonoscopy screening detects the cancer at a much earlier stage, reducing the risk of invasive tumors and metastatic disease, and reducing mortality. However, while colorectal cancer is highly preventable, it is the third leading cause of cancer-related deaths in the U.S. About 148,000 individuals develop the malignancy and over 53,000 die from it each year. We asked ourselves a question: can AI improve the detection of this and related gastrointestinal disorders?

As we explained in The Digital Reconstruction of Healthcare, one of the challenges in making an accurate diagnosis of GI disease is differentiating between disorders that look similar at the cellular level. For example, because environmental enteropathy and celiac disease overlap histopathologically, deep learning algorithms have been designed to analyze biopsy slides to detect the subtle differences between the two conditions. Syed et al.1 used a combination of convolutional and deconvolutional neural networks in a prospective analysis of over 3,000 biopsy images from 102 children. They were able to tell the differences between environmental enteropathy, celiac disease, and normal controls with an accuracy rating of 93.4%, and a false negative rate of 2.4%. Most of these mistakes occurred when comparing celiac patients to healthy controls.

The investigators also identified several biomarkers that may help separate the two GI disorders: interleukin 9, interleukin 6, interleukin 1b, and interferon-induced protein 10 were all helpful in making an accurate prediction regarding the correct diagnosis. The potential benefits to this deep learning approach become obvious when one considers the arduous process that patients have to endure to reach a definitive diagnosis of either disorder: typically, they must undergo 4 to 6 biopsies and may need several endoscopic procedures to sample various sections of the intestinal tract because the disorder may affect only specific areas along the lining and leave other areas intact.

Several randomized controlled trials have been conducted to support the use of ML in gastroenterology. Chinese investigators, working in conjunction with Beth Israel Deaconess Medical Center and Harvard Medical School, tested a convolutional neural network to determine if it was capable of improving the detection of precancerous colorectal polyps in real time.2 The need for a better system of detecting these growths is evident, given the fact that more than 1 in 4 adenomas are missed during colonoscopies. To address the problem, Wang et al. randomized more than 500 patients to routine colonoscopy and more than 500 to computer-assisted colonoscopy. In the final analysis, the adenoma detection rate (ADR) was higher in the ML-assisted group (29.1% vs. 20.3%, P < 0.001). The higher ADR occurred because the algorithm was capable of detecting a greater number of smaller adenomas (185 vs. 102). There were no significant differences in the detection of large polyps.
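The size of the ADR difference can be worked out directly from the two reported rates. The short Python sketch below uses only the 29.1% and 20.3% figures quoted above; the function name is hypothetical, and the "procedures per additional patient" figure is plain arithmetic on those rates (the reciprocal of the absolute difference), not a result reported by the trial.

```python
def compare_detection_rates(rate_assisted, rate_routine):
    """Summarize the gap between two adenoma detection rates.

    ADR is the share of patients in whom at least one adenoma is found.
    Returns (absolute difference, relative increase, procedures per
    additional patient with an adenoma detected).
    """
    absolute = rate_assisted - rate_routine
    relative = absolute / rate_routine
    procedures_per_extra_detection = 1 / absolute
    return absolute, relative, procedures_per_extra_detection


absolute, relative, per_extra = compare_detection_rates(0.291, 0.203)
print(f"Absolute difference: {absolute * 100:.1f} percentage points")
print(f"Relative increase:   {relative * 100:.0f}%")
print(f"Procedures per additional patient with an adenoma found: {per_extra:.1f}")
```

On these figures the assisted group's ADR is 8.8 percentage points higher, a relative increase of roughly 43%, or about one additional patient with an adenoma detected for every 11 procedures.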

Nayantara Coelho-Prabhu, M.D., a gastroenterologist at Mayo Clinic, points out, however, that the clinical relevance of detection of diminutive polyps remains to be determined. “Yet, there is definite clinical importance in the subsequent development of computer assisted diagnosis (CADx) or polyp characterization algorithms. These will help clinicians determine clinically relevant polyps, and possibly advance the resect and discard practice. It also will help clinicians adequately assess margins of polyps, so that complete removal can be achieved, thus decreasing future recurrences.”

Randomized clinical trials demonstrated that a convolutional neural network in combination with deep reinforcement learning (collectively called the WISENSE system) can reduce the number of blind spots during endoscopy intended to evaluate the esophagus, stomach, and duodenum in real time. “A total of 324 patients were recruited and randomized; 153 and 150 patients were analysed in the WISENSE and control group, respectively. Blind spot rate was lower in WISENSE group compared with the control (5.86% vs 22.46%, p<0.001) . . .”3

Mayo Clinic’s Endoscopy Center, utilizing Mayo Clinic Platform’s resources, has also been exploring the value of machine learning in GI care with the assistance of Endonet, a comprehensive library of endoscopic videos and images, linked to clinical data including symptoms, diagnoses, pathology, and radiology. These data will include unedited full-length videos as well as video summaries of the procedure including landmarks, specific abnormalities, and anatomical identifiers. Dr. Coelho-Prabhu explains that the idea is to have different user interfaces: 

“From the patient’s perspective, it will serve as an electronic video record of all their procedures, and future procedures can be tailored to survey prior abnormal areas as needed.

From a research perspective, this will be a diverse and rich library including large volumes of specialized populations such as Barrett’s esophagus, inflammatory bowel disease, familial polyposis syndromes. The additional strength is that Mayo Clinic provides highly specialized care, especially to these select populations. We can develop AI algorithms to advance medical care using this library. From a hospital system perspective, this would serve as a reference library, guiding endoscopists, including for advanced therapeutic procedures in the future. It also could be used to measure and monitor quality indicators in endoscopy. From an educational standpoint, this library can be developed into a teaching set for both trainee and advanced practitioners looking for CME opportunities. From industry perspective, this database could be used to train/validate commercial AI algorithms.”

AI and machine learning may not be the panacea some technology enthusiasts imagine them to be, but there’s little doubt they are becoming an important partner on the road to more personalized patient care.


1. Syed S, Al-Bone M, Khan MN, et al. Assessment of machine learning detection of environmental enteropathy and celiac disease in children. JAMA Network Open. 2019;2:e195822.

2. Wang P, Berzin TM, Brown JR, et al. Real-time automatic detection system increases colonoscopic polyp and adenoma detection rates: a prospective randomised controlled study. Gut. 2019;68:1813–1819.

3. Wu L, Zhang J, Zhou W, et al. Randomised controlled trial of WISENSE, a real-time quality improving system for monitoring blind spots during esophagogastroduodenoscopy. Gut. 2019;68:2161–2169.

Wednesday, October 6, 2021

Societal Resilience Requires a Public Health Focus

We must make a serious commitment to increase financial resources and provide better analytics for real world evidence/real time data in support of public health.

John Halamka, M.D., president, Mayo Clinic Platform, and Paul Cerrato, senior research analyst and communications specialist, Mayo Clinic Platform, wrote this article.

Public health has been underfunded for decades. That neglect has had a profound impact since the COVID-19 pandemic took hold, and the pandemic has awakened policy makers and thought leaders to the need for more investment.

Consider the statistics: The U.S. spends about $3.6 trillion each year on health but less than 3% of that amount on public health and prevention. A 2020 Forbes report likewise pointed out that “From the late 1960s to the 2010s, the federal share of total health expenditure for public health dropped from 45 percent to 15 percent.” This relative indifference to public health is partly responsible for the nation’s mixed response to the SARS-CoV-2 pandemic. A recent McKinsey & Company analysis concluded: “Government leaders remain focused on navigating the current crisis, but making smart investments now can both enhance the ongoing COVID-19 response and strengthen public-health systems to reduce the chance of future pandemics. Investments in public health and other public goods are sorely undervalued; investments in preventive measures, whose success is invisible, even more so.”

Among the other “public goods” that require more investment is population health management and analytics. Although experts continue to debate the differences between public health and population health, most of those differences are unimportant. For our purposes, population health refers to the status of a specific group of individuals, whether they reside in a specific city, state, or country. Public health usually casts a wider net, concerned about the status of the entire population. Managing the health of these subgroups requires an analytical approach that can take into account a long list of variables, including social determinants of health (SDoH), the content of their medical records, and much more. SDoH data from Change Healthcare, for instance, has demonstrated that the economic stability index (ESI) is a strong predictor of health care utilization. ESI is a cluster model that uses market behavior and financial attitudes to group individuals into one of 30 categories, with category 1 representing persons most likely to be economically stable and category 30 least likely to be stable. The figure, which links race, ESI and health care utilization in Kentucky, suggests that Blacks/African Americans are far less likely to be economically stable (category 1). The same analysis found that Blacks/African Americans were almost twice as likely to use the ED compared to Whites (30.5% vs 18.1%). A growing number of health care organizations are starting to see the value of such population health metrics and are incorporating these statistics into their decision making.

Among the valuable sources of data that can inform population health are patient surveys, clinical registries, and EHRs. Several traditional analytics tools are available to extract actionable insights from these data sources, including logistic regression. Over the decades, several major studies have also generated risk scoring systems to improve public health. The Framingham heart health risk score has been used for many years to assess the likelihood of developing cardiovascular disease over a 10-year period. Because the scoring system can help predict the onset of heart disease, it can also serve as a useful tool in creating population-based preventive programs to reduce that risk. The tool requires patients to provide their age, gender, smoking status, total cholesterol, HDL cholesterol, systolic blood pressure, and whether they are taking antihypertensive medication. The American Diabetes Association has developed its own risk scoring method to assess the likelihood of type 2 diabetes in the population. The tool takes into account age, gender, history of gestational diabetes, physical activity level, family history of diabetes, hypertension, height and weight. Another analytics methodology that has value in population health is the LACE Index. The acronym stands for length of stay, acuity of admission, Charlson comorbidity index (CCI), and number of emergency department visits in the preceding 6 months. More recently, several AI-based analytic tools have come into use to improve population health. A review of ML-related analytic methods found that neural network-based algorithms are the most commonly used (41%) in this context, compared to 25.5% for support vector machines, and 21% for random forest modeling.
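As an illustration of how a simple additive scoring tool like the LACE Index works, here is a minimal Python sketch. The article does not give the point values; the ones below follow the commonly published LACE scoring table and are an assumption here, so they should be checked against the original publication before any real use. The function name and parameters are hypothetical.

```python
def lace_score(length_of_stay_days, acute_admission, charlson_index, ed_visits_6mo):
    """Return a LACE score from 0 to 19 (point values are assumed, see lead-in).

    length_of_stay_days: whole days in hospital for the index admission
    acute_admission:     True if the admission was acute or emergent
    charlson_index:      Charlson comorbidity index for the patient
    ed_visits_6mo:       emergency department visits in the preceding 6 months
    """
    # L: length of stay
    if length_of_stay_days < 1:
        l_points = 0
    elif length_of_stay_days <= 3:
        l_points = length_of_stay_days  # 1, 2, or 3 days score 1, 2, or 3
    elif length_of_stay_days <= 6:
        l_points = 4
    elif length_of_stay_days <= 13:
        l_points = 5
    else:
        l_points = 7

    # A: acuity of admission
    a_points = 3 if acute_admission else 0

    # C: Charlson comorbidity index, with 4 or more scoring 5 points
    c_points = charlson_index if charlson_index <= 3 else 5

    # E: emergency department visits, capped at 4 points
    e_points = min(ed_visits_6mo, 4)

    return l_points + a_points + c_points + e_points


# Example: 5-day acute admission, Charlson index 2, one prior ED visit.
print(lace_score(5, True, 2, 1))
```

Because each component is a small lookup, a score like this can be computed across an entire discharge population from routinely collected data, which is what makes it practical for population-level work.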

There is no way of knowing how the world would have coped with COVID-19 had policy makers fully invested in public and population health programs and analytics. But there’s little doubt that we’ll all fare much better during the next health crisis if we put more time, energy, and resources into these initiatives.