Tuesday, August 31, 2021

Breast Cancer Screening: We Can Do Better

The three risk assessment tools now in use fall far short. Using the latest deep learning techniques, investigators are developing more personalized ways to locate women at high risk.

John Halamka, M.D., president, Mayo Clinic Platform, and Paul Cerrato, senior research analyst and communications specialist, Mayo Clinic Platform, wrote this article.

The promise of personalized medicine will eventually allow clinicians to offer individual patients more precise advice on prevention, early detection and treatment. Of course, the operative word is eventually. A closer examination of the screening tools available to detect breast cancer demonstrates that we still have a way to go before we can fulfill that promise. But with the help of better technology, we are getting closer to that realization.

Disease screening is about risk assessment. Researchers collect data on thousands of patients who develop breast cancer, for instance, and discover that the age range, family history and menstruation history of those who develop the disease differ significantly from those of women who remain free of it. That in turn allows policy makers to create a screening protocol that suggests women of a certain age who have experienced early menarche or late menopause are more likely to develop the malignancy. That risk assessment is consistent with the fact that more reproductive years mean more exposure to the hormones that contribute to breast cancer. Similarly, there’s evidence to show that women with first-degree relatives with the cancer and those with a history of ovarian cancer or hormone replacement therapy (HRT) use are at greater risk.

Statistics like these are the basis for several breast cancer risk scoring systems, including the Gail score, the IBIS score, and the BCSC tool. The National Cancer Institute, which uses the Gail model, explains: “The Breast Cancer Risk Assessment Tool allows health professionals to estimate a woman's risk of developing invasive breast cancer over the next 5 years and up to age 90 (lifetime risk). The tool uses a woman’s personal medical and reproductive history and the history of breast cancer among her first-degree relatives (mother, sisters, daughters) to estimate absolute breast cancer risk—her chance or probability of developing invasive breast cancer in a defined age interval.” While the screening tool saves lives, it can also be misleading. If, for example, it finds that a woman has a 1% likelihood of developing breast cancer, what that really means is that a large population of women with those specific risk factors has a one in 100 risk of developing the disease. There is no way of knowing what the threat is for any one patient in that group. Similar problems exist for the International Breast Cancer Intervention Study (IBIS) score, based on the Tyrer-Cuzick model, and the Breast Cancer Surveillance Consortium (BCSC) Risk Calculator. These three assessment tools can give patients a false sense of security if the details are overlooked. BCSC, for instance, cannot be applied to women younger than 35 or older than 74, nor does it accurately measure risk for anyone who has previously had ductal carcinoma in situ (DCIS) or breast augmentation. Similarly, the NCI tool doesn’t accurately estimate risk in women with a BRCA1 or BRCA2 mutation, or in certain other subgroups.
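The applicability limits described above are easy to overlook in practice. As a minimal sketch, the BCSC exclusions named in this article can be written as a simple pre-check that runs before any score is calculated. The `Patient` fields and the `bcsc_exclusions` function are hypothetical names chosen for illustration. They are not part of the BCSC calculator itself, and the real tool may document further exclusions not covered here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Patient:
    # Hypothetical record fields, named for this sketch only.
    age: int
    prior_dcis: bool = False
    breast_augmentation: bool = False


def bcsc_exclusions(patient: Patient) -> List[str]:
    """Return the reasons (if any) the BCSC calculator should not be applied.

    Encodes only the limits mentioned in the article: age 35 to 74,
    no prior DCIS, no breast augmentation.
    """
    reasons = []
    if patient.age < 35 or patient.age > 74:
        reasons.append("age outside 35-74")
    if patient.prior_dcis:
        reasons.append("prior DCIS")
    if patient.breast_augmentation:
        reasons.append("breast augmentation")
    return reasons
```

An empty list means none of these exclusions apply. A non-empty list could be shown to the clinician in place of a score that might otherwise give false reassurance.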

In a conversation, Tufia Haddad, M.D., a Mayo Clinic medical oncologist with specialty interest in precision medicine in breast cancer and artificial intelligence, discussed the research she and her colleagues are doing to improve the risk assessment process and identify more high-risk women. Dr. Haddad pointed out that there are numerous obstacles that prevent women from obtaining the best possible risk assessment. Too many women do not have a primary care practitioner who might use a risk tool. And those who do have a PCP are more likely to have an evaluation based on the Breast Cancer Risk Assessment Tool (the Gail model). “We prefer the Tyrer-Cuzick model in part because it incorporates more personal information for each individual patient including a detailed family history, a woman’s breast density from her mammogram, as well as her history of atypia or other high risk benign breast disease,” says Dr. Haddad. Unfortunately, the Tyrer-Cuzick method requires many more data elements to assess breast cancer risk, which discourages busy clinicians from using it.

Another obstacle to using any of these risk assessment tools is the fact that they don’t readily fit into the average physician’s clinical workflow. Ideally these tools should seamlessly integrate into the EHR system. Even better would be the incorporation of AI-enhanced algorithms that automate the abstraction of the required data elements from the patient’s record into the assessment tool. For example, the algorithm would flag a family history of breast cancer, increased breast density as determined during a mammogram, as well as hormone replacement therapy and insert those risk factors into the Tyrer-Cuzick tool.

Even with this AI-enhanced approach, all of the available risk models fall short because they take a population-based approach, as we mentioned above. Dr. Haddad and her colleagues are looking to make the assessment process more individualized, as are others working in this specialty. Such a model could incorporate each patient’s previous mammography results, genetics and benign breast biopsy findings, and much more. Adam Yala and his colleagues at MIT recently developed a mammography-based deep learning model designed to take this more sophisticated approach. Called Mirai, it was trained on a large data set from Massachusetts General Hospital and from facilities in Sweden and Taiwan. The new model generated significantly better results for breast cancer risk prediction than the Tyrer-Cuzick (TC) model.

Breast cancer risk assessment continues to evolve. And with better utilization of existing assessment tools and the assistance of deep learning, we can look forward to better patient outcomes.

Monday, August 23, 2021

Can Social Determinants of Health Predict Your Patient’s Future?

The evidence is mixed but suggests that these overlooked variables have a profound impact on each patient’s journey. 

This article was written by Tim Suther, Nicole Hobbs, Jeff McGinn and Matt Turner of Change Healthcare, together with John Halamka, MD, MS, president of Mayo Clinic Platform, and Paul Cerrato, senior research analyst and communications specialist, Mayo Clinic Platform.

By one estimate, social determinants of health (SDoH) influence up to 80% of health outcomes. Although reports like this suggest that these social factors have a major impact, thought leaders continue to debate whether they can also enhance the accuracy of predictive models. Resolving that debate is far from simple because the answer depends on the type, source and quality of the data, and the design of the model under consideration.

In general, we derive SDoH from subjective and objective sources. Subjective data includes self-reported or clinician-collected data such as patient reported outcomes, Z codes from ICD-10-CM that report factors that influence health status and interactions with health service providers, and other unstructured EHR data. Objective data includes individual-level and community-level data from government, public and private (and consumer behavior) sources; it’s usually more structured and often derived from national-level datasets.

Unfortunately, the research on the value of SDoH in predictive models varies widely. Some studies report no appreciable differences when SDoH are injected into models, while others report significant enhancements to predictive power. Unsurprisingly, these varying study results depend in part on levels of reliance on traditional clinical models and, most importantly, on the types and sources of SDoH data employed in the studies.

For example, a group from Johns Hopkins Bloomberg School of Public Health demonstrated that SDoH predictive models can fail in part because of predictive model design and in part because EHR-level data is unstructured and collected inconsistently. They also demonstrated that dependence on data from EHR-derived population health databases for SDoH can be problematic because the data tends to be used as a proxy for individual-level social factors. The problem lies in the fact that these proxies are often based on assumptions, not evidence. Other research supports these findings and showcases the challenges of using SDoH data from sources that traditionally struggle with the comprehensive collection and standardization of these data types.

On a more positive note, several studies and healthcare articles have reported success by relying on objectively collected and/or highly structured and consistent data. For example, one study that used EHR-derived SDoH data sources found that the addition of structured data on median income, unemployment rate, and education from trustworthy non-EHR sources enhanced their model’s health prediction granularity for some of the most vulnerable subgroups of patients. In another study, a collaboration among Stanford, Harvard, and Imperial College London found that adding structured SDoH data from the US Census, along with using machine learning techniques, improved risk prediction model accuracy for hospitalization, death, and costs. They also showed that their models based on SDoH alone, as well as those based on clinical comorbidities alone, could predict health outcomes and costs. Similarly, researchers at The Ohio State University College of Medicine added community-level and consumer behavior data not available in standard EHR data and found it enhanced the study of and impact on obesity prevention. Juhn et al. at Mayo Clinic tapped telephone survey data and appended housing and neighborhood characteristic data from local government sources to create a socioeconomic status index (HOUSES). They first showed that HOUSES correlated well with outcome measures and later showed that HOUSES could even serve as a predictive tool for graft failure in patients.

Patient Level SDoH + Clinical Data = Predictive Power

Incorporating social factors into the healthcare equation can fill information gaps at the point of care and generate better healthcare predictions, but only when these determinants are patient level and linked to robust clinical data. Change Healthcare, for example, has curated such an integrated national-level dataset, linking billions of historical de-identified distinct medical claims with patient-level social, physical and behavioral determinants of health. One of this dataset’s most important uses is to understand the relative weight of specific patient SDoH factors, in comparison to clinical factors alone, for various therapeutic conditions, including COVID-19. For example, across Change Healthcare’s research, economic stability is repeatedly ranked as the highest or among the highest predictors of the healthcare experience. Despite this realization, most end users, including providers and payers, lack such visibility (or rely on geographic averages that are unhelpful in making accurate predictive models).

Incorporating SDoH data into predictive models holds much promise. Given the relative newness of SDoH data in predictive analytics, along with a lack of data standardization and scale, it’s not surprising to find varying degrees of success in using it to improve predictive health models. But as researchers learn more about the best types and sources of SDoH data to use, along with developing better-suited models for these types of data, we’re likely to see significant advances in healthcare predictive models. By combining the right data with the right models, SDoH are a powerful asset in predictive models of health, outcomes, and potential health disparities.

If you're still with us . . .

Please consider supporting Dr. Steve Parodi, Reed Abelson and me by "voting up" our panel at the upcoming South by Southwest conference in March of 2022. Our proposed panel, "Extending the Stethoscope Into the Home," will dive into a discussion about acute health care for patients in their home and the infrastructures needed to support it. If you are inclined to vote, please do so here.

Tuesday, August 17, 2021

We Need to Open Up the AI Black Box

To convince physicians and nurses that deep learning algorithms are worth using in everyday practice, developers need to explain how they work in plain clinical English.

Paul Cerrato, senior research analyst and communications specialist, Mayo Clinic Platform, and John Halamka, M.D., president, Mayo Clinic Platform, wrote this article.

AI’s so-called black box refers to the fact that much of the underlying technology behind machine learning-enhanced algorithms is probability and statistics without a human-readable explanation. Oftentimes that’s the case because the advanced math or the data science behind the algorithms is too complex for the average user to understand without additional training. Several stakeholders in digital health maintain, however, that this lack of understanding isn’t that important. They argue that as long as an algorithm generates actionable insights, most clinicians don’t really care about what’s “under the hood.” Is that reasoning sound?

Some thought leaders point to the fact that there are many advanced, computer-enhanced diagnostic and therapeutic tools currently in use that physicians don’t fully understand, but nonetheless accept. The CHA2DS2-VASc score, for instance, is used to estimate the likelihood of a patient with non-valvular atrial fibrillation having a stroke. Few clinicians are familiar with the original research or detailed reasoning upon which the calculator is based, but they nonetheless use the tool. Similarly, many physicians use the FRAX score to estimate a patient’s 10-year risk of developing a bone fracture, despite the fact that they have not investigated the underlying math.
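To see how little "black box" there is in a tool of this kind, here is a minimal sketch of the CHA2DS2-VASc point tally, using the commonly published weights: one point each for congestive heart failure, hypertension, diabetes, vascular disease, age 65 to 74 and female sex, and two points each for age 75 or older and for prior stroke, TIA or thromboembolism. The function and parameter names are our own, chosen for illustration. The sketch only adds up points and does not convert the total into a stroke-risk estimate.

```python
def cha2ds2_vasc(age, female, chf, hypertension, diabetes,
                 stroke_or_tia, vascular_disease):
    """Tally CHA2DS2-VASc points (0 to 9) from the commonly published weights."""
    score = 0
    score += 1 if chf else 0               # C: congestive heart failure
    score += 1 if hypertension else 0      # H: hypertension
    if age >= 75:                          # A2: age 75 or older
        score += 2
    elif age >= 65:                        # A: age 65 to 74
        score += 1
    score += 1 if diabetes else 0          # D: diabetes mellitus
    score += 2 if stroke_or_tia else 0     # S2: prior stroke, TIA or thromboembolism
    score += 1 if vascular_disease else 0  # VA: vascular disease
    score += 1 if female else 0            # Sc: sex category (female)
    return score
```

Every input and every weight is visible, which is part of why a clinician can accept the result without reading the original research. A deep learning model offers no such transparent tally.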

It’s important to point out, however, that the stroke risk tool and the FRAX tool both have major endorsements from organizations that physicians respect. The American Heart Association and the American College of Cardiology both recommend the CHA2DS2-VASc score while the National Osteoporosis Foundation supports the use of the FRAX score. That gives physicians confidence in these tools even if they don’t grasp the underlying details. To date, there are no major professional associations recommending specific AI-enabled algorithms to supplement the diagnosis or treatment of disease. The American Diabetes Association did include a passing mention of an AI-based screening tool in its 2020 Standards of Medical Care in Diabetes, stating: “Artificial intelligence systems that detect more than mild diabetic retinopathy and diabetic macular edema authorized for use by the FDA represent an alternative to traditional screening approaches. However, the benefits and optimal utilization of this type of screening have yet to be fully determined.” That can hardly be considered a recommendation.

Given this scenario, most physicians have reason to be skeptical, and surveys bear out that skepticism. A survey of 91 primary care physicians found that understandability of AI is one of the important attributes they want before trusting its recommendations during breast cancer screening. Similarly, a survey of senior specialists in the UK found that understandability was one of their primary concerns about AI. Among New Zealand physicians, 88% were more likely to trust an AI algorithm that produced an understandable explanation of its decisions.

Of course, it may not be possible to fully explain the advanced mathematics used to create machine learning-based algorithms. But there are other ways to describe the logic behind these tools that would satisfy clinicians. As we have mentioned in previous publications and oral presentations, there are tutorials available to simplify machine learning-related systems like neural networks, random forest modeling, clustering, and gradient boosting. Our most recent book contains an entire chapter on this digital toolbox. Similarly, JAMA has created clinician-friendly video tutorials designed to graphically illustrate how deep learning is used in medical image analysis and how such algorithms can be used to help detect lymph node metastases in breast cancer patients.

These resources require clinicians to take the initiative and learn a few basic AI concepts, but developers and vendors also have an obligation to make their products more transparent. One way to accomplish that goal is through saliency maps and generative adversarial networks. Using such techniques, it’s possible to highlight the specific pixel grouping that a neural network has identified as a trouble spot, which the clinician can then view on a radiograph, for example. Alex DeGrave, with the University of Washington, and his colleagues used this approach to help explain why an algorithm designed to detect COVID-19-related changes in chest X-rays made its recommendations. Amirata Ghorbani and associates from Stanford University have taken a similar approach to help clinicians comprehend the echocardiography recommendations coming from a deep learning system. The researchers trained a convolutional neural network (CNN) on over 2.6 million echocardiogram images from more than 2,800 patients and demonstrated it was capable of identifying enlarged left atria, left ventricular hypertrophy, and several other abnormalities. To open up the black box, Ghorbani et al. presented readers with “biologically plausible regions of interest” in the echocardiograms they analyzed so they could see for themselves the reason for the interpretation that the model had arrived at. For instance, if the CNN said it had identified a structure such as a pacemaker lead, it highlighted the pixels it had identified as the lead. Similar clinician-friendly images are presented for a severely dilated left atrium and for left ventricular hypertrophy.
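For readers curious about how a saliency map can be produced, the sketch below uses occlusion sensitivity, one simple technique among several. It is not necessarily the method used by DeGrave or Ghorbani and their colleagues. The idea is to blank out one pixel at a time and record how much the model's output drops. Pixels that cause a large drop are the ones the model relies on. The `toy_model` function is a hypothetical stand-in for a trained network, and the 4 x 4 "image" is made-up illustrative data.

```python
def toy_model(image):
    """Hypothetical stand-in for a trained network.

    It simply sums the intensities in the central 2x2 region, so only
    those four pixels can influence its output.
    """
    return sum(image[r][c] for r in (1, 2) for c in (1, 2))


def occlusion_saliency(model, image, baseline=0):
    """Return a map of how much the model output drops when each pixel is blanked."""
    base_score = model(image)
    saliency = []
    for r, row in enumerate(image):
        saliency_row = []
        for c in range(len(row)):
            occluded = [list(x) for x in image]  # copy so the input is untouched
            occluded[r][c] = baseline            # blank out a single pixel
            saliency_row.append(base_score - model(occluded))
        saliency.append(saliency_row)
    return saliency


# Made-up 4x4 grayscale image with a bright central region.
image = [
    [10, 10, 10, 10],
    [10, 200, 180, 10],
    [10, 170, 220, 10],
    [10, 10, 10, 10],
]
saliency = occlusion_saliency(toy_model, image)
```

In this toy case the non-zero saliency values all fall in the central 2x2 region, because those are the only pixels the stand-in model reads. Overlaying a map like this on a radiograph is what lets a clinician see where a real network is "looking."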

Deep learning systems are slowly ushering in a new way to manage diagnosis and treatment, but to bring skeptical clinicians on board, we need to pull the curtain back. In addition to providing evidence that these tools are equitable and clinically effective, developers need to give practitioners reasonable explanations that demonstrate the tools will do what they claim to do.