A growing body of research suggests it’s time to abandon outdated ideas about how to identify effective medical therapies.
Paul Cerrato, senior research analyst and communications specialist, Mayo Clinic Platform, and John Halamka, M.D., president, Mayo Clinic Platform, wrote this article.
“Correlation is not causation.” It’s a truism that researchers take for granted, and for good reason. The fact that event A is followed by event B doesn’t mean that A caused B. An observational study of 1,000 adults, for example, that found those taking high doses of vitamin C were less likely to develop lung cancer doesn’t prove the nutrient protects against the cancer; it’s always possible that a third factor — a confounding variable — was responsible for both A and B. In other words, patients taking lots of vitamin C may be less likely to get lung cancer because they are more health conscious than the average person, and therefore more likely to avoid smoking, which in turn reduces their risk of the cancer.
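The vitamin C example can be made concrete with a small simulation. This is a minimal Python sketch under invented assumptions: health consciousness is the hypothetical confounder that makes people more likely to take vitamin C and less likely to smoke, only smoking raises lung cancer risk, and vitamin C has no effect at all. All probabilities and function names are illustrative. The crude comparison still makes vitamin C look protective, but comparing people within the same smoking stratum makes the apparent benefit disappear.

```python
import random

def simulate_population(n, seed=0):
    """Simulate n people under an invented model (all probabilities hypothetical).

    Health consciousness, the confounder, makes vitamin C use more likely and
    smoking less likely. Only smoking changes lung cancer risk; vitamin C has
    no effect at all. Each person is a (takes_vitamin_c, smokes, cancer) tuple.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        health_conscious = rng.random() < 0.5
        takes_vitamin_c = rng.random() < (0.7 if health_conscious else 0.2)
        smokes = rng.random() < (0.1 if health_conscious else 0.5)
        cancer = rng.random() < (0.15 if smokes else 0.01)
        people.append((takes_vitamin_c, smokes, cancer))
    return people

def cancer_rate(people, vitamin_c, smokes=None):
    """Cancer rate among people with the given vitamin C status,
    optionally restricted to one smoking stratum."""
    group = [p for p in people
             if p[0] == vitamin_c and (smokes is None or p[1] == smokes)]
    return sum(p[2] for p in group) / len(group)

if __name__ == "__main__":
    people = simulate_population(100_000)
    # Crude comparison: vitamin C users appear to be protected.
    print("crude:", cancer_rate(people, True), cancer_rate(people, False))
    # Within each smoking stratum the rates are close, because vitamin C does nothing here.
    for smokes in (True, False):
        label = "smokers:" if smokes else "non-smokers:"
        print(label, cancer_rate(people, True, smokes), cancer_rate(people, False, smokes))
```

Stratifying on smoking works in this toy model because smoking is the only route by which the confounder reaches cancer; in real data the confounders are rarely so tidy or fully measured.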
As this example illustrates, confounding variables are factors that can mislead us into imagining a cause-and-effect relationship where none exists. That is why interventional studies such as the randomized controlled trial (RCT) remain a more reliable way to establish causation than observational studies. But it’s important to point out that many treatment protocols in everyday clinical use are not supported by RCTs. Similarly, many risk factors are associated with various diseases, but it’s often difficult to know for certain whether these risk factors actually contribute to causing those diseases.
While RCTs remain the gold standard in medicine, they can be impractical for several reasons: they are often very expensive to perform; an RCT that deliberately exposes patients to a potentially harmful risk factor and compares them with unexposed patients would be unethical; and most trials impose inclusion and exclusion criteria that don’t reflect the everyday practice of medicine. For instance, they usually exclude patients with coexisting conditions, which can limit how well the results apply to the patients clinicians actually treat.
One way to address this problem is by accepting less-than-perfect evidence and using a reliability scale or continuum to determine which treatments are worth using and which are not. That scale might look something like this, with evidential support growing stronger from left to right along the continuum:
In the absence of RCTs, it’s reasonable to consider observational studies such as case-control and cohort studies to justify using a specific therapy. And while such studies may still mislead because some confounding variables have been overlooked, several epidemiological criteria strengthen the weight given to these less-than-perfect studies:
- A stronger association or correlation between two variables is more suggestive of a cause/effect relationship than a weaker association.
- Temporality. The alleged effect must follow the suspected cause, not the other way around. It would make no sense to suggest that exposure to Mycobacterium tuberculosis causes TB if all the cases of the infection occurred before patients were exposed to the bacterium.
- A dose-response relationship exists between alleged cause and effect. For example, if researchers find that a blood lead level of 10 mcg/dL is associated with mild learning disabilities in children, 15 mcg/dL with moderate deficits, and 20 mcg/dL with severe deficits, this gradient strengthens the argument for causality.
- A biologically plausible mechanism of action linking cause and effect strengthens the argument. In the case of lead poisoning, there is evidence pointing to neurological damage brought on by oxidative stress and a variety of other biochemical mechanisms.
- Repeatability of the study findings: If the results of one group of investigators are duplicated by independent investigators, that lends further support to the cause/effect relationship.
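Two of these criteria reduce to simple arithmetic. The Python sketch below, with hypothetical function names and invented counts, computes a relative risk as one common measure of the strength of an association and checks whether outcome rates rise at every step up in dose. It is only an illustration; it leaves out the confidence intervals and formal trend tests an actual epidemiological analysis would use.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk among the exposed divided by risk among the unexposed.
    The further the ratio is from 1.0, the stronger the association."""
    exposed_risk = exposed_cases / exposed_total
    unexposed_risk = unexposed_cases / unexposed_total
    return exposed_risk / unexposed_risk

def has_dose_response(dose_rates):
    """Simple gradient check: True if the outcome rate rises at every step
    up in dose. `dose_rates` is a list of (dose, outcome_rate) pairs."""
    ordered = sorted(dose_rates)
    return all(later[1] > earlier[1] for earlier, later in zip(ordered, ordered[1:]))

if __name__ == "__main__":
    # Hypothetical counts: 20 of 100 exposed vs. 10 of 100 unexposed develop the disease.
    print(relative_risk(20, 100, 10, 100))
    # Hypothetical share of children with a learning deficit at each blood lead level (mcg/dL).
    print(has_dose_response([(10, 0.05), (15, 0.12), (20, 0.25)]))  # True
    print(has_dose_response([(10, 0.05), (15, 0.12), (20, 0.08)]))  # False
```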
While adherence to all these criteria suggests causality for observational studies, a statistical approach called causal inference can actually establish causality. The technique, which was spearheaded by Judea Pearl, Ph.D., winner of the 2011 Turing Award, is considered revolutionary by many thought leaders and will likely have profound implications for clinical medicine, and for the role of AI and machine learning. During the recent Mayo Clinic Artificial Intelligence Symposium, Adrian Keister, Ph.D., a senior data science analyst at Mayo Clinic, concluded that causal inference is “possibly the most important advance in the scientific method since the birth of modern statistics — maybe even more important than that.”
Conceptually, causal inference starts by converting word-based statements into mathematical statements with the help of a few new operators. While that may sound daunting to anyone not well versed in statistics, it’s not much different from the way we communicate using the language of arithmetic. A statement like “fifteen times five equals seventy-five” is converted to 15 × 5 = 75; here, × is an operator. If the new mathematical language of causal inference were used to represent an observational study that evaluated the association between a new drug and an increase in patients’ lifespan, it might look like this: P(L | D), where P is probability, L is lifespan, D is the drug, and | is an operator that means “conditioned on.”
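The expression P(L | D) is ordinary conditional probability: among the patients who took the drug, what fraction lived longer? A minimal Python sketch, assuming a handful of invented observational records and a hypothetical helper named `conditional_probability`, computes it with exact fractions. Nothing in this calculation accounts for why some patients took the drug and others did not, which is exactly the gap confounding exploits.

```python
from fractions import Fraction

def conditional_probability(records, outcome, condition):
    """P(outcome | condition): among records meeting `condition`,
    the fraction that also meet `outcome`. Purely observational."""
    matching = [r for r in records if condition(r)]
    hits = sum(1 for r in matching if outcome(r))
    return Fraction(hits, len(matching))

# Invented observational records: did the patient take drug D, and live longer (L)?
records = (
    [{"drug": True, "longer_life": True}] * 6
    + [{"drug": True, "longer_life": False}] * 2
    + [{"drug": False, "longer_life": True}] * 4
    + [{"drug": False, "longer_life": False}] * 8
)

if __name__ == "__main__":
    p_l_given_d = conditional_probability(records, lambda r: r["longer_life"], lambda r: r["drug"])
    p_l_given_not_d = conditional_probability(records, lambda r: r["longer_life"], lambda r: not r["drug"])
    print(p_l_given_d, p_l_given_not_d)  # 3/4 1/3
```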
An interventional trial such as an RCT, on the other hand, would be written as P(L | do(D)), in which the do-operator refers to the intervention, i.e., giving the drug in a controlled setting. In this notation, D is said to cause L if P(L | do(D)) > P(L). This formula is a way of saying that D (the drug being tested) causes L (longer life) if the probability of a longer life when the drug is administered is greater than the probability of a longer life without that intervention, P(L); in an RCT, the placebo group serves as that comparison.
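The difference between conditioning (“seeing”) and intervening (“doing”) can be checked numerically. This Python sketch assumes a hypothetical model in which an unmeasured confounder U (say, overall health) makes patients both more likely to take the drug and more likely to live longer, while the drug adds a smaller benefit of its own; every probability is invented. Intervening on D cuts the arrow from U to D, so P(L | do(D)) averages over the whole population’s distribution of U rather than over the healthier patients who happen to take the drug.

```python
from fractions import Fraction

# Invented structural causal model (all numbers hypothetical):
#   U (unmeasured confounder, e.g. overall health) -> D (takes the drug)
#   U -> L (longer life), and D -> L.
P_U = {0: Fraction(1, 2), 1: Fraction(1, 2)}               # P(U=u)
P_D1_GIVEN_U = {0: Fraction(1, 5), 1: Fraction(4, 5)}      # P(D=1 | U=u)
P_L1_GIVEN_D_U = {(0, 0): Fraction(3, 10), (1, 0): Fraction(2, 5),
                  (0, 1): Fraction(7, 10), (1, 1): Fraction(4, 5)}  # P(L=1 | D=d, U=u)

def p_d_given_u(d, u):
    """P(D=d | U=u)."""
    return P_D1_GIVEN_U[u] if d == 1 else 1 - P_D1_GIVEN_U[u]

def p_l():
    """P(L=1): probability of a longer life with no intervention."""
    return sum(P_U[u] * p_d_given_u(d, u) * P_L1_GIVEN_D_U[(d, u)]
               for u in (0, 1) for d in (0, 1))

def p_l_given_d(d):
    """P(L=1 | D=d), 'seeing': restrict attention to patients who took D=d."""
    joint = sum(P_U[u] * p_d_given_u(d, u) * P_L1_GIVEN_D_U[(d, u)] for u in (0, 1))
    marginal = sum(P_U[u] * p_d_given_u(d, u) for u in (0, 1))
    return joint / marginal

def p_l_do_d(d):
    """P(L=1 | do(D=d)), 'doing': assign D=d to everyone. This cuts the U -> D
    arrow, so U keeps its population distribution P(U)."""
    return sum(P_U[u] * P_L1_GIVEN_D_U[(d, u)] for u in (0, 1))

if __name__ == "__main__":
    print(p_l_given_d(1), p_l_do_d(1), p_l())  # 18/25 3/5 11/20
    print(p_l_do_d(1) > p_l())                 # True
```

In this toy model the observational P(L | D = 1) = 18/25 overstates the drug’s benefit because drug takers are disproportionately healthy, while P(L | do(D = 1)) = 3/5 still exceeds P(L) = 11/20, so the drug counts as a cause under the P(L | do(D)) > P(L) criterion.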
The technique also uses causal graphs to show how a confounding variable relates to a proposed cause/effect relationship. Using this kind of graph, one can illustrate how the tool applies in a real-world scenario. Consider the relationship between smoking and lung cancer. For decades, statisticians and policy makers argued about whether smoking causes the cancer because all the evidence supporting the link was observational. The graph would connect three variables: the confounding variable (a genetic predisposition, for example), S for smoking, and LC for lung cancer, with arrows running from the confounder to both S and LC, and from S to LC. The implication here is that if a third factor causes persons to smoke and also causes cancer, one cannot necessarily conclude that smoking causes lung cancer. What Pearl and his associates discovered was that if an intermediate factor can be identified on the pathway between smoking and cancer, one that carries the effect of smoking and is not itself driven by the confounder, it’s then possible to establish a cause/effect relationship between the two with the help of a series of mathematical calculations and a few algebraic rewrite tools. As figure 2 demonstrates, tar deposits in smokers’ lungs are that intermediate factor.
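The intermediate-factor result described here is known as the front-door adjustment. A minimal Python sketch, assuming an invented model in which an unobserved genetic factor G influences both smoking (S) and lung cancer (C) while smoking affects cancer only through tar deposits (T), first computes the observational joint distribution of S, T and C, then applies the front-door formula P(C | do(S = s)) = Σ_t P(t | s) Σ_s' P(C | t, s') P(s') to it, and finally compares the answer with the true interventional probability, which we know here only because we built the model. All numbers and function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Invented model (all numbers hypothetical):
#   G (genetic predisposition, unobserved) -> S (smoking) and G -> C (lung cancer)
#   S -> T (tar deposits) -> C; smoking affects cancer only through tar.
P_G = {0: Fraction(1, 2), 1: Fraction(1, 2)}               # P(G=g)
P_S1_GIVEN_G = {0: Fraction(1, 5), 1: Fraction(4, 5)}      # P(S=1 | G=g)
P_T1_GIVEN_S = {0: Fraction(1, 10), 1: Fraction(9, 10)}    # P(T=1 | S=s)
P_C1_GIVEN_T_G = {(0, 0): Fraction(1, 10), (1, 0): Fraction(3, 10),
                  (0, 1): Fraction(3, 10), (1, 1): Fraction(1, 2)}  # P(C=1 | T=t, G=g)

def bern(p1, value):
    """Probability that a binary variable with P(value=1) = p1 equals `value`."""
    return p1 if value == 1 else 1 - p1

def observational_joint():
    """P(S=s, T=t, C=c) with G summed out: everything an analyst could observe."""
    return {
        (s, t, c): sum(P_G[g] * bern(P_S1_GIVEN_G[g], s) * bern(P_T1_GIVEN_S[s], t)
                       * bern(P_C1_GIVEN_T_G[(t, g)], c) for g in (0, 1))
        for s, t, c in product((0, 1), repeat=3)
    }

def marginal(joint, **fixed):
    """Sum the joint over all assignments consistent with `fixed`, e.g. marginal(joint, s=1, t=0)."""
    total = Fraction(0)
    for (s, t, c), prob in joint.items():
        values = {"s": s, "t": t, "c": c}
        if all(values[name] == v for name, v in fixed.items()):
            total += prob
    return total

def observed(joint, s):
    """Naive P(C=1 | S=s), which the hidden confounder G biases."""
    return marginal(joint, s=s, c=1) / marginal(joint, s=s)

def front_door(joint, s):
    """Front-door adjustment using only observational quantities:
    P(C=1 | do(S=s)) = sum_t P(t | s) * sum_s2 P(C=1 | t, s2) * P(s2)."""
    total = Fraction(0)
    for t in (0, 1):
        p_t_given_s = marginal(joint, s=s, t=t) / marginal(joint, s=s)
        inner = sum(marginal(joint, s=s2, t=t, c=1) / marginal(joint, s=s2, t=t)
                    * marginal(joint, s=s2) for s2 in (0, 1))
        total += p_t_given_s * inner
    return total

def true_effect(s):
    """Ground-truth P(C=1 | do(S=s)), available only because we built the model."""
    return sum(P_G[g] * bern(P_T1_GIVEN_S[s], t) * P_C1_GIVEN_T_G[(t, g)]
               for g in (0, 1) for t in (0, 1))

if __name__ == "__main__":
    joint = observational_joint()
    print(observed(joint, 1), front_door(joint, 1), true_effect(1))  # 11/25 19/50 19/50
```

In this toy model the naive estimate P(C | S = 1) = 11/25 overstates the effect of smoking, while the front-door estimate recovers the true value of 19/50 exactly without ever using G.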
Had causal inference existed in the 1950s and 1960s, the tobacco industry lobbyists’ argument that a confounder, such as a genetic predisposition, could explain the link between smoking and lung cancer could have been refuted, which in turn might have saved many millions of lives. The same approach holds tremendous potential as we begin to apply it to predictive algorithms and other machine learning-based digital tools.