Correlation and Causation

Distinguishing between correlation and causation is an important part of the evidence-driven paradigm: it prevents a causal claim from being made when the data establish only a correlation.

Aug 24, 2022

One purpose of scientific research is to establish an understanding of the causal relationship between variables and outcomes. This may relate to the development of a disease, the effectiveness of a treatment, or prognostic variables. Because the human brain is inclined to look for patterns, we are highly susceptible to confusing correlation with causation. The distinction matters: if we are to truly understand the relationship between variables or recommend the most beneficial interventions, it is crucial to identify which relationships are causal and which are merely correlational or coincidental.

Correlation occurs when there is an association between two factors or variables. In contrast, causation occurs when one variable leads to (or “causes”) the other. It is typically simple to find evidence of correlation between variables. Establishing a causal pathway or relationship, on the other hand, is much more difficult and requires consideration of several factors. In essence, the process of establishing that one variable causes another requires additional and higher levels of evidence than establishing an association between two variables.

In 1965, Sir Austin Bradford Hill presented several considerations for establishing a causal relationship between variables. These included the strength of the association, consistency of the association, temporality, presence of a dose-response relationship, plausibility, and coherence. With the exception of temporality, which requires that the causal variable occur before the outcome, each of these factors is a relative indicator of causation. This means that neither the presence nor the absence of any one factor definitively establishes or refutes a causal relationship. In addition to these criteria, it is also helpful to consider whether other factors, such as unmeasured or intermediary variables, may be causing the effect being studied.

Several examples demonstrate the difference between correlation and causation. As a hypothetical example, suppose measurement revealed an association between the amount of ice cream consumed and the amount of sunscreen applied, with greater amounts of each during the summer months. While these variables may be associated, the association would most likely reflect a common third variable, specifically warm and sunny weather. When the weather is warm and sunny, people are more likely both to eat ice cream and to apply sunscreen. Neither variable has a direct causal impact on the other.
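The ice cream and sunscreen example can be sketched as a small simulation. The Python code below is only an illustration under made-up assumptions: the temperature range, the slopes, and the noise levels are hypothetical values, not measured data. It generates daily temperature, lets temperature drive both ice cream consumption and sunscreen use, and then compares the raw correlation between the two with the correlation that remains after the effect of temperature is removed from each.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def residuals(y, x):
    """What is left of y after subtracting a least-squares straight-line fit on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def simulate(n_days=2000, seed=0):
    """Hypothetical data: temperature drives both outcomes; they never affect each other."""
    rng = random.Random(seed)
    temperature = [rng.uniform(0, 35) for _ in range(n_days)]          # assumed range
    ice_cream = [2.0 * t + rng.gauss(0, 10) for t in temperature]      # assumed slope and noise
    sunscreen = [1.5 * t + rng.gauss(0, 10) for t in temperature]      # assumed slope and noise
    return temperature, ice_cream, sunscreen


temperature, ice_cream, sunscreen = simulate()

# Raw association between the two outcomes.
raw_r = pearson(ice_cream, sunscreen)

# Association after removing the part of each outcome explained by temperature.
adjusted_r = pearson(residuals(ice_cream, temperature),
                     residuals(sunscreen, temperature))

print(f"raw correlation:      {raw_r:.2f}")
print(f"adjusted correlation: {adjusted_r:.2f}")
```

In this simulated setting the raw correlation is strong, while the adjusted correlation is close to zero, which is the pattern expected when a shared third variable accounts for the association. Real data are rarely this clean, and adjusting for a measured variable does nothing about confounders that were never measured.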

While the example above may seem trivial, an article published in The New England Journal of Medicine in 2012 described an association between a country's chocolate consumption and its number of Nobel laureates. Although the article was published to emphasize the distinction between correlation and causation, mainstream media articles suggested a causal pathway from chocolate consumption to Nobel Prize winners. In reality, the data established no such causal relationship.

Klein et al. recently reported a cross-sectional study of numerous immune-related measurements and other hematologic findings amongst four groups of patients, with the aim of identifying features that define long COVID. The study included 40 healthy, uninfected control patients; 37 healthy, unvaccinated, previously infected control patients; 39 healthy, previously infected patients with no persistent symptoms; and 99 patients with prior COVID infection and persistent symptoms. This fourth group was defined as having ‘long COVID’, and their median time from acute infection was 432 days. In addition to several tests of immune function, cortisol levels were measured in all those defined as having ‘long COVID’, 15 previously infected patients with no symptoms, and 25 healthy, uninfected patients. The authors reported that cortisol levels amongst those with ‘long COVID’ “…were roughly half of those found in healthy or convalescent controls. Based on machine learning, cortisol levels alone were the most significant predictor for Long COVID classification…”. On the basis of this study, several articles in the mainstream media have reported a causal relationship between cortisol levels and the development of ‘long COVID’. Despite the strength of the authors' interpretation of the results, it is worth applying the discussion above and asking whether these findings actually indicate a causal relationship. In particular, because cortisol was measured at a single point long after acute infection, a cross-sectional design cannot show that low cortisol preceded the symptoms, which is the temporality criterion described above.

To promote trust amongst the target audience of a healthcare platform, it is important not only to provide evidence-driven content but also to distinguish between correlation and causation, so that interpretations and recommendations are as accurate as possible. At House Call Media, we provide our clients with expertise in social media management and evidence-driven content development. As a result, clients are able to engage with their audience on the basis of high-quality content.

To learn more, please visit www.housecallmedia.com

REFERENCES

Klein J, et al. Distinguishing Features of Long COVID Identified Through Immune Profiling. medRxiv, 2022. https://doi.org/10.1101/2022.08.09.22278592

Hill AB. The Environment and Disease: Association or Causation? Proc R Soc Med, 58: 295-300, 1965.

Messerli FH. Chocolate Consumption, Cognitive Function, and Nobel Laureates. N Engl J Med, 367: 1562-1564, 2012.
