Causes and consequences

What is causal inference and how is it applied in population health research?

A couple of hours ago I was hit with a sudden headache. I took a painkiller, and after a while the headache was gone. Was the painkiller the reason for my recovery?

Causal inference, that is, the study of cause and effect, is one of the key issues in population health research: does obesity increase heart disease risk? Does high blood pressure affect the risk of stroke? More generally, does exposure to X causally affect the probability of disease Y?

Going back to the headache example: did the paracetamol tablet actually cure my headache, or was my recovery a consequence of something else, such as the glass of water that I drank to wash down the pill?

What would have happened, if…

If we approach this question from a scientific point of view, we would have to imagine an alternative, counterfactual situation: would I have recovered from the headache within two hours without taking the painkiller?

We cannot give a definite answer to this question, as we obviously cannot live in two parallel universes, one in which I take the painkiller and another in which I do not. This is called the fundamental problem of causal inference.

Randomized and non-randomized studies

The best way to approach this problem would be to conduct a study on multiple people. The participants would be randomly allocated into two groups: one in which everyone receives the painkiller, and another in which everyone receives a similar-looking pill without the active ingredient (a placebo). Because chance alone decides who gets which pill, the two groups are comparable on average, and we can make a statistical comparison at the group level to assess the effect of paracetamol on curing a headache.
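As a rough illustration of this group-level comparison, here is a minimal simulation sketch in Python. The recovery probabilities (70% with the painkiller, 40% with placebo), the number of participants, and the function name simulate_trial are all made up for the example; they are not results from any real study.

```python
import random
import statistics


def simulate_trial(n_participants=10_000, seed=1):
    """Simulate a two-arm randomized trial.

    Returns recovery indicators (1 = recovered within two hours, 0 = not)
    for each arm. All probabilities are hypothetical.
    """
    rng = random.Random(seed)
    outcomes = {"painkiller": [], "placebo": []}
    for _ in range(n_participants):
        # Random allocation: chance alone decides the arm.
        arm = rng.choice(["painkiller", "placebo"])
        # Assumed (made-up) probability of recovery in each arm.
        p_recover = 0.70 if arm == "painkiller" else 0.40
        outcomes[arm].append(1 if rng.random() < p_recover else 0)
    return outcomes


outcomes = simulate_trial()
risk_painkiller = statistics.mean(outcomes["painkiller"])
risk_placebo = statistics.mean(outcomes["placebo"])
trial_risk_difference = risk_painkiller - risk_placebo

print(f"Recovered with painkiller: {risk_painkiller:.3f}")
print(f"Recovered with placebo:    {risk_placebo:.3f}")
print(f"Estimated effect (risk difference): {trial_risk_difference:.3f}")
```

We never learn what would have happened to any single participant under the other pill, but because of the random allocation the difference between the group averages estimates the average effect of the painkiller.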

However, there are many research questions for which we cannot do such a randomized study. If we were interested in, say, the effects of high blood pressure on the risk of stroke, such a study would require randomly assigning participants either to a group that is forced to suffer from hypertension or to a group whose blood pressure stays at an average level. Such a study would be not only extremely difficult to implement, but also highly unethical to conduct.

In these cases, causal inference has to be done based on various non-randomized (or observational) studies, such as different birth cohorts or biobanks.

Causal inference in practice

How is causal inference done when it is impossible to randomize the exposure? A common way is to identify confounding factors, that is, factors that affect both the exposure and the outcome, and then use statistical methods to adjust for their effects. Common methods for doing this include regression modelling and matching in case-control studies.
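To make the idea of adjustment concrete, here is a small simulated sketch in Python. It uses stratification (comparing exposed and unexposed people separately within each level of the confounder) rather than regression or matching, simply because it is the easiest form of adjustment to write out. The confounder (older age), all probabilities, and the function names are assumptions invented for the example.

```python
import random
import statistics


def simulate_cohort(n=50_000, seed=2):
    """Simulate an observational cohort as (old, exposed, diseased) rows of 0/1.

    Made-up setup: older age makes both the exposure and the disease more
    likely, so age confounds the exposure-disease relationship.
    The true effect of the exposure is +0.05 in disease risk.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        old = int(rng.random() < 0.5)
        exposed = int(rng.random() < (0.7 if old else 0.2))
        p_disease = 0.05 + 0.20 * old + 0.05 * exposed
        diseased = int(rng.random() < p_disease)
        rows.append((old, exposed, diseased))
    return rows


def risk_difference(rows):
    """Disease risk among the exposed minus risk among the unexposed."""
    exposed = [d for _, e, d in rows if e == 1]
    unexposed = [d for _, e, d in rows if e == 0]
    return statistics.mean(exposed) - statistics.mean(unexposed)


def adjusted_risk_difference(rows):
    """Average the within-stratum risk differences, weighted by stratum size."""
    total = len(rows)
    adjusted = 0.0
    for stratum in (0, 1):
        subset = [r for r in rows if r[0] == stratum]
        adjusted += len(subset) / total * risk_difference(subset)
    return adjusted


cohort = simulate_cohort()
naive_estimate = risk_difference(cohort)
adjusted_estimate = adjusted_risk_difference(cohort)

print(f"Naive risk difference:        {naive_estimate:.3f}")
print(f"Age-adjusted risk difference: {adjusted_estimate:.3f}")
```

In this simulation the naive comparison overstates the effect, because the exposed group is also older, while the stratified estimate lands close to the true value of 0.05. In real data the adjustment only works for confounders that have been identified and measured.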

Alternative methods include various quasi-randomized settings which attempt to mimic a randomized study. An example of these is Mendelian randomization, which exploits the random inheritance of genetic variants from parents.
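The logic of Mendelian randomization can be sketched with simulated data. The example below uses the ratio (Wald) estimator, one of the simplest estimators in this setting: the association of the genetic variant with the outcome divided by its association with the exposure. All numbers and names are invented for illustration, and the sketch assumes that the variant affects the exposure, is unrelated to the unmeasured confounder, and influences the outcome only through the exposure.

```python
import random


def covariance(xs, ys):
    """Sample covariance of two equally long lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)


def simulate_population(n=50_000, seed=3, true_effect=0.3):
    """Simulate genotype, exposure and outcome with an unmeasured confounder."""
    rng = random.Random(seed)
    genotypes, exposures, outcomes = [], [], []
    for _ in range(n):
        # Number of risk alleles (0, 1 or 2), inherited at random.
        g = int(rng.random() < 0.3) + int(rng.random() < 0.3)
        u = rng.gauss(0, 1)  # unmeasured confounder
        x = 0.5 * g + u + rng.gauss(0, 1)  # exposure
        y = true_effect * x + u + rng.gauss(0, 1)  # outcome
        genotypes.append(g)
        exposures.append(x)
        outcomes.append(y)
    return genotypes, exposures, outcomes


genotypes, exposures, outcomes = simulate_population()

# Naive slope of outcome on exposure: biased by the unmeasured confounder.
naive_slope = covariance(exposures, outcomes) / covariance(exposures, exposures)

# Ratio (Wald) estimate: gene-outcome association / gene-exposure association.
wald_ratio = covariance(genotypes, outcomes) / covariance(genotypes, exposures)

print(f"Naive regression slope: {naive_slope:.3f}")
print(f"Wald ratio estimate:    {wald_ratio:.3f}")
```

Here the naive slope is pulled well away from the true effect of 0.3 by the confounder, whereas the ratio estimate is close to it. If any of the stated assumptions fails, for example if the variant also affects the outcome through another pathway, the ratio estimate is biased as well.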

The jump from correlation to causation

A common factor in all these methods is that the jump from correlation, or a statistical relationship, to causation can only be made under certain assumptions. Whether these assumptions are valid always depends on the situation. It is also important to understand how various biases may affect the results.

In addition, it is important to be aware that causal inference should not be done based on a single study. It is imperative to triangulate the results from different types of studies, which have different approaches with different assumptions, and different sources of bias.

Nevertheless, even when a relationship is causal, there is always inherent individual variation and randomness in disease onset: not all smokers get lung cancer, and some non-smokers get lung cancer too.

Causal inference in population health research: difficult, but not impossible

In population health research, a randomized study is not feasible for the majority of research questions. Causal inference based on non-randomized studies is very challenging indeed, but it is not impossible: it is well known that smoking increases the risk of lung cancer, and no randomized studies in human populations were needed to reach this conclusion.

The authors of Decoding Health and Disease blog would like to wish everyone a peaceful Christmas time!

Author: Ville Karhunen

Created 17.12.2024 | Updated 17.12.2024
