➡️ Start Asking Your Data ‘Why?’ — A Gentle Intro To Causality

This post appeared first on Towards Data Science.

Feb 15, 2025 - 03:54

Correlation does not imply causation. It turns out, however, that with some simple but ingenious tricks one can potentially unveil causal relationships within standard observational data, without having to resort to expensive randomised controlled trials.

This post is targeted towards anyone making data-driven decisions. The main takeaway message is that uncovering causal relationships may be possible once you understand that the story behind the data is as important as the data itself.

By introducing Simpson’s and Berkson’s Paradoxes, situations where the outcome for a population as a whole conflicts with that of its cohorts, I shine a light on the importance of using causal reasoning to identify these paradoxes in data and avoid misinterpretation. Specifically, I introduce causal graphs as a method to visualise the story behind the data, and point out that by adding them to your arsenal you are likely to conduct better analyses and experiments.
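To make Simpson’s Paradox concrete, here is a minimal Python sketch using made-up recovery counts for two hypothetical treatments, A and B, split into two hypothetical severity cohorts. The numbers are invented purely for illustration: within each cohort A has the higher recovery rate, yet pooled over everyone B looks better, because in this invented data most severe cases received A. The names `counts`, `cohort_rates` and `pooled_rates` are illustrative only.

```python
# Hypothetical, invented counts for illustration only:
# (recovered, total) per treatment and per severity cohort.
counts = {
    "A": {"mild": (19, 20), "severe": (40, 80)},
    "B": {"mild": (72, 80), "severe": (8, 20)},
}


def cohort_rates(counts):
    """Recovery rate of each treatment within each cohort."""
    return {
        treatment: {cohort: rec / tot for cohort, (rec, tot) in cohorts.items()}
        for treatment, cohorts in counts.items()
    }


def pooled_rates(counts):
    """Recovery rate of each treatment with the cohorts merged together."""
    pooled = {}
    for treatment, cohorts in counts.items():
        recovered = sum(rec for rec, _ in cohorts.values())
        total = sum(tot for _, tot in cohorts.values())
        pooled[treatment] = recovered / total
    return pooled


if __name__ == "__main__":
    by_cohort = cohort_rates(counts)
    pooled = pooled_rates(counts)
    for cohort in ("mild", "severe"):
        print(f"{cohort:>6}: A={by_cohort['A'][cohort]:.0%}  B={by_cohort['B'][cohort]:.0%}")
    print(f"pooled: A={pooled['A']:.0%}  B={pooled['B']:.0%}")
```

Running it prints per-cohort rows where A leads (95% vs 90%, 50% vs 40%) and a pooled row where B leads (59% vs 80%). Which comparison to trust depends on the assumed story: if severity influences both which treatment is given and the chance of recovery, the cohort-level comparison is the appropriate one.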

My ultimate objective is to whet your appetite to explore causality further, as I believe that by asking data “Why?” you will be able to go beyond correlation calculations, extract more insights and avoid common pitfalls of misjudgement.

Note that throughout this gentle intro I do not use equations; instead I demonstrate ideas with accessible, intuitive visuals. That said, I provide resources for you to take your next step in adding Causal Inference to your statistical toolbox so that you may get more value from your data.

The Era of Data Driven Decision Making

In [Deity] We Trust, All Others Bring Data! — William E. Deming

In this digital age, it is common to put a lot of faith in data. But this raises an often overlooked question: should we trust data on its own?

Judea Pearl, who is considered the godfather of Causality, articulated it best:

“The collection of information is as important as the information itself” — Judea Pearl

In other words, the story behind the data is as important as the data itself.

Judea Pearl is considered the Godfather of Causality. Credit: Aleksander Molak

This idea is reflected in a growing awareness of the importance of identifying bias in datasets. By the end of this post, I hope you will appreciate that causality provides the fundamental tools to best express, quantify and attempt to correct for these biases.
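Berkson’s Paradox is one example of such a bias: filtering which records enter a dataset can create a correlation that does not exist in the wider population. The minimal simulation below assumes two hypothetical, independent traits and a hypothetical selection rule that only records units whose combined score exceeds a threshold; the names `berkson_demo` and `threshold` are illustrative, and only the Python standard library is used.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def berkson_demo(n=20_000, threshold=1.0, seed=0):
    """Return (correlation in the full population, correlation among selected units)."""
    rng = random.Random(seed)
    # Two hypothetical traits drawn independently, so their true correlation is zero.
    trait_a = [rng.gauss(0, 1) for _ in range(n)]
    trait_b = [rng.gauss(0, 1) for _ in range(n)]
    # Hypothetical selection rule: a unit is only recorded if its combined
    # score exceeds the threshold (think of an admission or publication filter).
    selected = [(a, b) for a, b in zip(trait_a, trait_b) if a + b > threshold]
    sel_a, sel_b = zip(*selected)
    return pearson(trait_a, trait_b), pearson(sel_a, sel_b)


if __name__ == "__main__":
    r_all, r_selected = berkson_demo()
    print(f"full population: r = {r_all:+.2f}")
    print(f"selected only:   r = {r_selected:+.2f}")
```

With these settings the full-population correlation sits close to zero, while the selected subset shows a clearly negative one. The collection process alone manufactured the relationship, which is exactly the kind of bias Pearl’s remark about the collection of information warns about.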

In causality introductions it is customary to demonstrate why “correlation does not imply causation” by highlighting the limitations of association analysis due to spurious correlations (e.g., shark attacks […]
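One common way spurious correlations arise is through a hidden common cause. As a minimal sketch, assume a hypothetical variable `Z` that drives both `X` and `Y` (causal graph: X ← Z → Y), with no arrow between `X` and `Y`. The naive correlation between X and Y is substantial, but once Z’s linear influence is regressed out of each (a partial correlation, swapped in here as a simple adjustment technique rather than anything prescribed above), it largely vanishes. The function names are illustrative only.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(values, z):
    """Residuals of an ordinary least-squares fit: values ~ a + b * z."""
    n = len(z)
    mz, mv = sum(z) / n, sum(values) / n
    b = sum((zi - mz) * (vi - mv) for zi, vi in zip(z, values)) / sum(
        (zi - mz) ** 2 for zi in z
    )
    a = mv - b * mz
    return [vi - (a + b * zi) for vi, zi in zip(values, z)]


def confounder_demo(n=20_000, seed=1):
    """Return (naive correlation of X and Y, correlation after adjusting for Z)."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]  # hypothetical common cause
    x = [zi + rng.gauss(0, 1) for zi in z]  # X depends only on Z
    y = [zi + rng.gauss(0, 1) for zi in z]  # Y depends only on Z, never on X
    naive = pearson(x, y)
    adjusted = pearson(residuals(x, z), residuals(y, z))
    return naive, adjusted


if __name__ == "__main__":
    naive, adjusted = confounder_demo()
    print(f"naive r = {naive:+.2f}, adjusted for Z r = {adjusted:+.2f}")
```

In this setup the naive correlation comes out near 0.5 and the adjusted one near zero. Linear residualisation works here only because the simulated relationships are linear; it is an illustrative choice, not a general recipe, and it presumes you already know which variable is the common cause, which is precisely what a causal graph helps you reason about.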