How to be the best Economic Data Scientist: The Seven Tools of Causal Inference and Ethics

Originally published on November 21, 2019, on LinkedIn, updated lightly October 29, 2022

My blog tagline is “economists put the science into data science.” Part of the reason I make this claim is that many applied econometricians (sadly not all) place a high value on causality and causal inference. Further, those same economists follow an ethic of working with data that is close to the 2002 guidance of Peter Kennedy and me.

Judea Pearl discusses “The Seven Tools of Causal Inference with Reflections on Machine Learning” (cacm.acm.org/magazines/2019/3/234929), a Contributed Article in the March 2019 CACM.

This is a great article with three messages.

The first message is to point out the ladder of causation.

  1. As shown in the figure, the lowest rung is association, or correlation. He writes it as: given X, what then is my probability of seeing Y?
  2. The second rung is intervention: if I do X, will Y appear?
  3. The third rung is counterfactual: if X had not occurred, would Y not have occurred?
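
To make the gap between the first two rungs concrete, here is a minimal sketch in Python. It is not Pearl's code. The model, the variable names (Z, X, Y), and every probability in it are assumptions invented for illustration: Z is a common cause that raises both X and Y, and X also raises Y. "Seeing" compares Y across observed values of X, while "doing" forces X to a value regardless of Z.

```python
import random

def simulate(n, do_x=None, seed=0):
    """Sample (z, x, y) from a toy model with Z -> X, Z -> Y and X -> Y.

    When do_x is given, X is forced to that value, which removes the
    influence of Z on X (an intervention, rung two).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = int(rng.random() < 0.5)                      # common cause (assumed)
        if do_x is None:
            x = int(rng.random() < (0.8 if z else 0.2))  # X depends on Z
        else:
            x = do_x                                     # X set by fiat
        y = int(rng.random() < 0.1 + 0.3 * x + 0.5 * z)  # Y depends on X and Z
        rows.append((z, x, y))
    return rows

def mean_y(rows, x_value=None):
    """Average Y, optionally restricted to rows where X == x_value."""
    ys = [y for _, x, y in rows if x_value is None or x == x_value]
    return sum(ys) / len(ys)

n = 100_000
observed = simulate(n)

# Rung one: difference in Y between units that happen to have X = 1 or X = 0.
assoc_gap = mean_y(observed, 1) - mean_y(observed, 0)

# Rung two: difference in Y when X is set to 1 versus set to 0 for everyone.
do_gap = mean_y(simulate(n, do_x=1, seed=1)) - mean_y(simulate(n, do_x=0, seed=2))

print(f"seeing: {assoc_gap:.2f}  doing: {do_gap:.2f}")
```

Under these assumed probabilities the true gap from seeing is 0.6 and the true gap from doing is 0.3, so the simulated values should land near those numbers. The association overstates the effect of X because Z moves X and Y together. The third rung is not covered by this sketch, since counterfactuals ask about a specific unit and need more structure than averages over a population.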

In his second message, he discusses an inference engine, with which, he says, AI people (and, I think, economists) should be very familiar. After all, economists are all about causation, about being able to explain why something occurs, though admittedly not always at the best intellectual level. Nevertheless, the need to seek causality is definitely in the economist’s DNA. I always say the question “Why?” is an occupational hazard or obsession for economists.

People who know me understand that I am a huge admirer, indeed a disciple, of the late Peter Kennedy (A Guide to Econometrics, chapter on Applied Econometrics, 2008). In 2002 Kennedy set out the 10 rules of applied econometrics in his article “Sinning in the Basement: What Are the Rules? The Ten Commandments of Applied Econometrics.” I think they imply practices of ethical data use and apply well beyond Kennedy’s intended audience. I wrote about Ethical Rules in Applied Econometrics and Data Science here.

Kennedy’s first rule is to use economic theory and common sense when articulating a problem and reasoning toward a solution. In The Book of Why, Pearl explains that one cannot advance beyond rung one without outside information. I think Kennedy would wholeheartedly agree. I want to acknowledge Marc Bellemare for his insightful conversation on bringing Kennedy and Pearl together in the same discussion of rules in applied econometrics. Perhaps I will write about that later.

Pearl’s third message is his seven rules, or tools, for causal inference. They are:

  1. Encoding causal assumptions: Transparency and testability.
  2. Do-calculus and the control of confounding.
  3. The algorithmization of counterfactuals. 
  4. Mediation analysis and the assessment of direct and indirect effects.
  5. Adaptability, external validity, and sample selection bias.
  6. Recovering from missing data. 
  7. Causal discovery.
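
As one illustration of the second tool, the control of confounding, here is a minimal sketch of backdoor adjustment in Python. It is my own toy example, not code from Pearl's article. It assumes a single observed confounder Z that affects both X and Y, and all names and probabilities are made up. The adjustment estimates the effect of doing X from observational data alone by averaging the within-stratum comparison over the distribution of Z, that is, sum over z of E[Y | X = x, Z = z] * P(Z = z).

```python
import random

# Observational data from an assumed toy model: Z -> X, Z -> Y, X -> Y.
rng = random.Random(42)
data = []
for _ in range(100_000):
    z = int(rng.random() < 0.5)
    x = int(rng.random() < (0.8 if z else 0.2))
    y = int(rng.random() < 0.1 + 0.3 * x + 0.5 * z)
    data.append((z, x, y))

def naive_mean(data, x_value):
    """E[Y | X = x_value], ignoring the confounder."""
    ys = [y for _, x, y in data if x == x_value]
    return sum(ys) / len(ys)

def backdoor_mean(data, x_value):
    """Estimate E[Y | do(X = x_value)] by adjusting for Z.

    Computes sum_z E[Y | X = x_value, Z = z] * P(Z = z).
    """
    n = len(data)
    total = 0.0
    for z_value in (0, 1):
        stratum = [y for z, x, y in data if z == z_value and x == x_value]
        p_z = sum(1 for z, _, _ in data if z == z_value) / n
        total += (sum(stratum) / len(stratum)) * p_z
    return total

naive_effect = naive_mean(data, 1) - naive_mean(data, 0)
adjusted_effect = backdoor_mean(data, 1) - backdoor_mean(data, 0)
print(f"naive: {naive_effect:.2f}  adjusted: {adjusted_effect:.2f}")
```

In this assumed model the true effect of X on Y is 0.3, while the unadjusted comparison is inflated to about 0.6 by the confounder. The adjustment only works because the sketch assumes Z is observed and is the only backdoor path. That is a causal assumption encoded outside the data, which is the point of the first tool.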

I highly recommend this article, followed by The Book of Why and Causal Inference in Statistics: A Primer (Pearl is lead coauthor of both). Finally, I include a plug for a book to which I contributed a chapter on ethics in econometrics: Bill Franks, 97 Things About Ethics Everyone in Data Science Should Know: Collective Wisdom from the Experts.