How to be the best Economic Data Scientist: The Seven Tools of Causal Inference and Ethics

Originally published on November 21, 2019, on LinkedIn, updated lightly October 29, 2022

My blog tagline is that economists put the science into data science. Part of the reason I make this claim is that many applied econometricians (sadly, not all) place a high value on causality and causal inference. Further, those same economists follow an ethic of working with data that is close to the 2002 guidance of Peter Kennedy, and to my own.

Judea Pearl discusses “The Seven Tools of Causal Inference, with Reflections on Machine Learning” (cacm.acm.org/magazines/2019/3/234929), a Contributed Article in the March 2019 issue of CACM.

This is a great article with three messages.

The first message is to point out the ladder of causation.

  1. As shown in the figure, the lowest rung is association, or correlation. He writes it as: given that I see X, what is the probability of seeing Y?
  2. The second rung is intervention: if I do X, will Y appear?
  3. The third rung is the counterfactual: had X not occurred, would Y still have occurred?
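In the notation of Pearl’s article, the three rungs correspond to three progressively stronger kinds of queries (the counterfactual notation below follows Pearl’s structural-model convention):

```latex
% Rung 1 (association, "seeing"):
P(y \mid x)
% Rung 2 (intervention, "doing"):
P(y \mid do(x))
% Rung 3 (counterfactual, "imagining"): the probability that Y would be y
% had X been x, given that we actually observed X = x' and Y = y'
P(y_x \mid x', y')
```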

In his second message, he discusses an inference engine, with which he says AI researchers (and, I think, economists) should be very familiar. After all, economists are all about causation, about being able to explain why something occurs, though admittedly not always at the best intellectual level. Nevertheless, the need to seek causality is definitely in the economist’s DNA. I always say the question “Why?” is an occupational hazard, or obsession, for economists.

People who know me understand that I am a huge admirer, indeed a disciple, of the late Peter Kennedy (A Guide to Econometrics, chapter on applied econometrics, 2008). Kennedy set out the ten rules of applied econometrics in his 2002 article “Sinning in the Basement: What Are the Rules?” I think they imply practices of ethical data use and apply far beyond Kennedy’s intended audience. I wrote about Ethical Rules in Applied Econometrics and Data Science here.

Kennedy’s first rule is to use economic theory and common sense when articulating a problem and reasoning toward a solution. Pearl, in his Book of Why, explains that one cannot advance beyond rung one without outside information. I think Kennedy would wholeheartedly agree. I want to acknowledge Marc Bellemare for an insightful conversation on combining Kennedy and Pearl in the same discussion of rules for applied econometrics. Perhaps I will write about that later.

Pearl’s third message is to present his seven tools for causal inference. They are:

  1. Encoding causal assumptions: Transparency and testability.
  2. Do-calculus and the control of confounding.
  3. The algorithmization of counterfactuals. 
  4. Mediation analysis and the assessment of direct and indirect effects.
  5. Adaptability, external validity, and sample selection bias.
  6. Recovering from missing data. 
  7. Causal discovery.
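As one concrete illustration of tool 2, Pearl’s back-door adjustment formula reduces an interventional query to purely observational quantities whenever a set of covariates Z blocks all back-door (confounding) paths from X to Y:

```latex
P(Y = y \mid do(X = x)) \;=\; \sum_{z} P(Y = y \mid X = x, Z = z)\, P(Z = z)
```

Intuitively, we average the conditional distribution of Y over the pre-intervention distribution of the confounders, rather than letting X and Z vary together as they do in the raw data.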

I highly recommend this article, followed by The Book of Why and Causal Inference in Statistics: A Primer, both with Pearl as lead coauthor. Finally, I include a plug for a book to which I contributed a chapter on ethics in econometrics: Bill Franks, 97 Things About Ethics Everyone in Data Science Should Know: Collective Wisdom from the Experts.

Avoiding Pitfalls in Regression Analysis

(Updated with links and more Dec 1, 2020. Updated with SAS Global Forum announcement on Jan. 22, 2021.)

Professors reluctant to venture into these areas do no service to their students for preparation to enter the real world of work.

Today (November 30, 2020) I presented “Avoiding Pitfalls in Regression Analysis” during the Causal Inference Webinar at the Urban Analytics Institute in the Ted Rogers School of Management, Ryerson University. I was honored to do this at the kind invitation of Murtaza Haider, author of Getting Started with Data Science. The primary participants were his students in Advanced Business Data Analytics. This is an impressive, well-crafted course (taught in R) that, at the syllabus level, covers many of the topics in this presentation. I met Murtaza some time ago online and have come to regard him as a first-rate applied econometrician.

Ethics and moral obligation to our students

Just as Peter Kennedy developed rules for the ethical use of applied econometrics, this presentation is a first step toward developing a set of rules for avoiding pain in one’s analysis. A warning against Hasty Regression (as defined in the talk) is prominent.

(Update 1/22/2021: My paper, “Haste Makes Waste: Don’t Ruin Your Reputation with Hasty Regression,” has been accepted for a prerecorded 20-minute breakout session at SAS Global Forum 2021, May 18-20, 2021. More on this in a separate post later.)

Kennedy said in the original 2002 paper, Sinning in the Basement: “… my opinion is that regardless of teachability, we have a moral obligation to inform students of these rules, and, through suitable assignments, socialize them to incorporate them into the standard operating procedures they follow when doing empirical work.… (I) believe that these rules are far more important than instructors believe and that students at all levels do not accord them the respect they deserve.” (Kennedy, 2002, pp. 571-2) See my contribution to the cause, an essay on Peter Kennedy’s vision in Bill Franks’s book cited below.

While the key phrase in Peter’s quote seems to be “moral obligation,” the stronger phrase is “regardless of teachability.” Professors reluctant to venture into these areas do no service to their students when they enter the real world of work. As with Kennedy’s rules, some of the pitfall-avoidance rules are equally difficult to teach, leading faculty away from in-depth coverage.

The Presentation

A previous presentation had the subtitle “Don’t let common mistakes ruin your regression and your career.” I dropped that subtitle here only to save space, not to disavow the importance of these rules to a good career trajectory.


This presentation highlights seven of ten pitfalls that can befall even the technically competent and fully experienced. Many regression users will have learned regression in courses devoting anywhere from a couple of weeks to much of a semester, or may be self-taught or have learned on the job. The focus of many curricula is to perfect estimation techniques and to studiously learn about violations of the classical assumptions. Applied work is so much more, and one size does not fit all. The pitfalls remind all users to think fully through their data and their analysis. Used properly, regression is one of the most powerful tools in the analyst’s arsenal; avoiding the pitfalls will help the analyst avoid fatal results.

The Pitfalls in Regression Practice

  1. Failure to understand why you are running the regression.
  2. Failure to be a data skeptic and ignoring the data generating process.
  3. Failure to examine your data before you regress.
  4. Failure to examine your data after you regress.
  5. Failure to understand how to interpret regression results.
  6. Failure to model both theory and data anomalies, and to know the difference.
  7. Failure to be ethical.
  8. Failure to provide proper statistical testing.
  9. Failure to properly consider causal calculus.
  10. Failure to meet the assumptions of the classical linear model.
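Pitfalls 3 and 4 are easy to demonstrate. The sketch below (in Python, using the illustrative data of Anscombe’s classic 1973 quartet, not data from the presentation itself) fits the same regression to four datasets whose summary statistics are nearly identical even though their scatterplots tell four very different stories; only examining the data reveals the difference:

```python
import numpy as np

# Anscombe's quartet: four datasets with nearly identical regression
# summaries but wildly different shapes -- plot before you regress.
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y1 = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])
y2 = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])
y3 = np.array([7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73])
x4 = np.array([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8], dtype=float)
y4 = np.array([6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89])

for xs, ys in [(x, y1), (x, y2), (x, y3), (x4, y4)]:
    slope, intercept = np.polyfit(xs, ys, 1)          # OLS fit, degree 1
    r2 = np.corrcoef(xs, ys)[0, 1] ** 2               # squared correlation
    print(f"slope={slope:.2f}  intercept={intercept:.2f}  R2={r2:.2f}")
```

All four fits report essentially the same slope, intercept, and R², yet one dataset is curved, one has a gross outlier, and one has all of its x-variation in a single point.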

How to get this presentation

Faculty, if you would like this presentation delivered to your students or faculty via webinar, please contact me.  Participants of the webinar can request a copy of the presentation by emailing me at myers@uakron.edu. Specify the title of the presentation and please give your name and contact information. Let me know what you thought of the presentation as well.

You can join me on LinkedIn at https://www.linkedin.com/in/stevencmyers/. Be sure to tell me why you are contacting me so I will be sure to add you.

This offer extends to those who heard the presentation when it was first given in the Ohio SAS Users Group webinar series on August 26, 2020.

Readings, my papers:

Recommended Books:

Other Readings and references:

Ethics Rules in Applied Econometrics and Data Science

Failing to follow these rules brings about ethical implications if not direct unethical behavior. To knowingly violate the rules is to be, or at least risk being, unethical in your handling of data.

I am honored to have a version of this essay appear in Bill Franks’s 97 Things About Ethics Everyone in Data Science Should Know. Pick up a copy now.
Published August 25, 2020

I have taught ethics in applied econometrics and data analysis for at least the last 20 years, but I have rarely used the word ethics, resorting instead to phrases such as data skepticism and other attitudes that suggest acting ethically.

Nothing in the past 20 years has had as much impact on me and my classroom teaching and my ethics of data analysis as Peter Kennedy’s “Sinning in the Basement: What are the Rules? The Ten Commandments of Applied Econometrics.” This essay also appears in his Guide to Econometrics (Kennedy, 2008).

From the moment I read this paper, I was completely transformed and forever a disciple of his. I was fortunate to host him on my campus, where he spoke of the misuse of econometrics and of research failing to make it past his editor’s desk at the Journal of Economic Education. In one example, a paper was rejected because the authors did not acknowledge a problem in their analysis; they ignored it and probably hoped the editor would not notice. Being honest and transparent enough to acknowledge a problem of which the authors were aware but which they were unable to solve is sometimes enough, Peter would point out. Hiding one transgression suggests other ethical abuses of data.

I used the word ethical, but Peter did not, preferring the oft-used words sin and sinning. But the point is made. When I got my Ph.D. at The Ohio State University in 1980, I had taken nine separate statistics and econometrics courses over the five years I was there. I learned classical estimation and inference from some of the best professors, and yes, we had to “sin in the basement” (by pilgrimage to the mainframe computer in those days), because there was scarcely a day’s instruction in how to use the computer, much less in what Peter would call, 20 years later, the moral obligation of applied econometrics.

Kennedy says, “… my opinion is that regardless of teachability, we have a moral obligation to inform students of these rules, and, through suitable assignments, socialize them to incorporate them into the standard operating procedures they follow when doing empirical work.… (I) believe that these rules are far more important than instructors believe and that students at all levels do not accord them the respect they deserve.” (Kennedy, 2002, pp. 571-2) I could not agree more, and I have tried to follow these rules faithfully and to teach my students and colleagues to do likewise.

Failing to follow these rules brings about ethical implications if not direct unethical behavior. To knowingly violate the rules is to be, or at least risk being, unethical in your handling of data. To unknowingly violate the rules would nevertheless lead to unintended consequences of poor outcomes that could be avoided.

The Rules of Applied Econometrics

Rule 1: Use common sense and economic reasoning.

Rule 2: Avoid Type III errors.

Rule 3: Know the context.

Rule 4: Inspect the data.

Rule 5: Keep it sensibly simple.

Rule 6: Use the interocular trauma test.

Rule 7: Understand the costs and benefits of data mining.

Rule 8: Be prepared to compromise.

Rule 9: Do not confuse statistical significance with meaningful magnitude.

Rule 10: Report a sensitivity analysis.

Failing to fully articulate the problem (Rule 1) is critical: not spending time on the problem and on a common-sense, economically reasoned solution can flaw the study from the very first step. It might lead to a violation of Rule 2, where the right answer to the wrong question is discovered. If you fail to inspect the data (Rule 4), fail to clean the data and make the necessary transformations, or fail to control for selection bias, then you will have results based on unrealistic assumptions, wrong results unduly influenced by the dirty data. The importance of this cannot be overemphasized. Recall that Griliches exclaims that if it weren’t for dirty data, economists wouldn’t have jobs. What if you violate Rule 7 and, knowingly or not, allow the data to lie to you? In the words of Nobel Prize-winning economist Ronald Coase (1995), “If you torture the data long enough, it will confess.” A violation of Rule 9 might lead you to worship R² or to participate in p-hacking; it might cause you to ignore a huge economic implication (a large magnitude) only because it has a large p-value. Violations of Rule 10 may be the largest of all. Suppose, in the spirit of discussing “sinning,” that you believe your model is from God (as suggested by Susan Athey); then why would you ever look at alternative specifications or otherwise validate the robustness of your findings? What you find is what you find, and alternatives be damned.
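Rule 9 can be made concrete with a small simulation (in Python, with purely illustrative numbers of my own choosing): with a million observations, a coefficient that is economically trivial relative to the noise still earns a vanishingly small p-value.

```python
import numpy as np
from scipy import stats

# Illustrative simulation: a tiny true effect in a huge sample.
rng = np.random.default_rng(42)
n = 1_000_000
x = rng.standard_normal(n)                      # hypothetical regressor
y = 10.0 + 0.02 * x + rng.standard_normal(n)    # true slope 0.02, noise sd 1

res = stats.linregress(x, y)
print(f"slope = {res.slope:.4f}, p-value = {res.pvalue:.2e}")
# The p-value is tiny, yet the effect moves y by only ~0.02 noise
# standard deviations: statistically significant, economically trivial.
```

The same regression run with n = 100 would likely report the slope as “insignificant,” which is the mirror-image mistake: the magnitude, not the p-value, carries the economic meaning.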

“Many data scientists make bad decisions – with ethical implications – not because they are intentionally trying to do harm, but because they do not have an understanding of how the algorithms they are taking responsibility for actually work” (Jennifer Priestley, August 2019, LinkedIn post). Likewise, many in the field who ignore the rules of applied econometrics risk doing real harm, not out of intent but out of ignorance or neglect. This latter lack of motive is just as real and likely more widespread than intentional harm, but harm occurs in any event.

The American Economic Association adopted code-of-conduct guidelines that state in part: “The AEA’s founding purpose of ‘the encouragement of economic research’ requires intellectual and professional integrity. Integrity demands honesty, care, and transparency in conducting and presenting research; disinterested assessment of ideas; acknowledgement of limits of expertise; and disclosure of real and perceived conflicts of interest.”

The AEA statement does not go directly to data ethics, but it is suggestive, since little economic research and no applied economic research can be conducted without data. The AEA statement is a beginning, but I suggest that those who do applied economic research would do well to hold to the rules for sinning in the basement. This is especially important now, since going to the basement is no longer the norm, and many more analysts should be trying to avoid sinning wherever and whenever their hands are on a laptop.

References:

American Economic Association (2018) Code of Professional Conduct, Adopted April 20, 2018, accessed at https://www.aeaweb.org/about-aea/code-of-conduct  on October 7, 2019.

Coase, Ronald (1995) Essays on Economics and Economists, University of Chicago Press.

Kennedy, Peter E. (2002) “Sinning in the Basement: What Are the Rules? The Ten Commandments of Applied Econometrics.” Journal of Economic Surveys, Blackwell Publishers Ltd. Accessed at https://onlinelibrary.wiley.com/doi/abs/10.1111/1467-6419.00179

Kennedy, Peter E. (2008) A Guide to Econometrics, 6th edition, Cambridge, MIT Press.

Gola, Joanna. “10 Commandments of Applied Econometrics” (series), Bright Data, SAS Blog; first in the series accessed here: https://blogs.sas.com/content/brightdata/2017/03/01/10-commandments-of-applied-econometrics-or-how-not-to-sin-when-working-with-real-data/

Priestley, Jennifer L. (2019) “The Good, The Bad, and the Creepy: Why Data Scientists Need to Understand Ethics.” SAS Global Forum, Dallas, April 28-May 1, 2019.

Further Reading:

Griliches, Zvi (1985) “Data and Econometricians–The Uneasy Alliance.” The American Economic Review, Vol. 75, No. 2, Papers and Proceedings of the Ninety-Seventh Annual Meeting of the American Economic Association, pp. 196-200. Accessed at http://www.jstor.org/stable/1805595 on October 7, 2019.

DeMartino, George and Deirdre N. McCloskey, eds. (2016). The Oxford Handbook of Professional Economic Ethics, Oxford University Press, New York.

Author contact:

Steven C. Myers
Associate Professor of Economics
College of Business Administration
The University of Akron
myers@uakron.edu
https://econdatascience.com
https://www.linkedin.com/in/stevencmyers/