September 2019 - Econ Data Science

Do you know that data lies? Come talk to me at MWSUG (Midwest SAS Users Group Conference) and I will help you protect yourself against lying data.

MWSUG, September 29-October 1, 2019, Chicago: http://www.mwsug.org/2019/index.html

One of the papers I am presenting is on time series data. Time series analysis is pretty intense, and its modeling is as much art as science. My paper is BL-101, “Exploring and characterizing time series data in a non-regression based approach.”

Nobel Prize-winning economist Ronald Coase famously said: “If you torture the data long enough, it will confess.” It will confess to anything, just to stop the beating. I think there is a corollary to that: “If you don’t do some interrogation, the data may just tell a lie, perhaps the one you want to hear.”

Consider the following graph, assembled with no torture at all and not even a short, painless interrogation. The graph shows that the money supply and the federal debt track each other’s time paths very closely. It tempts you to believe what you see. Do you believe that when the money supply increases, we all have more to spend and this translates into debt? Do you have an alternative explanation for this movement? If so, this graph confirms your thinking, and you decide to use it to demonstrate or prove your point. Good stuff, huh?

Sadly, you just fell prey to confirmation bias. Because you failed to investigate the data generating process of the series, you fell for the lying data. You have found correlation, not causation. In fact, you may have found a random walk. Don’t cheer yet; that does nothing to make your case.

“But,” you think, “I like that graph, and besides, the correlation between money supply and debt is really high, so it has to mean something! Right?”

Sadly, no.

Mathematically, if the series are random walks, then the changes in each series are generated only by random error. If the random errors driving the two series are unrelated, the correlation between the changes in the two variables will be very low.

A random walk takes the form of

y(t) = y(t-1) + e

which says that the value of the variable observed at time t equals its immediately preceding value plus a random error term. The problem becomes clear when you subtract y(t-1) from each side. This yields a new and horrifying equation, which says that any observed growth is purely random error:

Change in y = y(t) - y(t-1) = e.
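Here is a minimal Python sketch of that subtraction. It assumes NumPy and normally distributed errors, both chosen for illustration and not taken from the paper. `simulate_random_walk` is a hypothetical helper name. The code builds the walk one step at a time from y(t) = y(t-1) + e. It then differences the walk and checks that what remains is the error term.

```python
import numpy as np


def simulate_random_walk(n, y0=0.0, sigma=1.0, seed=0):
    """Build y(t) = y(t-1) + e(t) step by step; return the levels and the errors."""
    rng = np.random.default_rng(seed)
    e = rng.normal(0.0, sigma, size=n)  # assumption: independent, mean-zero normal errors
    y = np.empty(n)
    prev = y0
    for t in range(n):
        y[t] = prev + e[t]  # today's value = yesterday's value + random error
        prev = y[t]
    return y, e


y, e = simulate_random_walk(500)

# Change in y = y(t) - y(t-1): differencing strips away the history and leaves only e
# (equal up to floating-point rounding, hence allclose).
change = np.diff(y)
print(np.allclose(change, e[1:]))  # True
```

No information about the past values of y survives the differencing. That is why the changes carry no pattern to exploit.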

Since you cannot write an equation that predicts random error, it stands to reason that you can neither explain current changes nor forecast future changes in the variable of interest.
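Simulated data makes this concrete. In a random walk, yesterday's level is an excellent guide to today's level, but yesterday's change says essentially nothing about today's change. The Python sketch below compares the lag-1 autocorrelation of the levels with that of the changes. It assumes NumPy and normal errors, and `lag1_autocorr` is a hypothetical helper name.

```python
import numpy as np


def lag1_autocorr(x):
    """Pearson correlation between x(t) and x(t-1)."""
    x = np.asarray(x, dtype=float)
    return np.corrcoef(x[1:], x[:-1])[0, 1]


rng = np.random.default_rng(42)
e = rng.normal(size=5000)  # assumption: independent, mean-zero normal errors
y = np.cumsum(e)           # random walk: y(t) = y(t-1) + e(t)

print(f"levels:  {lag1_autocorr(y):.3f}")           # typically very close to 1
print(f"changes: {lag1_autocorr(np.diff(y)):.3f}")  # typically very close to 0
```

The levels look highly predictable only because each value contains all the past errors. Once those are differenced away, there is nothing left to forecast.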

Consider the next graph. The percentage change in the money supply over the previous year is graphed against the percentage change in debt over the previous year. See a definite pattern? I do not.
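A year-over-year percentage change compares each observation with the one a year earlier. Below is a minimal Python sketch of that calculation. The helper `yoy_pct_change` and the sample numbers are hypothetical. The frequency of the graphed series is not stated, so `periods_per_year` is left as a parameter: 12 for monthly data, 4 for quarterly data.

```python
def yoy_pct_change(values, periods_per_year=12):
    """Percentage change versus the same period one year earlier.

    Assumes evenly spaced observations. The first `periods_per_year`
    entries have no prior-year value, so they are returned as None.
    """
    out = []
    for t, current in enumerate(values):
        if t < periods_per_year:
            out.append(None)
        else:
            prior = values[t - periods_per_year]
            out.append(100.0 * (current - prior) / prior)
    return out


# Hypothetical quarterly values, made up for illustration only.
levels = [100, 120, 150, 200, 110, 150, 165, 180]
print(yoy_pct_change(levels, periods_per_year=4))
# [None, None, None, None, 10.0, 25.0, 10.0, -10.0]
```

If the series is a pandas `Series`, `s.pct_change(periods=12) * 100` performs the same monthly calculation, with NaN in place of None for the first year.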

The correlation between money supply and debt in the first graph is 0.99, where 1.0 would indicate a perfect positive linear relationship. In the second graph, the correlation falls to 0.07, meaning there is almost no relationship between them.
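A high correlation in levels that vanishes in changes is what unrelated random walks tend to produce. The Python sketch below does not use the money supply or debt data. It simulates many pairs of random walks that are unrelated by construction and averages the absolute correlation, first in levels and then in changes. NumPy is assumed, the helper names are hypothetical, and the changes here are simple first differences rather than the year-over-year percentage changes in the second graph.

```python
import numpy as np


def corr(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    return np.corrcoef(a, b)[0, 1]


def spurious_correlation_demo(n_pairs=200, n_obs=500, seed=1):
    """Average |correlation| of unrelated random walks, in levels and in changes."""
    rng = np.random.default_rng(seed)
    level_corrs, change_corrs = [], []
    for _ in range(n_pairs):
        # Two random walks built from independent errors: no true relationship at all.
        x = np.cumsum(rng.normal(size=n_obs))
        y = np.cumsum(rng.normal(size=n_obs))
        level_corrs.append(abs(corr(x, y)))
        change_corrs.append(abs(corr(np.diff(x), np.diff(y))))
    return float(np.mean(level_corrs)), float(np.mean(change_corrs))


levels, changes = spurious_correlation_demo()
print(f"mean |corr| in levels:  {levels:.2f}")
print(f"mean |corr| in changes: {changes:.2f}")
```

Expect the levels average to come out many times larger than the changes average, even though no pair has any true relationship. That gap is the random-walk trap.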

The lesson: investigate more. Torture is not necessary, but no investigation at all is never desirable.

Economists are obsessed with determining the data generating process (DGP), which takes a lot of investigation. Economists know traps like random walks, and they know ways to find the true relationship between money supply and debt, if there is one. Ignore the DGP and your quick results could be a lie. Torture the data, and again you could find a lie (it just may take a lot of wasted effort).
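One familiar tool for spotting a random walk before trusting a correlation is a unit root test, such as the Dickey-Fuller test. It is regression based, so it is not the non-regression approach of the paper and is offered only as an illustration. The Python sketch below assumes NumPy, and `dickey_fuller_stat` is a hypothetical helper. It runs the basic version with a constant and no extra lags: it regresses the change in y on the lagged level, then compares the t-statistic on the lagged level with the asymptotic 5% critical value of about -2.86. For real work, an augmented test such as `adfuller` in statsmodels is the usual choice.

```python
import numpy as np

DF_CRITICAL_5PCT = -2.86  # asymptotic 5% Dickey-Fuller critical value (constant, no trend)


def dickey_fuller_stat(y):
    """t-statistic on rho in: change in y(t) = a + rho * y(t-1) + u(t), no lagged changes."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])  # constant and lagged level
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    sigma2 = resid @ resid / (len(dy) - X.shape[1])  # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)            # OLS coefficient covariance
    return coef[1] / np.sqrt(cov[1, 1])


rng = np.random.default_rng(7)
e = rng.normal(size=500)

random_walk = np.cumsum(e)  # y(t) = y(t-1) + e(t)

stationary = np.empty(500)  # y(t) = 0.5 * y(t-1) + e(t): mean-reverting, no unit root
stationary[0] = e[0]
for t in range(1, 500):
    stationary[t] = 0.5 * stationary[t - 1] + e[t]

for name, series in [("random walk", random_walk), ("stationary AR(1)", stationary)]:
    stat = dickey_fuller_stat(series)
    verdict = "reject unit root" if stat < DF_CRITICAL_5PCT else "cannot reject unit root"
    print(f"{name}: DF stat = {stat:.2f}, {verdict}")
```

A random walk usually fails to reject the unit root, while the mean-reverting series rejects it decisively. In other words, the test interrogates the data generating process instead of taking the graph at face value.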

So come take a random walk in Chicago with me at MWSUG.

After the conference, my paper will be available in the conference proceedings.


Time Series data will lie to you, or take a random walk in Chicago.
