SAS Coding, Problem Based Learning and preparing economists for data science careers: frustration to elation

Published on October 16, 2018 on LinkedIn

I love coding, but I love copying others' code even more. There is an amazingly helpful SAS resource run by Lex Jansen (link below). I had a need and found the answer in a 2003 SUGI 28 presentation: “Paper 118-28: Renaming All Variables in a SAS® Data Set Using the Information from PROC SQL’s Dictionary Tables,” by Prasad Ravi, accessed at http://www2.sas.com/proceedings/sugi28/118-28.pdf

My need was to merge data from multiple years of the Index of Economic Freedom (https://www.heritage.org/index) across all countries, where every variable name is the same in every year. A simple merge overwrites each common variable with the values from the last dataset merged, so my attempt to create a classroom problem/team based learning exercise by merging the 2013 and 2018 data was frustrated. Sure, I could go into Excel and manually change every variable name in every sheet, but really, avoiding that is what coding is supposed to be for. The elegant macro by Prasad Ravi named “rename” worked, and I modified it to change the specific prefix used (from NEWNAME_ to my choice) and to protect certain key variables (such as ID) for merging.
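To make the overwrite problem concrete, here is a minimal sketch using two made-up tables. The dataset names (work.toy2013, work.toy2018), the variable names and the values are all hypothetical stand-ins, not the Heritage data.

```sas
/* Hypothetical toy data: two years that share every variable name */
data work.toy2013;
  input id score;
  datalines;
1 60
2 70
;
run;

data work.toy2018;
  input id score;
  datalines;
1 65
2 75
;
run;

/* Match-merge on id. Because score exists in both tables, only one
   score column survives, and it holds the values read from the last
   dataset listed (toy2018). The 2013 values are silently lost. */
data work.toy_merged;
  merge work.toy2013 work.toy2018;
  by id;
run;

proc print data=work.toy_merged;
run;
```

The printed table should show only id and a single score column with the 2018 values, which is why the variables need distinct names before the merge.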

I changed %macro rename(lib,dsn); to %macro rename(lib,dsn,prefix,protect); and all is well. So the 2013 table will have the prefix Y2013_, and I can protect the first 4 variables (country id, country name, webname and region), which are (or should be) common to all years of the data. My specific change is limited to the inclusion of prefix and protect, as illustrated in the PROC DATASETS part of the rename macro shown here:

proc datasets library=&LIB;
  modify &DSN;
  rename
    /* skip the first &protect variables, prefix all the others */
    %do i=&protect+1 %to &num_vars;
      &&var&i=&prefix&&var&i.
    %end;
  ;
quit;

and in the macro definition %macro rename(lib,dsn,prefix,protect);
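Putting the pieces together, a self-contained sketch of a macro along these lines might look like the following. It is not Ravi's published code: the ORDER BY varnum (so that "the first 4 variables" means the first 4 in dataset position), the %upcase calls, the NOLIST option and the toy tables work.ief2013 and work.ief2018 are all assumptions of this sketch.

```sas
/* Hypothetical toy tables: 4 key variables plus one measure each */
data work.ief2013;
  input id country $ webname $ region $ score;
  datalines;
1 Aland aland Europe 60
2 Bland bland Asia 70
;
run;

data work.ief2018;
  input id country $ webname $ region $ score;
  datalines;
1 Aland aland Europe 65
2 Bland bland Asia 75
;
run;

%macro rename(lib, dsn, prefix, protect);
  %local i num_vars;

  proc sql noprint;
    /* Variable names in dataset position order (assumption of this
       sketch), stored as &var1, &var2, ... */
    select name into :var1-:var9999
      from dictionary.columns
      where libname = "%upcase(&lib)" and memname = "%upcase(&dsn)"
      order by varnum;
  quit;
  %let num_vars = &sqlobs;   /* number of names read */

  /* Rename only when something is left after the protected block */
  %if &num_vars > &protect %then %do;
    proc datasets library=&lib nolist;
      modify &dsn;
      rename
        %do i = %eval(&protect + 1) %to &num_vars;
          &&var&i = &prefix.&&var&i
        %end;
      ;
    quit;
  %end;
%mend rename;

/* Prefix everything after the first 4 variables, then merge */
%rename(work, ief2013, Y2013_, 4)
%rename(work, ief2018, Y2018_, 4)

data work.ief_merged;
  merge work.ief2013 work.ief2018;
  by id;
run;

proc print data=work.ief_merged;
run;
```

With these toy tables the merged dataset should keep id, country, webname and region once and carry Y2013_score and Y2018_score side by side. Keep in mind that SAS variable names are limited to 32 characters, so a long prefix on an already long name can make the rename fail.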

My economic data analytics students will be “thrilled” (or I will always think they should have been) as we move from EDA in time series to EDA in cross-section data. It is an introductory undergraduate class, so nothing beyond simple presentations and simple statistics can be used, and many students have not yet completed their statistics courses. After this class they will start down the path of multivariate inference, model selection and specification. My class is meant to teach them to use SAS in Economic Data Analytics and to give them a platform to stand on, so that any further foray into Economic Data Science is possible. At a minimum they will take required courses in econometrics and economic forecasting, and they are encouraged to load up on other analytical electives.

Each problem they have to solve in their teams covers both the pillars of data science (acquisition, management and manipulation, analysis and reporting) and the pillars of applied econometrics (problem articulation, data cleaning and model specification; hat tip to Peter Kennedy, p. 361 of his A Guide to Econometrics, 6e). The students just finished a time series problem/team based learning exercise that required merging multiple datasets and a problem that requires extraordinary articulation. The question was “Do deficits go up under Republicans and down under Democrats?” A moment of reflection beyond your personal knee-jerk reaction reveals that the problem has many facets and, to the students' discontent, that there is no one simple answer. Dashed are their hopes that the first piece of analytical effort will reveal truth; much detective work must ensue. The next step is a problem/team based investigation of the role of economic freedom and progress throughout the countries of the world. I have until Monday to come up with the problem statement!

About Lex Jansen: If you do not know about the tremendous SAS resource at https://www.lexjansen.com then you should! I found a lot of help there after searching the documentation and support.sas.com without finding the specific help I needed. Apologies to others offering similar approaches to renaming with a macro; I do not know whether the Ravi paper is the first or the best. What I know is that it worked for me. Thanks to Prasad Ravi for writing it and to Lex Jansen for storing it.

I mentioned applied econometrics and data science above and have to pause and thank two friends, each of whom I met in person only once but with whom I corresponded on these topics. Peter Kennedy has passed, and I miss asking him about my next brainstorm. He came and spoke at our university at my invitation in 2004. I was so thoroughly convinced of his applied econometric processes that his book (which I have used since its first edition) and our conversations have affected my teaching to this day. While I had used problem based learning in my classes, it was Peter who encouraged it as the single best way to teach applied econometrics. As much as the students in the midst of a PBL process “hate” it, my reward is the number of students who report back on their success in their careers and say that it was the PBL that granted them that success. Peter also writes in his book that there is a “world of difference between applied and theoretical econometrics …” and that most university econometric “teaching is technique oriented rather than problem oriented.” He goes on to say that teaching applied rather than theoretical econometrics is hard and brings the teacher no professional prestige. To this I can attest, but my product is not professional articles; it is students in very high level data-analytic and data-science positions.

As to data science, it was work by Ken Stanford (@eKENomics), whom I first talked to when he was at SAS, that convinced me that economists make great data scientists. Like Peter, he confirmed much that I was already doing and teaching and helped me refine my approach to teaching. He was also responsible for helping our students qualify for a SAS Certificate in Economic Data Analytics offered jointly by our department and SAS. In a purely self-serving manner, I cite Ken’s contribution in a paper on our department’s website here. Ken serves on our Economics Advisory Board and has moved from SAS to Dataiku. His encouragement to our faculty at the 2017 NABE conference pushed us to forge ahead with our deeper curriculum changes to prepare economists for data science roles. Now if we can just get the administration to buy into our vision!

The test of time is how well these current students do in their careers, an answer we may have to wait years to know.