Harold Henson - Program Evaluator At Large: 2015

Friday, 10 July 2015

Statistics without causality

We are all being told now that Big Data is the next big thing. It is difficult to argue that the dramatic drops in computer costs and data storage will not have a huge impact. Hard disks larger than five terabytes are becoming an inexpensive consumer item. This was inconceivable for many of us only a few years ago. Now a book on "Big Data" even made the best seller list.

That a book such as Big Data should appear is not a surprise. For those of us who have worked with the large datasets coming from the administration of programs for years, it will seem a long time coming. Although the current dramatic drops in computational costs were foreseen, it was not anticipated that there would be a basic challenge to how we think about statistics.

The need for some kind of shift in statistical thinking is obvious to applied statisticians working with big datasets. Those who had access attempted statistical analysis on the larger datasets as soon as it became technically feasible. As these databases grew, more and more explanatory variables tested as statistically significant. By the time regressions with a sample size of over 100K became possible, it was rare that an explanatory variable would not be seen as statistically significant.
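The effect of sample size on significance can be illustrated with a small simulation. This is only a sketch under stated assumptions: the outcomes are simulated, the true difference of 0.02 standard deviations is an invented value, and the test is a simple normal-approximation comparison of means rather than a full regression.

```python
import math
import random


def two_sample_z_test(a, b):
    """Return (difference in means, two-sided p-value) using a normal approximation."""
    n_a, n_b = len(a), len(b)
    mean_a = sum(a) / n_a
    mean_b = sum(b) / n_b
    var_a = sum((x - mean_a) ** 2 for x in a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (n_b - 1)
    std_err = math.sqrt(var_a / n_a + var_b / n_b)
    z = (mean_a - mean_b) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return mean_a - mean_b, p_value


def simulate(n, true_effect=0.02, seed=1):
    """Draw n treated and n comparison outcomes with a tiny, assumed true difference."""
    rng = random.Random(seed)
    treated = [rng.gauss(true_effect, 1.0) for _ in range(n)]
    comparison = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return two_sample_z_test(treated, comparison)


if __name__ == "__main__":
    # Same tiny effect each time; only the sample size changes.
    for n in (100, 10_000, 200_000):
        diff, p = simulate(n)
        print(f"n={n:>8,}  difference={diff:+.4f}  p-value={p:.4f}")
```

The underlying difference is the same in every run. What changes is the standard error, which shrinks with the square root of the sample size, so at the largest sample even this trivial difference is very likely to pass a conventional significance test.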

Where does that leave program evaluators? I, like many others, always relied on the test of statistical significance to justify the statement that a program had no impact. There was something very convenient about the statement, "No statistically significant evidence was found of program impacts". Now that virtually every variable is statistically significant, that convenient statement is no longer available.

In my opinion, big data will lead to a healthy reexamination of how evaluators think about causality. Unfortunately, we may be to some degree alone in this. Early on in the above-cited book, the authors state, "society will need to shed some of its obsession for causality in exchange for simple correlation". This is perfectly understandable, as the bulk of data analysts across the world spend their days trying to spot emerging trends and predict the future. Program evaluators are fairly rare in their exclusive interest in causality.

How will this play out for evaluators? No one can predict the future with certainty. However, I would suggest that there are two likely outcomes. First, rather than just focussing on the sign and significance of the estimate of program impact, we will look at the size of the coefficient. As confidence intervals shrink with growing sample sizes, the conversation may move to whether the estimated benefits are large enough to be considered significant from a policy standpoint rather than a statistical one. Second, we may move away from statistical inference to what is called statistical learning.
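The distinction between statistical and policy significance can be sketched in a few lines. Everything here is an assumption made for illustration: the function names, the idea of a fixed `policy_threshold` (a hypothetical minimum impact that decision makers would consider worth the cost), and the rule that the whole interval must clear that threshold.

```python
import math


def confidence_interval(estimate, std_dev, n_per_group, z=1.96):
    """Approximate 95% interval for a difference in means between two equal-sized groups."""
    std_err = std_dev * math.sqrt(2.0 / n_per_group)
    return estimate - z * std_err, estimate + z * std_err


def classify(estimate, std_dev, n_per_group, policy_threshold):
    """Return (statistically_significant, policy_relevant) for an impact estimate.

    policy_threshold is a hypothetical minimum impact worth acting on;
    requiring the whole interval to exceed it is one possible rule, not a standard.
    """
    low, high = confidence_interval(estimate, std_dev, n_per_group)
    statistically_significant = low > 0 or high < 0  # interval excludes zero
    policy_relevant = low >= policy_threshold        # interval clears the threshold
    return statistically_significant, policy_relevant


if __name__ == "__main__":
    # Invented numbers: a 50-dollar earnings gain, outcome standard deviation of 2000.
    for n in (100, 10_000, 1_000_000):
        low, high = confidence_interval(50.0, 2000.0, n)
        print(n, round(low, 1), round(high, 1), classify(50.0, 2000.0, n, 500.0))
```

With these invented figures, the interval at a million observations per group is narrow enough to exclude zero, yet it sits far below the 500-dollar threshold. The estimate is statistically significant without being significant for policy.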

Recent books have proposed that we evaluate the quality of statistical models by how well they predict outcomes, rather than by the traditional R-squared and p-values. Books such as An Introduction to Statistical Learning show that it is not necessary to use the whole sample to estimate the model; a substantial portion can be held out for testing purposes. The test of a model consists of seeing how well it can predict the values in the test portion of the data set. With such a procedure, evaluators could describe the state of the world for program participants versus non-participants. This would provide an interpretation of program evaluation results that clients may find even more intuitive than before.
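The hold-out idea described above can be sketched as follows. This is a minimal illustration and not the procedure from any particular book: the data are simulated, the model is a one-variable least-squares line, and the 30 percent test share and the variable names (`hours`, `earnings`) are assumptions.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(intercept, slope, xs, ys):
    """Root mean squared prediction error on the given observations."""
    errors = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(errors) / len(errors))


def holdout_rmse(xs, ys, test_share=0.3, seed=0):
    """Fit on a random training portion and score on the held-out remainder."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    rng.shuffle(indices)
    n_test = int(len(indices) * test_share)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    intercept, slope = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
    return rmse(intercept, slope, [xs[i] for i in test_idx], [ys[i] for i in test_idx])


if __name__ == "__main__":
    rng = random.Random(42)
    hours = [rng.uniform(0, 40) for _ in range(5_000)]           # simulated program hours
    earnings = [200 + 15 * h + rng.gauss(0, 50) for h in hours]  # simulated outcome
    print(f"held-out RMSE: {holdout_rmse(hours, earnings):.1f}")
```

The held-out error is expressed in the units of the outcome itself, which is part of why it may be easier to explain to clients than a p-value. In this simulation it should come out close to the noise standard deviation of 50 that was built into the data.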

The old problem of self-selection will remain. Still, with larger samples, approaches such as matching may even work better. I am optimistic about the future, although it will not be without challenges.

How many extra drives will you need in the age of Big Data?

Friday, 5 June 2015

SPSS 40 Years Later

SPSS was the first package that I used as a third-year student studying econometrics. This was in 1978, almost 10 years after the package first appeared. Almost 40 years later, I found myself at a site where SPSS was the only package available. What had changed?

It is certainly true that there is a nice user interface on top of the old system. In fact, it is probably the easiest package for anyone with no experience to do some very basic analysis. However, at the core of the system there are still bugs. It is hard to believe that after all this time basic operations can be unreliable.

What I ran up against were problems with the equation parser. Some commands were not able to translate simple equations and would fail. How to prevent this is something that one would study in an undergraduate computer science course. I myself have written complex equation parsers in C++. It was difficult to do, but once it was done the parsers were robust and reliable. It is this background that gives me the confidence to be surprised at this result.

Initially, however, I thought that it must have been me that was at fault. After all, it is a package that has been around for 40 years. Fortunately, Google was my friend, and there were others who had experienced the same thing. Further research revealed that there have even been discussions of the reliability of the actual estimates coming from statistical software. These discussions date back to articles in reviews such as the Journal of Economic Literature in 1999.

Why does this situation exist? Basically, people are too trusting and do not complain enough. Now the decisions about statistical software are made by non-statisticians. Unfortunately, they trust the men in suits from big companies when superior free software such as R exists.

Sunday, 3 May 2015

Can Program Evaluators do anything about data quality?

Since I started in evaluation in the early 90s, my fellow evaluators and I have always complained about the quality of the data. The only good that came from the complaints was that we were able to justify doing expensive surveys to replace the data that was not properly collected in the first place. It seemed like this could go on forever, but the response rates to surveys just kept getting lower and the computers that we used to analyse administrative data kept getting more powerful.

It is 2015, and all the years of complaining have not done anything. This is even after the Canadian Auditor General found in 2009 that the bulk of the federal program evaluations suffered due to lack of data. However, I would propose that there is something that can be done. I was fortunate enough to be given some time to think about the challenge. After a review of the literature, it became clear that many of the tools of program evaluation could be applied to data quality.

Will it be an easy sell? I doubt it. Everyone wants the quality of data to improve, but they think that someone else should do it. How many times have you heard about problems being identified, but no one actually rolls up their sleeves and replaces the wrong number with the right number? Fortunately, program evaluation on its good days can be an agent for organizational change.

The only downside that I can see to this is that we might overreach our expertise. We can't forget that there are a lot of people who are trained in computer science and feel that they have something to say about data quality. And to a large extent they are right. For example, program evaluators may never have intelligent opinions on how pure an implementation of SQL should be. However, it is evaluators who are well positioned to assess whether the data used by an organization is good enough to make strategic decisions.

At the CES meeting in May 2015 in Montreal I will be floating my approach. If I am crazy, at least it is for a noble cause. Still, I think that program evaluation has something to offer. If you are interested, check out my paper at http://henskyconsulting.com/services/data-quality-evaluation.

Harold Henson Program Evaluation Data Quality CES Montreal

Can you spot the outlier?