By: Steven J. Klees
Harold R.W. Benjamin Professor of International and Comparative Education
University of Maryland, College Park
The story of fifth-grade teacher Sarah Wysocki (March 7) is tragic and, unfortunately, it is a tragedy being repeated across the country. By all reports and evaluations but one, Wysocki was an excellent teacher. The exception was a piece of statistical legerdemain called “value-added” that has been sweeping the country, fed in part by the Obama administration’s Race to the Top program, which mandates it. Value-added is a set of statistical procedures that purports to measure scientifically what has become the holy grail of education: the impact of a teacher on student test scores. These statistics said that Wysocki didn’t do enough to improve test scores, so she was fired.
Unfortunately, measuring value-added in practice is simply impossible, illogical, and unscientific, and you don’t have to be a statistician to understand why. Value-added statistical models are supposed to separate the impact of one factor — the teacher — from the literally dozens of other factors that contribute to a student’s performance on a test.
These include access to a home computer, other resources in the home, technology access in the schools, effort at homework, parents’ education, parents’ support, influence of previous teachers, peer effects, school climate, aspirations, access to health care, a better diet, a good night’s sleep, and many, many others. Even if you had information on all these factors, believing some statistical model could sort out the relative influence of each is wishful thinking. Moreover, value-added models have data on only a few factors, usually special education status, English proficiency, attendance, and eligibility for reduced-price lunch. Controlling for these and attributing the rest to the teacher makes no sense. The effect attributed to the teacher is always incorrect, since omitted factors could change the teacher impact measure in either direction. Statisticians can control for a few of these factors, and the analysis will always identify so-called meritorious teachers. But the results are completely illegitimate. Controlling for different factors will lead to different teachers being selected as meritorious, and there is no basis for deciding which factors to control for.
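A toy calculation can make this instability concrete. The sketch below is only an illustration under stated assumptions: the eight students, the two teachers "A" and "B", the gain formula, and the function names are all made up, and the "control" is a simple subtraction of group averages standing in for the far more elaborate models used in practice. The scores are constructed so that neither teacher has any effect at all, yet the teacher who comes out on top flips depending on which single factor is controlled for.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: (teacher, reduced_price_lunch, home_computer, test_gain).
# Gains are built as 10 - 4*lunch + 4*computer, so by construction
# neither teacher contributes anything to the scores.
STUDENTS = [
    ("A", 1, 1, 10), ("A", 1, 1, 10), ("A", 0, 1, 14), ("A", 1, 0, 6),
    ("B", 0, 0, 10), ("B", 0, 0, 10), ("B", 1, 0, 6), ("B", 0, 1, 14),
]

# Position of each controllable factor within a student tuple.
FACTOR_COLUMN = {"lunch": 1, "computer": 2}


def teacher_scores(students, control):
    """Average gain per teacher after subtracting the mean gain of all
    students who share the same value of the one controlled factor."""
    col = FACTOR_COLUMN[control]

    # Mean gain within each level of the controlled factor.
    gains_by_group = defaultdict(list)
    for row in students:
        gains_by_group[row[col]].append(row[3])
    group_mean = {g: mean(v) for g, v in gains_by_group.items()}

    # Whatever is left over is attributed to the teacher.
    leftovers = defaultdict(list)
    for row in students:
        leftovers[row[0]].append(row[3] - group_mean[row[col]])
    return {t: mean(r) for t, r in leftovers.items()}


def top_teacher(students, control):
    """Teacher with the highest adjusted score under the chosen control."""
    scores = teacher_scores(students, control)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    for factor in FACTOR_COLUMN:
        print(factor, teacher_scores(STUDENTS, factor), top_teacher(STUDENTS, factor))
```

In this made-up data, controlling only for lunch eligibility ranks teacher A first, while controlling only for home computers ranks teacher B first, even though both classes have the same average gain and the teachers are identical by design.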
Florida tried a value-added approach to merit pay for schools in the 1980s. That approach suffers from the same problems as a value-added merit-pay scheme for teachers. School district statisticians in Florida found that their value-added models identified different schools as meritorious depending on which factors they controlled for, and they realized there was no right way to decide what to control for. These statisticians were embarrassed when it came time to award money to meritorious schools, since there was no stable way to estimate which schools were meritorious and no rational basis for explaining to schools why they won or lost.
That is the situation we are now in. Teachers in the District, who generally know who the good teachers are, said they were “stunned” and “bewildered” by the firings. The decision to fire Ms. Wysocki would never have been made if statisticians had provided alternative estimates of teacher impact based on different value-added models. But they do not. Why? Partly, statisticians become fascinated with their models and want to believe in them. But partly, this is now a big business, and you don’t get paid if you offer equivocal answers. Unfortunately, while value-added approaches are very complex, they are simply not science.
I am not saying test scores are irrelevant to teacher assessment. Simple measures of a classroom’s gain in test scores, as one piece of information among many about a teacher’s performance, can be interpreted with knowledge of the local context as part of a professional peer evaluation system. But we can no more scientifically determine teachers’ effects on test scores than we can legislators’ impact on economic growth or poverty reduction. Sure, both have an impact, but the processes are too complicated for simplistic solutions.