Now We Know Why Obama Doesn’t Understand VAM
In order to truly understand value-added modeling (VAM), forget the likes of me and of others who hold degrees in mathematics, or statistics, or measurement. Forget that we offer solid, detailed discussions of the problems of VAM. Forget also that those who formerly promoted VAM, like Louisiana’s George Noell, are mysteriously “no longer associated with the project.”
According to Michael Bloomberg, just ask a banker.
That’s right. Banker and former director of the Office of Management and Budget for the Obama administration Peter Orszag has written an enlightening piece for Bloomberg.com explaining that VAM really does work. According to Orszag, VAM can determine “which teachers are best.” Now, mind you, I’m no banker, but I would like to offer my thoughts on Orszag’s very positive article on the value of the value added.
First, let me begin with Orszag’s statement regarding “promoting the most talented teachers.” What, exactly, is a “most talented teacher,” according to corporate reform? One whose students score highest on standardized tests, of course:
One way of measuring a teacher’s effectiveness has been to see how much his or her students’ test scores rise. This kind of “value added” measure is straightforward and can easily be used to weed out bad teachers and promote better ones. [Emphasis added.]
According to our banker, VAM is the answer to the “teacher problem.” And remember, according to corporate reform, the teacher must be the problem if the test scores are not stellar.
In a “sleight of word” in his “study,” Orszag decides to narrow the problems of VAM in his next statement:
Critics complain, however, that this measurement has two potential flaws:
In short, VAM has only “two potential flaws” because Orszag decided such was true.
Let me pause here to state that as an expert in statistics and as one who has written detailed accounts of the problems with VAM, such as this discourse to Louisiana legislators, I did not (I could not) limit my discussion to only two flaws. VAM is replete with problems, not the least of which is the problem of data integrity and management of the so-called pilot studies purporting to support VAM. No study is ever better than the quality of its data. Neither is any “testing” of teachers using VAM. Data collection for a high-stakes measurement situation must be flawless.
Orszag does not touch the data integrity issue. He does not address the erratic classification issue. He does not address the limitations of using hierarchical linear modeling (the statistical analysis commonly employed in VAM) in pinpointing “causes” for student test scores. He does not address the huge validity issue of using tests designed to assess student achievement as a measure of “backdoor” teacher “achievement.”
Orszag limits the reader to these two VAM issues, which he presents as if these are the only two issues:
Critics complain, however, that this measurement has two potential flaws: Some teachers’ scores may rise not because they have performed so well in the classroom but merely because they have better students. And some teachers may push up their students’ scores by teaching to the test, rather than giving students the understanding of concepts that pays off in the long run. [Emphasis added.]
Orszag then offers “two important pieces of research” to “rebut both of these concerns.” The first is a study sponsored by the Gates Foundation in which students are “randomly assigned… to about 1600 teachers.” In a glaring jump in logic, banker Orszag’s next statement is:
The random assignment ensured that any observed improvement in the students’ test scores was caused by their teachers.
So, random assignment removes any and all other influences upon student achievement as measured by the standardized test? Not so if these students are actual human beings with independent wills. And not so if the measures (standardized tests) are not designed for the purpose that they are being used (a looming validity issue). And not so if there is anything else, anything at all, in the lives of these students other than their teachers.
Student learning can never be absolutely controlled. Period.
Orszag’s “outcome” statement is also suspect:
The Gates team… found, as non-randomized studies had also found, that value-added measures were predictive of student achievement.
This statement actually says nothing. What does it mean that the measures were “predictive” of student achievement? I thought the focus here was on connecting student achievement to teacher effectiveness. Instead, Orszag connects VAM to student achievement. Orszag offers nothing solid, such as the reclassification rates of teachers given no change in student achievement. (To establish VAM as reliable, teachers who do not alter their teaching must be reclassified into their original categories at a high rate across multiple, subsequent measuring times. I have yet to see a VAM stability study offer such proof.)
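To make the reclassification idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not an analysis of any real VAM data: the teachers are simulated, each has a fixed “true” effect that never changes, and each year’s score is that effect plus random noise. The function names, the quintile categories, and the `noise_sd` parameter are all hypothetical choices of mine. The sketch simply counts how many teachers land in the same category two years running even though nothing about their teaching changed.

```python
import random

def quintile_labels(scores):
    """Label each score 0 (lowest fifth) to 4 (highest fifth) by rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // n
    return labels

def same_category_rate(n_teachers=1000, noise_sd=1.0, seed=0):
    """Share of simulated teachers placed in the same quintile in two years.

    Assumption: a teacher's true effect is fixed; only measurement noise
    (standard deviation noise_sd) differs between the two years.
    """
    rng = random.Random(seed)
    true_effect = [rng.gauss(0, 1) for _ in range(n_teachers)]
    year1 = [t + rng.gauss(0, noise_sd) for t in true_effect]
    year2 = [t + rng.gauss(0, noise_sd) for t in true_effect]
    labels1 = quintile_labels(year1)
    labels2 = quintile_labels(year2)
    matches = sum(a == b for a, b in zip(labels1, labels2))
    return matches / n_teachers

if __name__ == "__main__":
    for sd in (0.0, 0.5, 1.0):
        print(sd, same_category_rate(noise_sd=sd))
```

With `noise_sd=0.0` every simulated teacher keeps the same category, which is the stability a reliable measure would need to approach. As the noise grows, more unchanged teachers change categories. How much noise real VAM scores carry is an empirical question this sketch cannot answer.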
Instead, Orszag offers yet another lukewarm outcome statement:
As [the Gates researchers] conclude, “our findings suggest that existing measures of teacher effectiveness provide important and useful information on the causal effects that teachers have on their students’ outcomes.”
Yet another uncommitted statement. What happened to the earlier statement that VAM is “straightforward” and “can easily be used”?
In the next paragraph, Orszag notes the usefulness of subjective measures to assist VAM, which is already doing the “heavy lifting” in labeling teacher effectiveness. He notes that the Gates study found that “bad” teachers are so bad that their self-reporting can be trusted because it cannot conceal just how bad they are:
The Gates researchers also experimented with various supplements to a purely test-based metric, and found that although the value-added measure did the heavy lifting, student surveys and observational analyses of teaching quality were useful. Interestingly, they found that teacher analysis could be done without having observers make random visits to the classroom; allowing a teacher to submit a self-selected set of videos from the classroom worked just as well, because even the best classes conducted by bad teachers were worse than those from better teachers. [Emphasis added.]
This last statement raises the question: If even subjective self-reporting by “bad” teachers is so useful in determining teacher quality, why are we doing all of this testing? Could the supposed VAM “heavy lifting” be a sham?
As to Orszag’s addressing the “second” critique of VAM, teaching to the test, well, he writes that teaching to the test isn’t happening (never mind that this year I have been instructed to “expose” my students to items relating to three separate standardized tests). Orszag offers two “studies” as proof. First, he offers this proof, which inadvertently undermines his first point for rebuttal, that teachers’ scores rise on VAM due to their having better students:
The Gates team also partially addressed the second critique — that “good” teachers are only teaching to the test — by examining results from other measures of educational quality. For example, the researchers administered open-ended word problems to test students’ understanding of math. The teachers who were predicted to produce achievement gains on state tests produced gains two-thirds as large on the supplemental assessments. [Emphasis added.]
Are these teachers predicted to produce achievement gains because their students are already achievers? Random assignment cannot account for this “what-comes-first-chicken-or-egg” scenario. Orszag does not address this issue.
And what of this “two thirds” gain on the supplemental assessments? Orszag is writing that the students gained more on the standardized tests than they did on the open-ended word problems. That is, the students did not gain as much on the non-standardized test as they did on the standardized test. How is this proof of not “teaching to the test”?
Finally, Orszag offers this “proof” that teachers are not just teaching to the test (never mind the firsthand pressure I face as a teacher to do so):
An even more compelling rebuttal of the second critique, however, is found in a December 2011 paper by Raj Chetty and John Friedman of Harvard University and Jonah Rockoff of Columbia University. These researchers assembled a database of 2.5 million students in grades 3 through 8 along with 18 million English and math tests from 1989 through 2009. They then linked that database with income tax returns.
Their paper is fascinating because the researchers assessed how a high value-added teacher can influence students’ later earnings and other outcomes. Someone just teaching to the test, without improving the quality of education, wouldn’t be expected to have any lasting impact on students’ earnings. Yet Chetty and the others found big effects later on in students’ lives from having a higher value-added teacher. [Emphasis added.]
Okay. Here are some obvious issues: Are we to assume that there is a direct, otherwise-uninfluenced connection between my third- through eighth-grade math or English scores and my salary? And is my math or English score for each year nothing more than the math or English teacher I had that year? Does not my choice of profession have any influence upon my salary? How about my work ethic? The region where I reside? The economy?
What of those who cheat on taxes? This is a valid question concerning the integrity of the outcome data measure in the Chetty-Friedman study.
And should we assume that data from 1989 to 2009 really has captured the devastation that the 2012 nationwide emphasis on tying teacher jobs and reputations to test scores will bring?
Based upon the newly instituted Race to the Top’s love affair with nonpartisan corporate reform and the unprecedented punitive measures levied against teachers, schools, and school districts, do corporate-reform-vested people like Orszag, and Bloomberg, and Obama have any clue regarding the potentially irreversible damage they are inflicting upon public education?
Don’t they realize that VAM is a disappointing, two-dimensional cardboard cutout for assessing educational quality?
What we need to do, according to Orszag, is fire those bottom 10%. Sure, VAM isn’t perfect, but let’s use it, anyway, as a means of purging a profession to which we don’t even belong:
As the Gates report demonstrates, it’s possible to improve teacher effectiveness metrics. But that shouldn’t keep us from using the ones we have now. To help raise future productivity, we should set a clear goal for all school districts: to deny tenure to teachers in the bottom 10 percent of the distribution according to value-added measurements. That would still mean granting tenure to lots of teachers who perform worse than the average novice, but it would be a good start. [Emphasis added.]
Wow. And this man advised Obama. Good thing Obama replaced him with Walmart.