
Now We Know Why Obama Doesn’t Understand VAM

March 16, 2013

In order to truly understand value added modeling (VAM), forget the likes of me and of others who hold degrees in mathematics, or statistics, or measurement. Forget that we offer solid, detailed discussions of the problems of VAM. Forget also that those who formerly promoted VAM, like Louisiana’s George Noell, are mysteriously “no longer associated with the project.”

According to Michael Bloomberg, just ask a banker.

That’s right.  Banker and former director of the Office of Management and Budget for the Obama administration Peter Orszag has written an enlightening piece for Bloomberg.com explaining that VAM really does work.  According to Orszag, VAM can determine “which teachers are best.” Now, mind you, I’m no banker, but I would like to offer my thoughts on Orszag’s very positive article on the value of the value added.

First, let me begin with Orszag’s statement regarding “promoting the most talented teachers.” What, exactly, is a “most talented teacher,” according to corporate reform? One whose students score highest on standardized tests, of course:

One way of measuring a teacher’s effectiveness has been to see how much his or her students’ test scores rise. This kind of “value added” measure is straightforward and can easily be used to weed out bad teachers and promote better ones. [Emphasis added.]

According to our banker, VAM is the answer to the “teacher problem.”  And remember, according to corporate reform, the teacher must be the problem if the test scores are not stellar.

In a “sleight of word” in his “study,” Orszag decides to narrow the problems of VAM in his next statement:

Critics complain, however, that this measurement has two potential flaws:

In short, VAM has only “two potential flaws” because Orszag decided such was true.

Let me pause here to state that as an expert in statistics and as one who has written detailed accounts of the problems with VAM such as this discourse to Louisiana legislators, I did not (I could not) limit my discussion to only two flaws. VAM is replete with problems, not the least of which is the problem of data integrity and management of the so-called pilot studies purporting to support VAM. No study is ever better than the quality of its data. Neither will be any “testing” of teachers using VAM.  Data collection for a high-stakes measurement situation must be flawless.

Orszag does not touch the data integrity issue. He does not address the erratic classification issue.  He does not address the limitations of using hierarchical linear modeling (the statistical analysis commonly employed in VAM) in pinpointing “causes” for student test scores. He does not address the huge validity issue of using tests designed to assess student achievement as a measure of “backdoor” teacher “achievement.”

Orszag limits the reader to these two VAM issues, which he presents as if these are the only two issues:

Critics complain, however, that this measurement has two potential flaws: Some teachers’ scores may rise not because they have performed so well in the classroom but merely because they have better students. And some teachers may push up their students’ scores by teaching to the test, rather than giving students the understanding of concepts that pays off in the long run. [Emphasis added.]

Orszag then offers "two important pieces of research" to "rebut both of these concerns." The first is a study sponsored by the Gates Foundation where students are "randomly assigned… to about 1600 teachers." In a glaring jump in logic, banker Orszag's next statement is:

The random assignment ensured that any observed improvement in the students’ test scores was caused by their teachers.

So, random assignment removes any and all other influences upon student achievement as measured by the standardized test? Not so if these students are actual human beings with independent wills. And not so if the measures (standardized tests) are not designed for the purpose that they are being used (a looming validity issue). And not so if there is anything else, anything at all, in the lives of these students other than their teachers.

Student learning can never be absolutely controlled. Period.
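A quick simulation makes the point. This is a purely illustrative sketch with made-up numbers, not the Gates data: even with perfectly random assignment and literally identical teachers, student-level noise alone produces a spread of apparent "teacher effects."

```python
import random
import statistics

random.seed(42)

# Hypothetical setup: 20 teachers, 25 randomly assigned students each.
# Every teacher has the SAME true effect; all score variation comes
# from student-level factors (home life, health, test-day luck).
TRUE_TEACHER_EFFECT = 0.0
teacher_means = []
for _ in range(20):
    # Student gain = true teacher effect + student-level noise (sd = 10 points)
    gains = [TRUE_TEACHER_EFFECT + random.gauss(0, 10) for _ in range(25)]
    teacher_means.append(statistics.mean(gains))

# Despite random assignment and identical teachers, the estimated
# "teacher effects" spread apart, driven purely by student-level variance.
spread = max(teacher_means) - min(teacher_means)
print(f"Range of estimated 'teacher effects': {spread:.1f} points")
```

Random assignment guarantees that differences are not *systematic*; it does not make them vanish, and a model that attributes every observed difference to the teacher is simply mislabeling noise.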

Orszag’s “outcome” statement is also suspect:

The Gates team… found, as non-randomized studies had also found, that value-added measures were predictive of student achievement.

This statement actually says nothing.  What does it mean that the measures were "predictive" of student achievement? I thought the focus here was on connecting student achievement to teacher effectiveness. Instead, Orszag connects VAM to student achievement. Orszag offers nothing solid, such as the reclassification rates of teachers given no change in student achievement. (To establish VAM as reliable, teachers who have not altered their teaching must be reclassified into their original categories at a high rate across multiple, subsequent measurements.  I have yet to see a VAM stability study offer such proof.)
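The stability problem is easy to demonstrate. In this hypothetical sketch (invented numbers, not any published VAM dataset), each teacher has a fixed true effectiveness, but each year's VAM score adds measurement noise somewhat larger than the signal, which is consistent with the modest year-to-year correlations commonly reported for VAM:

```python
import random

random.seed(1)

N_TEACHERS = 1000
# Each teacher's true effectiveness never changes between years.
true_effect = [random.gauss(0, 1) for _ in range(N_TEACHERS)]

def quintile(score, all_scores):
    """Rank a score into quintiles 0-4 within its year's distribution."""
    rank = sorted(all_scores).index(score)
    return rank * 5 // len(all_scores)

# Yearly VAM score = unchanged true effect + independent yearly noise.
year1 = [t + random.gauss(0, 1.5) for t in true_effect]
year2 = [t + random.gauss(0, 1.5) for t in true_effect]

q1 = [quintile(s, year1) for s in year1]
q2 = [quintile(s, year2) for s in year2]

same = sum(a == b for a, b in zip(q1, q2)) / N_TEACHERS
print(f"Teachers keeping the same quintile across years: {same:.0%}")
```

Even though not one teacher "changed," most are reclassified into a different quintile the following year, only modestly better than the 20% that pure chance would produce. That is the reclassification evidence a stability study would need to rule out, and which Orszag never mentions.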

Instead, Orszag offers yet another lukewarm outcome statement:

As [the Gates researchers] conclude, “our findings suggest that existing measures of teacher effectiveness provide important and useful information on the causal effects that teachers have on their students’ outcomes.”

Yet another uncommitted statement.  What happened to the earlier statement that VAM is “straightforward” and “can be easily used”?

In the next paragraph, Orszag notes the usefulness of subjective measures to assist VAM, which is already doing the “heavy lifting” in labeling teacher effectiveness. He notes that the Gates study found that “bad” teachers are so bad that their self-reporting can be trusted because it cannot conceal just how bad they are:

The Gates researchers also experimented with various supplements to a purely test-based metric, and found that although the value-added measure did the heavy lifting, student surveys and observational analyses of teaching quality were useful. Interestingly, they found that teacher analysis could be done without having observers make random visits to the classroom; allowing a teacher to submit a self-selected set of videos from the classroom worked just as well, because even the best classes conducted by bad teachers were worse than those from better teachers. [Emphasis added.]

This last statement begs the question: If even subjective self-reporting by “bad” teachers is so useful in determining teacher quality, why are we doing all of this testing? Could the supposed VAM “heavy lifting” be a sham?

As to Orszag’s addressing the “second” critique of VAM, teaching to the test, well, he writes that teaching to the test isn’t happening (never mind that this year I have been instructed to “expose” my students to items relating to three separate standardized tests). Orszag offers two “studies” as proof. First, he offers this proof, which inadvertently undermines his first point for rebuttal, that teachers’ scores rise on VAM due to their having better students:

The Gates team also partially addressed the second critique — that “good” teachers are only teaching to the test — by examining results from other measures of educational quality. For example, the researchers administered open-ended word problems to test students’ understanding of math. The teachers who were predicted to produce achievement gains on state tests produced gains two-thirds as large on the supplemental assessments. [Emphasis added.]

Are these teachers predicted to produce achievement gains because their students are already achievers?  Random assignment cannot account for this “what-comes-first-chicken-or-egg” scenario. Orszag does not address this issue.

And what of this “two thirds” gain on the supplemental assessments?  Orszag is writing that the students score higher on the standardized tests than they did on the open-ended word problems. That is, the students did not score as high on the non-standardized test as they did on the standardized test.   How is this proof of not “teaching to the test”?

Finally, Orszag offers this “proof” that teachers are not just teaching to the test (never mind the firsthand pressure I face as a teacher to do so):

An even more compelling rebuttal of the second critique, however, is found in a December 2011 paper by Raj Chetty and John Friedman of Harvard University and Jonah Rockoff of Columbia University. These researchers assembled a database of 2.5 million students in grades 3 through 8 along with 18 million English and math tests from 1989 through 2009. They then linked that database with income tax returns.

Their paper is fascinating because the researchers assessed how a high value-added teacher can influence students’ later earnings and other outcomes. Someone just teaching to the test, without improving the quality of education, wouldn’t be expected to have any lasting impact on students’ earnings. Yet Chetty and the others found big effects later on in students’ lives from having a higher value-added teacher. [Emphasis added.]

Okay. Here are some obvious issues: Are we to assume that there is a direct, otherwise-uninfluenced connection between my third- through eighth-grade math or English scores and my salary?  And is my math or English score for each year nothing more than the math or English teacher I had that year? Does not my choice of profession have any influence upon my salary? How about my work ethic? The region where I reside? The economy?

What of those who cheat on taxes? This is a valid question concerning the integrity of the outcome data measure in the Chetty-Friedman study.

And should we assume that data from 1989 to 2009 really has captured the devastation that the 2012 nationwide emphasis on tying teacher jobs and reputations to test scores will bring?

Based upon the newly-instituted Race to the Top’s love affair with nonpartisan corporate reform and the unprecedented punitive measures levied against teachers, schools, and school districts, do corporate-reform-vested people like Orszag, and Bloomberg, and Obama have any clue regarding the potentially irreversible damage they are inflicting upon public education?

Don’t they realize that VAM is a disappointing, two-dimensional cardboard cutout for assessing educational quality?

Nope.

What we need to do, according to Orszag, is fire those bottom 10%. Sure, VAM isn’t perfect, but let’s use it, anyway, as a means of purging a profession to which we don’t even belong:

As the Gates report demonstrates, it’s possible to improve teacher effectiveness metrics. But that shouldn’t keep us from using the ones we have now. To help raise future productivity, we should set a clear goal for all school districts: to deny tenure to teachers in the bottom 10 percent of the distribution according to value-added measurements. That would still mean granting tenure to lots of teachers who perform worse than the average novice, but it would be a good start. [Emphasis added.]

Wow. And this man advised Obama.  Good thing Obama replaced him with Walmart.

15 Comments
  1. Reblogged this on Crazy Crawfish's Blog and commented:
    Hi Mercedes!
    Mind if I steal your blog and add your blog thoughts to my own? (not that you have a choice. :) )

    I think Peter Orszag makes a valid point. VAM is slightly better than nothing, depending on how you look at it... assuming perfect data and that children were simple computer models confined to a lab. It might be correct 5% of the time or so. If we only knew which of the 100% of the results we got comprised that 5%, perhaps we could act on it in a constructive and responsible manner? What does not make sense is using unproved, disproven and destructive free-market-inspired economic principles to replicate 5% of the "correct results" (which we can't identify) along with 95% inaccurate ones.
    Let’s say I want to kill sharks because I’ve decided they are “bad” fish. To a VAMite, the best way to do this is to create a giant barrel and dredge it along the ocean floor, scooping up the schools of tuna the sharks like to hang out with. Now we must eliminate the shark. To do that I could figure out an accurate way to identify a shark, maybe just hire a good fisherman (principal) that can identify them and remove them from the school. Not much money to be made doing that, and how will I sell all my canned tuna with all this live fresh tuna swimming around? So I decide the best way for my tuna cannery and gun smithing business interests is to take a shotgun and shoot all the fish in the barrel. Success! I can now claim I killed a shark and wave it around for all to see. Tada! I killed a “bad” fish with a bullet. Now our tuna supply is saved!
    Hmm. . . I also killed all the other fish by letting out the water or riddling them with bullets… But wait! Double kaching! More cheap fodder for my factory!
    Now I must sell this idea to the masses. . .
    Attention Masses: All we have to do is put all our fish in barrels and shoot them; then we will kill all the “bad” fish! (Sure, a few good ones had to be sacrificed, well, all of them, but we did get rid of the “bad” fish and we can always can (computerize) the casualties.) Bad fish problem solved and canned tuna all around!
    NOTE TO SELF: Hire Pierson to capture all the fish and put them in barrels. Tell Murdoch to supply the guns and get Gates to supply the bullets.

  2. I think you should cover my favorite VAM characteristic: that with certain student populations (such as high-performing students, and low-performing students without disabilities), VAM scores are negatively correlated with identifying truly effective teachers and end up masking ineffective ones, respectively.

  3. The planted axiom in all of this, of course, is that the test scores we’re using actually convey something meaningful and important. This has not, in my opinion, been definitively established. And if we don’t know that, the rest of the house of cards comes tumbling down, doesn’t it?

    And, crazycrawfish, I don’t think Mercedes minds getting re-blogged. I sure HOPE she doesn’t, because I do it to her all the time over in my little corner of the interwebz… ;)

  4. As a corollary to your axiom, while I think it’s fair to say they convey something, and something that may even prove to be meaningful in a limited context, what is not proven is whether the “something” that might be conveyed is meaningful to the context it is being applied to. These are student test scores, not teacher test scores. Sure, teachers have an impact on test scores; the absence of a teacher would probably yield a much lower one, for instance. But that does not take into account all the factors influencing a test score. In fact, what the study results do find is that teachers are not the sole determining factor, and possibly not even the most important one. Scores remain consistent only about 20-30% of the time year over year with no change in composition of students or teaching methods. That implies other factors not accounted for in the “model” impact 70-80% of the score.

    • I never found them to be of any real classroom use except for one thing: they told me pretty well what the family income and education level was.

  5. starrimom

    In our state, our teachers’ VAM scores depend upon how much our students’ scores are RAISED from a previous year’s score, even though the same score shows a year’s worth of growth. However, teachers do not yet know how much they have to be raised. Gifted students, many of whom score in the top percentile rank anyway, must raise their score 2 points MORE than other students. Therefore, those with less room to grow have to raise their scores the most. Fair?

  6. Reblogged this on Cloaking Inequity and commented:
    Take note policymakers. More from the actual statistical experts on the use of Value Added Models for Teacher Evaluation.

  7. Ken Mortland

    Ms. Schneider:
    The fundamental problem I see in the Gates MET (VAM) report is the impracticality of “random assignment of students.” It contradicts most of the factors influencing school, class, and student scheduling.

    1) Random assignment is so contrary to school, class, and student scheduling that many schools in the Gates study failed to honor or maintain that fundamental component of the study.
    2) Scheduling factors that prevent practical application of “random assignment of students” are multiple and varied. You probably know them better than I.
    3) Because those factors can’t be ignored and drive schedule making decisions that prohibit “random assignment of students”, the concept itself is unreproducible. Yet it is necessary for the application of VAM, according to MET.
    4) Therefore, regardless of the data behind the MET (VAM) report, that data is of no practical use. It is unreproducible. It has no practical application.
    5) “Random assignment of students”, Gates MET proposals, and VAM exit the picture as meaningless.

    Am I missing something scientifically or statistically valid here?

    Ken Mortland
    Retired Teacher (37 years in classrooms)
    Experimental Psychology major from college (1966)
    Active member of education reform work in Washington state

    • Hi, Ken. The best scenario is to have random assignment. However, the statistical analysis HLM (Hierarchical Linear Modeling) presumes what is called a nested design of set groups, so HLM can account for not having randomly assigned groups. The biggest problem I see is that HLM does not have the precision to say what exactly within a group (for example, a classroom) is “the reason” for a particular score. That is, the teacher cannot be separated from his/her classroom.
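To sketch the separability problem (a minimal illustration with made-up numbers, not the MET data): every classroom-level influence, whether teacher skill, peer composition, or scheduling, lands in the same classroom effect, and a nested model can only estimate their sum.

```python
import random
import statistics

random.seed(7)

# Hypothetical: each classroom's scores reflect a combined
# classroom-level effect (teacher contribution PLUS peer composition)
# that a nested model estimates only as a single lump.
results = []
for _ in range(30):
    teacher = random.gauss(0, 1)   # true teacher contribution (unobservable)
    peers = random.gauss(0, 1)     # peer/composition contribution (unobservable)
    scores = [teacher + peers + random.gauss(0, 2) for _ in range(25)]
    results.append((teacher, statistics.mean(scores)))

# The highest-rated classroom need not contain the strongest teacher:
# peers ride along inside the estimate, unseparated.
best_teacher, best_mean = max(results, key=lambda r: r[1])
print(f"Top-rated classroom's true teacher effect: {best_teacher:.2f}")
```

Two teachers of identical skill can thus receive very different "classroom effect" estimates, which is exactly the teacher-cannot-be-separated-from-the-classroom problem described above.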

      A second incredible problem is the integrity of data. Frankly, those handling the data do not know what they are doing, or they do not care, or both. Therefore, the data cannot be trusted. And no analysis is better than the quality of its data. Garbage in, garbage out.

      –Mercedes
