The DC IMPACT Study Results: Already Obsolete
In October 2013, researchers Thomas Dee of Stanford University and James Wyckoff of the University of Virginia published (or someone somehow made public) a working paper on limited aspects of the District of Columbia Public Schools (DCPS) teacher evaluation system, IMPACT, which was introduced in 2009, during Michelle Rhee's tenure as chancellor of DC schools.
The beginning of the paper includes this disclaimer:
NBER (National Bureau of Economic Research) working papers are circulated for discussion and comment purposes. They have not been peer reviewed or been subject to the review by the NBER Board of Directors that accompanies official NBER publications. [Emphasis added.]
Nevertheless, this does not stop the media from offering the public such definitive headlines as "Study: Program in DC Removed Bad Teachers" and "Controversial Teacher Evaluation System is Working in DC, Study Says."
In this post, I would like to examine some details regarding what the Dee and Wyckoff working paper offers– and what it does not.
I located the copy of the paper used here in pro-privatizer Frederick Hess’ article on Dee and Wyckoff’s work. Yet I noticed that the Dee and Wyckoff appendix includes the heading, “Confidential Draft, Do Not Cite or Quote Without the Permission of the Authors.”
The paper is now public, and it was so before I publicly located it.
It is important to note that the results of the Dee and Wyckoff paper are already obsolete; DCPS has altered its evaluation system for 2013-14.
And now, some background on DCPS's IMPACT teacher evaluation system, including a comparison of what it was (i.e., what Dee and Wyckoff analyzed) and what it is now, based upon the 2013-14 changes.
Under the latest version of IMPACT, teachers annually receive one of five possible ratings: Ineffective, Minimally Effective, Developing, Effective, or Highly Effective.
During the years for which Dee and Wyckoff have data (2009-11) the “Developing” category did not exist.
Ratings are calculated based upon four areas: student achievement (i.e., student test scores); instructional expertise (based upon five observations– three by school administrators and two by "independent, expert practitioners"); collaboration ("measuring…working together on behalf of students"); and professionalism (e.g., following school policies and procedures).
Teachers receive a score ranging from 1 to 4 for each of the four components above. The components each have different weights (presented below); weighted scores are then summed and multiplied by 100, for a maximum total of 400 points. (The calculation is not a single score per category– see page 62 of this IMPACT document for an example of how the final score is calculated.)
IMPACT incorporates value-added modeling (VAM); however, at the time of the Dee and Wyckoff paper, only 17% of DC general education teachers were subject to VAM based on DC's Comprehensive Assessment System (CAS). For VAM teachers (Group 1) in the Dee and Wyckoff data (2009-11), individualized VAM scores (IVA) counted for 50% of the IMPACT score; the observation component (TLF) counted for 25%; and the remaining 25% was comprised of the collaboration component (Commitment to School and Community [CSC]) and a school-level VAM score (SVA). The "professionalism" category was assumed; therefore, teachers could only receive score deductions in this category.
For the 83% of general education teachers not teaching VAM courses (Group 2), 75% of the overall IMPACT score was based upon the observation component (TLF); the remaining 25% was comprised of an administrator-approved "Teacher-Assessed Student Achievement" (TAS) score, plus the SVA and CSC scores.
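The weighting scheme above can be sketched as simple arithmetic: each component is scored from 1 to 4, the weighted scores are summed, and the sum is multiplied by 100 for a maximum of 400 points. This is only an illustrative sketch; the exact split of Group 1's remaining 25% between CSC and SVA is not given in this post, so the 20%/5% division below is an assumption, and all the component scores shown are invented for the example.

```python
def impact_score(scores, weights):
    """Weighted sum of 1-4 component scores, scaled by 100 (max 400 points)."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return round(100 * sum(weights[k] * scores[k] for k in weights), 1)

# Group 1 (VAM) weights for the 2009-11 data Dee and Wyckoff analyzed:
# 50% IVA, 25% TLF, and a remaining 25% split between CSC and SVA.
# The 20/5 split shown here is hypothetical.
group1_weights = {"IVA": 0.50, "TLF": 0.25, "CSC": 0.20, "SVA": 0.05}

# Invented component scores for illustration only.
scores = {"IVA": 3.0, "TLF": 3.5, "CSC": 4.0, "SVA": 2.5}

print(impact_score(scores, group1_weights))  # 330.0
```

A teacher scoring a perfect 4 on every component would reach the 400-point maximum regardless of how the weights are divided, since the weights sum to 1.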
Thus, one can see that DC teachers are judged by inconsistent criteria. And that judgment is definitely high stakes. These criteria are in flux, with DC Chancellor Kaya Henderson assuring DC teachers that Group 1 will grow as standardized assessments are developed for more grades/courses.
The general education, Group 1 weights have also changed: 50% student achievement (35% IVA; 15% TAS); 40% TLF, and 10% CSC. "Professionalism" demerits may still be deducted.
As for general education, Group 2, 75% remains TLF; however, the remaining 25% has no school-level value added component and instead is comprised of 15% TAS and 10% CSC.
IMPACT has other school-based personnel “groups”: Group 3 includes special education teachers; Group 4, non-itinerant English Language Learner (ELL) teachers, and still more, for a total of 20 groups.
The current five classifications of DCPS teacher “effectiveness” include only two that are positive categories:
Ineffective: “Separation from the school system” following one ineffective score. No second chances.
Minimally effective: “Performance significantly below expectations,” but a second chance allowed before “separation.” No salary increase.
Developing: Not as low a score as “minimally effective,” but still “below expectations.” A third chance allowed before “separation.” No salary increase.
Effective: “Solid performance.” Normal advancement on pay scale. Possibility of base salary increases.
Highly effective: “Outstanding performance.” Eligible for annual bonuses “up to $5,000” and possible base salary increases “up to $7,000.”
In this recent revision of IMPACT, Henderson has reduced the VAM component for Group 1 from 50% to 35% and has removed the school-level VAM component from teachers’ scores altogether. This is a move in the right direction. VAM is junk. VAM is replete with problems. Nevertheless, Henderson insists that more DC teachers will be moving into Group 1 as VAM is instituted.
I challenge Henderson to teach in a DC classroom and not only subject herself to her own IMPACT, but specifically to Group 1 IMPACT.
Time to examine some issues raised in the now-obsolete Dee and Wyckoff paper.
To their credit, Dee and Wyckoff attempted to address the "elephant in the room": the unresolved issue of cheating in the DC schools during Michelle Rhee's time as chancellor. DC principal Adell Cothorne lost her job for insisting upon increased test security when she learned that teachers were violating test security protocols. Then there is the story of the "missing memo" that Rhee denies having read. Sadly, it seems that the investigation of Rhee and DC cheating has died.
Here is what Dee and Wyckoff offer regarding the integrity of the data supplied to them by DC Chancellor Kaya Henderson– who denied knowing about the missing memo yet was reportedly present in a meeting where Rhee discussed that very memo with her. (Again, the DC cheating issue has not been resolved):
Allegations of cheating on the DC CAS have received extensive coverage in the press. There are several reasons we believe these allegations are not empirically relevant for the analysis we present here. First and foremost, these test-based measures of teacher performance were only relevant for Group 1 teachers under IMPACT and these teachers constitute less than 20 percent of the analytical samples in our RD analysis. Furthermore, our results are robust to excluding these teachers (unclear who “these” are) from our analysis. [Comment added.]
Second, we observe performance separately on all of IMPACT's subcomponents (i.e., IVA, TLF, CSC, TAS, and CP) so we can distinguish performance gains related to CAS scores and those measured in other ways. Third, the most prominent allegations of cheating on the DC CAS actually pre-date the introduction of IMPACT (Gillum and Bellow, 2011; Brown, 2013). Fourth, during the IMPACT era, DCPS hired independent test-security firms (i.e., Caveon Test Security; Alvarez and Marsal) to assess potential violations. They identified critical violations in no more than a dozen classrooms per year.
Dee and Wyckoff believe that their DCPS data has been purged of any effect of cheating. They also believe that since the allegations “predate IMPACT,” the pressure on teachers to “game the system” automatically does not affect the IMPACT data that they are analyzing. Yet jobs are clearly on the line via IMPACT. Furthermore, standardized test scores comprised 50% of VAM teachers’ IMPACT scores in the data available to Dee and Wyckoff.
I am especially amused by Dee and Wyckoff's faith in the "test security" provided by both Caveon and Alvarez and Marsal (A&M). As for Caveon, its investigation was seriously limited by the restrictions Henderson placed upon its role. For example, even though Cothorne reported cheating at her school, Noyes, Caveon cleared the school of cheating.
The firm A&M is an even greater joke as a DCPS investigative agency. Here is an example of its handiwork in New Orleans following Hurricane Katrina:
Functionaries of the accounting firm Alvarez & Marsal, for example, which will have taken more than $50 million out of its New Orleans public schools’ operation by year’s end, were earning in the multiple hundreds of thousands, billing at anywhere from $150 to more than $500 per hour. The firm’s contracts continued unchallenged, despite the fact that one of its chief assignments — the disposition of left-over NOPS real estate — was being handled without the services of a single architect, engineer, or construction expert. This omission cost the city a year of progress in determining how and where to rebuild broken schools, and endangered hundreds of millions of dollars in FEMA money. It only came to light when the two Pauls (Pastorek and Vallas) were forced to hire yet more consultants for real estate duty, and to bring in the National Guard to oversee the engineering operations. [Emphasis added.]
Of course, there is also the A&M’s “help” to St. Louis, MO, Schools. Millions later to A&M, St. Louis schools faced what they hired the firm to prevent: bankruptcy:
A corporate consulting firm that touted its experience overhauling the St. Louis public schools to secure a $15.8 million no-bid contract with the New York City schools could see its track record tarnished by reports this week that St. Louis schools are headed for bankruptcy. After Alvarez & Marsal, which had no previous experience in education, finished revamping the St. Louis schools two years ago, it patted itself on the back for a job well done.
A representative of Alvarez & Marsal, William Roberti, said at the time: “The district is no longer on the brink of bankruptcy.”
It is now, according to reports this week from a committee of experts analyzing the school system’s finances. [Emphasis added.]
Despite A&M's shady history in both New Orleans and St. Louis, New York Schools Chancellor Joel Klein hired the firm. The result: students standing in the cold, waiting for buses that never showed. A&M also offered St. Louis schools busing confusion. New York's was worse; it required a "busing hotline":
When Schools Chancellor Joel I. Klein hired the consulting firm Alvarez & Marsal, without competitive bidding, one provision of the $15.8 million contract called for “restructuring the Office of Pupil Transportation to obtain annual cost savings.”
Simply put: drive down the cost of busing children to school. For the firm, which specializes in rescuing bankrupt companies, it was a logical task. After all, Alvarez & Marsal had helped the St. Louis schools carry out a consolidation of bus routes in 2003 as part of a broad effort to overhaul the financially strained school system. But as the plan in New York combusted this week, leaving shivering students waiting for buses in the cold and thousands of parents hollering about disrupted routines, the complaints threatened to morph into a renewed attack on Mr. Klein’s reliance on outside consultants. …
Closing out a week of confusion in New York, officials said, a special hot line set up to handle bus problems had received 2,043 calls as of 4:30 p.m. yesterday, down somewhat from the day before. [Emphasis added.]
Question: Why did DC’s Office of the State Superintendent of Education (OSSE) hire a firm that “specializes in rescuing bankrupt companies” to “investigate” DCPS cheating?
Another question: Why do Dee and Wyckoff place their confidence in a firm clearly NOT specializing in test investigation and clearly NOT devoid of scandal to ensure that DCPS 2009-11 data is devoid of the effects of cheating?
Consider this result of ME category teacher “improvement” reported by Dee and Wyckoff:
Interestingly, Table 5 indicates that the performance gains observed among teachers with ME ratings from AY 2010-11 are partly due to large improvements in the test performance of students (i.e., the IVA measure). [Emphasis added.]
Teachers under threat of dismissal improved their scores by raising student performance on tests. A suspect finding given DCPS history.
“Low Performing” Teachers
In their paper, Dee and Wyckoff examine two groups of teachers: Those classed as “minimally effective” (ME) and “highly effective” (HE). Dee and Wyckoff describe those classed as ME as being “low performing” teachers. They also note that these “low performing” ME teachers are noticeably fleeing the classroom following their first ME year. However, Mark Simon of the Economic Policy Institute raises a valid question regarding the Dee and Wyckoff results as indicating improved teaching:
Mark Simon, an education policy analyst with the Economic Policy Institute in Washington who has written about IMPACT and works with unions and school systems to design evaluation programs, called the study "exciting" but cautioned that it does not show that the district's evaluations have improved the quality of teaching. "IMPACT is designed to identify a very small number of teachers at the bottom end for firing and a slightly larger but still small number of teachers at the top end for bonuses. So it tends to be a motivator at the extremes," Simon said. "But for the vast majority of teachers, the question is whether IMPACT is helping them to teach better, and that question was not addressed by this study." [Emphasis added.]
Not only is teacher quality not guaranteed by higher student test scores or even higher observed ratings; it is possible that IMPACT consequences are detrimental to less experienced teachers. Consider this Dee and Wyckoff observation:
We also know from the DCPS data that IMPACT scores for teachers in their first two years of teaching average 17 points less than those with three or more years of experience.
Now note this explanation that Dee and Wyckoff provide to illustrate score increases in the ME group:
…We estimate that the typical teacher who entered DCPS in 2009-10 with no prior teaching experience improves by 24 IMPACT score points over the first three years of teaching.
If it takes new teachers three years to improve 24 IMPACT points, and the ME score range for the 2009-11 data was 175 to 249 points, then it is logical to assume that a number of new teachers could be dismissed under IMPACT in their second year of teaching. The revised IMPACT system adds the "Developing" category; yet DCPS teachers are still expected to achieve the "Effective" category by the end of their third year. A study examining the degree to which IMPACT is removing less experienced teachers from the classroom is warranted.
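The arithmetic behind this concern can be made concrete. Using only the figures above– roughly 24 points of improvement over the first three years (about 8 points per year) and an ME band of 175 to 249, with "Effective" beginning at 250– the sketch below shows how a novice teacher might run out of chances before clearing the ME band. The starting score of 230 is an illustrative assumption, not DCPS data.

```python
# Figures drawn from the post: ~24-point improvement over three years,
# ME band of 175-249 (Effective begins at 250).
ME_CEILING = 249
ANNUAL_GAIN = 24 / 3  # ~8 points per year of typical novice improvement

def years_to_effective(start_score):
    """Years of typical improvement needed to climb out of the ME band."""
    years = 0
    score = start_score
    while score <= ME_CEILING:
        score += ANNUAL_GAIN
        years += 1
    return years

# A hypothetical novice starting at 230 needs three years of typical
# growth to reach Effective– but an ME rating allows only one more year.
print(years_to_effective(230))  # 3
```

Under these assumptions, even a novice near the top of the ME band could not reach "Effective" within the two-year window the 2009-11 system allowed.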
Teach for America Recruits: Not Really IMPACTed
There is, however, a group of novice "teachers" who remain virtually untouched by the effects of IMPACT: Teach for America (TFA) recruits. These "teachers" are provided with five weeks of training and sent into DC's classrooms for a usual stint of two years. They leave, and more TFA recruits arrive. Constant churn. As such, TFA recruits could perpetually be classified as ME and still be allowed in the classroom to fulfill their two-year agreements. Based on information from the TFA DC webpage, 280 DC teachers are TFA recruits, and they teach 14,000 DC students.
Nevertheless, DCPS offers the following as its purpose for creating IMPACT:
Recognizing the importance of ensuring that talented and committed individuals are serving all of our students, DCPS developed IMPACT, our system for assessing the performance of teachers and other school-based staff. [Emphasis added.]
The duplicity is astounding. How is employing temporary TFA recruits– who escape any possible career detriment from IMPACT– in any way connected to "ensuring talented and committed individuals" become (and remain) DCPS teachers? Yet Michelle Rhee, who instituted IMPACT, was once one of these TFA teachers. Rhee remained in the classroom for three years. Had her three TFA years been in DC ending in 2013-14, her IMPACT score would only have needed to reach the "Developing" level, not the "Effective" category, before she moved out of the classroom and, with her limited classroom experience, eventually followed the TFA scheme to lead American education.
IMPACT has a built-in, TFA-friendly, gaping hole.
TFA DC currently advertises that 2,000 TFA alumni are in the DC region.
They are no longer teachers. They have moved on. They will not know the career-threatening pressure of IMPACT.
The Financial Cost of Teacher Evaluation
In their paper, Dee and Wyckoff allude to an issue that any district deciding to emulate the DC IMPACT model must confront: financial cost:
Any teacher-evaluation system will make some number of objectionable errors in how teachers are rated and in the corresponding consequences they face. Districts may be able to reduce these errors through more sophisticated systems of teacher assessment (e.g., higher-frequency observations with multiple, carefully trained raters) but, in so doing, they will face both implementation challenges and possibly considerable direct financial costs.
There is the cost associated with the ever-increasing number of teachers who are to be rated via standardized tests. There is the cost of five classroom observations per year for each school-based staff member. And there is the cost of those teacher bonuses– initially borne using philanthropy funds and later transferred to the DC schools:
There’s plenty for DCPS stakeholders to dislike in the newly released school-based budget allocations proposed for 2012-13. While they reflect a two-percent increase in the Uniform Per Student Funding Formula, they will still mean larger class sizes–as in, fewer teachers–at the middle and high school levels, along with cuts in positions such as special education coordinator.
One of the drivers of this unpleasant news is the projected exhaustion of the private foundation money that has underwritten IMPACT bonuses for "highly effective" teachers under the terms of the 2010 collective bargaining agreement. After the current school year, the annual cost of the bonuses–about $7.2 million for the first two years of the program, according to DCPS–will be borne by the individual schools. For FY 2013, those bonus obligations will be loaded into the average cost of a classroom teacher, which will rise from $90,681 to $95,574. DCPS was clear in 2010 that this reckoning was coming. The $64 million pledged by the Broad, Arnold, Robertson and Walton foundations was a three-year ride set to end in the fall of 2012. What was largely unexpected was to see these costs passed down to the school level. [Emphasis added.]
And this is only the crisis incurred for HE teacher bonuses. Districts must balance teacher evaluation cost with the “impact” of the evaluation outcome.
For its phenomenal price tag (roughly $7.2 million per year in teacher bonuses alone), one must consider whether IMPACT is really purging DCPS of "bad" teachers or just creating a highly costly professional churn.
Limited Scope of Study and Stunted Application of Findings
Dee and Wyckoff analyzed only two years of IMPACT ratings (2009-10 and 2010-11), and their focus was on retention and score improvement in only two teacher classifications: ME teachers (those given two years to raise their scores beyond the ME category) and HE teachers (those deemed highly effective). Furthermore, Dee and Wyckoff restricted their focus to the general teacher population (Groups 1 and 2 only).
There is nothing wrong with researchers' restricting a study so long as they are clear about the restriction, and Dee and Wyckoff were clear regarding their study parameters. The problem comes when the media seize upon restricted work as global and undisputed proof of a program's value. IMPACT has already changed: both the number of IMPACT categories and the calculation of IMPACT scores have changed for 2013-14. Thus, the limited scope of Dee and Wyckoff's work is overshadowed by the fact that their findings cannot be extended to the current state of DCPS's IMPACT. The findings can only be discussed in terms of the way that IMPACT "used to be."
Dee and Wyckoff have written a working paper on a version of IMPACT that is now obsolete. They used data that originated with both a former chancellor and a current chancellor who themselves have not been cleared of involvement in DC’s cheating scandal. The results of their work raise questions as to whether novice teachers are unduly penalized in a teacher evaluation system likely to purge them first– all the while allowing TFA recruits to escape any meaningful evaluation due to both their limited stay and constant replacement in the classroom. The work also raises questions of the phenomenal cost of instituting (and “improving”) the IMPACT system.
These are the associated Dee and Wyckoff study issues that should be the focus of media attention.