Commentary on Mathematica’s “First Study of Its Kind” of PARCC
On October 5, 2015, Mathematica published a study comparing the Massachusetts Comprehensive Assessment System (MCAS) to the Partnership for Assessment of Readiness for College and Careers (PARCC) tests.
In this post, I comment on a number of details related to the Mathematica study, which does not support the “next generation” hype associated with supposedly Common-Core-aligned PARCC.
The study is entitled, “Predictive Validity of MCAS and PARCC: Comparing 10th Grade MCAS Tests to PARCC Integrated Math II, Algebra II, and 10th Grade English Language Arts Test.”
Even the title is problematic, for it implies that the study is predictive: that 10th-grade students completed MCAS and PARCC and were tracked across years to see the degree to which these two tests predict the success of these 10th graders once they reach college.
Not so. As it turns out, Mathematica administered these 10th-grade tests to current college freshmen. So, there is no prediction; later in the study, the researchers admit that they were only able to measure “concurrent validity.” At best, the study can assess the degree to which college freshmen who do well on 10th-grade MCAS and 10th/11th-grade PARCC have college grades that the researchers deem to indicate “college readiness.” For MCAS, researchers could have compared the students’ 10th-grade MCAS results to their college GPAs, but these students did not take PARCC when they were in 10th grade, so the best “predictive” evidence that the researchers had involved comparing the students’ 10th-grade MCAS results with the outcomes of the MCAS the students took in college for this study. The researchers call the correlations between the two MCAS administrations “strong,” but they are not:
The correlation of .71 for MCAS math means that only half (.71 x .71 = .50) of the variance in participants’ MCAS math results for the Mathematica study can be accounted for by the same participants’ MCAS math scores from when they were sophomores in high school.
MCAS ELA was even weaker, with a correlation of .51, which means that only about one fourth (.51 x .51 = .26) of the variance in participants’ MCAS ELA results for the Mathematica study can be accounted for by the same participants’ MCAS ELA scores from when they were sophomores in high school.
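The squared-correlation arithmetic above can be checked with a short sketch. The two correlations are the ones quoted in this post; the function and variable names are my own, chosen only for illustration.

```python
# Test-retest correlations quoted in this post:
# 10th-grade MCAS score vs. the MCAS component retaken in college.
retest_correlations = {"MCAS math": 0.71, "MCAS ELA": 0.51}


def shared_variance(r):
    """Proportion of variance shared by two measures with Pearson correlation r."""
    return r * r


for test, r in retest_correlations.items():
    print(f"{test}: r = {r:.2f}, shared variance = {shared_variance(r):.2f}")
```

Squaring a correlation gives the proportion of variance the two sets of scores have in common, which is why a correlation that sounds large can still leave half or more of the variance unaccounted for.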
These results are not strong, and the study is not predictive. In fact, Mathematica excuses the rush because Massachusetts “cannot wait that long to choose its assessments.” The researchers add that their choice to administer high school tests to college students “generates immediate evidence.”
Everything about the Common Core and its assessments has been a rush job. As it stands, the best that Mathematica can offer Massachusetts (and the remaining few PARCC states) is a distraction as it compared MCAS and PARCC to each other without focusing on the more pressing question of whether PARCC should be used to gauge “college readiness.”
Mathematica did not even address “career readiness.” Too nebulous a term, so let’s just dismiss it.
So, who completed the study? It started with 866 college students who graduated high school in Massachusetts. Participants were offered gift cards for general participation and more gift cards for answering correctly on two randomly selected test questions. The PARCC/MCAS results of 19 participants were removed from the analysis for being “outliers who did not complete the exam or did not make a good-faith effort to answer the questions correctly.” Thus, the final overall sample included 847 participants.
The comparisons of MCAS and PARCC were not impressive. Mathematica offered a correlation of .23 as the outcome measure of a PARCC ELA score (students only completed a single component, not the entire test) to college GPA. That same correlation was the outcome of MCAS ELA score (also only single component) to college GPA. However, Mathematica noted that determining college success “was not the original aim of MCAS.” Yet, for all of its “next generation assessment” hype, PARCC ELA yielded the same correlation to college GPA as did MCAS.
A correlation of .23 means that the variance in single-component PARCC ELA scores that was shared by college GPA was .23 x .23 = .0529, or just over 5 percent.
Thus, single components of PARCC ELA did not account for 95 percent of the variance (the differences) in college GPA for the participants in the Mathematica study.
The same was true of single-component MCAS ELA scores and college GPA.
As for PARCC math to college GPA, the correlation was .43. This value squared (.43 x .43) indicates that the proportion of variance in college GPA that was accounted for by a single component of PARCC math was .18, or 18 percent.
The PARCC single-component math correlation result means that for the college students in this study, 82 percent of the variance (the differences) in college GPA was not accounted for by participants’ PARCC math scores.
Moreover, even though PARCC single-component math beat out MCAS single-component math (correlation .36, resulting in a squared value of .36 x .36 = .13), only 13 percent of the variance (the differences) in college GPA could be accounted for by how those college students did on the single component of the MCAS math test.
Conversely, 87 percent of the variance (the differences) in the college GPAs of the study participants could not be accounted for by their scores on the single component of the MCAS math test they took.
Then again, MCAS was not designed to account for college GPA.
PARCC supposedly was, and in math it only beat out MCAS by five percentage points of variance accounted for.
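For readers who want to verify the variance figures above, here is a minimal sketch. The four correlations are those quoted in this post from the Mathematica study; the names in the code are hypothetical, and nothing here reproduces the study's own analysis.

```python
# Correlations between single-component test scores and college GPA,
# as quoted in this post.
gpa_correlations = {
    "PARCC ELA": 0.23,
    "MCAS ELA": 0.23,
    "PARCC math": 0.43,
    "MCAS math": 0.36,
}


def variance_explained(r):
    """Share of GPA variance accounted for by a test with correlation r."""
    return r ** 2


for test, r in gpa_correlations.items():
    explained = variance_explained(r)
    print(f"{test}: explained = {explained:.1%}, unexplained = {1 - explained:.1%}")

# Gap between PARCC math and MCAS math, in proportion of variance.
math_gap = variance_explained(0.43) - variance_explained(0.36)
print(f"PARCC math minus MCAS math: {math_gap:.1%}")
```

Using the unrounded squares (.1849 and .1296), the math gap comes to about 5.5 percentage points; the five-point figure comes from subtracting the rounded values of 18 and 13 percent.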
There really is nothing to see here, folks.
However, Mathematica presents the above weak information as though it is worthy of the computer screen upon which it is displayed:
Both the MCAS and the PARCC predict college readiness. Scores on the assessments explain about 5 to 18 percent of the variation in first-year college grades, depending on the subject.
The Mathematica researchers do acknowledge some of the study’s limitations:
The study sample is limited to enrolled college students at public institutions in the state, who might not be representative of the statewide population of high school students. One reason for this is that testing students who are already in college misses the students who did not enroll in college or who dropped out of college before the spring semester. Another reason is that even for the test-takers in the study, students’ academic growth since 10th grade might differentially affect performance on the PARCC or MCAS tests. In addition, due to the time burdens of completing these exams in full, the study could recruit students to take only one component of the MCAS (which has two components of interest) or the PARCC exam (which has five components of interest). As a result, our analysis depends on additional assumptions to predict the combined validity of multiple test components at the same time.
The college students in the study did not take the entire PARCC or MCAS, just a slice of PARCC or MCAS.
And again with the need for the rush job:
Addressing these methodological concerns would require a longitudinal study that tracks the outcomes of students over three years from the point when they complete each exam in 10th grade through the end of their first year in college. Policymakers choosing between the two exams in 2015 cannot wait that long to make decisions about the tests.
What follows is indeed a terrible truth:
This is also the first study of its kind. To date, no reliable evidence demonstrates whether the new Common Core-aligned assessments provide accurate information about which students are prepared for success in college.
The Mathematica study, which is not a true prediction study of MCAS and PARCC because Massachusetts cannot wait, is the first study of its kind. Consortium tests were promoted in 2009 as President Obama and US Secretary of Education Arne Duncan dangled that federal $350 million before the eyes of governors who had already signed on for Common Core.
The $350 million was part of $4 billion in stimulus money dished out in the name of Obama’s Race to the Top. Four billion, and not a dollar spent to research either Common Core or its federally-promoted consortium assessments prior to RTTT-incited adoption.
A 2015 study should not be the first to state that its work “provides a model for other states considering difficult choices about whether to change their current statewide assessment systems.”
And again, never mind even discussing the ridiculous notion that “career ready” can somehow be measured by consortium assessments. As it stands, “college readiness” as assessed in this single study related to PARCC “college ready” outcomes was flimsy.
The researchers also analyzed raw scores rather than scaled scores. PARCC did not have its scaled scores available prior to the Mathematica study, and the researchers note, “it is possible that non-linear scaling on the PARCC exam could change the correlations we report in this study.” That means if raw score values do not convert into scale scores with a one-to-one, evenly spaced correspondence (skipping no values), then the Mathematica study results could be affected.
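To illustrate why scaling matters, here is a small sketch. Every number in it is invented: the raw scores, the GPAs, and both conversion formulas are hypothetical and have nothing to do with PARCC's actual scale. It only shows that a linear conversion leaves a Pearson correlation unchanged, while a non-linear one can move it.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical raw scores and college GPAs for six made-up students.
raw_scores = [5, 10, 15, 20, 25, 30]
gpas = [1.8, 2.4, 2.5, 3.0, 3.1, 3.6]

# Hypothetical conversions: one linear, one non-linear (quadratic).
linear_scaled = [650 + 5 * raw for raw in raw_scores]
nonlinear_scaled = [650 + 200 * (raw / 40) ** 2 for raw in raw_scores]

r_raw = pearson(raw_scores, gpas)
r_linear = pearson(linear_scaled, gpas)
r_nonlinear = pearson(nonlinear_scaled, gpas)

print(f"raw: {r_raw:.3f}  linear scale: {r_linear:.3f}  non-linear scale: {r_nonlinear:.3f}")
```

In this toy data the raw and linearly scaled correlations match, and the non-linear one differs. How large such a shift would be for real PARCC scores cannot be known from the study, since the scaled scores were not available.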
Too, the researchers assume “converting raw scores to scaled scores does not change students’ proficiency rating on the PARCC exam.” However, the researchers do not consider that raw-score-to-scale-score conversion could allow for students to be declared “PARCC proficient” based upon few items correct– that the scaled scores do not reveal, for example, that on the Louisiana version of “PARCC” for eighth-grade math, the “proficient” scaled score of 750 requires a student to answer only 26 out of 80 items (or 33 percent) correctly.
(Keep in mind that Louisiana did not contract with Pearson for PARCC tests, and the Louisiana public has seen no official documentation regarding its alleged “PARCC” tests.)
The researchers attempted to test the PARCC proficiency cut score (which the PARCC consortium set at 750 for scaled scores in September 2015 and which the researchers somehow translate from raw score to scale score for whatever PARCC test forms they use). The PARCC consortium wanted to set a proficiency cut score such that students achieving that cut score would have at least a 75 percent chance of a C-average in college. The researchers concluded that the PARCC proficiency cut score works because 89 percent of the students in their study who scored proficient on the single component of the PARCC test (in either ELA or math) had at least a C-average college GPA.
As previously mentioned, the researchers did not consider that a scaled score for PARCC proficiency could correspond to a notably low raw score. In other words, the researchers did not account for the potential “moving parts” behind those PARCC scaled scores.
Moreover, the researchers did not provide information about the percentage of students who did not score “proficient” on PARCC but who have GPAs of C or higher anyway. Thus, readers do not know, for example, if a large percentage of students who did not score “proficient” on PARCC have satisfactory college grades despite a poor PARCC testing outcome.
The fact that Mathematica did not report information on the GPAs of students who did not do well on PARCC (and whose college success challenges the marketed utility of PARCC) is a major limitation of the study.
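A sketch with made-up numbers shows why the missing figure matters. Only the 89 percent rate comes from the study as quoted in this post; the group sizes and the non-proficient counts below are hypothetical, since the study did not report them.

```python
def c_or_better_rate(students_with_c_or_better, group_size):
    """Share of a group earning at least a C-average college GPA."""
    return students_with_c_or_better / group_size


# Two hypothetical scenarios, each with 100 proficient and 100 non-proficient
# students. Both match the reported 89 percent for proficient students.
scenarios = {
    "A (cut score tells us little)": {"proficient": 89, "not_proficient": 85},
    "B (cut score tells us a lot)": {"proficient": 89, "not_proficient": 40},
}

rates = {}
for name, counts in scenarios.items():
    proficient_rate = c_or_better_rate(counts["proficient"], 100)
    not_proficient_rate = c_or_better_rate(counts["not_proficient"], 100)
    rates[name] = (proficient_rate, not_proficient_rate)
    print(f"Scenario {name}: proficient {proficient_rate:.0%}, "
          f"not proficient {not_proficient_rate:.0%}")
```

Both scenarios are consistent with the 89 percent the researchers reported, yet they lead to opposite conclusions about whether the cut score separates students who succeed in college from those who do not. Without the second number, readers cannot tell which scenario is closer to the truth.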
Mathematica also notes that participants who scored “college ready” on PARCC were less likely to require remedial coursework than participants who scored “proficient” on MCAS. However, only in a footnote did they provide critical information regarding students who did not achieve PARCC or MCAS proficiency and who still did not need remediation: 40 percent for MCAS, and 66 percent for PARCC.
This means that 66 percent of study participants who did not score proficient on PARCC still did not need to take a remedial course in college– definitely not good press for PARCC.
There is more to the study, but I will end my commentary here. What I offer in this post is concern enough about the inflated importance placed upon standardized testing in the American classroom in general and upon PARCC as some “next generation” super product.