
Twisted Validity: Guilting Parents to “Opt” Kids to Test in Order to Save Teachers and School

March 9, 2015

Test validity concerns how well a test measures what it is actually supposed to measure.

As such, student achievement tests should be confined to measuring student achievement– not used for grading teachers and schools. It really is that simple.


The validity issue has been completely ignored by George W. Bush’s No Child Left Behind (NCLB), by the Obama/Duncan NCLB waivers, and by both Elementary and Secondary Education Act (ESEA) reauthorizations currently in the House and Senate.

All of this works for those wielding the high-stakes-testing power wand and raking in financial rewards because of what I will term “the compartmentalized beauty for ed reform.” Here is how it works related to the Common Core State Standards (CCSS) assessment appendage, the Partnership for Assessment of Readiness for College and Careers (PARCC) test:

1. PARCC assessment vendor, Pearson, can make a test and call it a student assessment based upon CCSS.

2. Federal or state legislation can mandate that the student assessment scores are to be used to measure individuals who do not complete the tests, such as teachers, or even entities, such as schools.

3. The likes of Pearson can offer individualized score reports, as well as consolidated reports on schools, districts, states, or even the entire (dwindling) PARCC consortium. And since test companies like Pearson do not offer reports that directly pass judgment on non-testers (teachers, schools), they need not pay any attention to the high-stakes judgment passed on those non-testers– even though they know full well that their individualized student tests are used for exactly that purpose.

4. Those wielding test-driven reform (lawmakers, policy pushers) are not readily held responsible for violating test validity because if stupidity is written into the law, it somehow makes the violation non-existent.

5. Meanwhile, the actual test takers– the students– have undue pressure placed on them to carry both teacher careers and school destinies upon their young backs.

You see, in our little American education top-down “reform” hierarchy, as one goes down the chain and moves farther away from the non-classroomed– the legislators, policy opiners, and education profiteers– responsibility increases and power decreases. In the case of test-score-driven reform, the student is at the lowest level of this chain and is therefore the most susceptible to abuse for having incredible (dysfunctional) responsibility for “making” the system “work.”

Only just above the student is the teacher: Lots of responsibility, little power in this ed reform lunacy. Then above the teacher are the school-based admin; then, the local admin, and next, state admin.

And let us not forget the coercive, federal role in holding its funding to states hostage as part of this damaging chain.

In this test-score-dependent hierarchy, those higher on the chain (e.g., state, local, and school-based administrators, and even teachers) can add pressure to those lower on the chain (the lowest being students– with the pressure often funneled from admin to parents).

One way is by piling the guilt to “carry the fates” of teachers and schools upon those little backs.

Consider this letter from the Beauregard Parish School System (Louisiana). It is a prime example of a warped system of “accountability” in which the child is forced to carry the fates of adults.

Be sure to read it closely:

[Image: Beauregard Parish opt-out letter]

The language of the above “opt out” letter is a study in the structural damage introduced to American public education in 2002 by NCLB– a wrecking ball now swinging back our way via the test-driven ESEA reauthorization bills proposed in both House and Senate.

I’m still watching for the legislation that requires testing companies to print a validity “guarantee” that their student tests are suited to measuring teachers and schools before said tests could be adopted by any state that would use them for such purposes.

Now that would be a “test-based accountability” likely to curtail the top-down testing obsession.



Schneider is a southern Louisiana native, career teacher, trained researcher, and author of the ed reform whistle blower, A Chronicle of Echoes: Who’s Who In the Implosion of American Public Education.


  1. Outright child abuse! Seriously, how can this be presumed legal?

    Anecdotally, shouldn’t a test refusal be scored as a “999”?


  2. USEd is complicit in the diabolical intent of our superintendent White to exact punishment upon schools that don’t conform to his will. They actually approved his school performance score formula. See page 57 of White’s waiver and his braggadocio in offering up our schools on a platter to Arne Duncan’s privatization agenda. The man-boy needs to be booted out as soon as the next BESE is seated!

  3. Validity according to ESEA – No way our state assessment makes the grade:


    As reflected in the Standards, the primary consideration in determining validity is whether the State has evidence that the assessment results can be interpreted in a manner consistent with their intended purpose(s).

    The Standards speaks of four broad categories of evidence used to determine construct validity: (1) evidence based on test content, (2) evidence based on the assessment’s relation to other variables, (3) evidence based on student response processes, and (4) evidence from internal structure.

    1) Using evidence based on test content (content validity). Content validity, that is, alignment of the standards and the assessment, is important but not sufficient. States must document not only the surface aspects of validity illustrated by a good content match, but also the more substantive aspects of validity that clarify the “real” meaning of a score.

    2) Using evidence of the assessment’s relationship with other variables. This means documenting the validity of an assessment by confirming its positive relationship with other assessments or evidence that is known or assumed to be valid. For example, if students who do well on the assessment in question also do well on some trusted assessment or rating, such as teachers’ judgments, it might be said to be valid. It is also useful to gather evidence about what a test does not measure. For example, a test of mathematical reasoning should be more highly correlated with another math test, or perhaps with grades in math, than with a test of scientific reasoning or a reading comprehension test.

    3) Using evidence based on student response processes. The best opportunity for detecting and eliminating sources of test invalidity occurs during the test development process. Items obviously need to be reviewed for ambiguity, irrelevant clues, and inaccuracy. More direct evidence bearing on the meaning of the scores can be gathered during the development process by asking students to “think-aloud” and describe the processes they “think” they are using as they struggle with the task. Many States now use this “assessment lab” approach to validating and refining assessment items and tasks.

    4) Using evidence based on internal structure. A variety of statistical techniques have been developed to study the structure of a test. These are used to study both the validity and the reliability of an assessment. The well-known technique of item analysis used during test development is actually a measure of how well a given item correlates with the other items on the test. Newer technologies including generalizability analyses are variations on the theme of item similarity and homogeneity. A combination of several of these statistical techniques can help to ensure a balanced assessment, avoiding, on the one hand, the assessment of a narrow range of knowledge and skills but one that shows very high reliability, and on the other hand, the assessment of a very wide range of content and skills, triggering a decrease in the consistency of the results.

    In validating an assessment, the State must also consider the consequences of its interpretation and use. Messick (1989) points out that these are different functions, and that the impact of an assessment can be traced either to an interpretation or to how it is used. Furthermore, as in all evaluative endeavors, States must attend not only to the intended effects, but also to unintended effects. The disproportional placement of certain categories of students in special education as a result of accountability considerations rather than appropriate diagnosis is an example of an unintended–and negative–consequence of what had been considered proper use of instruments that were considered valid.
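The item-analysis technique mentioned under (4) above can be made concrete: for each test item, correlate scores on that item with the total of the remaining items (the “corrected item-total correlation”). Here is a minimal Python sketch with invented 0/1 response data– an illustration of the general technique, not any vendor’s actual procedure.

```python
# Corrected item-total correlation: for each item, correlate scores
# on that item with the total of the *remaining* items. Low or
# negative values flag items that do not hang together with the test.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def item_total_correlations(responses):
    """responses: one list of per-item scores (0/1 here) per student."""
    n_items = len(responses[0])
    result = []
    for i in range(n_items):
        item = [r[i] for r in responses]
        rest = [sum(r) - r[i] for r in responses]  # total excluding item i
        result.append(pearson(item, rest))
    return result

# Invented responses: 6 students x 4 items.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
print(item_total_correlations(responses))
```

Note that this is evidence about the test’s internal structure only– it says nothing about whether the scores are valid for judging teachers or schools.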

  4. I have issues beyond just validity though that is certainly the most important flaw. We are using the wrong measures. A more accurate way to measure standards is by using criterion referenced assessment (even better would be performance/authentic assessments) not norm referenced. The tests we are using now will always make sure half are below average no matter how much improvement is made by everyone. With criterion referenced assessments, anyone who meets the criteria of the standard can pass. And it should not have to be by a predetermined time since people develop at different rates. Ranking students, teachers, and schools is not useful if as the reformers claim we need to see what people can and cannot do. If you multiply better than me- who cares as long as we can both multiply! Speaking of validity, early childhood readiness assessments are way too faulty to accurately measure anything about a child’s future. But of course that is what is being proposed in my state!

    • Janna, I assure you, my issues with this testing obsession extend far beyond the reaches of this post. But I needed to keep the post to a manageable length given both my energy level and reader digestion. 😉
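The norm- versus criterion-referenced distinction raised above can be shown in a few lines: under norm-referencing, a score is a rank against other test-takers, so roughly half the group is always “below average” no matter how much everyone improves; under criterion-referencing, a score passes if it clears a fixed cutoff, so everyone can pass. A minimal Python sketch; the scores and the cutoff of 70 are invented for illustration.

```python
# Norm-referenced: label each student relative to the group median.
def norm_referenced(scores):
    median = sorted(scores)[len(scores) // 2]
    return ["below average" if s < median else "at/above average"
            for s in scores]

# Criterion-referenced: label each student against a fixed cutoff.
def criterion_referenced(scores, cutoff=70):
    return ["pass" if s >= cutoff else "fail" for s in scores]

before = [60, 65, 80, 85, 90]
after = [s + 10 for s in before]  # every student improved by 10 points

# The norm-referenced labels do not change at all...
print(norm_referenced(before))
print(norm_referenced(after))

# ...while the criterion-referenced result shows everyone passing.
print(criterion_referenced(before))
print(criterion_referenced(after))
```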

  5. I feel your pain. There is so much wrong with the assessments that it could be your 3rd book 🙂

    • Laura H. Chapman

      Irresistible if lengthy contribution.

      Pearson tests of students are not instructionally sensitive. Who says so? Pearson!!!
      Pearson does not know how to make tests instructionally sensitive. None of the major test-makers will verify that their tests are instructionally sensitive. Many, like Pearson, are sort of committed to doing some research. A Pearson expert says all this at 15 14).pdf

      This Pearson PDF includes a diagram that is a perfect example of circular reasoning that makes no sense.

      The PARCC tests are not instructionally sensitive. Those tests, like ALL standardized tests in use, should not be used to evaluate teachers. And they are not even a good “snapshot” of student learning. And they are not ever “objective.” Anyone who says “tests are objective” has offered proof positive that they know nothing about the process of constructing a test.

      Here are a few of the conditions that must be met for any claim that standardized tests are “instructionally sensitive.” This was my thought experiment. It sounds like satire, but it is designed to show what it would take to make student test scores instructionally sensitive in the strictest possible sense.

      1. The students enter school as a blank slate, and they do so every year. The slate is wiped clean of prior learning in and beyond school. Every student test measures just one clearly delineated “interval” of instruction and only learning based on that instructional interval.

      2. The teacher instructs each student in content that has absolutely no connection to anything other than content for this designated interval of instruction. This reaffirms the blank slate principle that the teacher of record for this interval is solely responsible for what the student learns.

      3. All content is literal in meaning and structured around rules that reduce the need to make inferences. This means that analogies, metaphors, allusions, and the like are kept out of instruction since these can trope in many unruly ways.

      4. The content is presented in a decontextualized manner– meaning free of any connection to anything other than what the teacher and the instructional materials offer.

      5. The students are only permitted to consider the content while they are in the cocoon of the classroom and with diligent teacher oversight of every question and answer. “Homework” and independent research will breach the walls of the cocoon. “Other” unplanned-for and unsupervised instruction may amplify learning that is not intended by the teacher. Students may also acquire misconceptions, a serious breach of the intended fidelity of learning to instruction.

      I will let others elaborate on the idiotic assumptions that govern the testing scene at the beginning of the 21st century.

      The purveyors of tests for students do not want to understand that the variability in student scores is beyond the control of the teacher of record for a given interval of instruction (course, grade level).

      They do not want to accept that variability is inevitable. It can be reduced only by teaching to the test, and only to the test, preferably item by item.

      They do not want to accept that so-called value-added metrics (VAM), intended to “isolate” the influence of a specific teacher on each student’s test score, rest on the same logic as thinking that students are blank slates when they enter the classroom.

      It seems to me that statistical models intended to isolate “teacher effects” on student scores also require a belief that the best teaching is teaching to the test.

  6. One more validity issue… simple but critical. To be statistically (and honestly) valid, non-testers should be treated as MISSING data, not as ZEROES.
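That missing-versus-zero point is simple arithmetic. A minimal Python sketch, with invented scores and `None` marking a student who refused the test:

```python
# Treating non-testers as ZERO drags the average down; treating them
# as MISSING leaves the average of actual test-takers intact.
scores = [85, 90, 78, None, None]  # two opt-outs

as_zero = [s if s is not None else 0 for s in scores]
mean_with_zeros = sum(as_zero) / len(as_zero)

tested = [s for s in scores if s is not None]
mean_missing = sum(tested) / len(tested)

print(mean_with_zeros)  # 50.6 -- the school looks like it collapsed
print(mean_missing)     # ~84.3 -- what test-takers actually averaged
```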

  7. Jill Reifschneider

    I know students who are “opting out” by taking the dang tests, but with the attitude/belief that the tests don’t matter. They are dealing with the test stress and ridiculous expectations by placing their name on the paper as instructed, but from there? They know the tests aren’t a valid assessment of anything about them. So, they are playing the game, doing what they are told, sitting in front of the computer screen….but, trying???? No. Why bother? Validity? Give me a break. Nothing about the use of this time, money, labor and resources to conduct this torturous testing is valid.

