Is the PARCC test valid?

The New Jersey Board of Education announced last week its intent to make passage of the PARCC test a requirement for high school graduation.  Students in the eleventh grade are given a computer-delivered test of tenth-grade English and of Algebra I, typically taught in the ninth grade.  Students will need to achieve a score of 4 (Met Expectations) or 5 (Exceeded Expectations).  These scores reflect the standards adopted under the federal Common Core education initiative.

New Jersey’s scores, like those of other states, have been dreadfully low.  By other measures, though, New Jersey schools are among the top three state school systems in the country.  The state’s per-student spending matches that ranking.

Parents and teachers alike are up in arms over the tests.  In 2016, half the students in the eleventh grade didn’t achieve a passing score of 4 or 5.  Starting in 2021, such students won’t graduate.  It is ludicrous, the critics say, that we would fail to graduate half our high school seniors based on this test.  The result is opposition to the test from a diverse group of detractors.

The multi-state consortium that runs PARCC, and its member the NJBOE, have failed miserably to communicate the objectives of PARCC, how it works, how scores are calculated, and what is being done to improve the test.

At one level, opposition to PARCC comes from the gut.  Students and parents hate a test that requires studying but frequently returns disappointing results.  Teachers fear the PARCC as a tool to implement performance measurement and to change seniority-based promotion and pay scales.  At another level, PARCC and its vendors point to academic studies that demonstrate the efficacy of the tests.  The PARCC website lists the studies, but gives neither details nor links to the reports.  Opponents jump right into the technical details, arguing the differences between “predictive validity” and “construct validity.”  They know that the layman’s eyes will glaze over, the psychometric jargon evoking an aversion to the whole discussion.

First principles

Before we argue about PARCC specifically, it’s worthwhile to lay the underlying questions on the table and reach some agreement on core values.

High School Diploma

What should a high school diploma mean?  At its most fundamental, a high school diploma should represent a level of literacy and numeracy to hold an entry level job with opportunity for advancement.  A graduate should be able to read the newspaper, assess it critically, and participate in the political life of the nation.  A graduate should have been exposed to enough biology and health to make reasoned decisions about health and reproduction.  A graduate should have been exposed to enough disciplines including the arts and sciences to make career choices that result in a reasonable level of fulfillment.

Most people will agree that high school should provide a well-rounded basic education.  The arguments start when we get to specifics.  Who’s to say exactly what should be included in the high school curriculum?  Should one be able to read and comprehend USA Today or The New Yorker?  Should one be able to write a police report or a love letter?  Should one be able to convert imperial to metric units, or calculate the volume of air in a dome?

The underlying standard is that the high school diploma should allow its bearer to be a full member of society.  One needs to be able to measure that.  One could correlate level of high school achievement with lifetime earnings.  Sure, that’s an easy measure to study, but it’s a one-dimensional measure.  We could count the size of the student’s vocabulary.

PARCC makes no promise that it measures a student’s preparation for life.  The name explains it all: Partnership for Assessment of Readiness for College and Careers.  There’s no indication that PARCC is a valid instrument for determining readiness for a career.  PARCC is, however, a valid predictor of success in college, which is the next stage for most high school graduates.

Validity and Reliability

A test has two primary measures of goodness.  The first is reliability: the tendency of the test to deliver the same score to a student who takes it multiple times.  Are the results consistent from year to year?  A test that is not reliable is simply useless.

The second measure of goodness of a test is validity.  Validity is the answer to the question, “Does the test measure what we intended to measure?”  For example, we could pose a test for intelligence based on eye color.  The test would be reliable, i.e., in multiple administrations the results would be consistent.  But it would be entirely invalid.

A third measure of goodness is discrimination.  In this usage, discrimination is a good thing.  Discrimination answers the question, “Do two different scores tell us that the test takers are indeed different?”  In the case of PARCC, the important discrimination is between Level 3: Approached Expectations and Level 4: Met Expectations.

Gateway to college

As we enter the twenty-first century, a high school diploma may not represent enough skills to ensure a working life in the middle class.  Even for those in the trades, high school reading, writing, and ’rithmetic aren’t enough.  It’s no longer enough for a welder to have excellent manual skills.  She needs to be able to program a bank of robotic welders.  She’s going to need more advanced education: programming, some metallurgy, perhaps fundamentals of supervision, if she is to advance.

For most American kids, high school is a milestone on the way to college. They may not finish college, but they need to be prepared.  PARCC, as it turns out, is a reliable predictor of college success, at least as reliable as the SAT.

Since 1905, we’ve known that the size of a person’s vocabulary is correlated with IQ.  Since 1926, we’ve known that vocabulary is a good predictor of college success.  While the PARCC may not be as good a predictor of success in life, the kinds of questions it asks are a good predictor of college success.  Since freshman year of college is the next stage of life for most high school seniors, the PARCC is a valid test.  It divides students into five levels in English language arts (ELA) and math.

  • Level 1: Did not yet meet expectations.
  • Level 2: Partially met expectations.
  • Level 3: Approached expectations.
  • Level 4: Met expectations.
  • Level 5: Exceeded expectations.

Discriminating between Levels 3 and 4

When an eleventh grader looks forward to graduation, what matters most is whether she earns a score of three (not good) or four (graduate).  College choice will be determined by the SAT.

A student’s SAT scores certainly influence college matriculation, but there is no absolute cutoff.  A score of 710 may get you into Harvard, but a 690 won’t exclude you.  Colleges understand that the SAT is as discriminating as it can be, but it’s an imperfect measure.  We know exactly how imperfect.

PARCC scores are comparable to SAT scores in predicting college outcomes. (Nichols-Barrer 2015)

In English language arts, the PARCC end-of-year and performance-based assessment scores have a combined correlation with college grades (0.23) that is virtually identical to the corresponding correlation between MCAS English language arts test scores and college grades (0.23).  For mathematics, the correlation with college grades for scores on the two PARCC integrated math components (0.43) is also statistically indistinguishable from the association for MCAS math test scores (0.36). (Nichols-Barrer 2015)

Correlation measures how closely a test result predicts college performance.  It ranges from -1 to 1.  A correlation of 1.0 is a perfect prediction; zero indicates no predictive power.  A negative value indicates an association, but an inverse one.
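As a concrete sketch, the Pearson correlation coefficient behind these figures can be computed directly.  The scores and GPAs below are invented for illustration; they are not real PARCC or college data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation: covariance over the product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: eight students' test scores and their later college GPAs.
scores = [650, 700, 720, 750, 780, 800, 810, 840]
gpas = [2.4, 2.9, 2.7, 3.1, 3.0, 3.4, 3.3, 3.6]
print(round(pearson(scores, gpas), 2))  # 0.95: a strong positive association
```

Real assessments land much lower than this tidy sample: the 0.43 and 0.23 figures quoted above come from noisy data on thousands of students.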

What does a correlation of .43 mean?  Statistics don’t mean anything unless you can compare them to something.  The correlation of high school GPA to college GPA is .845, an almost perfect predictor (Belfield 2012).  PARCC is a pretty good predictor of college performance.  It is better than the SAT, which has a correlation of .31 (Nichols-Barrer 2015).

Read more deeply in these studies and you’ll see that kids self-select into colleges appropriate to their academic performance.  Vo-tech grads don’t go to Yale, where their previous academic success is unlikely to follow them; they might go to a community college, where it does.  A 3.0 average from a vo-tech high school is not the same as one from a STEM school.  Grades may predict a student’s success at the college he attends, but that doesn’t make the high schools equivalent.

It’s clear that we have some wide disparities in PARCC performance between New Jersey schools and districts.  The Department of Education reports that the average high school GPA is 3.0 (Department of Education 2009).  It’s probable that all New Jersey schools cluster around 3.0 as well.

Take for example Newark Vocational High School, which had 0% of its students score 4 or 5 on the PARCC.  This school graduates only 77% of its students.  Eighty-eight percent are eligible for free or reduced-price lunches.  The average teacher salary is $81,000.  The middle 50% of GPAs range from 2.9 to 3.6.

In comparison, consider Morris Knolls High School, which I chose because my taxes go there.  Forty percent of its students score a 4 or 5 on the PARCC.  This school graduates 97% of its students.  Only ten percent are eligible for free or reduced price lunches.  The average teacher salary is $89,000 (but the student-teacher ratio is higher than at Newark).

Wow.  That’s a disparity.  If New Jersey is to keep its schools among the top three in the country, we need to make the best excellent, the average better, and the worst at least average.

To make a school better, we need to improve the performance of each student there.  Students who aren’t doing well, say at Level 1 or 2, need a different kind of support from the student at Level 4 shooting for a 5.  You can’t judge who’s doing well by grades alone: there are plenty of A students at Newark who score at Level 3 or below.

Norm-Referenced vs. Criterion-Referenced Tests

PARCC is intended to be a criterion-referenced test.  When Level 4 says “Met Expectations,” the expectations are those established by the federally mandated Common Core Standards.

I made up this example of words that might appear on a PARCC test.  The words might not make for a reliable test item (question).  Test makers can adjust the test items each year based on the previous year’s results.  That’s what ETS does with the SAT.

  1. afford
  2. compress
  3. embark
  4. adjunct
  5. obsequious

If everyone who takes the test can identify the first four words, all get a 4, a pass.  Those who can identify all five get a 5.  No one fails.
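A criterion-referenced scorer for this made-up word list might look like the following sketch.  The cut scores (four words to pass, all five for the top level) mirror the example above; they are invented here and are not PARCC’s actual scoring rules.

```python
def criterion_level(num_correct, total=5):
    """Map a raw count of correct answers to a level using fixed cut scores.
    Everyone who clears the bar passes; the score does not depend on how
    other test takers performed."""
    if num_correct == total:
        return 5      # exceeded expectations
    if num_correct == total - 1:
        return 4      # met expectations: a pass
    return 3          # simplified: everything below the bar

# If every student identifies at least the first four words, every student passes.
class_results = [4, 4, 4, 5, 5]
print([criterion_level(n) for n in class_results])  # [4, 4, 4, 5, 5] -- no one fails
```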

Opponents say PARCC is a norm-referenced test.  Scores on a norm-referenced test don’t reflect the absolute level of achievement.  The bottom k percent of test takers are scored Level 1, the next l percent are scored Level 2, and so on, until the total is 100%.  IQ tests work that way: the middle 68% of the population get scores between 85 and 115.

Normative scores are great for academic studies, but they are inappropriate for determining who graduates.  After a couple of years of experience, educators will learn that next year’s scores will likely be 5%, 10%, 20%, 50%, 15% (or something like that).  If PARCC is to be used to determine who graduates, the score should depend on how many questions were answered correctly, not where the student falls in the pack.  Norm-referenced scoring may be appropriate in the primary grades, but not for graduation.
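By contrast, norm-referenced scoring is a pure ranking exercise, which the following sketch illustrates.  The band fractions are the hypothetical 5/10/20/50/15 split mentioned above, not published PARCC figures.

```python
def norm_referenced_levels(scores, bands=(0.05, 0.10, 0.20, 0.50, 0.15)):
    """Assign Levels 1-5 by rank: the bottom 5% get Level 1, the next 10%
    Level 2, and so on -- regardless of how many questions anyone answered."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # indices, lowest score first
    levels = [0] * n
    cut, boundary = 0, 0.0
    for level, frac in enumerate(bands, start=1):
        boundary += frac
        upper = round(boundary * n)
        for i in order[cut:upper]:
            levels[i] = level
        cut = upper
    return levels

# Twenty students: even if everyone scored well, the bottom ranks still get Level 1.
print(norm_referenced_levels(list(range(80, 100))))
```

The point of the sketch: under this scheme some students must always land in the bottom bands, however much absolute achievement rises.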

Is the PARCC test valid?

Validity describes whether a test answers the question you originally asked.  PARCC measures achievement of the Common Core standards.  We can argue whether the Common Core represents a well-rounded education for a citizen, but the test is valid.

PARCC is a tool that will help New Jersey maintain its preeminent position in U.S. education by identifying the needs of each student.
