The Standards Are Watching You
By Samuel Schwartz
Standardized tests have become a rite of passage in public high schools. Everyone sits for one. Everyone stresses about one. Everyone anxiously checks, re-checks, and re-re-checks their inbox on that fateful day when the mysterious test-grading powers-that-be descend from their palanquins to distribute scores to the expectant masses.
These tests have become nearly de rigueur, but why?
After all, they’re effectively worthless—at least, assuming that you’re not too interested in hearing the College Board (the organization that produces the SAT) expatiate on the wonders of its own product and would rather listen to the analysis of independent sources. If these tests were reliable, high scores would mean strong college performance. And the better the test, the tighter this relationship would be. The mathematical name for the “tightness” of the relationship between two numbers is the “correlation coefficient.” Correlation coefficients range between -1 and 1: a value near 1 (or -1) means the two numbers track each other almost perfectly, while a value near 0 means they’re essentially unrelated. A fairly good correlation coefficient is around 0.7. For reference, that looks like this.
Well then, what is the correlation coefficient between SAT performance and first-year college GPA? By the most optimistic of statistics, it’s 0.118, or something like this.
When some more realistic filters are applied (like taking into account how good someone’s high school was), that “tightness” value plunges down to -0.007. That looks about like this.
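To make those “tightness” values concrete, here is a minimal sketch using entirely made-up data (none of these numbers come from real SAT records): the same Pearson correlation formula applied to a strong predictor, a weak one, and pure noise.

```python
# Illustrating what correlation coefficients of ~0.7, ~0.1, and ~0 look like,
# using synthetic (invented) "scores" and "GPAs" -- not real student data.
import random
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(0)
scores = [random.uniform(400, 1600) for _ in range(2000)]

# Strong predictor: "GPA" is mostly driven by the score (r lands near 0.7).
strong = [s / 400 + random.gauss(0, 0.8) for s in scores]
# Weak predictor: "GPA" is almost all noise (r lands near 0.1).
weak = [s / 4000 + random.gauss(0, 1.0) for s in scores]
# No predictor at all: "GPA" is pure noise (r hovers around 0).
noise = [random.gauss(3.0, 0.5) for _ in scores]

print(round(pearson_r(scores, strong), 2))
print(round(pearson_r(scores, weak), 2))
print(round(pearson_r(scores, noise), 2))
```

A score that predicted college performance well would behave like the first case; the published SAT figures behave like the second and third.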
Or, put another way, the SAT is about as good a predictor of college performance as which college’s merchandise you’ve purchased.
Well, with 975 test-optional schools (i.e. schools that don’t require either an ACT or an SAT score to be submitted) and more schools opting to go test-optional every year, perhaps all of this can be explained by the fact that there’s a whole bunch of hidden scores that would set this data straight—perhaps there’s a massive coterie of students whose GPAs and graduation rates we need to take a second look at.
That might sound like a convincing argument, but there’s no basis for it. In his 2014 paper on the subject, William C. Hiss, a former Dean of Admissions at Bates College, found that the difference in college GPAs between students who submitted scores and students who didn’t came out to five one-hundredths of a GPA point. As for any theoretical discrepancy in graduation rates between the two groups, the disparity landed at about six-tenths of one percent—or, in Hiss’ words, “trivial differences.”
So what can this cabal of tests show then? Your ethnicity and income, as it turns out. And they do it fairly consistently as well.
According to a 2013 study by Ezekiel J. Dixon-Román from the University of Pennsylvania, John J. McArdle from the University of Southern California, and Howard Everson from SRI International (an American non-profit research institute), “not only [does] family income have a meaningful direct association with the SAT, but… its association [is] non-linear and substantially higher for Black test-takers than for White test-takers.” In other words, not only does the amount of money your family makes per year predict how well you’ll do on average on one of these things, but if you’re a person of color, you’ll also do worse on average than your white counterparts.
When you dig into the data, the difference between students whose families make more than $100,000 a year and students whose families make less than $10,000 a year is 283-ish points (bearing in mind that this is an SAT out of 1600 points), and 157-ish points between white students and black students. The difference between a black student whose family comes from the lowest income bracket and a white student whose family comes from the highest income bracket averages out to 647 points, or more than a third of the points available on the test.
If you’re thoroughly steeped in testing-land mumbo jumbo, you might notice a problem here. See, the SAT needs new questions every year, and it gets them by generating a bazillion candidates and then test-driving all of them on an optional fifth section of the SAT to sort the good questions from the bad ones. If the students who did well across the rest of the test get a question right, it stays in. If not, it gets tossed out.
However, since white students and rich students generally do better on the test as a whole, this system doesn’t produce good questions so much as it produces questions that are answered correctly by students who do well on the SAT. And since the test already demonstrates a racial bias, it skews towards churning out questions that white people answer correctly.
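The feedback loop described above can be sketched in a toy simulation. Every number here is invented, and this is emphatically not the College Board’s actual item-selection procedure: it only illustrates why a rule like “keep questions that high scorers get right” can entrench a pre-existing group gap.

```python
# Toy model: two student groups, where group A already outscores group B on the
# existing test for reasons unrelated to ability. Two candidate questions of
# equal difficulty, one slightly favoring each group, go through "pretesting."
import random

random.seed(42)

def make_students(n):
    students = []
    for _ in range(n):
        group = "A" if random.random() < 0.7 else "B"  # A = majority group
        # Group A's head start (tutoring, familiar vocabulary, etc.) is baked
        # into its higher average total score in this toy world.
        total = random.gauss(1150 if group == "A" else 1000, 150)
        students.append((group, total))
    return students

def p_correct(question_bias, group):
    """Chance a student answers correctly: 60% base, +/-20% group bias."""
    return 0.6 + (0.2 if group == question_bias else -0.2)

def high_vs_low_gap(students, question_bias):
    """Crude stand-in for item analysis: do students above the median total
    score beat students below it on this question?"""
    median = sorted(t for _, t in students)[len(students) // 2]
    hi, lo = [], []
    for group, total in students:
        correct = random.random() < p_correct(question_bias, group)
        (hi if total >= median else lo).append(correct)
    return sum(hi) / len(hi) - sum(lo) / len(lo)

students = make_students(5000)
gap_a = high_vs_low_gap(students, "A")  # question favoring group A
gap_b = high_vs_low_gap(students, "B")  # question favoring group B

# The A-favoring question "discriminates" in the psychometric sense (high
# scorers do better on it), so it survives pretesting; the B-favoring
# question looks defective by the same rule and gets tossed.
print(gap_a > 0, gap_b < 0)
```

Neither question is harder or more valid than the other; the selection rule simply inherits whichever gap the previous year’s test produced.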
Spanish-speaking students are particularly affected by this in the language section, as their knowledge of a second language can both hinder and help them. Take, for example, an antonym problem whose prompt word was “infidelity” and whose correct answer was “loyalty.”
For an English-only student, there is no obvious connection between the words “infidelity” and “loyalty,” but a Spanish-speaking student will recognize “infidelity” from its Spanish equivalent “infidelidad” and “loyalty” from its Spanish equivalent “fidelidad”; thus, the Spanish-speaking student can spot the correct answer and move on.
“False cognates,” on the other hand (words that look like a similar word in another language but mean something different), often pop up on the exam and disproportionately disadvantage Spanish-speaking students, confusing them and tempting them with alternate meanings. But since English-only students don’t get led down the same wild goose chase, they do fine on those questions. And then, because the students already doing well on the exam are disproportionately rich and white, the false-cognate questions make the cut (since those white students weren’t affected) and the true-cognate questions get dropped (since the white students didn’t garner the same advantage the Spanish-speaking students did). Indeed, the question above was a real question on which Latinx students fared significantly better, and one which was thrown out of an exam in the mid-to-late 90s. To quote a 2002 paper by William C. Kidder and Jay Rosner: “questions that are ‘biased’ in favor of Whites have a fair chance of making their way onto a scored section of the SAT; ones that are ‘biased’ against Whites have virtually no chance of appearing on a real SAT section.”
These types of antonym questions aren’t featured on the SAT exam anymore, but that system of question test-driving still is, ensuring that bias is carried on from year to year. And since the question generation system can never really get reset, prejudice just carries on indefinitely.
This is especially concerning in light of the fact that the history of the SAT in particular is marred by discrimination and racism. The test originated in the First World War, when the American psychologist Carl Brigham was tasked with helping create aptitude assessments for placing soldiers into units. Brigham happened to be a staunch eugenicist and virulent racist—and though he later recanted some of his most odious remarks, his test debuted in 1926, long before any such retraction. To make matters worse, the SAT was introduced with the experimental section from day one, meaning that the cycle has run unimpeded since its inception, with the crème de la crème of one year’s stock of students determining the questions for the next, and so on and so forth ad infinitum.
Of course, that bigotry has declined in the decades that followed, but it certainly hasn’t disappeared. In 2005, the test removed the famous analogy questions (e.g. “Tablecloth is to table as _____ is to floor”), after they were criticized for being rather absurdly socioeconomically biased. The most famous example of this is the almost hilariously unbalanced question “Runner is to marathon as oarsman is to regatta.” A regatta, by the way, is a boat race, in case you weren’t fully versed in your recreational-nautical terminology. According to that same 2002 Kidder and Rosner paper, while 53% of white students answered this regatta question correctly, only 22% of their African American peers did so.
These exams are better litmus tests for skin color and income than skill or ability.
In the past, it has been argued that an outstanding score on one of these tests can help distinguish a low-income student, and that these tests, while biased, are less biased than other assessment mechanisms like grades or extracurriculars. However, this argument says nothing about the qualifications of any particular high-scoring candidate. Just because someone from a lower income bracket has done well on one of these exams doesn’t mean that the right someone was the one who did it. And while that one someone has certainly benefitted, people of lower economic status simply do worse on average, meaning there is a whole multitude of other someones who did not reap the same reward and who were unfairly disadvantaged by the test. One vaunted success does not justify a thousand quiet injustices.
So why hasn’t this system changed?
Well, for two reasons, the first being the absolutely massive profitability of the standardized testing industry. The two organizations that design and manufacture the SAT are the College Board and the Educational Testing Service (also known as ETS). Both of these organizations are non-profits, of course, but ETS has non-profited to the tune of a tad over $43 million in 2016, $17 million in 2015, and $68 million in 2014. The College Board, by contrast, non-took in $37.5 million in 2016 and $77 million in 2015. It lost around $41 million the year before, but, not to fret, in the four years prior, it found the room within its charitability to non-earn $98 million, $55 million, $45 million, and $71 million. Last year, the two behemoths combined to non-pay their executives a meager total of $20,654,586.
These tests are scored by Pearson plc, the for-profit educational gargantua with a tentacle in almost every pedagogical industry—including one that has strangled the Common Core standardized-test manufacturing market and another that has coiled around the textbook-publishing field. Last year, it scooped up $2.6 billion in gross corporate earnings and $1.2 billion in gross corporate profits.
As for the slew of companies feeding off the slop from this booming industry, The Princeton Review hauled in $300 million in revenue last year, and Kaplan, Inc. managed to swallow up $1.6 billion in gross corporate earnings.
And the other reason the system hasn’t changed is that it already has. It has changed again, and again, and again, and every time, nothing happens.
It changed in 1994 to remove the antonym questions disadvantaging Spanish-speaking students—and the score gap didn’t budge. It changed in 2005 to remove those yachting-based analogy questions—and scores stayed static. It changed in 2013, claiming to have an eye towards correcting bias, but then knowingly implemented systems that disadvantaged low-income, Spanish-speaking, and/or African American students even further. Why? Because the sad reality is that those with the fewest obstacles to making change are those with great surpluses of time, wealth, and social capital at their disposal—and those people are usually rich and white, the very people whom the test benefits. And thus, despite the ostensible fervor with which racism and inequality are combated, when there is a real opportunity to make change, those with the loudest voices fall silent.
It is easy to piously decry your neighbor’s misdeeds. It is much more difficult to humbly mend your own. And so, even though they know that Lady Justice is watching them, they find themselves suddenly struck blind.