Gerrymandering and “Proportionality”: Setting the Baseline

In my last post I considered what I called the “Breyer criterion” for identifying partisan gerrymandering: a party winning half the vote in a state receives only a third of the seats.  That criterion identified just seven races out of the nearly eight hundred I examined, or just 0.9 percent of the state-level elections to Congress where candidates of both major parties stood. Breyer proposed his criterion to identify “real outliers,” elections that are “really extraordinary.” A criterion met in fewer than one election in a hundred probably fits that definition.
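As a rough illustration, the criterion can be written as a simple test on a state's vote and seat shares. This is a minimal sketch: the function name is hypothetical, and the exact thresholds (at least 50 percent of the vote, at most one third of the seats) are an assumed reading of "half the vote" and "a third of the seats."

```python
def meets_breyer_criterion(vote_pct, seat_pct):
    """True when a party wins at least half the vote but at most a third of the seats."""
    # Assumed thresholds: "half the vote" read as >= 50 percent,
    # "a third of the seats" read as <= 100/3 percent.
    return vote_pct >= 50.0 and seat_pct <= 100.0 / 3.0


# Hypothetical state: 52 percent of the vote, 3 of 10 seats.
print(meets_breyer_criterion(52.0, 30.0))
# Hypothetical state: 52 percent of the vote, 5 of 10 seats.
print(meets_breyer_criterion(52.0, 50.0))
```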

However, the Court also discussed the general concept of how to measure “proportionality” between seats and votes. The attorney for the plaintiffs, Paul Clement, brought up the notion of a “one standard deviation from proportional representation” criterion, mostly as a straw man. Leaving aside his use of “proportional representation,” which as the oral argument shows is fraught with constitutional issues, Clement then claimed that it is impossible to know what the correct baseline should be from which to measure seat outcomes.

“So I think the fundamental problem is there is no one standard deviation from proportional representation clause in the Constitution. And, indeed, you can’t talk even generally about outliers or extremity unless you know what it is you’re deviating from.”

Clement’s argument ignores decades of political science research into the relationship between votes won and seats awarded.  Studies dating back to at least 1948 have theorized about and examined empirically the relationship between seats and votes.

Measuring the Baseline

I’ve written a number of times about the relationship between votes won and seats awarded in “first-past-the-post” or “plurality” electoral systems like ours.  These types of electoral systems routinely award the majority winner of the vote a disproportionately greater share of seats. Here is a simple example, using national electoral results for Congress.

The dark blue line represents the “best-fit” relationship between the percent of votes won by the Democrats in each election year and the percent of House seats the party won using simple “ordinary least squares” regression. The historical relationship is substantially steeper than the thin line in the chart representing parity, or when a party’s share of seats equals its share of votes.[1]

Using simple regression, the equation that best describes this relationship is, in round numbers,[2]

% Seats Democrat = 2 X (% Votes Democrat) – 50

So, for instance, in a year when the Democrats win 55 percent of the vote, they should receive on average (2 X 55) – 50 = 60 percent of the seats.
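The arithmetic above can be sketched as a small function. The function name and the clamping to the 0 to 100 range are illustrative assumptions, not part of the original formula; the clamp only matters for very lopsided vote shares, where the straight line would otherwise predict an impossible seat share.

```python
def baseline_seat_share(vote_pct, slope=2.0, intercept=-50.0):
    """Predicted Democratic seat share (percent) for a given vote share (percent)."""
    predicted = slope * vote_pct + intercept
    # Assumption: clamp the prediction, since a seat share
    # cannot fall below 0 or above 100 percent.
    return max(0.0, min(100.0, predicted))


for vote in (45, 50, 55, 60):
    print(vote, baseline_seat_share(vote))
```

With these coefficients each additional point of the vote is worth two points of seats, and a 50 percent vote share maps to 50 percent of the seats.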

Since gerrymanders take place at the state level, data from national elections do not provide the correct basis for determining whether a particular state’s election deviated “too far” from some predicted baseline. To develop such a baseline for Congressional elections I turn again to the MIT database of Congressional races I used in the preceding blog post.  Here is the relationship between votes and seats for state-year combinations. Each point represents a general election in a given state in a particular year, like Alabama in 1976.

A number of races resulted in one party or the other winning all the seats. These unanimous outcomes pose mathematical problems for our method, so I excluded those 84 races from the calculation of the slope and intercept for the regression line in the chart.
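The fitting step can be sketched as follows: drop the sweeps, then fit a least-squares line to what remains. This is only a sketch. The data points are made up for illustration and are not drawn from the MIT database, and the helper name `fit_line` is hypothetical.

```python
def fit_line(points):
    """Ordinary least squares fit of seat share on vote share.

    points is a list of (vote_pct, seat_pct) pairs; returns (slope, intercept).
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical (Democratic vote %, Democratic seat %) pairs, one per state-year.
elections = [
    (30.0, 0.0), (42.0, 25.0), (48.0, 50.0), (50.0, 50.0),
    (55.0, 60.0), (61.0, 75.0), (70.0, 100.0),
]

# Exclude unanimous outcomes, where one party won every seat.
contested = [(v, s) for v, s in elections if 0.0 < s < 100.0]

slope, intercept = fit_line(contested)
print(len(contested), round(slope, 2), round(intercept, 2))
```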

(The horizontal lines come from states with small numbers of districts where the number of outcomes is mathematically restricted. For instance, a state with four districts will often return a 3-1 result for one party. That leads to clustering at values of 25 or 75 percent.)
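To see why the points cluster, it helps to list every seat share a state of a given size can produce. A minimal sketch, with a hypothetical function name:

```python
def possible_seat_shares(districts):
    """All seat shares (percent) one party can win in a state with this many districts."""
    return [100.0 * won / districts for won in range(districts + 1)]


print(possible_seat_shares(4))
print(possible_seat_shares(2))
```

A four-district state can only land on 0, 25, 50, 75 or 100 percent, whatever the vote share, so its elections line up along those horizontal levels.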

Using state-level election results gives us a model that is numerically quite similar to the simple method based on election years above:

% Seats Democrat = 2.3 X (% Votes Democrat) – 66

Here the slope of the line is slightly steeper than two and the intercept slightly more negative. In practice, though, the differences between these results and the predictions from the simpler national-level model are negligible. The lines are so close that I could not represent them both on the chart.
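A reader who wants to check how closely the two lines track each other can tabulate both predictions over a range of vote shares. The sketch below uses only the rounded coefficients quoted in this post; the function names are hypothetical.

```python
def national_model(vote_pct):
    """Rounded national-level line: 2 X votes - 50."""
    return 2.0 * vote_pct - 50.0


def state_model(vote_pct):
    """Rounded state-level line: 2.3 X votes - 66."""
    return 2.3 * vote_pct - 66.0


# Compare the two predictions across competitive vote shares.
for vote in range(40, 61, 5):
    gap = state_model(vote) - national_model(vote)
    print(vote, round(national_model(vote), 1), round(state_model(vote), 1), round(gap, 1))
```

The two lines cross where 0.3 X (% Votes Democrat) = 16, a vote share of about 53.3 percent, and the gap between them grows by 0.3 points of seats for each point of the vote away from that crossing.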

Given the convergence between these two sets of estimates, I propose that

The best “baseline” estimate for the division of seats given the division of the vote in state-level Congressional elections is

% Seats Democrat = 2 X (% Votes Democrat) – 50

That formula uses simple numbers like two and fifty and produces results nearly identical to those using the estimated regression coefficients of 2.3 and -66.

The regression method also produces a measure of the “standard deviation” of actual outcomes around the predicted values. I use that quantity in the next post to identify potential gerrymanders using the deviation from proportionality method.
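One common way to compute such a quantity is the residual standard deviation, also called the standard error of the regression. The sketch below assumes that definition, with two degrees of freedom subtracted for the slope and intercept; the post does not say exactly which formula is used, and the data points and function names are hypothetical.

```python
import math


def residual_std(points, slope, intercept):
    """Standard deviation of actual seat shares around the fitted line."""
    residuals = [seat - (slope * vote + intercept) for vote, seat in points]
    # Assumption: divide by n - 2, since the line has two estimated coefficients.
    dof = len(points) - 2
    return math.sqrt(sum(r * r for r in residuals) / dof)


def deviations_from_baseline(vote_pct, seat_pct, slope, intercept, sd):
    """How many standard deviations an outcome sits above (+) or below (-) the line."""
    return (seat_pct - (slope * vote_pct + intercept)) / sd


# Hypothetical (vote %, seat %) outcomes scattered around 2 X votes - 50.
sample = [(50.0, 52.0), (50.0, 48.0), (55.0, 62.0), (55.0, 58.0)]
sd = residual_std(sample, 2.0, -50.0)
print(round(sd, 2))
print(round(deviations_from_baseline(52.0, 40.0, 2.0, -50.0, sd), 2))
```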

Next: Gerrymandering: Finding the Deviant Elections



1. The results for the last two Democratic off-year House victories, retaking the chamber in 2006 and 2018, both fall on this parity line. Given the historical relationship, the Democrats did not receive the usual reward in the House for their victories in the popular vote. The elections in 2012 and 2018 also show significant negative effects for Democrats.

2. Complete results for both models here.