Technical Appendix: Seats and Votes in the 2018 Election

I am extending the simple model I presented in 2012 relating the proportion of House seats won by Democrats to that party’s share of the (two-party) national popular vote for Congressional candidates.  It uses dummy variables to represent each redistricting period (e.g., the 2000 Census was used to redistrict elections from 2002-2010), and a slope change that starts with the Republican House victory of 1994.

To review, the earlier model showed this pattern of partisan advantage for elections conducted since 1940:

The results for the 2010 redistricting were based solely on the 2012 election.  As we’ll see in a moment, adding in 2014 and 2016 only made that result more robust.

As I argued earlier, not all of this trend results from partisan gerrymandering.  Americans have sorted themselves geographically over the past half-century with Democrats representing seats from urban areas and Republicans holding seats from suburban and rural areas.  As partisans self-segregate, the number of “safe” seats rises, and electoral competitiveness declines.

Partisan self-segregation also makes gerrymandering easier.  Opponents can be “packed” into districts where they make up a super-majority.  House Minority Leader Nancy Pelosi routinely wins 80 percent or more of the vote in her tiny but densely populated San Francisco district.  Many of these seats are held by minority Members of Congress because of our national policy of encouraging “majority-minority” districts.  These efforts were well-motivated as a response to racist gerrymandering that would “crack” minority areas and distribute the pieces among a number of majority-white districts.  Unfortunately for the Democrats, these policies have meant that too many of the party’s voters live in heavily Democratic districts.

Here is the result of an ordinary least squares regression for the share of House seats won by the Democrats in elections since 1940:

If I plot the predicted and actual values for Democratic seats won, the model unsurprisingly follows the historical pattern quite closely:

The Democrats routinely won around sixty percent of House seats between 1940 and 1992.  Since then they have only held a majority in the House twice, in 2006 and 2008.  Notice, too, that both the actual and predicted values for 1994 to the present show much less variance than the earlier decades.  The results above show that the “swing ratio” relating seats and votes has become much smaller, falling from 1.92 before 1994 to 1.33 (= 1.92 - 0.59) since.  A smaller swing ratio indicates that House elections have become less competitive since Bill Clinton was elected President in 1992.  Changes in vote shares are still amplified in seat outcomes, as they are in all first-past-the-post electoral systems like ours, but the effect has been diminished because of the increase in the number of safe seats on both sides of the aisle.

We can use this model to estimate the share of votes required in order for the Democrats to win a majority in the House.  This chart shows the predicted relationships between seats and votes for two historical periods, one through the election of Bill Clinton in 1992, and the other beginning with the Republican victory in the House election of 1994 under Newt Gingrich and his “Contract with America.”

The slope in the latter period is substantially flatter than in the earlier period, meaning that Congressional elections have become somewhat less competitive since 1992.  Changes in vote shares have a smaller effect on changes in seat shares than they did before 1994.

Finally, the third line represents an estimate for the relationship in 2018, using the 1994-2016 slope and only the post-2010 intercept shift.  The chart shows that for the Democrats to win half the seats in 2018 they will need to garner a bit over 53 percent of the two-party popular vote for the House.
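To make that arithmetic concrete, here is a minimal Python sketch that inverts a linear seats-votes line to find the break-even vote share.  The slope is the post-1994 swing ratio reported above; the intercept is a hypothetical value chosen only to illustrate the calculation, not a fitted coefficient.

# Invert a linear seats-votes equation: seats = intercept + slope * votes.
slope = 1.33         # post-1994 swing ratio from the model
intercept = -0.208   # illustrative stand-in for the 2018 line

required_vote = (0.5 - intercept) / slope
print(f"Vote share needed for half the seats: {required_vote:.1%}")  # ~53.2%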


*The intercepts in these charts represent weighted averages of the adjustments for the various Census years. For instance, the 1994-2016 line includes the coefficients for the 1990, 2000, and 2010 Census weighted by the number of elections in each decade. So in this case the 1990 and 2000 adjustments would have weights of five, and the 2010 adjustment a weight of three. The 2018 line applies only the 2010 redistricting adjustment.
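As a short illustration of that weighting in Python (the adjustment values below are placeholders, not the fitted coefficients):

# Weighted average of the Census-decade intercept adjustments, weighted
# by the number of elections each decade contributes (5, 5, and 3).
adjustments = {"1990": -0.02, "2000": -0.04, "2010": -0.06}  # hypothetical
weights = {"1990": 5, "2000": 5, "2010": 3}

blended = sum(adjustments[d] * weights[d] for d in weights) / sum(weights.values())
print(round(blended, 4))   # intercept shift used for the 1994-2016 line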

Procedures Used with Data from Huffington Post Pollster

In the past few weeks, Pollster has begun reporting multiple results for a single poll.  Some polling organizations have been reporting separate results for Democratic, Republican, and independent respondents, as well as the aggregated data for all respondents.  They have also begun providing detailed information on the question(s) asked to determine voting intention.  Pollster reports separate results for each question wording.

Since all my analyses use just one entry per poll, I have begun removing this extra data before analysis.  Unless specifically stated, I am using only the first “question iteration” for each poll (coded “1” at Pollster) and only data for the entire population.  Using the first iteration helps ensure consistency across all the polls from a single organization.
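In practice the filtering step looks something like the Python sketch below; the file and column names are my assumptions about the Pollster export, not documented field names.

import pandas as pd

polls = pd.read_csv("pollster_export.csv")   # hypothetical file name

# Keep one entry per poll: the first question iteration and the results
# for the full sample rather than any partisan subgroup.
polls = polls[(polls["question_iteration"] == 1) &
              (polls["subpopulation"] == "All")]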


Why Weighted Least Squares for Polling Regressions

Standard ordinary least squares regression assumes that the error term has the same variance across all the observations.  When the units are polls, we know immediately that this assumption will be violated.  The error in a poll is inversely proportional to the square root of its sample size.  The “margin of error” that pollsters routinely report is twice the standard error of estimate evaluated at 50%, the worst case with the largest possible variance.  That comes from the well-known statistical formula

SE(p) = sqrt(p[1-p]/N)

where N is the sample size.  This formula reaches its maximum at p=0.5 (50%), making the standard error 0.5/sqrt(N).

Weighted least squares adjusts for these situations where the error term has a non-constant variance (technically called “heteroskedasticity”). To even out the variance across observations, each one is weighted by the reciprocal of its estimated standard error.  For polling data, then, the weights should be proportional to the reciprocal of 1/sqrt(N), or just sqrt(N) itself. I thus weight each observation by the square root of its sample size.

More intuitively we are weighting polls based on their sample sizes.  However, because we are first taking the square roots of the sample sizes, the weights grow more slowly as samples increase in size.
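Here is a minimal sketch of that weighting with statsmodels, on made-up poll numbers.  Note that statsmodels expects weights proportional to 1/variance, so passing the sample size N fits the same model as scaling each row by sqrt(N).

import numpy as np
import statsmodels.api as sm

# Made-up poll-level data: candidate share, field date, sample size.
pct  = np.array([44.0, 47.5, 42.0, 49.0, 45.5])
days = np.array([1, 8, 15, 22, 29])
n    = np.array([400, 900, 600, 1200, 500])

# Var(p) is proportional to 1/N, so weights of N (i.e., 1/variance)
# reproduce the sqrt(N) row-scaling described above.
X = sm.add_constant(days)
fit = sm.WLS(pct, X, weights=n).fit()
print(fit.params)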

Iowa: So Many Polls. So Few Respondents.

Pollsters have conducted over 44,000 interviews among Iowa’s 160,000 Republicans, but they probably interviewed just 15,000 unique people.  A majority of those polled took part in at least three interviews over the course of the campaign.

It seems like new Presidential polling figures are released every day.  We generally talk about each new poll as a unique snapshot of the campaign with some fresh sample of a few hundred caucus-goers.  That concept of polls might apply to national samples, but when polling in states as small as Iowa and New Hampshire, the number of eligible respondents is not that large compared to the number of interviews being conducted.

Making the problem worse is the falling “response rate” in polling, the proportion of eligible respondents who complete an interview.  Mobile phones, caller ID, and answering machines have all enabled potential respondents to avoid the pollster’s call.  Pew reports that response rates fell from 21 percent to just 9 percent between 2006 and 2012.  If we assume a response rate of ten percent, only some 16,000 of Iowa’s 160,000 eligible Republican caucus-goers might have agreed to take part in a poll.

Huffington Post Pollster lists a total of 94 polls of Republican caucus-goers through today, January 31, 2016, constituting a total of 44,433 interviews.  I will use this figure to see how the composition of the sample changes with different response rates.1

How large is the electorate being sampled?

Around 120,000 people participated in the Republican caucuses in 2008 and 2012.  While some observers think turnout in 2016 could be higher because of the influx of voters for Donald Trump, I have stuck with the historical trend and estimate Republican turnout in 2016 at just under 124,000 citizens.

To that baseline we have to add in people who agree to complete an interview but do not actually turn out for the caucuses.  In my 2012 analysis I added 20 percent to the estimated universe to account for these people, but recent findings from Pew suggest 30 percent inflation might be more accurate.  With rounding, I will thus use 160,000 as my estimate for the number of Iowa Republicans who might have been eligible to be polled about the 2016 Republican caucuses.

How low is the response rate?

Most of those 160,000 people will never take part in a poll.  Pew estimated 2012 “response rates,” the proportion of eligible respondents who actually complete an interview, in the neighborhood of 9 percent.  To see what this means for Iowa, here is a table that presents the average number of interviews a cooperating respondent would have completed during the 2016 campaign at different response rates.  At a ten percent response rate like the one Pew reports, the 16,000 cooperating prospects would each need to complete an average of 2.78 interviews to reach the total of 44,433.

[Table: estimated respondents and interviews per respondent, Iowa Republicans]

How many people gave how many interviews?

Finally, I’ll apply the Poisson distribution once again to estimate the number of people being interviewed once, twice, three times, etc., to see the shape of the samples at each response rate.

[Figure: Iowa Republican sample composition at different response rates]

Even if everyone cooperates, random chance alone would result in about 13 percent of respondents being interviewed at least twice.  When the response rate falls to 10 percent, a majority of respondents complete three or more interviews, with fifteen percent of them being interviewed five times or more.  Even with a 20 percent response rate, about double what Pew reports, a majority will have been interviewed at least twice.
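The figures above can be reproduced in a few lines of Python (using scipy); the only inputs are the interview total, the eligible universe, and an assumed response rate.

from scipy.stats import poisson

interviews, universe = 44433, 160000

for rate in (1.00, 0.20, 0.10):
    lam = interviews / (universe * rate)   # mean interviews per prospect
    p_any = 1 - poisson.pmf(0, lam)        # share of prospects ever polled
    # shares among people interviewed at least once
    twice_plus = (p_any - poisson.pmf(1, lam)) / p_any
    five_plus = poisson.sf(4, lam) / p_any  # P(X >= 5), given X >= 1
    print(f"{rate:.0%}: 2+ interviews {twice_plus:.0%}, 5+ {five_plus:.0%}")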

Certainly someone willing to be interviewed three, four, five times or more about a campaign must have a higher level of interest in politics than the typical Iowa caucus-goer who never cooperates with pollsters.  That difference could distort the figures for candidate preferences if some candidates’ supporters are more willing to take part in polls.

Basing polls on a relatively small number of cooperative respondents might also create false stability in the readings over time.  Polling numbers reflect more the opinions of the “insiders” with a strong interest in the campaign and may be less sensitive to any winds of change.  We might also imagine that, as the campaign winds down and most everyone eligible has been solicited by a pollster, samples become more and more limited to the most interested.

Overarching all these findings remains the sobering fact that only about one-in-ten citizens is willing to take part in polling.  Pollsters can adjust after the fact for any misalignments of the sample’s demographics, but they cannot adjust for the fact that people who participate in polling may simply not represent the opinions of most Americans.  We’ll see how well the opinions of those small numbers of respondents in Iowa and New Hampshire match the opinions of those states’ actual electorates on Primary Day.



1For comparison, Pollster archives 65 polls for the 2012 Iowa Republican caucuses totalling 36,300 interviews.  The expanded demand for polls has increased their number by 45 percent and increased the number of interviews conducted by 22 percent in just one Presidential cycle. (To afford polling at greater frequencies, the average sample size has fallen from 558 in 2012 to 473 in 2016.)


Technical Appendix: Modelling Senatorial Elections

In an effort to examine how the 2016 Senatorial elections might turn out, I have been estimating some simple models of Senate elections using aggregate data from 1946 to 2014.[1]  For my dependent variable I have chosen to use the (logit of the) share of the total vote going to Senate candidates of the President’s party. This removes party labels from the analysis and treats the two parties symmetrically. I conduct some “regression experiments” with this measure using three types of predictors.

One type represents the electoral setting.  Is this an on-year or off-year election?  And, in on-years, is the incumbent President running for re-election?  Alone these two factors account for over twenty percent of the variance in incumbent support, with Presidential re-election bids having by far the greatest impact.  The results for this model appear in the left-hand column of the table below.  The remaining columns add explanatory factors to the basic political environment.

[Table: Senate election models]

Right away we see that when a President is running for re-election, his co-partisans in the Senate have a much greater chance of winning.  Because these are measured as logits, values below zero correspond to a percentage value below fifty, while positive logits imply values above fifty percent.  Without the President running, the model has a slight negative prediction equal to the constant term.  In Presidential re-election years that negative value turns positive, being the sum of the constant (-0.08) and the effect for re-election years (0.14).  By this model the Democrats in the Senate will be short that extra boost that comes from having an incumbent seeking re-election.[2]
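Since the coefficients sit on the logit scale, converting them back to vote shares takes one line.  A quick check in Python using only the two coefficients quoted above (and ignoring the model’s other predictors):

import math

def logit_to_share(x):
    """Inverse logit: convert a logit value to a two-party vote share."""
    return 1 / (1 + math.exp(-x))

const, reelect = -0.08, 0.14
print(logit_to_share(const))            # ~0.480 without the President running
print(logit_to_share(const + reelect))  # ~0.515 in Presidential re-election years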

One reason the Democrats are optimistic about their chances to retake the Senate in 2016 is that these seats were last contested in the Republican wave election of 2010.  This year those seats will be fought in the context of a Presidential election with its greater visibility and higher turnout.  I have measured this effect by including the vote from the election held six years prior.  In principle, we should expect a negative effect, as “regression toward the mean” sets in.  Republicans perhaps won by unexpectedly larger margins in 2010 so their margins should fall closer to the average this time around.

Adding the prior vote for each Senatorial “class” improves the predictive power of this simple model slightly, but the coefficient itself fails to reach significance.  It has the expected negative sign, however, and will prove much more significant in further reformulations.

The third column adds the effect of presidential approval, a common predictor in models of voting for the House.  For the Senate it turns out to have a more subtle effect.  Presidential approval has the expected positive effect on votes for the incumbent’s Senators, but only in off-year elections.  A long literature in political science has examined off-year elections, espousing a variety of theories to explain the President’s usual losses.  I generally adhere to the “referendum” school of thought on off-years: they give the public a chance to express their approval or disapproval of a President mid-way through his term.  That presidential approval does not matter in years when a Presidential election is being held reinforces my belief in the referendum explanation for off-year voting.

The last explanatory factor is the year-on-year percent change in real disposable personal income.  Political scientists and economists have included pretty much every economic variable that might affect election outcomes in their models of presidential and congressional voting, but the one factor that often proves significant is personal income. Adding it to the model increased “explained” variance by over ten percent.

Here is a chart showing the expected national vote for the Senate Democrats as a function of the size of the increase in personal income heading into the election.  They do get a small positive compensation from having lost the popular vote for these seats six years before.  However, without the re-election boost, even an absurdly high four percent growth in real income would not push the expected vote for the Democrats over the 50 percent line.  In the best years of the Obama Administration, real income growth reached slightly over two percent, which would give the Democratic candidates for Senate about 48 percent of the vote.

[Figure: predicted Democratic Senate vote by real income growth and incumbency]

I admit there are many shortcomings to this analysis.  First, I only account for 45 percent of the variance in Senatorial voting, and this only at the national level.  Senatorial campaigns are played out in states, where local forces can exert a major role.  With only 33 or 34 seats up in each election, idiosyncratic factors can swing a few decisive states.

If, as the model predicts, the Democrats should expect to win about 48 percent of the popular vote for Senate, they can nevertheless still win back the Senate.  They might follow the path taken by the Republicans in 2004 and most dramatically in the first off-year election under Ronald Reagan in 1982.  In both those years the Republicans won a majority of the contested Senate seats with a minority of the popular vote.

[Figure: Senate seats and votes]

Technical Appendix: Comparing Trump and Sanders

[Figure: polling trends for Trump and Sanders]

The results above come from the 145 national Republican primary polls as archived by Huffington Post Pollster whose fieldwork was completed after June 30, 2015, and on or before January 6, 2016.  I started with July polling since the current frontrunner, Donald Trump, only announced his candidacy on June 16th. For Bernie Sanders I used the 155 national polls of Democrats starting after April 30th, the day Sanders made his announcement.

The models I am using are fundamentally similar to those I presented for the 2012 Presidential election polls and include these three factors:

  • a time trend variable measured as the number of days since June 30, 2015;
  • a set of “dummy” variables corresponding to the universe of people included in the sample — all adults, registered voters, and “likely” voters as determined by the polling organization using screening questions; and,
  • a set of dummy variables representing the method of polling used — “live” interviews conducted over the phone, automated interviews conducted over the phone, and Internet polling.

Trump’s support is best fit by a “fourth-order polynomial” with a quick uptick in the summer, a plateau in the fall, and a new surge starting around Thanksgiving that levelled off at the turn of the year. Support for Sanders follows a “quadratic” time trend.  His support has grown continuously over the campaign but at an ever slower rate.
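A sketch of those trend fits in Python, on fabricated polling numbers; the actual models also include the universe and method dummies listed above.

import numpy as np

rng = np.random.default_rng(0)

days = np.arange(0, 190, 7.0)            # days since June 30, 2015
pct = 25 + 0.05 * days + rng.normal(0, 2, days.size)   # fake poll shares

quartic = np.polyfit(days, pct, 4)       # fourth-order trend (Trump)
quadratic = np.polyfit(days, pct, 2)     # quadratic trend (Sanders)
print(np.polyval(quartic, 100), np.polyval(quadratic, 100))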

Of more interest to students of polling are the effects by interviewing method and sampled universe.  Trump does over four percent worse in polls where interviews are conducted by a live human being.  Sanders does worse in polls that use automated telephone methods.  The result for Trump may reflect an unwillingness on the part of his supporters to admit to preferring the controversial mogul when talking with an actual human interviewer.

Sanders does not suffer from this problem, but polls relying on automated telephone methods show results about four percent lower than those conducted by human interviewers or over the Internet (the excluded category represented by the constant).  Since we know that Sanders draws more support from younger citizens, the result for automated polling may represent their greater reliance on cell phones which cannot by law be called by robots. This result contradicts other studies by organizations like Pew that find only limited differences between polls of cell phone users and those of landline users. Nevertheless when it comes to support for Bernie Sanders, polls that rely exclusively on landlines appear to underestimate his levels of support.

Turning to the differences in sampling frames, we find that polls that screen for “likely” voters show greater levels of support for Bernie Sanders than do polls that include all registered voters or all adults.  Trump’s support shows no relationship with the universe of voters being surveyed.  Both candidates, as “insurgents,” are thought to suffer from the problem of recruiting new, inexperienced voters who might not actually show up at the polls for primaries and caucuses.  That seems not to be an issue for either man, and in Sanders’s case it appears that the enthusiasm we have seen among his supporters may well gain him a couple of percentage points when actual voting takes place.

Finally it is clear that Trump’s polling support shows much more variability around his trend line than does Sanders’s. The trend and polling methods variables account for about 59 percent of the variation in Trump’s figures, but fully 72 percent of the variation for Sanders.

Honey, It’s the Pollster Calling Again!

Back in 1988 I had the pleasure of conducting polls during the New Hampshire primaries on behalf of the Boston Globe.  The Globe had a parochial interest in that year’s Democratic primary because the sitting Massachusetts governor, Michael Dukakis, had become a leading contender for the Presidential nomination.  The Republican side pitted Vice-President George H. W. Bush against Kansas Senator Bob Dole, the upset winner of the Iowa caucuses a week before the primaries. Also in the race were well-known anti-tax crusader Jack Kemp and famous televangelist Pat Robertson.  Bush had actually placed third in Iowa behind both Dole and Robertson.

We had been polling both sides of the New Hampshire primary as early as mid-December of 1987, but after the Iowa caucuses, the pace picked up enormously. Suddenly we were joined by large national polling firms like Gallup and media organizations like the Wall Street Journal and ABC News.  As each day brought a new round of numbers from one or another pollster, we began to ask ourselves whether we were all just reinterviewing the same small group of people.

Pollsters conducting national surveys with samples of all adults or all registered voters never face this problem.  Even with the volume of national polling conducted every day, most people report never being called by a pollster.  In a population of over 240 million adults, the odds of being called to participate in a survey, even one with a relatively large sample like 2,000 people, are minuscule.  That is still true even if we account for the precipitous decline in the “response rate,” the proportion of households that yield a completed interview.  A wide array of technological and cultural factors have driven survey response rates to historic lows over the past few years, as this table from Pew shows clearly:

In 2012, fewer than ten percent of households were represented in a typical poll.  Still, even at such a low response rate, the huge size of the United States population means that any individual has only a tiny chance of being selected from a sampling universe numbering some 24 million cooperating homes.  Even for a large survey of 2,000 people, the chance of any individual household being selected is a mere 0.00008.

Those odds change drastically when we narrow the universe of eligible people to “likely” voters in an upcoming New Hampshire Republican primary.  Even including people who claim they will vote but later do not, the total universe of eligible respondents in 2012 was probably just 300,000 people.  To reach that figure I started with the total of 248,485 ballots cast in the Republican primary.  To those voters we need to add the other people who reported that they would take part in the primary but did not actually turn out on Primary Day.  For our purposes, I have used an inflation factor of 20%, which brings the estimated total number of self-reported likely Republican primary voters to 298,182 people.  I rounded that figure up to 300,000 in the tables below.

Over a dozen polling organizations conducted at least one survey in New Hampshire according to the Pollster archive for the 2012 Republican primary.  In all there are 55 separate polls in the archive representing a total of  36,839 interviews, or about 12% of the universe of likely voters.  If all 300,000 likely Republican primary voters had been willing to cooperate with pollsters in 2012, about one in every eight of them would have been interviewed.  If we choose a much more realistic response rate like ten percent, there are actually fewer cooperating likely voters than the total number of surveys collected, so some respondents must be contributing multiple interviews.  Can we estimate how many there are?

It turns out the chances a person will be interviewed, once, twice, etc., or never at all can be modelled using the “Poisson distribution.”  Usually a statistical distribution relies on two quantities, its average and its “variance,” but the Poisson distribution has the attractive feature that the mean and variance are identical.  Thus we need only know the average number of interviews per prospect to estimate how many people completed none, one, two, or more interviews.  Here are estimates of the number of interviews conducted per potential respondent at different overall cooperation rates.  At a 20 percent cooperation rate, only 60,000 of the 300,000 likely voters are willing to complete an interview.  Dividing the number of interviews, 36,839, by the estimated number of prospects gives us an average figure of 0.614 interviews per prospect.

[Table: interviews per prospect at different cooperation rates]

Now we plug those values into the Poisson formula to see how many people are interviewed multiple times during the campaign.

[Table: number of interviews per respondent at different cooperation rates]

In an ideal world where every one of the 300,000 likely primary voters is willing to be interviewed, 88.4% of them would never be interviewed, 10.9% would complete one interview, and 0.7% would be interviewed twice.  If response rates fall to 8-10%, only 20-30% of the cooperating prospects are never interviewed.

Though only a few prospects would be interviewed more than once in the ideal, fully-cooperative world, at more realistic response rates closer to what Pew reports, many people were interviewed multiple times in the run up to the 2012 primary.  If only eight percent of likely voters were willing to complete an interview, about a quarter of the prospects were interviewed twice, and one in five of them were interviewed at least three times.

We can use those estimates to see how the size and composition of the actual survey samples change as a function of response rate.

[Figure: sample size and composition by response rate]

At 100% cooperation, obtaining nearly 37,000 interviews from 300,000 people means a small number, about 2,000 people, would be interviewed twice merely by random chance.  So those 37,000 interviews represented the opinions of  32,000 people who were interviewed once, and another 2,000 people interviewed twice.  As response rates fall, the total number of unique respondents, the height of each bar, declines, with a larger share of interviews necessarily coming from people interviewed multiple times.  At a 10% response rate the proportion of people interviewed multiple times just about equals the proportion of people interviewed only once.  Below that rate the proportion of people interviewed only once declines quickly.
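The 100%-cooperation numbers in that chart come straight from the Poisson probabilities, as this short Python check shows:

from scipy.stats import poisson

interviews, universe = 36839, 300000
lam = interviews / universe               # ~0.123 interviews per prospect

print(round(universe * poisson.pmf(1, lam)))  # ~32,600 interviewed once
print(round(universe * poisson.pmf(2, lam)))  # ~2,000 interviewed twice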

Technical Appendix: The Model for Health Insurance Coverage

Dependent variable: Uninsured Adults 19-64 without Dependents
                    2009-2010 data for 49 states (all except MA)

All variables are proportions and measured as "logits."
                coefficient   std. error   t-ratio   p-value 
  -----------------------------------------------------------
  const           1.22691     0.347772      3.528    0.0010   ***
  lgt_U3          0.728698    0.0918831     7.931    5.94e-10 ***
  lgt_Extr+Cons   0.280012    0.0747943     3.744    0.0005   ***
  lgt_UnPriv     −0.164442    0.0460793    −3.569    0.0009   ***
  lgt_NotEng      0.119194    0.0296539     4.020    0.0002   ***
  lgt_McCain      0.217339    0.0810080     2.683    0.0103   **

S.E. of regression   0.141875   Adjusted R-squared   0.726246

The predictors include:

U(3) – the state’s U(3) unemployment rate;
Extr+Cons – the proportion of the state’s workforce in mining, logging, and construction;
UnPriv – the proportion of the state’s private workforce that is unionized;
NotEng – the proportion of the state’s citizens who have no English language skills;
McCain – the proportion of the state’s 2008 Presidential vote won by Republican John McCain

Sources:
U(3) – Bureau of Labor Statistics, Alternative Measures of Labor Underutilization for States, 2010;
Extr+Cons – Bureau of the Census, Table 631. Employees in Nonfarm Establishments–States: 2010
UnPriv – Barry Hirsch and David Macpherson, Union Membership and Coverage Database from the CPS
NotEng – Census Bureau, Table 4A.  Population 5 Years and Over Speaking a Language Other Than English at Home by English-Speaking Ability by State: 2007; percent responding “not at all” to speaking ability;
McCain – Dave Leip, Atlas of U.S. Presidential Elections, Results for 2008
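For readers who want to replicate the specification, here is a hedged sketch in Python; the file and column names are stand-ins for however the state-level proportions are stored.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def lgt(p):
    """Logit transform applied to every variable in the model."""
    return np.log(p / (1 - p))

states = pd.read_csv("state_data.csv")    # hypothetical; one row per state
for col in ["uninsured", "u3", "extr_cons", "unpriv", "noteng", "mccain"]:
    states["lgt_" + col] = lgt(states[col])

model = smf.ols("lgt_uninsured ~ lgt_u3 + lgt_extr_cons + lgt_unpriv"
                " + lgt_noteng + lgt_mccain", data=states).fit()
print(model.summary())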

Technical Appendix: The Model for Voting on the Amash Amendment

I took a more conventional, less spatially-oriented approach than that used on Voteview.  I used logit analysis to estimate a model using the two DW-NOMINATE scores and a variety of dummy variables to measure other possible influences like when the Member was elected and the committees on which the Member serves.  The dataset consists of the 342 Members who served in the 112th Congress (and thus for whom the DW-NOMINATE scores are publicly available) and voted on the Amash Amendment.

It soon became clear that the relationship between support for the amendment and ideological position was more complex than a simple linear model would predict.  What piques our interest in this vote is how each party divided internally over the amendment, more than the division between the parties themselves.  I thus included separate terms for each ideological dimension within each party.  After doing so, including a dummy variable for party has no independent effect.

I also examined various measures of seniority in an effort to see whether there is any truth to pundits’ observation of a generational divide between older and newer members over the issue of domestic surveillance.  It turns out that the generational divide is especially pronounced for Republicans.  Those who were first elected to Congress in 2008 or 2010 were more likely to vote for the Amash amendment regardless of ideology.  For the Democrats, the results are more muted.  Those Democrats who were voted into office alongside President Obama in 2008 were especially likely to oppose him on domestic surveillance.  However the few Democrats who were first elected in 2010 were no more likely to support the amendment than Democrats elected in 2004 or before.

Finally there is a strong effect for committee memberships.  Members who serve on the House Armed Services or Select Intelligence committees were much more likely to vote against the Amash amendment.  The effect was especially pronounced for members of the Intelligence Committee.

Favored Amendment to Cut Funds for NSA Metadata Collection
Logit, 342 observations

             coefficient   std. error      z      p-value 
  --------------------------------------------------------
  Constant    −4.11799      0.699829    −5.884    4.00e-09 ***

DW-Nominate Scores

Dimension 1 ("Liberal-Conservative")
  Dems       −12.2661       1.82358     −6.726    1.74e-11 ***
  Reps         5.13557      0.972324     5.282    1.28e-07 ***
Dimension 2 (?)
  Dems        −0.879454     0.662692    −1.327    0.1845  
  Reps        −0.302442     0.642723    −0.4706   0.6380  

First Elected to Congress
  Dems 2008     3.25482      1.07614      3.025    0.0025   ***
  Reps 2008-10  0.751204     0.334374     2.247    0.0247   **

Committee Memberships
  Armed Srvcs  −1.03730      0.453937    −2.285    0.0223   **
  Intel        −3.15582      0.992305    −3.180    0.0015   ***

Estimated R2 (McFadden) = 0.276671
Number of cases 'correctly predicted' = 258 (75.4%)
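To see how the coefficients combine, here is the predicted probability for a hypothetical Republican with a first-dimension score of 0.5, a second-dimension score of zero, first elected in 2010, and no seat on either committee:

import math

logit = (-4.11799          # constant
         + 5.13557 * 0.5   # Republican, dimension 1
         - 0.302442 * 0.0  # Republican, dimension 2
         + 0.751204)       # first elected 2008-10
prob = 1 / (1 + math.exp(-logit))
print(f"{prob:.2f}")       # ~0.31 probability of voting for the amendment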

While we have come some way to understanding the factors motivating Members’ votes on the Amash amendment, the model still cannot account for the votes of about a quarter of the House.

Was Gerrymandering the Culprit? — Part I

Results updated on November 23, 2012, with final Congressional results for 434 races; NC 7th is still undecided.

It is now time to put some of the findings from earlier postings together and try to determine the extent of gerrymandering in the 2012 Congressional Elections.

Three factors should influence the number of House seats a party wins in a state Congressional election:

  • the overall relationship between seats and votes;
  • the geographic distribution of each party’s voters across the state; and,
  • partisan gerrymandering of district boundaries.

I have taken two separate measurements of the first item, the relationship between seats and votes.  I have calculated both a longitudinal measurement using elections from 1942 on, and a cross-sectional measurement using state results for 2012.  In both approaches I estimate the coefficients α and β of this “logit” model:

log(Democratic Seats/Republican Seats) = α + β log(Democratic Votes/Republican Votes)

The two models produce very different estimates for α, the seat “bias,” because it varies historically.  However the two estimates for β are nearly identical. The longitudinal estimate was 1.92; the cross-sectional estimate is 2.08.  For simplicity, I will just use two for the value of β.  (Mathematically, that implies that the ratio of Democratic to Republican seats varies in proportion to the square of the ratio of their votes.)

In this Technical Appendix, I explain why, if the Democrats win exactly half the vote, the only way they can win exactly half the seats is if the “bias” term α is zero. We can use this fact to create an “unbiased” distribution of seats.  I simply substitute two for β and apply it to the logit of the state-wide Democratic vote for Congress.  I will call this the “unbiased allocation.”  For each state I compare this estimate to the number of seats the Democrats actually won. Here are the results:

I have included all states where the difference between the predicted and actual number of Democratic seats was at least 0.7.  The state that gave us the word “gerrymander,” Massachusetts, shows the largest pro-Democratic deviation.  While the unbiased allocation model would award the Democrats only seven or eight of the nine seats in that state, not one Republican represents the Commonwealth of Massachusetts in Congress.  The other state where Democrats did better than expected is Arizona, where they won a majority of the state’s Congressional seats with a minority of the popular vote.  Arizona had two of the closest races in the country, and both fell to the Democrats by slim margins.  All told, eight states, including four in New England, have new Congressional delegations with an “extra” Democratic member in their numbers.
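In Python the unbiased allocation reduces to a couple of lines.  The 70 percent Democratic vote share below is an approximate, illustrative figure for Massachusetts in 2012; it reproduces the seven-to-eight-seat allocation just described.

def unbiased_seats(dem_vote_share, total_seats):
    # With bias a = 0 and b = 2, the seat ratio is the vote ratio squared.
    ratio = (dem_vote_share / (1 - dem_vote_share)) ** 2
    return total_seats * ratio / (1 + ratio)

print(unbiased_seats(0.70, 9))   # ~7.6 of Massachusetts's nine seats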

Many more states deviate from the unbiased allocation on the Republican side, with half-a-dozen states showing a pro-Republican bias of two, three, or, in the case of Pennsylvania, four seats. All told, sixteen states met our 0.7 criterion.  Compared to an unbiased allocation, the results in these sixteen states probably cost the Democrats 28 seats.  When we subtract out the eight extra seats the Democrats won in the pro-Democratic states, we get a net Democratic deficit in 2012 of some twenty seats compared to an “unbiased” allocation based solely on the popular vote for Congress in each state.

Before we start attributing all those seats to Republican gerrymandering, we first need to consider what other factors might influence the translation of Democratic votes to Democratic seats.  There is good reason to believe that the geographic distribution of Democratic voters by itself creates a pro-Republican bias when district lines are drawn.

Accounting for Geography