How Undecideds Split – Evidence from 1980-2004

Undecideds split evenly using data from five elections

I extended the analysis of undecided voters described in the preceding post using the American National Election Studies for all Presidential elections starting in 1980 where an incumbent was running for re-election. There is no evidence that undecideds broke disproportionately for the challenger in any of these elections. Reagan did best among the challengers, picking up 43% of the undecided vote compared to Carter’s 34% in 1980. However, this advantage probably had as much to do with Reagan himself as with his challenger status that year. Four years later, when Reagan was the incumbent, he outdrew Mondale among undecideds 47-41. In the three other elections undecideds were just as likely to give their votes to the incumbent as to the challenger.

How Undecideds Split – Evidence from 2004

No evidence they disproportionately prefer the challenger

Some people posting over on Nate Silver’s 538 blog keep making the claim that undecided voters usually split 2:1 in favor of the challenger when they get to the polls.  One can imagine a good reason for this pattern, namely that the incumbent is much better known than the challenger.  By that argument, people who are still undecided in the last days of the campaign probably have a stronger anti-incumbent attitude than a pro-challenger one.  When forced to make a decision, the more powerful negative attitude prevails, and the voter opts for the challenger. (There is also some evidence from psychology that negative attitudes may influence behavior more than positive ones.  The emphasis on negative campaigning seems to take this view.)

One could, of course, make the counter-argument, that undecideds may be upset with the incumbent but uncomfortable with the challenger.  People with that balance of opinion might eventually choose the “devil they know” over the one they know not.

Unfortunately, the proponents of the view that the challenger is the ultimate beneficiary of undecideds rarely if ever muster any evidence for it. Testing this hypothesis requires identifying people who were undecided before an election and then asking them later how they actually voted. For this you need what is called a “panel” study, where the same people are interviewed both before and after an election.

Now, as it happens, the academic political science community has been conducting just this sort of study for decades. Most of the American National Election Studies include a pre-election and post-election wave of interviewing. I used the 2004 dataset since that is the most recent election where we had an incumbent President running for re-election. The trajectory of the 2012 election also seems quite similar to that of eight years ago. Here are the results:

The table presents the same data in two different ways.  The top table includes all 1,211 respondents who completed an interview in the pre-election wave conducted mostly in September and October. I placed all respondents who did not choose Bush, Kerry, or Nader in the pre-election wave into a residual “undecided” category.  (The results are unchanged if only pure undecideds, people who said “don’t know” when asked their preference, are considered.)

The top table shows that it was harder to secure a second interview with people who preferred Nader or who were undecided in the first wave.  Nearly half those respondents did not complete a post-election interview, compared to about 30% for major party voters.  Since all but one of the respondents in the post-election wave reported having voted for President, we might expect people who failed to complete a second interview were also less likely to have voted.  So one plausible suspicion about undecided voters is supported by these data, that undecideds are less likely to vote than decideds.

However, there is no support for the notion that undecideds split disproportionately for the challenger.  By my definition of undecideds, Bush and Kerry each received 38% of their votes.  If we limit the definition of undecided to only people who explicitly said “don’t know” to the pre-election preference question, fifteen such undecided people said they had voted for Kerry versus fourteen who said they had voted for Bush.
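As a rough check on the "pure undecided" counts above, here is a minimal sketch of an exact binomial calculation. It assumes the 29 respondents who reported a Bush or Kerry vote can be treated as independent draws, which is my simplification and not part of the ANES design. It compares how likely a Kerry tally of fifteen or fewer would be under a 2:1 break for the challenger versus an even split. With so few cases the test has little power, so it is illustrative only.

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

kerry, bush = 15, 14      # pure undecideds, counts from the paragraph above
n = kerry + bush          # 29 respondents reporting a two-party vote

# Chance of Kerry getting 15 or fewer if undecideds truly broke 2:1 for him
p_two_to_one = binom_cdf(kerry, n, 2 / 3)
# Same tail probability if undecideds split evenly
p_even = binom_cdf(kerry, n, 1 / 2)

print(f"P(Kerry <= {kerry} of {n} | 2:1 break)  = {p_two_to_one:.3f}")
print(f"P(Kerry <= {kerry} of {n} | even split) = {p_even:.3f}")
```

The observed split is far more typical of an even division than of a 2:1 break, though a sample this small cannot settle the question by itself.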

I note with amusement that of the fifteen people who said they would vote for Nader in the pre-election wave, not one reported casting a ballot for him at the polls. Kerry got four of their votes; Bush got two. Half of them could not be reinterviewed.

(Technical note:  For the budget conscious like me, it is a lovely thing that the ANES studies are freely distributed as SPSS “portable” files which can be read directly by the open-source SPSS clone, PSPP.)


The Model for Ohio

No trends found for Ohio, just a two-point debate effect.

I usually do not publish raw regression results on the main pages of this blog, relegating them instead to the Technical Topics category.  However rather than spend time building an uninformative graph of the Ohio campaign, I’ll just report the results here and explain their meaning.

I have applied the simple trends and house effects model that I developed for national polling to the results for Ohio.  Once again I am using the Pollster archive and including all polls of “likely” voters conducted after June 30th.  I also included a variable for the first Presidential debate and dummies for the pollsters to measure house effects.  I get these results:

Model 15: OLS, using observations 1-66
Dependent variable: Dem_Lead

                 coefficient   std. error   t-ratio   p-value 
  const           5.31164      1.00663       5.277    2.21e-06 ***
  DaysBefore     −0.0235519    0.0147563    −1.596    0.1161  
  Debate1        −2.23855      0.845398     −2.648    0.0105   **
  Rasmussen      −2.65019      0.821597     −3.226    0.0021   ***
  Gravis Mktg    −2.28954      0.925727     −2.473    0.0164   **
  ARG            −2.39894      1.26821      −1.892    0.0637   *
  NBC/WSJ/Marist  3.37662      1.27493       2.648    0.0105   **
  Wenzel/CtznsU  −5.55798      2.20545      −2.520    0.0146   **
  Qunn/NYT/CBS    3.74099      1.31111       2.853    0.0061   ***
  WaPo            3.72465      2.17768       1.710    0.0927   *

Mean dependent var   2.939394   S.D. dependent var   2.833203
Sum squared resid    250.9900   S.E. of regression   2.117065
R-squared            0.518953   Adjusted R-squared   0.441642
F(9, 56)             6.712523   P-value(F)           1.86e-06

First, we see at best only a very weak trend in the President’s favor over the course of the summer, one that fails to prove significant even at the 0.10 level. The best interpretation is that there have been no discernible trends in Ohio since June, just a one-time drop of two points in the President’s lead, from a bit over five points before the first Presidential debate to about three since then.
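To make the arithmetic behind the five-to-three drop explicit, here is a small sketch that plugs the reported coefficients into the fitted equation. The function name is mine, and I am assuming the quoted figures correspond to DaysBefore set to zero (Election Day) for a pollster with no measured house effect.

```python
# Coefficients copied from the gretl output above
CONST = 5.31164
DAYS_BEFORE = -0.0235519
DEBATE1 = -2.23855

def predicted_lead(days_before: float, after_debate: bool) -> float:
    """Predicted Dem_Lead for a pollster with no house-effect dummy."""
    debate_shift = DEBATE1 if after_debate else 0.0
    return CONST + DAYS_BEFORE * days_before + debate_shift

pre = predicted_lead(0, after_debate=False)
post = predicted_lead(0, after_debate=True)
print(f"pre-debate: {pre:.2f}, post-debate: {post:.2f}")  # 5.31 and 3.07
```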

Seven pollsters showed statistically significant deviations from the consensus for Ohio at the 10% level or better, though only Rasmussen and ARG also appeared on the list for national polls. The measured effects for those two organizations in Ohio are both slightly larger than the effects measured in national polling.
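House effects like these can be used to put a single poll back on the consensus scale by subtracting the pollster's coefficient from its reported lead. The sketch below illustrates that idea only; it is not necessarily the adjustment behind any chart on this blog, the function name is hypothetical, and the sample poll is made up.

```python
# House-effect coefficients copied from the Ohio regression output
HOUSE_EFFECTS = {
    "Rasmussen": -2.65019,
    "Gravis Mktg": -2.28954,
    "ARG": -2.39894,
    "NBC/WSJ/Marist": 3.37662,
    "Wenzel/CtznsU": -5.55798,
    "Qunn/NYT/CBS": 3.74099,
    "WaPo": 3.72465,
}

def adjusted_lead(reported_lead: float, pollster: str) -> float:
    """Remove a pollster's estimated house effect; unlisted pollsters are unchanged."""
    return reported_lead - HOUSE_EFFECTS.get(pollster, 0.0)

# Made-up example: a Rasmussen poll showing the President ahead by one point
example = adjusted_lead(1.0, "Rasmussen")
print(f"{example:.2f}")  # 3.65
```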

Polls by the three major media organizations (NBC/WSJ/Marist, Quinnipiac/NY Times/CBS, and the Washington Post) all had results over three points more Democratic than the consensus. While these organizations are often criticized as being “in the tank” for Obama (leaving aside the Wall Street Journal, of course), I don’t find any such partisan bias in their national polling. The pollster with the largest deviation, Wenzel/Citizens United, had an obvious partisan sponsor and reported results consistent with its Republican ideology.


Partisan House Effects in National Polls

Until now I have been using data from the polling archives at RealClearPolitics for this blog. Today I began looking at the larger archive of polls at the Huffington Post’s Pollster site. One nice feature of this site is that they offer a copy of their data in a format (“.csv”) that can be easily imported into spreadsheets or the gretl econometrics package.

I produced the table above from a regression of the size of President Obama’s lead over Governor Romney using the 146 national likely-voter polls in the Pollster database with fieldwork starting after June 30th and ending October 28th.  Along with “dummy variables” to capture any differences among the 37 polling organizations represented in this sample I included a few other important predictors:

  • the number of days remaining between the end of fieldwork and Election Day, November 6th;
  • a dummy variable for polls whose fieldwork began after the first Presidential debate on October 3rd;
  • an “interaction” term that is the product of these last two variables to allow the estimated trend line to differ before and after the debate; and,
  • dummy variables for the method of polling using Pollster’s categories of in-person telephone interviews, automated telephone interviews, and Internet interviews.
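The predictors listed above can be sketched as a function that turns one poll's fieldwork dates and method into a row of regressors. This is an illustration under assumptions: the field names and method labels are hypothetical, not Pollster's actual column names, and I treat live telephone interviewing as the omitted baseline category.

```python
from datetime import date

ELECTION_DAY = date(2012, 11, 6)
FIRST_DEBATE = date(2012, 10, 3)

def predictors(start: date, end: date, method: str) -> dict:
    """Build the trend, debate, interaction, and method regressors for one poll."""
    days_before = (ELECTION_DAY - end).days       # days from end of fieldwork to Election Day
    post_debate = 1 if start > FIRST_DEBATE else 0  # fieldwork began after the first debate
    return {
        "days_before": days_before,
        "post_debate": post_debate,
        "days_x_debate": days_before * post_debate,  # lets the trend differ after the debate
        "automated": int(method == "automated"),     # live phone is the assumed baseline
        "internet": int(method == "internet"),
    }

# Made-up poll: automated, in the field October 10th through 12th
row = predictors(date(2012, 10, 10), date(2012, 10, 12), "automated")
print(row)
```

The house-effect dummies would be added the same way, one 0/1 column per polling organization.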

As you might imagine, only a few of the polling organizations diverge so markedly from the consensus that we can statistically measure any effect for them.  I narrowed down the search to the ten organizations that appear in the table above which have discernible partisan house effects.

Five organizations show “statistically significant” house effects at conventional levels (p<0.05).  Three report figures with a measurable pro-Republican bias, Gallup, ARG, and Rasmussen, while two, JZAnalytics and the openly partisan DemocracyCorps, report figures favorable to the President.  Gallup’s three recent likely-voter polls diverged so substantially from the polling consensus that it tops our list with an estimated four-point tilt for Romney.  ARG and Rasmussen are often suspected of GOP leanings, and this analysis estimates that their polls lean about two percent more Republican than the model’s consensus.  Polls conducted by JZAnalytics, either alone or with co-sponsors Newsmax and the Washington Times, report results 2.6% more Democratic than the consensus.  Polls by DemocracyCorps run a bit over two percent more Democratic.

Rasmussen’s pro-Republican leaning is especially important when you consider how it dominates the polling landscape.  There are 39 Rasmussen polls in the sample, or 27% of these 146 polls from the Pollster database. I’ve looked at what the polling consensus would be like in a world without Rasmussen in this post.
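A back-of-the-envelope calculation shows why the share matters. If one simply averaged all the polls, a pollster's tilt would move the average by roughly its share of the sample times its house effect. The sketch below rounds the Rasmussen effect to two points, which is my simplification, and it describes a naive average, not the regression model used here.

```python
rasmussen_polls, all_polls = 39, 146
house_effect = -2.0  # "about two percent more Republican", rounded for illustration

share = rasmussen_polls / all_polls
shift = share * house_effect  # pull on a simple unweighted average, in points

print(f"share = {share:.1%}, shift in simple average = {shift:+.2f} points")
```

On these assumptions Rasmussen alone would pull a simple average about half a point toward Romney.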

Another five organizations reported results that deviated sufficiently from the consensus that they met the criterion of statistical significance at the 10% level. Three of these come from organizations that conducted only one poll in this period, and all three had bias figures approaching four percent. NPR’s two polls averaged 2.8% more favorable to the President, while the YouGov/Economist polls lean a bit under two points in the President’s direction.

I found no systematic differences by type of interviewing method used. Interviewing by live telephone, automated telephone, or the Internet does not produce results that systematically favor one candidate over the other. The fact that automated interviewing shows no consistent effects suggests that including cell phones in the sampling frame may not matter at all. Firms like Rasmussen and PPP that use automated interviewing are banned from calling cell phones by Federal rule. Yet there is no evidence of a bias in automated interviewing, where cell phones are excluded.

The full results appear here.

The Race Without Rasmussen

Rasmussen Reports accounts for 38 of the 141 recent likely-voter polls in my dataset from Pollster, or fully 27% of all the observations. Earlier I showed that Rasmussen has a pro-Romney “house effect” of over two percent. Given the combination of Rasmussen’s dominance and its pro-Republican bias, we might wonder what the polling consensus would be if there were no Rasmussen Reports. Here is the graph from my most recent post after excluding Rasmussen’s polling from the sample.

Without Rasmussen, we do not see evidence that the campaign has stagnated. The other pollsters did record a somewhat larger drop for President Obama after the first debate, putting him a point behind Governor Romney. However, the model that excludes Rasmussen keeps the President on his slow upward trend, giving him a slim margin of 0.6% in the polls on Election Day. Here is the graph from the earlier post with Rasmussen included for comparison.

Full regression results are here.

Is it 2004 all over again?

Using the 141 national likely-voter polls from the Pollster database since July 1st, we can examine how the dynamics of the race have changed since the first Presidential debate. So far the pattern looks very reminiscent of the period after the first Presidential debate in 2004 between George W. Bush and John Kerry. In that election, support for Bush dropped about five points after the first debate, leaving the President with a two-point lead that he maintained up until Election Day. The 2012 Presidential campaign seems to be following the same trajectory except that President Obama no longer has a lead to maintain.

The polls indicate that support for the President fell by about four points immediately after the first debate leaving him essentially tied with Mitt Romney.  If the President had reverted to trend at that point and followed the dotted line, he would have gained back a lead in the polls of about 1.3% by Election Day.  So far, however, the polling suggests no such return to form.  On the basis of the polling since the first debate, the candidates look to remain roughly tied from now until the election.

(These data include the adjustments for “house effects” described in my previous post.  The results for the full regression model are here.  The later debates show no measurable effects.)


Polling Results in the Competitive States

President leads in three states that would cement an Electoral College victory

The Electoral Vote application at the New York Times has allocated all but nine of the states to either Barack Obama or Mitt Romney.  In the chart below I have tabulated the number of likely-voter polls conducted in each of those nine states between June 1st, when Romney lined up enough delegates to win the nomination, and October 19th.   Rather than examine the size of the difference between the candidates, I took a simpler approach and just counted up the number of polls in which the President held a lead over Mr. Romney.  I then used a statistical test called “chi-squared” to see whether the President led in “too many” polls for it to just be the result of chance.


So, to take the case of Nevada, there have been 20 likely-voter polls conducted between June 1st and October 19th, and President Obama has led in 18 of them. If the race were really tied, each candidate should have led in ten of those twenty polls on average. Statistics enables us to ask whether the President’s tally of eighteen polls is large enough to reject the notion that the President is tied or behind in Nevada. The calculated statistic in the table above is 12.80. The column headed with “p” reports the probability of seeing a split at least this lopsided if the race were really tied. In this case there are only about three chances in 10,000 that a tied race would produce a result as one-sided as eighteen leads in twenty polls.
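The Nevada figure can be reproduced with a few lines of arithmetic. This sketch assumes a one-degree-of-freedom chi-squared test against an even split and treats the polls as independent, which is a simplification; the function name is mine. For one degree of freedom the upper-tail probability can be computed with the complementary error function, so no statistics library is needed.

```python
from math import erfc, sqrt

def chi_squared_even_split(led: int, total: int) -> tuple[float, float]:
    """Chi-squared statistic and p-value for 'led' leads out of 'total' polls vs. a 50/50 split."""
    expected = total / 2
    trailed = total - led
    stat = (led - expected) ** 2 / expected + (trailed - expected) ** 2 / expected
    p_value = erfc(sqrt(stat / 2))  # upper tail of chi-squared with 1 degree of freedom
    return stat, p_value

stat, p_value = chi_squared_even_split(18, 20)
print(f"chi-squared = {stat:.2f}, p = {p_value:.4f}")  # 12.80 and 0.0003
```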

Two other states besides Nevada also appear firmly in President Obama’s coalition, Wisconsin and Ohio.  If the President does indeed win these three states, Ohio, Wisconsin and Nevada, and retain his leads in “leaning” states like Michigan and Pennsylvania, he will win precisely 271 Electoral Votes, or one more than he needs to gain re-election.  Replacing Nevada with New Hampshire would leave the President one EV short at 269.
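The Electoral College arithmetic can be checked directly. The state totals below are the 2012 apportionment; the base figure is not stated in the text and is simply backed out from the 271 total, so treat it as a derived assumption.

```python
# Electoral votes under the 2012 apportionment
EV = {"Ohio": 18, "Wisconsin": 10, "Nevada": 6, "New Hampshire": 4}
NEEDED_TO_WIN = 270

total_with_three = 271  # figure given in the text
# Safe and leaning states, derived by subtracting the three competitive states
base = total_with_three - (EV["Ohio"] + EV["Wisconsin"] + EV["Nevada"])

# Swap Nevada for New Hampshire
with_new_hampshire = base + EV["Ohio"] + EV["Wisconsin"] + EV["New Hampshire"]

print(base, total_with_three - NEEDED_TO_WIN, with_new_hampshire)  # 237 1 269
```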

The bottom half of the chart replicates the same analysis using only polls taken since September 1st.  The President is still a significant favorite in all three states based on just the more recent polling as well as all the polls since June 1st.


Polling Update

A bunch of new polls today, October 23rd, show more positive results for the President than we have seen recently. Our trend model still detects no evidence that the race has stagnated like it did in 2004, but the President has not rebounded since his drop in the polls after October 4th. As of now, he is predicted to hold a 0.8% lead on Election Day.

If the Gallup Tracking outlier (the -7 value) is excluded, the predicted lead for the President jumps to 1.1%.

What can we learn from the Kerry/Bush 2004 debate effect?


This graph presents the pre-debate and post-debate trends in polls conducted during the 2004 election.  RCP does not report whether the data represent registered voters or only likely voters for that year, so I have had to include both types of polls in the chart above.  The change in the campaign’s trajectory after the September 30th, 2004, first debate was dramatic.  Mr. Kerry erased nearly nine points from President Bush’s predicted margin in the polls on Election Day.  Kerry also halted the President’s rapid gains in the polls; the daily rate of increase for George W. Bush between July 4th and September 30th was 0.12 percentage points per day, twice the rate scored by President Obama in 2008.

Unfortunately for Senator Kerry, he never advanced any further in the polls after that. Kerry stemmed Bush’s advance but could not manage to pick up that last remaining two percent that would have brought him even with the President. Polls after the first debate average Bush+2 with no visible or statistically measurable trend. According to the 2004 polls, the campaign stagnated after the first debate.

Whether we might expect a similar pattern in 2012 cannot yet be determined.  All we can tell at the moment is that Mr. Romney picked up almost as much ground (+6%) as Kerry did eight years ago (+8%).  There is not yet enough data to tell whether the campaign will stagnate at the current small Romney lead, or whether the President might recover from the blow in the weeks ahead.