Iowa: So Many Polls. So Few Respondents.

Pollsters have conducted over 44,000 interviews among Iowa’s 160,000 Republicans, but they probably interviewed just 15,000 unique people.  A majority of those polled took part in at least three interviews over the course of the campaign.

It seems like new Presidential polling figures are released every day.  We generally talk about each new poll as a unique snapshot of the campaign with some fresh sample of a few hundred caucus-goers.  That concept of polls might apply to national samples, but when polling in states as small as Iowa and New Hampshire, the number of eligible respondents is not that large compared to the number of interviews being conducted.

Making the problem worse is the falling “response rate” in polling, the proportion of eligible respondents who complete an interview.  Mobile phones, caller-ID, answering machines, all have enabled potential respondents to avoid the pollster’s call.  Pew reports that response rates have fallen from 21 percent to 9 percent just between 2006 and 2012.  If we assume a response rate of ten percent, only some 16,000 of Iowa’s 160,000 eligible Republican caucus-goers might have agreed to take part in a poll.

Huffington Post Pollster lists a total of 94 polls of Republican caucus-goers through today, January 31, 2016, constituting a total of 44,433 interviews.  I will use this figure to see how the composition of the sample changes with different response rates.1

How large is the electorate being sampled?

Around 120,000 people participated in the Republican caucuses in 2008 and 2012.  While some observers think turnout in 2016 could be higher because of the influx of voters for Donald Trump, I have stuck with the historical trend and estimate Republican turnout in 2016 at just under 124,000 citizens.

To that baseline we have to add in people who agree to complete an interview but do not actually turn out for the caucuses.  In my 2012 analysis I added 20 percent to the estimated universe to account for these people, but recent findings from Pew suggest 30 percent inflation might be more accurate.  With rounding, I will thus use 160,000 as my estimate for the number of Iowa Republicans who might have been eligible to be polled about the 2016 Republican caucuses.

How low is the response rate?

Most of those 160,000 people will never take part in a poll.  Pew estimated 2012 “response rates,” the proportion of eligible respondents who actually complete an interview, in the neighborhood of 9 percent.  To see what this means for Iowa, here is a table that presents the average number of interviews a cooperating respondent would have conducted during the 2016 campaign at different response rates.  At a ten percent response rate like Pew reports, the 16,000 cooperating prospects would each need to complete an average of 2.78 interviews to reach the total of 44,433.

table-estiimated-respondents-iowa-rep

How many people gave how many interviews?

Finally, I’ll apply the Poisson distribution once again to estimate the number of people being interviewed once, twice, three times, etc., to see the shape of the samples at each response rate.

iowa-rep-sample-rates2

Even if everyone cooperates, random chance alone would result in about 13 percent of respondents being interviewed at least twice.  When the response rate falls to 10 percent, most respondents are being interviewed three or four times, with fifteen percent of them being interviewed five times or more.  Even with a 20 percent response rate, about double what Pew reports, a majority will have been interviewed at least twice.

Certainly someone willing to be interviewed three, four, five times or more about a campaign must have a higher level of interest in politics than the typical Iowa caucus-goer who never cooperates with pollsters.  That difference could distort the figures for candidate preferences if some candidates’ supporters are more willing to take part in polls.

Basing polls on a relatively small number of cooperative respondents might also create false stability in the readings over time.  Polling numbers reflect more the opinions of the “insiders” with a strong interest in the campaign and may be less sensitive to any winds of change.  We might also imagine that, as the campaign winds down and most everyone eligible has been solicited by a pollster, samples become more and more limited to the most interested.

Overarching all these findings remains the sobering fact that only about one-in-ten citizens is willing to take part in polling.  Pollsters can adjust after the fact for any misalignments of the sample’s demographics, but they cannot adjust for the fact that people who participate in polling may simply not represent the opinions of most Americans.  We’ll see how well the opinions of those small numbers of respondents in Iowa and New Hampshire match the opinions of those states’ actual electorates on Primary Day.

 


1For comparison, Pollster archives 65 polls for the 2012 Iowa Republican caucuses totalling 36,300 interviews.  The expanded demand for polls has increased their number by 45 percent and increased the number of interviews conducted by 22 percent in just one Presidential cycle. (To afford polling at greater frequencies, the average sample size has fallen from 558 in 2012 to 473 in 2016.)

 

Technical Appendix: Modelling Senatorial Elections

In an effort to examine how the 2016 Senatorial elections might turn out, I have been estimating some simple models of Senate elections using aggregate data from 1946 to 2014.[1]   For my dependent variable I have chosen to use the (logit of the) total vote for Senate candidates in the President’s party. This removes party labels from the analysis and treats the two parties symmetrically. I conduct some “regression experiments” of this measure using three types of predictors.

One type represents the electoral setting.  Is this an on-year or off-year election?  And, in on-years, is the incumbent President running for re-election?  Alone these two factors account for over twenty percent of the variance in incumbent support, with President re-election bids having by far the greatest impact.  This results for this model appears in the left-hand column of the table below.  The remaining columns add additional explaatory factors to the basic political environment.

Senate-Election-Models2

Right away we see that when a President is running for re-election, his co-partisans in the Senate have a much greater chance of winning.  Because these are measured as logits, values below zero correspond to a percentage value below fifty, while positive logits imply values above fifty percent.  Without the President running, the model has a slight negative prediction equal to the constant term.  In Presidential re-election years that negative value turns positive being the sum of the constant (-0.08) and the effect for relection years (0.14).  By this model the Democrats in the Senate will be short that extra boost that comes from having an incumbent seeking re-election.[2]

One reason the Democrats are optimistic about their chances to retake the Senate in 2016 is that these seats were last contested in the Republican wave election of 2010.  This year those seats will be fought in the context of a Presidential election with its greater visibility and higher turnout.  I have measured this effect by including the vote from the election held six years prior.  In principle, we should expect a negative effect, as “regression toward the mean” sets in.  Republicans perhaps won by unexpectedly larger margins in 2010 so their margins should fall closer to the average this time around.

Adding the prior vote for each Senatorial “class” improves the predictive power of this simple model slightly, but the coefficient itself fails to reach significance.  It has the expected negative sign, however, and will prove much more significant in further reformulations.

The third column adds the effect of presidential approval, a common predictor in models of voting for the House.  For the Senate it turns out to have a more subtle effect.  Presidential approval has the expected positive effect on votes for the incumbent’s Senators, but only in off-year elections.  A long literature in political science has examined off-year elections espousing a variety of theories to explain the President’s usual losses.  I generally adhere to the “referendum” school of thought on off-years, that they give the public a chance to express their approval or disapproval of a President mid-way through his term.  That presidential approval matters not in years when a Presidential election is being held reinforces my belief in the referendum explanation for off-year voting.

The last explanatory factor is the year-on-year percent change in real disposable personal income.  Political scientists and economists have included pretty much every economic variable that might affect election outcomes in their models of presidential and congressional voting, but the one factor that often proves significant is personal income. Adding it to the model increased “explained” variance by over ten percent.

Here is a chart showing the expected national vote for the Senate Democrats as a function of the size of the increase in personal income heading into the election. They do get a small positive compensation from having lost the popular vote for these seats six years before.  However, without the re-election boost, even reaching an absurdly high four percent growth real income would not push the expected vote for the Democrats over the 50 percent line.  In the best years of the Obama Administration, real income growth reached slightly over two percent, which would give the Democratic candidates for Senate about 48 percent of the vote.

pred-econ-incum2

I admit there are many shortcomings to this analysis.  First, I only account for 45 percent of the variance in Senatorial voting, and this only at the national level.  Senatorial campaigns are played out in states, where local forces can exert a major role.  With only 33 or 34 seats up in each election, idiosyncratic factors can swing a few decisive states.

If, as the model predicts, the Democrats should expect to win about 48 percent of the popular vote for Senate, they can nevertheless still win back the Senate.  They might follow the path taken by the Republicans in 2004 and most dramatically in the first off-year election under Ronald Reagan in 1982.  In both those years the Republicans won a majority of the contested Senate seats with a minority of the popular vote.

senate-seats-votes-4

Technical Appendix: Comparing Trump and Sanders

trump-sanders3

The results above come from the 145 national Republican primary polls as archived by Huffington Post Pollster whose fieldwork was completed after June 30, 2015, and on or before January 6, 2016.  I started with July polling since the current frontrunner, Donald Trump, only announced his candidacy on June 16th. For Bernie Sanders I used the 155 national polls of Democrats starting after April 30th, the day Sanders made his announcement.

The models I am using are fundamentally similar to those I presented for the 2012 Presidential election polls and include these three factors:

  • a time trend variable measured as the number of days since June 30, 2015;
  • a set of “dummy” variables corresponding to the universe of people included in the sample — all adults, registered voters, and “likely” voters as determined by the polling organization using screening questions; and,
  • a set of dummy variables representing the method of polling used — “live” interviews conducted over the phone, automated interviews conducted over the phone, and Internet polling.

Trump’s support is best fit by a “fourth-order polynomial” with a quick uptick in the summer, a plateau in the fall, and a new surge starting around Thanksgiving that levelled off at the turn of the year. Support for Sanders follows a “quadratic” time trend.  His support has grown continuously over the campaign but at an ever slower rate.

Of more interest to students of polling are the effects by interviewing method and sampled universe.  Trump does over four percent worse in polls where interviews are conducted by a live human being.  Sanders does worse in polls that use automated telephone methods.  The result for Trump may reflect an unwillingness on the part of his supporters to admit to preferring the controversial mogul when talking with an actual human interviewer.

Sanders does not suffer from this problem, but polls relying on automated telephone methods show results about four percent lower than those conducted by human interviewers or over the Internet (the excluded category represented by the constant).  Since we know that Sanders draws more support from younger citizens, the result for automated polling may represent their greater reliance on cell phones which cannot by law be called by robots. This result contradicts other studies by organizations like Pew that find only limited differences between polls of cell phone users and those of landline users. Nevertheless when it comes to support for Bernie Sanders, polls that rely exclusively on landlines appear to underestimate his levels of support.

Turning to the differences in sampling frames, we find that polls that screen for “likely” voters show greater levels of support for Bernie Sanders than do polls that include all registered voters or all adults.  Trump’s support shows no relationship with the universe of voters being surveyed.  Both candidates, as “insurgents,” are thought to suffer from the problem of recruiting new, inexperienced voters who might not actually show up at the polls for primaries and caucuses.  That seems not to be an issue for either man, and in Sanders’s case it appears that the enthusiasm we have seen among his supporters may well gain him a couple of percentage points when actual voting takes place.

Finally it is clear that Trump’s polling support shows much more variability around his trend line than does Sanders’s. The trend and polling methods variables account for about 59 percent of the variation in Trump’s figures, but fully 72 percent of the variation for Sanders.

A Tale of Two Candidacies

Trump fares better in polls conducted by robots; Sanders polls better when humans conduct the interview.  Sanders also shows greater strength in polls of “likely” voters.

Commentators often treat Donald Trump and Bernie Sanders as representing two “insurgencies” within the Republican and Democratic Parties.  While there are certainly some surface similarities between the two candidacies, national polling data for the two candidates show substantial differences as well.  I begin with two charts comparing the trends in their national polling support using data from Huffington Post Pollster since each candidate announced he was running for President of the United States.

trump-trend4

sanders-trend

While both men’s support has continued to grow over the course of the campaign, the trajectories of their support are radically different.  Sanders’ polling figures have increased consistently over the course of the campaign, but the rate at which his support has risen has slowed as the campaign progressed.  Trump’s support seems to have gone through three phases — rapid growth at the outset, a plateau over the fall, and a second surge beginning around Thanksgiving that slowed at the turn of the year.  Extrapolating out to February 1st when the Iowa Caucuses take place, Trump would be approaching just under forty-five percent in national polls with Sanders  reaching thirty-five percent.

polling-effects

I have examined two types of polling effects: the method of interviewing and the type of sample drawn. Trump does over four percent worse when interviews are conducted by a live human being.  Sanders does worse by an essentially identical margin in polls that use automated telephone methods.  The result for Trump may reflect an unwillingness on the part of his supporters to admit to preferring the mogul when talking with an actual human interviewer.

Sanders’ poorer showing in polls that rely on automated polling may have to do with their exclusion of cell phones which cannot by law be called by robots. Usually this problem is adjusted for after the poll has been conducted by weighting the data to conform to expected demographic breakdowns.  However Sanders large lead among younger voters who are much less likely to have a landline phone may be suppressing his support in automated polling.  In a recent FoxNews poll, for instance, Sanders holds a 61-31 lead over Hillary Clinton among respondents under 45 years of age; older voters strongly prefer Clinton 71-21.  That same demographic explanation does not work for Trump, however, since he drew an identical 35 percent among voters in that same poll from both age groups.  The “social desirability” explanation probably has greater force when accounting for his poorer showing in polls conducted by human interviewers.

Turning to the differences in sampling frames, we find that polls that screen for “likely” voters show surprisingly greater levels of support for Bernie Sanders than do polls that include all registered voters or all adults.  Trump’s support shows no relationship with the sample of voters drawn.  Both candidates, as “insurgents,” are thought to suffer from the problem of recruiting new, inexperienced voters who might not actually show up at the polls for primaries and caucuses.  That seems not to be an issue for either man, and in Sanders’ case it appears that the enthusiasm we have seen among his supporters may well gain him a couple of percentage points when actual voting takes place.