As in 2012, I am using a “chi-squared” test* to determine whether each candidate has led in so many polls in each state that it is statistically unlikely that person is not actually ahead there. I’ve used all state polls archived at Huffington Post Pollster since June 1st and conducted a separate test using only polls conducted after the release of the “Access Hollywood” tape on October 7th where Trump claims to have committed sexual assault. In this more recent set of polls, Arizona moves from Trump’s column to a toss-up.

In three other states, Iowa, Nevada, and Ohio, the race appears statistically tied. Neither candidate has led in a sufficient number of polls to determine whether one of them is truly in the lead. Hillary Clinton has a significant lead in the remaining eight states, with a total of 116 Electoral Votes. Combined with the other solidly Democratic states, she should win at least 317 Electoral Votes on Tuesday, and as many as 347 were she to take all three of Nevada, Iowa, and Ohio. More likely, given the data above, she will lose Iowa and Ohio and end up with 323 Electoral Votes adding Nevada to her column.
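The electoral arithmetic in the paragraph above can be checked with a short sketch. The floor of 317 comes from the text; the electoral-vote counts for Nevada, Iowa, and Ohio are the 2016 apportionment figures, and the function name is purely illustrative.

```python
# Electoral votes of the three statistically tied states (2016 apportionment).
TOSS_UPS = {"Nevada": 6, "Iowa": 6, "Ohio": 18}

# Floor taken from the text: states where Clinton's lead is significant
# plus the solidly Democratic states.
CLINTON_FLOOR = 317


def clinton_total(won_toss_ups):
    """Return Clinton's electoral votes given the toss-up states she wins."""
    return CLINTON_FLOOR + sum(TOSS_UPS[state] for state in won_toss_ups)


print(clinton_total([]))                          # the floor
print(clinton_total(["Nevada"]))                  # Nevada only
print(clinton_total(["Nevada", "Iowa", "Ohio"]))  # all three toss-ups
```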

__________

*Values of chi-squared greater than 3.84 are “significant at the 0.05 level” (with one “degree of freedom”), meaning there is a 95 percent probability that Clinton is ahead. Values greater than 6.64 are significant at the 99 percent level. In all eight states where Clinton has led in the polls since June 1st, her chances of actually being ahead in those states are very much higher than 99 percent.
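One way such a test could be computed is sketched below. The post does not give its formula, so this assumes the simplest version: the number of polls each candidate led is compared with an even split, which yields one degree of freedom. The poll counts in the usage lines are hypothetical.

```python
CRITICAL_05 = 3.84  # 0.05 level, one degree of freedom (from the footnote)
CRITICAL_01 = 6.64  # 0.01 level, one degree of freedom (from the footnote)


def chi_squared_leads(clinton_leads, trump_leads):
    """Chi-squared statistic for poll leads against an assumed 50/50 split."""
    n = clinton_leads + trump_leads
    if n == 0:
        raise ValueError("need at least one poll with a leader")
    expected = n / 2.0
    return ((clinton_leads - expected) ** 2
            + (trump_leads - expected) ** 2) / expected


def verdict(clinton_leads, trump_leads):
    """Classify a state using the two critical values quoted in the footnote."""
    stat = chi_squared_leads(clinton_leads, trump_leads)
    if stat > CRITICAL_01:
        return "significant at 0.01"
    if stat > CRITICAL_05:
        return "significant at 0.05"
    return "toss-up"


# Hypothetical counts of polls led by each candidate in three states.
for counts in [(12, 2), (9, 2), (6, 4)]:
    print(counts, round(chi_squared_leads(*counts), 2), verdict(*counts))
```

The statistic only says which counts are lopsided; the leader is whichever candidate has more poll leads.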

Back in 2012 I modelled the dynamics of national Presidential polling using a combination of time trends, survey methodologies, and campaign events. In this posting I will present a similar model for the 2016 campaign using the 190 polls archived at Huffington Post Pollster covering the period from June 1st through October 25th. All these polls include both minor party candidates, Gary Johnson and Jill Stein, in the list of alternatives.

As before I am using three types of explanatory factors to model polling dynamics:

- a simple linear time trend that measures the number of days remaining in the campaign until Election Day; using higher-order polynomials like quadratics or cubics does not improve explained variance;
- “dummy” variables that correspond to various features of each survey like the sample drawn (registered versus “likely” voters), the method of polling (live interviewers, automated interviewing, or via the Internet), and the identity of the polling organization;
- dummy variables to represent various events during the course of the campaign.

For the polling organizations, I included dummies only for those who had contributed at least nine polls, or five percent of the sample. Only six organizations met this criterion. For the events, I included both parties’ national conventions and the first Presidential debate on September 26th. I also included a term for the release of the “Access Hollywood” tape where Donald Trump was recorded as claiming to have engaged in sexual assault. Because the second debate, held on October 9th, followed only two days after the release of the tape, I have combined those events into a single dummy variable. I have included a third variable which represents the period since the third debate on October 19th. All dates are measured from the midpoint of each poll’s fieldwork.

Measuring the effects of the conventions was especially difficult this year since the DNC took place in the week following the RNC. The RNC dummy is coded one starting on the close of the convention, July 21st, and extends through the following Sunday. Eight polls were conducted during this period. Rather than measure a separate effect for the Democratic convention, I have instead used a “post-convention” variable that is coded as one from the close of the DNC until the first debate. All models are estimated using “weighted least squares” with the weights proportional to the square root of each poll’s sample size.
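The coding scheme described above might look something like the following sketch. Several details are assumptions rather than anything stated in the post: the DNC closing date (July 28th), the choice to leave the debate and tape dummies switched on through Election Day, and all of the names used here.

```python
from datetime import date

ELECTION_DAY = date(2016, 11, 8)
RNC_CLOSE = date(2016, 7, 21)       # from the text
RNC_WINDOW_END = date(2016, 7, 24)  # the following Sunday
DNC_CLOSE = date(2016, 7, 28)       # assumption: not stated in the text
FIRST_DEBATE = date(2016, 9, 26)    # from the text
TAPE_RELEASE = date(2016, 10, 7)    # tape release; second debate two days later
THIRD_DEBATE = date(2016, 10, 19)   # from the text


def midpoint(start, end):
    """Midpoint of a poll's fieldwork; half days are dropped."""
    return start + (end - start) / 2


def design_row(start, end):
    """Build the trend and event variables for one poll."""
    mid = midpoint(start, end)
    return {
        "days_before": (ELECTION_DAY - mid).days,
        "rnc": int(RNC_CLOSE <= mid <= RNC_WINDOW_END),
        "post_convention": int(DNC_CLOSE <= mid < FIRST_DEBATE),
        # Assumption: these three stay at one once the event has occurred.
        "first_debate": int(mid >= FIRST_DEBATE),
        "tape_second_debate": int(mid >= TAPE_RELEASE),
        "third_debate": int(mid >= THIRD_DEBATE),
    }


# A hypothetical poll in the field from October 10th through October 12th.
print(design_row(date(2016, 10, 10), date(2016, 10, 12)))
```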

Dependent Variable: Clinton lead over Trump

Weighted Least Squares; N=190

I present three different specifications of the model. The first uses only the trend, method, and event variables. The second version includes effects for the six pollsters who met the criterion of nine or more polls. The last specification removes terms that were not statistically significant in prior specifications. (The marginally significant effect for Ipsos/Reuters disappears in a more restricted specification.)

Starting first with the time trend, the positive value indicates that Clinton held a larger lead early in the campaign season. A value of 0.07 means that Trump picks up about one percentage point on his opponent every fourteen days (=1/0.07). This is a much faster pace than in 2012. Four years ago, it took President Obama about forty-seven days to gain a single percentage point over Mitt Romney. The constant indicates the predicted margin between the candidates on Election Day when the “Days Before” variable is zero. Without any intervening events the model predicts a Trump victory by five to six percent.
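To make the trend arithmetic concrete, here is a small sketch that converts a daily slope into days per percentage point and back. The 0.07 and 47-day figures come from the text; the function names are illustrative.

```python
TREND_2016 = 0.07         # points of Clinton's lead lost per day (from the text)
DAYS_PER_POINT_2012 = 47  # days for Obama to gain one point (from the text)


def days_per_point(slope):
    """Days needed for the margin to move one percentage point."""
    return 1.0 / slope


def points_over(days, slope):
    """Movement in the margin, in points, over a given number of days."""
    return slope * days


print(round(days_per_point(TREND_2016), 1))    # days per point in 2016
print(round(1.0 / DAYS_PER_POINT_2012, 3))     # implied daily slope in 2012
print(round(points_over(14, TREND_2016), 2))   # movement over two weeks
```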

Rather surprisingly none of the methodological variables have any effect in 2016. Poll watchers generally expect to see a one- to two-point tilt in the Republican direction when samples are constrained to “likely” voters. That difference reflects the generally higher propensity of Republicans to turn out since their age and social characteristics correlate with voting. This year we see no such effect. Nor is this likely to be a statistical artifact; polls of likely voters represent only 58 percent of the sample so there are sufficient numbers of each type of poll to generate reliable results.

In 2012, polls conducted on the Internet were about one percentage point more favorable to Obama than polls conducted by other means. This year we see no differences between Internet polls and those conducted by live interviewers. Two organizations, the Republican-leaning Rasmussen Reports and the Democratic-leaning Public Policy Polling, use automated calling systems where respondents are asked to enter their answers by pressing the phone’s dialpad or speaking directly to the calling robot. Because there are only two such agencies, I included dummy variables for each of them rather than a single variable denoting the method they use. The results for the two organizations are quite different. Rasmussen continues to show a significant bias in favor of the Republican candidate, while PPP shows no such bias. This difference parallels that found for 2012, where Rasmussen’s results showed a pro-Romney bias. Rasmussen’s polling in 2016 has an even greater Republican tilt of over four points, compared to two to three points in 2012.

What the model shows most clearly, though, is the powerful effect of campaign events on the margin between the candidates. Clinton’s lead fell after the Republican National Convention then rebounded after the Democrats convened in Philadelphia. The debates and the release of the Access Hollywood tape further boosted Clinton’s margin. Since the effects of these events must be measured against the overall pro-Trump trend in the polls, I have incorporated these data into a chart.

The aftermath of the conventions brought the race back to more or less the same place it was on June 1st with Clinton holding about a seven-point lead. Her advantage decayed over the weeks that followed until the combined effects of the first debate and the release of the Access Hollywood tape again brought her lead up to nearly eight points. The model predicts that her advantage will have fallen back to about five points on Election Day itself. Since the model has a standard error of about 0.5 percent, the confidence interval on the Election Day prediction is roughly four to six percent.
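The interval quoted above follows from the usual normal approximation, sketched here. The five-point prediction and 0.5 standard error are from the text; using a multiplier of 1.96 for a 95 percent interval is my assumption about how the "roughly four to six" figure was reached.

```python
def confidence_interval(estimate, std_error, z=1.96):
    """Symmetric interval around an estimate under a normal approximation."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


low, high = confidence_interval(5.0, 0.5)
print(round(low, 2), round(high, 2))
```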

A few other observations from these results. First, the notion that there is a hidden vote for Donald Trump that does not appear in public polling is contradicted by the lack of any effects by polling method. Back in January I found that Trump did over four points better in polls of Republican primary voters when they were interviewed by automated methods. I attributed that result to the so-called “social desirability” effect; Trump supporters might have felt more shy about admitting their preference to a human interviewer. I see no such effect in the general election polls now that Trump has been legitimated by being the Republican nominee.

Second, though I do not show the results here, including the size of the vote for the two minor-party candidates, or the proportion of undecideds, has no systematic effect on the margin between the major-party candidates. If prospective supporters for one major candidate were disproportionately likely to defect to one of the minor candidates, or to remain undecided, we would expect to see fluctuations in the size of those groups influence the size of Clinton’s lead over Trump. Instead it appears that potential supporters of both those candidates have moved in and out of the minor-party columns or remained undecided at roughly equal rates. If so, as the minor candidates get squeezed as Election Day draws near, and the number of undecided voters dwindles, we should not expect to see those changes affect the competitive positions of Clinton or Trump.

Since all my analyses use just one entry per poll, I have begun removing this extra data before analysis. Unless specifically stated, I am using only the first “question iteration” for each poll (coded “1” at Pollster) and only data for the entire population. Using the first iteration helps ensure consistency across all the polls from a single organization.


In Wisconsin, Hillary Clinton has led in every poll conducted in the state dating back to last fall. She has nearly as impressive a lead in both Michigan and Pennsylvania, both states typically mentioned as targets for Donald Trump’s “rust-belt” strategy. In those two states there is less than one chance in twenty that Clinton is truly behind given the number of polls in which she held the lead. In the remaining states the results are still too mixed to draw any conclusions about which candidate is in the lead. Clinton does especially poorly in the traditionally-Republican states of Arizona and Georgia, but there haven’t been enough polls taken to draw any conclusions there. The other states remain toss-ups.


SE(p) = sqrt(p[1-p]/N)

where N is the sample size. This formula reaches its maximum at p=0.5 (50%) making the standard error 0.5/sqrt(N).

Weighted least squares adjusts for these situations where the error term has a non-constant variance (technically called “heteroskedasticity”). To even out the variance across observations, each one is weighted by the reciprocal of its estimated standard error. For polling data, then, the weights should be proportional to the reciprocal of 1/sqrt(N), or just sqrt(N) itself. I thus weight each observation by the square root of its sample size.

More intuitively we are weighting polls based on their sample sizes. However, because we are first taking the square roots of the sample sizes, the weights grow more slowly as samples increase in size.
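A minimal sketch of the weighting idea follows, using a one-predictor regression so that it needs nothing beyond the standard library. It takes the weight on each poll's squared residual to be sqrt(N); statistical packages differ in how they define weights, so treat that reading as an assumption. The data are invented for illustration.

```python
from math import sqrt


def weighted_line(x, y, sample_sizes):
    """Fit y = a + b*x minimising sum(w * residual**2) with w = sqrt(N)."""
    w = [sqrt(n) for n in sample_sizes]
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical polls: days before the election, Clinton lead, sample size.
days_before = [100, 60, 30, 10]
clinton_lead = [8.0, 5.5, 6.0, 3.0]
sample_sizes = [400, 900, 1600, 2500]

intercept, slope = weighted_line(days_before, clinton_lead, sample_sizes)
print(round(intercept, 2), round(slope, 3))
```

Because the weights here are 20, 30, 40, and 50 rather than 400 through 2,500, the largest poll counts for two and a half times the smallest, not more than six times.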

Two states – Michigan and Pennsylvania – have supported Hillary Clinton consistently enough that there is just a small chance, less than one in twenty, the race is actually tied or she is behind Donald Trump in those states. The other four states remain toss-ups.

Pennsylvania tempts Republicans to compete there every election cycle, and this one is no exception. Still, the state has trended Democratic in Presidential elections since the late 1960s.


Using the data at Huffington Post Pollster I calculated the “net favorability” for each candidate, equal to the percent of respondents saying they view a candidate favorably minus the percent who say they view that candidate unfavorably. I begin with Hillary Clinton, for whom we have favorability data dating back to 2009.

It might be hard to imagine today, but during her tenure as Secretary of State in Barack Obama’s first term, Hillary Clinton was viewed quite positively by the American public. Between Fall, 2009, and Fall, 2012, about three out of five Americans surveyed reported that they viewed Secretary Clinton favorably. Even as late as April, 2013, Clinton was favorably viewed by 64 percent of the adults surveyed by Gallup, compared to 31 percent who viewed her unfavorably. That translates into a net score of +33 (=64-31) in the graph above. She would never attain that level of popularity again.

Opinions about Clinton did not fall right away after the attack on the U.S. Consulate in Benghazi, Libya, on September 11, 2012, but the downward trajectory began soon thereafter. When she announced her candidacy for President on April 12, 2015, the proportions of Americans holding favorable and unfavorable views of Secretary Clinton were just about equal. A few months later her favorability score was “underwater,” with the proportion of Americans holding unfavorable views outnumbering those with favorable ones by between ten and twenty percent.

Opinions about Donald Trump have also remained pretty constant, and consistently negative, since he announced his candidacy on June 16, 2015. At no time since he began his campaign for President have more Americans reported feeling “favorable” toward Donald Trump than “unfavorable.” His ratings improved somewhat after his announcement and through the summer of 2015, but when the primary campaign began in earnest starting in January of 2016, Trump saw his favorability score fall further south. It has rebounded and levelled off since he became the presumptive nominee after winning the Indiana primary on May 3rd. Even so, Donald Trump’s net favorability score averages about -24, compared to Hillary Clinton’s average net rating of -11.

If we now take the difference between these two net favorability scores, we can see whether both candidates are equally disliked, or whether one is disliked more than the other. For most of the campaign so far, Hillary Clinton has been winning the contest over which of them is less disliked. Her net favorability scores generally run around 11-12 percent less negative than Trump’s. For instance, over the month of June, 2016, Clinton averaged 41 percent favorable versus 55 percent unfavorable, for a net favorability score of -14. Trump’s scores were 35 percent favorable and 60 percent unfavorable, for a net score of -25, or eleven points worse than Clinton’s.
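The June figures above work out as follows; this is just the subtraction spelled out, with an illustrative function name.

```python
def net_favorability(favorable, unfavorable):
    """Percent favorable minus percent unfavorable."""
    return favorable - unfavorable


clinton_net = net_favorability(41, 55)
trump_net = net_favorability(35, 60)
clinton_advantage = clinton_net - trump_net

print(clinton_net, trump_net, clinton_advantage)
```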

As you might expect, there is a strong correlation between this net favorability score and the proportion of respondents intending to vote for Clinton or Trump. Net favorability alone explains about two-thirds of the variance in voting intention across the 113 polls where both questions were asked. Given the relationship shown in the graph, a score of +11 in net favorability should yield about a five percent lead in voting intention.

One interesting finding from the regression results is that the constant term of 1.06 percent is significantly different from zero. (It has a standard error of 0.38 with *p*<0.01.) The constant predicts Clinton’s lead when net favorability is zero, or in a poll where the proportion of people favoring and disfavoring each candidate is identical. When net favorability is zero, Clinton leads Trump on average by a bit over one percent.
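The significance claim for the constant can be checked roughly from the two numbers given. This sketch divides the estimate by its standard error and uses a normal approximation for the two-sided p-value; with 113 polls the exact t distribution would give a very similar answer, but the approximation is my simplification.

```python
from math import erfc, sqrt

CONSTANT = 1.06    # from the text
STD_ERROR = 0.38   # from the text


def two_sided_p(estimate, std_error):
    """Two-sided p-value for estimate / std_error under a normal approximation."""
    z = abs(estimate / std_error)
    return erfc(z / sqrt(2.0))


ratio = CONSTANT / STD_ERROR
p_value = two_sided_p(CONSTANT, STD_ERROR)
print(round(ratio, 2), p_value < 0.01)
```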

While the Presidential race gets all the media and pundit attention, the battle for control of the Senate also looms large in this election year. Republicans enter the election holding 54 of the 100 Senate seats, so a net Republican loss of just five seats would put the Senate back in Democratic hands. The Democrats have the advantage that many more Republican seats, twenty-four, are at risk in the 2016 election compared to only ten held by Democrats. This lopsided margin reflects the result of the 2010 off-year election when Republicans picked up six seats from the Democrats. In principle, some of those Republican senators may be more vulnerable in a Presidential year with higher turnouts and more visibility. The Democrats certainly believe they can retake the Senate this November. The Party aggressively recruited candidates for the Senate elections and had secured bids from all but one of its top-tier candidate selections by early October of 2015.

The Democrats also have the advantage that the party of the incumbent President usually wins a slim majority of the Senate vote in “on-year” elections when a Presidential election also takes place but loses in “off-year” elections.

Unfortunately for the Democrats the relationship between Senate electoral success and type of election is not so simple. If I divide up on-year elections into ones when the President ran for re-election and ones when, like the upcoming election, he did not, a very different pattern emerges. The President’s party fared substantially worse in the five open-seat elections since 1946 than it did in elections with the President at the top of the ticket. While open-seat years gave the President’s party a one-percent boost compared to off-years, that difference is not statistically significant. What matters is whether the President is running or not.

The Democrats’ optimism is also based on the much larger number of Republican seats at risk in 2016. I find some support for the notion that a Senate “Class” with a comparatively lopsided division of the vote in one election becomes more competitive six years later. Statisticians call this phenomenon “regression toward the mean,” where observations that were outliers at one time show more average scores when measured again. But this effect is weak, and the division of the Senate vote in 2010, 53-47 percent Republican, was not as lopsided as the margin in terms of seats, 65-35 percent Republican. All told the estimated “rebound” effect given the Republican 2010 landslide is just 0.6%, raising the expected Democratic vote from 47.1% to 47.7%.

Where else might the Democrats gain some relief? Perhaps the generally positive state of the economy might provide some help. Political scientists and economists have tested many different measures of economic conditions in models of voting for President and Congress. One simple measure that has consistently proven significant is the change in personal income, and that proves true for Senate elections as well.

This chart adds the effects of the year-on-year percent change in real disposable personal income to our simple model. While the effect of rising incomes is positive and statistically significant, it alone cannot overcome the substantial deficit facing the Democrats in an open-seat election year. In the six years of the Obama Administration, personal income rose by at most 2.2 percent in a single year, 2012. With a likely figure for annual income growth in 2016 at around two percent, we should expect the Democrats to win only about 48 percent of the Senate vote in 2016.

Winning a majority of the Senate vote is not a requirement for winning a majority of the contested seats. In 2004, and most dramatically in 1982, the Republicans managed to win a majority of the seats with a minority of the votes cast.

The high “swing ratio” of 2.4 means that a change of one percent in the percentage of votes won translates on average to a 2.4 percent increase in the percentage of seats. So even fairly small changes in the division of the vote can have much larger effects on the composition of the United States Senate.
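As a rough illustration of what a swing ratio of 2.4 implies, the sketch below converts a shift in vote share into a shift in seat share and then into seats. The count of 34 contested seats is the sum of the 24 Republican and 10 Democratic seats mentioned earlier; applying the ratio linearly to a shift of any size is an assumption.

```python
SWING_RATIO = 2.4      # from the text
SEATS_CONTESTED = 34   # 24 Republican plus 10 Democratic seats


def seat_share_change(vote_share_change):
    """Change in percent of seats for a change in percent of votes."""
    return SWING_RATIO * vote_share_change


def seats_changed(vote_share_change):
    """The same change expressed as a number of contested seats."""
    return seat_share_change(vote_share_change) / 100.0 * SEATS_CONTESTED


print(round(seat_share_change(1.0), 1))  # seat-share points per vote-share point
print(round(seats_changed(2.0), 2))      # seats moved by a two-point vote shift
```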

I tried some other possible influences like the approval rating of the President and the size of the President’s margin in on-year elections. I found no “coattails” effect for the Presidential vote either in years when the President is running or years when he is not. Presidential approval does matter, but only in off-year elections, so I did not include it in this discussion about 2016. That finding is consistent with a conventional view that off-year elections reflect public opinion about the President’s performance in office.


It seems like new Presidential polling figures are released every day. We generally talk about each new poll as a unique snapshot of the campaign with some fresh sample of a few hundred caucus-goers. That concept of polls might apply to national samples, but when polling in states as small as Iowa and New Hampshire, the number of eligible respondents is not that large compared to the number of interviews being conducted.

Making the problem worse is the falling “response rate” in polling, the proportion of eligible respondents who complete an interview. Mobile phones, caller-ID, answering machines, all have enabled potential respondents to avoid the pollster’s call. Pew reports that response rates have fallen from 21 percent to 9 percent just between 2006 and 2012. If we assume a response rate of ten percent, only some 16,000 of Iowa’s 160,000 eligible Republican caucus-goers might have agreed to take part in a poll.

Huffington Post Pollster lists a total of 94 polls of Republican caucus-goers through today, January 31, 2016, constituting a total of 44,433 interviews. I will use this figure to see how the composition of the sample changes with different response rates.^{1}

Around 120,000 people participated in the Republican caucuses in 2008 and 2012. While some observers think turnout in 2016 could be higher because of the influx of voters for Donald Trump, I have stuck with the historical trend and estimate Republican turnout in 2016 at just under 124,000 citizens.

To that baseline we have to add in people who agree to complete an interview but do not actually turn out for the caucuses. In my 2012 analysis I added 20 percent to the estimated universe to account for these people, but recent findings from Pew suggest 30 percent inflation might be more accurate. With rounding, I will thus use 160,000 as my estimate for the number of Iowa Republicans who might have been eligible to be polled about the 2016 Republican caucuses.

Most of those 160,000 people will never take part in a poll. Pew estimated 2012 “response rates,” the proportion of eligible respondents who actually complete an interview, in the neighborhood of 9 percent. To see what this means for Iowa, here is a table that presents the average number of interviews a cooperating respondent would have conducted during the 2016 campaign at different response rates. At a ten percent response rate like Pew reports, the 16,000 cooperating prospects would each need to complete an average of 2.78 interviews to reach the total of 44,433.
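The averages in such a table are simple to reproduce. The sketch below uses the 44,433 interviews and the 160,000-person universe from the text; the particular response rates looped over are illustrative choices.

```python
TOTAL_INTERVIEWS = 44_433  # from the text
UNIVERSE = 160_000         # estimated eligible Iowa Republicans, from the text


def mean_interviews(response_rate):
    """Average interviews per cooperating respondent at a given response rate."""
    cooperating = UNIVERSE * response_rate
    return TOTAL_INTERVIEWS / cooperating


for rate in (1.0, 0.5, 0.2, 0.1):
    print(f"{rate:.0%}: {mean_interviews(rate):.2f} interviews per respondent")
```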

Finally, I’ll apply the Poisson distribution once again to estimate the number of people being interviewed once, twice, three times, etc., to see the shape of the samples at each response rate.

Even if everyone cooperates, random chance alone would result in about 13 percent of respondents being interviewed at least twice. When the response rate falls to 10 percent, most respondents are being interviewed three or four times, with fifteen percent of them being interviewed five times or more. Even with a 20 percent response rate, about double what Pew reports, a majority will have been interviewed at least twice.
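Here is a sketch of the Poisson calculation behind those figures. It assumes interviews fall on cooperating respondents at random, and it reports shares among people interviewed at least once, which is the reading that reproduces the percentages quoted above. Function names are illustrative.

```python
from math import exp, factorial

TOTAL_INTERVIEWS = 44_433  # from the text
UNIVERSE = 160_000         # from the text


def poisson_pmf(k, mean):
    """Probability of exactly k interviews when the average is `mean`."""
    return exp(-mean) * mean ** k / factorial(k)


def share_interviewed_at_least(k_min, response_rate):
    """Share of people interviewed k_min or more times, among those interviewed at all."""
    mean = TOTAL_INTERVIEWS / (UNIVERSE * response_rate)
    p_zero = poisson_pmf(0, mean)
    p_below = sum(poisson_pmf(k, mean) for k in range(k_min))
    return (1.0 - p_below) / (1.0 - p_zero)


print(round(share_interviewed_at_least(2, 1.0), 3))  # everyone cooperates
print(round(share_interviewed_at_least(2, 0.2), 3))  # 20 percent response rate
print(round(share_interviewed_at_least(5, 0.1), 3))  # 10 percent response rate
```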

Certainly someone willing to be interviewed three, four, five times or more about a campaign must have a higher level of interest in politics than the typical Iowa caucus-goer who never cooperates with pollsters. That difference could distort the figures for candidate preferences if some candidates’ supporters are more willing to take part in polls.

Basing polls on a relatively small number of cooperative respondents might also create false stability in the readings over time. Polling numbers reflect more the opinions of the “insiders” with a strong interest in the campaign and may be less sensitive to any winds of change. We might also imagine that, as the campaign winds down and most everyone eligible has been solicited by a pollster, samples become more and more limited to the most interested.

Overarching all these findings remains the sobering fact that only about one-in-ten citizens is willing to take part in polling. Pollsters can adjust after the fact for any misalignments of the sample’s demographics, but they cannot adjust for the fact that people who participate in polling may simply not represent the opinions of most Americans. We’ll see how well the opinions of those small numbers of respondents in Iowa and New Hampshire match the opinions of those states’ actual electorates on Primary Day.


One type represents the electoral setting. Is this an on-year or off-year election? And, in on-years, is the incumbent President running for re-election? Alone these two factors account for over twenty percent of the variance in incumbent support, with Presidential re-election bids having by far the greatest impact. The results for this model appear in the left-hand column of the table below. The remaining columns add additional explanatory factors to the basic political environment.

Right away we see that when a President is running for re-election, his co-partisans in the Senate have a much greater chance of winning. Because these are measured as logits, values below zero correspond to a percentage value below fifty, while positive logits imply values above fifty percent. Without the President running, the model has a slight negative prediction equal to the constant term. In Presidential re-election years that negative value turns positive, being the sum of the constant (-0.08) and the effect for re-election years (0.14). By this model the Democrats in the Senate will be short that extra boost that comes from having an incumbent seeking re-election.[2]
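To see what those logits mean as vote shares, they can be passed through the inverse logit, as in this sketch. It assumes the logit is the natural log of the odds, which is the usual definition but is not spelled out in the text.

```python
from math import exp

CONSTANT = -0.08           # from the text
REELECTION_EFFECT = 0.14   # from the text


def inverse_logit(value):
    """Convert a logit (natural log of the odds) back to a proportion."""
    return 1.0 / (1.0 + exp(-value))


open_seat_share = inverse_logit(CONSTANT)
reelection_share = inverse_logit(CONSTANT + REELECTION_EFFECT)

print(round(100 * open_seat_share, 1))   # President not on the ballot
print(round(100 * reelection_share, 1))  # President running for re-election
```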

One reason the Democrats are optimistic about their chances to retake the Senate in 2016 is that these seats were last contested in the Republican wave election of 2010. This year those seats will be fought in the context of a Presidential election with its greater visibility and higher turnout. I have measured this effect by including the vote from the election held six years prior. In principle, we should expect a negative effect, as “regression toward the mean” sets in. Republicans perhaps won by unexpectedly larger margins in 2010 so their margins should fall closer to the average this time around.

Adding the prior vote for each Senatorial “class” improves the predictive power of this simple model slightly, but the coefficient itself fails to reach significance. It has the expected negative sign, however, and will prove much more significant in further reformulations.

The third column adds the effect of presidential approval, a common predictor in models of voting for the House. For the Senate it turns out to have a more subtle effect. Presidential approval has the expected positive effect on votes for the incumbent’s Senators, but only in off-year elections. A long literature in political science has examined off-year elections espousing a variety of theories to explain the President’s usual losses. I generally adhere to the “referendum” school of thought on off-years, that they give the public a chance to express their approval or disapproval of a President mid-way through his term. That presidential approval matters not in years when a Presidential election is being held reinforces my belief in the referendum explanation for off-year voting.

The last explanatory factor is the year-on-year percent change in real disposable personal income. Political scientists and economists have included pretty much every economic variable that might affect election outcomes in their models of presidential and congressional voting, but the one factor that often proves significant is personal income. Adding it to the model increased “explained” variance by over ten percent.

Here is a chart showing the expected national vote for the Senate Democrats as a function of the size of the increase in personal income heading into the election. They do get a small positive compensation from having lost the popular vote for these seats six years before. However, without the re-election boost, even reaching an absurdly high four percent growth in real income would not push the expected vote for the Democrats over the 50 percent line. In the best years of the Obama Administration, real income growth reached slightly over two percent, which would give the Democratic candidates for Senate about 48 percent of the vote.

I admit there are many shortcomings to this analysis. First, I only account for 45 percent of the variance in Senatorial voting, and this only at the national level. Senatorial campaigns are played out in states, where local forces can exert a major role. With only 33 or 34 seats up in each election, idiosyncratic factors can swing a few decisive states.

If, as the model predicts, the Democrats should expect to win about 48 percent of the popular vote for Senate, they could still win back the Senate. They might follow the path taken by the Republicans in 2004 and, most dramatically, in the first off-year election under Ronald Reagan in 1982. In both those years the Republicans won a majority of the contested Senate seats with a minority of the popular vote.
