Trump’s Job Approval Rating Key to Democratic Victory in 2018

In the previous article I showed that Democrats must win at least 53 percent of the national two-party vote for Congress in order to retake control of the House of Representatives.  That higher hurdle to success reflects the combined effects of more extensive partisan gerrymandering by Republican state governments and the tendency of Democrats to live in densely-populated urban districts.  These factors make Democratic votes for the House less “efficient” than Republican votes when it comes to determining which party controls the chamber.

So what combination of political and economic factors might result in a Democratic vote of 53 percent?  Political scientists have presented a number of models for mid-term elections over the years.  In an early paper, Edward Tufte showed that presidential approval and short-term changes in personal economic conditions both influenced support for the incumbent using the small sample of mid-term elections he had available at the time.  I find little support for an economic effect, but presidential job approval does play an important role.

I have analyzed both all Congressional elections and off-year elections separately.  The overall results are quite similar.  I am basing the conclusions below on the data for the seventeen off-year elections in my sample from 1950 to 2014.  Rather than treat the parties symmetrically and examine support for the President’s party as I did for the Senate, I am focused this time specifically on factors influencing support for the Democrats in off-year elections since their vote is what matters to this analysis.  It turns out just three variables account for over 90 percent of the variance in the Democratic vote for the House:

As always, the dependent variable is measured as a logit. Values above zero are associated with probabilities above 0.5; negative values represent probabilities below 0.5.  So the positive constant term indicates that the Democrats had an advantage over the period, but the coefficient for the dummy variable representing elections after 1992 is about equal in size and opposite in sign.  That pattern corresponds to what we saw in the last article, where Democrats had a seat advantage in the House until 1994 that vanished for two decades and has now turned significantly negative.
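For readers unfamiliar with logits, here is a minimal Python sketch of the transformation and its inverse.  The function names are my own for illustration; the 53 percent figure is the vote target discussed above.

```python
import math

def logit(p):
    """Log-odds of a proportion p, where 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inverse_logit(x):
    """Convert a logit back into a proportion."""
    return 1.0 / (1.0 + math.exp(-x))

# A vote share above one half has a positive logit, below one half a negative one.
for share in (0.47, 0.50, 0.53):
    print(share, round(logit(share), 3))

# Round trip: the inverse recovers the original share.
print(round(inverse_logit(logit(0.53)), 2))
```

A predicted logit from the model is converted back to a vote share with the inverse function, which is how a prediction can be compared to the 53 percent target.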

The other two variables capture the “referendum” aspect of off-year elections.  The Democrats do worse on average when one of their partisans occupies the White House.  However, rising job approval ratings do translate into more support at the polls in the off year.  (The approval variable is coded positively for Democrats and negatively for Republicans.  If separate terms are included for Democratic and Republican presidents, the estimated coefficients are nearly identical in size but opposite in sign.  The coding I used imposes the constraint that changes in Presidential approval ratings have the same-sized effect for both parties. The job approval data come from Gallup and are based on averages of their polls near the election.)

I tried a variety of measures of economic conditions, specifically changes in real per capita disposable personal income, and none of them showed any additional effect.  I included a test of the “myopic” voter theory using only the change in income comparing the third and second quarters of the election year.  That fared no better than an approach with a longer time horizon, the growth rate over the past twelve months.  Thus there is no term in my model for economic conditions.

Since we have a Republican president, my estimates are based on the sum of the constant term and the term for elections after 1992.  If I plot the model’s predictions against President Trump’s potential approval ratings, I get this relationship:

If the President’s job approval rating falls below 32 percent, the model predicts the Democrats would win the 53.2 percent of the national House vote that we saw in the last article is required to obtain a majority of the seats in the chamber.  The last three Gallup polls reported Trump’s job approval at 38 or 39 percent.

An approval rating below thirty is historically very unlikely.  Richard Nixon in 1974 and George W. Bush in 2008 had ratings in the mid-twenties.  Jimmy Carter in 1978, George H. W. Bush in 1992, and his son in 2006 received job approval scores in the mid-thirties.  Of course, all of these incumbents had much higher ratings when they took office than did Donald Trump.

The average decline in Presidential job approval between Inauguration Day and the first subsequent off-year election has been a bit under nine points.  That would take Trump’s score down toward the mid-thirties.  However because he started at just 45 percent approval when inaugurated, he may not experience the same decline as did presidents who started from a higher rating.  For instance, it seems unlikely that Trump will experience a decline on the order of 23 points like Barack Obama did going into the 2010 midterm.   In fact, the table suggests the public treats Republican and Democratic presidents quite differently.  The Democrats all posted double-digit declines in job approval by the first mid-term election; none of the Republicans lost more than nine points over the same period, and approval for both Bush presidencies actually increased.



Can the Democrats Retake the House in 2018?

Now that all the gnashing of teeth has ended after the Republicans managed to hold on to the Georgia Sixth, perhaps we can step back and take a more systematic look at the Democrats’ prospects in 2018. Democrats will likely not make any gains in the Senate since the Republicans have only eight seats at-risk compared to twenty-three Democrats and both independents, Maine’s Angus King and Vermont’s Bernie Sanders.  That leaves the House as the only target.

There are two steps involved in answering this question.  The first is to use our historical experience with House elections to examine how votes are translated into seats.  With that information we can estimate the proportion of the two-party House vote that the Democrats need to win to take back the House in 2018.

As I wrote back in 2012, a combination of geographic clustering by party and good old partisan gerrymandering has created a “Republican bulwark” in the House since the last redistricting after the 2010 Census.  That means that the Democrats will need to win more than a majority of the popular vote for Congress if they intend to win a majority of House seats.

I have refined this simple seats and votes model in two ways.  First, I let the “swing ratio” vary between two historical periods, 1940-1992 and 1994-2016. Empirically the effect of voting “swings” on seat “swings” is significantly smaller in the more recent period.  As Tufte argues in his classic paper on the seats/votes relationship, a decline in the swing ratio indicates an increase in the proportion of “safe” seats.  As fewer and fewer seats have vote shares around fifty percent, there are consequently fewer that can be “flipped” by an equivalent shift in voters’ preferences.

I also use the results for the 2014 and 2016 elections to more sharply estimate the effect since 2010.  If we calculate the popular vote share required for the Democrats to win half the seats in the House, they would need to secure a bit over 53 percent of the (two-party) votes cast.

That brings us to the second question, what are the chances that the Democrats could win 53 percent of the Congressional vote in 2018?  Answering that question deserves an article unto itself.


Technical Appendix: Seats and Votes in the 2018 Election

I am extending the simple model I presented in 2012 relating the proportion of House seats won by Democrats against that party’s share of the (two-party) national popular vote for Congressional candidates.  It uses dummy variables to represent each redistricting period (e.g., the 2000 Census was used to redistrict elections from 2002-2010), and a slope change that starts with the Republican House victory of 1994.

To review, the earlier model showed this pattern of partisan advantage for elections conducted since 1940:

The results for the 2010 redistricting were based solely on the 2012 election.  As we’ll see in a moment, adding in 2014 and 2016 only made that result more robust.

As I argued earlier, not all of this trend results from partisan gerrymandering.  Americans have sorted themselves geographically over the past half-century with Democrats representing seats from urban areas and Republicans holding seats from suburban and rural areas.  As partisans self-segregate, the number of “safe” seats rises, and electoral competitiveness declines.

Partisan self-segregation also makes gerrymandering easier.  Opponents can be “packed” into districts where they make up a super-majority.  House Minority Leader Nancy Pelosi routinely wins 80 percent or more of the vote in her tiny but densely populated San Francisco district.  Many of these seats are held by minority Members of Congress because of our national policy of encouraging “majority-minority” districts.   These efforts were well-motivated as a response to racist gerrymandering that would “crack” minority areas and distribute pieces of them in a number of majority-white districts.  Unfortunately for the Democrats these policies have meant that too many of the party’s voters live in heavily-Democratic districts.

Here is the result of an ordinary least squares regression for the share of House seats won by the Democrats in elections since 1940:

If I plot the predicted and actual values for Democratic seats won, the model unsurprisingly follows the historical pattern quite closely:

The Democrats routinely won around sixty percent of House seats between 1940 and 1992.  Since then they have only held a majority in the House twice, in 2006 and 2008.  Notice, too, that both the actual and predicted values for 1994 to the present show much less variance than the earlier decades.  The results above show that the “swing ratio” relating seats and votes has become much smaller falling from 1.92 before 1994 to 1.33 (=-0.59+1.92) since.  A smaller swing ratio indicates that House elections have become less competitive since Bill Clinton was elected President in 1992.  Changes in vote shares are still amplified in seat outcomes, as they are in all first-past-the-post electoral systems like ours, but the effect has been diminished because of the increase in the number of safe seats on both sides of the aisle.
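To make the swing-ratio arithmetic concrete, here is a small Python sketch.  The two coefficients are the ones quoted above; the function name and the assumption that the relationship is linear in percentage points near fifty percent are mine, for illustration only.

```python
PRE_1994_SLOPE = 1.92   # swing ratio through 1992, as reported in the text
SLOPE_CHANGE = -0.59    # change in slope beginning in 1994, as reported in the text
HOUSE_SEATS = 435

def seat_swing(vote_swing_points, swing_ratio):
    """Seat-share swing, in percentage points, for a given vote-share swing.

    Assumes a linear seats/votes relationship, which is only an approximation.
    """
    return swing_ratio * vote_swing_points

post_1994_slope = PRE_1994_SLOPE + SLOPE_CHANGE

for label, slope in (("1940-1992", PRE_1994_SLOPE), ("1994-2016", post_1994_slope)):
    points = seat_swing(1.0, slope)
    seats = points / 100.0 * HOUSE_SEATS
    print(f"{label}: a one-point vote swing moves about {seats:.1f} seats")
```

Under these assumptions a one-point swing in the national vote moves roughly eight seats in the earlier period but fewer than six in the later one.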

We can use this model to estimate the share of votes required in order for the Democrats to win a majority in the House.  This chart shows the predicted relationships between seats and votes for two historical periods, one through the election of Bill Clinton in 1992, and the other beginning with the Republican victory in the House election of 1994 under Newt Gingrich and his “Contract with America.”

The slope in the latter period is substantially flatter than in the earlier period, meaning that Congressional elections have become somewhat less competitive since 1992.  Changes in vote shares have a smaller effect on changes in seat shares than they did before 1994.

Finally, the third line represents an estimate for the relationship in 2018, using the 1994-2016 slope and only the post-2010 intercept shift.  The chart shows that for the Democrats to win half the seats in 2018 they will need to garner a bit over 53 percent of the two-party popular vote for the House.



*The intercepts in these charts represent weighted averages of the adjustments for the various Census years. For instance, the 1994-2016 line includes the coefficients for the 1990, 2000, and 2010 Census weighted by the number of elections in each decade. So in this case the 1990 and 2000 adjustments would have weights of five, and the 2010 adjustment a weight of three. The 2018 line applies only the 2010 redistricting adjustment.
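The weighting described in this note can be sketched in a few lines of Python.  The coefficient values below are hypothetical placeholders, not the estimates from my regression; only the weights (five, five, and three) come from the note itself.

```python
def weighted_intercept_shift(adjustments):
    """Average the census-period adjustments, weighting by number of elections.

    adjustments: list of (coefficient, number_of_elections) pairs.
    """
    total_weight = sum(weight for _, weight in adjustments)
    return sum(coef * weight for coef, weight in adjustments) / total_weight

# Hypothetical coefficients for the 1990, 2000, and 2010 Census periods.
example = [(-0.01, 5), (-0.02, 5), (-0.04, 3)]
print(round(weighted_intercept_shift(example), 4))

# The 2018 line applies only the 2010 adjustment, so no averaging is needed.
print(weighted_intercept_shift([(-0.04, 3)]))
```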

It Don’t Mean a Thing …

Here are my final tests for who is ahead in the swing states.  The situation looks rather bleak for Donald Trump.


As in 2012, I am using a “chi-squared” test* to determine whether each candidate has led in so many polls in each state that it is statistically unlikely that person is not actually ahead there.  I’ve used all state polls archived at Huffington Post Pollster since June 1st and conducted a separate test using only polls conducted after the release of the “Access Hollywood” tape on October 7th where Trump claims to have committed sexual assault.  In this more recent set of polls, Arizona moves from Trump’s column to a toss-up.

In three other states, Iowa, Nevada, and Ohio, the race appears statistically tied.  Neither candidate has led in a sufficient number of polls to determine whether one of them is truly in the lead.  Hillary Clinton has a significant lead in the remaining eight states, with a total of 116 Electoral Votes.  Combined with the other solidly Democratic states, she should win at least 317 Electoral Votes on Tuesday, and as many as 347 were she to take all three of Nevada, Iowa, and Ohio.  More likely, given the data above, she will lose Iowa and Ohio and end up with 323 Electoral Votes adding Nevada to her column.
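The Electoral Vote scenarios above can be enumerated with a short Python sketch.  The base of 317 comes from the text; the per-state counts (Nevada 6, Iowa 6, Ohio 18) are the 2016 allocations, which I am supplying here rather than taking from the tables.

```python
from itertools import combinations

BASE_CLINTON_VOTES = 317                          # from the text
TOSS_UPS = {"Nevada": 6, "Iowa": 6, "Ohio": 18}   # 2016 allocations (supplied here)

def clinton_total(states_won):
    """Clinton's Electoral Votes if she also carries the listed toss-up states."""
    return BASE_CLINTON_VOTES + sum(TOSS_UPS[state] for state in states_won)

# List every combination of toss-up states she might carry.
for k in range(len(TOSS_UPS) + 1):
    for won in combinations(TOSS_UPS, k):
        print(", ".join(won) or "none", clinton_total(won))
```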


*Values of chi-squared greater than 3.84 are “significant at the 0.05 level” (with one “degree of freedom”), meaning there is a 95 percent probability that Clinton is ahead.  Values greater than 6.64 are significant at the 0.01 level.  In all eight states where Clinton has led in the polls since June 1st, her chances of actually being ahead in those states are very much higher than 99 percent.
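For those who want to see the mechanics, here is one way such a test could be computed in Python.  This is a sketch under my own assumptions, not necessarily the exact procedure behind the table: it compares the number of polls each candidate led against an even split, and it ignores polls that were tied.  The poll counts in the example are hypothetical.

```python
CRITICAL_05 = 3.84   # one degree of freedom, 0.05 level (from the note above)
CRITICAL_01 = 6.64   # one degree of freedom, 0.01 level (from the note above)

def lead_chi_squared(clinton_leads, trump_leads):
    """Chi-squared statistic against the null that each candidate leads half the polls."""
    n = clinton_leads + trump_leads
    if n == 0:
        raise ValueError("need at least one poll with a leader")
    expected = n / 2.0
    return ((clinton_leads - expected) ** 2 + (trump_leads - expected) ** 2) / expected

def classify(clinton_leads, trump_leads):
    """Label a state as a toss-up unless one candidate's lead is significant at 0.05."""
    if lead_chi_squared(clinton_leads, trump_leads) <= CRITICAL_05:
        return "toss-up"
    return "Clinton" if clinton_leads > trump_leads else "Trump"

# Hypothetical counts: Clinton ahead in 9 of 10 polls, then in 6 of 10.
print(lead_chi_squared(9, 1), classify(9, 1))
print(lead_chi_squared(6, 4), classify(6, 4))
```

With nine leads in ten polls the statistic exceeds 3.84 but not 6.64; with six in ten it falls well short of either threshold.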

The State of the Race

Donald Trump has gained ground over Hillary Clinton during the campaign, but the combined effect of events leaves her with a predicted five-point advantage on Election Day.

Back in 2012 I modelled the dynamics of national Presidential polling using a combination of time trends, survey methodologies, and campaign events.  In this posting I will present a similar model for the 2016 campaign using the 190 polls archived at Huffington Post Pollster covering the period from June 1st through October 25th.  All these polls include both minor party candidates, Gary Johnson and Jill Stein, in the list of alternatives.

As before I am using three types of explanatory factors to model polling dynamics:

  • a simple linear time trend that measures the number of days remaining in the campaign until Election Day; using higher-order polynomials like quadratics or cubics does not improve explained variance;
  • “dummy” variables that correspond to various features of each survey like the sample drawn (registered versus “likely” voters), the method of polling (live interviewers, automated interviewing, or via the Internet), and the identity of the polling organization;
  • dummy variables to represent various events during the course of the campaign.

For the polling organizations, I included dummies only for those who had contributed at least nine polls, or five percent of the sample.  Only six organizations met this criterion.  For the events, I included both parties’ national conventions and the first Presidential debate on September 26th.  I also included a term for the release of the “Access Hollywood” tape where Donald Trump was recorded as claiming to have engaged in sexual assault.  Because the second debate, on October 9th, followed only two days after the release of the tape, I have combined those events into a single dummy variable.  I have included a third variable which represents the period since the third debate on October 19th.  All dates are measured from the midpoint of each poll’s fieldwork.

Measuring the effects of the conventions was especially difficult this year since the DNC took place in the week following the RNC.  The RNC dummy is coded one starting on the close of the convention, July 21st, and extends through the following Sunday.  Eight polls were conducted during this period.  Rather than measure a separate effect for the Democratic convention, I have instead used a “post-convention” variable  that is coded as one from the close of the DNC until the first debate.  All models are estimated using “weighted least squares” with the weights proportional to the square root of each poll’s sample size.
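As an illustration of how the convention variables might be coded, here is a Python sketch.  The RNC close and first-debate dates come from the text; the DNC close date of July 28th is my own addition since it is not stated above, and treating the post-convention window as ending the day before the debate is likewise an assumption.

```python
from datetime import date, timedelta

RNC_CLOSE = date(2016, 7, 21)        # from the text
RNC_WINDOW_END = date(2016, 7, 24)   # the following Sunday
DNC_CLOSE = date(2016, 7, 28)        # assumption: not stated in the text
FIRST_DEBATE = date(2016, 9, 26)     # from the text

def fieldwork_midpoint(start, end):
    """Midpoint of a poll's fieldwork, rounded down to a whole day."""
    return start + timedelta(days=(end - start).days // 2)

def convention_dummies(start, end):
    """Return the RNC and post-convention dummies for one poll."""
    mid = fieldwork_midpoint(start, end)
    return {
        "rnc": 1 if RNC_CLOSE <= mid <= RNC_WINDOW_END else 0,
        # Assumed to run from the DNC close up to, but not including, the debate.
        "post_convention": 1 if DNC_CLOSE <= mid < FIRST_DEBATE else 0,
    }

print(convention_dummies(date(2016, 7, 22), date(2016, 7, 24)))
print(convention_dummies(date(2016, 8, 1), date(2016, 8, 5)))
```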

Dependent Variable: Clinton lead over Trump
Weighted Least Squares; N=190


I present three different specifications of the model.  The first uses only the trend, method, and event variables.  The second version includes effects for the six pollsters who met the criterion of nine or more polls.  The last specification removes terms that were not statistically significant in prior specifications.  (The marginally significant effect for Ipsos/Reuters disappears in a more restricted specification.)

Starting first with the time trend, the positive value indicates that Clinton held a larger lead early in the campaign season.  A value of 0.07 means that Trump picks up about one percentage point on his opponent every fourteen days (=1/0.07).  This is a much faster pace than in 2012.  Four years ago, it took President Obama about forty-seven days to gain a single percentage point over Mitt Romney.  The constant indicates the predicted margin between the candidates on Election Day when the “Days Before” variable is zero.  Without any intervening events the model predicts a Trump victory by five to six percent.
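The conversion between a trend coefficient and “days per point” is just a reciprocal, as this short Python sketch shows.  The function names are mine; the 0.07 and forty-seven-day figures are from the paragraph above.

```python
def days_per_point(trend_coefficient):
    """Days needed for the margin to move one percentage point."""
    return 1.0 / trend_coefficient

def implied_coefficient(days):
    """Daily trend implied by a given number of days per point."""
    return 1.0 / days

print(round(days_per_point(0.07)))        # 2016 trend from the text
print(round(implied_coefficient(47), 3))  # daily trend implied by the 2012 pace
```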

Rather surprisingly none of the methodological variables have any effect in 2016.  Poll watchers generally expect to see a one- to two-point tilt in the Republican direction when samples are constrained to “likely” voters.  That difference reflects the generally higher propensity of Republicans to turn out since their age and social characteristics correlate with voting.  This year we see no such effect.  Nor is this likely to be a statistical artifact; polls of likely voters represent only 58 percent of the sample so there are sufficient numbers of each type of poll to generate reliable results.

In 2012, polls conducted on the Internet were about one percentage point more favorable to Obama than polls conducted by other means.  This year we see no differences between Internet polls and those conducted by live interviewers.  Two organizations, the Republican-leaning Rasmussen Reports and the Democratic-leaning Public Policy Polling, use automated calling systems where respondents are asked to enter their answers by pressing the phone’s dialpad or speaking directly to the calling robot.  Because there are only two such agencies, I included dummy variables for each of them rather than a single variable denoting the method they use.  The results for the two organizations are quite different.  Rasmussen continues to show a significant bias in favor of the Republican candidate, while PPP shows no such bias.  This difference parallels that found for 2012, where Rasmussen’s results showed a pro-Romney bias.  Rasmussen’s polling in 2016 has an even greater Republican tilt of over four points, compared to two to three points in 2012.

What the model shows most clearly, though, is the powerful effect of campaign events on the margin between the candidates.  Clinton’s lead fell after the Republican National Convention then rebounded after the Democrats convened in Philadelphia.  The debates and the release of the Access Hollywood tape further boosted Clinton’s margin.  Since the effects of these events must be measured against the overall pro-Trump trend in the polls, I have incorporated these data into a chart.
The aftermath of the conventions brought the race back to more or less the same place it was on June 1st with Clinton holding about a seven-point lead.  Her advantage decayed over the weeks that followed until the combined effects of the first debate and the release of the Access Hollywood tape again brought her lead up to nearly eight points.  The model predicts that her advantage will have fallen back to about five points on Election Day itself.  Since the model has a standard error of about 0.5 percent, the confidence interval on the Election Day prediction is roughly four to six percent.

A few other observations from these results.  First, the notion that there is a hidden vote for Donald Trump that does not appear in public polling is contradicted by the lack of any effects by polling method.  Back in January I found that Trump did over four points better in polls of Republican primary voters when they were interviewed by automated methods.  I attributed that result to the so-called “social desirability” effect; Trump supporters might have felt more shy about admitting their preference to a human interviewer.  I see no such effect in the general election polls now that Trump has been legitimated by being the Republican nominee.

Second, though I do not show the results here, including the size of the vote for the two minor-party candidates, or the proportion of undecideds, has no systematic effect on the margin between the major-party candidates.  If prospective supporters for one major candidate were disproportionately likely to defect to one of the minor candidates, or to remain undecided, we would expect to see fluctuations in the size of those groups influence the size of Clinton’s lead over Trump.  Instead it appears that potential supporters of both those candidates have moved in and out of the minor-party columns or remained undecided at roughly equal rates.  If so, as the minor candidates get squeezed as Election Day draws near, and the number of undecided voters dwindles, we should not expect to see those changes affect the competitive positions of Clinton or Trump.

Procedures Used with Data from Huffington Post Pollster

In the past few weeks, Pollster has begun reporting multiple results for a single poll.  Some polling organizations have been reporting separate results for Democratic, Republican, and independent respondents, as well as the aggregated data for all respondents.  They have also begun providing detailed information on the question(s) asked to determine voting intention.  Pollster reports separate results for each question wording.

Since all my analyses use just one entry per poll, I have begun removing this extra data before analysis.  Unless specifically stated, I am using only the first “question iteration” for each poll (coded “1” at Pollster) and only data for the entire population.  Using the first iteration helps ensure consistency across all the polls from a single organization.
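The filtering step can be sketched in Python as follows.  The field names (`question_iteration`, `subpopulation`) and the sample rows are hypothetical stand-ins; I am not reproducing Pollster’s actual column names here.

```python
def one_row_per_poll(rows):
    """Keep only the first question iteration asked of the entire population."""
    return [
        row for row in rows
        if row["question_iteration"] == 1 and row["subpopulation"] == "all"
    ]

# Hypothetical rows standing in for a downloaded poll file.
rows = [
    {"poll_id": "A", "question_iteration": 1, "subpopulation": "all", "clinton": 45, "trump": 40},
    {"poll_id": "A", "question_iteration": 1, "subpopulation": "democrats", "clinton": 85, "trump": 6},
    {"poll_id": "A", "question_iteration": 2, "subpopulation": "all", "clinton": 47, "trump": 42},
    {"poll_id": "B", "question_iteration": 1, "subpopulation": "all", "clinton": 44, "trump": 41},
]

kept = one_row_per_poll(rows)
print([row["poll_id"] for row in kept])
```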


Swing State Update

I have expanded the list of states that might play a role in determining the outcome of the Presidential vote in the fall.  For each state in the list below, I have compiled all the available polls at Huffington Post Pollster and calculated the percent of polls in which Clinton held a lead.  For each state I then calculated a statistic called “chi-squared” to see whether her lead was sufficiently consistent to conclude she was truly ahead in the state.  Here are the results through today:


In Wisconsin, Hillary Clinton has led in every poll conducted in the state dating back to last fall.  She has nearly as impressive a lead in both Michigan and Pennsylvania, both states typically mentioned as targets for Donald Trump’s “rust-belt” strategy.  In those two states there is less than one chance in twenty that Clinton is truly behind given the number of polls in which she held the lead.  In the remaining states the results are still too mixed to draw any conclusions about which candidate is in the lead.  Clinton does especially poorly in the traditionally-Republican states of Arizona and Georgia, but there haven’t been enough polls taken to draw any conclusions there. The other states remain toss-ups.


Why Weighted Least Squares for Polling Regressions

Standard ordinary least squares regression assumes that the error term has the same variance across all the observations.  When the units are polls, we know immediately that this assumption will be violated.  The error in a poll is inversely proportional to the square root of its sample size.  The “margin of error” that pollsters routinely report is twice the standard error of estimate evaluated at 50%, the worst case with the largest possible variance.  That comes from the well-known statistical formula

SE(p) = sqrt(p[1-p]/N)

where N is the sample size.  This formula reaches its maximum at p=0.5 (50%) making the standard error 0.5/sqrt(N).
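Here is the formula as a minimal Python sketch.  The function names are mine, and the sample sizes are chosen only for illustration.

```python
import math

def standard_error(p, n):
    """Standard error of a sample proportion p from a sample of size n."""
    return math.sqrt(p * (1.0 - p) / n)

def margin_of_error(n):
    """Reported margin of error: twice the worst-case standard error at p = 0.5."""
    return 2.0 * standard_error(0.5, n)

for n in (400, 1000, 1600):
    print(n, round(100 * margin_of_error(n), 1), "points")
```

Quadrupling the sample from 400 to 1,600 only halves the margin of error, which is the square-root relationship at work.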

Weighted least squares adjusts for these situations where the error term has a non-constant variance (technically called “heteroskedasticity”). To even out the variance across observations, each one is weighted by the reciprocal of its estimated standard error.  For polling data, then, the weights should be proportional to the reciprocal of 1/sqrt(N), or just sqrt(N) itself. I thus weight each observation by the square root of its sample size.

More intuitively we are weighting polls based on their sample sizes.  However, because we are first taking the square roots of the sample sizes, the weights grow more slowly as samples increase in size.
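A bare-bones version of weighted least squares for a single predictor can be written in Python as below.  This is a sketch, not the code behind my results: the three polls are hypothetical, and I am assuming the square-root weights multiply the squared residuals directly, which is how many statistical packages treat user-supplied weights.

```python
import math

def weighted_line(x, y, w):
    """Fit y = a + b*x by minimizing sum(w_i * residual_i**2); returns (a, b)."""
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    s_xy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    s_xx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

# Hypothetical polls: (days before the election, Clinton lead, sample size).
polls = [(100, 6.0, 900), (60, 4.0, 1600), (20, 5.0, 400)]
days = [p[0] for p in polls]
leads = [p[1] for p in polls]
weights = [math.sqrt(p[2]) for p in polls]   # 30, 40, and 20

constant, trend = weighted_line(days, leads, weights)
print(round(constant, 2), round(trend, 4))
```

Note that the largest poll has four times the sample of the smallest but only twice the weight.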

Who Leads in the Swing States?

As in every Presidential election, the outcome will be determined by a very small number of states. As I did in 2012, I have compiled the polls in these “swing” states and counted up the number of times Hillary Clinton or Donald Trump was in the lead.  I have included every poll conducted so far that includes both candidates; the oldest poll was taken in late June of 2015.    I intend to update these results limiting them to only recent polls as the election nears.


Two states – Michigan and Pennsylvania – have supported Hillary Clinton consistently enough that there is just a small chance, less than one in twenty, the race is actually tied or she is behind Donald Trump in those states.  The other four states remain toss-ups.


Pennsylvania tempts Republicans to compete there every election cycle, and this one is no exception.  Still the state has trended Democratic in Presidential elections since the late 1960’s.


Race to the Bottom

As most everyone who follows politics knows by now, we enter the unprecedented 2016 Presidential election with the candidates of both major parties disliked by a majority of Americans.  In this posting I examine the trends in “favorability” for both Hillary Clinton and Donald Trump.

Using the data at Huffington Post Pollster I calculated the “net favorability” for each candidate, equal to the percent of respondents saying they view a candidate favorably minus the percent who say they view that candidate unfavorably. I begin with Hillary Clinton, for whom we have favorability data dating back to 2009.



It might be hard to imagine today, but during her tenure as Secretary of State in Barack Obama’s first term, Hillary Clinton was viewed quite positively by the American public. Between Fall, 2009, and Fall, 2012, about three out of five Americans surveyed reported that they viewed Secretary Clinton favorably.  Even as late as April, 2013, Clinton was favorably viewed by 64 percent of the adults surveyed by Gallup, compared to 31 percent who viewed her unfavorably.  That translates into a net score of +33 (=64-31) in the graph above. She would never attain that level of popularity again.

Opinions about Clinton did not fall right away after the attack on the U.S. Consulate in Benghazi, Libya, on September 11, 2012, but the downward trajectory began soon thereafter.  When she announced her candidacy for President on April 12, 2015, the proportions of Americans holding favorable and unfavorable views of Secretary Clinton were just about equal.  A few months later her favorability score was “underwater,” with the proportion of Americans holding unfavorable views outnumbering those with favorable ones by between ten and twenty points.


Opinions about Donald Trump have also remained pretty constant, and consistently negative, since he announced his candidacy on June 16, 2015.   At no time since he began his campaign for President have more Americans reported feeling “favorable” toward Donald Trump than “unfavorable.”  His ratings improved somewhat after his announcement and through the summer of 2015, but when the primary campaign began in earnest starting in January of 2016, Trump saw his favorability score fall further south.  It has rebounded and levelled off since he became the presumptive nominee after winning the Indiana primary on May 3rd.  Still, Donald Trump’s net favorability score averages about -24, compared to Hillary Clinton’s average net rating of -11.


If we now take the difference between these two net favorability scores, we can see whether both candidates are equally disliked, or whether one is disliked more than the other.  For most of the campaign so far, Hillary Clinton has been winning the contest over which of them is less disliked.  Her net favorability scores generally run around 11-12 percent less negative than Trump’s.  For instance, over the month of June, 2016, Clinton averaged 41 percent favorable versus 55 percent unfavorable, for a net favorability score of -14.  Trump’s scores were 35 percent favorable and 60 percent unfavorable, for a net score of -25, or eleven points worse than Clinton’s.
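The arithmetic behind these scores is simple enough to check in a few lines of Python, using the June, 2016, averages quoted above.  The function name is mine.

```python
def net_favorability(favorable, unfavorable):
    """Percent favorable minus percent unfavorable."""
    return favorable - unfavorable

clinton_net = net_favorability(41, 55)   # June 2016 averages from the text
trump_net = net_favorability(35, 60)

# A positive gap means Clinton is the less disliked of the two.
gap = clinton_net - trump_net
print(clinton_net, trump_net, gap)
```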


As you might expect, there is a strong correlation between this net favorability score and the proportion of respondents intending to vote for Clinton or Trump.  Net favorability alone explains about two-thirds of the variance in voting intention across the 113 polls where both questions were asked.  Given the relationship shown in the graph, a score of +11 in net favorability should yield about a five percent lead in voting intention.


One interesting finding from the regression results is that the constant term of 1.06 percent is significantly different from zero.  (It has a standard error of 0.38 with p<0.01.)  The constant predicts Clinton’s lead when net favorability is zero, or in a poll where the proportion of people favoring and disfavoring each candidate is identical.  When net favorability is zero, Clinton leads Trump on average by a bit over one percent.