Modeling Senate Elections Redux

I have reworked my model for Senate elections using data for elections in 2016 and 2018. That model relied on three factors to predict the vote for the Democratic candidate:

  • the “net favorability” (favorable – unfavorable) of the incumbent Senator;
  • a measure of the state’s favorability toward Donald Trump; in 2016, I used his proportion of the two-party vote; in 2018, I used his job approval rating; the two measures proved to have identical effects;
  • the ratio of spending by the campaign for the Democratic candidate versus spending by the campaign for the Republican candidate.

Using Net Approval for Donald Trump

In the original formulation, the favorability of the incumbent Senator was measured on a “net” basis, favorable – unfavorable, while the measure for Trump support was not.  Since most everyone polled has an opinion about the President’s job performance, the approval rating alone is typically sufficient. The favorable and unfavorable job approval ratings for Donald Trump generally sum to about 96 percent.

Asking about other politicians results in much higher “don’t know” responses. In this sample of races, favorable and unfavorable responses for the average Senator sum to just 79 percent, with 21 percent undecided. Net approval measures only the difference between approvers and disapprovers and leaves out the undecideds.

In this reformulation of the model I put the two measures on an equal footing by imputing a net job approval figure for Trump. I have done so assuming the sum of positive and negative figures for him equals 96 percent. Then simple algebra results in the formula

(Approve – Disapprove) = 2 X Approve – 96
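The algebra behind that formula can be sketched in a few lines of Python. The function name and the default of 96 are illustrative choices; the only assumption carried over from the text is that approvers and disapprovers together make up 96 percent of respondents.

```python
def impute_net_approval(approve, total_with_opinion=96.0):
    """Impute net approval (approve - disapprove) from the approval figure alone.

    Assumes approve + disapprove == total_with_opinion, so that
    disapprove = total_with_opinion - approve.
    """
    disapprove = total_with_opinion - approve
    return approve - disapprove  # algebraically 2 * approve - total_with_opinion


# Example: a state where 45 percent approve of the President's job performance
net = impute_net_approval(45.0)
```

Under this assumption a state reaches a net score of zero when approval stands at 48 percent, half of the 96 percent who offer an opinion.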

Using net approval for both measures improves the model’s clarity since both scores are measured in the same units, and the constant term reflects the situation where a state has a value of zero (approvers and disapprovers evenly matched) on both support for Trump and favorability toward the incumbent Senator (and the campaigns are spending identical amounts of money).

Using Base-Two Logarithms for Spending Figures

One other change I’ve made to the model is measuring campaign spending using logs to the base two rather than ten. Using base two makes the associated coefficient easier to interpret. An increase of one unit in this measure represents the difference between a race where both campaigns spend the same amount of money and a race where one candidate spends twice as much money as her opponent (since log2(2/1) = 1).
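As a quick illustration of the base-two measure, here is a small Python sketch. The function name and the dollar figures are made up for the example; only the use of log base two on the spending ratio comes from the text.

```python
import math


def log2_spending_ratio(dem_spending, rep_spending):
    """Base-two log of the Democratic-to-Republican spending ratio.

    0 means equal spending, +1 means the Democrat spent twice as much,
    and -1 means the Republican spent twice as much.
    """
    return math.log2(dem_spending / rep_spending)


# Hypothetical dollar amounts, chosen only to show the scale of the measure
even_race = log2_spending_ratio(5_000_000, 5_000_000)
dem_doubles = log2_spending_ratio(10_000_000, 5_000_000)
rep_doubles = log2_spending_ratio(5_000_000, 10_000_000)
```

The measure is symmetric around zero, so a two-to-one Republican advantage is scored as the mirror image of a two-to-one Democratic advantage.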

In this formulation we are left with two predictors. One is the difference between the Democratic candidate’s net approval and the same figure for Donald Trump. A Senate candidate who has a six-point advantage over Trump in net approval wins on average one more point at the polls (0.17 X 6 = 1.02).

The campaign spending coefficient indicates that a candidate whose campaign spends twice as much as his opponent can expect to add 1.4 percent to his margin on election day.

If the difference in net approval is zero, and the candidates spend identical amounts of money so the logarithm of the ratio is also zero, then the Democratic candidate is predicted to win 49.4 percent of the two-party vote. Given that the 95% confidence interval for this value ranges from 48.1 to 50.7, a fifty-fifty outcome in this case is highly probable.
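Putting the three reported numbers together gives a rough prediction equation. The sketch below is my reading of the text, not the estimated model itself: it assumes the constant (49.4), the net-approval coefficient (0.17), and the spending coefficient (1.4) all apply additively to the Democratic share of the two-party vote, and that the net-approval difference is signed so that positive values favor the Democrat.

```python
import math

# Coefficients quoted in the text; combining them in one additive equation
# is an assumption of this sketch.
CONSTANT = 49.4          # predicted Democratic share when both predictors are zero
B_NET_APPROVAL = 0.17    # points of vote per point of net-approval difference
B_LOG2_SPENDING = 1.4    # points of vote per doubling of the spending ratio


def predicted_dem_share(net_approval_diff, dem_spending, rep_spending):
    """Predicted Democratic percentage of the two-party vote."""
    log2_ratio = math.log2(dem_spending / rep_spending)
    return (CONSTANT
            + B_NET_APPROVAL * net_approval_diff
            + B_LOG2_SPENDING * log2_ratio)


# A six-point net-approval edge with equal spending on both sides
share = predicted_dem_share(6, 1.0, 1.0)
```

With equal spending and no approval gap the function returns the constant, which is the fifty-fifty case discussed above.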

Which Factor is More Important?

One way to compare the coefficients in this model is to convert them to “standardized” units. Standardized coefficients measure the effect each predictor would have if it were first divided by its standard deviation (usually after subtracting its mean as well) and the same transformation were applied to the dependent variable. These standardized coefficients measure the effect in standard deviation units of a one standard deviation increase in the predictor. In that sense they provide a standard for comparing the importance of each predictor.
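For readers who want the arithmetic, a standardized coefficient can be obtained from an ordinary one by rescaling with the two standard deviations. The sketch below uses made-up data and a hypothetical function name; it shows the general conversion, not the figures from my dataset.

```python
import statistics


def standardized_coefficient(b, x_values, y_values):
    """Convert a raw regression coefficient into standard-deviation units.

    b is the raw coefficient on x; the result is the change in y, measured
    in standard deviations of y, for a one-standard-deviation increase in x.
    """
    return b * statistics.stdev(x_values) / statistics.stdev(y_values)


# Toy data in which y is exactly twice x, so the raw slope is 2
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
beta = standardized_coefficient(2.0, x, y)
```

In the toy case the standardized coefficient is one, because x accounts for all of the variation in y. Subtracting the means does not change the slope, which is why only the standard deviations appear in the conversion.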

In this model the standardized coefficients are not all that different from one another. The standardized coefficient for the net approval variable is 0.54; for campaign spending it is 0.42.  It’s not surprising that the more partisan approval variable is slightly more important, but the difference between the two is relatively small.

Will Retirements Further Reduce the GOP’s Ranks?

I have written a couple of articles here about the net difference by party in the number of Representatives retiring from the House. I found a relatively strong relationship between retirements and the number of seats won or lost in off-year elections, but I found no relationship between the two measures in Presidential years.

These two charts tell the basic story. On the left we have the relationship for off-year elections, where changes in the number of Republican retirements correlate with the number of House seats won or lost in each election. The chart on the right presents the same measures for Presidential years. In off-year elections, the number of Republican seats won or lost depends to a degree on the difference between the number of Republicans and Democrats retiring from the House. In years like 1958 and 2018, relatively large numbers of Republicans left the House, and the party lost seats overall.

The chart on the right shows there was no systematic relationship between retirements and House results in election years dating back to 1936 when the President is on the ballot. However before we jump to the conclusion that retirements will again not be predictive in 2020, a closer look is in order.

This year 28 Republicans (counting Justin Amash) have relinquished their seats in the House of Representatives compared to nine Democrats. This difference of +19 in net Republican retirements is the second-largest number recorded for an on-year election since the New Deal, just behind the value for 2008. In that year there were 21 net Republican retirements, and the party lost 24 seats. Only the 1964 landslide election between Johnson and Goldwater saw more Republican seat losses.  Might the 2008 result be a bellwether for the result in 2020?

Using the ratings at the Cook Political Report we find two open seats in the “likely Republican” category, three in the “lean Republican” category, and five more seats considered Republican “toss-ups.” For the Democrats, two open seats fall into the “likely Democratic” category, one in the “lean Democratic” group, and just one more is considered a toss-up. Overall we have eight Republican seats in the lean/toss-up categories compared to just two Democrats.

Having as many as 19 retirements from the President’s party in a year when he is running for re-election is extremely rare. Since 1948, net retirements in years when the President is on the ballot averaged just 3.6 (both Democrats and Republicans), reaching a maximum of seven in 1996. That makes it difficult to evaluate the meaning of this year’s net departure of nineteen Republicans.  Perhaps we may not see a “blue wave” result like 2008, with its 21 net Republican retirements and a net loss of 24 GOP seats. But it wouldn’t be surprising to see the Democrats pick up some eight to ten House seats in November.

The Lag Between COVID Cases and Deaths

Observers often point to the lag between COVID cases and COVID deaths to explain the current situation of rapidly rising caseloads but no corresponding spike in deaths. Still, after accounting for caseloads 14, 28, and 42 days prior, the growth in the number of deaths seems to have leveled off starting around July 1st.

Recent data on the expansion of the coronavirus pandemic in the United States show two somewhat contradictory trends. The number of diagnosed cases has skyrocketed driven by states like Florida, Texas, Arizona, and California. While the rest of the developed world is bringing the virus under control, cases in the US are growing exponentially.

Yet even as cases are rising, the death toll attributed to the virus has leveled off.  These apparently contradictory trends can occur because of the lag between when someone is diagnosed with the virus and the time when he or she dies.  Today’s death count does not reflect today’s caseload, but the number of cases some weeks back.  To study the effects of this lag, I am using the daily reported numbers of cases and deaths for the US as a whole from Johns Hopkins.  The data begin on January 22, 2020, when the first case was reported, and continue daily through July 6th.

I tried a number of lag specifications in a simple regression model to predict total deaths from total cases.  I tried including sixty individual lags but, unsurprisingly, while they explained nearly all the variance in deaths, none of the individual terms was significant.  Eventually I settled on a model where today’s deaths depend on the number of cases 14, 28, and 42 days prior.
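A lag specification of this kind can be sketched in plain Python. This is an illustration under stated assumptions, not the code behind my estimates: the function names are hypothetical, an intercept is included by assumption, and the fit uses textbook ordinary least squares solved through the normal equations.

```python
def lagged_design(cases, lags=(14, 28, 42)):
    """Build regression rows of [1, cases[t-14], cases[t-28], cases[t-42]].

    Returns the rows and the first day index for which every lag exists.
    """
    start = max(lags)
    rows = [[1.0] + [float(cases[t - lag]) for lag in lags]
            for t in range(start, len(cases))]
    return rows, start


def solve_least_squares(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def fit_lag_model(cases, deaths, lags=(14, 28, 42)):
    """Regress deaths on cases observed 14, 28, and 42 days earlier."""
    X, start = lagged_design(cases, lags)
    return solve_least_squares(X, deaths[start:])
```

Calling fit_lag_model with two equal-length daily series returns four numbers: the intercept followed by one coefficient per lag. Because the first 42 days have no complete set of lags, they are dropped from the fit.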

The model predicts that ten percent of people contracting COVID will die fourteen days later, though that effect is tempered by the number of cases at longer lags.  This could reflect “learning” by the medical providers.  As we have gained experience treating an ever greater number of cases, the effectiveness of treatments and procedures has improved.

More interesting perhaps is this chart showing the model’s predictions for the number of deaths and the actual number.

In the first half of April, this model based solely on lagged case counts tended to under-predict the death toll, but the predicted and actual lines merged later that month and remained remarkably in lockstep through May and June.  Since July began, though, the actual death count has slowed relative to the predictions based on the case count fourteen, twenty-eight, and forty-two days ago.

Since this model relies on past caseloads to predict contemporary deaths, we can extrapolate the death rate out fourteen days.  The future looks bleak with the model projecting that we could reach a total of 200,000 deaths before the end of July. We have to hope that the slower-growing trend in observed deaths persists.

Some Observations on Biden’s Margin in Presidential Polling

A simple trend model predicts Joe Biden will hold a lead in the polls between 9.8 and thirteen points on Election Day. Biden has increased his lead since the first of the year by a point every fifty days. Were Biden to win by the estimated eleven points, he would carry the Electoral College by 390-148.

Over the weekend I downloaded the complete set of presidential general election polls archived by FiveThirtyEight. For this post I’m going to concentrate my attention on national polls matching up Donald Trump and Joe Biden in head-to-head mock general elections.

Anyone paying attention to contemporary politics knows that Biden has led Trump in recent polling, but the extent of Biden’s margin is impressive when all the polls are taken together. Here are the 265 polls pitting Biden against Trump conducted since January 1st, 2020:

The average lead is about 6.5 points, but more commonly Biden leads by seven or eight points.

Let’s turn now to my standard model for polling data, which I have used since 2008.  This simple model combines the number of days left before the election and various characteristics like the population sampled, the polling method used, and measures of individual “house effects.”

The most significant results from these regressions are the constant term and the coefficient on days before the election.  First, the constant term predicts the size of Biden’s polling lead on election day, when the days before the election variable is zero. If current trends continued until the election, Joe Biden would have an eleven-point edge in national polling.  The standard error for this estimate of the constant is 0.81, meaning the likely range of margins for Biden would fall between 9.8 and thirteen percent.

The negative coefficient on days before the election means that statistically, since the first of the year, Joe Biden has been slowly gaining ground on Donald Trump. However, with a coefficient of just -0.02, it takes fifty days for Biden to gain a full point on Trump.  That comes to just under three points by election day.

In model (1), live phone polls show a small bias in favor of Biden. Some might read this as evidence of “shy” Trump voters who are unwilling to tell live interviewers their true preference for Trump but have no trouble doing so when they are using some form of automated polling. As it turns out, the effect for live interviews goes away once we account for individual pollsters’ “house effects.”  The same is true for the small pro-Trump effect seen for polls of registered voters. It too vanishes when we account for differences among pollsters.  All of the pollsters for which I find significant effects report figures more favorable to Trump compared to the consensus of all pollsters.

It’s hard to overstate how big an eleven-point lead would be. The implied two-party vote division of 55.5-44.5% would be the largest Democratic victory since Lyndon Johnson’s landslide over Barry Goldwater in 1964.  Given the historical relationship between the popular and Electoral College votes, a 55.5% win in the popular vote translates to a 72% victory in the College, or a margin of 390-148 Electoral Votes.
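The arithmetic linking a polling margin to a two-party split, and an Electoral Vote count to a percentage of the College, is simple enough to check in Python. The function names are illustrative, and the sketch does not reproduce the historical popular-vote-to-Electoral-College relationship; it only converts the figures quoted in the text.

```python
def two_party_shares(margin):
    """Split 100 percent of the two-party vote given the leader's margin in points."""
    leader = 50 + margin / 2
    return leader, 100 - leader


def electoral_share(winner_votes, loser_votes):
    """Winner's percentage of the Electoral Votes cast for the two candidates."""
    return 100 * winner_votes / (winner_votes + loser_votes)


biden_share, trump_share = two_party_shares(11)
college_pct = electoral_share(390, 148)
```

An eleven-point margin splits the two-party vote 55.5 to 44.5, and 390 of 538 Electoral Votes rounds to 72 percent.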

Technical Appendix: Estimating COVID Caseloads in the States

The Johns Hopkins Center for Systems Science and Engineering deserves kudos for providing daily statistics on the spread of the novel coronavirus known as COVID-19. Data on confirmed cases, deaths, tests conducted, and hospitalizations are available for a variety of geographic units. For the US, there are data for counties and aggregates for states. I’m going to focus on the state-level measures and present a few “regression experiments” using various predictors for the number of cases reported by each state.

The Baseline Model

The dependent variable in all the models I will present is the base-10 logarithm of the total number of cases confirmed for each state on April 24, 2020.  These range from a high of 271,590 cases in New York state to a low of 339 cases confirmed in Alaska. In my initial model (1) below I include a state’s area and population size as predictors for the number of cases.  By using logs on both sides of the equation, the coefficient estimates are “elasticities,” measuring the proportional effect of a one-percent increase in a predictor.

COVID’s spread is much more determined by the size of a state’s population than its area. Moreover the coefficient of 1.26 means that states with larger populations have disproportionately more cases, no doubt a consequence of the contagion effect.

At the bottom of the column for model (1) is the coefficient for a “dummy” variable representing New York state.  In this simple size-based model, New York has (10^0.84), or 6.9, times the number of cases that its population and area would predict.  The reason for this will become clear in a moment.
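Both interpretations, the dummy-variable multiplier and the elasticity, come down to exponent arithmetic, sketched here in Python. The function names are hypothetical; the coefficients 0.84 and 1.26 are the ones quoted in the text.

```python
def dummy_multiplier(coef_log10):
    """Multiplicative effect of a dummy variable when the outcome is in log10 units."""
    return 10 ** coef_log10


def elasticity_multiplier(elasticity, pct_increase):
    """Factor by which the outcome grows when a logged predictor rises by pct_increase percent."""
    return (1 + pct_increase / 100) ** elasticity


# New York's dummy coefficient of 0.84 in model (1)
ny_factor = dummy_multiplier(0.84)

# Effect of doubling population (a 100 percent increase) with an elasticity of 1.26,
# holding area fixed
doubling_factor = elasticity_multiplier(1.26, 100)
```

An elasticity above one means that doubling a state’s population more than doubles its predicted caseload, which is the “disproportionately more cases” pattern described above.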

Testing, Testing, Testing

In model (2) I add the estimated proportion of the population that has been tested for the virus as of April 17th, a week before the caseload figures. The testing numbers also come from Johns Hopkins. For this measure, and all the proportions that follow, I calculate the “logit” of the estimated proportion. For the testing measure this works out to:

logit(testing) = ln(number_tested/(total_population – number_tested))

The quantity number_tested/(total_population – number_tested) measures the odds that a random person in the state’s population has been tested for the virus. Taking the logarithm of this quantity produces a measure that ranges over the entire number line.
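Here is a minimal Python sketch of the logit calculation. The function names and the five-million-person state are illustrative, not figures from the data.

```python
import math


def logit(p):
    """Log-odds of a proportion p, defined for 0 < p < 1."""
    return math.log(p / (1 - p))


def logit_from_counts(number_tested, total_population):
    """Log-odds that a random resident has been tested, computed from raw counts."""
    return math.log(number_tested / (total_population - number_tested))


# A hypothetical state of five million people in which one percent have been tested
from_counts = logit_from_counts(50_000, 5_000_000)
from_proportion = logit(0.01)
```

The two routes give the same number, since dividing both counts by the population turns the odds into p / (1 – p). Proportions below one half map to negative values and those above one half to positive values.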

Testing has a strong statistical relationship to the number of identified coronavirus cases in a state. Moreover the coefficient has a plausible magnitude.  If we increase testing by one percent, the expected number of cases will grow by 0.4 percent.  In other words, identified infections rise less than proportionally with testing; expanding testing at the margin finds additional cases, but at a diminishing rate.

Notice how the effect for a state’s physical area declines when testing is accounted for. One apparent reason why large states have fewer cases is that it is more difficult to test for the virus over a larger area.

Finally, when testing is accounted for, the caseload for the state of New York is no different from any other state with its population size and physical area.

We can simulate the effects of testing by imagining a fairly typical state with five million people living on 50,000 square miles of land area, then using the coefficients from model (2) to see how the estimated number of confirmed cases varies with the testing rate. This chart shows how the infection rate, the proportion of the population found to have the virus, increases with the rate at which the population is tested.

If we test only one percent of the state’s population, we will find about 0.1 percent of the population with a COVID infection. If we test five percent of the population, about 0.6 percent of that state’s people will be identified as having the virus.*

Old Folks in Homes

Now let’s turn to some demographic factors that are thought to increase caseloads. First is the age of the population. In general, it is thought that older people are more susceptible to the virus. However, model (3) shows there is little evidence that states with larger proportions of elderly have greater caseloads. What does matter, as model (4) shows, is the proportion of a state’s 75 and older population living in nursing facilities. When the virus gets into one of these facilities, it can run rampant throughout the resident population and the staff.

Race, Ethnicity, and Location

Reports of higher rates of infection among black and Hispanic Americans appear in these data as well.  In model (5), it appears the effect of larger Hispanic populations is twice that of equivalent black populations.  If we also adjust for the size distribution of a state’s population in model (6), the effect of its proportion Hispanic declines. This pattern suggests that Hispanics are more likely to live in smaller communities than other ethnic groups.

It is important to remember that these analyses apply to states. Finding no relationship between the proportion of a state’s population that is Native American and the state’s number of coronavirus cases does not imply that native populations are more or less at risk.  For that we need data at the individual level where we find that Native populations are more at risk.

I’ve also said nothing about deaths arising from the novel coronavirus.  That is the subject of my next report.



*We have no way of knowing what the “true” number of cases is; we have only the Johns Hopkins figures for “confirmed” cases.

Senate Elections in a Time of Economic Contraction

The novel coronavirus pretty much guarantees that the American economy will decline this year. While the President and most pundits have focused on how a falling economy might affect his re-election, an economy in recession also improves the Democrats’ chances of taking control of the Senate in 2021. A ten percent decline in real GDP translates into the Democrats winning about 53 percent of the national vote for Senate candidates.

Pretty much every forecaster predicts that the economy will contract substantially over the next three months as large portions of the American economy remain idle in the face of the COVID-19 pandemic.  Most of these forecasts are clearly guesswork since we still have only a glimmer about the toll the virus will take on the U.S. economy.  Fortune magazine describes forecasts for the second quarter as ranging from “horrible” to “catastrophic,” with the estimated change in real Gross Domestic Product (GDP) in the range of -8% to -15%.  Morgan Stanley’s estimate is especially grim, predicting a decline of 38%. Like many other analysts Morgan Stanley expects the economy to rebound some in the third quarter, but the rebound will not be sufficient to overcome the enormous declines of the first half of 2020.  They expect the year to end with real GDP down by 5.5%.

These declines eclipse anything we have seen since World War II.  The economy contracted by about 3.3% during the recession year of 2009 and fell between 2.2% and 2.9% in the earlier recessions of 1958, 1975, and 1982.

Back in 2016 I constructed a “simple model of Senate elections” that looked at how political and economic factors influence the nationwide Senatorial vote since the War.  Three factors proved to have statistically significant relationships with the share of the vote won by the President’s co-partisans in those years. One of these factors favors the Republicans, the fact that Donald Trump will head the ticket in November.* The President’s party has won, on average, 51 percent of the two-party vote for the Senate in years when the President heads the ticket, compared to just 47 percent in elections when the President is not running.  (This includes both off-year elections, and open-seat Presidential elections like 2016.)

Two factors favor the Democrats in 2020.  One is a weak “regression-toward-the-mean” effect based on the votes won in the Senate elections six years earlier. Senators who win election with an above-average share of the vote in one election are likely to see their vote decline slightly when they run for re-election six years hence.  Republicans did unusually well in the 2014 mid-term elections, so we might expect their vote shares to fall back slightly in 2020.

The economy also plays a role. My model uses the year-on-year change in real per-capita disposable income as of September as a measure of the state of the economy.  I will use this “simple model” to estimate the effects of the likely recession on the upcoming Senate vote in 2020.

Forecasters rarely estimate the change in real per-capita disposable income and focus instead on changes in real GDP or employment. Unsurprisingly, though, changes in real GDP do filter through to personal income as shown in this chart.

I have marked the seven recessionary years, ones where real GDP fell year-over-year.  One thing to notice is that even when real GDP remains flat, personal income is still predicted to grow by one percent.  Moreover, only 37 percent of changes in real GDP are transmitted to personal income.
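The relationship described here amounts to a straight line with an intercept of about one point and a slope of about 0.37. The Python sketch below encodes just those two rounded figures from the text; the function name is illustrative and the fitted line in the chart may differ slightly.

```python
# Rounded values taken from the text: income grows about one percent when GDP
# is flat, and about 37 percent of any GDP change passes through to income.
INTERCEPT = 1.0
PASS_THROUGH = 0.37


def predicted_income_change(gdp_change_pct):
    """Predicted percent change in real per-capita disposable income."""
    return INTERCEPT + PASS_THROUGH * gdp_change_pct


flat_gdp = predicted_income_change(0)
ten_percent_drop = predicted_income_change(-10)
```

On these figures a ten percent fall in real GDP implies a drop in personal income of a little under three percent, which is why even severe GDP forecasts translate into fairly modest income changes.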

I have used this “simple” model to examine how different predictions for the state of the economy in November might translate into Senate electoral outcomes.**  The baseline appears on the line below with zero growth in GDP. The Republicans are predicted to win about 48% of the nationwide vote for Senate candidates in such an election. This estimate combines the positive effect of having the President on the ticket with the negative effect of the Republicans’ substantial victory in the 2014 midterm elections where their candidates won 53.5 percent of the two-party vote. In the context of my model those factors predict that the Republicans will win 48.3 percent of the nationwide two-party vote for Senate.

Because changes in GDP are attenuated when translated into changes in per-capita disposable income, even a drop of ten percent in GDP results in a vote for the Republicans that is just one percentage point less than if GDP remained flat. Even if the worst predictions of the forecasters hold true, and GDP falls by twenty percent, the predicted Republican vote falls to only 46 percent.
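The scenarios above can be approximated with a back-of-the-envelope line. This sketch is not the estimated model: it assumes a linear relationship anchored at the 48.3 percent baseline, with a slope of 0.1 vote points per point of GDP change, a figure I infer only from the statement that a ten percent drop costs about one percentage point.

```python
BASELINE_GOP_SHARE = 48.3      # predicted Republican share with flat GDP (from the text)
POINTS_PER_GDP_POINT = 0.1     # ASSUMED slope implied by "ten percent drop, one point"


def predicted_gop_share(gdp_change_pct):
    """Approximate Republican share of the two-party Senate vote."""
    return BASELINE_GOP_SHARE + POINTS_PER_GDP_POINT * gdp_change_pct


def predicted_dem_share(gdp_change_pct):
    """Democratic share is the remainder of the two-party vote."""
    return 100 - predicted_gop_share(gdp_change_pct)


gop_at_minus_10 = predicted_gop_share(-10)
gop_at_minus_20 = predicted_gop_share(-20)
dem_at_minus_10 = predicted_dem_share(-10)
```

The approximation gives the Republicans roughly 47.3 percent after a ten percent decline and 46.3 percent after a twenty percent decline, in line with the rounded figures quoted in the text.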


*Presidential popularity does not appear to play a role in on-year elections, though it does matter for elections held in off-years.

**These results are based on a reestimation of the published model including data for the 2016 and 2018 elections. While the coefficients change slightly, none of the substantive conclusions are altered.

The Politics of Stay-At-Home Orders

On her blog, journalist Marcy Wheeler helpfully tallied the twenty-seven states whose governors have imposed stay-at-home orders during the COVID-19 pandemic. Virginia joined this group late Monday. I have used her data, and figures from Johns Hopkins University on the number of identified cases, to do a quick analysis of the political forces driving the decision to impose such orders.

I used simple binary logit models for these tests.  The predictors include whether each state’s governor and legislature is controlled by Democrats, the February net job approval rating (approve – disapprove) for Donald Trump in each state from Morning Consult, and the number of reported cases in each state as of March 15th and March 30th.  Model (1) below includes all these factors; model (2) includes just the two that proved significant.

As you can see, only two factors proved nominally “significant,” whether the governor is a Democrat, and Trump’s approval rating in the state.  States with Democratic governors, and those where Trump’s net job approval is negative (“underwater”), are more likely to have instituted a stay-at-home policy. The number of COVID-19 cases surprisingly did not seem to matter.  (Using the logarithms of the number of cases did not improve things nor did looking at rates of growth.)

Using these results, I have generated the predicted probability that each state will have instituted a stay-at-home order and compared those predictions to the actual policies.
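For readers unfamiliar with logit models, this is how predicted probabilities come out of the coefficients. The coefficient values below are HYPOTHETICAL placeholders chosen only to show the mechanics; they are not the estimates from my models, and the 0.5 cutoff for calling a prediction is likewise an assumption.

```python
import math


def inv_logit(z):
    """Convert a log-odds value into a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# HYPOTHETICAL coefficients for illustration only, not the estimated values
HYPOTHETICAL_COEFS = {
    "constant": 0.0,
    "dem_governor": 2.0,          # 1 if the governor is a Democrat, else 0
    "trump_net_approval": -0.1,   # approve minus disapprove, in points
}


def stay_at_home_probability(dem_governor, trump_net_approval,
                             coefs=HYPOTHETICAL_COEFS):
    """Predicted probability that a state has imposed a stay-at-home order."""
    z = (coefs["constant"]
         + coefs["dem_governor"] * dem_governor
         + coefs["trump_net_approval"] * trump_net_approval)
    return inv_logit(z)


def predicted_order(probability, cutoff=0.5):
    """Classify a state as predicted to have an order when probability >= cutoff."""
    return probability >= cutoff
```

Comparing predicted_order against the actual policy for each state yields the kind of hit-and-miss tally discussed next.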

There are ten states where the predicted policy does not match the actual decision.  Thirty-three states are predicted to have imposed stay-at-home orders, but only twenty-eight have done so. Democratic strongholds like California and New York all have predicted probabilities above 0.9. However Nevada, Maine, Pennsylvania, Massachusetts, and Kentucky should all have instituted stay-at-home policies but have yet to do so.  In contrast, the governors of West Virginia, Idaho, Indiana, Alaska, and Ohio have all instituted such policies despite the political context of their states.

We can use the same set of predictors to estimate the duration of a state quarantine. Here I use a “Tobit” model, which handles dependent variables with zero lower bounds. States without a quarantine are coded zero on the duration variable.

The general pattern here is the same as for whether a quarantine was imposed.  However, the growth in cases between 3/15 and 3/30 has a weak statistical relationship with duration. Because the case figures are expressed as base-10 logarithms, the coefficient of 26.8 implies that a state whose caseload grew by a factor of ten during the latter half of March would impose a quarantine 26.8 days longer, other things equal.
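To see what the coefficient implies for growth rates other than tenfold, here is a short Python sketch. It assumes the growth measure is the difference in base-10 logs of the two case counts and reads the coefficient as a shift in the model’s underlying duration; the function name and the case counts are illustrative.

```python
import math

GROWTH_COEF = 26.8  # days per unit of log10 case growth, as reported in the text


def duration_shift(cases_mar15, cases_mar30):
    """Difference in predicted quarantine days associated with case growth.

    Growth is measured as log10(cases_mar30) - log10(cases_mar15), which is
    the same as log10 of the ratio of the two counts.
    """
    return GROWTH_COEF * math.log10(cases_mar30 / cases_mar15)


tenfold = duration_shift(100, 1000)
doubling = duration_shift(100, 200)
```

A tenfold increase moves the prediction by the full 26.8 days, while a mere doubling of cases moves it by about eight days.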


Money in Senate Elections

Senate campaigns that outspent their opponents by two-to-one in 2016 and 2018 typically gained a bit over one percent at the polls. Spending by outside groups, and the “quality” of challengers, had no measurable effects.

My earlier model of recent contests for the U.S. Senate relied entirely on two measures of popularity, the favorability score for the incumbent Senators in their states, and support for President Trump in those same states.  While those two measures alone explain 81 percent of the variance in the vote for Senatorial candidates, the model obviously lacks a few important items, most notably data on campaign spending and on challenger “quality.”  In this post I add measures of both these factors.

For campaign spending I have used the figures reported to the Federal Election Commission and compiled by OpenSecrets.  I chose to use spending rather than funds raised because in most cases campaigns spend nearly all of what they raise, and sometimes more. For instance, here is the record for campaign spending in the 2018 Missouri Senate race where incumbent Claire McCaskill lost to Republican Josh Hawley, then Attorney General.

The other major source of campaign financing is, of course, spending by outside groups.  Here, OpenSecrets separately reports funding in support of and opposed to each candidate.  My measure of outside spending adds together monies spent supporting a candidate and those spent criticizing her opponent. I use the base-10 logarithm of spending which has a better fit to the data and incorporates the basic economic intuition of decreasing returns to scale.

Spending by the Campaigns

I first added the campaign spending figures for Republicans and Democrats separately with results as shown in column (2). Democratic spending appears to have had a larger effect than Republican spending, but a statistical hypothesis test showed the two coefficients were not significantly different in magnitude. So in (3) I use the difference between the two spending measures, which is equivalent to the base-10 logarithm of the ratio of Democratic to Republican spending.*

An increase of one unit in these logarithms is equivalent to multiplication by ten. So the coefficient of 4.39 tells us that a ten-fold increase in the Democrats’ spending advantage would improve their share of the two-party vote by somewhat over four percent.  While a ten-fold advantage might seem implausibly high, some races have seen such lopsided spending totals. In Alabama’s 2016 Senate election Republican incumbent Richard Shelby spent over twelve million dollars on his race for re-election; his Democratic opponent spent less than $30,000. In that same year, Hawaii Democrat Brian Schatz spent nearly eight million dollars while his opponent spent $54,000.  These sorts of drastic imbalances typically appear in non-competitive races where the incumbents are seen as shoo-ins to retain their seats.

To see more intuitively how spending affects results I have plotted the predicted change in the Democratic vote for various ratios of Democratic to Republican spending.  The state codes represent the seven most competitive races as identified by my model. (I will examine the implications for 2020 in a separate post.)

In states where the Democrats outspent the Republicans by a ratio of two-to-one, the Democrats were rewarded on average with an increase of about 1.3 percent in their vote shares.
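The two-to-one figure follows directly from the base-10 coefficient. The Python sketch below reproduces the arithmetic using the 4.39 coefficient from model (3); the function name is illustrative, and the result is the predicted change in the Democratic share relative to a race with equal spending.

```python
import math

SPENDING_COEF = 4.39  # points of Democratic vote per unit of log10 spending ratio


def vote_change_from_ratio(dem_to_rep_ratio):
    """Predicted change in the Democratic share for a given spending ratio."""
    return SPENDING_COEF * math.log10(dem_to_rep_ratio)


two_to_one = vote_change_from_ratio(2)
ten_to_one = vote_change_from_ratio(10)
one_to_two = vote_change_from_ratio(0.5)

# The ratio form matches the difference of logs used in model (3)
same_thing = SPENDING_COEF * (math.log10(2_000_000) - math.log10(1_000_000))
```

Because the relationship is logarithmic, each successive doubling of the spending advantage buys the same increment of about 1.3 points, which captures the decreasing returns to scale mentioned earlier.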

Spending by Outside Groups

In sharp contrast to the results for spending by the campaigns themselves, I find no systematic influence for spending by outside groups. Neither including separate terms for pro-Democratic and pro-Republican outside spending as in model (4) above, nor including the difference between those figures in model (5), displays significant effects.

While I’m not ready to make strong claims for this finding without an expansive review of the literature on spending in Senate campaigns,1 I don’t find the result all that surprising. Since outside groups may not, by law, “coordinate” with the campaigns they support, these groups must focus their attention on television advertising, direct mail, and other messaging strategies.  Perhaps these strategies simply are not as effective as they once were, as demonstrated by the Presidential primary candidacies of Michael Bloomberg and Tom Steyer. They both spent hundreds of millions of dollars on television advertising but garnered few votes on election day.

Effects of Challenger “Quality”

Another common factor used to explain legislative elections is the “quality” of the challengers that choose to take on an incumbent. While some people launch vanity Senatorial campaigns to make themselves better known to the public at large, most Senatorial bids are undertaken by people who already hold elective office at either the state or the Federal level.  I have coded the backgrounds for the challengers facing each incumbent in my dataset of 2016 and 2018 elections.  They fell into four categories — current or former Members of Congress, current or former members of the state legislature, governors and others who have held state-wide office, and a miscellaneous category that includes local-level politicians like mayors and non-politicians like activists.  I find no statistical effects for any of these categories either separately or in combination.

We are thus left with a model of Senate elections that includes three factors — the incumbent’s net favorability, the state’s level of support for Donald Trump, and the ratio of spending by incumbent and opposition campaigns.



*Remember from high-school math that log(A/B) = log(A) – log(B).

1. I have since discovered this article examining television advertising in Senatorial elections using data for the 2010 and 2012 elections. The authors use a novel technique that compares adjacent counties that reside in different media markets. Overall, they find significant effects on vote share for negative (but not positive) advertising by the candidates and no effects for advertising by PACs. This paper by political scientists John Sides, Lynn Vavreck, and Christopher Warshaw finds significant effects for television advertising in Senate races, but again they find, as I do, that the effects are small. A change from -3 standard deviations to +3 standard deviations in advertising produced just a 1% change in Senate races. They do not analyze the effects of spending by the campaigns versus that by outside groups.

Senate Update, March, 2020

The Democrats have a decent chance to take control of the Senate.

I have updated my Senate predictions using the fourth-quarter, 2019, favorability data for Senators and February, 2020, job approval ratings for Donald Trump. Both come from Morning Consult.  I have also cleaned up a few errors in the earlier data used to estimate the model’s coefficients. Here are the updated results:

Maine’s Susan Collins now joins Alabama’s Doug Jones as one of the two most vulnerable Senators up for re-election. Both Senators face adverse political environments in the states they represent. Mainers don’t care for Collins very much, and they’re slightly negative when it comes to Donald Trump. Unlike Collins, Jones is liked by a plurality of Alabamians, but Trump is liked so much more that it overwhelms Jones’s personal popularity.

Steve Bullock’s musings about running against incumbent Montana Senator Steve Daines find little support in the data here. Both Daines and Donald Trump have positive ratings in Big Sky Country, with the Senator predicted to win re-election with 57 percent of the vote. Jaime Harrison also faces a steep uphill climb in his bid to oust Lindsey Graham in South Carolina.

If these estimates hold, the Democrats stand a good chance of flipping the Senate in November. If Jones, Collins, Gardner, and Ernst all lose, the Democrats would gain three seats and lose one, a net gain of two that leaves them with 49 seats. Also defeating one of McConnell, McSally, or Tillis would create a 50-50 tie and make the Vice President’s vote decisive; defeating two of them would give the Democrats a 51-seat majority.


The 2020 Senate Elections

In a prior series of posts, I constructed a “simple model of Senate elections” using national data across elections.  This helped identify some key factors that influence the overall vote for Senators but provided no insight on the results in specific states.  In this post I develop another “simple” model that is designed to predict the voting outcome based on two factors, a state’s partisanship as measured by support for President Trump, and the net favorability of the incumbent Senator.  I estimated the model using data from the 2016 and 2018 elections. The results appear here and are best summarized in this chart:

The lines portray how the vote for an incumbent Senate Democrat improves as her net favorability grows. The top line represents the result for a Senator from a strongly pro-Democratic state, one where only 40% of the state’s voters approve of the President.  Even a Democratic incumbent with a net favorability of zero is predicted to win nearly 55% of the vote in this state and hold the seat.  In contrast, a Democratic incumbent in a pro-Trump state like Doug Jones in Alabama fails to win 50% of the vote even if he is unusually popular despite the party mismatch.  Overall the Republicans hold a slight advantage. The model predicts that in a state where support for Trump is 50-50, the purple line, only a Democratic incumbent with at least a +8 favorability has a chance of holding the seat.
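
The lines in the chart are straight-line predictions from a two-factor model, which can be sketched as follows. The coefficient values below are placeholders chosen only so that the example roughly reproduces the two reference points quoted above (nearly 55% at 40% Trump approval with zero net favorability, and a break-even favorability of about +8 when Trump approval is 50%). They are not the estimated coefficients, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Coefficients:
    intercept: float
    per_point_favorability: float    # vote change per point of net favorability
    per_point_trump_approval: float  # vote change per point of Trump approval

# PLACEHOLDER values for illustration only; NOT the fitted estimates.
EXAMPLE = Coefficients(
    intercept=82.0,
    per_point_favorability=0.25,
    per_point_trump_approval=-0.68,
)

def predicted_dem_vote(net_favorability, trump_approval, c=EXAMPLE):
    """Predicted vote share (percent) for a Democratic incumbent."""
    return (c.intercept
            + c.per_point_favorability * net_favorability
            + c.per_point_trump_approval * trump_approval)

def break_even_favorability(trump_approval, c=EXAMPLE):
    """Net favorability at which the predicted vote equals 50 percent."""
    gap = 50.0 - c.intercept - c.per_point_trump_approval * trump_approval
    return gap / c.per_point_favorability

print(round(predicted_dem_vote(0.0, 40.0), 1))   # strongly Democratic state
print(round(break_even_favorability(50.0), 1))   # evenly split state
```

Each line in the chart corresponds to holding the Trump approval argument fixed and varying net favorability; the break-even function gives the point where a line crosses 50 percent.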

We can apply the results of this model to the 2020 Senate elections.  We only have available the current measures for Trump support and candidate favorability, so we obviously cannot predict how things will stand a year from now.  For the estimates below, I have used the most recent Trump approval rating and Senate incumbent favorability ratings as reported by Morning Consult.  The President’s score is from the month ending September 1st; the Senators’ ratings are averages over the third quarter, July-September, 2019.

The highlighted rows at the top of the table correspond to incumbent Senators whose predicted vote is below fifty percent. The top and bottom spots on the list are held by Democrats. The most vulnerable incumbent is Doug Jones, whose slight positive favorability rating of +5 is nowhere near large enough to overcome Alabama’s warm feelings for Donald Trump.

Jones is followed by the four most commonly discussed vulnerable Republicans — Susan Collins of Maine, Cory Gardner of Colorado, Joni Ernst of Iowa, and Thom Tillis of North Carolina. Martha McSally would hold her Arizona seat by the slimmest of margins. Majority Leader Mitch McConnell is lucky to represent solidly pro-Trump Kentucky or else his dismal favorability score might lead to his defeat.

It’s anyone’s guess what Donald Trump’s approval rating might be come the election next November, though his score has remained remarkably persistent in the face of events.  Using the averages at FiveThirtyEight, we see his low point came in the summer of 2017 when he fell to 37%. Over that winter and into the spring of 2018 his approval rating improved to about 42% where it has largely remained. There was a dip in his popularity during the government shutdown, and another now as the impeachment inquiry expands.  Given the observed variation in his popularity since the Inauguration, Trump’s approval rating might move up or down by three or four points over the course of the next year.  A four-point movement would represent a ten-percent change from his current rating of 41%.  The chart below shows how each Senator’s predicted vote would change given a ten percent increase or decrease in Trump’s approval rating in each state.
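
The relative-change arithmetic in this paragraph can be sketched as follows. Only the 41% rating and the four-point swing come from the text; the function name is hypothetical, and applying the same scaling to each state's own approval figure is an assumption about how the scenarios are built.

```python
def relative_scenarios(approval, relative_change=0.10):
    """Return (lower, higher) approval after a relative decrease and increase."""
    return approval * (1 - relative_change), approval * (1 + relative_change)

current = 41.0       # national approval rating quoted in the text
swing_points = 4.0   # plausible movement over a year, per the text

# A four-point swing on a base of 41 is roughly a ten-percent relative change.
print(f"relative change: {swing_points / current:.1%}")

low, high = relative_scenarios(current)
print(f"ten percent lower: {low:.1f}, ten percent higher: {high:.1f}")
```

For the national figure this gives scenarios of about 36.9% and 45.1%, close to the low and high points observed since the Inauguration.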

The four Senators at the top of the list in the darker grey area are predicted to lose their seats even if Trump’s approval rating were to improve by ten percent. The next three Senators survive their re-election bids if Trump’s approval runs about where it is today or improves by November, 2020. However, a ten-percent decline in Trump’s approval threatens the seats of Thom Tillis, Martha McSally, and even Mitch McConnell.

Right now the Republicans control the Senate by a 53-47 margin, plus the tie-breaking vote of the Vice President. Assuming a Democratic victory in the Presidential election next fall, the Democrats need a net gain of three seats to reach 50, which means flipping four Republican seats if Alabama goes back to the Republicans. Maine, Colorado, and Iowa look promising for the Democrats, and North Carolina and Arizona are both tightly contested.
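
As a quick check on the seat arithmetic, here is a minimal sketch assuming the 53-47 split above. A net gain of three reaches 50 seats, so each Democratic loss (such as Alabama) adds one to the number of Republican seats that must flip. The function name is hypothetical.

```python
SENATE_SEATS = 100

def flips_needed(dem_seats: int, dem_losses: int, dem_vice_president: bool) -> int:
    """Republican-held seats the Democrats must flip to control the chamber."""
    # With a Democratic Vice President breaking ties, 50 seats suffice;
    # otherwise an outright majority of 51 is required.
    target = SENATE_SEATS // 2 if dem_vice_president else SENATE_SEATS // 2 + 1
    remaining = dem_seats - dem_losses
    return max(0, target - remaining)

# Holding every current seat versus losing Alabama, with a Democratic Vice President.
print(flips_needed(47, dem_losses=0, dem_vice_president=True))
print(flips_needed(47, dem_losses=1, dem_vice_president=True))
```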

Four Republican seats have vacancies. In Georgia a special election will be conducted in 2020 alongside the regular election to fill the seat that Johnny Isakson will leave at the end of this year.  Three other Republican-held seats will also be vacant in 2020.  My model predicts the Republicans will hold all these seats with Georgia the most competitive.  (To construct these estimates I impute a favorability score for a “normal” Democrat by regressing net favorability on Trump support to account for the partisanship component of favorability.)
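
The imputation step described in the parenthetical can be sketched as a one-variable regression. This is a minimal illustration under stated assumptions: the data points are made up for the example (they are not the Morning Consult figures), the regression is assumed to be fit on Democratic Senators, and the function names are hypothetical.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# HYPOTHETICAL illustration data: state Trump approval (percent) and the
# net favorability of a Democratic Senator from that state.
trump_approval = [40.0, 45.0, 50.0, 55.0, 60.0]
dem_net_favorability = [20.0, 12.0, 6.0, -4.0, -10.0]

intercept, slope = fit_line(trump_approval, dem_net_favorability)

def imputed_favorability(state_trump_approval):
    """Net favorability expected for a 'normal' Democrat in such a state."""
    return intercept + slope * state_trump_approval

# Imputed score for an open seat in a state where Trump approval is 55 percent.
print(round(imputed_favorability(55.0), 1))
```

The imputed score then stands in for the missing incumbent favorability when the vote model is applied to an open seat.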

In strong Republican states like Wyoming and Tennessee, we see support for Trump running in the mid-fifties.  In states like these, a Democratic challenger would do well to card a favorability score better than -15.  The states where the Democrat might have some chance are Georgia and Kansas, where support for Trump splits evenly, but still the Democrats are predicted to lose those elections by three or four points.