Over the weekend I downloaded the complete set of presidential general election polls archived by FiveThirtyEight. For this post I’m going to concentrate my attention on national polls matching up Donald Trump and Joe Biden in head-to-head mock general elections.

Anyone paying attention to contemporary politics knows that Biden has led Trump in recent polling, but the extent of Biden’s margin is impressive when all the polls are taken together. Here are the 265 polls pitting Biden against Trump conducted since January 1st, 2020:

The average lead is about 6.5 points, but more commonly Biden leads by seven or eight points.

Let’s turn now to my standard model for polling data, which I have used since 2008. This simple model combines the number of days left before the election with various characteristics of each poll, such as the population sampled, the polling method used, and measures of individual pollsters’ “house effects.”

The most significant results from these regressions are the constant term and the coefficient on days before the election. First, the constant term predicts the size of Biden’s polling lead on election day, when the days before the election variable is zero. **If current trends continued until the election, Joe Biden would have an eleven-point edge in national polling**. The standard error for this estimate of the constant is 0.81, meaning the likely range of margins for Biden would fall between 9.8 and thirteen percent.

The negative coefficient on days before the election means that statistically, since the first of the year, **Joe Biden has been slowly gaining ground on Donald Trump.** However, with a coefficient of just -0.02, **it takes fifty days for Biden to gain a full point on Trump**. That comes to just under three points by election day.
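As a quick check on that arithmetic, here is a minimal sketch that converts the per-day slope into days per point. The slope of -0.02 comes from the text; the function names and the 140-day horizon are illustrative assumptions of mine, not figures from the model.

```python
# Slope from the post: the lead changes by about -0.02 points per
# "day before the election", so it grows as election day approaches.
SLOPE_PER_DAY = -0.02

# ASSUMPTION: 140 days remaining is an illustrative horizon, not a figure from the post.
ILLUSTRATIVE_DAYS = 140


def days_per_point(slope_per_day: float) -> float:
    """Days needed for the lead to move by one full point."""
    return 1.0 / abs(slope_per_day)


def points_gained(slope_per_day: float, days_remaining: float) -> float:
    """Total movement in the lead over the given number of days."""
    return abs(slope_per_day) * days_remaining


print(days_per_point(SLOPE_PER_DAY))
print(round(points_gained(SLOPE_PER_DAY, ILLUSTRATIVE_DAYS), 1))
```

With roughly 140 to 150 days to go, a slope of this size yields a gain of just under three points.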

In model (1), live phone polls show a small bias in favor of Biden. Some might read this as evidence of “shy” Trump voters who are unwilling to tell live interviewers their true preference for Trump but have no trouble doing so when they are using some form of automated polling. As it turns out, the effect for live interviews goes away once we account for individual pollsters’ “house effects.” The same is true for the small pro-Trump effect seen for polls of registered voters. It too vanishes when we account for differences among pollsters. All of the pollsters for which I find significant effects report figures more favorable to Trump than the consensus of all pollsters.

It’s hard to overstate how big an eleven-point lead would be. The implied two-party vote division of 55.5-44.5% would be the largest Democratic victory since Lyndon Johnson’s landslide over Barry Goldwater in 1964. Given the historical relationship between the popular and Electoral College votes, a 55.5% win in the popular vote translates to a 72% victory in the College, or a margin of 390-148 Electoral Votes.
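The conversion from a polling margin to a two-party split is simple enough to sketch. The snippet below is only an arithmetic illustration with names of my own choosing; it does not reproduce the historical vote-to-Electoral-College relationship, which is taken as given from the text.

```python
def two_party_split(margin_points: float) -> tuple[float, float]:
    """Split 100 percent of the two-party vote given the leader's margin in points."""
    leader = (100.0 + margin_points) / 2.0
    return leader, 100.0 - leader


leader_share, trailer_share = two_party_split(11)
print(leader_share, trailer_share)

# Electoral College share implied by the 390-148 split quoted in the text.
electoral_share = 100.0 * 390 / (390 + 148)
print(round(electoral_share, 1))
```

An eleven-point margin leaves the two shares summing to 100 percent, and 390 of 538 electoral votes is a share in the low seventies.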

The dependent variable in all the models I will present is the base-10 logarithm of the total number of cases confirmed in each state as of April 24, 2020. These range from a high of 271,590 cases in New York state to a low of 339 cases in Alaska. In my initial model (1) below I include a state’s area and population size as predictors for the number of cases. Because I use logs on both sides of the equation, the coefficient estimates are “elasticities,” measuring the percentage change in cases associated with a one-percent increase in a predictor.

COVID’s spread is much more determined by the size of a state’s population than its area. Moreover the coefficient of 1.26 means that states with larger populations have disproportionately more cases, no doubt a consequence of the contagion effect.

At the bottom of the column for model (1) is the coefficient for a “dummy” variable representing New York state. In this simple size-based model, New York has (10^0.84), or 6.9, times the number of cases that its population and area would predict. The reason for this will become clear in a moment.
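Because the dependent variable is a base-10 logarithm, a dummy-variable coefficient converts to a multiplier by raising ten to that power. A minimal sketch of the conversion, using the 0.84 coefficient quoted above (the function name is mine):

```python
NY_DUMMY_COEF = 0.84  # coefficient on the New York dummy, from model (1) in the text


def multiplier_from_log10(coef: float) -> float:
    """Convert a coefficient on a log10 outcome into a multiplicative effect."""
    return 10.0 ** coef


print(round(multiplier_from_log10(NY_DUMMY_COEF), 1))
```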

In model (2) I add the estimated proportion of the population that has been tested for the virus as of April 17th, a week before the caseload figures. The testing numbers also come from Johns Hopkins. For this measure, and all the proportions that follow, I calculate the “logit” of the estimated proportion. For the testing measure this works out to:

logit(testing) = ln(number_tested/(total_population – number_tested))

The quantity number_tested/(total_population – number_tested) measures the odds that a random person in the state’s population has been tested for the virus. Taking the logarithm of this quantity produces a measure that ranges over the entire number line.
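For readers who want to see the transformation concretely, here is a small sketch of the logit computed from counts. The example figures (50,000 tested in a state of five million) are hypothetical and chosen only to illustrate a one-percent testing rate.

```python
import math


def logit_from_counts(number_tested: int, total_population: int) -> float:
    """Natural log of the odds that a random resident has been tested."""
    if not 0 < number_tested < total_population:
        raise ValueError("number_tested must be strictly between 0 and total_population")
    odds = number_tested / (total_population - number_tested)
    return math.log(odds)


# HYPOTHETICAL example: one percent of a five-million-person state tested.
example_logit = logit_from_counts(50_000, 5_000_000)
print(round(example_logit, 3))
```

A testing rate below fifty percent gives a negative logit, a rate of exactly fifty percent gives zero, and higher rates give positive values.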

Testing has a strong statistical relationship to the number of identified coronavirus cases in a state. Moreover the coefficient has a plausible magnitude. If we increase testing by one percent, the expected number of cases will grow by 0.4 percent. In other words, increasing testing at the margin identifies an infection in about forty percent of those newly tested.

Notice how the effect for a state’s physical area declines when testing is accounted for. One apparent reason why large states have fewer cases is because it is more difficult to test for the virus over a larger area.

Finally, when testing is accounted for, the caseload for the state of New York is no different from any other state with its population size and physical area.

We can simulate the effects of testing by imagining a fairly typical state with five million people living on 50,000 square miles of land area, then using the coefficients from model (2), the first model to include testing, to see how the estimated number of confirmed cases varies with the testing rate. This chart shows how the infection rate, the proportion of the population found to have the virus, increases with the rate at which the population is tested.

If we test only one percent of the state’s population, we will find about 0.1 percent of the population with a COVID infection. If we test five percent of the population, about 0.6 percent of that state’s people will be identified as having the virus.*

Now let’s turn to some demographic factors that are thought to increase caseloads. First is the age of the population. In general, it is thought that older people are more susceptible to the virus. However, model (3) shows there is little evidence that states with larger proportions of elderly residents have greater caseloads. What does matter, as model (4) shows, is the proportion of a state’s 75 and older population living in nursing facilities. When the virus gets into one of these facilities, it can run rampant throughout the resident population and the staff.

Reports of higher rates of infection among black and Hispanic Americans appear in these data as well. In model (5), it appears the effect of larger Hispanic populations is twice that of equivalent black populations. If we also adjust for the size distribution of a state’s population in model (6), the effect of its proportion Hispanic declines. This pattern suggests that Hispanics are more likely to live in smaller communities than other ethnic groups.

It is important to remember that these analyses apply to states. Finding no relationship between the proportion of a state’s population that is Native American and the state’s number of coronavirus cases does not imply that native populations are more or less at risk. For that we need data at the individual level where we find that Native populations are more at risk.

I’ve also said nothing about deaths arising from the novel coronavirus. That is the subject of my next report.

____________________

*We have no way of knowing what the “true” number of cases is; we have only the Johns Hopkins figures for “confirmed” cases.

Pretty much every forecaster predicts that the economy will contract substantially over the next three months as large portions of the American economy remain idle in the face of the COVID-19 pandemic. Most of these forecasts are clearly guesswork since we still have only a glimmer of the toll the virus will take on the U.S. economy. *Fortune* magazine describes forecasts for the second quarter as ranging from “horrible” to “catastrophic,” with the estimated change in real Gross Domestic Product (GDP) in the range of -8% to -15%. Morgan Stanley’s estimate is especially grim, predicting a decline of 38%. Like many other analysts, Morgan Stanley expects the economy to rebound somewhat in the third quarter, but the rebound will not be sufficient to overcome the enormous declines of the first half of 2020. They expect the year to end with real GDP down by 5.5%.

These declines eclipse anything we have seen since World War II. The economy contracted by about 3.3% during the recession year of 2009 and fell between 2.2% and 2.9% in the earlier recessions of 1958, 1975, and 1982.

Back in 2016 I constructed a “simple model of Senate elections” that looked at how political and economic factors influence the nationwide Senatorial vote since the War. Three factors proved to have statistically significant relationships with the share of the vote won by the President’s co-partisans in those years. One of these factors favors the Republicans, the fact that Donald Trump will head the ticket in November.* The President’s party has won, on average, 51 percent of the two-party vote for the Senate in years when the President heads the ticket, compared to just 47 percent in elections when the President is not running. (This includes both off-year elections, and open-seat Presidential elections like 2016.)

Two factors favor the Democrats in 2020. One is a weak “regression-toward-the-mean” effect based on the votes won in the Senate elections six years earlier. Senators who win election with an above-average share of the vote in one election are likely to see their vote decline slightly when they run for re-election six years hence. Republicans did unusually well in the 2014 mid-term elections, so we might expect their vote shares to fall back slightly in 2020.

The economy also plays a role. My model uses the year-on-year change in real per-capita disposable income as of September as a measure of the state of the economy. I will use this “simple model” to estimate the effects of the likely recession on the upcoming Senate vote in 2020.

Forecasters rarely estimate the change in real per-capita disposable income and focus instead on changes in real GDP or employment. Unsurprisingly, though, changes in real GDP do filter through to personal income as shown in this chart.

I have marked the seven recessionary years, ones where real GDP fell year-over-year. One thing to notice is that even when real GDP remains flat, personal income is still predicted to grow by one percent. Moreover, only 37 percent of changes in real GDP are transmitted to personal income.
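The two figures just quoted describe a straight line, which can be sketched as follows. The intercept of one percent and the slope of 0.37 are the rounded values from the text; treating them as exact, and the function name itself, are simplifying assumptions of mine.

```python
INCOME_GROWTH_AT_FLAT_GDP = 1.0  # percent, as read from the text
GDP_PASS_THROUGH = 0.37          # share of a GDP change transmitted to income


def predicted_income_change(gdp_change_pct: float) -> float:
    """Approximate year-on-year change in real per-capita disposable income."""
    return INCOME_GROWTH_AT_FLAT_GDP + GDP_PASS_THROUGH * gdp_change_pct


for gdp_change in (0, -5.5, -10, -20):
    print(gdp_change, round(predicted_income_change(gdp_change), 2))
```

Under these rounded values, a ten-percent fall in GDP corresponds to a fall of a bit under three percent in personal income, which is why large GDP shocks translate into fairly modest changes in the income measure the Senate model uses.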

I have used this “simple” model to examine how different predictions for the state of the economy in November might translate into Senate electoral outcomes.** The baseline appears on the line below with zero growth in GDP. The Republicans are predicted to win about 48% of the nationwide vote for Senate candidates in such an election. This estimate combines the positive effect of having the President on the ticket with the negative effect of the Republicans’ substantial victory in the 2014 midterm elections, where their candidates won 53.5 percent of the two-party vote. In the context of my model those factors predict that the Republicans will win 48.3 percent of the nationwide two-party vote for Senate.

Because changes in GDP are attenuated when translated into changes in per-capita disposable income, even a drop of ten percent in GDP results in a vote for the Republicans that is just one percentage point less than if GDP remained flat. Even if the worst predictions of the forecasters hold true, and GDP falls by twenty percent, the predicted Republican vote falls to only 46 percent.

____________________

*Presidential popularity does not appear to play a role in on-year elections, though it does matter for elections held in off-years.

**These results are based on a reestimation of the published model including data for the 2016 and 2018 elections. While the coefficients change slightly, none of the substantive conclusions are altered.

I used simple binary logit models for these tests. The predictors include whether each state’s governor and legislature are controlled by Democrats, the February net job approval rating (approve – disapprove) for Donald Trump in each state from Morning Consult, and the number of reported cases in each state as of March 15th and March 30th. Model (1) below includes all these factors; model (2) includes just the two that proved significant.

As you can see, only two factors proved nominally “significant,” whether the governor is a Democrat, and Trump’s approval rating in the state. States with Democratic governors, and those where Trump’s net job approval is negative (“underwater”), are more likely to have instituted a stay-at-home policy. The number of COVID-19 cases surprisingly did not seem to matter. (Using the logarithms of the number of cases did not improve things nor did looking at rates of growth.)

Using these results, I have generated the predicted probability that each state will have instituted a stay-at-home order and compared those predictions to the actual policies.

There are ten states where the predicted policy does not match the actual decision. Thirty-three states are predicted to have imposed stay-at-home orders, but only twenty-eight of those have done so. Democratic strongholds like California and New York all have predicted probabilities above 0.9. However, the model predicts that Nevada, Maine, Pennsylvania, Massachusetts, and Kentucky should all have instituted stay-at-home policies, yet they have not done so. In contrast, the governors of West Virginia, Idaho, Indiana, Alaska, and Ohio have all instituted such policies despite the political context of their states.

We can use the same set of predictors to estimate the duration of a state quarantine. Here I use a “Tobit” model, which handles dependent variables with zero lower bounds. States without a quarantine are coded zero on the duration variable.

The general pattern here is the same as for whether a quarantine was imposed. However, the growth in cases between 3/15 and 3/30 has a weak statistical relationship with duration. Because the case figures are expressed as base-10 logarithms, the coefficient of 26.8 implies that a state whose caseload grew by a factor of ten during the latter half of March would impose a quarantine of 26.8 days, other things equal.
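To make the log-scale interpretation concrete, the sketch below computes the contribution of case growth for a few growth factors. The coefficient of 26.8 is from the text; the function name is mine, and the result is only the case-growth term of the model, holding everything else equal, not a full predicted duration.

```python
import math

CASE_GROWTH_COEF = 26.8  # days per unit of log10 case growth, from the text


def duration_contribution(growth_factor: float) -> float:
    """Days attributed to case growth, given the ratio of 3/30 cases to 3/15 cases."""
    if growth_factor <= 0:
        raise ValueError("growth_factor must be positive")
    return CASE_GROWTH_COEF * math.log10(growth_factor)


for factor in (2, 10, 100):
    print(factor, round(duration_contribution(factor), 1))
```

A doubling of cases contributes about eight days, while each further ten-fold increase adds another 26.8.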


My earlier model of recent contests for the U.S. Senate relied entirely on two measures of popularity, the favorability score for the incumbent Senators in their states, and support for President Trump in those same states. While those two measures alone explain 81 percent of the variance in the vote for Senatorial candidates, the model obviously lacks a few important items, most notably data on campaign spending and on challenger “quality.” In this post I add measures of both.

For campaign spending I have used the figures reported to the Federal Election Commission and compiled by OpenSecrets. I chose to use spending rather than funds raised because in most cases campaigns spend nearly all of what they raise, and sometimes more. For instance, here is the record for campaign spending in the 2018 Missouri Senate race where incumbent Claire McCaskill lost to Republican Josh Hawley, then the state’s Attorney General.

The other major source of campaign financing is, of course, spending by outside groups. Here, OpenSecrets separately reports funding in support of and opposed to each candidate. My measure of outside spending adds together monies spent supporting a candidate and those spent criticizing her opponent.^{1} I use the base-10 logarithm of spending which has a better fit to the data and incorporates the basic economic intuition of decreasing returns to scale.

I first added the campaign spending figures for Republicans and Democrats separately with results as shown in column (2). Democratic spending appears to have had a larger effect than Republican spending, but a statistical hypothesis test showed the two coefficients were not significantly different in magnitude. So in (3) I use the difference between the two spending measures, which is equivalent to the base-10 logarithm of the ratio of Democratic to Republican spending.*

An increase of one unit in these logarithms is equivalent to multiplication by ten. So the coefficient of 4.39 tells us that a ten-fold increase in the Democrats’ spending advantage would improve their share of the two-party vote by somewhat over four percent. While a ten-fold advantage might seem implausibly high, some races have seen such lopsided spending totals. In Alabama’s 2016 Senate election Republican incumbent Richard Shelby spent over twelve million dollars on his race for re-election; his Democratic opponent spent less than $30,000. In that same year, Hawaii Democrat Brian Schatz spent nearly eight million dollars while his opponent spent $54,000. These sorts of drastic imbalances typically appear in non-competitive races where the incumbents are seen as shoo-ins to retain their seats.

To see more intuitively how spending affects results I have plotted the predicted change in the Democratic vote for various ratios of Democratic to Republican spending. The state codes represent the seven most competitive races as identified by my model. (I will examine the implications for 2020 in a separate post.)

In states where the Democrats outspent the Republicans by a ratio of two-to-one, the Democrats were rewarded on average with an increase of about 1.3 percent in their vote shares.
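The curve in that chart follows directly from the coefficient on the log spending ratio. Here is a minimal sketch, using the 4.39 coefficient reported for model (3); the function name and the example dollar amounts are illustrative assumptions of mine.

```python
import math

SPENDING_COEF = 4.39  # points of two-party vote per unit of log10 spending ratio


def vote_gain(dem_spending: float, rep_spending: float) -> float:
    """Predicted change in the Democratic two-party vote share, in points."""
    # log10(D) - log10(R) is the same quantity as log10(D / R).
    return SPENDING_COEF * (math.log10(dem_spending) - math.log10(rep_spending))


# HYPOTHETICAL dollar amounts chosen to give 2:1, 10:1, and 1:2 ratios.
print(round(vote_gain(2_000_000, 1_000_000), 2))
print(round(vote_gain(10_000_000, 1_000_000), 2))
print(round(vote_gain(1_000_000, 2_000_000), 2))
```

Because the effect depends only on the ratio, being outspent two-to-one costs the Democrat the same 1.3 points that outspending by two-to-one gains.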

In sharp contrast to the results for spending by the campaigns themselves, I find no systematic influence for spending by outside groups. Neither including separate terms for pro-Democratic and pro-Republican outside spending as in model (4) above, nor including the difference between those figures in model (5), displays significant effects.

While I’m not ready to make strong claims for this finding without an expansive review of the literature on spending in Senate campaigns, I don’t find the result all that surprising. Since outside groups may not, by law, “coordinate” with the campaigns they support, these groups must focus their attention on television advertising, direct mail, and other messaging strategies. Perhaps these strategies simply are not as effective as they once were, as demonstrated by the Presidential primary candidacies of Michael Bloomberg and Tom Steyer. They both spent hundreds of millions of dollars on television advertising but garnered few votes on election day.

Another common factor used to explain legislative elections is the “quality” of the challengers who choose to take on an incumbent. While some people launch vanity Senatorial campaigns to make themselves better known to the public at large, most Senatorial bids are undertaken by people who already hold elective office at either the state or the Federal level. I have coded the backgrounds for the challenger facing each incumbent in my dataset of 2016 and 2018 elections. They fell into four categories: current or former Members of Congress, current or former members of the state legislature, governors and others who have held state-wide office, and a miscellaneous category that includes local-level politicians like mayors and non-politicians like activists. I find no statistical effects for any of these categories either separately or in combination.

We are thus left with a model of Senate elections that includes three factors — the incumbent’s net favorability, the state’s level of support for Donald Trump, and the ratio of spending by incumbent and opposition campaigns.

____________________

*Remember from high-school math that log(A/B) = log(A) – log(B).


I have updated my Senate predictions using the fourth-quarter, 2019, favorability data for Senators and February, 2020, job approval ratings for Donald Trump. Both come from Morning Consult. I have also cleaned up a few errors in the earlier data used to estimate the model’s coefficients. Here are the updated results:

Maine’s Susan Collins now joins Alabama’s Doug Jones as the most-vulnerable Senators up for re-election. Both Senators face adverse political environments in the states they represent. Mainers don’t care for Collins very much, and they’re slightly negative when it comes to Donald Trump. Unlike Collins, Jones is liked by a plurality of Alabamians, but Trump is liked so much more that it overwhelms Jones’s personal popularity.

Steve Bullock’s musings about running against incumbent Montana Senator Steve Daines find little support in the data here. Both Daines and Donald Trump have positive ratings in Big Sky Country, with the Senator predicted to win re-election with 57 percent of the vote. Jaime Harrison also faces a steep uphill quest in his bid to oust Lindsey Graham in South Carolina.

If these estimates were to hold, the Democrats stand a good chance of flipping the Senate in November. If Jones, Collins, Gardner, and Ernst all lose, the Democrats would net three seats. That would create a 50-50 tie and require the Vice President to be decisive. Also defeating one of McConnell, McSally, or Tillis would give the Democrats a 51-seat majority.


The lines portray how the vote for an incumbent Senate Democrat improves as her net favorability grows. The top line represents the result for a Senator from a strongly pro-Democratic state, one where only 40% of the state’s voters approve of the President. Even a Democratic incumbent with a net favorability of zero is predicted to win nearly 55% of the vote in this state and hold the seat. In contrast, a Democratic incumbent in a pro-Trump state like Doug Jones in Alabama fails to win 50% of the vote even if he is unusually popular despite the party mismatch. Overall the Republicans hold a slight advantage. The model predicts that in a state where support for Trump is 50-50, the purple line, only a Democratic incumbent with at least a +8 favorability has a chance of holding the seat.

We can apply the results of this model to the 2020 Senate elections. We only have available the current measures for Trump support and candidate favorability, so we obviously cannot predict how things will stand a year from now. For the estimates below, I have used the most recent Trump approval rating and Senate incumbent favorability ratings as reported by *Morning Consult.* The President’s score is from the month ending September 1st; the Senators’ ratings are averages over the third quarter, July-September, 2019.

The highlighted rows at the top of the table correspond to incumbent Senators whose predicted vote is below fifty percent. The top and bottom spots on the list are held by Democrats. The most vulnerable incumbent is Doug Jones, whose slight positive favorability rating of +5 is nowhere near large enough to overcome Alabama’s warm feelings for Donald Trump.

Jones is followed by the four most commonly discussed vulnerable Republicans — Susan Collins of Maine, Cory Gardner of Colorado, Joni Ernst of Iowa, and Thom Tillis of North Carolina. Martha McSally would hold her Arizona seat by the slimmest of margins. Majority Leader Mitch McConnell is lucky to represent solidly pro-Trump Kentucky or else his dismal favorability score might lead to his defeat.

It’s anyone’s guess what Donald Trump’s approval rating might be come the election next November, though his score has remained remarkably persistent in the face of events. Using the averages at FiveThirtyEight, we see his low point came in the summer of 2017 when he fell to 37%. Over that winter and into the spring of 2018 his approval rating improved to about 42% where it has largely remained. There was a dip in his popularity during the government shutdown, and another now as the impeachment inquiry expands. Given the observed variation in his popularity since the Inauguration, Trump’s approval rating might move up or down by three or four points over the course of the next year. A four-point movement would represent a ten-percent change from his current rating of 41%. The chart below shows how each Senator’s predicted vote would change given a ten percent increase or decrease in Trump’s approval rating in each state.

The four Senators at the top of the list in the darker grey area are predicted to lose their seats even if Trump’s approval rating were to improve by ten percent. The next three Senators survive their re-election bids if Trump’s approval runs about where it is today or improves by November, 2020. However a ten-percent decline in Trump’s approval threatens the seats of Thom Tillis, Martha McSally, and even Mitch McConnell.

Right now the Republicans control the Senate by a 53-47 margin, plus the tie-breaking vote of the Vice President. Assuming a Democratic victory in the Presidential election next fall, the Democrats need to flip at least three seats, while losing Alabama back to the Republicans. Maine, Colorado, and Iowa look promising for the Democrats and North Carolina and Arizona are both tightly contested.

Four Republican seats have vacancies. In Georgia a special election will be conducted in 2020 alongside the regular election to fill the seat that Johnny Isakson will leave at the end of this year. Three other Republican-held seats will also be vacant in 2020. My model predicts the Republicans will hold all these seats with Georgia the most competitive. (To construct these estimates I impute a favorability score for a “normal” Democrat by regressing net favorability on Trump support to account for the partisanship component of favorability.)

In strong Republican states like Wyoming and Tennessee, we see support for Trump running in the mid-fifties. In states like these, a Democratic challenger would do well to card a favorability score better than -15. The states where the Democrat might have some chance are Georgia and Kansas, where support for Trump splits evenly, but still the Democrats are predicted to lose those elections by three or four points.


- big chunk of favorability’s effect is partisanship; controlling for Trump support brings the favorability coefficient down
- no measurable difference in effect of Trump support measured either using his 2016 vote or his 2018 approval
- two elections had large residuals, Alaska in 2016 where there was a strong third-party contender, and Utah in 2016 where Mike Lee trounced a transgender female Democrat in the home of the Mormons.

With deBlasio included:

Same relationship without deBlasio.


In general, the better-known candidates are also the better-liked. In the chart above the percentage of likely Democratic voters able to rate a candidate appears on the horizontal axis. The vertical axis measures “net favorability,” the difference between the percent of the voters rating a candidate favorably and those rating the candidate unfavorably. The figures in the chart represent the averages of the two polls. The regression equation in the upper-left-hand corner of the chart shows that a ten percent increase in exposure brings the average candidate a net +7 increase in favorability.

At the top of the rankings is, no surprise, Joe Biden. 92 percent of the Democrats polled could give an assessment of Biden, and he scored at the top of the list in favorability with 76 percent favorable versus just 15 unfavorable. Bernie Sanders is nearly as well known (89 percent) as Biden but not as well liked, with a net favorability score of 47. Two other candidates join Sanders at just under fifty percent favorability, Elizabeth Warren and Kamala Harris. Harris’s favorability, however, substantially exceeds the value we would predict given her familiarity score. At the other end of the spectrum is New York mayor Bill deBlasio. About half the respondents said they knew him well enough to give him a rating; unfortunately for him only an average of 16 percent of the Democrats in the two polls viewed him favorably versus 32 percent who viewed him unfavorably. (Removing him from the regression increases R^{2} from 0.76 to 0.92, and reduces the standard error from 9.9 points to 5.4. The slope is largely unchanged, but the intercept naturally moves slightly upward since it no longer needs to incorporate deBlasio’s negative score.)

Here are the actual and predicted net favorability scores for every candidate from the model where deBlasio is omitted. Harris’s favorability is 10 points higher than her exposure predicts. She’s followed by Pete Buttigieg and Eric Swalwell at around six points. (Swalwell’s frequent appearances on MSNBC might have something to do with this.) At the other end of the spectrum is the remarkably poor showing for Beto O’Rourke. Fifty-five percent of Democratic voters say they can score Beto, but his net 21 percent favorability is nearly nine points below what we would expect to see given his familiarity. Sanders’s unfavorable numbers also put him near the bottom of this list. 89 percent of Democrats know enough about Sanders to give him a favorability score, but his 47 percent net favorability lags about eight points behind what we would expect given how well known he is.
