What is the probability Harris wins? Building a Statistical Model.

After Joe Biden made the historic decision to exit the presidential race, the country no longer has to choose between two octogenarians. It's a new exciting time with lots of uncertain questions. In this article we'll try to see just how uncertain we are, and build a probabilistic model of whether Kamala Harris will win if nominated.

There are lots of online election forecasts (e.g., from Silver et al., Morris et al., Gelman et al., among others). So why build another one? In a previous article I created an election model to help understand if Biden should drop out (focusing on Gretchen Whitmer as a replacement). That had a motivation and rhetorical purpose. I partially made a version for Harris, but didn't publish before Biden dropped out. So while this Harris model is now mostly recreational, it has some potential contributions for those interested in election modeling:

  • Transparency - Each part tries to include links to corresponding source code
  • Analysis and visualization of the amount of available polling (Section 1.2). This is to model the potentially limited initial Harris polling.
  • Alternative approach/presentation of errors and poll movement (Section 2)
  • Timeliness/impatience - At time of writing, prominent modelers like FiveThirtyEight, Silver, and the Economist haven't released a version for Harris yet. This gives one initial estimate.

For those just looking for a number, here's jumping ahead for the top-line:

We'll go through each of the steps to reach this estimate. The general approach follows that of other polls-based models. We create an average of polls in each swing state, estimate the expected polling miss, and then estimate how polls might move. Then we run thousands of random simulations to get the fraction in which each candidate wins.

Where are Polls Today

Gathering Data

The first step is gathering polling data (sourced via FiveThirtyEight).

Not all pollsters are equal, with some pollsters having a better track record. Thus, we weight each poll. The weights are scaled so that 1.0 is the value of a poll from a top-rated pollster (eg, Siena/NYT, Emerson College, Marquette University, etc.) that interviewed its sample yesterday or more recently.

Less reliable/transparent pollsters are weighted as some fraction of 1.0. Additionally, older polls are weighted less. Weight decays with an approximate half-life of 9 days; we date each poll by the midpoint of its start and end dates, and wait 1.5 days before starting the decay. Polls before Biden dropped out carry ¼ weight. This function is only a heuristic estimate (see code for exact definition).
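
As a rough illustration of the weighting described above, here is a minimal sketch. It is not the article's actual function (the linked code has the exact definition). The `quality` factor, the constant names, and the rule that a poll counts as "before Biden dropped out" when its start date is on or before July 21 are all assumptions made for this example.

```python
from datetime import date

HALF_LIFE_DAYS = 9.0          # approximate half-life from the text
GRACE_DAYS = 1.5              # wait this long before decay starts
PRE_DROPOUT_FACTOR = 0.25     # polls before Biden dropped out carry 1/4 weight
DROPOUT_DATE = date(2024, 7, 21)


def poll_weight(start, end, today, quality=1.0):
    """Heuristic poll weight in [0, 1].

    `quality` is a hypothetical 0-to-1 factor standing in for the
    pollster rating; how ratings map to it is not specified here.
    """
    # Date the poll by the midpoint of its field period.
    midpoint = (start.toordinal() + end.toordinal()) / 2
    age_days = today.toordinal() - midpoint
    # No decay during the grace period, exponential decay afterwards.
    decay_days = max(0.0, age_days - GRACE_DAYS)
    weight = quality * 0.5 ** (decay_days / HALF_LIFE_DAYS)
    # Assumed cutoff rule: fielding began on or before the dropout date.
    if start <= DROPOUT_DATE:
        weight *= PRE_DROPOUT_FACTOR
    return weight


if __name__ == "__main__":
    today = date(2024, 7, 27)
    print(poll_weight(date(2024, 7, 22), date(2024, 7, 24), today))
    print(poll_weight(date(2024, 7, 7), date(2024, 7, 9), today))
```

The second call shows how quickly an early-July poll fades: it is both old and taken before the dropout, so both penalties apply.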

If a pollster reports multiple numbers (eg, with or without RFK Jr., registered voters or likely voters, etc), we use the version with the largest sum covered by the Democrat and Republican.
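
A small sketch of this selection rule and of the "Harris Share" column used in the tables, under the assumption that the share is the two-party share, Harris / (Harris + Trump). The function names are hypothetical.

```python
def harris_share(harris_pct, trump_pct):
    """Two-party share for Harris, in percent."""
    return 100.0 * harris_pct / (harris_pct + trump_pct)


def pick_version(versions):
    """Pick the (harris_pct, trump_pct) pair covering the largest combined share."""
    return max(versions, key=lambda v: v[0] + v[1])


if __name__ == "__main__":
    # A poll reported two ways, e.g. with and without third-party options.
    versions = [(44.0, 46.0), (47.0, 48.0)]
    harris, trump = pick_version(versions)
    print(round(harris_share(harris, trump), 1))  # 49.5
```

The printed value matches the 47% : 48% row in the National Polls table.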

National Polls
Weight | Pollster (rating) | Dates | Harris : Trump | Harris Share
0.91 | Siena/NYT (3.0) | 07/22-07/24 | 47% : 48% | 49.5
0.87 | YouGov (2.9) | 07/22-07/23 | 44% : 46% | 48.9
0.80 | Ipsos (2.8) | 07/22-07/23 | 44% : 42% | 51.2
0.78 | Marist (2.9) | 07/22-07/22 | 45% : 46% | 49.5
0.56 | RMG Research (2.3) | 07/22-07/23 | 46% : 48% | 48.9
0.39 | Morning Consult (1.8) | 07/22-07/24 | 46% : 45% | 50.5
0.36 | SurveyMonkey (1.9) | 07/22-07/24 | 38% : 39% | 49.4
0.33 | HarrisX (1.6) | 07/22-07/25 | 49% : 51% | 49.0
0.30 | Fabrizio/GBAO (?) | 07/23-07/25 | 47% : 49% | 49.0
0.27 | Big Village (1.6) | 07/22-07/24 | 43% : 44% | 49.1
0.21 | YouGov (2.9) | 07/21-07/23 | 41% : 44% | 48.2
0.20 | Change Research (1.4) | 07/22-07/24 | 44% : 43% | 50.6
0.19 | YouGov (2.9) | 07/19-07/22 | 46% : 46% | 50.0
0.17 | Echelon Insights (2.7) | 07/19-07/21 | 47% : 49% | 49.0
0.15 | Quinnipiac (2.8) | 07/19-07/21 | 47% : 49% | 49.0
0.14 | YouGov (2.9) | 07/16-07/18 | 48% : 51% | 48.5
0.12 | YouGov (2.9) | 07/13-07/16 | 39% : 44% | 47.0
0.12 | Ipsos (2.8) | 07/15-07/16 | 44% : 44% | 50.0
0.11 | SurveyUSA (2.8) | 07/12-07/15 | 42% : 45% | 48.3
0.08 | Morning Consult (1.8) | 07/21-07/22 | 45% : 47% | 48.9
0.08 | Marist (2.9) | 07/09-07/10 | 50% : 49% | 50.5
0.08 | Beacon/Shaw (2.8) | 07/07-07/10 | 48% : 49% | 49.5
0.07 | YouGov (2.9) | 07/07-07/09 | 38% : 42% | 47.5
0.07 | Emerson (2.9) | 07/07-07/08 | 43% : 49% | 46.6
0.06 | ActiVote (?) | 07/21-07/23 | 50% : 50% | 49.5
0.06 | Ipsos (2.8) | 07/05-07/09 | 49% : 47% | 51.0
0.06 | HarrisX (1.6) | 07/19-07/21 | 47% : 53% | 47.0
0.06 | Hart/POS (2.6) | 07/07-07/09 | 45% : 47% | 48.9
0.06 | Noble Predictive Insights (2.4) | 07/08-07/11 | 44% : 48% | 47.8
0.05 | SoCal Research (?) | 07/21-07/21 | 43% : 51% | 45.7
0.05 | Morning Consult (1.8) | 07/15-07/15 | 45% : 46% | 49.5
0.05 | Florida Atlantic University/Mainstreet Research (?) | 07/19-07/21 | 44% : 49% | 47.1
0.04 | SoCal Research (?) | 07/17-07/17 | 44% : 52% | 45.8
0.04 | Ipsos (2.8) | 07/01-07/02 | 42% : 43% | 49.4
0.04 | HarrisX (1.6) | 07/13-07/15 | 48% : 52% | 48.0
0.04 | YouGov (2.9) | 06/28-07/01 | 45% : 47% | 48.9
0.03 | Data for Progress (2.7) | 06/28-06/28 | 45% : 48% | 48.4
0.03 | Big Village (1.6) | 07/12-07/14 | 37% : 42% | 47.3
0.03 | Manhattan Institute (?) | 07/07-07/13 | 46% : 48% | 48.9
0.02 | Redfield & Wilton Strategies (1.8) | 07/08-07/08 | 37% : 44% | 45.7
0.01 | J.L. Partners (1.6) | 07/01-07/03 | 38% : 49% | 43.7
0.01 | Split Ticket/Data for Progress (?) | 07/01-07/03 | 46% : 46% | 50.0
0.01 | HarrisX (1.6) | 06/28-06/30 | 47% : 53% | 47.0
0.01 | CNN/SSRS (?) | 06/28-06/30 | 45% : 47% | 48.9
0.01 | Bendixen & Amandi International (1.0) | 07/02-07/06 | 42% : 41% | 50.6
Sum 8.2 | Total | | | Avg 49.3
Pennsylvania
Weight | Pollster (rating) | Dates | Harris : Trump | Harris Share
0.92 | From Natl. Avg. (0.91⋅x + 3.70) | | | 48.5
0.91 | Beacon/Shaw (2.8) | 07/22-07/24 | 49% : 49% | 50.0
0.85 | Emerson (2.9) | 07/22-07/23 | 49% : 51% | 48.9
0.32 | Redfield & Wilton Strategies (1.8) | 07/22-07/24 | 42% : 46% | 47.7
0.28 | Bullfinch (?) | 07/23-07/25 | 48% : 47% | 50.5
0.08 | Siena/NYT (3.0) | 07/09-07/11 | 47% : 48% | 49.5
0.07 | Civiqs (2.5) | 07/13-07/16 | 44% : 46% | 48.9
0.06 | InsiderAdvantage (2.0) | 07/15-07/16 | 40% : 47% | 46.0
0.05 | SoCal Research (?) | 07/20-07/21 | 46% : 50% | 47.9
0.04 | North Star Opinion Research (1.2) | 07/20-07/23 | 45% : 47% | 48.9
0.04 | PPP (1.4) | 07/17-07/18 | 43% : 45% | 48.9
0.02 | PPP (1.4) | 07/11-07/12 | 45% : 51% | 46.9
Sum 3.6 | Total | | | Avg 49.0
Wisconsin
Weight | Pollster (rating) | Dates | Harris : Trump | Harris Share
0.91 | Beacon/Shaw (2.8) | 07/22-07/24 | 49% : 50% | 49.5
0.85 | Emerson (2.9) | 07/22-07/23 | 51% : 49% | 50.6
0.81 | From Natl. Avg. (0.91⋅x + 3.80) | | | 48.6
0.27 | Redfield & Wilton Strategies (1.8) | 07/22-07/24 | 44% : 44% | 50.0
0.07 | Civiqs (2.5) | 07/13-07/16 | 48% : 48% | 50.0
0.02 | PPP (1.4) | 07/10-07/11 | 48% : 49% | 49.5
0.01 | North Star Opinion Research (1.2) | 07/06-07/10 | 47% : 48% | 49.5
Sum 2.9 | Total | | | Avg 49.6
Georgia
Weight | Pollster (rating) | Dates | Harris : Trump | Harris Share
0.91 | From Natl. Avg. (0.71⋅x + 11.45) | | | 46.3
0.83 | Emerson (2.9) | 07/22-07/23 | 49% : 51% | 48.9
0.39 | Landmark Communications (2.1) | 07/22-07/22 | 47% : 48% | 49.3
0.32 | Redfield & Wilton Strategies (1.8) | 07/22-07/24 | 42% : 47% | 47.2
0.07 | U. Georgia SPIA (2.2) | 07/09-07/18 | 46% : 50% | 47.6
0.06 | InsiderAdvantage (2.0) | 07/15-07/16 | 37% : 47% | 43.8
0.03 | Florida Atlantic University/Mainstreet Research (?) | 07/14-07/15 | 44% : 49% | 47.3
0.03 | Florida Atlantic University/Mainstreet Research (?) | 07/12-07/13 | 43% : 49% | 46.7
Sum 2.6 | Total | | | Avg 47.7
Michigan
Weight | Pollster (rating) | Dates | Harris : Trump | Harris Share
0.91 | Beacon/Shaw (2.8) | 07/22-07/24 | 49% : 49% | 50.0
0.90 | From Natl. Avg. (1.13⋅x + -7.24) | | | 48.5
0.83 | Emerson (2.9) | 07/22-07/23 | 49% : 51% | 49.1
0.27 | Redfield & Wilton Strategies (1.8) | 07/22-07/24 | 41% : 44% | 48.2
0.26 | Glengariff Group Inc. (1.5) | 07/22-07/24 | 42% : 41% | 50.2
0.25 | SoCal Research (?) | 07/25-07/26 | 46% : 49% | 48.4
0.07 | Civiqs (2.5) | 07/13-07/16 | 46% : 46% | 50.0
0.04 | PPP (1.4) | 07/17-07/18 | 41% : 46% | 47.1
0.02 | PPP (1.4) | 07/11-07/12 | 46% : 48% | 48.9
Sum 3.6 | Total | | | Avg 49.1
North Carolina
Weight | Pollster (rating) | Dates | Harris : Trump | Harris Share
0.91 | From Natl. Avg. (1.08⋅x + -7.22) | | | 46.0
0.28 | Redfield & Wilton Strategies (1.8) | 07/22-07/24 | 43% : 46% | 48.3
0.04 | PPP (1.4) | 07/19-07/20 | 44% : 48% | 47.8
Sum 1.2 | Total | | | Avg 46.6
Arizona
Weight | Pollster (rating) | Dates | Harris : Trump | Harris Share
0.94 | From Natl. Avg. (0.99⋅x + -1.75) | | | 47.1
0.83 | Emerson (2.9) | 07/22-07/23 | 47% : 53% | 47.4
0.27 | Redfield & Wilton Strategies (1.8) | 07/22-07/24 | 43% : 46% | 48.3
0.06 | InsiderAdvantage (2.0) | 07/15-07/16 | 42% : 48% | 46.7
0.04 | PPP (1.4) | 07/19-07/20 | 40% : 46% | 46.5
0.02 | PPP (1.4) | 07/10-07/11 | 44% : 52% | 45.8
Sum 2.2 | Total | | | Avg 47.3
Nevada
Weight | Pollster (rating) | Dates | Harris : Trump | Harris Share
0.94 | From Natl. Avg. (1.52⋅x + -28.61) | | | 46.4
0.26 | Redfield & Wilton Strategies (1.8) | 07/22-07/24 | 43% : 45% | 48.9
0.06 | InsiderAdvantage (2.0) | 07/15-07/16 | 40% : 50% | 44.4
Sum 1.3 | Total | | | Avg 46.8
Florida
Weight | Pollster (rating) | Dates | Harris : Trump | Harris Share
0.93 | From Natl. Avg. (1.73⋅x + -41.30) | | | 43.8
0.28 | Redfield & Wilton Strategies (1.8) | 07/22-07/24 | 39% : 47% | 45.3
0.06 | InsiderAdvantage (2.0) | 07/15-07/16 | 39% : 49% | 44.2
Sum 1.3 | Total | | | Avg 44.2

Table Source

Estimating Poll Miss

Morris (2024) at FiveThirtyEight reports that the polling average typically misses the actual swing state result by about 2 points for a given candidate (or about 3.8 points for the margin). This is pretty remarkable. Even combining dozens of pollsters each asking thousands of people their vote right before the election, we still expect to be several points off. Elections are hard to predict.

Our current situation is even more uncertain. The ~2 points of miss is for a typical candidate on election day. With Harris we start at a time with potentially limited polling data. To estimate how limited, we look at the weighted count of polls.

Right now we estimate we have the equivalent of 8.2 top-quality national polls for Harris. For comparison, we estimate we had 21.5 top-quality national polls for Biden the day before he dropped out and 58.4 top-quality polls for Biden on 2020 Election Day.

For swing state polls we apply the same weighting. To fill in gaps in swing state polling, we also combine with national polling. Each state has a different relationship to national polls. We fit a linear function (ie, a slope and an intercept) going from our custom national polling average to 538's state polling average for Biden in 2020 and 2024. We average this mapped value with available polls (its weight is somewhat arbitrarily defined as the R² of the linear fit). We highlight that the national polling average was highly predictive of 538's swing state polling averages (avg R² = 0.91).
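
The mapping and blending step can be sketched as follows. This is only an illustration: the function names are hypothetical, and the real fit uses 538's historical averages, which are not reproduced here. The usage example plugs in the North Carolina rows from the table above.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept; also returns R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


def blend_state_average(mapped_share, r_squared, poll_shares, poll_weights):
    """Weighted average of state polls plus the national-derived value.

    The national-derived value enters like one more poll whose weight
    is the R^2 of the linear fit.
    """
    shares = [mapped_share] + list(poll_shares)
    weights = [r_squared] + list(poll_weights)
    return sum(s * w for s, w in zip(shares, weights)) / sum(weights)


if __name__ == "__main__":
    # North Carolina: mapped value 46.0 with weight 0.91, plus two polls.
    nc = blend_state_average(46.0, 0.91, [48.3, 47.8], [0.28, 0.04])
    print(round(nc, 1))  # 46.6
```

With so few state polls, the national-derived value dominates the North Carolina average, which is the intended gap-filling behavior.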

In Figure 1 we show both the weighted count of polls and the square root of the weighted count. Probability theory tells us that the precision of a sample does not grow linearly with its size. As an example, say we had a poll of 1000 people estimating a vote of 45% with some amount of error. If we repeated that poll and now had 2000 people, we would not be twice as confident of our estimate. Instead, we would need roughly 4000 people to be 2x as confident. Having 8.2 polls now compared to 58.4 polls on election day 2020 is like having √(8.2/58.4) ≈ 0.4 as much polling information.
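
The square-root relationship is easy to check numerically. The sketch below uses the textbook standard error of a sampled proportion, treating a poll as a simple random sample (an idealization), and then applies the same square-root logic to the weighted poll counts quoted above. The function names are hypothetical.

```python
import math


def standard_error_pts(p, n):
    """Standard error, in percentage points, of a proportion p from n respondents."""
    return 100.0 * math.sqrt(p * (1.0 - p) / n)


def relative_information(polls_now, polls_reference):
    """Precision now relative to the reference, assuming error shrinks as 1/sqrt(n)."""
    return math.sqrt(polls_now / polls_reference)


if __name__ == "__main__":
    for n in (1000, 2000, 4000):
        print(n, round(standard_error_pts(0.45, n), 2))
    print(round(relative_information(8.2, 58.4), 2))
```

Going from 1000 to 4000 respondents halves the standard error, while going from 1000 to 2000 only shrinks it by a factor of about 1.4.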

We make the assumption that the amount of swing state polls we had on election day 2020 was enough for the typical ~2 point average miss. We then estimate expected error in each swing state given how many polls we currently have there. We assume that only half of the average miss (ie, ~1 point) could be reduced with more polling. The other half of the miss is some unrecoverable error (for example, due to an inability to contact subsets of voters, or industry-wide methodology flaws). This one-half fraction is purely heuristic; more rigorous work could try to estimate it empirically. See code for precise details.
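
One way to turn these assumptions into a formula is sketched below. The functional form is a guess for illustration, not the article's exact definition (see the linked code for that): the unrecoverable half stays fixed, and the reducible half scales with the inverse square root of the weighted poll count relative to a reference count. The `reference_polls` value in the example is hypothetical.

```python
import math


def expected_miss(weighted_polls, reference_polls,
                  base_miss=2.0, reducible_fraction=0.5):
    """Expected average polling miss (points) for a given weighted poll count.

    Assumed form: irreducible part + reducible part * sqrt(reference / current).
    With weighted_polls == reference_polls this returns base_miss.
    """
    if weighted_polls <= 0:
        raise ValueError("weighted_polls must be positive")
    irreducible = base_miss * (1.0 - reducible_fraction)
    reducible = base_miss * reducible_fraction
    return irreducible + reducible * math.sqrt(reference_polls / weighted_polls)


if __name__ == "__main__":
    reference = 20.0  # hypothetical election-day 2020 weighted count for a state
    for n in (1.2, 3.6, 20.0):
        print(n, round(expected_miss(n, reference), 2))
```

Under this form, a state with a quarter of the reference polling would have an expected miss of 3 points rather than 2.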

Using this method, we estimate that the average swing state polling miss is currently 3.6 points.

We emphasize the average miss here is just an average across thousands of simulations. The actual miss can be higher or lower in either direction. Following Morris (2024), we model the distribution of these errors as a t-distribution with five degrees of freedom.

Additionally, we assume poll misses are correlated between states. Similar to our previous Whitmer model, we use state correlations from the 2020 538 and Economist models (via Pearce (2020)). A more pure version of this model would try to reestimate this from data.
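
A minimal sketch of drawing correlated, t-distributed misses follows. It makes several assumptions: "average miss" is read as the mean absolute error, the correlated draws use a standard multivariate t construction (correlated normals divided by a shared chi-square factor), and the small correlation matrix in the example is made up rather than taken from the 538 or Economist models.

```python
import math

import numpy as np


def mean_abs_t(df):
    """E|T| for a standard Student t variable with df > 1 degrees of freedom."""
    return math.sqrt(df / math.pi) * math.gamma((df - 1) / 2) / math.gamma(df / 2)


def sample_correlated_misses(corr, mean_abs_miss, n_sims, df=5, seed=0):
    """Draw an (n_sims, n_states) array of correlated t-distributed misses."""
    rng = np.random.default_rng(seed)
    corr = np.asarray(corr, dtype=float)
    k = corr.shape[0]
    # Correlated standard normals, one row per simulation.
    z = rng.multivariate_normal(np.zeros(k), corr, size=n_sims)
    # One shared chi-square draw per simulation gives a multivariate t.
    chi2 = rng.chisquare(df, size=(n_sims, 1))
    t = z / np.sqrt(chi2 / df)
    # Rescale so each state's mean absolute miss matches the target.
    return t * (mean_abs_miss / mean_abs_t(df))


if __name__ == "__main__":
    # Hypothetical correlations between three states.
    corr = [[1.0, 0.8, 0.6],
            [0.8, 1.0, 0.7],
            [0.6, 0.7, 1.0]]
    misses = sample_correlated_misses(corr, mean_abs_miss=3.6, n_sims=100_000)
    print(np.abs(misses).mean(axis=0))
```

Because all states in a simulation share one chi-square draw, a large miss in one state tends to coincide with large misses elsewhere, which is the practical effect of the correlation assumption.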

Estimates For Today

If we use this expected miss and pretend the election was today, we would estimate a 36% chance Harris would win.
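
To make the last step concrete, here is a sketch of how simulated state shares become a win probability. The electoral vote counts are the 2024 allocations for the 8 modeled states. `SAFE_DEM_EV = 226` is an assumption of this sketch: it treats every other state and district as voting the way it did in 2020. The function name and array layout are hypothetical.

```python
import numpy as np

# 2024 electoral votes for the 8 modeled swing states.
SWING_EV = {"PA": 19, "WI": 10, "GA": 16, "MI": 15,
            "NC": 16, "AZ": 11, "NV": 6, "FL": 30}
# Assumption: all remaining electoral votes fall as they did in 2020.
SAFE_DEM_EV = 226


def win_probability(sim_shares, states, swing_ev=SWING_EV, safe_dem_ev=SAFE_DEM_EV):
    """Fraction of simulations in which the Democrat reaches 270 electoral votes.

    sim_shares: array of shape (n_sims, n_states) holding simulated two-party
    Democratic shares in percent, with columns ordered like `states`.
    """
    sim_shares = np.asarray(sim_shares, dtype=float)
    ev = np.array([swing_ev[s] for s in states])
    # A state is won when the simulated two-party share exceeds 50%.
    dem_ev = safe_dem_ev + (sim_shares > 50.0) @ ev
    return float(np.mean(dem_ev >= 270))


if __name__ == "__main__":
    states = list(SWING_EV)
    sims = np.array([
        [51, 51, 49, 51, 49, 49, 49, 49],  # wins PA, WI, MI: exactly 270
        [49, 51, 49, 51, 49, 49, 49, 49],  # loses PA: 251
    ])
    print(win_probability(sims, states))  # 0.5
```

In the full model, each row of `sim_shares` would be the state polling averages plus one sampled set of polling misses.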

Where will Polls be in 101 Days

If a candidate is behind, they would hope for more variance in outcomes. If they are ahead, less variance is better. In addition to typical poll misses, we would expect some variance from movement in the polls over the next 101 days to Election Day.

Here we show the trends in 2020 and 2024. Dem Share is share of vote with just the Democrats and Republicans, ie (Dem/(Dem+Rep))·100. Keep in mind, the magnitude of shifts in this value is half that of shifts in margin.
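
To see where the factor of two comes from, write the margin on the same two-party basis (an illustration added here; margins quoted from raw poll numbers are not normalized this way, so for them the relationship is only approximate). With D and R the Democratic and Republican percentages, s the Dem Share, and m the two-party margin:

```latex
\[
s = 100\cdot\frac{D}{D+R}, \qquad
m = 100\cdot\frac{D-R}{D+R} = 2s - 100
\quad\Longrightarrow\quad \Delta m = 2\,\Delta s .
\]
```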

The average 101-day movement in 2020 was 1.00 points and the average 101-day movement in Biden 2024 was 0.77 points (movements are absolute values; the shift could be up or down). The largest movement observed is 4.2 points in Michigan between March 12, 2020 and June 21, 2020. During this time Covid went from an abstract concept for most Americans to more than 100,000 Americans dead, while Trump mused about injecting people with disinfectants and slowing testing to make himself look better.
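
One plausible way to compute such an average is sketched below: take a daily polling-average series and average the absolute change over every 101-day window. Whether the article uses all overlapping windows in exactly this way is an assumption, and the function name is hypothetical.

```python
def mean_abs_movement(series, horizon=101):
    """Average absolute change over every `horizon`-day window of a daily series."""
    moves = [abs(series[i + horizon] - series[i])
             for i in range(len(series) - horizon)]
    if not moves:
        raise ValueError("series must be longer than the horizon")
    return sum(moves) / len(moves)


if __name__ == "__main__":
    # Toy series: a steady drift of 0.01 points per day.
    toy = [50.0 + 0.01 * day for day in range(300)]
    print(round(mean_abs_movement(toy), 2))  # 1.01
```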

We attempt to estimate the mean expected move for Harris, based only on the limited polling so far. We do this via a rough process of random walks sampled from her movement so far (src). This gives an average expected move of 9.75 points using just Harris 2024 data.
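
The random-walk idea can be sketched as a simple bootstrap: resample observed day-to-day changes with replacement, add up 101 of them, and average the absolute size of the resulting walks. This is an illustration under those assumptions, not the linked source; the daily changes in the example are invented.

```python
import random


def expected_move_from_steps(daily_changes, horizon=101, n_walks=10_000, seed=0):
    """Mean absolute end point of random walks built from resampled daily changes."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_walks):
        # One walk: `horizon` daily changes drawn with replacement.
        end_point = sum(rng.choice(daily_changes) for _ in range(horizon))
        total += abs(end_point)
    return total / n_walks


if __name__ == "__main__":
    # Hypothetical day-to-day changes in a polling average, in points.
    observed = [0.4, -0.2, 0.6, 0.1, -0.3, 0.5]
    print(round(expected_move_from_steps(observed), 2))
```

With only a few days of data, any drift in the observed changes compounds over 101 steps, which is one reason an estimate built this way can come out much larger than the historical 101-day movements.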

To get a final estimate of Harris's expected move in each state, we average together Biden 2020, Biden 2024, and Harris 2024 (placing 2x weight on Harris 2024) and blend data for a given state with the national moves. Please refer to the code for a precise definition. This process estimates that the average expected move across the 8 swing states is 3.99 points using all data.

Using this expected average, we model the distribution of movements as a t-distribution with 5 degrees of freedom.

Results with Movement

Here we show the estimated win probability, taking into account both the average 3.6 point polling miss and the expected 3.99 point average poll movement.

Thus, allowing variance from movement slightly changes the odds for Harris. One might intuitively expect a larger change. However, we must remember that under this model the polling miss and poll movement are assumed to be independent. As an example, we could sample a 2 point move up in polls, but this could be cancelled out by sampling a -3 point poll miss. The average combined poll miss is 5.5 points.
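
The reason the combined figure is well below 3.6 + 3.99 can be checked with a small simulation. The sketch assumes both components are independent t-distributed draws with 5 degrees of freedom, each scaled so its mean absolute value matches the quoted average. That reading of "average" is an assumption, and the helper names are hypothetical.

```python
import math
import random


def mean_abs_t(df):
    """E|T| for a standard Student t variable with df > 1 degrees of freedom."""
    return math.sqrt(df / math.pi) * math.gamma((df - 1) / 2) / math.gamma(df / 2)


def sample_scaled_t(rng, mean_abs, df=5):
    """One t-distributed draw rescaled to have the given mean absolute value."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2, 2.0)  # chi-square with df degrees of freedom
    return (mean_abs / mean_abs_t(df)) * z / math.sqrt(chi2 / df)


def combined_mean_abs(miss=3.6, move=3.99, n=200_000, seed=0):
    """Mean absolute value of an independent polling miss plus poll movement."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += abs(sample_scaled_t(rng, miss) + sample_scaled_t(rng, move))
    return total / n


if __name__ == "__main__":
    print(round(combined_mean_abs(), 2))
```

Independent errors partly cancel, so the combined average lands in the neighborhood of the 5.5 points quoted above rather than near the 7.6 point sum.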

Model Limitations

There are several limitations of this model:

Polling is a limited tool: As mentioned earlier, even with data from dozens of pollsters right before Election Day, polls typically miss by several points. One thing I took away from building this model is an encouragement to just care about polls less. We are unlikely to end up in a world on election day where polls tell us anything much different from a coin flip.

No mean-reversion or trends: This model assumes moves in either direction are equally probable. However, in reality we should expect moves far from the historical mean to be less likely (as these voters become increasingly partisan).

Poll uncertainty quantification not empirically validated: We make assumptions about the behavior of poll uncertainty for a given number of polls that might not be valid (in particular the fraction of aleatoric uncertainty is unclear). As mentioned, stronger work would better estimate this using data from past races.

Not all states: We only model 8 swing states. This is because in situations where a state like Texas or Iowa goes blue, the election is almost certainly already decided elsewhere. We also don't model the atypical way Nebraska and Maine distribute their electors.

Ignoring 3rd party votes: Better factoring in RFK Jr. might slightly change things. Also, undecided voters are essentially assumed to split equally rather than more complex schemes.

Correlations simple: This model is not completely pure, as it uses state correlations extracted from other models. Additionally, we only consider correlations at the state level, rather than more complex schemes.

Many assumptions of this model are rough and not particularly principled.

Conclusion

The race is highly uncertain. Harris is likely slightly behind, but our "Election Today" results imply there's a fairly high chance that is polling error. Making confident predictions either way is unwise.

The situation didn't have to be like this. In a fantasy world, Biden would have withdrawn sooner, or Democrats would have been a bit slower to crack the whip of unity, perhaps allowing an opening for a candidate that starts ahead. Also in this fantasy, Republicans years ago would have moved on from their criminal and abusive candidate who's pushing 80.

However, there's no need to be burdened by dreams of the past. With Harris we can see an exciting future.