What is the probability Harris wins? Building a Statistical Model.

By . Last Updated:

After Joe Biden made the historic decision to exit the presidential race, the country no longer has to choose between candidates aged 81 and 78. It's a new, exciting time with many open questions. In this article we'll try to quantify just how uncertain things are and build a probabilistic model of whether Kamala Harris will win if nominated.

There are lots of online election forecasts (e.g., from Silver et al., Morris et al., Gelman et al., among others). So why build another one? In a previous article I created an election model to help understand whether Biden should drop out (focusing on Gretchen Whitmer as a replacement). That model had a clear motivation and rhetorical purpose. I had partially built a version for Harris but hadn't published it before Biden dropped out. So while this Harris model is now mostly recreational, it offers some potential contributions for those interested in election modeling:

  • Transparency - Each part tries to include links to the corresponding source code.
  • Analysis and visualization of the amount of available polling (Section 1.2). This is meant to model the potentially limited initial Harris polling.
  • An alternative approach to, and presentation of, polling errors and poll movement (Section 2).
  • Timeliness/impatience - At the time of writing, prominent modelers like FiveThirtyEight, Silver, and the Economist haven't released a version for Harris yet. This gives one initial estimate.

For those just looking for a number, here is the top-line estimate:

This estimate should update ~daily.

We'll go through each of the steps to reach this estimate. The general approach follows that of other polls-based models: we create an average of polls in each swing state, estimate the expected polling miss, and then estimate how polls might move. Then we run thousands of random simulations and report the fraction in which each candidate wins.

Where are Polls Today

Gathering Data

The first step is gathering polling data (sourced via FiveThirtyEight).

Not all pollsters are equal; some have better track records than others. Thus, we weight each poll. Our weights are scaled so that 1.0 is the value of a poll from a top-rated pollster (e.g., Siena/NYT, Emerson College, Marquette University) that interviewed its sample yesterday or more recently.

Less reliable or less transparent pollsters are weighted as some fraction of 1.0. Additionally, older polls are weighted less: weight decays with an approximate half-life of 9 days, a poll's age is measured from a point between its start and end dates, and decay only begins after 1.5 days. Polls from before Biden dropped out carry ¼ weight. This function is only a heuristic estimate (see the code for the exact definition).
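The weighting rule above can be sketched as a single function. This is a minimal illustration, not the author's code: `poll_weight` and its `quality` argument are hypothetical, and how a 538 pollster rating maps to `quality` is not specified in the text, so it is left as an input.

```python
def poll_weight(quality: float, age_days: float, before_dropout: bool,
                half_life: float = 9.0, grace_days: float = 1.5) -> float:
    """Hypothetical sketch of a quality-and-recency poll weight.

    quality: assumed pollster factor in [0, 1]; 1.0 means a top-rated pollster.
    age_days: days elapsed since a point between the poll's start and end dates.
    before_dropout: True if the poll was fielded before Biden dropped out.
    """
    # No decay during the grace period; afterwards the weight halves every half_life days.
    decay_days = max(0.0, age_days - grace_days)
    recency = 0.5 ** (decay_days / half_life)
    # Polls fielded before the dropout count for a quarter.
    dropout_factor = 0.25 if before_dropout else 1.0
    return quality * recency * dropout_factor


print(poll_weight(1.0, 1.0, False))   # 1.0  (still inside the grace period)
print(poll_weight(1.0, 10.5, False))  # 0.5  (one half-life past the grace period)
print(poll_weight(1.0, 1.0, True))    # 0.25 (pre-dropout poll)
```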

If a pollster reports multiple numbers (e.g., with or without RFK Jr., registered or likely voters), we use the version in which the Democrat and the Republican together have the largest combined share.
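A small sketch of the version-selection rule, plus the two-party "Harris Share" column used in the tables. `PollVersion`, `pick_version`, and `two_party_share` are illustrative names, not from the original code; the example labels are hypothetical, while 44% : 42% is the Ipsos national row.

```python
from typing import NamedTuple


class PollVersion(NamedTuple):
    label: str   # hypothetical label, e.g. "LV, with RFK Jr."
    dem: float   # Democratic candidate's share, in percent
    rep: float   # Republican candidate's share, in percent


def pick_version(versions: list[PollVersion]) -> PollVersion:
    # Keep the version where the two major candidates together cover the most of the vote.
    return max(versions, key=lambda v: v.dem + v.rep)


def two_party_share(v: PollVersion) -> float:
    # Democratic share of the two-party vote, in percent.
    return 100.0 * v.dem / (v.dem + v.rep)


versions = [PollVersion("with RFK Jr.", 44, 42), PollVersion("head-to-head", 47, 45)]
best = pick_version(versions)
print(best.label)  # head-to-head
print(round(two_party_share(PollVersion("Ipsos", 44, 42)), 1))  # 51.2
```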

National Polls
Weight | Pollster (rating) | Dates | Harris : Trump | Harris Share (two-party %)
0.73 | Siena/NYT (3.0) | 07/22-07/24 | 47% : 48% | 49.5
0.72 | AtlasIntel (2.7) | 07/23-07/25 | 48% : 50% | 48.9
0.69 | YouGov (2.9) | 07/22-07/23 | 44% : 46% | 48.9
0.64 | Ipsos (2.8) | 07/22-07/23 | 44% : 42% | 51.2
0.62 | Marist (2.9) | 07/22-07/22 | 45% : 46% | 49.5
0.45 | RMG Research (2.3) | 07/22-07/23 | 46% : 48% | 48.9
0.42 | Morning Consult (1.8) | 07/26-07/28 | 47% : 46% | 50.5
0.39 | Morning Consult (1.8) | 07/25-07/27 | 47% : 45% | 51.1
0.37 | Angus Reid (2.0) | 07/23-07/25 | 44% : 42% | 51.2
0.36 | Morning Consult (1.8) | 07/24-07/26 | 46% : 45% | 50.5
0.33 | Morning Consult (1.8) | 07/23-07/25 | 46% : 45% | 50.5
0.31 | Morning Consult (1.8) | 07/22-07/24 | 46% : 45% | 50.5
0.29 | SurveyMonkey (1.9) | 07/22-07/24 | 38% : 39% | 49.4
0.26 | HarrisX (1.6) | 07/22-07/25 | 49% : 51% | 49.0
0.24 | Fabrizio/GBAO (?) | 07/23-07/25 | 47% : 49% | 49.0
0.22 | Big Village (1.6) | 07/22-07/24 | 43% : 44% | 49.1
0.17 | YouGov (2.9) | 07/21-07/23 | 41% : 44% | 48.2
0.16 | Change Research (1.4) | 07/22-07/24 | 44% : 43% | 50.6
0.15 | YouGov (2.9) | 07/19-07/22 | 46% : 46% | 50.0
0.13 | Echelon Insights (2.7) | 07/19-07/21 | 47% : 49% | 49.0
0.12 | Quinnipiac (2.8) | 07/19-07/21 | 47% : 49% | 49.0
0.11 | YouGov (2.9) | 07/16-07/18 | 48% : 51% | 48.5
0.10 | YouGov (2.9) | 07/13-07/16 | 39% : 44% | 47.0
0.09 | Ipsos (2.8) | 07/15-07/16 | 44% : 44% | 50.0
0.09 | SurveyUSA (2.8) | 07/12-07/15 | 42% : 45% | 48.3
0.07 | Morning Consult (1.8) | 07/21-07/22 | 45% : 47% | 48.9
0.06 | Marist (2.9) | 07/09-07/10 | 50% : 49% | 50.5
0.06 | Beacon/Shaw (2.8) | 07/07-07/10 | 48% : 49% | 49.5
0.06 | YouGov (2.9) | 07/07-07/09 | 38% : 42% | 47.5
0.05 | Emerson (2.9) | 07/07-07/08 | 43% : 49% | 46.6
0.05 | ActiVote (?) | 07/21-07/23 | 50% : 50% | 49.5
0.05 | Ipsos (2.8) | 07/05-07/09 | 49% : 47% | 51.0
0.05 | HarrisX (1.6) | 07/19-07/21 | 47% : 53% | 47.0
0.05 | Hart/POS (2.6) | 07/07-07/09 | 45% : 47% | 48.9
0.04 | Noble Predictive Insights (2.4) | 07/08-07/11 | 44% : 48% | 47.8
0.04 | SoCal Research (?) | 07/21-07/21 | 43% : 51% | 45.7
0.04 | Morning Consult (1.8) | 07/15-07/15 | 45% : 46% | 49.5
0.04 | Florida Atlantic University/Mainstreet Research (?) | 07/19-07/21 | 44% : 49% | 47.1
0.03 | SoCal Research (?) | 07/17-07/17 | 44% : 52% | 45.8
0.03 | Ipsos (2.8) | 07/01-07/02 | 42% : 43% | 49.4
0.03 | HarrisX (1.6) | 07/13-07/15 | 48% : 52% | 48.0
0.03 | YouGov (2.9) | 06/28-07/01 | 45% : 47% | 48.9
0.03 | Data for Progress (2.7) | 06/28-06/28 | 45% : 48% | 48.4
0.02 | Big Village (1.6) | 07/12-07/14 | 37% : 42% | 47.3
0.02 | Manhattan Institute (?) | 07/07-07/13 | 46% : 48% | 48.9
0.02 | Redfield & Wilton Strategies (1.8) | 07/08-07/08 | 37% : 44% | 45.7
0.01 | J.L. Partners (1.6) | 07/01-07/03 | 38% : 49% | 43.7
0.01 | Split Ticket/Data for Progress (?) | 07/01-07/03 | 46% : 46% | 50.0
0.01 | HarrisX (1.6) | 06/28-06/30 | 47% : 53% | 47.0
0.01 | CNN/SSRS (?) | 06/28-06/30 | 45% : 47% | 48.9
0.01 | Bendixen & Amandi International (1.0) | 07/02-07/06 | 42% : 41% | 50.6
Sum 9.1 | Total | | | Avg 49.6
Pennsylvania
Weight | Pollster (rating) | Dates | Harris : Trump | Harris Share (two-party %)
0.92 | From Natl. Avg. (0.91⋅x + 3.70) | | | 48.7
0.73 | Beacon/Shaw (2.8) | 07/22-07/24 | 49% : 49% | 50.0
0.67 | Emerson (2.9) | 07/22-07/23 | 49% : 51% | 48.9
0.25 | Redfield & Wilton Strategies (1.8) | 07/22-07/24 | 42% : 46% | 47.7
0.23 | Bullfinch (?) | 07/23-07/25 | 48% : 47% | 50.5
0.07 | Siena/NYT (3.0) | 07/09-07/11 | 47% : 48% | 49.5
0.06 | Civiqs (2.5) | 07/13-07/16 | 44% : 46% | 48.9
0.05 | InsiderAdvantage (2.0) | 07/15-07/16 | 40% : 47% | 46.0
0.04 | SoCal Research (?) | 07/20-07/21 | 46% : 50% | 47.9
0.03 | North Star Opinion Research (1.2) | 07/20-07/23 | 45% : 47% | 48.9
0.03 | PPP (1.4) | 07/17-07/18 | 43% : 45% | 48.9
0.02 | PPP (1.4) | 07/11-07/12 | 45% : 51% | 46.9
Sum 3.1 | Total | | | Avg 49.1
Wisconsin
Weight | Pollster (rating) | Dates | Harris : Trump | Harris Share (two-party %)
0.81 | From Natl. Avg. (0.91⋅x + 3.80) | | | 48.8
0.73 | Beacon/Shaw (2.8) | 07/22-07/24 | 49% : 50% | 49.5
0.67 | Emerson (2.9) | 07/22-07/23 | 51% : 49% | 50.6
0.22 | Redfield & Wilton Strategies (1.8) | 07/22-07/24 | 44% : 44% | 50.0
0.06 | Civiqs (2.5) | 07/13-07/16 | 48% : 48% | 50.0
0.02 | PPP (1.4) | 07/10-07/11 | 48% : 49% | 49.5
0.01 | North Star Opinion Research (1.2) | 07/06-07/10 | 47% : 48% | 49.5
Sum 2.5 | Total | | | Avg 49.6
Georgia
Weight | Pollster (rating) | Dates | Harris : Trump | Harris Share (two-party %)
0.91 | From Natl. Avg. (0.71⋅x + 11.45) | | | 46.5
0.66 | Emerson (2.9) | 07/22-07/23 | 49% : 51% | 48.9
0.31 | Landmark Communications (2.1) | 07/22-07/22 | 47% : 48% | 49.3
0.26 | Redfield & Wilton Strategies (1.8) | 07/22-07/24 | 42% : 47% | 47.2
0.22 | SoCal Research (?) | 07/25-07/26 | 46% : 50% | 48.2
0.05 | U. Georgia SPIA (2.2) | 07/09-07/18 | 46% : 50% | 47.6
0.05 | InsiderAdvantage (2.0) | 07/15-07/16 | 37% : 47% | 43.8
0.02 | Florida Atlantic University/Mainstreet Research (?) | 07/14-07/15 | 44% : 49% | 47.3
0.02 | Florida Atlantic University/Mainstreet Research (?) | 07/12-07/13 | 43% : 49% | 46.7
Sum 2.5 | Total | | | Avg 47.7
Michigan
Weight | Pollster (rating) | Dates | Harris : Trump | Harris Share (two-party %)
0.90 | From Natl. Avg. (1.13⋅x − 7.24) | | | 48.8
0.73 | Beacon/Shaw (2.8) | 07/22-07/24 | 49% : 49% | 50.0
0.66 | Emerson (2.9) | 07/22-07/23 | 49% : 51% | 49.1
0.22 | SoCal Research (?) | 07/25-07/26 | 46% : 49% | 48.4
0.21 | Redfield & Wilton Strategies (1.8) | 07/22-07/24 | 41% : 44% | 48.2
0.20 | Glengariff Group Inc. (1.5) | 07/22-07/24 | 42% : 41% | 50.2
0.06 | Civiqs (2.5) | 07/13-07/16 | 46% : 46% | 50.0
0.03 | PPP (1.4) | 07/17-07/18 | 41% : 46% | 47.1
0.02 | PPP (1.4) | 07/11-07/12 | 46% : 48% | 48.9
Sum 3.0 | Total | | | Avg 49.2
North Carolina
Weight | Pollster (rating) | Dates | Harris : Trump | Harris Share (two-party %)
0.91 | From Natl. Avg. (1.08⋅x − 7.22) | | | 46.3
0.22 | Redfield & Wilton Strategies (1.8) | 07/22-07/24 | 43% : 46% | 48.3
0.03 | PPP (1.4) | 07/19-07/20 | 44% : 48% | 47.8
Sum 1.2 | Total | | | Avg 46.7
Arizona
Weight | Pollster (rating) | Dates | Harris : Trump | Harris Share (two-party %)
0.94 | From Natl. Avg. (0.99⋅x − 1.75) | | | 47.3
0.66 | Emerson (2.9) | 07/22-07/23 | 47% : 53% | 47.4
0.21 | Redfield & Wilton Strategies (1.8) | 07/22-07/24 | 43% : 46% | 48.3
0.05 | InsiderAdvantage (2.0) | 07/15-07/16 | 42% : 48% | 46.7
0.03 | PPP (1.4) | 07/19-07/20 | 40% : 46% | 46.5
0.02 | PPP (1.4) | 07/10-07/11 | 44% : 52% | 45.8
Sum 1.9 | Total | | | Avg 47.4
Nevada
Weight | Pollster (rating) | Dates | Harris : Trump | Harris Share (two-party %)
0.94 | From Natl. Avg. (1.52⋅x − 28.61) | | | 46.7
0.21 | Redfield & Wilton Strategies (1.8) | 07/22-07/24 | 43% : 45% | 48.9
0.05 | InsiderAdvantage (2.0) | 07/15-07/16 | 40% : 50% | 44.4
Sum 1.2 | Total | | | Avg 47.0
Florida
Weight | Pollster (rating) | Dates | Harris : Trump | Harris Share (two-party %)
0.93 | From Natl. Avg. (1.73⋅x − 41.30) | | | 44.2
0.22 | Redfield & Wilton Strategies (1.8) | 07/22-07/24 | 39% : 47% | 45.3
0.05 | InsiderAdvantage (2.0) | 07/15-07/16 | 39% : 49% | 44.2
Sum 1.2 | Total | | | Avg 44.4
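The Sum and Avg rows in these tables appear consistent with a plain weighted mean of the Harris Share column (checked by hand for Pennsylvania and Nevada; the displayed inputs are rounded, so small discrepancies are possible). A minimal sketch, with the hypothetical helper name `weighted_average`, reproducing the Nevada rows:

```python
def weighted_average(rows: list[tuple[float, float]]) -> tuple[float, float]:
    """Return (sum of weights, weighted mean share) for (weight, share) rows."""
    total_weight = sum(w for w, _ in rows)
    mean_share = sum(w * s for w, s in rows) / total_weight
    return total_weight, mean_share


nevada = [
    (0.94, 46.7),  # From Natl. Avg. (1.52⋅x − 28.61)
    (0.21, 48.9),  # Redfield & Wilton Strategies
    (0.05, 44.4),  # InsiderAdvantage
]
total, avg = weighted_average(nevada)
print(f"Sum {total:.1f}, Avg {avg:.1f}")  # Sum 1.2, Avg 47.0
```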


Estimating Poll Miss

Morris (2024) at FiveThirtyEight reports that the polling average typically misses the actual swing state result by about 2 points for a given candidate (or about 3.8 points on the margin). This is pretty remarkable: even combining dozens of pollsters, each asking thousands of people their vote right before the election, we still expect to be several points off. Elections are hard to predict.

Our current situation is even more uncertain. The ~2-point miss applies to a typical candidate on Election Day, whereas with Harris we start with potentially limited polling data. To quantify how limited, we look at the weighted count of polls.

Right now we estimate we have the equivalent of 10.4 top-quality national polls for Harris. For comparison, we estimate we had 21.5 top-quality national polls for Biden the day before he dropped out and 58.4 top-quality polls for Biden on 2020 Election Day.

For swing state polls we apply the same weighting. To fill gaps in swing state polling, we also incorporate national polling. Each state has a different relationship to national polls, so we fit a linear function (i.e., a slope and an intercept) mapping our custom national polling average to 538's state polling average for Biden in 2020 and 2024. We then average this mapped value with the available state polls, giving it a weight equal to the R² of the linear fit (a somewhat arbitrary choice). Notably, the national polling average was highly predictive of 538's swing state polling averages (average R² = 0.91).
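A sketch of the national-to-state mapping and R²-weighted blend, assuming an ordinary least-squares fit via `numpy.polyfit`. The function names are hypothetical and the example series is synthetic, not 538 data; the blend reuses the Pennsylvania slope, intercept, and R² weight from the table.

```python
import numpy as np


def fit_state_map(national: np.ndarray, state: np.ndarray) -> tuple[float, float, float]:
    """Fit state ≈ slope * national + intercept; return (slope, intercept, R²)."""
    slope, intercept = np.polyfit(national, state, deg=1)
    predicted = slope * national + intercept
    ss_res = np.sum((state - predicted) ** 2)
    ss_tot = np.sum((state - state.mean()) ** 2)
    return float(slope), float(intercept), float(1.0 - ss_res / ss_tot)


def blend_with_national(national_avg: float, slope: float, intercept: float,
                        r2: float, state_polls: list[tuple[float, float]]) -> float:
    """Treat the mapped national value as one extra 'poll' whose weight is R².

    state_polls: list of (weight, share) pairs for actual state polls.
    """
    rows = [(r2, slope * national_avg + intercept)] + list(state_polls)
    total = sum(w for w, _ in rows)
    return sum(w * s for w, s in rows) / total


# Synthetic example: a perfectly linear relationship recovers slope, intercept, and R² = 1.
nat = np.array([47.0, 48.0, 49.0, 50.0])
slope, intercept, r2 = fit_state_map(nat, 0.91 * nat + 3.70)

# Blend a mapped national average of 49.6 with one hypothetical state poll.
print(round(blend_with_national(49.6, 0.91, 3.70, 0.92, [(0.73, 50.0)]), 2))
```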

In Figure 1 we show both the weighted count of polls and its square root. Probability theory tells us that the information gained from sampling does not grow linearly: the standard error of an estimate shrinks with the square root of the sample size. As an example, say we had a poll of 1000 people estimating a vote of 45% with some amount of error. If we repeated that poll with 2000 people, we would not be twice as confident in our estimate; we would need roughly 4000 people to halve the error. Likewise, having 10.4 polls now compared to 58.4 polls on Election Day 2020 is like having √(10.4/58.4) ≈ 0.42 times as much polling information.
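A quick numeric check of the square-root rule, using the textbook standard error of a sample proportion under simple random sampling (real polls have additional design effects, which this ignores):

```python
import math


def standard_error(p: float, n: int) -> float:
    # Sampling standard error of a proportion p estimated from n respondents.
    return math.sqrt(p * (1 - p) / n)


# Doubling the sample shrinks the error by about 1.41x; quadrupling it halves the error.
se_1000 = standard_error(0.45, 1000)
se_2000 = standard_error(0.45, 2000)
se_4000 = standard_error(0.45, 4000)
print(round(se_1000 / se_2000, 3), round(se_1000 / se_4000, 3))  # 1.414 2.0

# The same rule applied to the weighted poll counts in the text.
print(round(math.sqrt(10.4 / 58.4), 2))  # 0.42
```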

We assume that the number of swing state polls we had on Election Day 2020 was enough for the typical ~2-point average miss. We then estimate the expected error in each swing state given how many polls we currently have there. We assume that only half of the average miss (i.e., ~1 point) could be reduced with more polling. The other half of the miss is unrecoverable error (for example, due to the inability to contact subsets of voters, or industry-wide methodology flaws). This half-and-half split is purely heuristic; more rigorous work could try to estimate it empirically. See the code for precise details.
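One way to turn these assumptions into a formula, as a hedged sketch: the hypothetical `expected_miss` below assumes the reducible half scales like 1/√(poll count) and that the two halves simply add. The author's exact combination may differ (for example, adding the halves in quadrature would give smaller values).

```python
import math


def expected_miss(count_now: float, count_ref: float,
                  ref_miss: float = 2.0, reducible_fraction: float = 0.5) -> float:
    """Hypothetical sketch: average polling miss given a weighted poll count.

    Assumes count_ref polls (e.g. Election Day 2020) yield ref_miss, that only
    reducible_fraction of that miss shrinks with more polling, as 1/sqrt(count),
    and that the irreducible and reducible parts add linearly.
    """
    irreducible = ref_miss * (1.0 - reducible_fraction)
    reducible = ref_miss * reducible_fraction * math.sqrt(count_ref / count_now)
    return irreducible + reducible


print(expected_miss(58.4, 58.4))            # 2.0 at the reference count
print(round(expected_miss(10.4, 58.4), 2))  # 3.37 using the national counts
```

The 3.7-point figure in the text is an average over state-level poll counts, so this national-count illustration is not expected to match it.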

Using this method, we estimate that the average swing state polling miss is currently 3.7 points.

We emphasize that this is just an average across thousands of simulations; the actual miss can be higher or lower, in either direction. Following Morris (2024), we model the distribution of these errors as a t-distribution with five degrees of freedom.
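A sketch of drawing t-distributed misses whose average size matches a target, assuming "average miss" means the mean absolute error. The scale uses the closed form E|T| = 2√ν·Γ((ν+1)/2) / (√π·(ν−1)·Γ(ν/2)) for a Student's t with ν > 1 degrees of freedom; `mean_abs_t` and `sample_miss` are hypothetical names.

```python
import math

import numpy as np


def mean_abs_t(df: float) -> float:
    # E|T| for a standard Student's t with df > 1 degrees of freedom.
    return (2.0 * math.sqrt(df) * math.gamma((df + 1) / 2)
            / (math.sqrt(math.pi) * (df - 1) * math.gamma(df / 2)))


def sample_miss(avg_miss: float, size: int, df: float = 5.0, seed: int = 0) -> np.ndarray:
    """Draw t-distributed misses scaled so their mean absolute value is avg_miss."""
    rng = np.random.default_rng(seed)
    scale = avg_miss / mean_abs_t(df)
    return scale * rng.standard_t(df, size=size)


misses = sample_miss(3.7, 200_000)
# The sample mean absolute miss should land close to the 3.7-point target.
print(round(float(np.mean(np.abs(misses))), 2))
```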

Additionally, we assume poll misses are correlated between states. As in our previous Whitmer model, we use state-level correlations from the 2020 538 and Economist models (via Pearce (2020)). A purer version of this model would re-estimate these correlations from data.
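Correlated, heavy-tailed misses can be drawn with the standard multivariate-t construction: correlated normals (via a Cholesky factor) divided by one shared √(χ²/ν) draw per simulation. This is a sketch of one common technique, not necessarily the author's; the 3-state correlation matrix is made up for illustration and is not the 538/Economist values.

```python
from math import gamma, pi, sqrt

import numpy as np


def correlated_t_misses(avg_miss: float, corr, n_sims: int,
                        df: float = 5.0, seed: int = 0) -> np.ndarray:
    """Return an (n_sims, n_states) array of correlated, t-distributed misses."""
    rng = np.random.default_rng(seed)
    corr = np.asarray(corr, dtype=float)
    chol = np.linalg.cholesky(corr)  # requires a positive-definite matrix
    # Correlated standard normals: each row is one simulated election.
    z = rng.standard_normal((n_sims, corr.shape[0])) @ chol.T
    # One shared chi-square draw per simulation fattens the tails of all states together.
    w = rng.chisquare(df, size=(n_sims, 1))
    t = z / np.sqrt(w / df)
    # Each marginal is a standard t; rescale so its mean absolute value is avg_miss.
    mean_abs_t = 2 * sqrt(df) * gamma((df + 1) / 2) / (sqrt(pi) * (df - 1) * gamma(df / 2))
    return t * (avg_miss / mean_abs_t)


# Hypothetical 3-state correlation matrix, for illustration only.
example_corr = [[1.0, 0.8, 0.7],
                [0.8, 1.0, 0.8],
                [0.7, 0.8, 1.0]]
sims = correlated_t_misses(3.7, example_corr, 100_000)
```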

Estimates For Today

If we use this expected miss and pretend the election were held today, we would estimate a 37% chance that Harris wins.
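An end-to-end sketch of the "election today" simulation: add correlated t-distributed misses to each state's current average, award each state's electoral votes to whoever is above 50%, and count how often Harris reaches 270. Several inputs are assumptions rather than the author's values. The single uniform correlation `rho` replaces the real correlation matrix, one average miss is used for every state, and `BASE_DEM_EV = 226` assumes Harris carries every Biden-2020 state outside these eight under the 2024 apportionment (ignoring Maine and Nebraska's district splits). This sketch will not reproduce the 37% figure exactly.

```python
from math import gamma, pi, sqrt

import numpy as np

# 2024 electoral votes for the eight modeled states.
ELECTORAL_VOTES = {"PA": 19, "WI": 10, "GA": 16, "MI": 15,
                   "NC": 16, "AZ": 11, "NV": 6, "FL": 30}
BASE_DEM_EV = 226  # assumption: Democratic electoral votes from all non-modeled states


def win_probability(shares, avg_miss, rho, n_sims=50_000, df=5.0, seed=0) -> float:
    """Fraction of simulations in which the Democrat reaches 270 electoral votes.

    shares: dict of state -> current two-party Harris share (%).
    rho: hypothetical uniform correlation between state misses (0 <= rho < 1).
    """
    states = list(ELECTORAL_VOTES)
    k = len(states)
    corr = np.full((k, k), rho) + (1.0 - rho) * np.eye(k)  # 1 on the diagonal
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_sims, k)) @ np.linalg.cholesky(corr).T
    t = z / np.sqrt(rng.chisquare(df, size=(n_sims, 1)) / df)
    mean_abs_t = 2 * sqrt(df) * gamma((df + 1) / 2) / (sqrt(pi) * (df - 1) * gamma(df / 2))
    outcomes = np.array([shares[s] for s in states]) + t * (avg_miss / mean_abs_t)
    ev = np.array([ELECTORAL_VOTES[s] for s in states])
    dem_ev = BASE_DEM_EV + (outcomes > 50.0).astype(int) @ ev
    return float(np.mean(dem_ev >= 270))  # a 269-269 tie counts as a loss here


# Two-party Harris shares from the "Avg" rows of the state tables.
current = {"PA": 49.1, "WI": 49.6, "GA": 47.7, "MI": 49.2,
           "NC": 46.7, "AZ": 47.4, "NV": 47.0, "FL": 44.4}
print(win_probability(current, avg_miss=3.7, rho=0.75))
```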

Where will Polls be in 98 Days

If a candidate is behind, they should hope for more variance in outcomes; if they are ahead, less variance is better. In addition to typical poll misses, we would expect some variance from poll movement over the 98 days remaining until Election Day.

Here we show the trends in 2020 and 2024. Dem Share is the share of the vote counting only the Democrat and the Republican, i.e., Dem / (Dem + Rep) · 100. Keep in mind that a shift in this value is half the size of the corresponding shift in the margin.
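The factor of two follows from margin = Dem − Rep = share − (100 − share) = 2 · share − 100 in two-party terms. A tiny illustration (the function name is hypothetical):

```python
def share_to_margin(dem_share: float) -> float:
    # Two-party Democratic share (%) -> Democratic minus Republican margin (points).
    return 2.0 * dem_share - 100.0


print(round(share_to_margin(49.6), 1))  # -0.8 (national average from the table)
# A 1-point move in share is a 2-point move in margin.
print(share_to_margin(51.0) - share_to_margin(50.0))  # 2.0
```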

The average 98-day movement was 1.09 points in 2020 and 0.62 points for Biden in 2024 (movements are absolute values; the actual shift could be up or down). The largest movement observed is 4.13 points, in Michigan between March 14, 2020 and June 20, 2020. During this time, Covid went from an abstract concept for most Americans to over 100,000 Americans dead, while Trump mused about injecting people with disinfectants and about slowing testing to make himself look better.
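A sketch of measuring the average 98-day movement from a daily polling-average series. It assumes movement is the absolute difference between each day and the day 98 days later, averaged over every available start day; the author's exact definition may differ. `average_abs_movement` is a hypothetical name.

```python
import numpy as np


def average_abs_movement(daily_avg, horizon: int = 98) -> float:
    """Mean absolute change in a daily polling average over `horizon` days."""
    x = np.asarray(daily_avg, dtype=float)
    if len(x) <= horizon:
        raise ValueError("series must be longer than the horizon")
    return float(np.mean(np.abs(x[horizon:] - x[:-horizon])))


# Synthetic series drifting up 0.01 points per day: every 98-day window moves 0.98 points.
series = 48.0 + 0.01 * np.arange(200)
print(round(average_abs_movement(series), 2))  # 0.98
```

Replacing `np.mean` with `np.max` gives the largest observed movement instead.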

We attempt to estimate the expected move for Harris based only on the limited polling so far. We do this via a rough process of random walks sampled from her movement to date. This gives an average expected move of 8.24 points using just Harris 2024 data.
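One plausible reading of "random walks sampled from her movement so far", as a hedged sketch: resample the observed day-to-day changes with replacement, sum 98 of them per walk, and average the absolute endpoints. `bootstrap_expected_move` is hypothetical and may not match the author's procedure.

```python
import numpy as np


def bootstrap_expected_move(daily_avg, horizon: int = 98,
                            n_walks: int = 10_000, seed: int = 0) -> float:
    """Mean absolute endpoint of random walks built from resampled daily changes."""
    rng = np.random.default_rng(seed)
    steps = np.diff(np.asarray(daily_avg, dtype=float))
    draws = rng.choice(steps, size=(n_walks, horizon), replace=True)
    return float(np.mean(np.abs(draws.sum(axis=1))))


# A short synthetic series rising 0.1 points per day extrapolates to a 9.8-point move.
print(round(bootstrap_expected_move(48.0 + 0.1 * np.arange(10)), 2))
```

Because the steps are not demeaned, a short series with a steady trend (such as a post-announcement bump) gets extrapolated over all 98 days, which can make this kind of estimate large. Subtracting `steps.mean()` before resampling is one alternative.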

To get a final estimate of Harris's expected move in each state, we average together the Biden 2020, Biden 2024, and Harris 2024 estimates, blending each state's own movement with national movement. Please refer to the code for a precise definition. This process estimates an average expected move of 3.41 points across the 8 swing states.

Using this expected average, we model the distribution of movements as a t-distribution with 5 degrees of freedom.

Results with Movement

Here we show the estimated win probability, taking into account both the average 3.7-point polling miss and the expected 3.41-point average poll movement.

Thus, allowing variance from movement only slightly changes the odds for Harris. One might intuitively expect a larger change; however, under this model the polling miss and poll movement are assumed to be independent. For example, we could sample a 2-point move up in the polls, only for it to be more than cancelled out by a −3-point poll miss. Because the two errors partly cancel, the average combined error (miss plus movement) is 5.2 points, well below the 7.1-point sum of the two averages.
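A quick simulation of why independent errors partly cancel: draw a miss and a movement independently, each a t-distribution with 5 degrees of freedom scaled to the stated average sizes (assuming "average" means mean absolute value), and measure the average size of their sum. `scaled_t` is a hypothetical helper; the sketch uses a single miss and movement size rather than the author's per-state values.

```python
from math import gamma, pi, sqrt

import numpy as np


def scaled_t(avg_abs: float, size: int, rng: np.random.Generator, df: float = 5.0) -> np.ndarray:
    # Student's t draws rescaled so their mean absolute value is avg_abs.
    mean_abs = 2 * sqrt(df) * gamma((df + 1) / 2) / (sqrt(pi) * (df - 1) * gamma(df / 2))
    return rng.standard_t(df, size=size) * (avg_abs / mean_abs)


rng = np.random.default_rng(0)
n = 200_000
miss = scaled_t(3.7, n, rng)        # polling miss
movement = scaled_t(3.41, n, rng)   # poll movement before Election Day
combined = miss + movement          # independent, so they partly cancel
# Roughly 5 points in this sketch, well under 3.7 + 3.41 = 7.11.
print(round(float(np.mean(np.abs(combined))), 1))
```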

Model Limitations

There are several limitations of this model:

Polling is a limited tool: As mentioned earlier, even with data from dozens of pollsters right before Election Day, polls typically miss by several points. One thing I took away from building this model is that we should probably just care about polls less. We are unlikely to end up in a world where, on Election Day, polls tell us anything much different from a coin flip.

No mean-reversion or trends: This model assumes moves in either direction are equally probable. In reality, we should expect moves far from the historical mean to be less likely (as the voters who would need to switch become increasingly partisan).

Poll uncertainty quantification not empirically validated: We make assumptions about how poll uncertainty behaves for a given number of polls that might not hold (in particular, the fraction of irreducible, aleatoric uncertainty is unclear). As mentioned, stronger work would estimate this using data from past races.

Not all states: We only model 8 swing states. This is because in situations where a state like Texas or Iowa goes blue, the election is almost certainly already decided elsewhere. We also don't model the atypical way Nebraska and Maine distribute their electors.

Ignoring 3rd party votes: Better accounting for RFK Jr. might slightly change things. Also, because we work with the two-party share, undecided voters are essentially assumed to break in proportion to current support rather than according to more complex schemes.

Simple correlations: This model is not completely pure, as it uses state correlations extracted from other models. Additionally, we only consider state-level correlations rather than more complex schemes.

Many assumptions of this model are rough and not particularly principled.

Conclusion

The race is highly uncertain. Harris is likely slightly behind, but our "Election Today" results imply a fairly high chance that this deficit is just polling error. Making confident predictions either way is unwise.

For reference, using polling data from July 21, this model would have estimated that Biden had a 27% chance of winning when he dropped out, 107 days before the election. Thus, with Harris estimated at 43% today, she has so far improved on Biden's odds at that point. There are potentially exciting times ahead.