How FiveThirtyEight’s House, Senate And Governor Models Work

FiveThirtyEight’s 2020 Senate and House models are mostly unchanged from 2018, so the large majority of the methodological detail below will still apply. (Note that we are not planning to run gubernatorial forecasts in 2020.) The handful of changes we do have, however, reflect either takeaways from our review of the model’s performance in the 2018 midterms or efforts to make our congressional forecasts more consistent with the current best practices in our presidential model. The changes include the following:

The principles behind our House, Senate and gubernatorial forecasts should be familiar to our longtime readers. They take lots of polls, perform various types of adjustments to them, and then blend them with other kinds of empirically useful indicators (what we sometimes call “the fundamentals”) to forecast each race. Then they account for the uncertainty in the forecast and simulate the election thousands of times. Our models are probabilistic in nature; we do a lot of thinking about these probabilities, and the goal is to develop probabilistic estimates that hold up well under real-world conditions. For instance, when we launched the 2018 House forecast, Democrats’ chances of winning the House were about 7 in 10 — right about what Hillary Clinton’s chances were on election night in 2016! So ignore those probabilities at your peril.
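
To make that “simulate the election thousands of times” step concrete, here is a minimal sketch in Python with made-up win probabilities. Note that it treats every race as independent, which the real model deliberately does not; see the section on simulating the election below.

```python
import numpy as np

rng = np.random.default_rng(538)

# Hypothetical per-race Democratic win probabilities for 435 House
# districts -- placeholders, not actual model output.
p_dem = rng.uniform(0.05, 0.95, size=435)

n_sims = 10_000
# Draw a winner in every race in every simulation, then count seats.
dem_seats = (rng.random((n_sims, 435)) < p_dem).sum(axis=1)

# The headline probability is just the share of simulations in which
# Democrats reach a House majority (218 of 435 seats).
print("P(Dem majority) ~", (dem_seats >= 218).mean())
```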

The methods behind our House and Senate forecasts are nearly identical. In fact, they’re literally generated by the same program, with data from House races informing the Senate forecasts and vice versa. The governor forecasts are also largely similar, 4 but with some important differences: For instance, House and Senate data is used to inform the gubernatorial forecasts, but not the other way around. Where there are further differences between how House, Senate and gubernatorial races are handled, we’ll describe them as they arise below.

Overall, the congressional and gubernatorial models differ in flavor from our presidential forecasts in two important respects:

Three versions of the models: Lite, Classic, Deluxe

In 2016, we published what we described as two different election models: “polls-only” and “polls-plus.” 6 But now we’re running what we think of as three different versions of the same model, which we call Lite, Classic and Deluxe. I realize that’s a subtle distinction — different models versus different versions of the same model.

But the Lite, Classic and Deluxe versions of our models somewhat literally build on top of one another, like different layers of toppings on an increasingly fancy burger. I’ll describe these methods in more detail in the sections below. First, a high-level overview of what the different versions account for.

The layers in FiveThirtyEight’s House forecast

Layer 1a, Polling (used by Lite, Classic and Deluxe): District-by-district polling, adjusted for house effects and other factors.
Layer 1b, CANTOR (used by Lite, Classic and Deluxe): A system that infers results for districts with little or no polling from comparable districts that do have polling.
Layer 2, Fundamentals (used by Classic and Deluxe): Non-polling factors, such as fundraising and past election results, that historically help in predicting congressional races.
Layer 3, Expert forecasts (used by Deluxe only): Ratings of each race published by the Cook Political Report, Inside Elections and Sabato’s Crystal Ball.

Lite is as close as you get to a “polls-only” version of our forecast, except that a lot of congressional districts have little or no polling. So we use a system we created called CANTOR 7 to fill in the blanks. It uses polls from states and districts that do have polling, as well as national generic congressional ballot polls, to infer what the polls would say in places with little or no polling of their own. The Lite forecast phases out CANTOR and becomes truly “polls-only” in districts that have a sufficient amount of polling.

The Classic version also uses local polls 8 but layers a bunch of non-polling factors on top of them, the most important of which are incumbency, past voting history in the state or district, fundraising and the generic ballot. These are the “fundamentals.” The more polling in a race, the more heavily Classic relies on the polls as opposed to the fundamentals. Although Lite isn’t quite as simple as it sounds, the Classic model is definitely toward the complex side of the spectrum. That complexity should theoretically buy accuracy, though: In the training data, 9 Classic miscalled 3.3 percent of House races, compared with 3.8 percent for Lite. 10 You should think of Classic as the preferred or default version of FiveThirtyEight’s forecast unless we otherwise specify.
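
As a rough illustration of how a sliding weight between polls and fundamentals can work, here is a sketch; the functional form and the constant k are our assumptions for this example, not the model’s actual parameters.

```python
def classic_estimate(poll_avg, fundamentals, n_effective_polls, k=3.0):
    """Blend a district's poll average with its fundamentals
    projection. The weight on polls rises with the amount of
    polling; k is an illustrative constant, not FiveThirtyEight's
    actual parameter."""
    w_polls = n_effective_polls / (n_effective_polls + k)
    return w_polls * poll_avg + (1.0 - w_polls) * fundamentals

# With little polling, the fundamentals dominate ...
print(classic_estimate(poll_avg=8.0, fundamentals=2.0, n_effective_polls=1))
# ... with lots of polling, the polls do.
print(classic_estimate(poll_avg=8.0, fundamentals=2.0, n_effective_polls=20))
```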

Finally, there’s the Deluxe flavor of the model, which takes everything in Classic and sprinkles in one more key ingredient: expert ratings. Specifically, Deluxe uses the race ratings from the Cook Political Report, Nathan Gonzales’s Inside Elections and Sabato’s Crystal Ball, all of which have published forecasts for many years and have an impressive track record of accuracy. 11

Within-sample accuracy of forecasting methods

Share of races called correctly based on House elections from 1998 to 2016

Forecast 100 Days Before Election Election Day
Lite model (poll-driven) 94.2% 96.2%
Fundamentals alone 95.4 95.7
Classic model (Lite model + fundamentals) 95.4 96.7
Expert ratings alone* 94.8 96.6
Deluxe model (Classic model + expert ratings) 95.7 96.9

* Based on the average ratings from Cook Political Report, Inside Elections/The Rothenberg Political Report, Sabato’s Crystal Ball and CQ Politics. Where the expert rating averages out to an exact toss-up, the experts are given credit for half a win.

So if we expect the Deluxe forecast to be (slightly) more accurate, why do we consider Classic to be our preferred version, as I described above? Basically, because we think it’s kind of cheating to borrow other people’s forecasts and make them part of our own. Some of the fun of doing this is in seeing how our rigid but rigorous algorithm stacks up against more open-ended but subjective ways of forecasting the races. If our lives depended on calling the maximum number of races correctly, however, we’d go with Deluxe.

Collecting, weighting and adjusting polls

Our House, Senate and governor forecasts use almost all the polls we can find, including partisan polls put out by campaigns or other interested parties. We had not traditionally used partisan polls in our forecasts, but they are a necessary evil for the House, where much of the polling is partisan. Having developed a system we like for handling partisan polls in our House forecasts, we’re also using them for our Senate and our governor forecasts.

However, as polling has gotten more complex, including attempts to create fake polls, there are a handful of circumstances under which we won’t use a poll:

Polls are weighted based on their sample size, their recency and their pollster rating (which in turn is based on the past accuracy of the pollster, as well as its methodology). These weights are determined by algorithm; we aren’t sticking our fingers in the wind and rating polls on a case-by-case basis. Also, the algorithm emphasizes the diversity of polls more than it has in the past; in any particular race, it will insist on constructing an average of polls from at least two or three distinct polling firms even if some of the polls are less recent.
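
Here is a hedged sketch of what a weighting function along those lines might look like; every constant in it (the reference sample size, the half-life, the rating multiplier scale) is an illustrative assumption, not FiveThirtyEight’s actual formula.

```python
import math

def poll_weight(sample_size, days_old, pollster_rating, half_life=14.0):
    """Illustrative poll weight; none of these constants are
    FiveThirtyEight's actual parameters.

    - Sample size enters on a square-root scale (a poll's precision
      grows with sqrt(n)), so it has diminishing returns.
    - Recency decays exponentially with a half-life in days.
    - pollster_rating is a multiplier near 1.0 derived from the
      firm's past accuracy and methodology.
    """
    size_term = math.sqrt(sample_size / 600.0)   # 600 = reference n
    recency_term = 0.5 ** (days_old / half_life)
    return size_term * recency_term * pollster_rating

print(poll_weight(sample_size=900, days_old=3, pollster_rating=1.1))
print(poll_weight(sample_size=400, days_old=21, pollster_rating=0.8))
```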

There are also three types of adjustments to each poll:

Our models use partisan and campaign polls, which typically make up something like half of the overall sample of U.S. House district polling. 15 Partisanship is determined by who sponsors the poll, rather than who conducts it. Polls are considered partisan if they’re conducted on behalf of a candidate, party, campaign committee, or PAC, super PAC, 501(c)(4), 501(c)(5) or 501(c)(6) organization that conducts a large majority of its political activity on behalf of one political party.

Partisan polls are subject to somewhat different treatment than nonpartisan polls in the model. They receive a lower weight, as partisan-sponsored polls are historically less accurate. And the house effects adjustment starts out with a prior that assumes these polls are biased by about 4 percentage points toward their preferred candidate or party. If a pollster publishing ostensibly partisan polls consistently has results that are similar to nonpartisan polls of the same districts, the prior will eventually be overridden.
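
One standard way to implement a prior that the data can override is a normal-normal Bayesian update. The sketch below uses the roughly 4-point prior from the paragraph above; the variance parameters are assumptions.

```python
import numpy as np

def house_effect(observed_diffs, prior_mean=4.0, prior_sd=2.0, obs_sd=5.0):
    """Estimate a partisan pollster's house effect (points toward its
    sponsor's side) by shrinking toward a prior. The ~4-point prior
    mean matches the article; the standard deviations are
    illustrative assumptions.

    observed_diffs: how far each of the firm's polls ran from the
    adjusted average of nonpartisan polls of the same races.
    """
    diffs = np.asarray(observed_diffs, dtype=float)
    n = len(diffs)
    # Conjugate normal-normal update: with enough polls, the data
    # overrides the prior, as the article describes.
    precision = 1.0 / prior_sd**2 + n / obs_sd**2
    return (prior_mean / prior_sd**2 + diffs.sum() / obs_sd**2) / precision

# A partisan firm whose polls keep matching nonpartisan ones sees its
# assumed 4-point bias shrink toward zero as evidence piles up.
print(house_effect([0.5, -1.0, 0.0]))          # still pulled by the prior
print(house_effect([0.5, -1.0, 0.0] * 10))     # mostly data-driven
```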

CANTOR: Analysis of polls in similar states and districts

CANTOR is essentially PECOTA or CARMELO (the baseball and basketball player forecasting systems we designed) for congressional districts. It uses a k-nearest neighbors algorithm to identify similar congressional districts and states based on a variety of demographic, 16 geographic 17 and political 18 factors. For instance, the district where I was born, Michigan 8, is most comparable to other upper-middle-income Midwestern districts such as Ohio 12, Indiana 5 and Minnesota 2 that similarly contain a sprawling mix of suburbs, exurbs and small towns. Districts can be compared to states, 19 so data from House races informs the CANTOR forecasts for Senate races, and vice versa. Gubernatorial races are not used in the CANTOR calculation for House and Senate races, but House and Senate races are used in CANTOR for gubernatorial races.

The goal of CANTOR is to impute what polling would say in unpolled or lightly polled states and districts, given what it says in similar states and districts. It attempts to accomplish this goal in two stages. First, it comes up with an initial guesstimate of what the polls would say based solely on FiveThirtyEight’s partisan lean metric (FiveThirtyEight’s version of a partisan voting index, which is compiled based on voting for president and state legislature) and incumbency. For instance, if Republican incumbents are polling poorly in the districts where we have polling, it will assume that Republican incumbents in unpolled districts are vulnerable as well. Then, it adjusts the initial estimate based on the district-by-district similarity scores.
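
Here is a stripped-down sketch of that two-stage idea: stage one produces a baseline margin for every district, and stage two spreads the polling “surprises” from polled districts to their nearest neighbors. The distance metric, weighting and k are assumptions; the real CANTOR is considerably more elaborate.

```python
import numpy as np

def cantor_impute(features, baseline_margins, polled, k=5):
    """Stage two of a CANTOR-like imputation (sketch).

    features:         (n_districts, n_features) standardized array
    baseline_margins: stage-one estimate for every district, based
                      on partisan lean and incumbency
    polled:           dict {district_index: polling average}
    """
    adjusted = baseline_margins.astype(float).copy()
    polled_idx = np.array(sorted(polled))
    # How far each polled district's polls run from its stage-one
    # baseline; these residuals get spread to similar districts.
    residuals = np.array([polled[i] - baseline_margins[i]
                          for i in polled_idx])
    for i in range(len(baseline_margins)):
        if i in polled:
            adjusted[i] = polled[i]  # trust the polls where they exist
            continue
        # k nearest polled neighbors by Euclidean distance.
        dists = np.linalg.norm(features[polled_idx] - features[i], axis=1)
        nearest = np.argsort(dists)[:k]
        weights = 1.0 / (dists[nearest] + 1e-6)
        adjusted[i] += np.average(residuals[nearest], weights=weights)
    return adjusted
```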

All of this sounds pretty cool, but there’s one big drawback. Namely, there’s a lot of selection bias in which races are polled. A House district usually gets surveyed only if one of the campaigns or a media organization has reason to think the race is close — so unpolled districts are less competitive than you’d infer from demographically similar districts that do have polls. CANTOR projections are adjusted to account for this.

Overall, CANTOR is an interesting method that heavily leans into district polling and gets as close as possible to a “polls-only” view of the race. However, in terms of accuracy, it is generally inferior to using …

The fundamentals

The data-rich environment in gubernatorial and congressional elections — 435 individual House races every other year, compared with just one race every four years for the presidency — is most beneficial when it comes to identifying reliable non-polling factors for forecasting races. There’s enough data, in fact, that rather than using all districts and states to determine which factors were most predictive, I instead focused the analysis on competitive races (using a fairly broad definition of “competitive”).

In competitive House districts with incumbents, the following factors have historically best predicted election results, in roughly declining order of importance:

In addition, in Pennsylvania, which underwent redistricting in 2018, the model accounts for the degree of population overlap between the incumbent’s old and new district. And in California and Washington state, it accounts for the results of those states’ top-two primaries.

The Senate model uses almost all the same factors for incumbents, but there are some subtle differences given that senators face election once every six years instead of once every other year. For instance, previous victory margin is less reliable in Senate races because more time has passed since the previous election. In addition, the Senate model uses more sophisticated data in calculating the effects of incumbency. Candidates in smaller, 23 more demographically distinct states 24 tend to have larger incumbency advantages. The Senate model also accounts for changes in the state’s partisan orientation since the seat was last contested. Finally, the Senate model uses a more advanced method of calculating candidate experience. 25

For gubernatorial races, we use the same factors as for Senate races, with two exceptions:

Note, however, that while the variables used in the gubernatorial model are largely the same as in the congressional ones, their weights can be a lot different. In particular, the generic ballot is somewhat less predictive in gubernatorial races than in congressional races (somewhere around two-thirds as much), and a state’s partisan lean is much less predictive than in congressional races (somewhere around one-third as much). A state that leans toward the Democrats by 12 points in a race for Congress would be predicted to do so by only about 4 points in a race for governor, for example.

In open-seat races, the model uses the factors from the list above that aren’t dependent on incumbency, namely the generic ballot, fundraising, FiveThirtyEight partisan lean, scandals, experience and (where applicable) top-two primary results. It also uses the results of the previous congressional election in the state or district for congressional elections, but this is a much less reliable indicator than when an incumbent is running for re-election. (Previous election results aren’t used at all in gubernatorial races without incumbents.)

But wait — there’s more! In addition to combining polls and fundamentals, the Classic and Deluxe models compare their current estimate of the national political climate against a prior based on the results of congressional elections since 1946, accounting for historical swings in midterm years and for presidential approval ratings. The prior is designed so that it phases out completely by Election Day.
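
A minimal sketch of a phase-out schedule with that property (the linear schedule itself is an assumption; the article only specifies that the prior’s weight reaches zero by Election Day):

```python
def blended_national_climate(current_estimate, historical_prior,
                             days_until_election, days_at_launch=100):
    """Blend the model's current read of the national climate with a
    prior from post-1946 election results. The linear schedule here
    is an assumption; the key property from the article is that the
    prior's weight hits zero on Election Day."""
    w_prior = min(max(days_until_election / days_at_launch, 0.0), 1.0)
    return w_prior * historical_prior + (1.0 - w_prior) * current_estimate

print(blended_national_climate(8.0, 5.0, days_until_election=60))  # a mix
print(blended_national_climate(8.0, 5.0, days_until_election=0))   # prior gone
```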

Incorporating expert ratings

Compared with the other steps, incorporating expert ratings and creating the Deluxe version of the model is fairly straightforward. We have a comprehensive database of ratings from Cook and other groups in House races and gubernatorial races since 1998 and in Senate races since 1990, so we can look up how a given rating corresponded, on average, with a margin of victory. For instance, going into the 2018 midterms, candidates who were deemed to be “likely” winners in their House races won by an average of about 12 points:

What do ratings like “lean Republican” really mean?
Expert Rating Average margin of victory
Toss-up 0 points
“Tilts” toward candidate 4 points
“Leans” toward candidate 7 points
“Likely” for candidate 12 points
“Solid” or “safe” for candidate 34 points

Based on House races since 1998.
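
In code, that historical translation is essentially a lookup table. This sketch uses the rounded averages from the table above; the sign convention and the function itself are our own.

```python
# Approximate historical margins from the table above (House races
# since 1998), in percentage points for the favored candidate.
RATING_TO_MARGIN = {
    "toss-up": 0.0,
    "tilt": 4.0,
    "lean": 7.0,
    "likely": 12.0,
    "solid": 34.0,   # "solid" and "safe" lumped together
}

def expert_implied_margin(rating, favored_party):
    """Translate a verbal rating into an implied margin.
    favored_party: +1 if the rating favors the Democrat, -1 if it
    favors the Republican (a convention assumed for this sketch)."""
    return favored_party * RATING_TO_MARGIN[rating]

print(expert_implied_margin("lean", -1))   # "Lean R" -> R +7
```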

But, of course, there are complications. One is that there’s an awful lot of territory covered by the “solid” and “safe” categories: everything from races that could almost be considered competitive to others where the incumbent wins every year by a 70-point margin. Therefore, the Deluxe forecast doesn’t adjust its projections much when it encounters “solid” or “safe” ratings from the experts, except in cases where the rating comes as a surprise because other factors indicate that the race should be competitive.

Also, although the expert raters are really quite outstanding at identifying “micro” conditions on the ground, including factors that might otherwise be hard to measure, they tend to be lagging indicators of the macro political environment. Several of the expert raters shifted their projections sharply toward the Democrats in early 2018, for instance, even though the generic ballot was fairly steady over that period. Thus, the Deluxe forecast tries to blend the relative order of races implied by the expert ratings with the Classic model’s data-driven estimate of national political conditions. Deluxe and Classic will usually produce relatively similar forecasts of the overall number of seats gained or lost by a party, therefore, even though they may have sharp disagreements on individual races.
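
One simple way to implement that blend, sketched below under assumed details: shift the expert-implied margins by a constant so that they agree with Classic about the national level, which preserves the experts’ relative ordering of races, then average race by race. The uniform shift and the 50/50 weight are assumptions.

```python
import numpy as np

def blend_expert_with_classic(expert_margins, classic_margins,
                              w_expert=0.5):
    """Keep the experts' relative ordering of races but anchor the
    overall level to the Classic model's estimate of the national
    climate (sketch; the shift and blend weight are assumptions)."""
    expert_margins = np.asarray(expert_margins, dtype=float)
    classic_margins = np.asarray(classic_margins, dtype=float)
    # Shift every expert-implied margin by a constant so the two sets
    # agree on the average national margin ...
    recentered = expert_margins + (classic_margins.mean()
                                   - expert_margins.mean())
    # ... then blend race by race.
    return w_expert * recentered + (1.0 - w_expert) * classic_margins
```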

Simulating the election and accounting for uncertainty

Sometimes what seem like incredibly pedantic questions can turn out to be important. For years, we’ve tried to design models that account for the complicated, correlated structure of error and uncertainty in election forecasting: If a candidate or a party overperforms the polls in one swing state, they’re likely to do so in other states as well, especially states that are demographically similar. Understanding this principle was key to understanding why Clinton’s lead wasn’t nearly as safe as it seemed in 2016.

Fortunately, this is less of a problem in constructing a congressional or a gubernatorial forecast; there are different candidates on the ballot in every state and district, instead of just one presidential race, and the model relies on a variety of inputs, instead of depending so heavily on polls. Nonetheless, the model accounts for four potential types of error in an attempt to self-diagnose the various ways in which it could go off the rails:

Error becomes smaller as Election Day approaches. In particular, there’s less possibility of a sharp national swing as you get nearer to the election because there’s less time for news events to intervene.
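
Putting the last two ideas together, here is a hedged sketch of a correlated simulation in which a shared national error shrinks as Election Day approaches while race-level error does not. All magnitudes are illustrative assumptions; the real model distinguishes more error components, including regional and demographic correlations.

```python
import numpy as np

rng = np.random.default_rng(2018)

def simulate_seats(mean_margins, days_until_election, n_sims=20_000,
                   national_sd_at_launch=3.0, district_sd=6.0):
    """Correlated election simulation (sketch; all error sizes are
    illustrative assumptions).

    mean_margins: forecast Democratic margin in each House race.
    """
    mean_margins = np.asarray(mean_margins, dtype=float)
    # National swing: one draw per simulation, shared by every race.
    # Its size shrinks toward a small floor as the election nears,
    # since there is less time for news events to intervene.
    national_sd = max(
        national_sd_at_launch * np.sqrt(days_until_election / 100.0), 0.5)
    national = rng.normal(0.0, national_sd, size=(n_sims, 1))
    # Race-by-race error: independent across districts.
    local = rng.normal(0.0, district_sd, size=(n_sims, len(mean_margins)))
    margins = mean_margins + national + local
    return (margins > 0).sum(axis=1)  # Democratic seats per simulation

seats = simulate_seats(rng.normal(0.0, 10.0, size=435),
                       days_until_election=60)
print("P(Dem majority) ~", (seats >= 218).mean())
```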

Nonetheless, you shouldn’t expect pinpoint precision in a House forecast, and models that purport to provide it are either fooling you or fooling themselves. Even if you knew exactly what national conditions were, there would still be a lot of uncertainty based on how individual races play out. And individual Senate and governor races are, at best, only slightly more predictable, as they can be highly candidate-driven.

Odds and ends

OK, that’s almost everything. Just a few final notes:

Editor’s note: This article is adapted from a previous article about how our election forecasts work.