Lots of discussion and criticism came out of this past Tuesday’s election results, much of it focusing on the Kentucky Governor’s race and the lack of reliable polling measures, which led to a somewhat surprising election night result for many pundits. One particular story, written here by @ForecasterEnten, made valid points about the scope of the problems with base sampling and turnout modeling:
The former suggests an electorate modeling problem that could be a big problem during the presidential primaries, when turnout is low. On the other hand, trouble modeling the electorate would be less of an issue in the 2016 general election, when turnout is at its highest.
More and more reporters are beginning to look at the national horserace polls and wonder if a serious problem is developing, especially as it relates to debate participant selection criteria.
polling is just abysmal and yet it drives so, so much of our political coverage — and determines presidential debate participants!
— Peter Hamby (@PeterHamby) November 4, 2015
None of these criticisms are new, but many of them are valid. Determining the base of who to poll is becoming more important than ever. However, this is a problem researchers and data analysts like us have been facing for many cycles.
November 4th, 2015 marked one year since Republicans retook the majority in the United States Senate. One of the most forward-thinking Republican US Senate campaigns in terms of analytics was Joni Ernst’s campaign in Iowa, a state with approximately 600k Republicans, 600k Democrats and 1MM Independents. Understanding the likely participating electorate was extremely important to the targeting.
Throughout the campaign cycle, there was a constant internal discussion around 1) who was going to turn out and 2) how the Independent share of the electorate was going to vote. In years past, Independents made up 25%-30% of the electorate, and their vote has ranged anywhere from 50% Democrat all the way to D+6. Public polls at the time would often be weighted with Ernst losing Independents by upwards of 8 or 10 points. However, in our surveys, the race was either very close or, as Election Day approached, Ernst was leading with that bloc of voters. Independents would make up over 50% of the sample and, historically, that had never happened.
Going into Election Day it became clear that most of the public polls were sampling the wrong voters, and we felt very confident our internal numbers were accurate.
Ernst ended up winning Independents by 10% and the overall election by over 8%.
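To make the weighting issue concrete, here is a minimal sketch of how the assumed party mix of a sample moves a topline number. Everything in it is illustrative: the function name `topline_margin`, the party shares and the within-party margins for Republicans and Democrats are assumptions made up for the example, not figures from the Ernst campaign or from any public poll. Only the rough Independent shares (under 30% historically, over 50% in a sample) and the 8-point Independent deficit echo the discussion above.

```python
import math


def topline_margin(shares, margins):
    """Weighted margin (in points) given each group's share of the electorate.

    shares  -- dict of group -> share of the electorate (must sum to 1)
    margins -- dict of group -> candidate margin within that group, in points
    """
    if not math.isclose(sum(shares.values()), 1.0, abs_tol=1e-9):
        raise ValueError("shares must sum to 1")
    return sum(shares[group] * margins[group] for group in shares)


# Hypothetical within-group margins for one candidate, in points.
margins = {"R": 85.0, "D": -85.0, "I": -8.0}

# Hypothetical electorate where Independents sit in the historical range.
historical_mix = {"R": 0.36, "D": 0.36, "I": 0.28}

# Hypothetical sample where Independents are over half of respondents.
independent_heavy_mix = {"R": 0.24, "D": 0.24, "I": 0.52}

print(round(topline_margin(historical_mix, margins), 2))
print(round(topline_margin(independent_heavy_mix, margins), 2))
```

With identical within-group numbers, the Independent-heavy sample nearly doubles the apparent deficit. The group-level opinions did not change; only the assumption about who shows up did.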
As Chief Data Officer for Governor Scott Walker’s presidential campaign, it was my team’s job to manage not only our targeting and delegate strategy, but also our measurement and polling efforts. The art of a good researcher often comes into focus in these lower turnout, high barrier-to-entry, highly contested elections. Anywhere from 130k to 160k Iowa Republicans will caucus on February 1, 2016. That means, of the 600k registered Republicans in the state, only about 25% will participate. Simply calling “self-identified” Republicans isn’t enough: the sample and data must line up.
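The participation figure is simple arithmetic on the numbers above, and it is worth seeing the full range rather than a single rounded value. A quick sketch (the helper name `turnout_share` is just for illustration):

```python
def turnout_share(expected_turnout, registered):
    """Fraction of registered voters expected to participate."""
    if registered <= 0:
        raise ValueError("registered must be positive")
    return expected_turnout / registered


registered_republicans = 600_000          # approximate figure cited above
low_turnout, high_turnout = 130_000, 160_000

low_share = turnout_share(low_turnout, registered_republicans)
high_share = turnout_share(high_turnout, registered_republicans)

print(f"{low_share:.1%} to {high_share:.1%}")  # prints "21.7% to 26.7%"
```

Put the other way around, roughly three out of four registered Republicans reached by a poll will not be in the room on caucus night.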
Republicans and Democrats agree on this fact, yet the news media continues to cover horserace polls without much scrutiny:
— Mark Stephenson (@markjstephenson) September 30, 2015
Recently, Public Policy Polling released an Iowa study, with its sample made up of “638 usual Republican primary voters”. As is typical, the results were publicized far and wide, as “likely Republican caucusgoers” and with a narrative of a significant shift in the electoral standing of various candidates, which contributed to the larger news cycle of the day.
Let’s discuss that for a moment: Iowa has not had a competitive primary election that wasn’t a Presidential Caucus for numerous cycles. In addition, Iowans do not vote in a primary when selecting their Presidential nominees; they caucus. This particular survey, in addition to using a potentially flawed base sample, did not even screen for likely caucus participation, which would have been at least a small step toward validity.
And yet, the research was taken at face value as a fundamental, impactful shift in the Iowa electorate. Instead of a thorough analysis comparing the likely voting Republicans and Democrats with who was actually surveyed, the data was accepted as a valid picture of caucusgoer sentiment.
On our campaign, and the other smart Republican campaigns in 2016, the polling and data operations were, and will be, very tightly integrated. As predictive modelers, it’s our job to study the electorate and provide probabilities of turnout likelihood, along with establishing the base of who we believe will turn out to vote. Polling, both public and private, should, and will, converge in 2016 to more accurately use that baseline set of voters to survey the electorate.
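As a rough illustration of what "establishing the base" can mean in practice, the sketch below turns per-voter turnout probabilities into an expected electorate. This is not our production model; the toy voter records, the field names `party` and `p_turnout`, and the function `expected_electorate` are all hypothetical, and it assumes the turnout probabilities have already been produced by some upstream model.

```python
from collections import defaultdict


def expected_electorate(voters):
    """Return (expected turnout, expected share of turnout by party).

    Each voter is a dict with a 'party' label and a modeled 'p_turnout'
    between 0 and 1. Expected turnout is the sum of those probabilities.
    """
    weight_by_party = defaultdict(float)
    for voter in voters:
        weight_by_party[voter["party"]] += voter["p_turnout"]

    total = sum(weight_by_party.values())
    if total == 0:
        raise ValueError("no expected turnout in this voter list")

    shares = {party: w / total for party, w in weight_by_party.items()}
    return total, shares


# Hypothetical toy voter file with made-up turnout scores.
voters = [
    {"id": 1, "party": "R", "p_turnout": 0.9},
    {"id": 2, "party": "R", "p_turnout": 0.5},
    {"id": 3, "party": "D", "p_turnout": 0.8},
    {"id": 4, "party": "I", "p_turnout": 0.3},
    {"id": 5, "party": "I", "p_turnout": 0.1},
]

total, shares = expected_electorate(voters)
print(round(total, 2), {party: round(s, 3) for party, s in shares.items()})
```

The resulting shares are the kind of baseline a poll sample can be checked against: if the respondents' mix looks nothing like the expected electorate, the horserace number deserves skepticism.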
As that happens, you will see more accuracy in public polls and more confidence from data and analytics consultants in their internal numbers. As I found in 2014, when you have confidence in your likely voting sample, and therefore your horserace numbers, you sleep well before Election Day.