# Exit polls: Triumph of the Bayesian

Election Metrics has always wondered why Indian pollsters don’t use Bayesian methods to forecast elections. The basic premise is that we have a wealth of prior information about the voting patterns of a particular population, and if we take such information into account, we can conduct reliable opinion polls with much smaller sample sizes.

In a Bayesian opinion poll, pollsters use prior knowledge of a population to construct a prior distribution of how the population will vote. Then, after they interview a random sample of the target population, they update the distribution (to get a so-called posterior distribution), and use this to make the forecasts.

There are two advantages to using such an approach: first, the sample size required for a given degree of accuracy is much smaller; second, pollsters make use of any prior information they have.

In Bayesian opinion polls, the survey sample sometimes need not even be random. If pollsters can identify a set of people whose preferences reflect those of the larger population, they can target such people precisely in the survey without losing accuracy. The downside of Bayesian methods, of course, is that constructing prior distributions can sometimes be an art rather than a science, and errors can have a cascading effect.
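The updating step described above can be sketched with the simplest conjugate model: a Beta prior on one party’s vote share in a two-party race. All numbers here are illustrative assumptions, not figures from any actual poll.

```python
# Sketch of a Bayesian poll update, assuming a two-party race in which a
# party's vote share carries a Beta prior. All numbers are illustrative.

def update_vote_share(prior_alpha, prior_beta, sample_for, sample_against):
    """Beta-Binomial update: supporters add to alpha, others to beta."""
    post_alpha = prior_alpha + sample_for
    post_beta = prior_beta + sample_against
    posterior_mean = post_alpha / (post_alpha + post_beta)
    return post_alpha, post_beta, posterior_mean

# Prior built from past voting patterns: roughly 40% vote share,
# encoded as Beta(40, 60). A small sample of 100 respondents, 48 of whom
# back the party, nudges the estimate up only modestly -- the prior
# carries much of the weight.
alpha, beta, mean = update_vote_share(40, 60, 48, 52)
print(f"posterior mean vote share: {mean:.2%}")
```

With a stronger prior (say Beta(400, 600)), the same 100-respondent sample would move the estimate even less; this is exactly how a Bayesian poll trades sample size for prior information.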

Bayesian forecasting is coming to the fore for the first time in India thanks to a polling agency called Today’s Chanakya, which created a storm by being the only polling agency to call the elections even remotely right. In December, it was also the only agency to correctly forecast the composition of the Delhi state assembly.

While the consensus forecast gave the National Democratic Alliance somewhere between 270 and 300 seats, Chanakya stuck its neck out and predicted that the coalition would get 340, a number close to its final tally of 336.

Now, Chanakya has put out its predictions for both seat and vote distributions according to its survey in large states, and these numbers don’t tally.

In Uttar Pradesh, for example, the agency forecast fairly accurately that the NDA would win 70 seats (the formation won 73), but the vote share it ascribed to the NDA based on its survey was 34%, much lower than the 40% that the other pollsters (CSDS-Lokniti and Hansa) forecast. We are not analysing the results of any other pollster, since the others did not disclose their vote-share data.

Using a 34% vote share in Uttar Pradesh to forecast close to 90% of the seats can be explained by only two factors. First, it could be a case of sampling error cancelling out modelling error: Chanakya’s vote-to-seat conversion algorithm (the hardest part of an opinion poll in India) might have overestimated the seat impact of vote share, cancelling out its significant underestimation of the NDA’s vote share.

The other (and more likely) explanation is that Chanakya is actually using Bayesian statistics to come up with its forecasts. The explanation here is that the agency’s sample is highly targeted at a particular population of the state, and its finding that 34% of this population said they voted for the Bharatiya Janata Party (BJP) led it to conclude that the BJP would win 70 seats from Uttar Pradesh. This is quite plausible under Bayesian forecasting conditions.

To check which of the above hypotheses is more likely to be true, we will check Chanakya’s vote and seat numbers in a few other states. This is again a Bayesian process.

In Gujarat, 45% of Chanakya’s respondents said they would vote for the BJP (and 34% for the Congress). Based on this, Chanakya predicted that the BJP would get all 26 seats in the state. CSDS and Hansa, on the other hand, predicted 53% and 57% voteshare, respectively, for the BJP, but translated that to only 23 and 22 seats, respectively, from the state.

In Punjab, Chanakya predicted the Aam Aadmi Party would get five seats, and the NDA another five. As it turned out, the NDA got six and the AAP got four. Chanakya’s prediction that the Congress would get three seats in Punjab was correct. CSDS and Hansa overestimated the fortunes of the NDA and the Congress, at the cost of AAP. In terms of vote-share prediction, however, Hansa was significantly superior to Chanakya.

Both these examples indicate that Chanakya, despite not getting the vote shares right, got the final seat distributions better than the other two agencies. Considering that it got its seat predictions right to a large degree in each state (there were a couple of exceptions), it is more likely that it used some kind of Bayesian statistics than that it pulled numbers out of thin air.

## The verdict

So, which was the best exit polling agency? It depends on whether we are looking at vote shares or seats. In order to objectively measure the quality of each polling agency, we use a metric called total absolute deviation (TAD), which is adapted from the popular statistical concept called mean absolute deviation. To calculate this, for each state we calculate the absolute value of the difference between the forecast and actual values for each party. Then we add up these absolute differences to come up with the total absolute deviation for each polling agency for each state.
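The TAD calculation described above can be sketched as follows. The party-wise figures here are hypothetical stand-ins, not the actual numbers from Tables 1 and 2.

```python
# Total absolute deviation (TAD) for one state: the sum, over parties,
# of |forecast - actual|. The figures below are illustrative, not the
# actual table entries.

def total_absolute_deviation(forecast, actual):
    """Sum of |forecast - actual| across all parties in a state."""
    return sum(abs(forecast[party] - actual[party]) for party in actual)

# Hypothetical seat tallies for one state with three parties:
actual = {"A": 73, "B": 5, "C": 2}
pollster_x = {"A": 70, "B": 6, "C": 4}   # close on every party
pollster_y = {"A": 55, "B": 15, "C": 10}  # far off on every party

print(total_absolute_deviation(pollster_x, actual))  # 3 + 1 + 2 = 6
print(total_absolute_deviation(pollster_y, actual))  # 18 + 10 + 8 = 36
```

The overall score for a pollster is then just the sum of its per-state TADs, with lower meaning better.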

Graphic by Ahmed Raza Khan/Mint

For example, let us look at Uttar Pradesh. Table 1 shows how the TAD for vote-share in Uttar Pradesh is calculated, and Table 2 shows how the TAD for seats for Uttar Pradesh is calculated.

The lower the TAD, the lower is the forecast error. Thus, for Uttar Pradesh, we find that in terms of voteshare, Hansa is the best polling agency, but in terms of seatshare prediction, Chanakya significantly outperforms the other two.

How do these three polling agencies fare across states? Table 3 shows the TAD for seats in the bigger states, while Table 4 shows the TAD for vote share. As a further summary, we have simply added up the TAD across states for each pollster to give an overall score. From tables 3 and 4, the verdict on the best polling agency in India is clear.

When it comes to estimating vote share, no one does it better than Hansa. In all states but Jharkhand, its estimates of the vote shares are no worse than those of either of the other two. Interestingly, by this metric, Chanakya is in last place.

Before we declare Hansa the winner, however, we should also look at the costs, and hence the sample sizes, of the polls. While Today’s Chanakya interviewed a total of 38,984 respondents, CSDS-Lokniti interviewed only 22,295 respondents for its post-poll.

Hansa separately conducted an exit poll and a post-poll (an opinion poll conducted at people’s homes after the elections are over), and the combined sample size of these two polls was a mammoth 1,55,452.

While Hansa put up the best performance in terms of overall vote-share prediction, it should be noted that this came at a much higher cost than its competitors incurred.

If we look at the number of seats, however, which is what ultimately matters in an Indian election, the hands-down winner is Chanakya. By our metric, it has a total score of 94 (the lower, the better). Interestingly, this score is inflated primarily by three states that it got horribly wrong: West Bengal, Odisha and Tamil Nadu. Luckily for Chanakya, the NDA is weak in all three, so its errors there did not affect its overall prediction of the number of seats the NDA would get.

That it got these states wrong also suggests one thing: domain knowledge is necessary to do Bayesian forecasting. Could Chanakya have got these states wrong because of a lack of domain knowledge? Chanakya is based in Delhi and has done spectacularly well in the northern states.

Again, the fact that it got wrong precisely the states where it lacks domain knowledge perversely suggests that it is likely to have used a Bayesian forecasting system.

In the post-election chatter on social media, someone declared that Chanakya is India’s Nate Silver, and this thought got a fair bit of traction. The advantage Chanakya had, however, was that it was commissioning its own survey, a luxury Silver didn’t have when he was predicting the US presidential elections in 2008 and 2012.

Based on available data, Election Metrics still sticks to its prediction that there is no Nate Silver in India.

Postscript: Sharon Bertsch McGrayne has written a masterful history of Bayes’ theorem titled *The Theory That Would Not Die*.