## Debunking Two Nate Silver Myths

I followed our election pretty closely. My favorite source of information was Nate Silver. He's a smart guy, and I love the analysis that he does. He's using solid math in a good way to produce excellent results. But in the aftermath of the election, I've seen a lot of bad information going around about him, his methods, and his result.

First: I keep seeing proclamations that "Nate Silver proves that big data works".

Rubbish.

There is nothing big data about Nate's methods. He's using straightforward Bayesian methods to combine data, and the number of data points is remarkably small.

Big data is one of the popular jargon keywords that people use to appear smart. But it does actually mean something. Big data is using massive quantities of information to find patterns: using a million data points isn't really big data. Big data means terabytes of information, and billions of data points.

When I was at Google, I did log analysis. We ran thousands of machines every day on billions of log records (I can't say the exact number, but it was in excess of 10 billion records per day) to extract information. It took a data center with 10,000 CPUs running full-blast for 12 hours a day to process a single day's data. Using that data, we could extract some obvious things - like how many queries per day for each of the languages that Google supports. We could also extract some very non-obvious things that weren't explicitly in the data, but that were inferable from the data - like probable network topologies of the global internet, based on communication latencies. That's big data.

For another example, look at this image produced by some of my coworkers. At foursquare, we get about five million points of checkin data every day, and we've got a total of more than 2 1/2 billion data points. By looking at average checkin densities, and then comparing that to checkin densities after the hurricane, we can map out precisely where in the city there was electricity, and where there wasn't. We couldn't do that by watching one person, or a hundred people. But by looking at the patterns in millions and millions of records, we can. That is big data.

This doesn't take away from Nate's accomplishment in any way. He used data in an impressive and elegant way. The fact is, he didn't need big data to do this. Elections are determined by aggregate behavior, and you just don't need big data to predict them. The data that Nate used was small enough that a person could do the analysis of it with paper and pencil. It would be a huge amount of work to do by hand, but it's just nowhere close to the scale of what we call big data. And trying to do big data would have made it vastly more complicated without improving the result.

Second: there are a bunch of things like this.

> The point that many people seem to be missing is that Silver was not simply predicting who would win in each state. He was publishing the odds that one or the other candidate would win in each statewide race. That's an important difference. It's precisely this data, which Silver presented so clearly and blogged about so eloquently, that makes it easy to check on how well he actually did. Unfortunately, these very numbers also suggest that his model most likely blew it by paradoxically underestimating the odds of President Obama's reelection while at the same time correctly predicting the outcomes of 82 of 83 contests (50 state presidential tallies and 32 of 33 Senate races).
>
> Look at it this way, if a meteorologist says there's a 90% chance of rain where you live and it doesn't rain, the forecast wasn't necessarily wrong, because 10% of the time it shouldn't rain - otherwise the odds would be something other than a 90% chance of rain. One way a meteorologist could be wrong, however, is by using a predictive model that consistently gives the incorrect probabilities of rain. Only by looking at the odds the meteorologist gave and comparing them to actual data could you tell in hindsight if there was something fishy with the prediction.

Bzzt. Sorry, wrong.

There are two main ways of interpreting probability data: frequentist, and Bayesian.

In a frequentist interpretation, saying that an outcome of an event has a probability of X% means that if you were to run an infinite series of repetitions of the event, then on average the outcome would occur in X out of every 100 repetitions.
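A quick simulation shows what that claim looks like in practice. This is just a sketch: the function name `relative_frequency`, the choice of X = 60, and the trial counts are all mine, and a finite run can only approximate the infinite series the definition talks about.

```python
import random


def relative_frequency(p, trials, seed=0):
    """Fraction of simulated repetitions in which the outcome occurred."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = sum(1 for _ in range(trials) if rng.random() < p)
    return hits / trials


# Hypothetical outcome with probability 60% on each repetition.
for trials in (100, 10_000, 1_000_000):
    print(trials, relative_frequency(0.6, trials))
```

As the number of repetitions grows, the printed fraction should settle closer and closer to 0.6, which is exactly what the frequentist reading says the 60% means.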

The Bayesian interpretation doesn't talk about repetition or observation. What it says is: for any specific event, it will have one outcome. There is no repetition. But given the current state of information available to me, I can have a certain amount of certainty about whether or not the event will occur. Saying that I assign probability P% to an event doesn't mean that I expect my prediction to fail (100-P)% of the time. It just means that given the current state of my knowledge, I expect a particular outcome, and the information I know gives me that degree of certainty.

Bayesian statistics and probability are all about state of knowledge. The fundamental, defining theorem of Bayesian statistics is Bayes' theorem, which tells you, given your current state of knowledge and a new piece of information, how to update your knowledge based on what the new information tells you. Getting more information doesn't change anything about whether or not the event will occur: it will occur, and it will have either one outcome or the other. But new information can allow you to improve your prediction and your certainty of that prediction's correctness.
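For reference, here is the theorem in its usual form. The symbols are labels I'm picking for illustration: $H$ is a hypothesis (say, "it rains today") and $E$ is the new piece of information.

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$

$P(H)$ is the state of knowledge before seeing $E$, and $P(H \mid E)$ is the updated state of knowledge afterwards.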

The author that I quoted above is being a frequentist. In another section of his article, he's more specific:

> ...The result is P = 0.199, which means there's a 19.9% chance that it rained every day that week. In other words, there's an 80.1% chance it didn't rain on at least one day of the week. If it did in fact rain every day, you could say it was the result of a little bit of luck. After all, 19.9% isn't that small a chance of something happening.

That's a frequentist interpretation of the probability - which makes sense, since as a physicist, the author is mainly working with repeated experiments - which is a great place for frequentist interpretation. But looking at the same data, a Bayesian would say: "I have a 19.9% certainty that it will rain today". Then they'd go look outside, see the clouds, and say "Ok, so it looks like rain - that means that I need to update my prediction. Now I'm 32% certain that it will rain". Note that nothing about the weather has changed: it's not true that before looking at the clouds, 80.1 percent of the time it wouldn't rain, and after looking, that changed. The actual fact of whether or not it will rain on that specific day didn't change.
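Here's a minimal sketch of that kind of update using Bayes' theorem. The 19.9% prior comes from the example; the two cloud likelihoods are numbers I made up so that the result lands near 32%, and the function name `bayes_update` is mine too.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return P(hypothesis | evidence) using Bayes' theorem."""
    numerator = p_evidence_if_true * prior
    evidence = numerator + p_evidence_if_false * (1.0 - prior)
    return numerator / evidence


prior_rain = 0.199        # certainty before looking outside
p_clouds_if_rain = 0.9    # assumed for illustration only
p_clouds_if_dry = 0.475   # assumed for illustration only

posterior_rain = bayes_update(prior_rain, p_clouds_if_rain, p_clouds_if_dry)
print(f"{posterior_rain:.2f}")  # prints 0.32
```

The only things that went into the calculation were the prior and how strongly clouds go along with rain. The weather itself never appears as an input, which is the point: the update changes the certainty, not the event.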

Another way of looking at this is to say that a frequentist believes that a given outcome has an intrinsic probability of occurring, and that our attempts to analyze it just bring us closer to the true probability; whereas a Bayesian says that there is no such thing as an intrinsic probability, because every event is different. All that changes is our ability to make predictions with confidence.

One last metaphor, and I'll stop. Think about playing craps, where you're rolling two six-sided dice. For a particular die, a frequentist would say "A fair die has a 1 in 6 chance of coming up with a 1". A Bayesian would say "If I don't know anything else, then my best guess is that I can be about 17% certain that a 1 will result from a roll." The result is the same - but the reasoning is different. And because of the difference in reasoning, you can produce different predictions.
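To see how the predictions can come apart, here's one possible sketch. It assumes a uniform Dirichlet prior over the six faces (one pseudo-count per face), which is my modeling choice and only one of many a Bayesian could make; `predict_face` and the sample rolls are invented for the example.

```python
from collections import Counter
from fractions import Fraction


def predict_face(face, rolls, sides=6):
    """Predicted probability of `face` on the next roll.

    Every face starts with one pseudo-count (a uniform Dirichlet prior),
    so with no data each face gets 1/sides. Observed rolls shift it.
    """
    counts = Counter(rolls)
    return Fraction(counts[face] + 1, len(rolls) + sides)


print(predict_face(1, []))                  # 1/6
print(predict_face(1, [1, 1, 3, 1, 6, 1]))  # 5/12
```

With no information, the answer is the same 1 in 6. After watching this particular die come up 1 four times in six rolls, the Bayesian's certainty moves to 5/12, while someone who takes "the die is fair" as given would still say 1 in 6.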

Nate Silver's predictions of the election are a beautiful example of Bayesian reasoning. He watched daily polls, and each time a new poll came out, he took the information from that poll, weighted it according to the historical reliability of that poll in that situation, and then used that to update his certainty. So based on his data, Nate was 90% certain that his prediction was correct.
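To make the update-per-poll idea concrete, here's a toy sketch. It is not Nate's actual model, which I've only described in outline. I'm assuming a normal belief about the vote margin, a standard normal-normal update for each poll, and a made-up reliability factor that inflates the variance of less trusted polls. Every number in it is invented.

```python
import math


def update(mean, var, poll_margin, poll_var):
    """Combine the current belief with one poll (normal-normal update)."""
    precision = 1.0 / var + 1.0 / poll_var
    new_mean = (mean / var + poll_margin / poll_var) / precision
    return new_mean, 1.0 / precision


def win_probability(mean, var):
    """P(true margin > 0) when the belief is Normal(mean, var)."""
    return 0.5 * (1.0 + math.erf(mean / math.sqrt(2.0 * var)))


# Hypothetical polls: (margin in points, sampling variance, reliability in (0, 1])
polls = [(2.0, 9.0, 1.0), (-1.0, 9.0, 0.5), (3.0, 16.0, 0.8)]

mean, var = 0.0, 100.0  # vague starting belief: no idea who is ahead
for margin, poll_var, reliability in polls:
    # A less reliable poll is treated as noisier, so it moves the belief less.
    mean, var = update(mean, var, margin, poll_var / reliability)
    print(f"margin {mean:+.2f}, certainty of a win {win_probability(mean, var):.2f}")
```

Each poll shrinks the variance a little, so the certainty attached to the prediction grows as information accumulates, even though the election itself will have exactly one outcome.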

## Fuzzy Logic vs Probability

In the comments on my last post, a few people asked me to explain the difference between fuzzy logic and probability theory. It's a very good question.

The two are very closely related. As we'll see when we start looking at fuzzy logic, the basic connectives in fuzzy logic are defined in almost the same way as the corresponding operations in probability theory.
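As a preview of how close they are, here's a side-by-side sketch. I'm assuming the common min/max (Zadeh) definitions for the fuzzy connectives, which may not be the exact ones we end up using, and I'm assuming independent events on the probability side. The function names are mine.

```python
def fuzzy_and(a, b):
    """Zadeh conjunction: the smaller of the two degrees of truth."""
    return min(a, b)


def fuzzy_or(a, b):
    """Zadeh disjunction: the larger of the two degrees of truth."""
    return max(a, b)


def fuzzy_not(a):
    """Fuzzy negation: whatever truth is left over."""
    return 1.0 - a


def prob_and(p, q):
    """P(A and B), valid only when A and B are independent."""
    return p * q


def prob_or(p, q):
    """P(A or B), valid only when A and B are independent."""
    return p + q - p * q


def prob_not(p):
    """P(not A)."""
    return 1.0 - p


a, b = 0.75, 0.5
print(fuzzy_and(a, b), prob_and(a, b))  # 0.5 0.375
print(fuzzy_or(a, b), prob_or(a, b))    # 0.75 0.875
print(fuzzy_not(a), prob_not(a))        # 0.25 0.25
```

Negation is identical, and if you swap min/max for the product-based fuzzy connectives, the other two match the independent-event formulas as well. The arithmetic is nearly the same; what the numbers mean is not.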

The key difference is meaning.

There are two major schools of thought in probability theory, and they each assign a very different meaning to probability. I'm going to vastly oversimplify, but the two schools are the frequentists and the Bayesians.

First, there are the frequentists. To the frequentists, probability is defined by experiment. If you say that an event E has a probability of, say, 60%, what that means to the frequentists is that if you could repeat an experiment observing the occurrence or non-occurrence of E an infinite number of times, then 60% of the time, E would have occurred. That, in turn, is taken to mean that the event E has an intrinsic probability of 60%.

The other school is the Bayesians. To a Bayesian, the idea of an event having an intrinsic probability is ridiculous. You're interested in a specific occurrence of the event - and it will either occur, or it will not. So there's a flu going around; either I'll catch it, or I won't. Ultimately, there's no probability about it: it's either yes or no - I'll catch it or I won't. Bayesians say that probability is an assessment of our state of knowledge. To say that I have a 60% chance of catching the flu is just a way of saying that given the current state of our knowledge, I can say with 60% certainty that I will catch it.

In either case, we're ultimately talking about events, not facts. And those events will either occur, or not occur. There is nothing fuzzy about it. We can talk about the probability of my catching the flu, and depending on whether we pick a frequentist or Bayesian interpretation, that means something different - but in either case, the ultimate truth is not fuzzy.

In fuzzy logic, we're trying to capture the essential property of vagueness. If I say that a person whose height is 2.5 meters is tall, that's a true statement. If I say that another person whose height is only 2 meters is tall, that's still true - but it's not as true as it was for the person 2.5 meters tall. I'm not saying that in a repeatable experiment, the first person would be tall more often than the second. And I'm not saying that given the current state of my knowledge, it's more likely that the first person is tall than the second. I'm saying that both people possess the property tall - but in different degrees.
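One way to picture "tall in different degrees" is a membership function that maps a height to a degree of truth between 0 and 1. This is only a sketch: the linear ramp and the two thresholds (1.5 m and 2.5 m) are hypothetical values I picked, not anything canonical.

```python
def tall(height_m, not_tall_below=1.5, fully_tall_at=2.5):
    """Degree (0.0 to 1.0) to which a height counts as tall.

    Both thresholds are hypothetical; a straight-line ramp joins them.
    """
    if height_m <= not_tall_below:
        return 0.0
    if height_m >= fully_tall_at:
        return 1.0
    return (height_m - not_tall_below) / (fully_tall_at - not_tall_below)


print(tall(2.5))  # 1.0
print(tall(2.0))  # 0.5
```

Neither number is a probability. There's no experiment being repeated and no missing knowledge: both heights are known exactly, and the function just says how well the word "tall" fits each one.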

Fuzzy logic is using pretty much the same tools as probability theory. But it's using them to try to capture a very different idea. Fuzzy logic is all about degrees of truth - about fuzziness and partial or relative truths. Probability theory is interested in trying to make predictions about events from a state of partial knowledge. (In frequentist terms, it's about saying that I know that if I repeated this 100 times, E would happen in 60; in Bayesian, it's precisely a statement of partial knowledge: I'm 60% certain that E will happen.) But probability theory says nothing about how to reason about things that aren't entirely true or false.

And, in the other direction: fuzzy logic isn't particularly useful for talking about partial knowledge. If you allowed second-order logic, you could have fuzzy meta-predicates that described your certainty about crisp first-order predicates. But with first-order logic (which is really where we want to focus our attention), fuzzy logic isn't useful for the tasks where we use probability theory.

So probability theory doesn't capture the essential property of meaning (partial truth) which is the goal of fuzzy logic - and fuzzy logic doesn't capture the essential property of meaning (partial knowledge) which is the goal of probability theory.
