# Against All Odds — Upsets

It’s a great time to be a Dayton fan! It’s the first time the school has reached the Sweet Sixteen since before all the Dayton fans I know were born…and they did it as an 11-seed! Their game against Ohio State was the first game of the tournament to tip. A little over two hours later, everyone’s brackets were busted. After watching the tournament this weekend, I felt like there were a lot of upsets this year. (Or at the very least my bracket was getting busted up pretty quickly.) But are there really more upsets this year than normal?

First, I’m going to define an upset as any lower seed (the team with the bigger seed number) beating a higher seed. Personally, I don’t think 8/9, 4/5, and 1/2 match-ups should count as upsets, but for this analysis I’m going to count them as possible upsets. So how many upsets were there this year? Through two rounds, there have been 13. That’s one fewer than last year at this time, and right at the average (if you round). So this is a rather average year. Three of the four 1-seeds are still alive, which isn’t far from what you might expect.

NCAA Tournament Upsets By Year

Historically, 1999 had the most upsets with 19 in the first weekend of play. Nothing that year really stuck out, though, the way Florida Gulf Coast reaching the Sweet Sixteen as a 15-seed did last year. 1991 had the fewest upsets in the first weekend with just nine. The upsets throughout the years appear to be random noise fluctuating around an average of 12.8 upsets (out of 48 games played) in the first two rounds per year. The conclusion you can draw from this is that the number of upsets is rather consistent over the years, with not much systematic change from year to year.

Thirteen upsets is a lot; it’s more than a quarter of all the games played this weekend (13 of 48, about 27%). Last week, I posted the probability that a seed would win in the first round of the tournament. This was a linear relationship starting with an almost certain probability for the 1-seeds and then going to a 50/50 split for an 8/9 game. On the surface it doesn’t seem like more than a quarter of the games should be upsets, but if you look at all the possibilities it will make more sense.

Let’s look at Dayton’s 11-seed. An 11-seed has a historical 34% chance of upsetting a 6-seed in the first round, but since there are four distinct 6/11 match-ups each year, there’s only a 19% chance that all four 6-seeds will win their first-round games. In fact, the most likely scenario, at about 39%, is that exactly one 11-seed upsets a 6-seed. This year there were two 6-11 upsets, which is the second most likely scenario at 30% (still more likely than getting no upsets at all).

The following table depicts the probability of different scenarios for each first-round seeding combination. All the green area on the table is why everyone’s brackets bust every year. Keep reading if you are interested in the math; otherwise you might want to bounce, because it’s gonna get boring.

NCAA Tournament First Round Win Probability by Seed

Still here? Ok. The basis for determining the probability of each upset scenario is the binomial distribution. A binomial distribution requires two things: a binary outcome (hence the bi- prefix) and a fixed probability of that outcome in every trial. The simplest example of a binomial distribution is counting heads in a series of coin flips. The probability function is given as

$P(X = k) = \binom{n}{k} p^k q^{n-k}$

The $\binom{n}{k}$ term is the number of combinations of n items taken k at a time.
$p$ is the probability of the event happening, here the lower seed’s win probability
$q = 1 - p$ is the complement of the event, so in this case it is the probability of the lower seed losing
$n$ will be 4 since there are four games for each seed match-up
$k$ will be 0 to 4 depending on how many upsets we are looking for.

The probability that two (and only two) 11-seeds upset 6-seeds is then

$P(X = 2) = \binom{4}{2} (0.34)^2 (1 - 0.34)^2 = 6 \times 0.34^2 \times 0.66^2 = 0.302 \approx 30\%$
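If you would rather let the computer do the arithmetic, here is a minimal Python sketch of the same calculation for every possible number of 6/11 upsets. The function name `upset_probability` is just my label for illustration, and it assumes the four games are independent and all share the 34% historical upset rate.

```python
from math import comb

def upset_probability(k, n=4, p=0.34):
    """Probability of exactly k upsets in n independent games,
    each with upset probability p (binomial distribution)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability of 0, 1, 2, 3, and 4 upsets among the four 6/11 games
distribution = [upset_probability(k) for k in range(5)]
for k, prob in enumerate(distribution):
    print(f"{k} upsets: {prob:.1%}")
```

Swapping in a different `p` gives the scenarios for any other seed match-up, since every first-round pairing also has four games.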

You can derive this equation by writing out probability trees (if you remember those from high school math). The problem with that method is that for each outcome (number of upsets = 0, 1, 2, 3, or 4) you have to write out every combination of games that produces it. This can get unwieldy quickly. Binomial distributions can be used for many different applications, including the aforementioned coin flip, the likelihood of combinations of boy/girl babies, the probability that the ‘better’ team loses a 7-game playoff series, the likely number of winners for the lottery…so this will rear its head again for NHL, NBA, or MLB playoffs.
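To see what the probability tree looks like without drawing it, here is a rough Python sketch that walks every branch for the four 6/11 games and adds up the branches by number of upsets. The variable names are mine, and it assumes independent games that all use the 34% upset rate.

```python
from itertools import product

p_upset = 0.34  # single-game 6-vs-11 upset probability quoted in the text

# Each of the four games is either an upset (1) or not (0),
# so the tree has 2**4 = 16 branches.
tree_totals = {k: 0.0 for k in range(5)}
branch_counts = {k: 0 for k in range(5)}
for branch in product([0, 1], repeat=4):
    branch_probability = 1.0
    for upset in branch:
        branch_probability *= p_upset if upset else 1 - p_upset
    tree_totals[sum(branch)] += branch_probability
    branch_counts[sum(branch)] += 1

for k in range(5):
    print(f"{k} upsets: {branch_counts[k]} branches, probability {tree_totals[k]:.3f}")
```

The branch counts (1, 4, 6, 4, 1) are exactly the combination term in the binomial formula, which is why the formula saves you from writing the tree out by hand.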

# 2014 NCAA Tournament Predictions — Monte Carlo

The process of simulating the NCAA tournament involves two steps. The first is choosing the statistical prediction model that determines the outcome of a game. The second is simulating the entire tournament. Simulating the tournament many times and keeping track of each outcome is called a Monte Carlo simulation. This one simulates the entire tournament 10,000 times and tabulates the results from each round.
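For the curious, the two steps can be sketched in a few lines of Python. This is not the actual model behind my bracket: the `win_probability` function, the eight-team bracket, and the ratings are all made up for illustration, and the sketch assumes the field size is a power of two.

```python
import random
from collections import Counter

def win_probability(rating_a, rating_b):
    """Hypothetical game model: logistic function of the rating gap."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 10.0))

def simulate_tournament(bracket, ratings, rng):
    """Play one single-elimination bracket; return the winners of each round."""
    field = list(bracket)
    winners_by_round = []
    while len(field) > 1:
        # Adjacent teams in the list play each other
        field = [
            a if rng.random() < win_probability(ratings[a], ratings[b]) else b
            for a, b in zip(field[0::2], field[1::2])
        ]
        winners_by_round.append(field)
    return winners_by_round

def monte_carlo(bracket, ratings, n_sims=10_000, seed=0):
    """Repeat the tournament n_sims times and tabulate how often
    each team wins its game in each round."""
    rng = random.Random(seed)
    n_rounds = len(bracket).bit_length() - 1  # assumes a power-of-two field
    tallies = [Counter() for _ in range(n_rounds)]
    for _ in range(n_sims):
        for round_index, winners in enumerate(simulate_tournament(bracket, ratings, rng)):
            tallies[round_index].update(winners)
    return [{team: tally[team] / n_sims for team in bracket} for tally in tallies]

# Made-up ratings, listed in bracket order (A plays B, C plays D, ...)
ratings = {"Team A": 30, "Team B": 12, "Team C": 20, "Team D": 18,
           "Team E": 25, "Team F": 14, "Team G": 22, "Team H": 16}
bracket = list(ratings)

results = monte_carlo(bracket, ratings)
for team, title_probability in results[-1].items():
    print(f"{team}: wins it all in {title_probability:.1%} of simulations")
```

The last entry of `results` holds each team’s share of championships, and the earlier entries hold the share of simulations in which each team won its game in that round.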

On to what the computer says [entire bracket png]! Surprise, surprise: the computer says almost everything that you might surmise by using your gut. It predicts Florida winning the entire tournament with almost a 20% probability, and it also predicts very few upsets. Everything is pretty much what you would expect. As I said earlier in the week, the committee does a pretty good job seeding everybody overall.

There are a few ‘undervalued’ teams the simulation has picked. Villanova, a 2-seed, is projected to go to the Final Four. All the other Final Four teams are 1-seeds, which is the seed with the highest probability to reach the Final Four historically. The most dramatic prediction, I think, is the North Dakota State (NDSt) upset. NDSt is a 12-seed, and 12-seeds are the most frequently undervalued seed, performing well above their expected winning percentage. When betting on an upset, I’d be looking for a lower seed’s win probability to be higher than average for that seed. However, NDSt’s win probability is not only higher than average, it’s higher than Oklahoma’s win probability. I’m putting NDSt to win the first game in all my brackets today.

The rest is pretty obvious, but I’m interested to see how this bracket does against what really happens, and how the simulations look after each round.

The bracket is broken into its four regions for viewing ease.  The winners are in green and the upset winners are in yellow.  The Final Four is the last graphic.