# Do MLB Playoff Odds Work?

Playoff odds [technically, postseason probabilities] are among the more fan-accessible advanced stats. They range from 0% to 100% and tell the fan the probability that a given team will reach the MLB postseason. They are determined by a Monte Carlo simulation that plays out the baseball season thousands of times [FanGraphs runs theirs 10,000 times]. If a team reaches the postseason in 5,000 of those simulations, the team is given a 50% probability of making the postseason. FanGraphs and Baseball Prospectus rerun these simulations every day, so the daily playoff odds can be collected and graphed to show the story of a team’s season.
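
To make the simulation idea concrete, here is a minimal Python sketch. It is a toy model and not the method FanGraphs or Baseball Prospectus actually use: it assumes every team has a fixed, made-up per-game win probability, ignores the schedule and head-to-head games, and simply sends the teams with the most wins to the postseason. The names `simulate_playoff_odds`, `win_prob`, and `n_slots` are hypothetical.

```python
import random

def simulate_playoff_odds(win_prob, n_slots, n_games=162, n_sims=10_000, seed=0):
    """Estimate each team's chance of finishing in the top `n_slots` by wins.

    win_prob: dict mapping team name -> assumed per-game win probability.
    Toy model: games are independent coin flips, there is no schedule,
    and ties in the standings are broken at random.
    """
    rng = random.Random(seed)
    made_it = {team: 0 for team in win_prob}
    for _ in range(n_sims):
        # Simulate one season: count wins for every team.
        wins = {
            team: sum(rng.random() < p for _ in range(n_games))
            for team, p in win_prob.items()
        }
        # Rank by wins, using a random number as the tiebreaker.
        ranked = sorted(wins, key=lambda t: (wins[t], rng.random()), reverse=True)
        for team in ranked[:n_slots]:
            made_it[team] += 1
    # Fraction of simulated seasons in which each team qualified.
    return {team: count / n_sims for team, count in made_it.items()}

# Four hypothetical teams competing for two postseason spots.
odds = simulate_playoff_odds(
    {"A": 0.58, "B": 0.52, "C": 0.48, "D": 0.42}, n_slots=2, n_sims=2_000
)
for team, p in odds.items():
    print(f"{team}: {p:.1%}")
```

A real system would replace the fixed win probabilities with projected team strength and play out the actual remaining schedule, but the counting step at the end is the same idea as the 5,000-out-of-10,000 example.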

Above is a composite graph of three different types of teams. The Dodgers were identified as a good team early in the season, and their playoff odds stayed high because of consistently good play. The Brewers started their season off strong but had two steep drop-offs in early July and early September. Even though the Brewers had more wins than the Dodgers, the FanGraphs playoff odds never valued the Brewers more than the Dodgers. The Royals started slow and had a strong finish to secure their first postseason berth since 1985. All these seasons are different, and their stories are captured by the graph. Generally, this is how fans will remember their team’s season: by the storyline.

Since the playoff odds change every day and become either 100% or 0% by the end of the season, the projections need to be compared to the actual results at the end of the season. A playoff probability of 85% means that teams with the given parameters will make the postseason 85% of the time.

I gathered the FanGraphs playoff odds for the entire 2014 season and put the predictions into buckets in 10% increments of playoff probability. If the model is well calibrated, about 20% of the predictions in the 20% bucket should belong to teams that go on to make the postseason. The same applies to every bucket: 0%, 10%, 20%, etc.
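
The bucketing step could look something like the sketch below. It assumes the data is a list of `(probability, made_postseason)` pairs, one per team per day, and that each prediction goes into the nearest 10% bucket; both are my assumptions for illustration, and the sample numbers are invented, not the real FanGraphs data.

```python
from collections import defaultdict

def calibration_table(predictions):
    """Group (probability, made_postseason) pairs into 10% buckets.

    Returns {bucket_percent: (n_predictions, observed_rate)}.
    Assumption: a prediction is assigned to the nearest 10% (0, 10, ..., 100).
    """
    buckets = defaultdict(list)
    for prob, made_it in predictions:
        bucket = int(prob * 10 + 0.5) * 10  # nearest multiple of 10, in percent
        buckets[bucket].append(1 if made_it else 0)
    return {
        b: (len(outcomes), sum(outcomes) / len(outcomes))
        for b, outcomes in sorted(buckets.items())
    }

# Hypothetical daily predictions, NOT real FanGraphs data.
sample = [(0.18, False), (0.22, False), (0.21, True), (0.19, False), (0.24, False),
          (0.81, True), (0.79, True), (0.83, False), (0.78, True), (0.80, True)]
table = calibration_table(sample)
for bucket, (n, rate) in table.items():
    print(f"{bucket:3d}% bucket: n={n}, observed={rate:.0%}")
```

For a well-calibrated model, the observed rate in each bucket should land close to the bucket’s label once there are enough predictions in it.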

Above is a chart comparing the buckets to the actual results. Since this uses only one year of data and only 10 teams made the playoffs, the results don’t quite match up to the buckets. The overall pattern is encouraging, but I would insist on looking at multiple years before drawing any real conclusions. The results for any given year are subject to the ‘stories’ of the 30 teams that play that season. For example, the 2014 season did have a team like the 2011 Red Sox, who failed to make the postseason after having a >95% playoff probability. This is colloquially considered an epic ‘collapse’, but a 95% probability not only implies there’s a chance the team might fail, it PREDICTS that 5% of such teams will fail. So there would be nothing wrong with the playoff odds model if ‘collapses’ like that of the Red Sox only happened once in a while.

The playoff probability model relies on an expected winning percentage. Unlike a binary variable such as making the postseason, winning percentage is continuous, which makes the model easier to evaluate. Most teams stay close to their initial predicted winning percentage and finish the season very near the prediction. Not every prediction is correct, but if enough of them are good, the predictive model is useful. Teams also aren’t static: bad teams can become worse by trading away players at the trade deadline, and other teams can improve by acquiring those players. There are also factors, like injuries or player improvement, that the prediction system can’t account for because they are unpredictable by definition. The following line graph allows you to pick a team and see how it did relative to its predicted winning percentage. Some teams are spot on, but a few, like the Orioles or Red Sox, are really far off.

The residual distribution [the actual values – the predicted values] should be a normal distribution centered around 0 wins. The following graph shows the residual distribution in number of wins. The teams in the middle had actual results close to the predicted values, while the values on the edges of the distribution are more extreme deviations. You would expect improved teams to balance out the teams that got worse. However, the graph is skewed toward the teams that became much worse, which suggests there is some mechanism that makes bad teams lose more often. This is where attitude, trades, and changes in strategy would come into play. I’d go so far as to say this is evidence that a team’s soft skills, like chemistry, break down.
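
For anyone who wants to repeat the residual check, here is a rough sketch of the arithmetic. It assumes the predicted and actual winning percentages are available as two parallel lists and converts the difference into wins over a 162-game season. The function names and the numbers are made up for illustration; they are not the 2014 values.

```python
from statistics import mean, pstdev

def residuals_in_wins(predicted_pct, actual_pct, n_games=162):
    """Actual minus predicted winning percentage, expressed in wins."""
    return [(a - p) * n_games for p, a in zip(predicted_pct, actual_pct)]

def skewness(values):
    """Population skewness: negative means a longer tail of underperformers."""
    m, s = mean(values), pstdev(values)
    return mean(((v - m) / s) ** 3 for v in values)

# Hypothetical winning percentages, NOT the 2014 FanGraphs numbers.
predicted = [0.550, 0.520, 0.500, 0.480, 0.460]
actual = [0.560, 0.525, 0.505, 0.470, 0.400]
res = residuals_in_wins(predicted, actual)
print([round(r, 1) for r in res], round(skewness(res), 2))
```

A skewness near zero would match the balanced picture described above, while a clearly negative value corresponds to a graph pulled toward the teams that got much worse.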

Since I don’t have access to more years of FanGraphs projections or other projection systems, I can’t do a full evaluation of the team projections. More years of playoff odds should yield probability buckets that reflect the expectation much better than a single year. This would allow for more than 10 different paths to the postseason to be present in the data. In the absence of this, I would say the playoff odds and predicted win expectancy are on the right track and a good predictor of how a team will perform.

# Statistics — Probability vs. Odds

Probability and odds are two basic statistical terms that describe the likelihood that an event will occur. They are often used interchangeably in casual conversation and even in published material. However, they are not mathematically equivalent, because they look at likelihood in different contexts. In everyday conversation, when no numbers are given, the two terms are synonymous: if an event has a high probability, then it has high odds of happening. The incorrect usage arises when a person ascribes a mathematical value to the odds or probability they are discussing. If you aren’t quite sure what the exact mathematical difference is, hopefully this will clear it up for you.

Probability is defined as the fraction of desired outcomes out of all possible outcomes. It takes a value between 0 and 1, where 0 represents an impossible event and 1 an inevitable event. Probabilities are usually given as percentages [e.g., a 50% probability that a coin will land on HEADS]. Odds can take any value from zero to infinity, and they represent the ratio of desired outcomes to the field. Odds can be given in two different ways: ‘odds in favor’ and ‘odds against’. ‘Odds in favor’ describe the odds that an event will occur, while ‘odds against’ describe the odds that it will not occur. If you are familiar with gambling, ‘odds against’ are what Vegas gives as odds. More on that later. For the coin flip, the odds in favor of a HEADS outcome are 1:1, not 50%.

## Visual Math

Simple probability of event A occurring is mathematically defined as:

$P(A) = \frac{Number \ of \ Event \ A}{Total \ Number \ of \ Events}$

The best way to illustrate this is with the classic marbles-in-a-bag example. The graphic below depicts the marbles in an opaque bag from which one marble will be drawn. There are 6 blue, 3 red, 2 yellow, and 1 green, for a total of 12 marbles in the bag.

The probability of pulling a red marble is calculated by dividing the number of red marbles by the total number of marbles:

$P(RED) = \frac{3 \ RED \ marbles}{12 \ TOTAL \ marbles} = 25\%$.

Notice that the probability calculation includes the red marbles in the denominator, because probability considers the entire event space. Odds, on the other hand, are the ratio of favorable outcomes to unfavorable outcomes, so the denominator contains ONLY the marbles that aren’t favorable outcomes. Written as fractions, the two values are completely different: the probability is 1/4 while the odds in favor are 1/3. You can see how mistakenly interchanging the terms could give the wrong information. The ‘odds in favor’ of RED are calculated by dividing the number of red marbles by the number of marbles that are not red:

$Odds\_Favor(RED) = \frac{3 \ RED \ marbles}{9 \ NOT \ RED \ marbles} = 1:3$.

To find the ‘odds against’, simply flip the odds in favor upside down. This describes the odds of the event not occurring:

$Odds\_Against(RED) = \frac{9 \ NOT \ RED \ marbles}{3 \ RED \ marbles} = 3:1$.
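
The marble counts above can be checked with a few lines of Python. This is just a sketch of the arithmetic; the function names are my own, and `Fraction` is used so the results print as exact ratios instead of decimals.

```python
from fractions import Fraction

# Marble counts from the bag described above.
marbles = {"blue": 6, "red": 3, "yellow": 2, "green": 1}

def probability(color):
    """Favorable outcomes over ALL outcomes."""
    return Fraction(marbles[color], sum(marbles.values()))

def odds_in_favor(color):
    """Favorable outcomes over UNFAVORABLE outcomes."""
    favorable = marbles[color]
    return Fraction(favorable, sum(marbles.values()) - favorable)

def odds_against(color):
    """Unfavorable outcomes over favorable outcomes."""
    return 1 / odds_in_favor(color)

print(probability("red"), odds_in_favor("red"), odds_against("red"))
```

The only difference between `probability` and `odds_in_favor` is the denominator, which is exactly the distinction the marble example is meant to show.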

## Gambling

‘Odds against’ are commonly used in the context of gambling. When you hear that the Seattle Seahawks’ Vegas odds to win the Super Bowl are 5:1 [Retrieved 9/19/2014], the 5:1 refers to the ‘odds against’ Seattle winning the Super Bowl. Odds against of 5:1 mean 5 unfavorable outcomes for every 1 favorable outcome, so the probability of Seattle winning the Super Bowl would be 1/(5 + 1) = 1/6, or 16.7%.

Vegas odds are technically payoff odds, because they describe the payout if you were to win the bet. A winning bet on the Seahawks would pay \$5 for every \$1 bet on Seattle winning the Super Bowl. They aren’t true odds, since no one is really sure what the true odds are; you can’t simply count and weigh the possibilities like with the bag of marbles. The payoff increases as the event becomes less likely. If you could create a reliable predictive model that told you the Seahawks actually had a 20% probability of winning the Super Bowl, you could bet on the Seahawks knowing that their actual probability of winning is better than what Vegas is giving them. And if you made enough bets like this, you could beat Vegas.
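
The “beat Vegas” reasoning comes down to expected value, which can be sketched as follows. This assumes the simplest possible bet, where you win the payoff odds times your stake or lose the stake, and it ignores the bookmaker’s margin. The function names are hypothetical.

```python
def implied_probability(odds_against):
    """Probability implied by 'odds against' of N:1, e.g. 5:1 -> 1/6."""
    return 1 / (odds_against + 1)

def expected_profit(true_prob, odds_against, stake=1.0):
    """Average profit per bet: win `odds_against * stake` or lose `stake`."""
    return true_prob * odds_against * stake - (1 - true_prob) * stake

# Probability implied by the 5:1 Seahawks line.
print(f"{implied_probability(5):.1%}")
# Expected profit per $1 bet if a model says the true probability is 20%.
print(f"{expected_profit(0.20, 5):+.2f}")
```

When the true probability equals the implied 1/6, the expected profit is zero; any edge over that, like the 20% in the example, makes the bet profitable on average.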

## Mathematical Relationship

I stated earlier that probability and odds were colloquially interchangeable when values aren’t given. This is true, because the two are mathematically related. Odds can be computed from probability and probability from odds.

$P(A) = \frac{Odds\_Favor(A)}{1 + Odds\_Favor(A)}$

$Odds\_Favor(A) = \frac{P(A)}{1 - P(A)}$

Using the RED marble example [P(RED) = 1/4 and Odds_Favor(RED) = 1/3] we can demonstrate how these are equivalent:

$P(RED) = \frac{1/3}{1 + 1/3} = \frac{1/3}{4/3} = \frac{1}{4}$

$Odds\_Favor(RED) = \frac{1/4}{1 - 1/4} = \frac{1/4}{3/4} = \frac{1}{3}$
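
The same two conversions can be written as small Python functions. This is a sketch with names of my own choosing; it uses exact fractions so the round trip comes back to the starting value without rounding error.

```python
from fractions import Fraction

def odds_to_probability(odds_favor):
    """P(A) = Odds_Favor(A) / (1 + Odds_Favor(A))."""
    return odds_favor / (1 + odds_favor)

def probability_to_odds(p):
    """Odds_Favor(A) = P(A) / (1 - P(A)); undefined when p == 1."""
    return p / (1 - p)

p_red = Fraction(1, 4)
o_red = probability_to_odds(p_red)
print(o_red, odds_to_probability(o_red))

# A fair coin: probability 1/2 corresponds to odds in favor of 1:1.
print(probability_to_odds(Fraction(1, 2)))
```

Because each function undoes the other, either number carries the same information; the mistake is only in quoting one while calling it the other.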

# Probability and Sunday Night Baseball

There’s nothing I like more than a bases-loaded, no-outs situation in baseball. It comes with what might be my favorite stat that no one realizes: there’s around a 15% chance that the team with the bases loaded will not score at all that inning! 15% might not seem like much, but over the course of a season it happens often.

Let’s set the scene: Bottom of the ninth, down by two, the Pirates knock in a run and get McCutchen on 1st with no outs to move within one run of the Cardinals.

This is a win probability graph FanGraphs has for every game. I’m not entirely sure what all they consider when calculating a win probability, but it mirrors the data I have, so there’s not much to discuss there. Clearly, the closest they came to winning the game was after Barmes walked, putting Alvarez, the winning run, on 2nd.

source: FanGraphs

According to my run probability calculations for 2013, the probability of scoring at least one run with the bases loaded and no outs was lower than with runners on second and third or on first and third and no outs [Probabilities: 123: 77.9%, 1_3: 82.4%, _23: 90.9%]. The advantage of having the bases loaded is that a walk or HBP brings a runner home, but the downside is that there is an easy force at home. That hurt the Pirates in this instance, because Mercer didn’t hit the ball past the pitcher’s mound, making for an easy 1-2-3 double play.