Do Homer Umps Exist?

It is well known to any decent baseball totals gambler, or astute baseball watcher for that matter, that the interpretation of the strike zone varies widely from umpire to umpire. In the past, pounding the under on the announcement of infamous umps like Bill Miller or Phil Cuzzi was a "free money" betting angle. Nowadays everyone is pretty sharp and the umpire tends to be mostly priced in, although you still have to account for the umpire to avoid making "fish" bets. As for the size of the impact an umpire can have on a baseball total, the most extreme umpires used to affect a total plus or minus about one run per game, a huge edge given the umpire was under-appreciated by the market. In recent years, through league review or other scrutiny, most of the bad outlier umps have been fixed, and nowadays an ump who would adjust one's total projection by more than half a run is rare.

The impact of the umpire on totals is well-known, but a more interesting question is whether so-called "homer" umps currently exist - that is, umpires who favor the home team more than the away team, or vice versa, in a repeatable manner. We will study this concept here by introducing an umpire model, then testing whether umpires who showed a home or road-favoring tendency in past games actually go on to favor the same side in future games.

There are a bunch of cool ways across the internet to model the "average" strike zone. To find the umpire's tendency, one then compares the number of strikes the umpire actually called versus the average expectation across the league. I will introduce my way here, which is, for lack of a better word, crude, but has always performed better at predicting the umpire's influence on future games than anything else I tried, probably because I don't know how to use the complicated techniques well enough to get full value out of them.

The first thing to realize in strike call prediction is that pitches over the heart of the plate, or well off the plate, don't matter. The probabilities are too close to 0% or 100% to affect one's prediction much. The important pitches are those near the edges of the zone. To simplify the problem, I throw out all pitches other than those within 0.4 feet of the edge of the official strike zone:

[Figure: Pitches Kept]
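As a rough sketch of that filter, the snippet below keeps a pitch only if it lies within 0.4 feet of the zone boundary. The names `px`, `pz`, `sz_bot` and `sz_top` (horizontal location, height, and the batter's zone bottom and top, all in feet) are assumptions borrowed from common pitch-tracking data, and treating the zone as a plain 17-inch-wide rectangle is one reading of "official strike zone", not necessarily the exact definition used here.

```python
import math

# Home plate is 17 inches wide; half-width in feet.
PLATE_HALF_WIDTH_FT = 17.0 / 12.0 / 2.0
# Band width around the zone edge, taken from the text.
EDGE_BAND_FT = 0.4


def distance_to_zone_edge(px, pz, sz_bot, sz_top, half_width=PLATE_HALF_WIDTH_FT):
    """Unsigned distance in feet from a pitch to the nearest point on the zone boundary."""
    dx = abs(px) - half_width            # > 0 means off the plate horizontally
    dz = max(sz_bot - pz, pz - sz_top)   # > 0 means above or below the zone
    if dx <= 0 and dz <= 0:
        # Inside the rectangle: distance to the closest of the four edges.
        return min(-dx, -dz)
    # Outside the rectangle: Euclidean distance to the rectangle.
    return math.hypot(max(dx, 0.0), max(dz, 0.0))


def is_marginal(px, pz, sz_bot, sz_top, band=EDGE_BAND_FT):
    """True if the pitch should be kept as a marginal (edge) pitch."""
    return distance_to_zone_edge(px, pz, sz_bot, sz_top) <= band
```

A pitch down the middle, such as `is_marginal(0.0, 2.5, 1.5, 3.5)`, is dropped, while one just off the plate at the same height is kept.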

I then create a second set of coordinates, relative to how far above/below and outside/inside the pitch is from the official strike zone. These coordinates are then flipped so that a pitch farther away from the dead-center of the zone is always positive, and a pitch nearer the dead-center of the zone is negative. Finally, because umpire behavior is different based on whether the pitch is above or below the zone, or inside or outside of the zone, I track that aspect as well. The final result is a model that uses five variables - how far off the edge of the zone the pitch is in the y-direction, how far off the edge of the zone the pitch is in the x-direction, whether the pitch is high or low, whether the pitch is inside or outside, and finally, the count of the pitch. The benefit of this approach is that the probability of a pitch being called a strike, in this coordinate system and with this subset of pitches, falls fairly neatly into simple polynomial equations that can be plugged into a logistic regression.
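To make the coordinate change concrete, here is a minimal sketch of how the five inputs might be built and fed through a logistic function. Everything specific in it is an assumption for illustration: the field names, the convention that `px` is measured from the catcher's view (so a right-handed batter stands on the negative side), the use of the zone midpoint to decide high versus low, and the particular quadratic terms. In practice the coefficients would be fit separately (or with interaction terms) for each high/low, inside/outside and count combination; none are supplied here.

```python
import math

PLATE_HALF_WIDTH_FT = 17.0 / 12.0 / 2.0  # half of the 17-inch plate, in feet


def edge_features(px, pz, sz_bot, sz_top, batter_is_right_handed, balls, strikes):
    """Edge-relative coordinates: positive = farther from the center of the zone."""
    x_off = abs(px) - PLATE_HALF_WIDTH_FT
    is_high = pz >= (sz_bot + sz_top) / 2.0
    z_off = (pz - sz_top) if is_high else (sz_bot - pz)
    # Assumed convention: negative px is the right-handed batter's side.
    is_inside = (px < 0) == batter_is_right_handed
    return {
        "x_off": x_off,
        "z_off": z_off,
        "is_high": is_high,
        "is_inside": is_inside,
        "count": f"{balls}-{strikes}",
    }


def polynomial_terms(f):
    """Hypothetical quadratic basis in the two edge-relative distances."""
    x, z = f["x_off"], f["z_off"]
    return [1.0, x, x * x, z, z * z, x * z]


def strike_probability(coefs, f):
    """Logistic link applied to the polynomial terms; coefs must be fit elsewhere."""
    score = sum(c * t for c, t in zip(coefs, polynomial_terms(f)))
    return 1.0 / (1.0 + math.exp(-score))
```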

With the model in hand, we can use it to predict the probability a pitch in these edge regions is called a strike, and compare it to the number of strikes each umpire actually called in each game, splitting the results by whether the home or road team is pitching. We can then calculate each umpire's called strikes gained over expectation for home pitches, away pitches, and all pitches combined.
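The bookkeeping for strikes gained over expectation might look like the sketch below. The input layout (one tuple per marginal call with umpire, pitching side, model probability and the actual call) is an assumption for illustration.

```python
from collections import defaultdict


def strikes_over_expectation(calls):
    """Sum actual minus expected called strikes per umpire, by pitching side.

    calls: iterable of (umpire, side, p_strike, called_strike), where side is
    "home" or "away" (the team pitching) and p_strike is the model probability.
    Returns {(umpire, key): {"n", "gained", "rate"}} for key in home/away/all.
    """
    totals = defaultdict(lambda: {"n": 0, "actual": 0.0, "expected": 0.0})
    for umpire, side, p_strike, called_strike in calls:
        for key in (side, "all"):
            t = totals[(umpire, key)]
            t["n"] += 1
            t["actual"] += 1.0 if called_strike else 0.0
            t["expected"] += p_strike
    result = {}
    for k, t in totals.items():
        gained = t["actual"] - t["expected"]
        result[k] = {"n": t["n"], "gained": gained, "rate": gained / t["n"]}
    return result
```

The `rate` field is strikes gained per marginal call, which is the form used when saying an umpire calls a strike some percent more often than average.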

As gamblers, we are most interested in whether we can use past statistics, in this case homer tendency, to predict the future. This means that when testing whether our umpire tendency measurement has any value, we need to measure each umpire using only statistics available up to the point in time the game took place. A question is how far back in time we should go in compiling these statistics. There are only about 30 marginal calls per game, so if we choose a sample too short, we won't have enough calls to get an accurate picture of the umpire's true tendency. However, if we use too long a sample, the umpire's tendency may have changed over the time period in question, meaning we would be using statistics that no longer really reflect the true tendency of the umpire.

My studies have shown that umpires tend to be fairly consistent from year to year, so I prefer taking a large sample, and use all games within two calendar years (less than 730 days) before the game took place. Using these past stats one can calculate a past strikes gained above/below average statistic for each umpire, as it was at the time of each game.
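A point-in-time version of that statistic could be computed along these lines. The per-game history layout is an assumption, and the 500-call minimum is the sample cutoff applied in the analysis that follows.

```python
from datetime import date

WINDOW_DAYS = 730  # two calendar years, per the text


def trailing_tendency(history, game_date, window_days=WINDOW_DAYS, min_calls=500):
    """Strikes gained per marginal call over the trailing window, as of game_date.

    history: list of (game_date, marginal_calls, strikes_gained) for one umpire.
    Only games strictly before game_date and less than window_days old count,
    so nothing from the game being predicted leaks into the statistic.
    Returns None if the umpire has fewer than min_calls marginal calls.
    """
    n_calls = 0
    gained = 0.0
    for past_date, calls, strikes_gained in history:
        age_days = (game_date - past_date).days
        if 0 < age_days < window_days:
            n_calls += calls
            gained += strikes_gained
    if n_calls < min_calls:
        return None
    return gained / n_calls
```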

Using this statistic, and limiting our sample to games from 2015 to 2018 where we have at least 500 umpire-called pitches in past games (which throws out most of 2015), we find that to predict the number of strikes called above average on marginal pitches in the subsequent game, one should use about 70% of the umpire's tendency in past games. So for example, an umpire who in past games called a strike 5% more often than the average umpire on these marginal pitches, after adjusting for the pitch location and count, would be expected to call the same strike 3.5% more often in the current game. Even without adjusting for how likely each marginal pitch in the current game is to be called a strike (i.e. just looking at raw called strike percentage in the marginal area), we would still carry over about 70% of the difference. This is evidence that the approach used is measuring the umpire's tendency effectively.
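The 70% figure is a regression slope: next-game strikes gained per call regressed on the past tendency. A minimal sketch of both the slope estimate and its use is below. The unweighted least-squares fit is an assumption, since the text does not say whether games were weighted by number of calls.

```python
CARRY_OVER = 0.70  # approximate slope reported in the text


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (with an intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def projected_rate(past_rate, carry_over=CARRY_OVER):
    """Expected strikes gained per marginal call in the next game."""
    return carry_over * past_rate
```

With these definitions, `projected_rate(0.05)` reproduces the worked example: a past tendency of 5% projects to 3.5%.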

This brings us to the key question: whether one can predict home/away tendency better using past home/away tendency, as compared to the "null case" where one is better off ignoring home/away tendency and just using the umpire's tendency across all pitches. There are two ways to test this. The first is to compare two models: one that uses the umpire's tendency only from home (or away) pitchers to estimate strikes called with home (or away) pitchers, and one that uses the tendency across all pitches to estimate the same thing. If the model that leverages only the same-side tendency outperforms the model that uses the umpire's tendency across all pitches, that would be evidence that "homer umps" are real. This test suggests that they are not. No matter what, it is always more predictive to include the umpire's stats from all past pitches, not just the relevant home/away stats.

The second test is to use a combination model, using both the umpire's statistics across all games and the statistics from the relevant home/away split, to predict the home/away strike call percentage in the next game. If the umpire's home/away tendency is real, the regression will catch on to this, and will use a portion of their home/away stats in addition to their overall stats in its equation to predict future called strike percentage. Once again, the models do not do this. Instead they tell us to use 70% of the strikes gained statistic across all pitches, and around 0% of the relevant home/away statistic.
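The combination model amounts to a regression with two predictors. The sketch below solves the two-variable least-squares problem directly. It is an illustration of the idea under the assumption of a plain unweighted linear fit, not the exact model used for the results above.

```python
def fit_two_predictors(overall, split, target):
    """Least-squares weights for target ~ w_overall*overall + w_split*split + c.

    overall: past strikes gained per call across all pitches
    split:   past strikes gained per call on the relevant home/away side
    target:  strikes gained per call on that side in the next game
    """
    n = len(target)
    m1 = sum(overall) / n
    m2 = sum(split) / n
    my = sum(target) / n
    # Center everything so the intercept drops out of the normal equations.
    a = [v - m1 for v in overall]
    b = [v - m2 for v in split]
    y = [v - my for v in target]
    s11 = sum(v * v for v in a)
    s22 = sum(v * v for v in b)
    s12 = sum(u * v for u, v in zip(a, b))
    s1y = sum(u * v for u, v in zip(a, y))
    s2y = sum(u * v for u, v in zip(b, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    w_overall = (s1y * s22 - s2y * s12) / det
    w_split = (s2y * s11 - s1y * s12) / det
    return w_overall, w_split
```

If the home/away split carried real signal, `w_split` would come out clearly above zero. The finding described above corresponds to `w_overall` near 0.7 and `w_split` near 0.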

In other words, there are no homer umpires, or to the extent that there are, the effect is not strong enough to worry about. To drive the point home, below is a table of all games from 2015-2018 where the umpire had at least 500 relevant strike calls in games leading up to the game. We have sorted umpires into five groups by home-away tendency. The average umpire calls a strike 0.5% more often in favor of the home team, with a range between -5% and just over 5% in past games:

[Table: Homer Umps]

Basically, no matter what the umpire's tendency was in past games, the best prediction to make is that they will favor the home team by about 0.5% in the following game. The set of umpires who favored the road team the most in past games actually called more strikes for the home team relative to any other group, which is probably just variance. Either way, while there may be homer umps, the impact is too small to be measured here, meaning it is probably also too small to affect the betting line in any meaningful way.
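For anyone who wants to repeat the five-group comparison on their own data, a minimal sketch is below. The input layout (one row per game, pairing the umpire's past home-minus-away tendency with the home-minus-away difference observed in that game) and the equal-count split are assumptions for illustration.

```python
def five_group_summary(rows):
    """Sort games by past home-minus-away tendency and summarize five groups.

    rows: list of (past_home_minus_away, next_home_minus_away) per game.
    Returns a list of (mean_past, mean_next, n_games), most road-favoring first.
    """
    ordered = sorted(rows, key=lambda r: r[0])
    n = len(ordered)
    groups = []
    for i in range(5):
        chunk = ordered[i * n // 5:(i + 1) * n // 5]
        if not chunk:
            continue  # fewer than five rows: skip empty groups
        mean_past = sum(r[0] for r in chunk) / len(chunk)
        mean_next = sum(r[1] for r in chunk) / len(chunk)
        groups.append((mean_past, mean_next, len(chunk)))
    return groups
```

If homer umps were real, `mean_next` would rise along with `mean_past` across the groups. A flat `mean_next` is the pattern described above.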