Veteran Bias in MLB Umpiring: Pitchers

With a little kid at home I don't have much time to enjoy televised baseball nowadays, but I used to watch a decent amount of NPB. One of the things that becomes pretty obvious watching NPB for any period of time is that foreign pitchers can't get a call and foreign hitters face a much larger strike zone, especially in their first year. Older players also seem to get far more calls, which, combined with MLB posting and the soccer boom around 2000 that cut into baseball participation here, means there are many guys in their 40s still in the league. This leads to a sort of farce given how Japanese baseball broadcasts work. Many NPB viewers watch MLB as well, since games featuring Japanese stars are on TV in the morning, so there is viewer demand for PITCHf/x-style pitch charts with ball tracking. However, most NPB broadcasts still don't have access to a pitch-tracking system.

What this means is that there is someone behind the scenes who creates the pitch charts, presumably by hand as the pitches come in, and these are then shown on TV. The result is that if a pitch was called a strike, the chart generally shows it inside the official strike zone, regardless of where the pitch actually was (this isn't always the case, as sometimes they mess up). In this way no one can question the accuracy of the call. Out of sight, out of mind. They do occasionally run a subtle slow-motion replay after egregiously bad calls, where the announcers inevitably say something like "that one was certainly right on the corner there." I think this is a very powerful example of how things work in Japan compared to the West.

Anyway, there's not much more one can say about this topic until they install a pitch-tracking system in NPB and release the data to the public, which they won't do for this exact reason. But having shown here in the past that umpires are unlikely to be racist, we can explore whether an age effect exists in MLB. This may have important handicapping implications. For example, in a game with a young pitcher facing an old pitcher, if old umpires favor old pitchers more, that would be a solid handicapping angle. Even a couple of percentage points of strike percentage is enough to give a big edge (spoiler: sadly it doesn't work this way, so if you are looking for free money there is none to be had here). In a future article we will discuss hitter age bias as well.

Building a Strike Call Model

I recently purchased the amazing book Applied Predictive Modeling, which I can't recommend highly enough to anyone with some basic statistical knowledge who wants to learn to use some of the fancier next-gen models that win all the Kaggle competitions, as well as to sharpen their knowledge of the more straightforward tools of the trade such as linear regression. The best part of this book is that the hard math is set aside in favor of understanding how to use the tools better. And the caret R package the book introduces makes certain aspects of coding and validating these models a fraction as hard as they once were.

Most problems in sports handicapping are linear, and I am finding that many of the more complex model types wind up reducing to something not far from a linear regression once they are properly tuned for handicapping. But the baseball strike zone is definitely not a linear problem: there are complex interactions not only between the location of the pitch and the edges of the zone, but also with the movement of the pitch and the height and handedness of the batter. To take another run at the strike zone problem I set up a GBM (gradient boosting machine) in caret using all of the common variables one might try, and let the black box do its magic. After my computer fans whirred for several hours and I tried all manner of tuning parameters, I ended up with something that was 0.6% more accurate out of sample than the crude logistic regression I have used and discussed here before. This is a huge difference, given that maybe 80-85% of calls are obvious, so any model would predict the same thing on those, and umpires randomly blow calls another 5-10% of the time. Where the GBM really seems to do well is at the tails: far from the zone its prediction dips to ~0 much more quickly than anything I have tried in the past, it assigns >99% more often to pitches down the middle, and it does this in a way that is not "overfit," for lack of a better word.
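To put the 0.6% in context, it helps to ask what share of pitches a better model could improve on at all. The sketch below only reruns the arithmetic from the paragraph above. Reading the 0.6% as percentage points of accuracy and using the midpoints of the quoted ranges are my assumptions, not figures from the original analysis.

```python
# Shares quoted in the text, as fractions of all taken pitches.
obvious_share = (0.80, 0.85)      # calls any model gets right
random_miss_share = (0.05, 0.10)  # calls no model can predict
accuracy_gain = 0.006             # GBM minus logistic regression, out of sample


def contestable_share(obvious, random_miss):
    """Share of pitches on which a better model can actually gain accuracy."""
    return 1.0 - obvious - random_miss


# Midpoint scenario (an assumption, not a figure from the text).
mid_obvious = sum(obvious_share) / 2
mid_random = sum(random_miss_share) / 2
mid_contestable = contestable_share(mid_obvious, mid_random)

# Accuracy gain expressed relative to the pitches that are up for grabs.
relative_gain = accuracy_gain / mid_contestable

print(f"contestable share: {mid_contestable:.1%}")
print(f"gain on contestable pitches: {relative_gain:.1%}")
```

Under these midpoint assumptions only about a tenth of pitches are contestable, so a 0.6-point gain overall is roughly a 6% gain on the pitches where models can differ.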

A GBM has no coefficients to interpret, but it does report how influential each variable is in predicting balls and strikes. The most influential variables by far were location, but a bunch of other stuff that shouldn't really matter, but does because umpires are human, came in as well:

Variable                                                          Influence
Pitch Height                                                      60.62745
X-distance from Plate Center                                      36.35853
X-distance from Inside of Zone                                    0.848462
Strikes in Count                                                  0.671077
Height of Bottom of Zone                                          0.325032
Height of Top of Zone                                             0.251261
Release Point Relative to Batter in Feet (flipped for lefties)    0.234282
Pitch Height Squared                                              0.193349
Balls in Count                                                    0.122402
Z-Movement of Pitch                                               0.118963
Year of Game                                                      0.108137
Movement of Pitch                                                 0.082047
Month of Game                                                     0.059003

While the location of the pitch makes up 97.2% of the prediction, and more like 97.7% if you include the batter's height, other factors like the movement of the pitch and count are important as well in getting the last few marginal calls right.
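The grouped shares can be checked by summing rows of the influence table. This is a small sketch, and which rows count as "location" and which as "batter's height" is my assumption, since the text does not list them.

```python
# Relative influence values copied from the table above (they sum to ~100).
influence = {
    "Pitch Height": 60.62745,
    "X-distance from Plate Center": 36.35853,
    "X-distance from Inside of Zone": 0.848462,
    "Strikes in Count": 0.671077,
    "Height of Bottom of Zone": 0.325032,
    "Height of Top of Zone": 0.251261,
    "Release Point Relative to Batter": 0.234282,
    "Pitch Height Squared": 0.193349,
    "Balls in Count": 0.122402,
    "Z-Movement of Pitch": 0.118963,
    "Year of Game": 0.108137,
    "Movement of Pitch": 0.082047,
    "Month of Game": 0.059003,
}

# Assumed groupings; the original text does not say which rows it summed.
LOCATION = ["Pitch Height", "X-distance from Plate Center", "Pitch Height Squared"]
BATTER_HEIGHT = ["Height of Bottom of Zone", "Height of Top of Zone"]


def group_share(names):
    """Total relative influence (in percent) of the named variables."""
    return sum(influence[name] for name in names)


total = sum(influence.values())
location_share = group_share(LOCATION)
with_height_share = group_share(LOCATION + BATTER_HEIGHT)

print(f"all variables:          {total:.2f}")
print(f"location:               {location_share:.2f}")
print(f"location + zone height: {with_height_share:.2f}")
```

Under this grouping the location rows come to about 97.2 and adding the zone heights brings the total to about 97.8, close to the figures quoted above.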

Measuring the Influence of Age

With a prediction of how likely each pitch is to be a strike, we can now look at how many calls pitchers and hitters got above or below that prediction. If more strikes were called than we predicted, that would indicate pitchers are favored, while if fewer strikes were called, hitters would be favored. Just in case you don't believe the model here or think models are trash, we will also compare called strike percentage to whether the pitch was within the "official" strike zone rather than our predicted strike zone. Our sample is all games since 2015, and we find the age of each pitcher and hitter using the excellent "People" file from the Chadwick Baseball Bureau. I cannot overstate how thankful I am to them for providing this file; they are truly doing God's work.
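The "calls above prediction" measure can be sketched as the average of (actual call minus predicted strike probability) within each group. The function name and the toy rows below are invented for illustration and are not data from the study.

```python
from collections import defaultdict


def strikes_above_predicted(pitches):
    """Return {pitcher_age: mean(called_strike - predicted_prob)}.

    Each pitch is a tuple (pitcher_age, predicted_strike_prob, called_strike),
    where called_strike is 1 for a called strike and 0 for a called ball.
    """
    totals = defaultdict(lambda: [0.0, 0])  # age -> [sum of residuals, count]
    for age, predicted, called in pitches:
        totals[age][0] += called - predicted
        totals[age][1] += 1
    return {age: s / n for age, (s, n) in sorted(totals.items())}


# Hypothetical taken pitches: identical predictions, different calls.
toy_pitches = [
    (24, 0.50, 0), (24, 0.90, 1), (24, 0.10, 0), (24, 0.50, 1),
    (35, 0.50, 1), (35, 0.90, 1), (35, 0.10, 0), (35, 0.50, 1),
]

result = strikes_above_predicted(toy_pitches)
for age, gain in result.items():
    print(f"age {age}: {gain:+.2%} strikes above predicted")
```

A positive value means the group received more called strikes than the model expected from the pitches it threw. Swapping the predicted probability for a 0/1 "inside the official zone" flag gives the model-free version of the same comparison.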

[Chart: Called Strikes Above Average by Age]

Note we are excluding pitchers younger than 22 or older than 37 due to sample size. We see a steady upward trend: the older the pitcher, the more calls above expected he gets. The oldest pitchers get almost 1.5 percentage points more calls than the youngest pitchers in this sample. A decent estimate is that one pitch called a strike instead of a ball is worth around 0.14 runs. So a veteran starter with 50 taken pitches in a game gets an extra 0.75 calls (50 × 1.5%), which at 0.14 runs each is worth ~0.1 runs per start compared to a young starter. This holds regardless of whether you use my model-predicted zone or the official zone.
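The per-start estimate is simple enough to write out. The inputs are the figures quoted in the paragraph above, and the variable names are mine.

```python
taken_pitches_per_start = 50   # taken (non-swing) pitches, from the text
extra_strike_rate = 0.015      # ~1.5 points, oldest vs. youngest pitchers
runs_per_flipped_call = 0.14   # value of a ball turned into a strike

# Extra called strikes a veteran gets over a young pitcher in one start.
extra_calls = taken_pitches_per_start * extra_strike_rate

# Run value of those extra calls.
extra_runs = extra_calls * runs_per_flipped_call

print(f"extra calls per start: {extra_calls:.2f}")
print(f"runs saved per start:  {extra_runs:.3f}")
```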

It is clear older pitchers get more calls on marginal pitches, but this does not necessarily mean it is due to their age. I haven't found any hard analysis on this around the web, but at least when I started watching baseball in the 90s, it was well known that the best control pitchers always got an extra six inches off the corner due to their reputation. More recently the framing boom has led to an understanding that hitting the catcher's glove gets more calls, even if that glove wasn't necessarily in the strike zone but the pitch was. And older pitchers, if for no other reason than that they have worse stuff and therefore need better control to hang around, do have better control, at least if pre-season Steamer walk rate predictions are to be believed:

Of course, the walk rate decline might have nothing to do with control and everything to do with these pitchers getting more calls. It turns out, however, that an extra 0.75 strikes per game is not enough to cause a walk rate change this big, and veteran pitchers really are more accurate. One way to check this is to look at situations where no pitcher would intentionally throw a ball: 3-0 and 3-1 counts. We find that older pitchers "hit the target" about 2% more often in these counts:

[Chart: Zone Percentage]

Older pitchers get more calls, but this might be because they are more accurate pitchers. It turns out that more accurate pitchers get more calls across age groups, but older pitchers seem to get more calls even after controlling for accuracy:

Strikes Gained above Predicted, by Pitcher Age

Walk Rate    Less than 26    26-32     33+
> 9%         -0.94%          -0.46%    -0.17%
7-9%         -0.35%          -0.01%     0.64%
< 7%         -0.03%           0.65%     1.06%

The final question is whether it is age that leads to this advantage, or years of experience in the league. Age is so correlated with years in the league that no conclusions can be drawn from simple charts. One way to test the influence of all three variables here (age, experience, and walk rate) is to put them all in a model predicting whether a strike is called, along with the original predicted strike probability. Given these variables, the model will choose the coefficients, if any, that best use them to predict the chance the pitch is called a strike. It turns out the model picks a mix of all three:

Variable                        Estimate    z value
Pitcher Age                     0.005368    3.26
Pitcher Experience in Years     0.008564    5.35
Pitcher Walk Rate, Steamer      -4.1676     -22.3

Experience in years tends to be a little more influential than age, but a model combining both performs better than a model using experience alone. Age and experience are so correlated that there is little to choose between them. Either way, the pitcher's walk rate is also extremely important. At the extremes each of these variables can influence the chance the umpire calls a strike on the same pitch by up to about 2%, which is around what we estimated in the charts earlier.
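To get a feel for the size of these coefficients, here is a rough sketch of how one of them would move a single pitch's strike probability. It assumes the estimates are on the log-odds scale of a logistic regression (the z values suggest this, but the text does not say so) and that the effect simply adds to the baseline prediction on that scale. The function name and the 10-year gap are illustrative choices of mine.

```python
import math


def shifted_probability(base_prob, coef, delta):
    """Move base_prob by coef * delta on the log-odds scale and map back."""
    logit = math.log(base_prob / (1.0 - base_prob))
    return 1.0 / (1.0 + math.exp(-(logit + coef * delta)))


# Estimates copied from the table above.
COEF_AGE = 0.005368
COEF_EXPERIENCE = 0.008564
COEF_WALK_RATE = -4.1676

# Hypothetical comparison: the same pitch thrown by a pitcher with
# 10 more years of experience (an illustrative gap, not from the text).
borderline = shifted_probability(0.50, COEF_EXPERIENCE, 10)
obvious = shifted_probability(0.99, COEF_EXPERIENCE, 10)

print(f"coin-flip pitch:   50.0% -> {borderline:.1%}")
print(f"down-the-middle:   99.0% -> {obvious:.2%}")
```

Under these assumptions nearly all of the effect lands on borderline pitches. A coin-flip pitch moves by roughly two points, while an obvious strike barely moves.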

Do Older Umps Favor Old Pitchers More?

We can now move on to the important handicapping question of whether older umpires favor older pitchers, or vice versa. Umpires tend to be between 30 and 65, with a notable dip in the number of umpires in their early to mid 40s:

[Chart: Age of Umpire at Game Time]

We can now look at strikes called above predicted by age of umpire and pitcher:

Strikes Gained above Predicted

Umpire Age    Pitcher Under 26    Pitcher 26 to 32    Pitcher 33 or Older
Under 40      -0.58%              0.07%               0.68%
40 to 50      -0.69%              0.08%               0.66%
Over 50       -0.43%              0.04%               0.68%

It looks like umpires of all ages have respect for the aged with no obvious advantage to be found. Perhaps younger umpires are more intimidated by veteran players and this is offset by the friendliness that older umpires have with veteran pitchers. Either way, there is no edge to be had here. Also, old umpires don't seem to have smaller or larger zones than new umpires which I found to be somewhat surprising.
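One way to read the umpire-by-pitcher table is to compute the veteran edge (oldest pitchers minus youngest) within each umpire age group. The numbers are copied from the table above, and treating them as percentage points is my reading.

```python
# Strikes gained above predicted (percentage points), from the table above.
gains = {
    "Umpire Under 40": {"Under 26": -0.58, "26 to 32": 0.07, "33 or Older": 0.68},
    "Umpire 40 to 50": {"Under 26": -0.69, "26 to 32": 0.08, "33 or Older": 0.66},
    "Umpire Over 50": {"Under 26": -0.43, "26 to 32": 0.04, "33 or Older": 0.68},
}

# Veteran edge: oldest pitcher group minus youngest, per umpire age group.
veteran_edge = {
    umpire: round(row["33 or Older"] - row["Under 26"], 2)
    for umpire, row in gains.items()
}

for umpire, edge in veteran_edge.items():
    print(f"{umpire}: {edge:+.2f} points")
```

The edge stays within about a quarter of a point across the three umpire groups and does not rise or fall steadily with umpire age, which is consistent with there being no angle here.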

Summary

Veteran pitchers do get more calls than rookie pitchers, even controlling for veteran pitchers being more accurate in general. Still, it is possible that the extra calls are due to veteran pitchers having better knowledge of the umpire's tendencies, or of what kinds of pitches (and pitch sequences) in general are likely to get the call. It is also possible that there is some kind of prior effect, where veteran pitchers are even more accurate relative to rookie pitchers than Steamer walk rate alone would predict. My opinion is that there almost certainly is a bias after controlling for everything, because the same effect is present in hitting, and hitters don't have the ability to hit the catcher's glove.

One gambling conclusion I am confident in is that accurate veteran pitchers will lose out when the automated strike zone is implemented, and if I were part of the MLBPA, which mostly works to benefit veterans, I would band together with the umpires to stop it. There are two factors at work here, pulling in opposite directions. Accurate pitchers get more calls controlling for the count, as we showed here, but they are behind in the count less often than inaccurate pitchers and so benefit less from the extra strikes umpires call in those counts. Of pitches where the hitter doesn't swing, pitchers with a walk rate below 7% throw 2.94% of their pitches in 3-1 and 3-0 counts, where the count's influence is largest, while pitchers with a walk rate above 9% throw 4.33% of their pitches in these counts. But the bias toward strikes in these late counts is only about 10% at most, meaning wild throwers gain about 0.14% in called strikes across all pitches from facing worse counts, while accurate pitchers gain about 1% or more across all pitches due to umpires favoring accuracy. So when the automated strike zone is implemented, downgrade accurate pitchers by a couple of tenths of ERA in your projections relative to less accurate pitchers.
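The two opposing factors can be netted out directly. This sketch uses only the figures quoted in the paragraph above, taking "about 10% at most" and "about 1% or more" at face value, and the variable names are mine.

```python
late_count_share_accurate = 0.0294  # walk rate below 7%: share of taken pitches in 3-0/3-1
late_count_share_wild = 0.0433      # walk rate above 9%: same share
late_count_strike_bias = 0.10       # extra strike probability in those counts ("at most")
accuracy_bonus = 0.01               # extra called strikes for accurate pitchers ("or more")

# Called strikes wild pitchers gain, across all pitches, from being in
# hitter-friendly counts more often.
count_benefit_to_wild = (
    late_count_share_wild - late_count_share_accurate
) * late_count_strike_bias

# What is left of the accurate pitcher's edge after that offset.
net_edge_accurate = accuracy_bonus - count_benefit_to_wild

print(f"count benefit to wild pitchers: {count_benefit_to_wild:.2%}")
print(f"net edge to accurate pitchers:  {net_edge_accurate:.2%}")
```

On these numbers the count effect offsets only a small part of the accuracy effect, which is why accurate pitchers are the ones with something to lose under an automated zone.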