Kai Franke

Predicting Pitcher Performance Based on 'Stuff' Alone

If you follow me on Twitter or Instagram or are connected with me on LinkedIn, you might have seen that I have just started an analytics internship with the Minnesota Blizzard. One of my first projects for them was to predict the quality of a pitch based only on the metrics of the pitch itself, not location, batter, sequence, etc. This week, I decided to show you guys the predicted values for fastballs and curveballs.


First off, I decided to use only Rapsodo data because I wanted to evaluate just how nasty the pitch actually is. So, the stats that I used were true spin and velocity. These can tell me the quality of a pitch independent of how the pitcher uses it. If a pitcher's predicted performance is good but they don’t get those results, it’s most likely because the pitcher is throwing it in a bad spot, not sequencing well, or not keeping a consistent arm slot across all of their pitches. The stats that I tried to predict were xwOBA and Whiff%; we’ll call the predictions pxwOBA and pWhiff%.


It’s widely accepted that a fastball or curveball that spins very efficiently is better than one with more gyro spin. Therefore, in my calculation, pitches with the best combination of high velocity and true spin (useful spin, spin rate * spin efficiency) would be given the best predicted stats. Also, I used ‘+’ stats of each metric, since that puts velocity and true spin on the same scale. Here are the formulas I used to predict the stats for fastballs:
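To make the true spin and ‘+’ calculations concrete, here is a minimal Python sketch. The post doesn't spell out how the ‘+’ stats were scaled, so the 100 * value / league average convention below is an assumption on my part, and the function names, the pitcher, and the league averages are all hypothetical.

```python
def true_spin(spin_rate_rpm, spin_efficiency):
    """Useful spin: raw spin rate times spin efficiency (0.0 to 1.0)."""
    return spin_rate_rpm * spin_efficiency


def plus_stat(value, league_average):
    """ASSUMED '+' scaling: 100 is league average, 110 is 10% above it."""
    return 100.0 * value / league_average


# Hypothetical pitcher and hypothetical league averages, for illustration only.
pitcher_true_spin = true_spin(spin_rate_rpm=2300, spin_efficiency=0.90)
velocity_plus = plus_stat(value=95.0, league_average=93.0)
true_spin_plus = plus_stat(value=pitcher_true_spin, league_average=2000.0)

print(round(pitcher_true_spin))   # 2070
print(round(velocity_plus, 1))    # 102.2
print(round(true_spin_plus, 1))   # 103.5
```

Under this assumed scaling, a value of 100 on either metric means league average, which is what lets velocity and true spin sit side by side in one formula.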


Fastball pxwOBA = 0.62011 + ((Velocity+) * -0.00202) + ((True Spin+) * -0.00071)

Fastball pWhiff% = -38.62874 + ((Velocity+) * 0.48009) + ((True Spin+) * 0.10864)
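As a quick illustration, the two fastball formulas can be written as small Python functions. The coefficients are copied from the formulas above; the function names and the example pitcher (Velocity+ of 105, True Spin+ of 110) are hypothetical.

```python
def fastball_pxwoba(velocity_plus, true_spin_plus):
    """Predicted xwOBA for a fastball, using the coefficients quoted above."""
    return 0.62011 + velocity_plus * -0.00202 + true_spin_plus * -0.00071


def fastball_pwhiff(velocity_plus, true_spin_plus):
    """Predicted Whiff% for a fastball, using the coefficients quoted above."""
    return -38.62874 + velocity_plus * 0.48009 + true_spin_plus * 0.10864


# Hypothetical fastball that is above average in both metrics.
print(round(fastball_pxwoba(105, 110), 3))  # 0.33
print(round(fastball_pwhiff(105, 110), 1))  # 23.7
```

Lower is better for pxwOBA and higher is better for pWhiff%, so this made-up fastball grades out better than a 100/100 pitch on both.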


To get the constants for Velocity+ and True Spin+, I went into RStudio and ran a linear regression to find the numbers that would best predict xwOBA and Whiff%. In both equations, Velocity+ has the larger constant (in absolute value), which suggests that velocity is more important than true spin for predicting performance.
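For readers who want to see what a regression like this does, here is a rough Python stand-in for the R fit. It is not my actual workflow or data: the sample is synthetic, generated from the fastball pxwOBA formula plus random noise, and the helper name fit_linear is hypothetical. The sketch solves ordinary least squares through the normal equations and recovers constants close to the ones it started from.

```python
import random


def fit_linear(rows, targets):
    """Ordinary least squares with an intercept, via the normal equations."""
    # Design matrix: a leading 1.0 for the intercept, then the predictors.
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# SYNTHETIC data: 500 made-up pitchers, not real Baseball Savant numbers.
rng = random.Random(0)
rows, targets = [], []
for _ in range(500):
    velocity_plus = rng.gauss(100, 5)
    true_spin_plus = rng.gauss(100, 10)
    noise = rng.gauss(0, 0.005)
    rows.append((velocity_plus, true_spin_plus))
    targets.append(0.62011 - 0.00202 * velocity_plus
                   - 0.00071 * true_spin_plus + noise)

intercept, velocity_coef, spin_coef = fit_linear(rows, targets)
print(intercept, velocity_coef, spin_coef)
```

With real data, the rows would be each pitcher's Velocity+ and True Spin+ and the targets would be their actual xwOBA or Whiff%.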


Now that we have the formulas to predict xwOBA and Whiff% for pitchers, let’s calculate them for all pitchers with at least 1,000 pitches! The leaderboard of the top and bottom 20 ordered by pxwOBA is shown here:


*pWhiff% average is 20.2, pxwOBA average is .350

**Whiff% average is 20.2, xwOBA average is .347


As you can see, there are two pitchers that you’d expect to be at the top: Gerrit Cole and Justin Verlander. Hansel Robles is a guy who isn’t talked about much but has a great fastball as well. You may not recognize many names at the bottom, but a couple that are talked about as having slower fastballs are Dallas Keuchel and Kyle Hendricks. Since they don’t throw as hard and therefore have fewer RPMs than some of the top guys, their predicted stats are very poor. However, both of them beat their predictions because they can command the fastball. Those predicted stats would be closer to correct for pitchers who don’t have as much skill, like the pitchers around them such as Erick Fedde or Mike Leake.


A player that really surprised me in these predictions is Kyle Ryan of the Chicago Cubs. Every time I made edits to this formula, he’d come in last, despite beating his pWhiff% by almost 10 points and his pxwOBA by .104. He ended up above average in the actual statistics, which shows that he uses his fastball extremely well. I decided to take a look at his Savant page and was even more perplexed. His fastball velocity was in the 5th percentile and his spin rate on it was in the 14th percentile, which isn’t the odd part; we probably could’ve guessed that. But he had a 34th percentile exit velo against, an 8th percentile hard hit rate and a 97th percentile barrel percentage. Ryan was hit very hard in 2019 and those stats really show that; however, he was able to avoid giving up barrels at a high rate. He was able to do that because he had an extremely high ground ball rate of 57.3%. This could tie back to the theory that a pitcher with a very low spin rate will generate more ground balls because batters aren’t used to the drop of the pitch. I believe that could be part of why he performed well, but he probably was also able to put the ball where he wanted it a good chunk of the time.


Next, let’s look at the curveball predicted stats. Here are the formulas for them:


Curveball pxwOBA = 0.57058 + ((Velocity+) * -0.00230) + ((True Spin+) * -0.00062)

Curveball pWhiff% = -25.96548 + ((Velocity+) * 0.54547) + ((True Spin+) * 0.02011)
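One way to sanity-check the curveball formulas is to plug in a league-average pitch. The sketch below assumes that 100 represents league average on both ‘+’ scales, which the post implies but doesn't state outright; the function names are hypothetical and the coefficients are copied from the formulas above.

```python
def curveball_pxwoba(velocity_plus, true_spin_plus):
    """Predicted xwOBA for a curveball, using the coefficients quoted above."""
    return 0.57058 + velocity_plus * -0.00230 + true_spin_plus * -0.00062


def curveball_pwhiff(velocity_plus, true_spin_plus):
    """Predicted Whiff% for a curveball, using the coefficients quoted above."""
    return -25.96548 + velocity_plus * 0.54547 + true_spin_plus * 0.02011


# ASSUMPTION: 100 on each '+' scale is a league-average curveball.
print(round(curveball_pxwoba(100, 100), 3))  # 0.279
print(round(curveball_pwhiff(100, 100), 1))  # 30.6
```

Both values land right around the curveball averages reported with the leaderboard (.278 and 30.6), which is what you'd hope to see from a regression evaluated at average inputs.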


I did the same thing as I did with the fastball to get my constants for the curveball, and again, it appears that velocity is more important than spin to estimate how a player performed.


Here are the top and bottom 20 finishers in pxwOBA for pitchers with at least 1,000 pitches thrown:


*pWhiff% average is 30.6 and pxwOBA average is .278

**Whiff% average is 30.6 and xwOBA average is .278


Just like the fastball rankings, the top is filled with guys that we know have great curveballs; Seth Lugo and Charlie Morton are two of them. Most of the players up top pitched like they were predicted to with their curveball. The one that jumps out to me for not performing as well as his counterparts is Dylan Cease, who had a pxwOBA of .247 while he actually had an xwOBA of .322. He was also below his predicted whiff%, with a 29.7 actual whiff% versus a 32.5 predicted one.
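The gap between actual and predicted numbers is simple arithmetic, but the signs are easy to mix up, so here is a small Python sketch using the Dylan Cease curveball figures quoted above. The variable names are hypothetical.

```python
# Dylan Cease's curveball numbers as quoted in the text.
actual_xwoba, predicted_xwoba = 0.322, 0.247
actual_whiff, predicted_whiff = 29.7, 32.5

# For xwOBA, a positive gap means hitters did better than predicted.
xwoba_gap = actual_xwoba - predicted_xwoba
# For Whiff%, a negative gap means fewer swings and misses than predicted.
whiff_gap = actual_whiff - predicted_whiff

print(f"{xwoba_gap:+.3f}")  # +0.075
print(f"{whiff_gap:+.1f}")  # -2.8
```

Both gaps point the same way here: the raw stuff predicted a better curveball than the results showed.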


Alex Young was extremely effective with his curveball in 2019, with a .153 xwOBA against it while he was predicted to have a .305. He was predicted to get a lot of swings and misses though, and that’s because in the pWhiff% formula, velocity is worth about 27 times more than spin rate, while in the pxwOBA formula it is only worth about 3.7 times more than spin rate. Ryan Yarbrough beat both of his predictions, as my metrics thought he was going to be below average while he finished with a 37.3 whiff% and a .182 xwOBA. Just like the others who are ranked low, his curveball is slow and doesn’t have a ton of spin. The guys on the bottom had extremely low true spin rates. This made me start to think that they threw more of a slider, but since Baseball Savant had classified them as curveballs, I decided to leave it alone.
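The "how many times more" comparison comes from dividing the Velocity+ constant by the True Spin+ constant in each curveball formula. Here is a short Python sketch of that arithmetic; the dictionary and function names are hypothetical. Keep in mind that the ratio is only a rough guide, since it treats one point of Velocity+ as comparable to one point of True Spin+.

```python
# Curveball constants copied from the formulas in the post.
curveball_pwhiff = {"velocity_plus": 0.54547, "true_spin_plus": 0.02011}
curveball_pxwoba = {"velocity_plus": -0.00230, "true_spin_plus": -0.00062}


def velocity_to_spin_ratio(coefs):
    """How many times larger the Velocity+ constant is than the True Spin+ one."""
    return abs(coefs["velocity_plus"]) / abs(coefs["true_spin_plus"])


print(round(velocity_to_spin_ratio(curveball_pwhiff), 1))  # 27.1
print(round(velocity_to_spin_ratio(curveball_pxwoba), 1))  # 3.7
```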


The last thing I wanted to do with this was to see how correlated the predicted metrics were with the actual ones. So, I found the R^2 values of each of the metrics. Here are the results:



As shown, both fastball correlations are better than the curveball ones, with pWhiff% being more reliable for fastballs and pxwOBA being more reliable for curveballs. None of the correlations are very high, but that is to be expected since we are leaving out many other parts of the story, such as location, sequence, count, and more. I think it makes sense that pWhiff% correlates better for fastballs: they are put in play a lot more easily, and whiffs on them are easier to predict since velocity is the main cause of swinging and missing. I feel the same about pxwOBA on curveballs. To get a hitter to swing and miss at a curveball, it has to be sequenced and commanded well. Although the R^2 values are very close, for those reasons I think it makes sense that xwOBA is easier to predict than Whiff% using just velocity and spin rate.
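For anyone who wants to run the same kind of check, here is a minimal Python sketch of an R^2 calculation between predicted and actual values. It assumes R^2 here means the squared Pearson correlation, and the five predicted/actual pairs are made up for illustration, not taken from my data.

```python
from math import sqrt


def r_squared(predicted, actual):
    """Squared Pearson correlation between predicted and actual values."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    r = cov / sqrt(var_p * var_a)
    return r * r


# HYPOTHETICAL pxwOBA and xwOBA values for five made-up pitchers.
predicted = [0.310, 0.330, 0.350, 0.370, 0.390]
actual = [0.300, 0.360, 0.340, 0.350, 0.400]

print(round(r_squared(predicted, actual), 2))  # 0.69
```

A value near 1 would mean the predictions track the actual results almost perfectly, while a value near 0 would mean velocity and true spin alone explain very little.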


This isn’t perfect by any means, as the correlations show, but I think it can give a good general idea of how good a pitcher’s pitch is based on raw stuff alone. There are many other factors that determine how good a pitch is in a game. However, if a pitcher can be above average in both velocity and true spin rate, they will have a much easier path to success.




ALL STATS FROM BASEBALL SAVANT

HELP FROM ETHAN MOORE ON GETTING FORMULA CONSTANTS

COVERPHOTO FROM DA WINDY CITY
