Monday, June 29, 2009

Is Strike 1 the Most Important Pitch?

Runners on second and third, nobody out, #3 hitter in the lineup is up, 0-0 score, 1st inning. How do you attack the batter and what is your aim for the at-bat?

The aim, first of all, is to get out of the inning without too much damage being done. In a situation like this, if you can hold your opponent to 1 run, you’ve done a very good job. In terms of the batter, the ideal situation is a strike-out, and if you can’t get that, a pop-up in the infield. Depending on the hitter and scouting report, I’d say the plan of attack should be the oldest that exists in baseball: hard stuff in, soft stuff away.


I’m willing to bet at some point, everyone who has listened to a baseball game on the radio or watched a game on TV has heard the announcers talk about the importance of throwing strike 1 on the first pitch of the at-bat. Before we delve too deeply into this, I should remind you that statistics can say nearly anything that a person wants them to say. Having gotten the disclaimer out of the way, let us resume our discussion.

First Pitch Strikes; the Argument in Favor

Throwing a strike on the first pitch has the potential to severely affect the entire confrontation. If a batter is trying to be patient and get a good pitch to hit, a first pitch strike signals to him that the pitcher is going to be around the plate and that his command might be good, thus not allowing him the level of patience he may have wanted. It also gives a boost of confidence to the pitcher at a position that is sometimes more about confidence than about ability.
In 2008 in Major League Baseball, when a pitcher started with strike 1, batters went on to hit .234 with an OPS of .635. When a pitcher started with ball 1, those numbers jumped to .277 and .843 respectively. There is definitely a large gap between the two. The numbers are skewed across the board in favor of the notion that starting with strike 1 is better than starting with ball 1. Batters walked 10% less and struck out 12% more after strike 1 than ball 1.
The question is, does a 0-1 count really give the pitcher that much more of an advantage than a 1-0 count?

The answer to that question is a very difficult one and cannot completely be measured by statistics and metrics. One of the reasons these numbers are as skewed as they are is the era we live in. Even as little as 25 years ago, the approach to hitting was very different than it is today. In ages past, when a batter got two strikes, he shortened up and attempted to make contact at all costs. This should lead to lower power numbers but higher batting averages than if the approach isn’t changed. Nowadays, batters don’t care what the count is; many of them are swinging just as hard with a 0-2 count as they are with a 2-0 count. Theoretically, this would lead to better power numbers but a ton more strike-outs.
So, is this the best approach to dissecting middle-of-at-bat results?

First Pitch Strikes; the Argument Against

The problem with making a goal of throwing first pitch strikes is this: what happens when you throw strike 1? You’ve met your goal for the at-bat. Now there is a let-down, you let a pitch get away from you, and the next thing you know, you’re watching the ball sail into the outfield bleachers. There is definitely a difference between 1-0 and 0-1 but statistically speaking, it’s just not that big of an advantage. When the count was 0-1 last year, batters hit .315 with an OPS of .799, both well above the league average. If the pitcher just got ahead with the first pitch, why do batters all of a sudden hit above average? One reason is the aforementioned let-down following the attainment of a goal. Another possible reason is that pitchers get a little greedy after getting ahead 0-1 and want to get ahead 0-2 at the expense of a more effective pitch that may be called a ball.

Another problem with judging at-bats by their first pitch is the number of at-bats that actually only go one or two pitches. Last year in Major League Baseball, 28.1% of at-bats lasted 1 or 2 pitches. The argument can easily be made that once an at-bat goes three or four pitches deep, neither the batter nor the pitcher is still hung up on that first one.

In short, judging a pitcher’s performance by how he does on the first pitch is a decent barometer but is also far from perfect. The problem is that everyone knows pitchers are trying to get ahead in the count, so batters are sitting on good pitches early in the count. If you don’t believe it, just consider this. Last year, there were just two players in the league who hit better than .337 (Chipper Jones hit .364 and Albert Pujols hit .357). However, the league batting average on the first pitch was .337. Basically, the point is that a questionable pitch on the corner that might be called a ball or a strike can be better than a meatball thrown for the sake of not getting behind 1-0.

A New Approach

Tracking how often a pitcher starts a batter with strike one isn’t a bad gauge, but there are better ways to do it. At this point, I have to acknowledge one of the great pitching influences in my life who introduced me to this approach, Todd Naskedov, a pitching coach and then head coach of high school baseball in the state of Washington.

Basically, it groups plate appearances into three categories: those in which the pitcher was ahead, those in which he was behind, and those in which the batter swung the bat early in the count. For the ahead and behind categories, the magic cut-off point was two out of three. If the count reached 2-0 or 2-1, those go in the “behind” column. 0-2 and 1-2 go in the “ahead” column and 0-0, 0-1, 1-0, and 1-1 go in the “early” column.
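To make the grouping concrete, here is a minimal sketch of it in Python. The function name and the "unassigned" label are my own assumptions for illustration; the approach as described only names the eight counts below, so anything deeper (2-2 or any three-ball count) is left unassigned rather than guessed at.

```python
def classify_count(balls: int, strikes: int) -> str:
    """Bucket a ball-strike count from the pitcher's point of view.

    Only the eight counts named in the post are assigned a column.
    The name "unassigned" is a placeholder, not part of the original method.
    """
    if not (0 <= balls <= 3 and 0 <= strikes <= 2):
        raise ValueError(f"not a legal count: {balls}-{strikes}")
    if strikes == 2 and balls <= 1:
        return "ahead"       # 0-2 and 1-2
    if balls == 2 and strikes <= 1:
        return "behind"      # 2-0 and 2-1
    if balls <= 1 and strikes <= 1:
        return "early"       # 0-0, 0-1, 1-0, 1-1
    return "unassigned"      # 2-2, 3-0, 3-1, 3-2: not covered by the post


for count in [(0, 0), (1, 2), (2, 1), (3, 2)]:
    print(f"{count[0]}-{count[1]}: {classify_count(*count)}")
```

Counts are written balls first, then strikes, the same way they are written everywhere else in this post.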

So what makes this one so much better than just looking at the first pitch? It’s a better tool for judging where a pitcher is struggling and where they are doing well. For instance, if the league batting average in the “early” column is .330, as it was in 2008, and a pitcher is allowing a batting average of .400 in those counts, then there is a problem. He’s trying to get ahead in the count at the expense of good pitches and he’s getting hit hard for them. To give another example, let’s say that the league average in the “ahead” column is .186 but a pitcher is allowing a .280 batting average in those at-bats. This points to the fact that the pitcher is getting ahead in the count just fine but he isn’t finishing off batters the way he needs to.
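The comparison described above can be sketched in a few lines of Python. The league figures (.330 early, .186 ahead) are the 2008 numbers quoted in this post; the sample pitcher's hits and at-bats are made up to match the .400 and .280 examples, and the 30-point tolerance is purely an assumption, not part of the original approach.

```python
# 2008 league batting averages quoted in the post; no "behind" figure is given.
LEAGUE_BA = {"early": 0.330, "ahead": 0.186}


def flag_buckets(pitcher_lines, league=LEAGUE_BA, tolerance=0.030):
    """Return the columns where a pitcher's average allowed tops the league mark.

    pitcher_lines maps a column name to (hits, at_bats).
    tolerance is a hypothetical cushion so small gaps are not flagged.
    """
    flagged = {}
    for bucket, (hits, at_bats) in pitcher_lines.items():
        if bucket not in league or at_bats == 0:
            continue  # no league baseline or no data for this column
        gap = hits / at_bats - league[bucket]
        if gap > tolerance:
            flagged[bucket] = round(gap, 3)
    return flagged


# Hypothetical pitcher: .400 allowed early, .280 allowed when ahead.
sample = {"early": (40, 100), "ahead": (28, 100), "behind": (30, 100)}
for bucket, gap in flag_buckets(sample).items():
    print(f"{bucket}: {gap:.3f} above league average")
```

The "behind" column is skipped here only because the post does not give a league number for it.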

In a nutshell, you shouldn’t concern yourself with just the first pitch as a pitcher. When the average plate appearance lasts 3.8 pitches, you can’t focus on the first at the expense of the next three.


This time around, the situation will be that of a middle reliever. There is one out in the top of the 7th, runners at 1st and 3rd, and you are protecting a 6-4 lead with the leadoff hitter batting. Again, how would you approach the at-bat as that reliever?

Tuesday, June 23, 2009

Statistics in Baseball

Statistics matter in every sport. On the rugby pitch, you need to track drop kicks, penalty kicks, tries, and conversions. However, all of those statistics simply come together to tell one the score and past that, there isn’t a lot of use of statistics in rugby. In its distant American cousin, football, statistics are much more widely used. Everything from first downs to time of possession to yards gained from scrimmage is tracked and pored over by analysts and the reason is simple. People no longer want to know who won; they want to know how they won. Yes, the Redskins played well on offense, piling up 467 yards of total offense but they committed seven turnovers, etc.

The other part of the statistical surge of the last decade is the desire to know who the best is. In football, the answer to this question is usually contained in raw statistics and the most basic of metrics. By the way, statistics are the raw data, such as attempts and completions, while metrics are calculated from those statistics, such as completion percentage. The most comprehensive metric in all of football is passer rating and while it is a reasonably good measure of a quarterback’s performance, there aren’t comparable metrics to gauge the performance of just about any other position. And that brings us to the most statistically driven sport, the one for which brilliant mathematicians have slaved countless hours to come up with numbers that answer that question about the best: baseball.

One of the reasons for baseball’s obsession with statistics and metrics is the fact that the vast majority of confrontations are one on one. Another is the difference that you find when you compare one year to another. The game is played the same way but there are differences due to social changes or rule changes. For instance, through the 1968 season Major League Baseball allowed a mound as high as 15 inches, and that year produced the best pitching numbers the game has ever seen. The next year, the height of the mound was reduced to 10 inches, where it remains today. So how can you compare Bob Gibson’s simply ridiculous 1968 season (22-9, 1.12 ERA, 28 complete games and 13 shutouts in 34 starts) to Pedro Martinez’s unbelievable 2000 season (18-6, 1.74 ERA, 7 complete games and 4 shutouts in 29 starts)?
To answer that question, there are three basic methods with which to approach the problem. Each is valid, but some are hazier than others and leave out important details. The difference between the three is the underlying difference between people who look at raw data and people who look at the data and ask “what does it mean?”

The first approach is looking at the raw numbers. For the purposes of this diatribe, we’ll refer to this as the “NFL” approach.

          GM   W  L  CG  SHO     IP    H   R  ER  BB    K  HR
Gibson    34  22  9  28   13  304.2  198  49  38  62  268  11
Martinez  29  18  6   7    4  217    128  44  42  32  284  17


(GM = games pitched, W = wins, L = losses, CG = complete games, SHO = shutouts, IP = innings pitched, H = Hits allowed, R = runs allowed, ER = earned runs allowed, BB = bases on balls, K = strikeouts, HR = home runs allowed)

When we compare the raw statistics, it’s hard to truly distinguish between them. Bob Gibson pitched more games, threw more innings, and more than tripled the number of complete games and shutouts. Pedro Martinez walked fewer, struck out more and allowed fewer hits. This approach is good for the sake of direct comparison but it leaves much to be desired. This brings us to the next level which is the basic metrics.

          ERA   WHIP   K/9IP  BB/9IP  OppBA  OppOPS
Gibson    1.12  0.853    7.9     1.8   .184    .469
Martinez  1.74  0.737   11.8     1.3   .167    .473

(ERA = earned run average, WHIP = walks + hits per inning, K/9IP = strikeouts per 9 innings pitched, BB/9IP = walks per 9 innings pitched, OppBA = opponent’s batting average, OppOPS = opponent’s on-base + slugging)
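For anyone curious how the basic metrics fall out of the raw statistics, here is a short Python sketch that rebuilds four of them from the first table. The function names are mine, chosen for illustration. The one wrinkle is innings pitched: in box-score notation "304.2" means 304 innings plus 2 outs, not 304.2 innings, so it has to be converted before dividing.

```python
def innings(ip_notation: str) -> float:
    """Convert box-score innings such as '304.2' (304 innings, 2 outs) to a decimal."""
    whole, _, outs = ip_notation.partition(".")
    return int(whole) + (int(outs) if outs else 0) / 3


def basic_metrics(ip: str, h: int, er: int, bb: int, k: int) -> dict:
    """Compute ERA, WHIP, K/9IP and BB/9IP from raw season totals."""
    inn = innings(ip)
    return {
        "ERA": round(9 * er / inn, 2),      # earned runs per 9 innings
        "WHIP": round((bb + h) / inn, 3),   # walks + hits per inning
        "K/9IP": round(9 * k / inn, 1),     # strikeouts per 9 innings
        "BB/9IP": round(9 * bb / inn, 1),   # walks per 9 innings
    }


# Raw totals taken from the first table in this post.
gibson = basic_metrics("304.2", h=198, er=38, bb=62, k=268)
martinez = basic_metrics("217", h=128, er=42, bb=32, k=284)
print("Gibson  ", gibson)
print("Martinez", martinez)
```

OppBA and OppOPS are left out because they need at-bats and extra-base hits allowed, which the raw table doesn’t list.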

When we look at these numbers, we can see that Bob Gibson had a significantly lower ERA but Pedro Martinez was better in every other category with the exception of OppOPS, which was close enough to call a tie. What these numbers say, combined with the raw statistics above, is that Bob Gibson’s season was incredible and one for the ages. They also say that Pedro Martinez’s 2000 season was ever so slightly better. There is one glaring problem with this comparison which brings us to the third level of statistical analysis. In 1968, the league average ERA was 2.98 and teams scored 3.42 runs per game. In 2000, the average ERA was 4.77 and teams scored 5.14 runs per game. Bob Gibson’s season came in the midst of the single greatest pitching season in the live-ball era (which baseball historians usually say started when Babe Ruth switched to the outfield full time in 1920). Pedro Martinez’s season came in the midst of the greatest hitter’s era the game has ever seen, two years after Mark McGwire hit 70 home runs in a season and a year before Barry Bonds hit 73.

Statisticians have tried over the past few years to come up with universal statistics that can be viewed in a vacuum. They have tried to take everything into account including the league averages and park effects. Most of these efforts are focused on creating a single comprehensive metric with which two players can be directly compared and a determination can be made about which was better. The most basic of these metrics is adjusted ERA+, which takes the earned run average and compares it to the league average as well as the ballpark they pitched in to find out who excelled the most over his competition. When we compare the two seasons listed above, Bob Gibson’s ERA+ comes out to 258 (where the league average is 100), which is the 7th best mark all-time and the 4th best of the live ball era. Pedro Martinez in 2000 had an ERA+ of 291, the 2nd best mark all time and the best mark since Tim Keefe in 1880.
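The core idea behind ERA+ can be sketched in Python as the league ERA divided by the pitcher’s ERA, scaled so that 100 is average. This is a deliberately simplified version with a made-up function name: it leaves out the ballpark adjustment entirely and uses the league ERAs quoted in this post, so it will not reproduce the published 258 and 291.

```python
def era_plus_unadjusted(era: float, league_era: float) -> float:
    """Simplified ERA+ with no park adjustment (an assumption of this sketch).

    100 is league average; higher means the pitcher beat the league by more.
    """
    return 100 * league_era / era


# ERAs and league averages as quoted in this post.
gibson_1968 = era_plus_unadjusted(era=1.12, league_era=2.98)
martinez_2000 = era_plus_unadjusted(era=1.74, league_era=4.77)
print(f"Gibson 1968:   {gibson_1968:.0f}")
print(f"Martinez 2000: {martinez_2000:.0f}")
```

Even without the park adjustment, the ordering matches the real metric: Martinez comes out ahead despite the higher raw ERA.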

So what does this all mean?

In a nutshell, statistics can tell you practically anything you want them to. When you look at a list of the best single-season ERAs in baseball history, only two from the live ball era appear on the list: Bob Gibson’s 1.12 (ranked 4th) and Greg Maddux’s 1.56 ERA in 1994 (ranked 49th). Gibson completed 82.3% of his starts, and in better than one out of every three starts, he threw a shutout. He was the most dominant pitcher in a year where pitchers dominated across the league. On the flip side of that coin is Pedro Martinez, who is supported by most of the more advanced metrics. His complete games and shutouts don’t compare to Gibson’s but he pitched in an era where pitchers just don’t stay in the whole game. His ERA was significantly higher than Gibson’s but it was still roughly one third of the league average and he allowed about half as many baserunners per inning as the average pitcher in the league.

Allow this to serve as a cautionary tale. Adjusted ERA+ is just the tip of the iceberg and sabermetricians everywhere will pull their hair out because there’s no mention of VORP, win shares, win probability, and wins above replacement. There are even metrics that can give a rough estimate of how much a player is worth given his production. There is absolutely nothing wrong with any of these advanced metrics and they have gotten to the point that it doesn’t take a degree in applied mathematics to understand them. However, to inject a personal opinion, I don’t believe that a single number or metric is a good indication of a player’s value. These metrics are good for discussions of who is better but when it comes to making a decision about who you would rather have on your team, a single metric, no matter how comprehensive, is woefully inadequate.

To sum up, there is no right way and there is no wrong way to look at a discussion of who is the best amongst a group of players. Just realize that any good statistician can tweak the numbers to say whatever it is that he wants them to say.




This happens to be my first post on this blog and I want it to be mostly (if not entirely) about baseball. There will definitely be some random posts about whatever happens to be running through my mind, but seeing as how that’s baseball most of the time, you can bet much of the content will be baseball-oriented. I greatly appreciate comments and questions and will do my best to respond to each and every (appropriate) one of them. I’d like to sign off with a question to anyone who happens to read this. One of the things I love about baseball is there is no set rule of what to do when. The answer to every “what would you do in this situation?” is “it depends”. Simply put, I’m going to list a scenario and I’d like to hear readers’ thoughts about what either team should do. Don’t read too much into it and don’t ask who’s batting and who’s pitching. It doesn’t matter; it’s a hypothetical. I’m simply interested to hear what other people have to say. So, if you read this and you feel like leaving a comment about the scenario, please say what you would do and also your reasoning. So, for today…

Runners on second and third, nobody out, #3 hitter in the lineup is up, 0-0 score, 1st inning. How do you attack the batter and what is your aim for the at-bat?