Batter's Box Interactive Magazine - Craig Burley was right... Yankees have best rotation in A.L. (or do they?)

Craig Burley was right... Yankees have best rotation in A.L. (or do they?)

The basis of Craig's hypothesis is the fact that the Yankees are comfortably ahead of the pack in terms of DIPS ERA (DIPS is an acronym coined by Voros McCracken for Defence Independent Pitching Stats). DIPS ERA assumes that pitchers control only HBP, W, K and HR and then fills in the rest of the stats based on league average performance, so that the effects of "fielding" are filtered out.

There's one little problem. Even fielding-independent stats need to be park adjusted. How do we adjust these for the home and road parks of a given team or pitcher? Taking a general park factor (i.e. the impact of the park on runs scored in general) isn't going to work, because we only want to adjust the fielding-independent events; DIPS already resets the fielding-dependent events to league average.

What we need is a set of event park factors, that is, an estimate of how a park affects the frequency of each event (2B/3B, HR, W, K, singles etc). I calculated a series of composite park factors (they take home and road parks into account) for the 2003 American League to date for a number of different events. One important assumption I will make is that batters faced remains constant. Here are the results.

Team .......... W rate  K rate  HR      XBH     Singles
Chi Sox ....... 1.0072  1.0053  1.1994  0.9958  0.9720
Toronto ....... 0.9862  1.0207  1.0492  1.0980  1.0112
Texas ......... 0.9406  0.9919  1.0342  1.0730  1.0321
Baltimore ..... 1.0581  0.9953  1.0332  0.9213  0.9686
Seattle ....... 1.0157  0.9868  1.0105  0.8546  1.0039
Anaheim ....... 1.0068  1.0042  1.0029  1.0752  0.9843
Minnesota ..... 0.9644  1.0436  0.9932  1.0017  1.0271
Kansas City ... 1.0619  0.9038  0.9810  0.9908  1.0011
New York ...... 0.9880  1.0003  0.9566  0.9537  0.9827
Tampa Bay ..... 1.0670  1.0156  0.9532  1.0292  1.0255
Oakland ....... 0.9492  0.9456  0.9514  0.9988  0.9715
Cleveland ..... 1.0744  1.0329  0.9174  0.9880  0.9827
Boston ........ 1.0013  0.9387  0.9170  1.0301  1.0284
Detroit ....... 0.9472  1.0160  0.8983  0.9049  1.0003

Notes: a figure greater than 1 indicates that the environment increases the frequency of the event; HBPs remain unadjusted; XBH is doubles and triples combined.
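The article does not give the formula behind these composite factors, so what follows is only a sketch of one common approach. It assumes a team plays half its games at home and treats the road parks as averaging out to a factor of 1.0. The function names and the sample home-run rates are hypothetical, not the numbers behind the table.

```python
def raw_park_factor(home_rate, road_rate):
    """Ratio of an event's per-batter-faced rate in a team's home games
    (both teams combined) to the same rate in that team's road games."""
    return home_rate / road_rate


def composite_park_factor(home_factor, road_factor=1.0, home_share=0.5):
    """Blend the home park's factor with the average factor of the road
    parks, weighted by the share of games played in each setting.
    Treating the road parks as 1.0 is a simplifying assumption."""
    return home_share * home_factor + (1.0 - home_share) * road_factor


# Hypothetical home-run rates per batter faced (illustrative only).
home_hr_rate = 0.036
road_hr_rate = 0.030

hr_factor = raw_park_factor(home_hr_rate, road_hr_rate)
composite = composite_park_factor(hr_factor)
print(round(hr_factor, 3), round(composite, 3))
```

Under these assumptions the composite always sits closer to 1 than the raw home factor, because half the schedule is played elsewhere.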

What we now must do is adjust each of the relevant events using the appropriate park factor, being careful to take secondary effects into account (e.g. a park that decreases the frequency of home runs will increase the frequency of balls hit into the field of play). It may also be useful to compare a DIPS RA with both the original Run Average and a park-adjusted Run Average using the general park factor.
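A rough sketch of that adjustment step, assuming each event is neutralized by dividing by its composite factor and that batters faced is held constant, might look like the following. The function name and the staff line are hypothetical; the three factors are New York's from the table above.

```python
def park_adjust_events(bf, hbp, w, k, hr, pf_w, pf_k, pf_hr):
    """Divide each fielding-independent event by its park factor while
    holding batters faced (bf) constant; HBP is left unadjusted."""
    adj_w = w / pf_w
    adj_k = k / pf_k
    adj_hr = hr / pf_hr
    # Secondary effect: any plate appearance that is not a W, K, HR or HBP
    # is treated as a ball in play, so the BIP total moves the other way.
    raw_bip = bf - hbp - w - k - hr
    adj_bip = bf - hbp - adj_w - adj_k - adj_hr
    return {"W": adj_w, "K": adj_k, "HR": adj_hr,
            "BIP": adj_bip, "raw_BIP": raw_bip}


# Hypothetical staff totals (not real 2003 data).
line = park_adjust_events(bf=2800, hbp=20, w=200, k=550, hr=70,
                          pf_w=0.9880, pf_k=1.0003, pf_hr=0.9566)
for name, value in line.items():
    print(name, round(value, 1))
```

Because the home-run factor is below 1, the adjusted home-run total rises and the adjusted balls-in-play total falls by a matching amount.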

American League Starting Pitchers by Team, sorted by DIPS RA

Team .......IP/start  RA   adjRA  DIPS-IP/start DIPS RA
New York ..... 6.61  4.42   4.73      6.75       4.14
Boston ....... 5.87  4.77   4.44      5.95       4.34
Chi Sox ...... 6.29  4.40   4.28      6.21       4.54
Oakland ...... 6.47  4.05   4.41      6.36       4.64
Baltimore .... 5.94  5.08   5.53      6.09       4.77
Toronto ...... 5.91  5.47   5.00      6.04       4.80
Seattle ...... 6.43  4.03   4.33      6.24       5.01
Minnesota .... 5.97  5.35   5.19      6.00       5.14
Kansas City .. 5.74  4.80   4.25      5.72       5.16
Cleveland .... 5.76  5.22   5.52      5.68       5.41
Detroit ...... 5.69  5.41   5.69      5.63       5.56
Anaheim ...... 5.77  5.14   5.32      5.77       5.58
Texas ........ 4.97  7.04   6.58      5.05       5.97
Tampa Bay .... 5.32  6.25   6.18      5.30       6.06

Key: IP/start = number of innings per start by the starting pitchers; RA = actual runs divided by actual innings multiplied by 9; adjRA = park-adjusted runs per 9 innings, using a general park factor; DIPS-IP/start = innings per start adjusted for park and assuming a league average defence; DIPS RA = runs per 9 innings adjusted for park and assuming league average for all fielder-dependent events.
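The first two definitions in the key reduce to simple arithmetic. This sketch assumes the park adjustment is a plain division by a general run factor, which the article does not confirm; the function names and the sample totals are hypothetical.

```python
def run_average(runs, innings):
    """Actual runs divided by actual innings, multiplied by 9."""
    return 9.0 * runs / innings


def park_adjusted_ra(ra, run_park_factor):
    """Neutralize a run average by dividing by a general run park factor
    (assumed form; a factor above 1 means a hitter-friendly environment)."""
    return ra / run_park_factor


# Hypothetical totals for illustration.
ra = run_average(runs=50, innings=100)
adj = park_adjusted_ra(ra, run_park_factor=0.9)
print(ra, round(adj, 2))
```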

The Yankees have put up the best DIPS RA and work deeper into the game than anyone else. However, they only rank 6th in adjusted Run Average. I'll let the readers draw their own conclusions about this.
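For anyone who wants to check the rankings, here is a short sketch that sorts the adjRA and DIPS RA columns copied from the table above. The variable names are my own.

```python
# team: (adjRA, DIPS RA), copied from the table above
teams = {
    "New York": (4.73, 4.14),
    "Boston": (4.44, 4.34),
    "Chi Sox": (4.28, 4.54),
    "Oakland": (4.41, 4.64),
    "Baltimore": (5.53, 4.77),
    "Toronto": (5.00, 4.80),
    "Seattle": (4.33, 5.01),
    "Minnesota": (5.19, 5.14),
    "Kansas City": (4.25, 5.16),
    "Cleveland": (5.52, 5.41),
    "Detroit": (5.69, 5.56),
    "Anaheim": (5.32, 5.58),
    "Texas": (6.58, 5.97),
    "Tampa Bay": (6.18, 6.06),
}

# Lower is better for both measures.
by_adj_ra = sorted(teams, key=lambda t: teams[t][0])
by_dips_ra = sorted(teams, key=lambda t: teams[t][1])

ny_adj_rank = by_adj_ra.index("New York") + 1
ny_dips_rank = by_dips_ra.index("New York") + 1
print(ny_adj_rank, ny_dips_rank)
```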

You may ask why the defence-independent IP/start are different from actual IP/start. There are two reasons: 1) park-adjusting walks, strikeouts and home runs will cause the number of baserunners to change, thus increasing or decreasing the number of outs if batters faced remains constant; 2) substituting a league average defence may convert more or fewer balls in play into outs.
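Both effects can be seen in a simplified sketch. It assumes outs come only from strikeouts and from balls in play converted at a league-average rate, ignoring double plays and outs on the bases; the function name, the staff totals and the 0.70 out rate are all hypothetical.

```python
def dips_innings(bf, hbp, w, k, hr, league_bip_out_rate):
    """Estimate innings from a fixed number of batters faced, counting
    strikeouts plus balls in play turned into outs at a league rate."""
    bip = bf - hbp - w - k - hr
    outs = k + bip * league_bip_out_rate
    return outs / 3.0


# Reason 1: more walks with the same batters faced means fewer outs.
base = dips_innings(2800, 20, 200, 550, 70, league_bip_out_rate=0.70)
more_walks = dips_innings(2800, 20, 220, 550, 70, league_bip_out_rate=0.70)

# Reason 2: a different out rate on balls in play changes the innings too.
better_defence = dips_innings(2800, 20, 200, 550, 70, league_bip_out_rate=0.72)

print(round(base, 1), round(more_walks, 1), round(better_defence, 1))
```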

Park-adjusted Run Average is a good rating of the overall pitching/defensive performance of a team, but is defence-independent Run Average a similarly adequate assessment of the quality of pitching in isolation? DIPS RA purports to filter out the effects of defence, but it also filters out other skills the pitcher possesses which may impact run scoring. For example, a pitcher may be a good or bad fielder (think Greg Maddux and Kelvim Escobar on a ball hit up the middle), he may be good or bad at shutting off the running game or inducing easy double play balls, or he may be good or bad at working out of a jam. Finally, a pitcher may be good or bad at inducing easy outs on balls hit in play (chiefly pop ups and weak grounders). All of these things are filtered out in DIPS RA as well.

This leads me to believe that the truth lies somewhere in between DIPS RA and park-adjusted RA, perhaps a bit closer to the former.
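One way to picture "somewhere in between" is a weighted blend of the two figures. The 0.6 weight on DIPS RA below is purely a hypothetical stand-in for "a bit closer to the former", not an estimate of the right split, and the function name is my own.

```python
def blended_ra(dips_ra, adj_ra, dips_weight=0.6):
    """Weighted average of DIPS RA and park-adjusted RA.
    dips_weight is a hypothetical choice, not a fitted value."""
    return dips_weight * dips_ra + (1.0 - dips_weight) * adj_ra


# (DIPS RA, adjRA) for three teams, copied from the table above.
sample = {
    "New York": (4.14, 4.73),
    "Boston": (4.34, 4.44),
    "Chi Sox": (4.54, 4.28),
}

for team, (dips, adj) in sample.items():
    print(team, round(blended_ra(dips, adj), 2))
```

Changing the weight changes how far each team moves from its DIPS RA toward its adjRA, which is the whole question the article leaves open.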

Posted by robertdudek on Thursday, July 24 2003 @ 10:02 AM EDT.

Craig Burley was right... Yankees have best rotation in A.L. (or do they?) | 8 comments

The following comments are owned by whomever posted them. This site is not responsible for what they say.

John Neary - Thursday, July 24 2003 @ 10:49 AM EDT (#13798)

Very good piece, Robert. I think there are a lot of people on both sides of the DIPS debate who could stand to take your last sentence to heart.

Craig B - Thursday, July 24 2003 @ 11:02 AM EDT (#13799)

This is the best article title *ever*.

We should start all our articles off like this.

Jonny German - Thursday, July 24 2003 @ 12:36 PM EDT (#13800)

When I saw walks and strikeouts in your Park Factors table, my initial reaction was "Huh? How can there be a park effect on these events?". Giving it some more thought, I came up with the following:

1) Temperature. I would expect extremes or severe fluctuations in temperature to negatively affect a pitcher's ability to find the strike zone.
2) Humidity. Not sure which way this would tip the scales, but I can see there being a significant effect in pitch movement going from Arizona to Toronto, for example.
3) Extreme elevation, i.e. Coors Field.
4) Psychology. A batter thinking about lifting a fly ball over the Green Monster might be more inclined to take a big cut, increasing his chances of striking out. Similarly, the pitcher may be psyched out or psyched up by his perception of how a particular park can hurt or help him.
5) More psychology: The effect of particularly loud or quiet fans on the psyche of the players.

I remain skeptical that these are enough to have a long term significant effect on walk and strikeout rates. Each of them sounds very minor to me, and I would expect them to cancel each other out in a lot of cases. Am I missing something major? I see from your chart that there does appear to be a significant park effect on both, but would that remain true as the sample size increases?

Tangent: Why do some sources use W for walks and others use BB? Even more odd is that hitters' walks are consistently reported as BB, while it's pitchers who have an obvious double meaning for W.

Tangent of Tangent: To what degree will the integration of sabermetric thinking into the mainstream consciousness be impeded by the complete lack of a standardized reporting system? WS, VORP, and RCAA/RSAA are all more or less trying to quantify the same thing, a player's total contribution to his team: How many of you immediately recognized those acronyms as Win Shares, Value Over Replacement Player, and Runs Created Above Average / Runs Saved Above Average? How many of you can tell me that they originate from Bill James, Baseball Prospectus (I can't tell you who exactly at BP), and Lee Sinins? Who can tell me if a score of 13 is more likely to be Carlos Delgado or Chris Woodward on each of these scales? (The guy in the front of the class says he needs to know the number of games. Let's say one full season). Who can tell me if there are any major differences in the stats they are based on?

In this crowd, I would expect a lot of you could answer those questions quite well. But this is far from an average crowd of baseball fans, and I think Joe Fan is turned off by the huge number of new stats and new acronyms. I'm not Joe Fan, but I'm not a hardcore sabermetrician either, and my inclination when confronted with a new stat or unfamiliar acronym is often just to ignore it.

The beautifully simple counterpoint is that it doesn't matter at all. The whole point of sabermetrics is to not take traditional stats as being the whole story; it's all about looking at things in different ways and trying to quantify things, everything, precisely. So it's good if I'm regularly forced to learn new stats and think about what they mean.

I forget what this post was originally about.

robertdudek - Thursday, July 24 2003 @ 12:49 PM EDT (#13801)

Of course I haven't regressed these park factors, but I can tell you that if you look at a 3 or 4 year period, you'll see that there are persistent differences between parks WRT walks and strikeouts.

Visibility is a factor. Parks vary WRT sightlines and hitters' backgrounds. This can make it more difficult to pick up a ball and should result in more swings and misses as well as some more called third strikes.

Foul territory is another factor (albeit significant in only a few parks). Oakland's large foul territory means that more PA end in foul outs than in an average park. Those PA would have otherwise ended as some other event (including strikeouts and walks).

robertdudek - Thursday, July 24 2003 @ 06:15 PM EDT (#13802)

One more factor that used to be significant for the Cubs was the number of day games they had on their schedule. Generally speaking, it's probably easier for a batter to pick up a ball in the early and mid-afternoon (unless there is a lot of glare) than at night or twilight.

Perhaps some west coast teams play more games than average in twilight (I'm thinking of Sunday Night Baseball). I'm not sure if there are still teams that play significantly more day games than other teams (I think only the Rangers play very few day games).

These effects are probably small. However, when we add them in with everything else, small but persistent differences in walk and strikeout park factors become a little less implausible.

John Neary - Friday, July 25 2003 @ 12:16 AM EDT (#13803)

Robert,

I have another question regarding park factors. I understand that it's an awfully difficult one to respond to, and I'm not looking for a firm answer.

In your opinion, what period of time gives the best park factor? If you calculated a ten-year PF, you'd reduce the noise (from starting pitchers, injuries, and plain old random chance), but you'd also reduce the signal from things like moving the fences, changing the playing surface, and (?) climate change. Where do you think the best balance lies, and on what basis?

John

robertdudek - Friday, July 25 2003 @ 08:45 AM EDT (#13804)

I think one-year park factors are fairly good, actually. MLB rules do not allow major changes to stadia within a season, so we can be very confident that the configurations of all the ballparks remain unchanged throughout the year (the exception being when a new ballpark is introduced mid-season).

Once in a while some weird fluctuations arise, but they are far more consistent than a typical batter's or pitcher's one-year stats. Because the road parks always pull the park factor towards the centre, you rarely get an extreme outlier.

In general, I don't like to make judgments about a player without 3 years of data. I usually apply one-year PFs to one year of data and then aggregate to 3 years. I don't like applying 3-year park factors to a single year of player data because timeframe differences allow the possibility of an inappropriate match between the park factor and the player's actual playing conditions.
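As a rough sketch of that workflow, assuming the adjustment is a simple division by each season's own factor, the adjust-then-aggregate order could look like this. The function name, the pitcher's totals and the three one-year factors are all hypothetical.

```python
def three_year_adjusted_rate(seasons):
    """seasons: list of (event_count, batters_faced, one_year_park_factor).
    Adjust each season with its own factor first, then pool the seasons."""
    adjusted_events = sum(count / pf for count, _, pf in seasons)
    total_bf = sum(bf for _, bf, _ in seasons)
    return adjusted_events / total_bf


# Hypothetical pitcher: home runs allowed, batters faced, one-year HR factor.
seasons = [
    (25, 900, 1.05),
    (20, 850, 0.95),
    (22, 880, 1.10),
]

adjusted_rate = three_year_adjusted_rate(seasons)
raw_rate = sum(c for c, _, _ in seasons) / sum(bf for _, bf, _ in seasons)
print(round(raw_rate, 4), round(adjusted_rate, 4))
```

Each season's factor only ever touches that season's data, which avoids the timeframe mismatch described above.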

Rick McCarthy - Wednesday, July 14 2004 @ 11:01 PM EDT (#13805)

Hi,
I stumbled across this page looking for a discussion of the Cleveland Indians and their pitching. Lee, Sabathia, and Westbrook are all doing well in terms of ERA, but their K/IP and K/BB are not spectacular. While they are in the top ten in AL ERA, these three pitchers all appear much further down the DIPS list (as does the team). Any suggestions as to why this is so? Is Cleveland's defense this good?
