The Year in Review I - What Pythagoras Says

Sunday, October 09 2011 @ 07:25 AM EDT

Contributed by: Magpie

It has long been my custom to examine some odds and ends at the conclusion of the regular season, once the final game is in the books and I've had a chance to update my Big Honking Database.
Today we'll check in with Pythagoras, ancient Sage of Samos.

As everyone knows, the number of games a team wins and loses can generally be predicted by the relationship between the runs they score and give up. Elementary, my dear Watson. We are interested, as always, in the teams that don't match up to their Pythagorean expectation. There are two, and only two, reasons this can happen: a) an unusual level of performance in close games in general, one-run games in particular or b) an unusual record in blowouts. The right combination of unusual performance in both of these areas can produce some very, very strange results indeed.

What makes this especially interesting is that these two explanations say completely different things about the ball club affected. If a team has an exceptionally good or bad record in one-run games, they've just had a run of strange luck, one way or the other. That is all. The results of one-run games really are as random as a roll of the dice. To repeat, for the umpteenth time, my Mantra of the One-Run Game:

In a close game, the impact of random chance is sufficient to overwhelm the impact of overall quality.


My favourite example (because it's recent!) of this particular phenomenon comes from 2003. You remember that unspeakably awful Detroit Tigers team, who went 43-119? The 2003 Tigers had a better record in one-run games (19-18) than the 2003 Atlanta Braves, who finished first and won 101 games that year. Those Braves went just 17-25 in one-run games. That sort of thing happens all the time, and represents one of the fundamental truths of the game - that one-run games are one of the mechanisms that enforce the Law of Competitive Balance, dragging everyone towards .500, pulling the bad teams up and dragging the good teams down.

Blowouts, on the other hand, actually tell us something about a team's quality. As a general rule, you just don't lose by six runs because you caught a bad break. Nor do you win by six runs because you got a little bit lucky that day.  It takes genuine ability to make a habit of beating the other team senseless.

Something else I like to keep an eye on. Two years ago, I introduced a brand new phrase to the Baseball Lexicon: the Pythagorean Swing. It caught on like wildfire - why, scarcely a baseball conversation takes place without its mention. I'm sure there's no need to explain it to this crowd. But let us take no chances, and save one or two of you the trouble of looking up the original explanation. Suppose a team (and yes, this is a real team) underperformed its Pythagorean expectation in 2008 by 3 games. Let us then suppose that this very same team then went out and over-performed its Pythagorean expectation in 2009 by 9 games. That's a 12 game swing to the good, right there, without the team doing anything different on the field.

And in fact, this particular team really didn't do anything different on the field. Not even a little bit. They scored 1 - yes, one - more run than they had scored in 2008. They allowed 5 more than they had allowed in 2008. Yet their record went from 63-99 to 75-87. Ladies and gentlemen, I give you the 2009 San Diego Padres. It happens to someone, almost every year. This year's example comes from the great state of Texas...
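
For anyone who wants the arithmetic spelled out, here is a minimal Python sketch of the Swing. The function name and its arguments are made up for illustration; each argument is a team's actual wins minus its Pythagorean wins for one season.

```python
def pythagorean_swing(diff_last_year: int, diff_this_year: int) -> int:
    """Change in (actual wins - Pythagorean wins) from one season to the next."""
    return diff_this_year - diff_last_year


# The team in the example: 3 games under in 2008, 9 games over in 2009.
print(pythagorean_swing(-3, 9))  # 12
```

A positive result is a swing to the good, a negative one a swing to the bad.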

So let's pull up this year's Pythagorean standings.

        PYTHAGORAS SAYS                          |    REAL WORLD                |  DIFFERENCE FROM PYTHAGOREAN EXPECTATION
Team    Ex W  Ex L  GBL  Pyth Ex    RS    RA    |    W    L    GBL    PCT   |  2011 2010   SWING

NYY    103    59     -  .635    867    657    |    97    65   -    .599   |   -6 -3   -3
BOS    95    67    8   .585    875    737    |    90    72   7    .556   |   -5 0 -5
TB    92    70    11   .570    707    614    |    91    71   6    .562   |   -1 -2 1
TOR    79    83    24   .488    743    761    |    81    81  16    .500   |    2 1 -1
BAL    65    97    38   .404    708    860    |    69    93  28    .426   |    4 5 -1

DET    89    73     -  .551    787    711    |    95    67   -    .586   |    6 -1 7
KC    78    84    11   .479    730    762    |    71    91  24    .438   |   -7 4  -11
CHI    75    87    14   .462    654    706    |    79    83  16    .488   |    4 2 -2
CLE    75    87    14   .462    704    760    |    80    82   15    .494   |    5 0 5
MIN    60   102    29   .372    619    804    |    63    99   32    .389   |    3 1 2

TEX    100    62     -  .615    855    677    |    96    66    -    .593   |   -4 -2 -2
LAA    85    77    15   .526    667    633    |    86    76   10    .531   |    1 1 0
OAK    77    85    23   .474    645    679    |    74    88   22    .457   |   -3 -5   2
SEA    65    97    35   .404    556    675    |    67    95   29    .414   |    4 2 -2

PHI    104    58     -  .645    713    529    |   102    60    -    .630   |   -2 1 -3
ATL    86    76    18   .529    641    605    |    89    73   13    .549   |    3 -3 6
WSN    78    83    26   .485    624    643    |    80    81   21.5  .497   |    2 -2 4
NYM    78    84    26   .484    718    742    |    77    85   25    .475   |   -1 -2   1
FLA    72    90    32   .442    625    702    |    72    90   30    .444   |    0 -1 1
                                                   
MIL    91    71     - .561    721    638    |    96    66    -    .593   |    5 2 3
STL    89    73    2   .548    762    692    |    90    72    6    .556   |    1 -6 7
CIN    83    79    8   .510    735    720    |    79    83   17    .488   |   -4 -1   -3
CHC    69    93    22   .428    654    756    |    71    91   25    .438   |    2 3 -1
PIT    69    93    22   .423    610    712    |    72    90   24    .444   |    3 6 -3
HOU    61   101    30  .374    615    796    |    56   106  40    .346   |   -5 9 -14

ARI    89    73     -  .549    731    662    |    94    68    -    .580   |    5 -3 8
LAD    85    76    10   .525    644    612    |    82    79   11.5  .509   |   -3 2   -5
SF    80    82    14   .493    570    578    |    86    76    8    .531   |    6 -3 9
SD    79    83    17   .485    593    611    |    71    91   23    .438   |   -8 -2   -6
COL    79    83    17   .485    735    774    |    73    89   21    .451   |   -6 -4  -2

Pythagoras doesn't know anything about dysfunctional clubhouses; in his view, the Boston Red Sox were a better team than Tampa Bay, and should have finished ahead of them. On his site, Bill James has tentatively identified what happened to Boston and Atlanta as the two worst September collapses in the history of baseball:

The Red Sox blew a 10-game lead to a team that wasn't even playing well with a sustained month of horrific baseball....Believe me, I'd do anything I could to avoid being associated with a historic disaster, but. . .facts are facts


However, while the Red Sox were one of 2011's more notable Under-achievers, the Braves actually exceeded their Pythagorean expectation by three games (something they almost never did during the late stages of the Bobby Cox era. They did do some other things better!) Anyway, 21 of the 30 MLB teams posted records that were reasonably close to what their runs scored and allowed would lead us to expect. It's the exceptions, as always, that interest us:

Overachievers


Cleveland - The Indians stayed in the AL Central hunt for much of the season. They faded to an 80-82 finish, but that still represented an 11 game improvement on their 2010 campaign and I think their fans were generally encouraged by the season. Some caution is definitely called for. They were outscored by 56 runs - they did indeed improve themselves, just not quite as much as it appears. They played much more like a 75 or 76 win team. They went 14-25 in blowouts, which is not a good sign. But they played a ton of close games, and did pretty well in them (30-26).

Detroit - Like Cleveland, the Tigers made a considerable improvement on their 2010 campaign - Detroit won 14 more games, ending up with a 95-67 record. Roughly half of that improvement was genuine, and the other half was Pythagoras smiling upon them. It was totally unnecessary, as they would have coasted to a first place finish without his help. The Tigers were a good team (they went 24-18 in blowouts) - but even more, they were a lucky team. They played .630 ball (29-17) in one-run games, which was by far the best mark in the AL.

Milwaukee - Another version of the same story. The Brewers made an enormous improvement, winning 19 more games than they had the previous season. Most of this was the team's own doing, but Pythagoras helped them all along, and gave them an extra six wins or so. Like the Tigers, the Brewers were a good team that got extremely lucky in the close games - the Brewers were one of the best teams in the NL in one-run games, playing .625 ball (30-18).

Arizona - The Diamondbacks improved even more than the Brewers, improving their record by a whopping 29 games. The D'Backs had the Pythagorean Swing working for them. The 2010 team wasn't quite as bad as they looked (they should have won some 68 games instead of 65) and this year's team wasn't quite as good (they should have won about 89 games instead of 94.) They were legitimately about 20 games better than they were the year before - which is a lot - and they caught some breaks as well. Most notably, they went 28-16 in one-run games. That's a .636 winning percentage, best in the majors.

San Francisco - The defending champs hung around in the NL West for most of the season, and finished with a respectable enough 86-76 record. It flatters them. They're just not a particularly good team. They were outscored by their opponents and should have finished about 80-82. They went 15-20 in blowouts, which is not a good sign. However, they played a great many close games and they did very well, playing .600 ball (33-22), leading the majors in one-run victories. That type of performance, being based entirely on dumb luck, is not sustainable. So it will be interesting to see if they realize their true situation. Very often, teams in this situation convince themselves that they're quite close to being in contention and fail to address their problems. Which then rise up and bite them, often savagely.

Underachievers

Kansas City - The Royals made a very modest improvement to their W-L record, going from 67 wins to 71. In fact, the team was much, much better than the previous season's. The 2010 team was fortunate to win 67 games - they weren't quite that good, but Pythagoras gave them a break of about four games. Whereas this year's team was extremely unfortunate in winning only 71 - Pythagoras turned his back, and gave them an extra seven losses. That's an 11 game Pythagorean Swing (plus 4 to minus 7), the biggest in the AL. While they lost more than 90 games yet again, they should have finished much closer to .500. The Royals broke even in blowouts (19-19) but did not play well in close games (25-32 in one-run games.) Only one team in the majors (Cincinnati) lost more games by a single run. What all this means is that Kansas City is in the exact opposite situation from the Giants. But as the Royals just lost 91 games, they're going to believe (unlike the Giants) that they have a lot of work to do. Which they do, but they're still much, much closer to Detroit than this year's W-L record would have you believe. The gap between the two teams is closer to 10 games than it is to 25.

New York - The Yankees fielded a very, very good team this season, a better team (in my opinion) than was generally realized. They did win 97 games, which is pretty good, but they could have easily won well more than 100 games. Certainly, Pythagoras thinks they should have. Their pitching, maligned by almost everyone, was extremely effective; their defense, despite the ancient relics on the left side of the infield, was outstanding. And they can hit a little. The Yankees made a habit of beating the other fellows senseless, which is always the mark of a quality team - they went a staggering 34-13 in blowouts. But they weren't particularly lucky in the close ones, going just 21-24 in one-run games. Well, they had enough margin to play with.

Boston - For more than four months, the Red Sox looked like a historically great team. As good as the Yankees were, they were looking up at the Red Sox when September rolled around - and the Red Sox had been generous enough to give the rest of the league a head start by going 2-10 to start the season. In the end, after their collapse, they look like a somewhat lesser version of the Yankees. A powerful team, that beat the crap out of their opponents on a regular basis (32-17, almost as good as the Yankees) but only managed to break even in the close games (19-19 in one-run games.)

Colorado - The Rockies' main problem was the close games - they went 21-27 in one-run games. They weren't very good in blowouts (18-24) overall, but they did show a flair for piling up truly ridiculous margins of victory. They were 5-3 in games decided by eight runs or more, and won both their games decided by ten or more runs. It's only two games, yes, but they outscored their opponents by a total of 27 runs in just those two games. Kind of skewed everything a little bit.

Houston - The Astros had a disastrous campaign, losing 20 more games than they had the previous season. Like the Royals, they were caught in an evil Pythagorean Swing. The 2010 team was, in reality, almost as bad as this year's model - but they got lucky in some close games and stumbled to a 76-86 record that was considerably better than they deserved. Pythagoras says they caught a 9 game break to the good in 2010, and should have won some 67 games. This year, they should have won about 61, but Pythagoras gave them a 5 game swing to the bad. That's a total Pythagorean swing of 14 games in the wrong direction, the biggest such swing in the majors, and that accounts for the largest part of this year's disaster. When you're as bad as these guys, you'll clutch at any straw you can see. So let's note that the 2011 Astros actually improved slightly in blowouts, going from a grisly 12-31 in 2010 to a merely awful 15-28 this season.

San Diego - The Padres' problem this season was very simple to spot - they were extremely unlucky. They went 20-31 in one-run games, the worst mark in all of the majors. That's most of what happened to them. They weren't as good as they had been the year before - their best player was in Boston - but in terms of quality they were still pretty close to being a .500 team (and indeed, they went 17-17 in blowouts.) They were only outscored by 18 runs in 2011, and they shouldn't have lost 20 more games than they won.

Finally, a note on The Method, as Descartes (or was it Sartre?) might say. The Pythagorean formula I use to derive a team's Expected W/L record is the one devised by Bill James back at the Dawn of Time. It uses the squares of a team's runs scored and runs allowed to produce the expected winning percentage. It has been suggested that using an exponent other than two is more accurate (or preferable - I'm not sure you can say more "accurate," as we're merely creating a formula for an imagined result). The figure 1.83 has been proposed as a suitable exponent. I'm not sure why it might be preferable, but I can definitely tell you this - the Pythagorean expectations produced by the original formula (squared) match the actual W/L records of baseball teams much more closely.

Which certainly makes it preferable to me. My interest is in finding those teams whose performance does not match their Pythagorean expectation. I'd rather not use a formula that manufactures such teams with abandon. Which, it seems to me, is exactly what the formula using the exponent does.
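
A minimal Python sketch of the squared formula, for readers who want to check a row of the standings table. The function name, the rounding to whole wins and the 162-game default are assumptions made for illustration, not necessarily how the figures in this article were produced. The exponent is left as a parameter, with the squared version as the default.

```python
def pythagorean_record(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Return (expected winning pct, expected wins, expected losses)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    pct = rs / (rs + ra)
    wins = round(pct * games)  # assumption: round to the nearest whole win
    return pct, wins, games - wins


# 2011 Yankees, from the standings table: 867 runs scored, 657 allowed.
pct, wins, losses = pythagorean_record(867, 657)
print(f"{pct:.3f} {wins}-{losses}")  # 0.635 103-59
```

Subtracting the expected wins from the actual wins (97 for the Yankees) gives the -6 shown in the 2011 column.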

Allow me to demonstrate, with a Data Table and a Pretty Picture, how these two Pythagorean methods cope with the 2490 seasons in the Big Honking Database. How often does each method match what happened?

Pythag W exceeds    1.83       Squared
Actual W by         formula    formula

19        3    0
18        2    0
17        2    0
16        10    0
15        9    1
14        15     0
13        24     1
12        38     6
11        33     6
10        68     14
9        80     19
8         93     28
7         99     41
6         89    91
5         123     124
4        139     169
3         141     174
2        127     206
1         132     238
0        150     249
-1        139     212
-2         113     208
-3        110     215
-4        105     151
-5         117    121
-6         93     77
-7         84     53
-8        69     32
-9        69     27
-10        40     12
-11        44    7
-12        35     3
-13         19     2
-14        26     3
-15        14    0
-16        14    0
-17         8    0
-18        5    0
-19         2    0
-20         2    0
-21         2    0
-22         0    0
-23         1    0
-24         1    0
-25         0    0
-26        1    0



[Image: Pretty Picture]

I think you would prefer to use the formula that is more likely to mirror actual W/L records. You want as many seasons as possible in the centre of that Pretty Picture.

Or you could look at it this way...

Difference         Exponent    Squared
Pythag & Actual    Formula     Formula

 0 wins            150         249
 1                 271         450
 2                 240         414
 3                 251         389
 4                 244         320
 5                 240         245
 6                 182         168
 7                 183         94
 8                 162         60
 9                 149         46
10                 108         26
11                 77          13
12                 73          9
13                 43          3
14                 41          3
15+                76          1
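
This second table is just the first one folded in half: the seasons at +N and -N are added together. A small Python sketch of that fold, using only the middle rows of the Squared column from the first table (the dictionary and function names are invented for the example):

```python
from collections import Counter

# (Pythag W minus actual W) -> number of seasons, Squared formula,
# middle five rows of the first table only.
signed_counts = {2: 206, 1: 238, 0: 249, -1: 212, -2: 208}


def fold_by_size(signed):
    """Combine the +N and -N rows into a single row for a miss of N wins."""
    folded = Counter()
    for diff, seasons in signed.items():
        folded[abs(diff)] += seasons
    return dict(sorted(folded.items()))


print(fold_by_size(signed_counts))  # {0: 249, 1: 450, 2: 414}
```

Those three totals match the first three Squared rows of the folded table.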

To take a specific example, let's consider the San Diego Padres, one more time. In 2010, they scored 665 runs and allowed 581; according to the Squared formula, they should have gone 92-70; according to the Exponent formula, they should have gone 86-76. (They actually went 90-72.) They outscored their opponents by 84 runs, and intuitively one thinks that should result in about a 10 game swing - yes, something like 86-76. Of course (on the other hand!) they're the San Diego Padres - they play half their games in one of the greatest pitchers' parks ever built. Where it doesn't take as many runs to change the outcome of a game.

At any rate, I think it's clear that using a formula based on the exponent produces much more dramatic results - and more of them. And I may distrust it for that reason alone. Which may not be a particularly good reason, but it suits my purposes. Oh, it would have been neat and catchy and easy to be able to say "they didn't deserve to win 90 games a year ago and they didn't deserve to lose 90 games this year" - but sometimes I don't think it should be that easy.

Finally, as it's Johnny's birthday (my first role model!), I'd like to close with some Inspirational Verse:

I'm a moldy moldy man
I'm moldy thru and thru
I'm a moldy moldy man
You would not think it true
I'm moldy til my eyeballs
I'm moldy til my toe
I will not dance I shyballs
I'm such a humble Joe.




https://www.battersbox.ca/article.php?story=20111005223108335