Batter's Box Interactive Magazine
A reader pointed out that the Win Shares page currently credits Albert Pujols with slightly more offensive Win Shares than Barry Bonds, despite Bonds' huge edge in the key rate stats. Craig Burley doubted that Pujols' bat could actually be worth more than Bonds' and a brief discussion ensued. I chimed in with a suggestion: use Base Runs in a team context to estimate the true value of a player's production.

A Smyth, a Tigre and Base Runs

Base Runs is a fresh new approach to Run Estimation formulas developed by David Smyth. Tangotigre, a sabermetric denizen of Baseball Primer, has written extensively on Base Runs and different Run Estimation formulas. When I worked on a technical version of Base Runs to handle intentional walks and GIDPs, I discussed the results in an e-mail exchange with Tangotigre, for whose help I am grateful. My weights for the various elements in Base Runs match up very well with the empirical values of all the various batting events. I tested my formula against game logs from 1994 to 2001, with very good results.

Before we get to the two big sluggers, this new approach to run estimation should be discussed.

There are 4 elements to the base runs formula:

A factor - this is the reach-base factor, similar to the A factor in Bill James' Runs Created, except that it removes Homeruns
B factor - the advance-the-baserunner factor
C factor - the failure-to-advance-the-baserunner factor
HR - that's homeruns, sitting on their own. Homeruns do advance baserunners, so they are also included in the B factor, but the portion of the homerun's value that drives in the batter is always equal to 1 run.

The factors are combined as follows:

Men on Base multiplied by Advancement Ratio plus Home Runs

The idea behind this approach is dead simple: runs are scored when runners first reach base, then advance (either on their own or with another player's help), with the exception of homeruns. Using the factor letters, the formula is:

A * (B/(B+C)) +HR

Here we see that the Advancement Ratio is Successes divided by the sum of Successes and Failures.
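As a minimal sketch, the combination above can be written as a small Python function. The function and argument names are illustrative assumptions, not anything taken from Smyth's or Tangotigre's write-ups, and the zero-division guard is a convenience added for this example only.

```python
def base_runs(a, b, c, hr):
    """Estimate runs as baserunners (A) times the advancement ratio B/(B+C), plus home runs."""
    if b + c == 0:
        # Guard against division by zero for an empty stat line (an assumption of this sketch).
        return float(hr)
    advancement_ratio = b / (b + c)  # successes / (successes + failures)
    return a * advancement_ratio + hr


# Made-up factors: 100 baserunners, advancement ratio 50/200 = 0.25, 10 homeruns.
print(base_runs(100, 50, 150, 10))  # 35.0
```

The homerun term sits outside the ratio because the batter's own run on a homerun always scores, whatever the team's advancement ratio happens to be.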

My technical version uses the following weights:

A factor:

Hits - Homeruns + .9*(HBP+W-IW) + .5*IW - CS - GIDP

The .9 and .5 might look strange; they reflect the fact that HBPs and walks create many GIDP opportunities and force plays. Intentional walks are often issued with weaker hitters coming up.

B factor:

.7*singles + 2.4*doubles + 4.1*triples + 2.4*Homeruns + 1.2*sac flies + 0.8*sac hits + 1.1*steals + .2*caught stealing +.1*GIDP +.2*HBP + .1*(W-IW) + .04*(AB-H-K)

Full discussion of this B factor is beyond the scope of this article, but I will deal with two peculiar aspects.

The advancement value of triples is worth more than homeruns because the triple also advances the batter two bases after he reaches base. The homerun's value here is only reflective of the effect it has on runners; its effect on the batter is dealt with separately.

Caught Stealing and GIDP have some advancement value because some CS are on double steals (the other runner advances) or runners are sometimes called safe after an error (the middle infielder drops the ball during the tag). GIDPs can, of course, also advance runners.

C factor:

AB - H + SH + SF

This is nothing other than batting outs. Outs made on baserunners are dealt with in the A factor (they remove those runners).
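The three factors can be computed from a line of counting stats along these lines. This is a sketch under stated assumptions: the dictionary keys ("BB" for total walks, "IBB" for intentional walks, "K" for strikeouts, and so on) and the sample stat line are hypothetical, chosen only to exercise the weights listed above.

```python
def base_runs_factors(s):
    """Return (A, B, C, HR) for a dict of counting stats, using the weights given in the text."""
    singles = s["H"] - s["2B"] - s["3B"] - s["HR"]
    niw = s["BB"] - s["IBB"]  # non-intentional walks, i.e. W - IW
    a = (s["H"] - s["HR"] + 0.9 * (s["HBP"] + niw) + 0.5 * s["IBB"]
         - s["CS"] - s["GIDP"])
    b = (0.7 * singles + 2.4 * s["2B"] + 4.1 * s["3B"] + 2.4 * s["HR"]
         + 1.2 * s["SF"] + 0.8 * s["SH"] + 1.1 * s["SB"] + 0.2 * s["CS"]
         + 0.1 * s["GIDP"] + 0.2 * s["HBP"] + 0.1 * niw
         + 0.04 * (s["AB"] - s["H"] - s["K"]))
    c = s["AB"] - s["H"] + s["SH"] + s["SF"]  # batting outs
    return a, b, c, s["HR"]


def technical_base_runs(s):
    """Combine the factors as A * B/(B+C) + HR."""
    a, b, c, hr = base_runs_factors(s)
    return a * b / (b + c) + hr


# Hypothetical stat line, not any real player's or team's numbers.
sample = {"AB": 500, "H": 150, "2B": 30, "3B": 5, "HR": 20, "BB": 60, "IBB": 10,
          "HBP": 5, "K": 100, "SB": 10, "CS": 5, "GIDP": 10, "SH": 2, "SF": 4}
a, b, c, hr = base_runs_factors(sample)
print(a, b, c, hr, technical_base_runs(sample))
```

For this made-up line the factors come out to roughly A = 169.5, B = 242.4 and C = 356, for about 88.7 Base Runs.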


Bonds and Pujols in the context of their teams

It is possible to apply the Base Runs formula directly to Bonds' and Pujols' stats, but then we are subject to the same difficulties the original Runs Created approach suffered from. In reality, Bonds' reach-base factor (A) acts on his teammates' advancement ratio, not on his own. Similarly, his advancement potential acts on the batters batting ahead of him.

To sidestep this issue, I first calculated team base runs and compared that to team stats minus Bonds/Pujols. The difference is a rough approximation of the run value added by their performance.

The Giants have scored 601 runs (BaseRuns predicted 607.09); the Cardinals have scored 718 (Base Runs predicted 710.31). Subtracting out Bonds's stats from the Giants, the difference comes to 108.87 BaseRuns; Pujols gets credit for 120.00 BaseRuns after subtracting his stats from the Cardinals.
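The with-and-without comparison can be sketched as follows. Because A, B, C and HR are all linear in the counting stats, subtracting a player's stats from the team's is the same as subtracting his factors. The function names and the factor values in the example are hypothetical; they are not the Giants' or Cardinals' actual figures.

```python
def base_runs(a, b, c, hr):
    """A * B/(B+C) + HR, as defined earlier in the article."""
    return a * b / (b + c) + hr


def marginal_base_runs(team, player):
    """Team Base Runs minus the Base Runs of the same team with the player's factors removed.

    Both arguments are (A, B, C, HR) tuples.
    """
    without_player = tuple(t - p for t, p in zip(team, player))
    return base_runs(*team) - base_runs(*without_player)


# Hypothetical factors for a team and for one of its hitters.
team = (1800, 2400, 3600, 150)
player = (200, 300, 250, 40)
print(round(marginal_base_runs(team, player), 2))  # team-context value
print(round(base_runs(*player), 2))                # value if the formula is applied to him alone
```

In this made-up case the team-context figure (about 143.5) differs from the standalone figure (about 149.1), which is the kind of gap the team-context method is meant to expose.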

But that's not the end of the story - we need to account for secondary effects. When a batter reaches base as frequently as Bonds, he's creating opportunities for his teammates by not making outs. The next question is: how many outs do these batters save for their teammates?

The simplest approach is to subtract OBP from 1. But when a batter reaches first base he's creating a GIDP opportunity for the next batter and this ought to be accounted for. In the NL in 2002, a GIDP occurred approximately once for every 11 times on first base (as estimated by the formula: singles + walks + HBP - steal attempts). The result is an estimated 16.2 GIDPs for his teammates created by Bonds and 14.4 GIDPs by Pujols. I added these to the other outs and the results were:

Outs% (Outs/PA): Bonds 52.3%; Pujols 60.9%
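A rough sketch of that outs calculation, assuming the 1-in-11 rate quoted above. The function names and the sample inputs are hypothetical; they are not Bonds' or Pujols' actual lines.

```python
GIDP_PER_TIME_ON_FIRST = 1 / 11  # approximate NL rate quoted in the text


def times_on_first(singles, walks, hbp, steal_attempts):
    """Estimated times the batter stood on first base: singles + walks + HBP - steal attempts."""
    return singles + walks + hbp - steal_attempts


def gidp_created(singles, walks, hbp, steal_attempts, rate=GIDP_PER_TIME_ON_FIRST):
    """Estimated GIDPs hit into by teammates because this batter was on first."""
    return times_on_first(singles, walks, hbp, steal_attempts) * rate


def outs_pct(own_outs, gidp_for_teammates, pa):
    """Share of plate appearances that end up costing the team an out."""
    return (own_outs + gidp_for_teammates) / pa


# Hypothetical hitter: 165 estimated times on first leads to about 15 teammate GIDPs.
extra = gidp_created(singles=60, walks=100, hbp=8, steal_attempts=3)
print(round(extra, 1), round(outs_pct(250, extra, 500), 3))
```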

The number of outs saved depends on what we assume a replacement level rate would be. I'm going to assume that an outfielder taking one of these two players' spots in the batting order would create fewer outs than average. I used an outs percentage of 67%.

Outs saved: Bonds 66.67; Pujols 33.41

How many extra runs would have scored from these outs? The outs are added to the team pool, so Bonds and Pujols can later take advantage of these extra opportunities as well as their teammates. I used the team Base Runs per Out figures (Giants = .168 Base Runs per out; Cardinals = .198 Base Runs per out). Bonds' outs saved should result in 11.22 extra runs for the team, while Pujols adds 6.44 to his.
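The two steps just described (outs saved relative to a 67% replacement rate, then runs per out) can be sketched as below. The function names are assumptions for illustration; the inputs are the rounded figures quoted in this article, with Bonds' 455 PA taken from the results table further down.

```python
def outs_saved(pa, player_outs_pct, replacement_outs_pct=0.67):
    """Outs a replacement would have made in the same PA, minus the outs the player made."""
    return (replacement_outs_pct - player_outs_pct) * pa


def runs_from_outs_saved(saved, team_base_runs_per_out):
    """Value the saved outs at the team's own Base Runs per out."""
    return saved * team_base_runs_per_out


# Bonds, using the rounded inputs quoted in the text.
print(round(outs_saved(455, 0.523), 2))
print(round(runs_from_outs_saved(66.67, 0.168), 2))
```

Feeding in the rounded figures gives about 66.9 outs and 11.2 runs for Bonds, close to the 66.67 and 11.22 reported; the small gaps presumably come from rounding in the published inputs.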

The new total is 120.09 Base Runs for Bonds and 126.44 for Pujols. But those are Base Runs, which is only an estimate of runs. Adjusting for actual number of runs scored by the team (compared to team Base Runs), Bonds moves down to 118.89 and Pujols moves up to 127.81.
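The final adjustment is a simple rescaling by the ratio of actual team runs to team Base Runs. A minimal sketch (the function name is an assumption), using the figures quoted above:

```python
def adjust_to_actual(player_base_runs, team_actual_runs, team_base_runs):
    """Scale a player's Base Runs so that team totals match actual runs scored."""
    return player_base_runs * team_actual_runs / team_base_runs


print(round(adjust_to_actual(120.09, 601, 607.09), 2))  # Bonds
print(round(adjust_to_actual(126.44, 718, 710.31), 2))  # Pujols
```

With the Giants' 601 actual runs against 607.09 predicted and the Cardinals' 718 against 710.31, this reproduces the 118.89 and 127.81 given in the text.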

Those are estimates of how many runs Bonds and Pujols were responsible for. However, the value of the contribution depends on replacement level and so we need to calculate marginal runs after setting an appropriate replacement level.

I don't have a good idea what the replacement baseline should be, although I think it's higher than 0.5 league average RPG (which is what Win Shares uses). I set three different replacement Runs/PA as follows:

4.50 runs per game - approximately .116 runs/PA (RAR-1)
3.85 runs per game - approximately .101 runs/PA (RAR-2)
3.46 runs per game - approximately .093 runs/PA (RAR-3)

I derived these by looking at the history of the National League and looking at how various run per game levels translate into runs/PA. The results for our two subjects are:

Player ... PA   adjBaseRuns  RAR-1  RAR-2  RAR-3
Bonds .... 455  118.89       66.11  72.48  76.57
Pujols ... 546  127.81       64.47  72.12  77.03


RAR is Runs Above Replacement. Using any replacement level from 3.4 to 4.5 produces very close results, with 3.8 producing a virtual tie.
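The RAR columns follow from subtracting what a replacement would produce in the same number of plate appearances. A sketch of that arithmetic (the function name is an assumption):

```python
def runs_above_replacement(adj_base_runs, pa, replacement_runs_per_pa):
    """Adjusted Base Runs minus the runs a replacement would create in the same PA."""
    return adj_base_runs - pa * replacement_runs_per_pa


players = {"Bonds": (118.89, 455), "Pujols": (127.81, 546)}
for name, (runs, pa) in players.items():
    # Replacement levels in runs/PA; see the note below about the middle value.
    row = [runs_above_replacement(runs, pa, level) for level in (0.116, 0.102, 0.093)]
    print(name, [round(x, 2) for x in row])
```

The RAR-1 and RAR-3 columns come out as published with .116 and .093 runs/PA. The RAR-2 column appears to correspond to about .102 runs/PA rather than the rounded .101 listed above, which is presumably just a rounding difference in the stated replacement level.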

I've made no attempt to account for the types of clutch performance that Win Shares includes, nor have I accounted for park differences. Pac Bell is a much better pitcher's park than Busch, so for that reason I think Barry Bonds' offensive production has been the most valuable in the NL despite the time he's missed.

But it isn't a cakewalk, and if Bonds misses another 5-10 games, it might be enough to tip the scales.



Bonds or Pujols - who's the most valuable? | 50 comments
Pepper Moffatt - Thursday, August 28 2003 @ 02:27 PM EDT (#93447) #
http://economics.about.com
One issue that is very rarely addressed is the fact that the two players face a widely different set of pitchers (and conversely pitchers face a widely different set of hitters) thanks to the unbalanced schedule.

I'm not sure how it would influence the discussion here, though I think it would. It's one of the reasons I think Hudson is a better choice in the AL Cy Young Race. Hudson is in a division with good hitting teams like the Rangers, Mariners, and Angels, whereas Loaiza has got to clean up on the Indians and Tigers.

This is interesting stuff, but I see no reason why this (because it lacks park factors) will be any more accurate than just using VORP.

Mike
robertdudek - Thursday, August 28 2003 @ 02:37 PM EDT (#93448) #
You can use park factors after estimating Runs. That's not a problem. The first step is determining marginal runs, then you adjust for park and convert to wins.

This method has several advantages over VORP:

1. It uses a better run estimation formula

2. It puts the runs in context, so that actual team runs can be divided among the players a la Win Shares.

3. It accounts for the secondary effect of using up outs at a greater or lesser rate than average.
Pepper Moffatt - Thursday, August 28 2003 @ 02:42 PM EDT (#93449) #
http://economics.about.com
Shouldn't you incorporate the park factors into each individual component (singles, doubles, GIDP, etc.)? Wouldn't that make more sense? Plus you need to incorporate the fact that the road parks the two play in differ as well (other than Coors, the NL West parks are a hitter's nightmare).

Regarding point 2, doesn't VORP/EQR basically accomplish the same thing, with a margin of error of about 15 runs? I thought I read that in one of the early Prospectii.

How often will the results from this method differ significantly (in any sense of the term) from VORP? What is the average magnitude of the difference?

Mike
_R Billie - Thursday, August 28 2003 @ 03:02 PM EDT (#93450) #
http://www.auburndoubledays.com/

Is it just me or do you notice anything unusual about the Boxscore graphic?
robertdudek - Thursday, August 28 2003 @ 03:08 PM EDT (#93451) #
Shouldn't you incorporate the park factors into each individual component (singles, doubles, GIDP, etc.)? Wouldn't that make more sense?

Absolutely not! In this case, you aren't interested in the hitter's ability, only in the value of his production. For that you need to park adjust the run factor only. If you know a park depresses offense by 10% generally, then the value of all production in that park needs to be adjusted upward by 10% regardless of the particular shape it takes. If you were concerned about a hitter's ability and especially what they would produce in another environment then you need to adjust each element individually.

The difference in schedule or opponent isn't adjusted for by this method (nor by VORP). It would be nice to have access to this information on a daily updated basis. In my opinion, this is where Win Shares could be useful: adjustments could be made between the win shares earned by the offensive and defensive side of the ball based on schedule and opponent batting or pitching.

Regarding point 2, doesn't VORP/EQR basically accomplish the same thing, with a margin of error of about 15 runs?

This method adjusts the difference down to zero - remember we are after value here not ability. When searching for value, the main thing to remember is that the sum of runs created by individual players MUST equal the number of runs actually scored by the team. Those results may then be adjusted within the team based on different factors and converted to wins.

How often will the results from this method differ significantly (in any sense of the term) from VORP? What is the average magnitude of the difference?

I'll let you judge that from the following:

Prospectus currently lists Bonds' EqR as 122.6 and Pujols at 120.8. The marginal runs (RAR) are listed as Bonds 95.2, Pujols 82.2. These are much higher (especially for Bonds) than my estimates above.

I think they are overcompensating for the secondary effect due to not using up a lot of outs by perhaps not accounting for DP ops created and not distributing the outs to teammates properly.
Gitz - Thursday, August 28 2003 @ 03:49 PM EDT (#93452) #
I don't know what all these numbers tell us, frankly. They're interesting and fun -- like DIPS is -- but, at the end of the day, what have we learned?

Bonds and Pujols are awesome hitters, and their race for MVP is a great discussion. Whether this kind of heavy-sabermetric study, for all its impressiveness -- and this is great work, Robert -- adds anything is, in this skeptic's opinion, a subject of even greater debate.
robertdudek - Thursday, August 28 2003 @ 04:55 PM EDT (#93453) #
I think it indicates that the value of their production is fairly close, whereas a superficial look at OPS (which passes for sabermetric analysis in some quarters - not intended to apply to anyone who posts here regularly, I hastily add) suggests that Bonds has been head and shoulders more valuable than Pujols.
robertdudek - Thursday, August 28 2003 @ 05:16 PM EDT (#93454) #
I've added a few refinements to my method and used it to analyse the Blue Jays offense. The following Base Runs estimates take secondary effects into account (the lineup would take a huge hit without Delgado's OBP abilities in there). The sum of individual totals adds to 737 - the number of runs the Jays have scored so far.

Let's keep in mind that Skydome is playing as one of the stronger hitter's parks in the AL this year.

Player ....... adjBaseRuns  per PA
Delgado ......   128.15      .219
Wells ........   105.58      .174
Catalanotto ..    67.97      .142
Myers ........    51.60      .159
Phelps .......    51.33      .141
Hinske .......    48.92      .120
Hudson .......    47.34      .107
Stewart ......    47.28      .139
Woodward .....    39.63      .115
R Johnson ....    39.16      .114
Wilson .......    33.82      .125
Bordick ......    32.73      .114
Kielty .......    16.27      .124
Berg .........    12.58      .090
Clark ........     9.90      .139
Werth ........     4.87      .096
pitchers .....     0.97      .035
Huckaby ......     0.10      .009
Cash .........    -1.21     -.037
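
As a quick check that the individual figures in the table really do add up to the team's actual runs, the column can be summed directly. This is only a sketch of the arithmetic; the variable names are illustrative.

```python
adj_base_runs = {
    "Delgado": 128.15, "Wells": 105.58, "Catalanotto": 67.97, "Myers": 51.60,
    "Phelps": 51.33, "Hinske": 48.92, "Hudson": 47.34, "Stewart": 47.28,
    "Woodward": 39.63, "R Johnson": 39.16, "Wilson": 33.82, "Bordick": 32.73,
    "Kielty": 16.27, "Berg": 12.58, "Clark": 9.90, "Werth": 4.87,
    "pitchers": 0.97, "Huckaby": 0.10, "Cash": -1.21,
}

team_total = sum(adj_base_runs.values())
print(round(team_total, 2))  # within rounding of the 737 runs the Jays have scored
```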

Craig B - Thursday, August 28 2003 @ 05:37 PM EDT (#93455) #
Robert, very interesting, excellent analysis. Two quick points...

1 - Bonds is getting *killed* on his intentional walks here. The rationale, as I understand it, is that IBBs are usually handed out with weaker hitters coming up, and so disproportionately result in fewer run-scoring opportunities (more double plays, platoon advantage, etc.). It's a good rationale for IBBs in general, and it utterly fails to apply to Bonds, who bats in the middle of the order. Bonds out-IBBs Pujols 50-8, and loses 16.8 "A points" as a result. That's got to be a huge factor.

If I had to guess, I'd bet that Bonds scores about as often after his intentional walks as he does after his non-intentional walks. (In fact, in general, I think the "A factor" should be based on exactly that data, should it not?). For players in general, that won't be the case. For Barry, it will.

2 - I am not sure, generally, that it is appropriate to allocate a team's run-scoring efficiency/inefficiency to individual players. Most systems now do this, though, so I can't really fault you for it!
Craig B - Thursday, August 28 2003 @ 05:43 PM EDT (#93456) #
Let me expand on that second point, in a rather philosophical vein. If we are trying to capture a player's individual value to his team, those values do not necessarily have to sum to zero. A team can be more or less than the sum of its parts.

I'm not sure I believe it, but it's a defensible position.
Craig B - Thursday, August 28 2003 @ 05:46 PM EDT (#93457) #
And Mike makes a great point about Loaiza/Hudson up above.

Thankfully, Baseball Prospectus has a ton of cool new stat reports, one of which is Quality of Batters Faced.

Hudson's average batter has a .758 OPS (slightly above average), Loaiza's a .740 OPS (slightly below average). The difference isn't nothing, and goes some way to undoing the gap between them.
robertdudek - Thursday, August 28 2003 @ 05:57 PM EDT (#93458) #
Craig,

Yes, a Bonds IW may be worth more than an average one, but a Bonds NIW may be worth less than a normal NIW because most NIW are sort of random and many of Bonds' are almost intentional. I'll note that from 1994-2001, it was empirically determined that an IW is worth about .178 runs and a NIW was about .33. That's also what my formula implies over the same sample.

I'd have to say that the burden of proof is on anyone claiming that Bonds' NIW/IW are more valuable than ordinary NIW/W. Look at how many runs Bonds has scored: 92. Base Runs predicts that, given league average hitters hitting behind him, Bonds should have scored 95.71 runs. This indicates that the formula is not underestimating the run scoring value of Bonds' walks and hits. Also note, that Base Runs predicts that Bonds should have driven in 90-92 runs, given league average ability of the hitters in front of him. He has actually driven in 79 runs.

One might also argue the opposite of your position, that with men on base, the opposition won't allow Barry to hit a homerun. This might imply that his homeruns are worth a lot less than a typical homerun, and it would explain why Barry's RBI totals are overestimated by Base Runs.

If Base Runs were way off base, it wouldn't predict Bonds' runs and ribbies so well. Remember that this does not include secondary effects, which I account for after making the original Base Runs calculation.
robertdudek - Thursday, August 28 2003 @ 06:10 PM EDT (#93459) #
"A team can be more or less than the sum of its parts."

I don't think it's defensible unless you are bringing intangibles into the discussion, or assigning some portion of run creation to the coaches and/or fans. The only question is how you divide up the share of runs among the players (or non-players). An offense's task is to score runs and we know exactly how many runs it has scored.

Here's the logic of the position as I understand it:

1) Team A scores X runs
2) The sum of Team A's batters' contributions is given full credit for X runs.
3) The sum of the value of individual contributions on Team A must equal X runs (actual runs), because only actual runs have value in a baseball game.

One does not win a baseball game by accumulating more Base Runs or a higher OPS than the other team.
Craig B - Thursday, August 28 2003 @ 06:25 PM EDT (#93460) #
No, Robert, and I understand that of course. What you said misrepresents my position (which isn't even "my" position).

All I'm saying is that when we speak of a player's "value", it's very vague, and permits of a lot of different meanings. I *think* the criteria that are given for MVP voting indicate that voters are to consider the actual impact a player has had on winning ballgames for the team... and that's the criterion I prefer.

But it's defensible to elect a more abstract definition of value. A guy who hits .500, all singles, but gets used in the 8 spot in the NL with Craig Paquette in front of him... well, in one sense he has a lot of value, he's very valuable, because you can use him in a lot of ways to create value. Does he win a lot of ballgames for his team? No, he doesn't. I wouldn't vote him for MVP, I don't think, but it's defensible to do so in my view.

There's another factor, though, which is that tarring individual players for a team's inefficiencies at scoring runs is not necessarily accurate. The inefficiencies may not be that player's fault at all. That actually is my point of view. I prefer to adjust, because you'll generally do better than otherwise.

On another note, what part of the Runs/RBI underperformance by Bonds is due to the effects of Pac Bell? I'd bet at least half. He's scored and driven in 171, where one would expect 187 on the BaseRuns methodology. Isn't Pac Bell's PF about .91?
Pepper Moffatt - Thursday, August 28 2003 @ 06:25 PM EDT (#93461) #
http://economics.about.com
Absolutely not! In this case, you aren't interested in the hitter's ability, only in the value of his production. For that you need to park adjust the run factor only.

Ahh.. okay. Makes sense.

This method adjusts the difference down to zero - remember we are after value here not ability.

Of course, the value of a metric isn't how close it matches to the number of runs scored. If it did, we could just use "Runs Scored" as the metric and be done with it.

I'm willing to accept a margin for error if it more accurately reflects the "value" or "ability" of a player (whichever one you want to measure).

I think they are overcompensating for the secondary effect due to not using up a lot of outs by perhaps not accounting for DP ops created and not distributing the outs to teammates properly.

You could test this by seeing what each assigns to lesser players on the Cards and the Giants.

I'm not sure if I agree with you, though, because if that's true, why is Delgado higher on your metric than he is under EQR? (20 points higher)

I'd really like to see an Excel file which has both Baseruns and EQR for each player, so we can see the differences between the two stats.

The Bonds estimate is probably much higher on Prospectus because you didn't incorporate park effects.

I don't know what all these numbers tell us, frankly. They're interesting and fun -- like DIPS is -- but, at the end of the day, what have we learned?

I agree with this 100%. I'm rather scared by that.

There's nothing in this analysis I inherently disagree with, but I don't have enough info on it to really make a judgement. I'd really like to see a more extensive comparison between it and other metrics.

Mike
robertdudek - Thursday, August 28 2003 @ 06:52 PM EDT (#93462) #
Craig wrote:

"The inefficiencies may not be that player's fault at all. That actually is my point of view. I prefer to adjust, because you'll generally do better than otherwise."

Well, if we can pin down whose fault it is within the lineup we could make tertiary adjustments. Nevertheless, the sum of individual contributions is going to precisely equal actual runs. Converting to wins is another tricky thing - it depends on how the club does in close games and what share the hitters/pitchers/defence are responsible for. Win Shares tackles this problem.

But to really get at the heart of it, we'd need play-by-play data and would have to take a Mills brothers approach - where the expected winning percentage of each game state is compared to that which comes before it. The change in Win Expectancy would then be credited to the appropriate players. That method would completely supersede the one being discussed here, so if you've got the data you'll be able to tell me within a very fine margin of error who has been truly more valuable.

And later he wrote ...

On another note, what part of the Runs/RBI underperformance by Bonds is due to the effects of Pac Bell? I'd bet at least half. He's scored and driven in 171, where one would expect 187 on the BaseRuns methodology. Isn't Pac Bell's PF about .91?

It would if the Giants offense scored runs at the rate the Dodgers have. Despite playing in a pitcher's park, the Giants aren't scoring runs at a below average (non-park adjusted) rate. It's the actual unadjusted production by the other Giants that will influence Bonds' actual R/RBI totals (which of course is itself influenced by park). But the other Giants hitters have been very near league average in the non-park adjusted sense, which suggests that the assumptions Base Runs uses for determining expected runs and ribbies are valid in the case of Bonds.

Mike wrote:

I'm not sure if I agree with you, though, because if that's true, why is Delgado higher on your metric than he is under EQR? (20 points higher)

Mainly because the Blue Jays have exceeded their Base Runs projection by 30 runs (706.69 expected; 737 actual), while the Giants have undershot theirs by 6 runs (607.09 expected; 601 actual)
Pepper Moffatt - Thursday, August 28 2003 @ 07:12 PM EDT (#93463) #
http://economics.about.com
Mainly because the Blue Jays have exceeded their Base Runs projection by 30 runs (706.69 expected; 737 actual)

Now I'm really confused. How can they exceed their projection if BaseRuns = Actual Runs? Or is there a margin of error here like there is for EQR and VORP?

Mike
robertdudek - Thursday, August 28 2003 @ 07:30 PM EDT (#93464) #
No, the Base Runs formula estimates run production and of course there is a margin of error (in this case about 30 runs, which indicates that the Jays' offense has been very efficient converting offensive elements into runs). Then the team estimate gets adjusted to the actual total, which is why Delgado's adjusted Base Runs goes up.

EqR doesn't (to my knowledge) make such an adjustment.
Pepper Moffatt - Thursday, August 28 2003 @ 07:35 PM EDT (#93465) #
http://economics.about.com
EqR doesn't (to my knowledge) make such an adjustment.

It doesn't. From what I've read, though, its margin of error is quite small. What's the margin of error for BaseRuns relative to EqR?

Which leads to the question: Why not just apply the adjustment to EqR rather than to BaseRuns?

Mike
robertdudek - Thursday, August 28 2003 @ 07:41 PM EDT (#93466) #
Which leads to the question: Why not just apply the adjustment to EqR rather than to BaseRuns?

Because Base Runs better represents how runs are scored in the real (major league baseball) world. Please read Tangotigre's analysis at Baseball Primer. The link is to the first installment, and there are two others on the subject. I agree with his reasoning and conclusions there.
_A Reader - Thursday, August 28 2003 @ 08:21 PM EDT (#93467) #
A reader pointed out...

Thanks for the props. F#@% Jurgen!
Pepper Moffatt - Thursday, August 28 2003 @ 08:32 PM EDT (#93468) #
http://economics.about.com
Thanks for the link, Robert!

I read all three articles. Color me unimpressed.

I wholly agree with this comment left on the final article:

"[Y]our advocacy of BaseRuns comes across as almost cultish, based on a series of assumptions it conflates with Truth, without regard for the world around it. It is like a sabermetric version of Ayn Rand."

All three articles seemed to be avoiding the subject of how the statistic performs in a typical situation relative to something like EqR. I can only infer from this that it doesn't work particularly well.

Mike

(I admit part of the reason I loved the quote is because I can't stand Ayn Rand)
robertdudek - Thursday, August 28 2003 @ 08:46 PM EDT (#93469) #
Mike, I couldn't disagree with you more.

EqR, Runs Created and other run estimators were tested against annual team runs scored totals. But runs are scored on an inning-by-inning basis. Tango, thanks to Retrosheet, had access to game logs and tested the formulas on a game-by-game basis (I have as well). This is as close to the real context of run scoring situation as the present available data allows.

Please, by all means, test EqR and Base Runs against any sufficiently large set of game logs containing the requisite data. I'll be very interested in your results when they arrive.

In the meantime, you can offer a detailed critique of Tango's position. You may post it here, or if you wish you can forward me a copy at the e-mail address listed above.
Pepper Moffatt - Thursday, August 28 2003 @ 08:55 PM EDT (#93470) #
http://economics.about.com
Please, by all means, test EqR and Base Runs against any sufficiently large set of game logs containing the requisite data. I'll be very interested in your results when they arrive.

This is precisely the problem with a lot of the new analysis that comes out, particularly on Primer. Someone will come up with a new theory that challenges conventional wisdom. Someone will ask "Well, how does this new method work relative to the old method?" They'll never give you an answer and eventually you'll get a response like the one you gave.

If the people who come up with the theories aren't interested in them enough to bother testing them, why should anyone else be?

I'm a skeptic, and there aren't enough of them in baseball analysis, IMHO. Unless I can be shown proof that the newfangled theory is better than conventional wisdom, I'll stick with the devil I know.

Mike
robertdudek - Thursday, August 28 2003 @ 09:00 PM EDT (#93471) #
Mike,

Tango tested them on a game-by-game basis - or did you miss that?

I'm a skeptic, and there aren't enough of them in baseball analysis, IMHO. Unless I can be shown proof that the newfangled theory is better than conventional wisdom, I'll stick with the devil I know.

This isn't a very productive attitude. It's good to be skeptical, if it leads to productive action. If you are truly a skeptic, what caused you to believe in the devil you know in the first place?
robertdudek - Thursday, August 28 2003 @ 09:02 PM EDT (#93472) #
Characterizing EqR as conventional wisdom is preposterous.
Pepper Moffatt - Thursday, August 28 2003 @ 09:15 PM EDT (#93473) #
http://economics.about.com
Tango tested them on a game-by-game basis - or did you miss that?

I saw that. But if we're interested in analyzing a season, which you apparently are, then shouldn't you also test it on a season-by-season basis? Since it wasn't, I can only be led to believe that it doesn't work for that. If you want to use the stat to determine who TSN should name as Player of the Game, then it would seem to work. But that's not what you're doing.

This isn't a very productive attitude. It's good to be skeptical, if it leads to productive action. If you are truly a skeptic, what caused you to believe in the devil you know in the first place?

From the sci.skeptic FAQ:

"People who have failed to convince skeptics often say "Well, skeptics are just closed-minded bigots who won't listen to me!". This is not true. Skeptics pay close attention to the evidence. If you have no evidence then you will get nowhere.

Unfortunately life is short. Most of us have better things to do than investigate yet another bogus claim."

Perhaps I made it sound like I was blindly following conventional wisdom. This is not at all true. However, conventional wisdom is more often than not right because:

1. It's already been tested against countless other theories and been found to work better than the rest.

2. Since it's been around awhile (by definition) there tends to be a great deal of empirical evidence in its favor.

It doesn't mean that you'll never accept a new theory. New theories are accepted all the time in science. It just means that the new theory has a lot of work to do to convince people and become conventional wisdom.

BaseRuns, or some variation of it, might be better than the existing methods. I've never said that it's not, I've just said that there is very little evidence to suggest that it is. It's generally the responsibility of those advocating a new way of doing things to provide that evidence. Tangotigre presented a great deal of theory, but unfortunately there is little evidence to suggest that the methodology works for the type of analysis you wish to conduct with it.

Mike
robertdudek - Thursday, August 28 2003 @ 10:20 PM EDT (#93474) #
Mike,

You seem to be under the impression that EqR works. I've seen no evidence that it works. Can you show me the evidence that it works? As far as I'm concerned EqR is some crackpot method developed by Baseball Prospectus that isn't as simple or as accurate as a decent linear-weights formula (like XR). As I've said, to call it conventional "wisdom" is preposterous. It isn't even wisdom.

I assume that, as a skeptic, you've looked at EqR very carefully. Have you been convinced of its efficacy? If not, why do you prefer it over a method that has been explained in great detail, tested in great detail (Smyth posted RMSE results in the comments section of the 3rd installment) and makes logical sense (or is the latter not important)?

Those results show that for team-seasons as a whole, Base Runs works about as well as the best linear weights equation. In the 3rd installment discussion, critics were complaining that it was 1% less accurate for the typical team. That may or may not be true, depending on the dataset you use, and depending on a further refining of the Base Runs coefficients. But the other formulas clearly break down in non-typical conditions. The evidence is clear to me that Base Runs is the more comprehensive formula and that's why it supersedes the others.

I've seen evidence that XR (a linear weights formula) works much much better than RC (you can search for Jim Furtado's stuff on Baseball stuff). The co-efficient derived for a linear weights formula by regression will be the most accurate possible for that dataset only. To create a valid LWTS, you'll have to regress coefficients for every team and every season you want to evaluate. That's hundreds of formulas.

Compared to that, the simplicity of using one, very accurate formula for practically every team-season stretching 50 years into the past, is a huge huge edge.

If you do not fully understand the basis of EqR, then your skepticism of newer methods and preference for this older (unproven) one amounts to foolishness.
Pepper Moffatt - Thursday, August 28 2003 @ 10:23 PM EDT (#93475) #
http://economics.about.com
You seem to be under the impression that EqR works. I've seen no evidence that it works. Can you show me the evidence that it works? As far as I'm concerned EqR is some crackpot method developed by Baseball Prospectus that isn't as simple or as accurate as a decent linear-weights formula (like XR). As I've said, to call it conventional "wisdom" is preposterous. It isn't even wisdom.

Do you have a copy of the 1999 Baseball Prospectus? There's a terrific article on EQR in there.

Mike
robertdudek - Thursday, August 28 2003 @ 10:29 PM EDT (#93476) #
No I don't. Why don't you quote me the relevant passages.
Pepper Moffatt - Thursday, August 28 2003 @ 10:36 PM EDT (#93477) #
http://economics.about.com
No I don't. Why don't you quote me the relevant passages.

It'll have to wait until next week, unless someone's got a copy of it. All my stuff is in storage, as I'm in the process of moving.

Where did Smyth show that baseruns has a lower RMSE? All I can find is an unsubstantiated claim that it does.

Davenport has a chart of RMSE under various conditions in
Pepper Moffatt - Thursday, August 28 2003 @ 10:39 PM EDT (#93478) #
http://economics.about.com
Ack! Technical error!

No I don't. Why don't you quote me the relevant passages.

It'll have to wait until next week, unless someone's got a copy of it. All my stuff is in storage, as I'm in the process of moving.

Where did Smyth show that baseruns has a lower RMSE? All I can find is an unsubstantiated claim that it does.

Davenport has a chart of RMSE under various conditions in this article. I don't think having the lowest RMSE is the end all and be all, because if it were we could just analyze players by using runs scored. However having a low RMSE is important, IMHO.

I'd highly recommend picking up Prospectus 1999 if you can find a copy. They need to be on every baseball stathead's bookshelf, along with the major Bill James stuff. I picked up a used copy off of eBay a couple years ago, it's probably the easiest/cheapest way to find one.

Or maybe you can get an autographed one from Keith Law. :)

Mike
Pepper Moffatt - Thursday, August 28 2003 @ 10:40 PM EDT (#93479) #
http://economics.about.com
I hope that last message didn't mess up anything major. I left a quotation mark out of a URL.

Mike
robertdudek - Thursday, August 28 2003 @ 10:57 PM EDT (#93480) #
XR beats EqR in 1993-2002, i.e. under current conditions (however, the chart you referred to is as unsubstantiated as Smyth's info). EqR sounds like a slightly inferior version of LWTS. It therefore suffers from the major handicap that all LWTS versions do - that it has to be tailored by regression for each dataset it is applied to.

Do you have any measure of accuracy for EqR under non-typical conditions? Has it been tested on teams that have a high OBP, score more than 6 runs a game, less than 2? The major selling point of Base Runs is its robustness in a variety of conditions. For a typical team almost any decent Run Estimation formula will do an adequate job.

Anyone can come up with a set of co-efficients to make a formula look good for a particular dataset. Thus, the theoretical grounding is the key to distinguishing the various approaches.
robertdudek - Thursday, August 28 2003 @ 11:09 PM EDT (#93481) #
I also noticed that EqR uses the league run scoring rates (league runs/pa) to adjust EqR. This effectively centres everything so that league EqR exactly matches actual league runs and errors on the team level are reduced. Base Runs doesn't do this - it predicts exactly how many runs a team or league will score, i.e. it estimates runs from the component elements in an absolute, non-relative way.
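
To make the centering point concrete, here is a minimal Python sketch. It is not Davenport's EqR formula, just a generic rescaling that forces a set of raw team estimates to add up to actual league runs. The function name and all the numbers are made up for illustration.

```python
def center_to_league(raw_estimates, actual_league_runs):
    """Rescale raw team estimates so their sum equals actual league runs.

    Illustrative only: a generic rescaling, not the EqR formula.
    """
    scale = actual_league_runs / sum(raw_estimates)
    return [est * scale for est in raw_estimates]


# Hypothetical three-team league whose raw estimates all run 30 runs high.
raw = [820.0, 760.0, 700.0]      # made-up uncentered estimates
actual = [790.0, 730.0, 670.0]   # made-up actual runs scored

centered = center_to_league(raw, sum(actual))

raw_errors = [r - a for r, a in zip(raw, actual)]
centered_errors = [c - a for c, a in zip(centered, actual)]

print("raw errors:     ", [round(e, 1) for e in raw_errors])
print("centered errors:", [round(e, 1) for e in centered_errors])
```

In this toy league the centered estimates match the league total by construction, and the team-level errors shrink as a side effect. An uncentered estimator gets no such help, which is the distinction being drawn above.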

Colour me unimpressed by EqR. However I will test it against Base Runs on 1993-2002 team seasons.

Another thing I noticed is that EqR as presented here does not take advantage of all the minor batting events. If the published EqR figures rely on such a paucity of data, they are nothing more than garbage.
Pepper Moffatt - Thursday, August 28 2003 @ 11:12 PM EDT (#93482) #
http://economics.about.com
XR beats EqR in 1993-2002, i.e. under current conditions (however, the chart you referred to is as unsubstantiated as Smyth's info).

Fair enough. I guess I should have said "undetailed". Smyth didn't post what the RMSE was for his method, how much lower it was, etc.

Do you have any measure of accuracy for EqR under non-typical conditions? Has it been tested on teams that have a high OBP, score more than 6 runs a game, less than 2? The major selling point of Base Runs is its robustness in a variety of conditions.

The RMSE will capture a lot of this. There have been quite a few of these teams over the years in different offensive contexts. If EqR did a particularly bad job at predicting the runs scored by these teams, it would increase the RMSE.

Keep in mind that we're looking at mean squared error, so if a metric is particularly out to lunch in some cases, it's going to get killed by that method.
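
A small Python sketch of why squared error punishes the occasional big miss. The team totals and both sets of predictions are invented for illustration. One estimator is off by 20 runs everywhere, the other is perfect except for a single 100-run miss.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def mae(predicted, actual):
    """Mean absolute error, shown for contrast."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


actual = [700, 750, 800, 850, 1000]    # made-up team run totals
steady = [720, 770, 820, 870, 1020]    # off by 20 on every team
erratic = [700, 750, 800, 850, 1100]   # perfect except one 100-run miss

print("MAE :", mae(steady, actual), mae(erratic, actual))
print("RMSE:", rmse(steady, actual), rmse(erratic, actual))
```

Both estimators have the same average miss, but the one that is badly wrong on a single extreme team ends up with more than double the RMSE.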

Now for teams which are completely out of context.. say 6 or 7 standard deviations from the mean, I'm not sure how the methods would fare. It's an interesting question. How would you go about testing it? I'd probably look at the 50 biggest RS outliers (relative to league context) in MLB history.. the 25 teams on either end. Then I'd look to see how each method does in predicting their performance. You could replicate it for 5, 25, 50, 100, 500 teams etc.
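
The test described above could be sketched roughly like this in Python. The record layout (`runs`, `league_avg_runs`, `est_a`, `est_b`) and the six teams are hypothetical placeholders, not real team-seasons or real metric outputs. The idea is only to rank teams by runs scored relative to league, keep the k teams at each end, and compare each metric's RMSE on that subset.

```python
import math


def tail_rmse(teams, metric, k):
    """RMSE of `metric` on the k lowest and k highest scoring teams,
    ranked by runs scored relative to league average.

    Assumes 2 * k <= len(teams) so the two tails do not overlap.
    """
    ranked = sorted(teams, key=lambda t: t["runs"] / t["league_avg_runs"])
    tails = ranked[:k] + ranked[-k:]
    squared = [(t[metric] - t["runs"]) ** 2 for t in tails]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical data: est_a and est_b stand in for any two run estimators.
teams = [
    {"runs": 550, "league_avg_runs": 750, "est_a": 560, "est_b": 600},
    {"runs": 700, "league_avg_runs": 750, "est_a": 705, "est_b": 702},
    {"runs": 740, "league_avg_runs": 750, "est_a": 735, "est_b": 741},
    {"runs": 760, "league_avg_runs": 750, "est_a": 766, "est_b": 759},
    {"runs": 800, "league_avg_runs": 750, "est_a": 806, "est_b": 801},
    {"runs": 980, "league_avg_runs": 750, "est_a": 970, "est_b": 930},
]

for k in (1, 2, 3):
    print(k, tail_rmse(teams, "est_a", k), tail_rmse(teams, "est_b", k))
```

Repeating the comparison for several values of k, as suggested, shows whether a metric that looks fine on typical teams falls apart as the sample is narrowed to the extremes.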

At any rate, it is an interesting question. At the moment I don't have reason to believe any metric would do better than any other.

Anyone can come up with a set of co-efficients to make a formula look good for a particular dataset.

Agreed. What is most impressive with EqA is not that it works overall, but it seems to work in many different contexts that have occurred during MLB history.

Thus, the theoretical grounding is the key to distinguishing the various approaches.

I can also agree with this. At any rate, I think both EqR and Base Runs are pretty naive metrics as they don't consider important factors like the quality of opposition. I can put up my slowpitch baseball stats (and league averages) into the EqR formula and find out that I have a .280 EqA. That doesn't make me Chris Woodward. Davenport has addressed this with his Major League Equivalencies, but he still hasn't taken into account that two big league hitters can face widely different pitchers/defenses.

Mike
robertdudek - Thursday, August 28 2003 @ 11:20 PM EDT (#93483) #
They aren't naive metrics. Metrics are developed for a specific purpose. In this case, they are developed to measure something very fundamental in baseball. They are supposed to measure runs scored, not quality of opposition or anything else. If your softball team wins a game 9-6, then you and your teammates have created 9 runs. Quality of opposition has nothing to do with that fact.

But we can't really do ANY offensive analysis of individual players until we have a run estimator that we trust. I trust mine (not mine really, but you know what I mean), and you can have yours. I will say that if you don't think that an estimator that works much much better than anything else out there on a game-by-game basis is a cool and groovy thing, then nothing I do or say is going to convince you.
Pepper Moffatt - Thursday, August 28 2003 @ 11:39 PM EDT (#93484) #
http://economics.about.com
I will say that if you don't think that an estimator that works much much better than anything else out there on a game-by-game basis is a cool and groovy thing, then nothing I do or say is going to convince you.

First of all, we don't know if this method works better than EqR on a game-by-game basis, but I have a hunch that it will. It'll probably work much better.

But there's a major flaw in your logic. Suppose you and I are estimating the sum of 162 dice rolls... let's suppose that this is one of those AD&D dice that goes from 1 to 100.

Your estimator always picks a number which is 2 higher than the actual value. So if the true value is 50, your estimator chooses 52.

My estimator naively picks 50.5. Always.

Now roll the die 162 times, and come up with a sum. Calculate the sum of all your predictions and calculate the sum of all my predictions. Although your estimator will be more accurate for an individual roll, mine will be more accurate over a larger sample size.

The key is that you can't just consider the standard error, you must consider bias. I would be shocked if either of the estimators were unbiased.
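
The dice example can be worked through in a few lines of Python. This is a sketch under stated assumptions: a fair die numbered 1 to 100, independent rolls, and RMSE of the summed predictions as the yardstick. The function names are made up for illustration. The always-2-too-high estimator misses the sum by exactly 2 per roll, so its error grows in proportion to the number of rolls, while the always-50.5 guess has random errors that partly cancel and grow only with the square root of the number of rolls.

```python
import math

DIE_SIDES = 100
BIAS = 2.0
# Variance of one roll of a fair die numbered 1..DIE_SIDES.
ROLL_VARIANCE = (DIE_SIDES ** 2 - 1) / 12


def biased_sum_error(n_rolls):
    """Error in the summed predictions of the always-2-too-high estimator."""
    return BIAS * n_rolls


def naive_sum_rmse(n_rolls):
    """RMSE of the summed predictions when every roll is guessed as 50.5."""
    return math.sqrt(n_rolls * ROLL_VARIANCE)


def crossover_rolls():
    """Smallest number of rolls at which the naive guess has the lower RMSE."""
    n = 1
    while biased_sum_error(n) <= naive_sum_rmse(n):
        n += 1
    return n


for n in (1, 162, 1000):
    print(n, biased_sum_error(n), round(naive_sum_rmse(n), 1))
print("crossover:", crossover_rolls())
```

Under these particular assumptions the crossover lands a little past 200 rolls, so at exactly 162 rolls the two estimators are fairly close on the sum. The general point still holds: a bias that looks harmless roll by roll keeps accumulating, and with a larger bias or a longer run of rolls the unbiased guess wins comfortably.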

The thing is, you can convince me and others Robert. You, and others who support the metric, need to show some evidence why your estimator is better than another under a range of situations. If you intend to use the metric over a season, then the evidence should be of a seasonal nature, not a daily one.

Evidence that would help:

* Evidence on RMSEs under various conditions would help, but it's only one piece of the puzzle.

* Do you think your stat is better at estimating players on teams that hit a lot of homeruns but don't steal much? Then take 100 teams with that profile and compare the performance of those metrics. What about teams that have 2 sluggers and 3 holes in the lineup? Again, compare the performance of the stats under those conditions, using historical data. It's likely that some metrics will work better in some conditions than others. That's useful, because it helps show the drawbacks of the existing metrics. Perhaps a 3rd metric, which is a combination of Base Runs and EqR might be better than them both.

Again, it goes back to the skeptic FAQ:

"People who have failed to convince skeptics often say "Well, skeptics are just closed-minded bigots who won't listen to me!". This is not true. Skeptics pay close attention to the evidence. If you have no evidence then you will get nowhere."

Mike
robertdudek - Thursday, August 28 2003 @ 11:41 PM EDT (#93485) #
It wasn't David Smyth, but rather Patriot who mentioned the following:

"For 1980-2000, using the basic version of XR, the basic version of RC, and a variation of BsR, the RMSEs for team runs in a season are: XR=23.7, BsR=23.9, RC=25.8"

I'm pretty sure this comparison did not use the centering that EqR uses. You're not going to get a huge difference in team-season RMSE for most of the competing methods out there. Note also that these are the basic versions (and I haven't seen an EqR that uses all the data available).

The major logical hurdle is that testing against team-seasons doesn't necessarily mean that individual player stats will behave in the same way. What is true about the value of production of the Giants in general isn't necessarily true of the value of Barry Bonds' production. That's why we need a robust formula we can trust in all sorts of contexts. That's why the game-by-game test is so important: it exposes the weaknesses of the various formulas. The inning-by-inning would be fascinating, but we'll have to wait until the data is readily available.

I still can't find a theoretical underpinning in that EqR article, except that production is heavily anchored to league average production. Tangotigre did an excellent job in laying out the theoretical underpinnings of Base Runs.

Ultimately, we aren't interested in what the run estimator says about how many runs the 2003 Giants should have scored because we know EXACTLY how many they actually scored. What we want to know is how many of those, roughly, is Barry Bonds responsible for.
Pepper Moffatt - Thursday, August 28 2003 @ 11:52 PM EDT (#93486) #
http://economics.about.com
What we want to know is how many of those, roughly, is Barry Bonds responsible for.

Right. Which is tricky, but I think you can show that by finding teams which are similar to the 2002 Giants in all other aspects (we'll call one Team X), but where the left-fielder is somebody other than Bonds. If Team X scored 80 fewer runs than the Giants, then we'd expect the metrics to assign 80 more runs to Bonds than to Left Fielder X. I don't see any other way to do it.

I'm still not comfortable with optimizing a metric for a game or inning or whatever and then aggregating up to a season. This will only work well if your statistic is unbiased and I don't have enough confidence in any of the metrics to assume that. I'd rather have one that works well in the context of a season.

Mike
robertdudek - Friday, August 29 2003 @ 12:07 AM EDT (#93487) #
I prefer not to play the game of show and tell. I much prefer it if those who enter discussions with me are willing to share the workload.

I'm not out to convince anyone beyond a shadow of a doubt. Knowing what I know about how the method works for a team-season, knowing how it works when we look game-by-game, knowing the theoretical underpinning of a method (as opposed to a mathematical equation that subsumes the run environment into itself), I don't need more convincing. I've stated my reasons for believing Base Runs is a better run estimator, I don't need to rehash them.

I expect that most reasonable people interested in run estimators ought to quickly recognize the brilliance of this formulation. If they are interested in it, they will likely seek to improve it.

Go check Fan Home for discussions about Base Runs. They've been discussing it for several years there and I'm sure there are just reams and reams of tests that various participants have undertaken. If you are really hungry for that - go seek it out!

Mike, I respect your point-of-view most of the time and you have a lot of intelligent things to say. But it does get tiresome after a while. If you don't really have an interest in run estimation techniques then I find it strange that you seem to have invested a lot of effort trying to convince me that you won't give up your conventional wisdom because of some new thing. If you are interested, I would expect that you'd be willing to dig pretty deep to find the answers (either online or your own research) about these methods that you seek.

Frankly, I don't even know if you have a firm grasp of the theoretical underpinnings of EqR. I sure don't (assuming there is one), not from that article you linked to. If it's true that you haven't delved into EqR and other methods deeply, then it doesn't strike me as a very useful position to "hold onto EqR until someone proves that something else is better".
robertdudek - Friday, August 29 2003 @ 12:14 AM EDT (#93488) #
I'd rather have one that works well in the context of a season.

THEY ALL DO! That's what should be apparent from the chart you linked to: all the formulas that were designed recently work almost equally well in the modern context.

That's why it was a shock to me to discover that some of them failed badly, and one in particular did so well in the game context - where the combinations of batting elements are very often atypical. Right now, the litmus test is the game logs. Someday, we'll be able to look at innings and see how the thing REALLY works.

And you're wrong about the optimization. Base Runs were optimized using the plus-one method to reflect the empirically observed event values in modern conditions on a several league-season basis. The fact that they work extremely well on game-by-game samples is a major coup and is due to the theoretical soundness of the concept.
Pepper Moffatt - Friday, August 29 2003 @ 12:38 AM EDT (#93489) #
http://economics.about.com
I expect that most reasonable people interested in run estimators ought to quickly recognize the brilliance of this formulation.

It's quotes like this that suggest that you're not interested in a reasonable dialogue.

I prefer not to play the game of show and tell. I much prefer it if those who enter discussions with me are willing to share the workload.

Hey, I never said I was unwilling to do anything. That being said, I don't think it's asking too much to see some evidence. Theory is great; most of the time I'm a theorist in my day job. I also want to know how well something works under real world situations.

Go check Fan Home for discussions about Base Runs. They've been discussing it for several years there and I'm sure there are just reams and reams of tests that various participants have undertaken. If you are really hungry for that - go seek it out!

I visit it about once a month or so. Not my cup of tea... there's way too much posturing and theory, not enough applied science. Too many of the posters come off like Jehovah's Witnesses for my liking. I did go there earlier tonight to look at the most recent stuff on Base Runs. There wasn't much, though there's probably a lot of really useful discussions I've missed.

I think a more positive idea of how to come to a conclusion would be the following:

1. The people on different sides of the fence agree on a set of tests that should help to show the costs/benefits of certain techniques.

2. Those tests are run.

3. Take what you've learned from 2, go back to 1 if necessary and use that new knowledge.

I'm more than happy to help with step 2, but you've refused to comment on my methods or address my concerns with this methodology.

Frankly, I don't even know if you have a firm grasp of the theoretical underpinnings of EqR. I sure don't (assuming there is one), not from that article you linked to.

Davenport has posted years and years worth of stuff on rec.sport.baseball. Plus there are all the Prospectus books to look at.

Quite a few times I've stated that I can be convinced that Base Runs is a superior methodology. You've done nothing but attack me and attack EqR when it's quite clear you know very little about it. You refuse to acknowledge any of my concerns about Base Runs. I think I've been quite accommodating and reasonable throughout this.

This is my last post on this topic.

Mike
Gitz - Friday, August 29 2003 @ 01:10 AM EDT (#93490) #
Yawn.
_Jurgen - Friday, August 29 2003 @ 01:38 AM EDT (#93491) #
Boy, all I was trying to suggest was that maybe the idiots were right... but now I know the truth... the idiots are us.

I'm just trying to keep up.
_Jurgen - Friday, August 29 2003 @ 04:03 AM EDT (#93492) #
Gitz:

Usually you’re the one reprimanding me for my errant ways, but be thankful that stat-heads like Mike and Robert are willing to discuss and argue the merits of these otherwise yawn-inducing metrics (even if it did get nasty at the end). The rest of us now quickly scan OPS (yup, Robert, that’s me, although I’d hardly qualify as a sabermetrician) rather than AVG, R, and RBI to gauge a player's offensive value because of similar arguments years before. Our children’s children, if they’re baseball fans, will benefit from these and other web blog brawls.

Look, when a guy has an OPS over 1.250, and somebody tries to tell me that this other guy with a piddly 1.100 OPS is the rightful MVP, it sounds insane. How is that possible? It’s Rodriguez v. Tejada all over again.

But in this rare instance, Base Runs and Win Shares (and obviously Runs Created) make a very strong case for that other guy. And it’s not like he’s not very competitive in Equivalent Runs or Value Over Replacement Player. And it’s not like they’re (necessarily) pulling crazy rationales out of the air to explain it. The logic is simple: despite Bonds’ advantage when he does step to the plate, Pujols has simply been to the plate more often. And no matter how you calculate it, clearly it counts for a lot.

I think the rest of us would be foolish to not see some legitimacy in Pujols’ claim as “most valuable”. It’s not like he’s Terry Pendleton. Sometimes the Phil Rogers of the world are right, even if how they get there is very, very wrong. And when that happens sabermetrics should help show the rest of us why he’s right, and how he’s wrong.
Gitz - Friday, August 29 2003 @ 04:13 AM EDT (#93493) #
Jurgen, I was not yawning at the numbers -- and in fact I liked all the numbers, as I said at the outset of what turned out to be Moffatt vs Dudek II: The Sequel Nobody Wanted To The Original That Nobody Wanted.
_Jurgen - Friday, August 29 2003 @ 04:31 AM EDT (#93494) #
Gitz:

Well, our roles stay switched, and in my wisdom I misinterpreted what you were trying to say.

It's ugly, but that kind of scrapping is necessary, too, otherwise they're just numbers. I'll say it again (although perhaps this time more clearly)--it's the arguments around the numbers (learning, say, the potential biases of a given metric) that are most enlightening.
_Jurgen - Friday, August 29 2003 @ 04:37 AM EDT (#93495) #
I guess I just feel like a frustrated Dustin Hoffman in Wag the Dog (a movie which, by the way, wipes the floor with Phone Booth).
_Rej - Friday, August 29 2003 @ 10:16 AM EDT (#93496) #
Hey Guys, pretty good discussion overall, too bad it got a bit snippy there in the end. Oh well.

A few thoughts:

1. At the end of the day, all of these measures are only _estimates_. It's important to understand that we aren't splitting atoms here. I know you guys both know that, but sometimes a bit of extra perspective is nice.
2. Understanding that this is all estimation, I am always interested in having a new tool that may approximate reality more closely. BaseRuns looks very interesting. The main problem I have with BaseRuns boils down to: not readily available. I'm as up to date with what goes on at Primer/BP/Fanhome as the next pseudo-SABR guy, but it would take me some time to go and find some BaseRuns numbers. It takes me all of 5 seconds to get the BPro data. I have no idea if I can even get BaseRuns in-season, and it would take some Googling to find out. Laziness certainly shouldn't be an impediment if I were trying to make sure I got my A-bomb right, but it's a big impediment when all I'm trying to do is argue with idiots (present company excluded of course ;-) ) over the internet.
3. Whenever I use any of these numbers to do the kind of discussion that started this (Player A vs Player B), I try to get all the data I can get. OPS, WS, SLWTS, RC, BRs, EqA, VORP, whatever. If they all agree, I'm a happy man, and a tough one to argue with. If they disagree, I try and figure out why. The OPS/VORP thing is easy in this case: playing time. The EqR/BR differences are tougher, of course, but rather than worry about which one is more _correct_, I'd just as well move on to other reasons to argue, or maybe just call it a draw.
4. The differences in the two estimates: are they greater than the variability inherent in a baseball player? I'm not sure if I know how to say this right, but if, say, I can look at EqR and be confident that Bonds' 110 (for example) is really 110 +/- 5, while the BaseRun of 115 is really 115 +/- 3, then I'd be ok with the EqR because the +/- is probably not a whole lot different than the amount of "luck" or whatever involved in getting to that 110 or 115 in the first place. Does that make any sense? I'm just pulling the numbers out of my butt, and this whole line of thinking just popped into my head a second ago, so my apologies if I'm off on this point.

To me, with a relatively untrained eye - at least compared to the likes of Tango or Smyth - this kind of debate is like arguing Camry vs Accord (or pick your car analogy). They are both good, solid vehicles. I'm sure someone could find a dozen reasons to pick one over the other, and someone else could find a dozen more to pick a different one. But, at the end of the day, I'm not trying to win the Indy 500. I'm only trying to get to the Beer Store.
Bonds or Pujols - who's the most valuable? | 50 comments