Batter's Box Interactive Magazine
Thanks to the miracle of technology and the relatively advanced state of sabermetrics, evaluating a pool of ballplayers statistically, even one as large and disparate as the NCAA, has become pretty easy, provided you're willing to pour in the skull sweat and can find the data.



I set out to do both, and got huge slices of help from friends, and also from people I'd never met before. Before I get into the whys, hows, and whats of my research, I want to thank four terrific people for their help with this project, of which this article is but a tiny part.

The first is Boyd Nation, the most knowledgeable layman in America about college baseball. Boyd is a true gentleman; he has put together 99% of the data on which this project is based, and has made it available for free to anyone who wants it. When I needed assistance, he came through within 24 hours with exactly what I needed, and with kind wishes besides.

I would also like to thank my friend Vinay Kumar, who helped me a great deal when I ran into methodological difficulties, and Joe C. and Noffs, readers of Baseball Primer, who also offered assistance without any chance of reward. Thanks, guys.

As I said above, I set out to evaluate the pool of NCAA Division I players. Division I is huge: there are over 280 participating schools, and 8,000 players compete in any given year. Worse, there is much wider variation in ballparks than in MLB, and even wider variation in competition. Unlike MLB, where teams play relatively equal schedules, some NCAA teams play 80% or more of their games against teams in the top 40% (or the bottom 40%) of the talent pool.

Once I found that data was available on every hitter and pitcher to play Division I baseball in 2002 and 2003, I knew I would have to crunch the numbers and come up with a rating system. After all, what is a sabermetrician without a rating system? To begin with (for the first iteration of this project, at least) I have chosen a very simple system, the one Lee Sinins uses so well: the RCAA/RSAA evaluation system. It uses runs allowed by pitchers and runs created by batters, compares players to their league averages, and gives a run value above or below average for the number of outs (or innings) a player uses or records.
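A minimal sketch of the RCAA/RSAA idea, assuming its simplest form: a league rate per out (for hitters) or per inning (for pitchers), multiplied by the player's playing time, gives the average baseline, and the player is credited with the difference. Sinins' park adjustments and other refinements are left out, and the function names are hypothetical.

```python
def rcaa(runs_created: float, outs: float, lg_runs_created: float, lg_outs: float) -> float:
    """Runs created above average: the hitter's runs created minus what a
    league-average hitter would have created while using the same outs."""
    lg_rc_per_out = lg_runs_created / lg_outs
    return runs_created - lg_rc_per_out * outs


def rsaa(runs_allowed: float, innings: float, lg_runs_allowed: float, lg_innings: float) -> float:
    """Runs saved above average: runs a league-average pitcher would have
    allowed in the same innings, minus the runs this pitcher allowed."""
    lg_ra_per_inning = lg_runs_allowed / lg_innings
    return lg_ra_per_inning * innings - runs_allowed


if __name__ == "__main__":
    # Hypothetical league: 0.2 runs created per out, 6.5 runs allowed per 9 innings.
    print(rcaa(60, 150, 2_000, 10_000))  # hitter: 60 RC while using 150 outs
    print(rsaa(40, 100, 6_500, 9_000))   # pitcher: 40 runs allowed in 100 IP
```

Note the sign convention: for hitters, more runs than the baseline is good; for pitchers, fewer runs than the baseline is good, so the subtraction is reversed.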

RCAA/RSAA is a particularly useful method of analysis for college players. Most of us, myself included, don't have any particular interest in college play per se; rather, we are interested in analyzing the performance of the top players, the actual prospects. Using a "baseline" of an average college player is probably more in line with that interest than using "replacement level", which is also relatively difficult to calculate (not least because talent at the Division I level is very unevenly distributed).

I tweaked the system slightly. Knowing that the college game is different from the pro game, I decided on a linear run estimator rather than a non-linear one, and, because it is simple, chose Jim Furtado's Extrapolated Runs (xR). I found that for the entire pool of players, xR underestimated scoring by about 6%. This is presumably due to the higher run-scoring environment of the metal-bat game; the average Division I team scores about 6.5 runs per game. That makes sense: in an environment with more hits and more men on base, each offensive event has a greater impact than it otherwise would.
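For reference, here is a sketch of the full Extrapolated Runs formula in Python. The coefficients are the ones commonly published for Furtado's xR; treat them as an assumption to verify against his original article, not as the exact version used for these ratings. College box-score data may lack fields such as IBB or GIDP, in which case those terms simply contribute zero.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Season batting totals. Every field defaults to 0, so stats missing
    from college data (e.g. GIDP, IBB) contribute nothing."""
    ab: int = 0
    h: int = 0
    doubles: int = 0
    triples: int = 0
    hr: int = 0
    bb: int = 0      # all walks, including intentional
    ibb: int = 0
    hbp: int = 0
    so: int = 0
    sb: int = 0
    cs: int = 0
    gidp: int = 0
    sf: int = 0
    sh: int = 0


def extrapolated_runs(b: BattingLine) -> float:
    """Linear-weights run estimate in the form of Furtado's full xR
    (coefficients as commonly published; verify before relying on them)."""
    singles = b.h - b.doubles - b.triples - b.hr
    return (
        0.50 * singles
        + 0.72 * b.doubles
        + 1.04 * b.triples
        + 1.44 * b.hr
        + 0.34 * (b.hbp + b.bb - b.ibb)
        + 0.25 * b.ibb
        + 0.18 * b.sb
        - 0.32 * b.cs
        - 0.090 * (b.ab - b.h - b.so)  # outs in play (non-strikeout outs)
        - 0.098 * b.so
        - 0.37 * b.gidp
        + 0.37 * b.sf
        + 0.04 * b.sh
    )
```

Because every term is linear, a league's xR total is simply the sum of its players' xR, which makes a league-average xR baseline easy to compute.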

When I redo the study and update my spreadsheet for future articles, I hope to use a run estimator that better approximates runs scored, and all suggestions are welcome. The obstacle is the available data: the team totals are, unfortunately, not reliable (so NewRC is out as a method), and the available historical information is sketchy. But we'll figure something out. In the meantime, I will present my data as xRAA rather than RCAA. (I should note that all xR figures are compared to league-average xR, not to runs scored, so the 6% discrepancy doesn't work its way into the ratings.)
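Why comparing against league-average xR keeps the 6% shortfall out of the ratings can be shown with hypothetical numbers: a hitter who produces xR at exactly the league xR rate rates as average against an xR baseline, but would look below average against a baseline built from actual runs scored. The league totals and function names below are invented for illustration.

```python
def xraa(player_xr: float, player_outs: float, lg_xr: float, lg_outs: float) -> float:
    """xR above average, using league-total xR (not league runs scored)
    per out as the baseline, so a uniform bias in xR cancels out."""
    return player_xr - (lg_xr / lg_outs) * player_outs


def xr_vs_runs_scored(player_xr: float, player_outs: float, lg_runs: float, lg_outs: float) -> float:
    """The baseline the article avoids: league runs actually scored per out."""
    return player_xr - (lg_runs / lg_outs) * player_outs


# Hypothetical league in which total xR falls about 6% short of runs scored.
LG_RUNS, LG_XR, LG_OUTS = 10_600.0, 10_000.0, 50_000.0

if __name__ == "__main__":
    # A hitter producing xR at exactly the league xR rate (0.2 per out).
    print(xraa(60.0, 300.0, LG_XR, LG_OUTS))                 # average hitter rates as average
    print(xr_vs_runs_scored(60.0, 300.0, LG_RUNS, LG_OUTS))  # negative: penalized by xR's bias
```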

Once I had xR for every player in the database, I recognized the need for two adjustments. The first was a park adjustment. Park adjustment is a well-understood technique and I don't need to get into the theory here, but I should point out that college ballparks have a much wider distribution of park factors than pro parks, reflecting not only the larger number of teams but also the greater diversity of geography and facilities.

Thankfully, my efforts were sped up considerably by Boyd Nation, whose website offers conventionally calculated four-year park factors for 2002 and 2003 along with all the other data. The 2003 factors have an additional benefit: park factors based on a weighted average of all the parks a team played in over the course of the year. This is very important information; teams in the Mountain West Conference, for instance, play dozens of games a year at high altitude (San Diego State's home park factor is 104, but the average of all the parks they played in was 114!). The use of four-year park factors reflects the shorter NCAA season, almost always fewer than 70 games and usually fewer than 60.
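A weighted-average park factor can be sketched as a games-weighted mean of the factors for every park a team played in. Weighting by games is an assumption about how these factors are built, and the schedule below is hypothetical (it is not San Diego State's actual schedule); it is chosen only to show how a 104 home park can average out well above that.

```python
from typing import Sequence, Tuple


def weighted_park_factor(games_and_factors: Sequence[Tuple[int, float]]) -> float:
    """Games-weighted average park factor (100 = neutral) across every
    park a team played in during the season."""
    total_games = sum(games for games, _ in games_and_factors)
    if total_games == 0:
        raise ValueError("no games played")
    return sum(games * pf for games, pf in games_and_factors) / total_games


# Hypothetical schedule: 30 home games at PF 104, plus 20 road games
# in high-altitude parks whose factors average 130.
EXAMPLE_SCHEDULE = [(30, 104.0), (20, 130.0)]

if __name__ == "__main__":
    print(weighted_park_factor(EXAMPLE_SCHEDULE))
```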

For 2002, the park factors are also four-year, but they are simple home factors rather than weighted averages. I am still refining the 2002 PFs based on home versus road-plus-neutral games, but for now I apply the PFs in an absolutely standard way.

Park-adjusted numbers aren't the whole story, though, because we also need to adjust for widely varying levels of competition. Division I ranges from Texas and Florida State, who would be competitive with teams from the low-level minors, to Alabama A&M and Western Illinois, who make up numbers. In particular, there are massive differences in scheduling. Any two given NCAA players not only won't play similar schedules, they likely won't have any common opponents. So Mitch Maier's performance at Toledo, impressive as it is, is not in fact better than Beau Hearod's at Alabama. It looks better, but only because it was compiled against inferior competition. Toledo's opponents had a .474 winning percentage, and their opponents' opponents also had a .474 winning percentage. Alabama, meanwhile, scheduled opponents with a collective .582 winning percentage, and their opponents' opponents had a .546 winning percentage. (All opponents' winning percentages (OppWP) and opponents' opponents' winning percentages (OppOppWP) listed are weighted by the number of games played against each team.)
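As a sketch of the games-weighted OppWP and OppOppWP described above, assume a schedule stored as a mapping from each team to {opponent: games played}. The three-team league is hypothetical, and unlike some RPI-style formulas this version does not strip head-to-head games out of the opponents' records.

```python
from typing import Dict, Tuple

Schedule = Dict[str, Dict[str, int]]  # team -> {opponent: games played}
Records = Dict[str, Tuple[int, int]]  # team -> (wins, losses)


def win_pct(records: Records, team: str) -> float:
    wins, losses = records[team]
    return wins / (wins + losses)


def opp_wp(schedule: Schedule, records: Records, team: str) -> float:
    """Opponents' winning percentage, weighted by games played against each."""
    games = schedule[team]
    total = sum(games.values())
    return sum(n * win_pct(records, opp) for opp, n in games.items()) / total


def opp_opp_wp(schedule: Schedule, records: Records, team: str) -> float:
    """Games-weighted average of each opponent's own OppWP."""
    games = schedule[team]
    total = sum(games.values())
    return sum(n * opp_wp(schedule, records, opp) for opp, n in games.items()) / total


# Hypothetical three-team league: A-B play twice, A-C once, B-C twice.
EXAMPLE_SCHEDULE: Schedule = {"A": {"B": 2, "C": 1}, "B": {"A": 2, "C": 2}, "C": {"A": 1, "B": 2}}
EXAMPLE_RECORDS: Records = {"A": (2, 1), "B": (1, 3), "C": (2, 1)}

if __name__ == "__main__":
    print(opp_wp(EXAMPLE_SCHEDULE, EXAMPLE_RECORDS, "A"))
    print(opp_opp_wp(EXAMPLE_SCHEDULE, EXAMPLE_RECORDS, "A"))
```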

So what can we do with this? Ideally, we would adjust for the quality of the pitching staffs each batter faced (or of the offenses, for pitchers). Unfortunately, my data is not detailed enough for this, nor are my analytical abilities that advanced. Instead, I settled on a solution that adjusts for level of competition without making separate adjustments for offense and defense.

First, I used OppWP and OppOppWP to derive a "true ability" winning percentage for each team's opponents. I used the log5 method to do this, but if you want a basic approximation, adding OppWP and OppOppWP and subtracting .500 usually gets close enough. This estimates what record the team's opponents would have had against a baseline level of competition, in this case .500.
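One way to apply log5 here (an assumption about the exact mechanics, since the calculation isn't spelled out) is to treat the opponents' schedule as a single opponent of strength OppOppWP. In odds form, log5 says a team's observed odds equal its true odds divided by its opponents' odds, so the opponents' true odds against .500 competition are the product of the two odds. The Toledo and Alabama figures come from the Toledo/Alabama comparison earlier; the function names are hypothetical.

```python
def odds(p: float) -> float:
    """Convert a winning percentage to odds (the won-lost ratio)."""
    return p / (1.0 - p)


def true_opp_wp(opp_wp: float, opp_opp_wp: float) -> float:
    """Log5-style estimate of the opponents' winning percentage
    against .500 competition."""
    o = odds(opp_wp) * odds(opp_opp_wp)
    return o / (1.0 + o)


def quick_opp_wp(opp_wp: float, opp_opp_wp: float) -> float:
    """The rough shortcut: OppWP + OppOppWP - .500."""
    return opp_wp + opp_opp_wp - 0.5


if __name__ == "__main__":
    for team, wp, owp in [("Toledo", 0.474, 0.474), ("Alabama", 0.582, 0.546)]:
        print(team, round(true_opp_wp(wp, owp), 3), round(quick_opp_wp(wp, owp), 3))
```

For these two schedules the shortcut lands within a few thousandths of the log5 figure (about .448 for Toledo's opponents, and about .626 versus .628 for Alabama's), which is why it usually gets close enough.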

Once I had done this, I used the Pythagorean formula. Or rather, the Reverse Pythagorean, which sounds like a most unmentionable perversion but is really quite useful in this context. What we need is not an approximation of the quality of a team's opponents in terms of wins and losses. To adjust a measure of talent expressed in runs, we need an approximation of the quality of a team's opponents measured in runs.

Converting runs to wins, of course, is done with the Pythagorean formula, and converting wins back to runs can be done the same way. First, we assume that each team's won-lost record is due equally to offense and defense (or pitching plus defense, if you prefer). Yes, this is a pretty massive assumption, but it's necessary for the time being given the data we have. (This is why I call this a "competition adjustment": it adjusts only for the level of competition, not for the player's actual opposition.) Then we can plug the opponents' winning percentage back into the Pythagorean formula to derive the runs scored and allowed for that team. That, essentially, is the quality of the offense or defense the team faced that year, measured against a baseline of the Division I average (a .500 team).

If you're interested, Vinay Kumar contributed the calculation. It uses WP/(1-WP) (i.e. the won-lost ratio) instead of straight winning percentage, and derives a multiplier indicating what fraction of league average the opponents' offense and defense represent. That multiplier, the Competition Adjustment, is the fourth root of the won-lost ratio:

(WP/(1-WP))^0.25
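Why a fourth root? Assuming the classic Pythagorean exponent of 2 (the article doesn't say which exponent was used, so this is an assumption), and splitting the opponents' edge equally so that they score m times the league-average run level (written \bar{r} below) and allow \bar{r}/m, the formula follows directly:

```latex
\begin{aligned}
WP &= \frac{R^{2}}{R^{2} + RA^{2}}
  \quad\Longrightarrow\quad \frac{WP}{1-WP} = \left(\frac{R}{RA}\right)^{2} \\
\frac{WP}{1-WP} &= \left(\frac{m\,\bar{r}}{\bar{r}/m}\right)^{2} = m^{4}
  \quad\Longrightarrow\quad m = \left(\frac{WP}{1-WP}\right)^{1/4}
\end{aligned}
```

A .500 schedule gives m = 1, and each doubling of the won-lost ratio raises m by a factor of 2^{1/4}, about 1.19.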

So for Arkansas State in 2003, whose opponents' won-lost ratio (adjusted for opponents' opponents) was just about 1.33, the formula yields a Competition Adjustment of 1.073. In other words, we assume that Arkansas State's opponents were 7.3% above average in both run scoring and run prevention.
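The Competition Adjustment itself is one line of code. This sketch (function names hypothetical) accepts either the opponents' adjusted winning percentage or their won-lost ratio.

```python
def competition_adjustment(opp_true_wp: float) -> float:
    """Multiplier on league-average offense and defense for a team's
    opponents: the fourth root of their won-lost ratio."""
    return (opp_true_wp / (1.0 - opp_true_wp)) ** 0.25


def competition_adjustment_from_ratio(won_lost_ratio: float) -> float:
    """Same multiplier, starting from the won-lost ratio directly."""
    return won_lost_ratio ** 0.25


if __name__ == "__main__":
    print(competition_adjustment_from_ratio(1.33))
    print(competition_adjustment(1.33 / 2.33))  # WP equivalent of a 1.33 ratio
```

A ratio of exactly 1.33 gives about 1.074; the 1.073 quoted for Arkansas State corresponds to a ratio slightly below that (about 1.326), consistent with "just about 1.33".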

This allows a very simple adjustment to the "average" level for RSAA and xRAA: the average baseline against which Arkansas State's players are compared moves up 7.3% for pitchers and down 7.3% for hitters.

Once the park and competition adjustments are applied to xRAA, we get park-and-competition-adjusted xRAA, or **xRAA. In keeping with the convention of one asterisk for park-adjusted numbers, we'll use two for park-and-competition-adjusted numbers. This measures how many runs above average a player would have been in the same number of opportunities, against perfectly average opposition in a perfectly average park. It puts every hitter in the NCAA on the same footing, so we can compare them directly.
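Putting the pieces together, here is a hedged sketch of park-and-competition-adjusted xRAA for hitters, plus the mirror-image adjustment for pitchers' RSAA. Two assumptions are built in that the article does not spell out: the park factor (100 = neutral) is applied as a direct multiplier on the league-average baseline, and the competition adjustment moves the baseline multiplicatively, dividing it for hitters (tougher pitching means fewer expected runs) and multiplying it for pitchers (tougher lineups mean more expected runs allowed). Dividing by 1.073 lowers the baseline by about 6.8%; a flat 7.3% cut is the other reading of the text. All names are hypothetical.

```python
def hitter_xraa_pc(player_xr: float, player_outs: float, lg_xr: float, lg_outs: float,
                   park_factor: float, comp_adj: float) -> float:
    """Hypothetical form of **xRAA: xR minus a league-average baseline that is
    scaled up by the park factor and down by the competition adjustment."""
    baseline_per_out = (lg_xr / lg_outs) * (park_factor / 100.0) / comp_adj
    return player_xr - baseline_per_out * player_outs


def pitcher_rsaa_pc(runs_allowed: float, innings: float, lg_runs: float, lg_innings: float,
                    park_factor: float, comp_adj: float) -> float:
    """Pitcher counterpart: the baseline rises with both the park factor
    and the strength of the opponents' offenses."""
    baseline_per_inning = (lg_runs / lg_innings) * (park_factor / 100.0) * comp_adj
    return baseline_per_inning * innings - runs_allowed


if __name__ == "__main__":
    # Hypothetical hitter producing exactly the league xR rate (0.2 per out),
    # first in a neutral park against average opposition, then against tough opposition.
    print(hitter_xraa_pc(60.0, 300.0, 10_000.0, 50_000.0, 100.0, 1.0))
    print(hitter_xraa_pc(60.0, 300.0, 10_000.0, 50_000.0, 100.0, 1.073))
```

The second call comes out positive: the same raw production earns credit once we account for having faced above-average pitching.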

So I'll stop here and present a top 50 for 2003. Eventually, I will have numbers available for every player in 2002 and 2003. My only comment on this table is that the Blue Jays managed to select the fourth-best hitter in the NCAA in 2003 in the 18th round of the amateur draft. As in any analysis of fewer than 300 plate appearances (remember, these are short seasons!), small-sample-size warnings apply.


Top 50 hitters, NCAA Division I, 2003


Rk Name Team **OWP **xRAA
1 Jeremy Cleveland North Carolina .910 70.1
2 Michael Aubrey Tulane .907 66.8
3 Rickie Weeks Southern .930 64.2
4 Ryan Roberts Texas-Arlington .889 54.9
5 Brian Buscher South Carolina .849 50.7
6 Ricardo Nanita Florida International .879 47.9
7 Stephen Drew Florida State .841 47.2
8 Tony Richie Florida State .843 46.3
9 Tony McQuade Florida State .831 45.7
10 Jonny Kaplan Tulane .825 44.9


11 Josh Anderson Eastern Kentucky .847 44.8
12 Michael Johnson Clemson .870 44.2
13 David Murphy Baylor .837 43.8
14 Carlos Quentin Stanford .842 43.7
15 Matt Hopper Nebraska .851 42.8
16 Sean Farrell North Carolina .824 42.6
17 Chris Durbin Baylor .819 42.5
18 Brian Snyder Stetson .846 42.4
19 Lee Curtis College of Charleston .826 41.9
20 Dustin Majewski Texas .827 41.6


21 Beau Hearod Alabama .841 41.6
22 Jeff Larish Arizona State .831 41.3
23 Ryan Garko Stanford .833 41.1
24 John Gragg Bethune-Cookman .847 41.0
25 Chad Hauseman Jacksonville .867 39.9
26 Adam Boeve Northern Iowa .837 39.6
27 Jeff Fiorentino Florida Atlantic .830 39.3
28 Ryan Braun Miami, Florida .829 39.0
29 Michael Brown William and Mary .845 38.7
30 Aaron Hill Louisiana State .816 38.5


31 Clint King Southern Mississippi .803 36.6
32 Landon Powell South Carolina .804 36.5
33 Jeff Cook Southern Mississippi .797 36.4
34 Mitch Maier Toledo .842 36.0
35 Jamie Hemingway North Carolina-Wilmington .803 35.7
36 Ryan McGraw Coastal Carolina .781 35.6
37 Neil Sellers Eastern Kentucky .799 35.5
38 Anthon Garibaldi Southeastern Louisiana .857 35.4
39 Jordan Foster Lamar .815 35.0
40 Brad Snyder Ball State .824 34.9


41 Conor Jackson California .864 34.8
42 Kevin Melillo South Carolina .799 34.8
43 Brian Hopkins Southeast Missouri State .821 34.6
44 Ryan Mulhern South Alabama .801 33.9
45 Keith Brachold Marist .788 32.8
46 Eddy Martinez Florida State .842 32.5
47 Ryan Gordon North Carolina-Greensboro .789 32.0
48 David Coffey Georgia .815 31.6
49 Christian Snavely Ohio State .794 31.5
50 David Castillo Oral Roberts .782 31.3

Statistical Evaluations of College Hitters | 58 comments
Pistol - Monday, March 08 2004 @ 08:48 AM EST (#75257) #
Wow, that's great, great stuff.

It might be useful to add their position and their year in school. For instance, Aaron Hill is 30th, but knowing how many players were middle infielders and/or draft eligible would help as well.
_Andrew Edwards - Monday, March 08 2004 @ 09:14 AM EST (#75258) #
Obviously a big step forward. Super stuff, Craig.

Following up on Pistol's comment, I think the ideal would be to use a positional average rather than a league average for xRAA. I have no idea how hard that would be to do.
_tangotiger - Monday, March 08 2004 @ 09:51 AM EST (#75259) #
Wow! The study I always wanted to do has been done!

I would not do pos adjustments, because all the good players are at SS and CF. Just go back to HS, and would you compare the star player (probably at SS) to the league SS, which is made up of every team's best player? And the guy you hide at 2B would come out looking great.
robertdudek - Monday, March 08 2004 @ 09:58 AM EST (#75260) #
tangotiger ..

Not true I think. A lot of the good college hitters don't play SS or CF.
_Andrew Edwards - Monday, March 08 2004 @ 09:59 AM EST (#75261) #
I would not do pos adjustments, because all the good players are at SS and CF.

And probably some C and 1B (Prince Fielder types). Point taken, though.

At the same time, a good college SS will grow up to be a good Major League 3B. A good college CF will grow up to be a good Major League RF. If the idea is to predict how valuable these guys will be as Major Leaguers, then we need some way to distinguish between the guy who will be an .850-OPS RF and the guy who will be an .850-OPS 3B.

What about adjustments for 4 classes: IF, OF, C, 1B/DH?

To be clear, I'm not knocking what we have. I'm mostly just trying to make Aaron Hill look better. :-)
Mike Green - Monday, March 08 2004 @ 10:00 AM EST (#75262) #
Very nice, Craig. Personally, I don't think it is necessary to have positional averages, but of course position played, age and draft eligibility would round out the short description.

Correlations between these numbers and minor league EqA for the second half of the year would be interesting.
Craig B - Monday, March 08 2004 @ 10:26 AM EST (#75263) #
Thanks guys. As I said, this is a first baby step, and I'll have more soon, including a summary of the Top 50 hitters of 2002, and a short article on pitchers with 2002 and 2003 top performers. Also, I will be working on refining and cleaning up my run estimation tools.

Coming relatively soon (i.e. over the next few weeks) will be four large reports, for hitting and pitching for the whole of Division I in 2002 and 2003.

Incidentally, the raw data is available on Boyd Nation's website at www.boydsworld.com.
_Sean McNally - Monday, March 08 2004 @ 10:31 AM EST (#75264) #
Craig,

Great work!

Was wondering how another Toronto farmhand -- Vito Chiaravalloti -- was as a college hitter. He had a smashing first professional season at Auburn after a fairly good career at Richmond.
Craig B - Monday, March 08 2004 @ 10:35 AM EST (#75265) #
A short note on defensive positions. I've always taken the view that defensive analysis is a very valuable thing, but that to the maximum extent possible it shouldn't be tied up with offensive analysis. Every guy in the lineup has just one position at the plate... hitter.

There are already 8,000,000 fudge factors in this kind of analysis. Adding a bunch more (and a ton more work - none of the data has defensive position, so it would have to be researched for 280 teams) to account for defensive position doesn't seem to me to be helpful.

Aaron Hill finished 30th in **xRAA out of 5,000+ hitters in Division I... he doesn't need extra points for being a shortstop. :)
Pistol - Monday, March 08 2004 @ 10:46 AM EST (#75266) #
As I said, this is a first baby step

This is a lot more than a baby step.

I don't think it would be necessary to adjust for position, but listing the position, year, and age would be enough to get a feel for a player (if it's not difficult to determine).
_Andrew Edwards - Monday, March 08 2004 @ 10:47 AM EST (#75267) #
none of the data has defensive position, so it would have to be researched for 280 teams

Ugh. Nevermind.
Craig B - Monday, March 08 2004 @ 10:51 AM EST (#75268) #
Vito was a top-50 as a junior in 2002, but slipped badly in 2003.
_Sean McNally - Monday, March 08 2004 @ 10:56 AM EST (#75269) #
Thanks, Craig. I kept seeing his name while tracking the career of a teammate of his at UofR... so when does the evaluation of college pitchers begin ;-)
Craig B - Monday, March 08 2004 @ 10:57 AM EST (#75270) #
I hope to have the pitchers article up at the end of this week or the beginning of next. And now, back to work.
_DW - Monday, March 08 2004 @ 10:58 AM EST (#75271) #
Excellent start!

Perhaps unsurprisingly, there are lots of Toronto, Boston and Oakland picks on this list (for the Jays, there's Roberts, Hill, Gordon, McGraw and Snavely).

I'm curious to see what biases exist in the method... for example, I'd guess that this method overrates some of the EKU hitters (what that would be indicative of, if true, I'm not sure). What concerns do you have, Scott? How well do the major conferences do?

Vito's 2002 #s would have to crack the top 30...
robertdudek - Monday, March 08 2004 @ 11:01 AM EST (#75272) #
Here is how 2003 Entry Draft players born in 1982 or earlier fared in my prospect analysis. Some of these may be junior college or small college players and a player needed to accumulate at least 100 PA at a given level to be included. Rickie Weeks didn't manage that - he'd have ranked #1. My system determined that Aaron Hill is the 2nd best college position prospect taken in the 2003 draft, based on pro performance to date.

College players usually increase their prospect ratings from 5 to 20 points in their first full season.

The prospect rating takes into account age, position, league, level of competition, speed and the 4 component batting skills.

Name Organisation League ProspectRating  PA
Hill Aaron TOR Florida State 69.6 134
Snyder Brad CLE New York-Penn 65.9 271
Giarratano Tony DET New York-Penn 65.0 206
Aubrey Michael CLE South Atlantic 63.6 154
Fox Jacob CHC Midwest 62.8 112
Wishy Andrew TEX Northwest 62.6 320
Jackson Conor ARZ Northwest 60.3 300
Chiaravalloti Vito TOR New York-Penn 60.2 286
Cleveland Jeremy TEX Northwest 60.2 303
Garrabrants Steve ARZ Northwest 59.6 247
Murphy David BOS Florida State 58.8 173
Gwynn Tony MIL Midwest 58.0 279
Kinsler Ian TEX Northwest 58.0 216
Pavkovich Adam ANA Midwest 57.3 147
Morton Colt SDP Northwest 57.1 107
Maysonet Edwin HOU 56.5 178
Quintanilla Omar OAK Northwest 56.2 143
Palmisano Louis MIL Pioneer 56.1 203
Herrera Javier CLE South Atlantic 55.9 183
Bubela Dane TEX Northwest 55.8 280
Webb Trey MON South Atlantic 55.7 230
Goleski Ryan CLE New York-Penn 55.4 268
Stonard Peter SDP Midwest 55.4 263
D'Antona Jamie ARZ Northwest 55.0 312
Ryan Brendan STL New York-Penn 54.6 213
Ethier Andre OAK Midwest 54.6 183
McGehee Casey CHC Midwest 54.4 258
Majewski Dustin OAK Northwest 54.3 205
Castillo David OAK Northwest 54.1 176
Borowiak Zachary BOS Florida State 53.8 106
Moran Javon PHI New York-Penn 53.3 272
Heether Adam MIL Midwest 53.0 196
Maniscalco Matthew TBD South Atlantic 52.9 238
Bourn Michael PHI New York-Penn 52.8 153
Stansberry Craig PIT New York-Penn 52.4 187
Rodland Eric DET New York-Penn 52.0 267
Gaetti Joe COL Northwest 52.0 134
Anderson Josh HOU New York-Penn 51.9 329
Mitchell Lee FLA New York-Penn 51.5 192
Cook Jeff ARZ Midwest 51.0 269
Murton Matt BOS New York-Penn 51.0 227
Blake Ryan FLA New York-Penn 50.6 162
Curtis Lee BOS South Atlantic 50.4 198
Durbin Chris BOS South Atlantic 50.0 110
_Doug - Monday, March 08 2004 @ 11:13 AM EST (#75273) #
Great work! I know somebody in sports information at Western Illinois. Can I ask what you mean by "make up numbers"? I'd like to either absolve him or tease him mercilessly, preferably the latter.
Mike Green - Monday, March 08 2004 @ 11:21 AM EST (#75274) #
http://www.sports-wired.com/players/profile.asp?ID=26998
Can someone please explain to me how Jeremy Cleveland lasted until the 8th round. COMN for his career record. From his statistical record, I'd guess that he learned after 2002 how to crowd the plate and hit for power, a la Frank Robinson.
Coach - Monday, March 08 2004 @ 11:30 AM EST (#75275) #
Fantastic job, Craig. This will be particularly useful on draft day, which in 2003 became the busiest single day in Box history. Interesting that the Jays, who everyone assumes were pitching-obsessed, landed three of the top 50 hitters; they have their own evaluation methods, of course, plus a stable of scouts, but Roberts was obviously a steal. Hard to believe he lasted to the 18th round. This method will help identify other potential bargains, and if someone will pick up the tab, I'd be happy to scout them.

all the good players are at SS and CF

I'm sure Tangotiger didn't mean that literally, but there's quite a bit of truth to it. It's almost gospel at my level; the three best players usually end up at SS, CF and C, so a college coach has very few guys on scholarship who played LF in high school. Position and year would be informative, especially for guys we've never heard of, but they are only a few mouse clicks away. Boyd's World has been in our Links section for ages, and he provides Team Links.
_DW - Monday, March 08 2004 @ 11:46 AM EST (#75276) #
Cleveland...
* is iffy defensively - some think he'll have to move to first, his ceiling in the outfield would be to be average (lacks range).
* isn't particularly fast.
* does not have tremendous power.
* has a swing that can get a little long from time to time.
* was thought by some to have a slider speed bat - and would thus struggle once he used wood bats and faced better pitching.
* UNC hitters are underrated (I'm biased, I work in Chapel Hill).

None of these reasons are sufficient for him lasting as long as he did. I'm conservative with moving prospects, but I think he's ready for AA.
_Jack Who Resemb - Monday, March 08 2004 @ 11:50 AM EST (#75277) #
Fascinating. I can't believe this has finally been done.

There's only like 5 guys on this list that play in cold weather schools. Is there some bias against schools in cold weather? I've always wondered if the baseball is really better in Florida or if the teams gain a huge advantage in playing home games against cold weather schools on the road early in the season. Is there really any way to account for this?
_levski - Monday, March 08 2004 @ 11:50 AM EST (#75278) #
Craig, a few comments:

John Gragg, 24th on your list, is listed as a pitcher at Bethune-Cookman; he was also drafted as a pitcher by the Royals in the 9th round of the 2003 draft.

Ryan Gordon, 47th on your list, is also listed as a pitcher; he is shown as being drafted by Toronto in the 2003 draft, but I am not sure whether he signed.

Btw, I get my info from here:

http://www.sports-wired.com/players/

Just for kicks, I counted how many of your top 50 hitters were taken by MLB teams, and which MLB teams took the most hitters. Here is the basic breakdown:

A total of 33 hitters were drafted (not counting John Gragg and Ryan Gordon, who, as I pointed out, were drafted as pitchers, by Kansas City and Toronto, respectively).

They were taken by the following teams:
(Team, # players taken, names of players, rank on your list)

Oakland-- 4 (Farrell-16; Snyder-18; Majewski-20; Castillo-50)
Boston-- 4 (Murphy-13; Durbin-17; Curtis-19; Coffey-48)
Arizona-- 4 (Kaplan-10; Quentin-14; Cook-33; Jackson-41)
Cleveland--4 (Aubrey-2; Garko-23; Snyder-40; Mulhern-44)
Toronto-- 3 (Roberts-4, Hill-30; Snavely-49)
Chicago(A)-2 (Nanita-6; King-31)
Chicago(N)-2 (Richie-8; McQuade-9)
Houston-- 2 (Anderson-11; Hearod-21)
Kansas-- 1 (Maier-34)
Anaheim- 1 (Hauseman-25)
Atlanta-- 1 (Hemingway-35)
Pitt-- 1 (Boeve-26)
Milw-- 1 (Weeks-3)
San Fran-- 1 (Buscher-5)
Texas-- 1 (Cleveland-1)
Philly-- 1 (Hopper-15)

Teams like Oak, Bos, and Tor lead the pack; since they are led by stat-heads, I expected these teams to dip into the college pool more than others. AZ figures prominently on the list; this reflects their intention to replenish their farm system (generally poor on hitters despite having some intriguing pitching prospects) as quickly as possible. Cle is up there as well; I think this also reflects their strategy to just acquire as many young, talented guys as possible and let them play.

Overall, I think AZ got the best hitters, Cle coming second, Tor third, and Oak fourth. I know that Carlos Quentin had to undergo TJ surgery, but he's almost back and should get back to raking in low A ball. Conor Jackson simply destroyed the Northwest league last year; both Jackson and Quentin will be pushed by AZ and could conceivably see time in AZ in 05-06.

Seems like Bos, Tor, Oak, and AZ must be using your system to search for talent in college. You should sell it to them for pretty penny.
Mike Green - Monday, March 08 2004 @ 12:16 PM EST (#75279) #
DW,
Thanks for the report on Jeremy Cleveland. His HBP totals went through the roof in 2003. Did he move up on the plate or does he dive in?
_tangotiger - Monday, March 08 2004 @ 12:18 PM EST (#75280) #
I did not mean that all potential MLB players would be playing SS or CF in college. Just that you'll have a disproportionate number of them doing so.

Remember the reason that we are doing positional adjustments. We believe that every position is equal (at the MLB level). So, the avg 1B = avg SS = avg C = avg CF (at the MLB level). So, if the avg SS is -10 runs as a hitter, and the avg 1B is +10 runs as a hitter, then we believe that as a fielder, the avg SS is +10, and the avg 1B is -10. So, when you apply the positional adjustment, you are actually applying it to that player's fielding totals.

So, if you have an avg SS, he's -10 runs relative to all hitters, +0 relative to all SS as a fielder, and +10 relative to all other fielders. This is why it's good to have position-neutral UZR. I would show this avg SS as -10 relative to all hitters, and +10 relative to all fielders.

When it comes to minors, college, HS, pee wee, our original assumption (that the avg 1B = avg CF = avg SS) no longer holds. That's because of the way talent navigates. A good overall player will find SOME place in MLB to play at, so he will be displaced from SS to 2B, or CF to LF, etc. There is a belief that there's equilibrium happening (again at the MLB level). This becomes less and less true the farther away you get from the pinnacle.

In any case, Craig said it best that you don't want to lump all these adjustment factors into the hitter's evaluation. This is why it would be better to have a position-neutral fielder rating. So, if Tejada is a +5, you know he's a pretty good fielder, even if he'd be a -5 at MLB SS. (Not that simple, but on that track.)
Craig B - Monday, March 08 2004 @ 12:23 PM EST (#75281) #
There's only like 5 guys on this list that play in cold weather schools. Is there some bias against schools in cold weather?

Yes and no. Cold-weather schools are, simply put, not as good, so there would be fewer top players. But also, **xRAA is a type of counting stat, and so players who play more games tend to do better. Warm-weather schools tend to play more games.
_DW - Monday, March 08 2004 @ 12:26 PM EST (#75282) #
I've never seen Cleveland get hit by a pitch, but he does crowd the plate.

Gordon is a prospect both as a pitcher and an outfielder - I watched him DH for my alma mater two weeks ago. I don't know which he'll play as a pro - his prospects were better as a soft tossing lefty until he hit well in summer ball. As a 5th year senior, the Jays retain his rights until the draft.

Incidentally (not that anyone thinks otherwise), hitting well in college doesn't make you a lock for early success. For instance, Hauseman has already washed out of the Angels system...
_DW - Monday, March 08 2004 @ 12:40 PM EST (#75283) #
Oh, additions to levski's list: Gordon and McGraw were drafted by TOR, Michael Johnson signed pre-draft with SD, Landon Powell was drafted by the Cubs (insurance for Fox and Richie), Michael Brown is a Tiger.
robertdudek - Monday, March 08 2004 @ 12:52 PM EST (#75284) #
Tango,

In Craig's top 50 list, there are a lot of corner outfielders, first basemen and third basemen. Yes, they are among the best hitters in college, but most of them won't make the majors because they'll be facing competition from guys sliding towards the thick end of the defensive spectrum who are just as good or better with the bat (and more athletic, therefore more likely to be adequate defensively).

The outfielders on his list with speed and the middle infielders are the real prospects. That's why we need to know more about them than their current hitting abilities.
Pistol - Monday, March 08 2004 @ 12:57 PM EST (#75285) #
Seems like Bos, Tor, Oak, and AZ must be using your system to search for talent in college. You should sell it to them for pretty penny.

Geez, Craig, you gave away the formula for the secret sauce!

Actually, I suspect that those teams with a lot of players from the list have already done something comparable to this. It'll be interesting to see how the pitching lists shape up compared to the draft.
_Jack Who Resemb - Monday, March 08 2004 @ 01:00 PM EST (#75286) #
How soon till we have a list of all Division I-A players?
Craig B - Monday, March 08 2004 @ 01:03 PM EST (#75287) #
Probably a couple of weeks, as I want something nice and pretty, and will be incorporating refinements that people have been sending.
_levski - Monday, March 08 2004 @ 01:25 PM EST (#75288) #
DW: thanks for the additional info; I couldn't find it on the Baseball Cube search site. It'll be interesting to see how these hitters pan out; as DW mentioned, Hauseman has already washed out, and a few others don't look too promising either. Maybe Craig should revisit this list 5 years from now, and see which players make it to the big show. I'm betting a lot of cash on my Baby (D)-backs, Conor Jackson and Carlos Quentin. Not so sure about Cook and Kaplan just yet. Two players who didn't show up on Craig's list, but appear on Robert Dudek's entry--Garrabrants and D'Antona--also looked good at Missoula. There's a future for AZ, after all.
_tangotiger - Monday, March 08 2004 @ 01:38 PM EST (#75289) #
Robert, I agree we need to know more about them as fielders. We need to know about their fielding traits, and what kind of fielding impact they'd have at each of the 8 positions in MLB.

I'm disputing that we can simply apply some positional factor to his hitting totals, based on the hitting totals of their positional-peers, to gain much knowledge.
_A - Monday, March 08 2004 @ 01:47 PM EST (#75290) #
Is there some bias against schools in cold weather?
There's a massive disadvantage to any school in the north. Aside from being cheated out of year-round baseball because it's such a great game, northern schools won't get to play outside until late March at best and likely not until mid-April. For these schools to be ranked against one another seems awfully unfair because even if they do work out indoors, it's never the same. I'm not sure of the history regarding spring training in Florida and Arizona but my assumption is that teams who couldn't work out at home because of weather conditions built a complex down south (*please* correct me if I've assumed wrong).
_Sandlapper Spik - Monday, March 08 2004 @ 01:48 PM EST (#75291) #
Great job...I'm looking forward to the complete list of players.
_tangotiger - Monday, March 08 2004 @ 01:49 PM EST (#75292) #
For example, let's say that the avg MLB hitter is:
SS: -10
1B: +10

We can apply a SS positional factor of +10 and a 1B factor of -10, which is a good way to balance them.

Let's say in college that the avg hitter is:
SS: -2
1B: +2

You might want to apply a SS positional factor of +15 and a 1B factor of -20. I don't know what the right values are. But I certainly can't just make them +2 and -2.

That's why you can't rely on the hitting positional profile (in College) to establish the proper positional adjustment factors.
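The naive approach tangotiger is arguing against fits in a few lines: take each position's average hitting and negate it. This is a minimal sketch; the function name is hypothetical, and treating his numbers as runs relative to the average hitter is an assumption, since the comment gives no units.

```python
def naive_positional_factors(avg_hitting_by_position: dict[str, float]) -> dict[str, float]:
    """Naive adjustment (hypothetical helper): cancel each position's average hitting.

    This only equalizes hitting across positions; it carries no information
    about how much fielding value each position actually has.
    """
    return {pos: -avg for pos, avg in avg_hitting_by_position.items()}


# tangotiger's hypothetical averages (units assumed to be runs vs. the average hitter)
mlb_profile = {"SS": -10.0, "1B": 10.0}
college_profile = {"SS": -2.0, "1B": 2.0}

mlb_factors = naive_positional_factors(mlb_profile)          # {'SS': 10.0, '1B': -10.0}
college_factors = naive_positional_factors(college_profile)  # {'SS': 2.0, '1B': -2.0}
```

Fed the college profile, the naive method produces factors one-fifth the size of the MLB ones. That is exactly the +2/-2 result tangotiger says you can't trust: the hitting profile alone says nothing about the fielding value of each position.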
Gitz - Monday, March 08 2004 @ 02:01 PM EST (#75293) #
Very compelling, Craig. It's asking a lot, but it would be fantastic to see a list like this from 10 years ago; I imagine teams with real scouting budgets have one, but that doesn't make it any less interesting.

I'm not sure FSU players belong on the list, however. Does FSU even offer classes any more?
robertdudek - Monday, March 08 2004 @ 02:12 PM EST (#75294) #
Well, I would start by applying major league position factors. Then we could modify these based on how these guys are shifted to other positions in the minors.

The idea is to capture agility and speed - these will ultimately determine what positions these college players end up playing in the minors and then the majors.
_DW - Monday, March 08 2004 @ 03:12 PM EST (#75295) #
I'd be reluctant to apply position factors here - many of these guys will end up at different spots on the diamond as they progress (as others have noted). Create the offensive metric, then mentally apply a subjective factor encompassing position, agility, ceiling, defensive skill - blah blah blah.

I'm a little surprised myself that D'Antona (Russ Branyan II) didn't make the top 50 - was it his OBA or WF's park factor that did him in? Oh, and I think Kaplan was a 5th-year senior, making him older than his comp. As others have noted, age matters here.
_Dr. Zarco - Monday, March 08 2004 @ 03:38 PM EST (#75296) #
Is there some bias against schools in cold weather?

As an alum of a northern school that was also pretty good at baseball (Notre Dame was ranked #1 at one point in 2002 and made it to the CWS), I saw how tough the cold weather made things. Our squad would go on a Florida swing in March/April to begin the season and play about 12 games in 13 or 14 days. The indoor facility at ND is decent, but nothing can compete with the great weather of the south.

If I were a top recruit in high school (didn't quite achieve that status) I would go south to college where I could play nearly all year round. A rather large recruiting disadvantage for northern schools.
_FJM - Monday, March 08 2004 @ 05:38 PM EST (#75297) #
DW: I'm a little surprised that D'Antona (Russ Branyan II) didn't make the top 50 myself - was it his OBA or WF's park factor that did him in?

I checked him out at Boydsworld. His 2002 stats are there, but nothing for 2003. I guess that explains it.
_FJM - Monday, March 08 2004 @ 05:56 PM EST (#75298) #
Levski: Two players who didn't show up on Craig's list, but appear on Robert Dudek's entry--Garrabrants and D'Antona--also looked good at Missoula. There's a future for AZ, after all.

Actually, all 3 of them played at Yakima in 2003, not Missoula. Jackson's OPS was the best by far (.943, compared to D'Antona's .853 and Garrabrants's .797). Kaplan did play at Missoula of the Pioneer Rookie League, presumably against lesser opponents, and could only manage a .764. Cook posted a .727, but that was against Midwest League opposition.
_FJM - Monday, March 08 2004 @ 06:03 PM EST (#75299) #
Craig A.: **xRAA is a type of counting stat, and so players who play more games tend to do better.

That explains why Conor Jackson wound up in 41st place, even though his **OWP was 8th best. Wouldn't it be better to rank on OWP?
_tangotiger - Monday, March 08 2004 @ 06:13 PM EST (#75300) #
It would be better to take his LWTS and regress towards the mean based on number of PAs. That'll give you the best true estimate.
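tangotiger's suggestion sits between ranking on a counting stat (xRAA) and ranking on a raw rate (OWP). This is a minimal sketch, assuming the common "add k phantom plate appearances at the mean" form of regression; the constant k = 200 and the three hitters are hypothetical, not values from Craig's study or anyone's real stat line. Because linear weights runs are measured relative to average, the mean rate defaults to zero.

```python
def regressed_rate(runs_above_avg: float, pa: int, k: float, mean_rate: float = 0.0) -> float:
    """Shrink a hitter's runs-per-PA toward mean_rate.

    Equivalent to (observed_rate * pa + mean_rate * k) / (pa + k): the
    hitter is credited with k extra 'phantom' PAs at the mean rate, so the
    fewer real PAs he has, the harder he is pulled toward the mean.
    """
    if pa < 0 or k <= 0:
        raise ValueError("pa must be non-negative and k must be positive")
    return (runs_above_avg + mean_rate * k) / (pa + k)


# Hypothetical hitters: (name, runs above average, plate appearances)
hitters = [
    ("Hitter A", 30.0, 300),
    ("Hitter B", 15.0, 60),
    ("Hitter C", 24.0, 160),
]
K = 200  # assumed regression constant, chosen for illustration only

by_count = sorted(hitters, key=lambda h: h[1], reverse=True)
by_rate = sorted(hitters, key=lambda h: h[1] / h[2], reverse=True)
by_regressed = sorted(hitters, key=lambda h: regressed_rate(h[1], h[2], K), reverse=True)

for label, order in [("count", by_count), ("raw rate", by_rate), ("regressed", by_regressed)]:
    print(f"{label:>9}: {[name for name, _, _ in order]}")
```

The regressed order (C, A, B) matches neither the counting order (A, C, B) nor the raw-rate order (B, C, A): playing time still matters, but only as evidence about how much to trust the rate.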
Gerry - Monday, March 08 2004 @ 09:29 PM EST (#75301) #
Great job Craig.

I am not familiar with Jim Davenport's work; can you tell me what the formula is, or point me in the right direction?

When you say you normalize to league average, does that mean (a) you normalize to the total population in your study, or (b) you normalize to the various college leagues, such as the Southwest Conference?

Ryan Roberts is interesting. Your formula rates him fourth, and he had a good season at Auburn (.814 OPS). Nevertheless, he was not drafted until the 18th round, and he is not listed in either Baseball America's or John Sickels' 2004 prospect books. At 5' 10", Roberts is smaller than most third basemen. He looks like a "Moneyball" pick: good numbers, but he does not excite the scouts.
_ainge_fan - Monday, March 08 2004 @ 09:58 PM EST (#75302) #
DW- who's McGraw? The name doesn't ring a bell...
_Ryan01 - Monday, March 08 2004 @ 10:05 PM EST (#75303) #
http://thesundevils.ocsn.com/sports/m-basebl/mtt/pedroia_dustin00.html
I'm pretty interested in another "good numbers but does not excite the scouts" type of guy, Dustin Pedroia (teammate of ASU's Jeff Larish). He's a slick-fielding shortstop (voted the Collegiate Baseball National Defensive Player of the Year). He doesn't hit a lot of homers, but he's a doubles machine. He set the PAC-10 record with 34 doubles last year and already has 8 doubles (and 3 HR) in roughly 60 AB this season. He hit .404/.472/.579 with 36 BB and 13 K in 297 AB in his sophomore year, and is off to a very good start in BA's College Player of the Year watch.

Scouts doubt whether his bat is major-league quality because of his small size (5'8", 180) and lack of speed on the bases, things that JP should be able to look beyond. BA ranked him only the 41st best college prospect in the 2004 draft. It will be very interesting to see where he ends up in the draft. His numbers were probably inflated a little by playing in Arizona, so I'll be interested to see if he shows up in these rankings somewhere. COMN for his bio.
_Brent - Monday, March 08 2004 @ 10:10 PM EST (#75304) #
Wow, amazing analysis. This takes Boyd's AOPS to new levels. It'll be fun to look at the 2004 numbers before the draft. wink, wink.

One thing I wasn't clear on was the use of OPPWP and OPPOPPWP. I thought the basis of Boyd's rating system was that the RPI (because of its use of OPPWP and OPPOPPWP) has a regional bias. Why not just use his SOS rating system? Is it because there isn't a tidy relationship between SOS and difficulty of hitting? (i.e., it's not 15% more difficult to hit against a 115 SOS)

Second, and I'm prepared to get flamed for this, I perceive that teams east of Baton Rouge are, let's say, hitting-rich, while teams west of Baton Rouge tend to be pitching-rich. You did include a disclaimer regarding the "assumption that each team's won-lost record is equally due to offense and defense", but I just want to throw out the idea that there might be a "directional" bias that we can mentally adjust for come draft time. Predictably, most of the top 50 are from eastern teams.
_DW - Monday, March 08 2004 @ 11:16 PM EST (#75305) #
http://www.coastal.edu/athletics/baseball/2003/Stats/teamcume.htm#TEAM.MLB
Uhhhh... Ryan McGraw was drafted by San Diego, not Toronto - my bad. This, of course, raises the question of why in the hell I thought I'd know from memory which team drafted a 39th-round pick. Sorry...

(Anyhoo, he's a 5-7 guy from Coastal Carolina with walks, speed, and defense - an OK prospect. Stats in homepage.)

BA has said (IIRC) that Roberts merited consideration for their Pioneer League top 20 list and named him the top defender that Toronto drafted/signed, so scouts aren't totally down on him. Why he fell that far was and is a mystery to me...
_Jurgen - Tuesday, March 09 2004 @ 02:06 AM EST (#75306) #
I remember reading at Elephants in Oakland some joking about what would happen to DePodesta's mythical laptop when he went to the Dodgers. Now we know.

Great work, Craig.
Craig B - Tuesday, March 09 2004 @ 08:46 AM EST (#75307) #
Why not just use his SOS rating system?

Brent, you correctly identified the reason: unlike "True Opponents' Win %" (my stat derived from OppWP and OppOppWP), SOS doesn't tell you anything meaningful on its own. It's a number that is only comparable to other SOS values.

I'm not overly concerned about the regional bias. RPI is extremely problematic because of the heavy weight it places on strength of schedule, but True Opponents' Win % doesn't do that.

teams east of Baton Rouge are lets say hitting rich, while teams west of Baton Rouge tend to be pitching rich.

I am starting to think that there's something fishy in the park factors that is doing this, and I'm starting to work with some new park factors to try to correct it.
Mike Green - Tuesday, March 09 2004 @ 09:08 AM EST (#75308) #
May I suggest that Craig's piece be added to the "Analysis" section on the sidebar.
_Johnny Mack - Tuesday, March 09 2004 @ 10:08 AM EST (#75309) #
Good idea, Mike Green. I was thinking that as I read.

Craig B, you are a beautiful kind of insane. Thank you.
_RMc - Tuesday, March 09 2004 @ 11:09 AM EST (#75310) #
Fantastic! Any chance I could see the entire data file? I'd love to see how my favourite team's players are doing...
_JEM - Thursday, March 11 2004 @ 06:21 PM EST (#75311) #
Where does the 1/4 power come from? I don't get it.

-JEM
Craig B - Thursday, March 11 2004 @ 10:09 PM EST (#75312) #
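JEM, the 1/4 power comes from plugging a won/lost ratio (not a winning percentage) back into Bill James's classic Pythagorean Theorem, which states that winning percentage = runs squared divided by (runs squared plus runs allowed squared). Using that as the formula, if we have a won/lost ratio and solve the equation, we find that the runs/runs allowed ratio should be the square root of the won/lost ratio. Since the method assumes each team's won-lost record is equally due to offense and defense, the offense's share (and the defense's) is the square root of that run ratio, which is the fourth root of the won/lost ratio.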
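JEM's question can be checked numerically. This sketch assumes the exponent-2 Pythagorean formula as Craig states it, plus the article's assumption that a team's won-lost record is equally due to offense and defense. Inverting the formula gives R/RA = (W/L)^(1/2); splitting that ratio evenly between scoring and preventing runs gives the fourth root. The function names and the 40-20 record are hypothetical, for illustration only.

```python
import math


def pythagorean_wpct(runs: float, runs_allowed: float) -> float:
    """Bill James's exponent-2 Pythagorean winning percentage."""
    return runs**2 / (runs**2 + runs_allowed**2)


def run_ratio_from_wl(wins: float, losses: float) -> float:
    """Invert the formula: W/L = (R/RA)**2, so R/RA = sqrt(W/L)."""
    return math.sqrt(wins / losses)


def per_side_factor(wins: float, losses: float) -> float:
    """Split R/RA evenly between offense and defense (assumed).

    If the offense scores f times the average and the defense allows 1/f
    times the average, then R/RA = f**2, so f = sqrt(R/RA) = (W/L)**0.25.
    """
    return math.sqrt(run_ratio_from_wl(wins, losses))


wins, losses = 40, 20  # hypothetical record, W/L = 2
f = per_side_factor(wins, losses)
print(f"R/RA = {run_ratio_from_wl(wins, losses):.4f}")  # square root of 2
print(f"per-side factor = {f:.4f}")                     # fourth root of 2
print(f"check: {pythagorean_wpct(f, 1 / f):.4f}")       # recovers 40/60
```

Plugging f and 1/f back into the Pythagorean formula recovers the original .667 winning percentage, which confirms that the fourth root is the per-side share rather than the full run ratio.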
_AgRyan04 - Friday, March 12 2004 @ 02:52 AM EST (#75313) #
http://www.geocities.com/kies19
As a student at Texas A&M (and big follower of our baseball team), I can understand why it seems as if northern schools are at a disadvantage with regards to players.

To take what Dr. Zarco said a step farther....not only is it more likely that a southern school gets the big recruits from up north, but the states of Texas, California, and Florida also take a HUGE chunk of recruits from in-state.

These in-state kids grow up playing ball year round so they're going to have more innings under their belt, which (in theory) would make them better prepared for the college game.

Not only that, but Texas and California are two of the largest states in the nation, so the pool is not only more talented but larger. I could have played D1 ball up north somewhere if I had wanted to (well, I could have been on a roster), but I wouldn't even sniff the locker room if I had tried to walk on at a D1 school here in Texas.

Here at A&M we don't recruit very many position players because we can pluck great guys out of the JuCo system (which we have relied on very heavily over the last 2 or 3 years) and they can compete for starting jobs....
_Chris M. - Friday, March 12 2004 @ 02:57 PM EST (#75314) #
FWIW - Ryan Gordon was drafted after last season but chose to stay at UNC Greensboro for his senior season. His numbers through 12 games this season are pretty similar to last year.

Craig, your analysis is fascinating stuff and seems like it would be quite valuable to any team that was trying to honestly evaluate college hitters.
Statistical Evaluations of College Hitters | 58 comments