“I have not failed. I’ve just found 10,000 ways that won’t work.”
–Thomas A. Edison
We learn as much or more from our failures as we do from our successes, a lesson Edison learned in spades as he worked to develop the incandescent light bulb. Of course, he was also the holder of 1,093 patents, so he certainly enjoyed his successes as well. And while our little dabblings in baseball analysis in no way compare to the work of true inventors, we do share the common experience of sometimes having to engage in significant rework of an idea to bring it just a little closer to where we want it. So this week we revisit the topic of outfield defense and take a slightly different perspective in creating a play-by-play fielding metric for outfielders. We’ve already got version 1.0 of SFR for infielders, which was discussed last week.
However, before we delve back into the topic of outfield defense, I want to make good on my promise to release some minor league numbers. So at this link you’ll find the same spreadsheet linked to in last week’s article, this time with a new tab that includes all 10,774 2007 minor league player, team, and position combinations. Have fun.
We now return to our regularly scheduled topic.
A Different Approach
After my last foray into outfield defense a few weeks ago I received helpful feedback from readers and the sabermetric community at large on the methodology I employed to calculate the SFR values. For those who weren’t with us last time, the algorithm described in that column can be summarized like so:
- First, calculate a baseline for the year and league that includes the percentage of balls that fall for hits and the resulting total bases across the following axes: position, hit type (fly, line drive, pop up, and ground ball), and batter handedness. The resulting matrix will be used for comparison purposes.
- Calculate the same matrix for each fielder for the year and league in question and compare the matrix to the baseline in order to calculate the expected number of runners and expected total bases given the same number and quality of opportunities. Each difference in the number of expected runners is credited at 0.74 runs (0.46 for the hit plus the negative of -0.27 for the out) and each additional total base above and beyond the number of expected runners is credited at 0.33 runs (the difference between the value of a double and a single).
- Because outfielders are much more constrained by their park than infielders are, we need to make a park adjustment. This is done by creating a three-year park factor for each park and position using the same context (hit type and batter handedness) as above. The park factor is calculated by comparing all plays at the park and position in question with all plays in games by the home team when on the road. In other words, the park factor is calculated as the ratio of the rate at which balls are turned into outs and extra bases are gained at home, versus that done on the road over a three-year span. The park factor (not weighted but simply averaged over the three years) is then applied to each fielder opportunity at each park with the result being the adjusted SFR value.
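The run-value arithmetic in the steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code actually used for SFR: the function names, the sign convention (expected minus actual, so positive means runs saved), the reading of "additional total bases" as the total-base difference minus the runner difference, and the sample inputs are all hypothetical. Only the 0.74 and 0.33 weights and the unweighted three-year averaging of the home/road ratio come from the description above.

```python
# Run weights quoted in the column.
RUNNER_VALUE = 0.74      # runs per runner kept off base
EXTRA_BASE_VALUE = 0.33  # runs per additional base beyond the runner himself


def runs_saved(delta_runners, delta_total_bases):
    """Convert runner and total-base differences into runs.

    Assumed convention: delta = expected - actual, so positive is good.
    """
    extra_bases = delta_total_bases - delta_runners
    return delta_runners * RUNNER_VALUE + extra_bases * EXTRA_BASE_VALUE


def park_factor(home_rates, road_rates):
    """Unweighted average of yearly home/road ratios (one entry per year)."""
    if len(home_rates) != len(road_rates) or not home_rates:
        raise ValueError("need matching, non-empty yearly rates")
    ratios = [home / road for home, road in zip(home_rates, road_rates)]
    return sum(ratios) / len(ratios)


# Hypothetical fielder: 10 fewer runners and 16 fewer total bases than expected.
print(round(runs_saved(10, 16), 2))  # 10 * 0.74 + 6 * 0.33 = 9.38

# Hypothetical hit rates for one park/position cell over three years.
print(round(park_factor([0.50, 0.60, 0.55], [0.50, 0.50, 0.50]), 3))
```

How the resulting factor is applied to each individual opportunity is not spelled out in the description, so the sketch stops at computing it.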
When it was all said and done, for 2007 Coco Crisp was our leader at +35 runs, and Brian Giles found himself on the bottom at -23 runs. Although it wasn’t mentioned in the previous column, the correlation coefficients by position when compared with UZR for 2005 and 2006 were:
Table 1: SFR beta 1 Correlations with UZR 2005-2006
Pos r
Overall 0.54
-------------
LF 0.57
CF 0.66
RF 0.34
The correlations here are lower than those for infielders, in part because in this system we don’t have a method for partitioning balls between outfielders. If a ball falls in the gap between a left and center fielder, the fielder who picked it up (let’s say the center fielder in this case) will be debited, although it could have been the case that the ball was really in the left fielder’s “area of responsibility” and the center fielder was simply doing his job in backing up the play.
Also, I’ll have to admit that the last row surprised me. I’ll have more on the right field issue a little later, but in the meantime, it should be noted that Sean Smith encountered a similar result in correlating his TotalZone and TotalZone+ systems with UZR.
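For readers who want to reproduce this kind of comparison, the r values are presumably ordinary Pearson correlation coefficients between paired SFR and UZR totals. A minimal sketch follows, assuming that interpretation; the function name is hypothetical, and the sample input is simply Carl Crawford's four season pairs quoted later in this column, not the player-level data set behind Table 1.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Carl Crawford, 2003 through 2006, as listed later in the column.
crawford_sfr = [3, 23, 23, 4]
crawford_uzr = [6, 27, 11, 0]
print(round(pearson_r(crawford_sfr, crawford_uzr), 2))  # 0.79
```

Four seasons from one player say nothing about the systems as a whole; the point is only to show the calculation that a per-position r summarizes.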
But as mentioned above, some readers had questions and reservations with this approach, with the primary issue being how the park factors were calculated and applied. Of those issues, probably the largest complaint was that by calculating the park factors using all home and road games for a particular team associated with a park, a large percentage of the resulting factor will be influenced by the home team’s regular fielder at that position. For example, if Manny Ramirez plays the vast majority of games in left field for the Red Sox from 2005 through 2007, then the park factor for 2007 is nearly 50% based on the actions of Manny being Manny.
In order to correct for this and to streamline the calculations, I created a second beta version of SFR for outfielders that uses the following approach:
- First, rather than create a baseline matrix as described above, this version immediately calculates the percentage of balls that become hits and the cost in terms of total bases for each fielder, park, position, hit type, and batter hand combination for the year and league. Everything is then calculated from this raw data. In order to prep the data for the next step, here we also run this for a five-year span surrounding the year and league we’re dealing with (e.g. for 2005 we run data for 2003 through 2007).
- Next, we calculate the expected runners and expected total bases, along with the run values for the differences, by comparing how the fielder performed in each permutation of park, position, hit type, and batter handedness to how all fielders (not including himself) did in that combination in a five-year span centered on the year in question. That’s a mouthful, but what it means is that rather than using a park factor based largely on the fielder who calls that park home, we now effectively calculate an individualized park factor for each fielder at each park for each hit type and batter hand.
This approach not only removes the bias that may be present by over-representing a particular fielder in a particular park, but it also makes it more likely that an outfielder who is especially good at working with the eccentricities of his park will get the appropriate credit. You’ll also notice that this approach no longer requires data from road games played by a team that were formerly used to create the ratios for the park factor, nor does it require a baseline matrix, since the fielder is compared to all other fielders who have fielded balls at that park. The cost of making these changes is that they reduce the sample sizes upon which the “park factors” are based. However, we’ve made up for that somewhat by including two more years of data when available.
- Finally, we’ll take one additional step. Because we no longer first compare individual fielders with a baseline for the year and league, we’ve lost a bit of the seasonal context. As a result, we then make a correction that ensures that the totals for the league and position equate to zero. In this way, each player’s SFR score is relative to a baseline of zero for the year and league and can be thought of as runs above or below average for the position.
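The comparison against all other fielders and the final zero-centering step might look roughly like the sketch below. It is an illustration under assumptions rather than the actual SFR code: the data layout, function names, and sample numbers are hypothetical, and spreading the league correction in proportion to balls fielded is a guess, since the description does not say how the correction is distributed.

```python
RUNNER_VALUE = 0.74      # run weights quoted earlier in the column
EXTRA_BASE_VALUE = 0.33


def expected_from_others(pooled, fielder, cell, balls):
    """Expected runners and total bases for `balls` chances in `cell`,
    based on every OTHER fielder in the five-year pooled sample.

    pooled maps (fielder, cell) -> (balls, runners, total_bases), where a
    cell is a (park, position, hit_type, batter_hand) tuple.
    """
    o_balls = o_runners = o_tb = 0
    for (name, other_cell), (b, r, tb) in pooled.items():
        if other_cell == cell and name != fielder:
            o_balls += b
            o_runners += r
            o_tb += tb
    if o_balls == 0:
        return None  # nobody else fielded a ball in this cell
    return balls * o_runners / o_balls, balls * o_tb / o_balls


def raw_sfr(season, pooled, fielder):
    """Sum run values over one fielder-season; season maps cell -> counts."""
    total = 0.0
    for cell, (balls, runners, tb) in season.items():
        expected = expected_from_others(pooled, fielder, cell, balls)
        if expected is None:
            continue  # no comparison sample, so no credit or debit
        exp_runners, exp_tb = expected
        d_runners = exp_runners - runners
        d_tb = exp_tb - tb
        total += d_runners * RUNNER_VALUE + (d_tb - d_runners) * EXTRA_BASE_VALUE
    return total


def zero_center(raw, balls):
    """Shift scores so a league/position sums to zero.

    Assumption: the correction is shared in proportion to balls fielded.
    """
    league_total = sum(raw.values())
    league_balls = sum(balls.values())
    return {f: raw[f] - league_total * balls[f] / league_balls for f in raw}


# Hypothetical cell and fielders.
cell = ("ParkX", "CF", "fly", "R")
pooled = {
    ("A", cell): (100, 50, 60),
    ("B", cell): (200, 80, 100),
    ("C", cell): (100, 40, 50),
}
season_a = {cell: (100, 50, 60)}
print(round(raw_sfr(season_a, pooled, "A"), 2))  # -7.4
print(zero_center({"A": 10.0, "B": -4.0}, {"A": 300, "B": 100}))
```

Excluding the fielder's own chances from the comparison sample is what replaces the home/road park factor here, at the cost of thinner samples in rarely visited parks.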
Software developer geek alert: Although there is far less code involved with this approach (from over 300 lines in the first version to just under 200 in this one), the amount of number crunching has increased because of the need to create the “individualized park factors.” Running the numbers for 2003 through 2007 took about three hours on a 2.33 GHz Core 2 Duo laptop with 2 gigabytes of memory.
So after all of that processing, Table 2 includes the new leaders and trailers in “Outfield SFR beta 2.” The Runners column is the number of runners who reached base on balls fielded by the fielder, TB is the resulting number of total bases, DRunners is the delta in terms of runners, and DTB is the delta in terms of total bases.
Table 2: Top and Bottom Ten 2007 Outfielders By Position
Player             Pos    Balls Runners   TB DRunners  DTB  SFR
Covelli Crisp      Center   709     301  374       47   79   39
Carlos Beltran     Center   708     319  394       27   50   21
Grady Sizemore     Center   791     394  474       23   44   17
David DeJesus      Center   757     357  448       20   37   14
Nook Logan         Center   444     196  238       16   31   13
Felix Pie          Center   198      78   92       14   25   12
Alfredo Amezaga    Center   381     173  216       14   29   12
Nyjer Morgan       Center   148      64   74       10   17    9
Vernon Wells       Center   619     298  378       15   23    9
Melky Cabrera      Center   680     333  412       17   21    8
--------------------
Juan Pierre        Center   747     381  462       -3    4   -7
Torii Hunter       Center   777     388  482        0   -1   -7
Dave Roberts       Center   436     212  267       -4   -6   -7
Tike Redman        Center   161      83  101      -10   -9   -8
Mike Cameron       Center   742     377  461       -3   -6   -9
Hunter Pence       Center   493     233  315       -1  -15  -10
Aaron Rowand       Center   778     386  503       -1   -8  -10
Ryan Freel         Center   281     145  183      -10  -11  -10
Chris Young        Center   729     375  472       -9   -3  -11
Nick Swisher       Center   295     156  203       -8  -20  -12
Billy Hall         Center   592     297  386       -6  -20  -14
---------------------------------------------------------------
Matt Holliday      Left     663     367  477       28   48   26
Eric Byrnes        Left     460     221  287       18   29   16
Jason Botts        Left     104      41   55       14   18   12
Ryan Church        Left     401     205  262        7   26   11
Reggie Willits     Left     296     145  187       12   18   10
Alfonso Soriano    Left     546     301  395        6   22    9
Adam Lind          Left     278     141  177        6   17    8
Carl Crawford      Left     619     332  424        4   19    7
Norris Hopper      Left      76      39   48        7   13    7
Rob Mackowiak      Left     218     120  151        6   13    6
Andre Ethier       Left     161      88  106        5   14    6
--------------------
Barry Bonds        Left     369     207  276      -10  -14   -9
Josh Willingham    Left     558     347  458      -11  -13  -10
Jason Bay          Left     682     416  551       -8  -16  -10
Adam Dunn          Left     604     359  483       -9  -16  -10
Manny Ramirez      Left     463     281  387       -8  -19  -10
Luis Gonzalez      Left     462     270  370       -6  -23  -11
Moises Alou        Left     336     198  282       -6  -30  -13
Pat Burrell        Left     468     292  386      -17  -24  -16
Raul Ibanez        Left     571     347  470       -9  -37  -17
Chris Duncan       Left     401     243  321      -22  -30  -19
---------------------------------------------------------------
Jeremy Hermida     Right    505     258  314       12   45   18
Luke Scott         Right    388     191  241       20   31   18
Austin Kearns      Right    717     342  447       19   30   16
Vladimir Guerrero  Right    414     206  262       13   29   14
Carlos Quentin     Right    270     132  183       13   16   10
Magglio Ordonez    Right    538     277  373       19   10   10
Andre Ethier       Right    382     205  253        5   23    8
J.D. Drew          Right    426     214  269        8   15    7
Gabe Gross         Right    137      67   85        7   14    7
Nick Swisher       Right    193      84  114       10    8    6
Corey Hart         Right    470     217  288        8   11    6
--------------------
Nate Schierholtz   Right     99      51   70       -8  -12   -8
Ken Griffey Jr.    Right    615     324  414      -11   -6   -8
Shawn Green        Right    406     203  270       -7  -13   -8
Michael Cuddyer    Right    568     312  420       -7  -14   -9
Jack Cust          Right    190     111  157       -5  -22  -10
Jose Guillen       Right    616     349  445      -14  -13  -11
Juan Encarnacion   Right    288     163  218      -13  -18  -12
Bobby Abreu        Right    631     318  434       -5  -27  -13
Trot Nixon         Right    288     159  222       -9  -27  -13
Mark Teahen        Right    671     353  496      -17  -37  -21
Brian Giles        Right    472     256  358      -17  -42  -22
Coco Crisp still takes the top spot and gains four runs in the process, while Carlos Beltran, David DeJesus, and Grady Sizemore continue to look very good. Melky Cabrera falls from +15 to +8, and Nook Logan holds his own. At the bottom, the soon-to-be-infielder-again Bill Hall claims the title at -14, taking over from Chris Young, who moves up a few spots to -11. Nick Swisher (who at -12 in center fared much worse than he did in right field at +6) and Ryan Freel (-10) now also make appearances in this unenviable list. In left field Matt Holliday (+26) continues to shine, as does Eric Byrnes (+16), with Jason Botts (+12) making a surprise appearance. None of those at the bottom of the list will come as a surprise (and yes, for all of those readers who couldn’t believe Manny wasn’t near the bottom, I hope you’re satisfied with his -10) as Chris Duncan continues to lead them at -19. In right field, Jeremy Hermida (+18) and Luke Scott (+18) continue to shine while Carlos Quentin (+10) and Gabe Gross (+7) sneak into the mix. At the bottom of the right fielders we find Brian Giles at -22 (more on him in a moment) while the rest of the list is filled with familiar faces.
In order to get a feel for how this second beta version compares with UZR, I ran a regression on all outfielders who fielded 500 or more balls from 2003 through 2006. The summary results are found in Table 3.
Table 3: SFR beta 2 Correlations with UZR 2003-2006, players with 500 or more balls fielded
Pos # r
Overall 149 0.64
--------------------
LF 49 0.75
CF 50 0.78
RF 50 0.24
Ouch. While we were looking pretty good at left and center with the correlation coefficients approaching what we saw for infielders, things kind of fall apart in right field. A plot of the correlations follows in Figure 1.
Figure 1. SFR vs. UZR, >=500 Balls Fielded for 2003 through 2006
From this graph you can see how right fielders are more scattered than the other positions, with some real outliers in Brian Giles and Juan Encarnacion; Magglio Ordonez is the other blue dot almost directly below Giles. Their totals (across all positions they played) are shown below.
Table 4: Brian Giles
Year  SFR  UZR
2003   +0   -3
2004   +1  +24
2005  -20  +20
2006   -2  +23
Both 2005 and 2006 show huge differences in how the systems treat Giles and, as noted above, SFR thinks he’s no great shakes in 2007 either. Having watched him play in person many times in the past several seasons, I do find it hard to believe he could be worth anywhere near +20 to +25 runs, but then again he also doesn’t seem like a -20 type fielder either.
Table 5: Juan Encarnacion
Year  SFR  UZR
2003  -17  +14
2004   -4   +4
2005  -19   +9
2006   -3   +4
Encarnacion is also consistently at odds, as SFR rates him negatively every season, while UZR likes him. Once again, there is nothing obvious that jumps out at me that would indicate that one is clearly right and the other wrong.
These two players account for the bulk of the variation between the two systems, resulting in the low correlations. It would be interesting to hear theories as to why these two players (and to a lesser extent the flip side of Jose Cruz Jr. and Ichiro Suzuki with an SFR of +53 and UZR of +17) would differ so strongly. Is there a positioning issue? Are there particular park effects that are somehow not being accounted for? I’ll admit that as with the issue of first basemen and UZR I’m a bit stumped at the moment.
Moving on to happier things, we can note some pretty strong correlations in left and center field, so we’ll throw a few out there so that you can see how they compare.
Table 6: Manny Ramirez
Year  SFR  UZR
2003   -2   -9
2004   -3  -11
2005  -20  -47
2006  -15  -30
Although I didn’t think it possible, SFR now hates Manny almost as much as UZR.
Table 7: Ken Griffey
Year  SFR  UZR
2003   -8  -13
2004  -16  -25
2005  -34  -41
2006  -25  -34
It’s too bad we don’t have numbers on Griffey in his prime, but the older version finishes dead last in both seasons before his merciful shift to right field in 2007 (where he still fared poorly but didn’t cost his team as much).
Table 8: Carl Crawford
Year  SFR  UZR
2003   +3   +6
2004  +23  +27
2005  +23  +11
2006   +4   +0
Crawford shows a peak of sorts in 2004 and 2005 and a decline in 2006 and 2007, but overall the totals are very close.
Table 9: Grady Sizemore
Year  SFR  UZR
2004   +9   +9
2005  +24  +14
2006  +23  +22
Sizemore is now consistently rated the best center fielder in SFR with his +17 finish in 2007; UZR agrees.
To finish up today let’s take a look at the top and bottom 25 outfielders from 2003 through 2007 in terms of SFR by position. We haven’t yet calculated a rate statistic, but could easily do so using the number of balls fielded.
Table 10: Top and Bottom 25 Outfielders for 2003-2007 by Position
Player             Pos    Balls Runners   TB DRunners  DTB  SFR
Grady Sizemore     Center  2497    1213 1461       68  141   73
Jose Cruz Jr.      Right   1613     762  971       68  136   69
Carl Crawford      Left    3160    1638 2154       65  115   63
Carlos Beltran     Center  3531    1638 2001       44  145   61
Ichiro Suzuki      Right   2588    1242 1606       56   92   53
Andruw Jones       Center  3718    1800 2265       62   95   52
Garret Anderson    Left    1760     899 1134       37   90   42
Austin Kearns      Right   2244    1071 1416       49   60   40
Covelli Crisp      Center  1929     924 1161       47   73   39
Vernon Wells       Center  3455    1743 2167       29   93   38
Covelli Crisp      Left     928     465  607       37   65   37
Matt Holliday      Left    2276    1289 1726       41   52   35
Nook Logan         Center  1291     585  719       31   68   34
Mike Cameron       Center  2961    1376 1689       40   73   33
Reed Johnson       Left    1138     597  750       21   70   32
Alfonso Soriano    Left    1221     650  846       22   61   29
David DeJesus      Center  2078     990 1241       33   56   29
Alexis Rios        Right   1945    1021 1295       11   75   29
Eric Byrnes        Left    1376     698  923       33   43   28
Vladimir Guerrero  Right   2475    1244 1631       29   52   27
Ryan Langerhans    Left     617     322  419       28   38   24
Jeff Francoeur     Right   1669     894 1180       27   33   23
Reggie Sanders     Right    946     462  574       12   56   22
J.D. Drew          Right   1957    1000 1303       24   41   21
Laynce Nix         Center   995     516  605        9   47   21
---------------------------------------------------------------
Matt Stairs        Right    468     268  375      -16  -40  -21
Juan Gonzalez      Right    373     222  295      -23  -34  -22
Chipper Jones      Left     631     394  522      -20  -35  -23
Marlon Byrd        Center  1558     809 1039      -21  -34  -24
Michael Tucker     Right    848     461  638      -18  -45  -24
Reed Johnson       Right    690     385  526      -20  -49  -25
Craig Biggio       Center   985     525  685      -21  -46  -28
Shawn Green        Right   2117    1120 1522      -19  -62  -28
Ryan Klesko        Left     767     426  595      -24  -61  -28
Carlos Lee         Left    3126    1741 2334      -28  -49  -29
Mark Kotsay        Center  2776    1383 1773      -18  -62  -29
Moises Alou        Left    1692     961 1307      -23  -58  -30
Adam Dunn          Left    2863    1638 2243      -17  -69  -30
Aubrey Huff        Right    893     458  623      -28  -64  -31
Luis Gonzalez      Left    2662    1533 2018      -41  -44  -33
Bernie Williams    Center  1628     853 1080      -29  -64  -35
Bobby Abreu        Right   3122    1635 2179      -24  -70  -35
Marquis Grissom    Center  1514     763  990      -27  -64  -35
Hideki Matsui      Left    2367    1338 1748      -38  -68  -39
Brian Giles        Right   2441    1310 1788      -28  -81  -39
Pat Burrell        Left    2542    1473 2003      -28  -86  -41
Juan Encarnacion   Right   2415    1278 1735      -35  -98  -48
Manny Ramirez      Left    2541    1537 2099      -50  -90  -51
Raul Ibanez        Left    2506    1417 1906      -35 -110  -52
Ken Griffey Jr.    Center  1653     876 1153      -84 -157  -83
Baby Steps
I believe this second beta version of SFR makes some nice improvements over the previous version, with the added benefit that it is simpler from a conceptual and code perspective. As noted in the case of right fielders, though, there are still issues to be explored, but I’m sure that with the “wisdom of crowds” at our back here at BP, we’ll continue to make progress.