Welcome
Hello, new readers! Welcome to our new series, Fantasy Baseball Do-It-Yourself (DIY).
Do you love fantasy baseball, but find that those annual magazines and websites don’t actually help? Have you long ago left behind stodgy rotisserie fantasy for more cutting edge leagues? Are you willing to spend a little time learning to use Excel to give yourself a competitive advantage in your league? No? Well, you can stop reading now as long as you vote for me. But if the answer is “yes,” we here at Fantasy Baseball DIY are going to show you how to build your own expert analysis that is optimized for your league’s scoring system.
What are you going to need to build this statistical dream house? The requirements are:
Raw Materials: Access to the vast array of baseball information out there, especially player projection data (a Baseball Prospectus subscription will more than suffice).
Tools: Excel (or Access) is enough to do the trick.
A Little Know-How: Some basic data manipulation skills, such as how to use Pivot Tables and the VLOOKUP function in Excel. If you don’t have them yet, consider an off-season workout regimen to prepare for 2010.
Desire: A willingness to get your hands dirty, programmatically speaking.
Key decisions
Throughout the course of the season, the fantasy owner makes three key decisions:
- Whom do I draft?
- On a given night, who is active and who is reserve?
- What mid-season roster moves do I make?
While all are important, I’ve been able to win leagues by simply drafting a better team at the beginning and making the fewest possible mid-season roster moves. Similarly, I’ve seen enough owners botch the draft so badly that no amount of mid-season wheeling and dealing could get them out of the hole they had dug. At Fantasy Baseball DIY, we will cover all these questions, but, for the first few installments, we will focus on the draft.
League Context
While the techniques that we present can be adjusted to match the scoring and settings of your league (hey, that’s the point of Fantasy Baseball DIY), I’m going to use my favorite league as an example. I play in a non-keeper head-to-head league of 20 teams, followed by a three-week playoff for the top six teams. We select 24 players on draft day, but, on any given night, the active roster consists of 10 position players (one at each fielding position plus two utility players) and 7 pitchers (two must be starters and two must be relievers). As for scoring, there are six offensive categories (R, HR, RBI, SB, OBP, SLG) and six pitching categories (W, SV, K, WHIP, ERA, K/BB). The selection of players is a live draft, not an auction.
Marginal Positional Value
As most fantasy baseball owners know, the key concept of player valuation is determining a player’s marginal positional value (MPV) compared to other players at the same position. Although Albert Pujols has better overall numbers, his MPV (compared to other first basemen) is lower than the top players, like Hanley Ramirez, at the traditionally weak offensive positions (SS, 2B, or C).
The first step to calculating a player’s MPV is to estimate the relevant stats for all likely drafted players in the upcoming year. By being a Fantasy DIYer, you can use whichever projection system you like (PECOTA, CHONE, Marcel, ZiPS, an average of many systems, etc.). There’s no need to create your own projection system (unless you want to). After all, Bob Vila doesn’t make his own bricks. He just uses quality raw materials from a supplier he trusts. Personally, I like combining two BP data sets, the downloadable PECOTA projections and the depth charts, which refine the expected playing time.
Note to Baseball Prospectus: a great addition for next year would be a downloadable file of the plate appearances and innings pitched used in the depth charts, preferably with the HoweID key for each player for easy joining with PECOTA projections.
Baseline Calculation
The first step in determining MPV is to calculate a baseline for each position. To do this, we average the projections of every player who:
- is eligible at that position and
- is projected to get over 200 plate appearances.
The table below shows our projected baseline numbers for each position for 2009:
Position   PA   R   HR  RBI  SB  OBP   SLG
1B         519  69  20  75   3   .358  .472
2B         461  58  10  50   9   .337  .412
3B         445  56  15  60   5   .340  .448
SS         441  53  8   45   10  .331  .395
C          382  43  11  47   2   .333  .419
RF         470  62  15  61   7   .345  .450
CF         460  62  12  52   15  .339  .425
LF         466  62  17  63   8   .345  .458
Avg        453  57  13  56   8   .341  .435
Based on a team that has each position player plus two utility players (which I fill in with an average player), my baseline team stats are:
             PA    R    HR   RBI  SB  OBP   SLG
Team Total   4552  578  132  565  74  .341  .435
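As a rough check on the baseline team, the totals can be rebuilt from the position table: sum the eight position baselines and add the "Avg" row twice for the two utility slots. The sketch below is only an illustration in Python rather than Excel. The function name `team_totals` is made up for this example, and weighting OBP by plate appearances is an assumption about how the team rate stat was combined.

```python
# Position baselines copied from the table above: (PA, R, HR, RBI, SB, OBP).
BASELINES = {
    "1B": (519, 69, 20, 75, 3, 0.358),
    "2B": (461, 58, 10, 50, 9, 0.337),
    "3B": (445, 56, 15, 60, 5, 0.340),
    "SS": (441, 53, 8, 45, 10, 0.331),
    "C": (382, 43, 11, 47, 2, 0.333),
    "RF": (470, 62, 15, 61, 7, 0.345),
    "CF": (460, 62, 12, 52, 15, 0.339),
    "LF": (466, 62, 17, 63, 8, 0.345),
}
# The "Avg" row, used to fill both utility slots.
AVERAGE = (453, 57, 13, 56, 8, 0.341)

COUNTING = ["PA", "R", "HR", "RBI", "SB"]


def team_totals(baselines, utility, n_utility=2):
    """Sum counting stats over the ten lineup slots and estimate team OBP."""
    roster = list(baselines.values()) + [utility] * n_utility
    totals = {name: sum(slot[i] for slot in roster)
              for i, name in enumerate(COUNTING)}
    # Assumption: team OBP is approximated by weighting each slot's OBP by its PA.
    totals["OBP"] = sum(slot[0] * slot[5] for slot in roster) / totals["PA"]
    return totals


team = team_totals(BASELINES, AVERAGE)
print(team)
```

Working from the rounded table values gives 4550 PA, 579 R, 134 HR, 565 RBI, and 75 SB, with an OBP that rounds to .341. The small gaps in PA, R, HR, and SB relative to the printed team line presumably come from the article using unrounded averages.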
Examples of MPV Calculation
MPV is calculated by determining how a given player, if inserted into the baseline roster, changes the team’s final statistics in each category. Let’s take two examples, both third basemen, to show how to do the MPV calculation: David Wright and Russell Branyan. Based on the PECOTA projections and depth charts from late February (my draft this year was early March), the projections for each of these were:
Player    PA   R    HR  RBI  SB  OBP   SLG
Branyan   463  61   26  75   6   .335  .491
Wright    694  119  32  106  20  .400  .538
By inserting Branyan or Wright in the baseline team’s lineup and removing the “average” third baseman, we calculate the improved scores of the baseline team. As our more astute readers will notice, we are doing similar “first principles” calculations that lead to the creation of stats like MLVr and VORP, but geared for the scoring of the league, not for increasing run production in the major leagues. The table below shows the new team totals with Branyan (first row) and Wright inserted (third row). The second and fourth rows show the percent improvement in each category over the baseline.
                      R    HR   RBI  SB   OBP*  SLG*  MPV
Team+Branyan          581  143  580  75   .340  .441
Branyan Improvement   1%   8%   3%   1%   -2%   7%    18
Team+Wright           639  149  611  89   .350  .450
Wright Improvement    11%  13%  8%   20%  14%   18%   84
The MPV column is simply the sum of the percent increases in each of the scoring columns, since in this league each category counts equally. Also, we calculate the MPV of a player if they were inserted into the utility slot as well. As the draft proceeds and if all of a player’s eligible position spots get filled, he will likely become a utility player. When this occurs, the MPV compared to the average utility player is a better player ranking.
Note that the percent improvement calculation for OBP is compared to increasing the OBP over a reasonable low-level of .275. We do the same for SLG percent as a percent increase over .350. I find that these help put these rate categories on a similar scale to the counting categories.
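To make the arithmetic concrete, here is a minimal Python sketch of the percent-improvement and MPV sums, using the team lines from the tables above. Counting categories are measured against the baseline total itself, while OBP and SLG are measured against the baseline's distance from the .275 and .350 floors. The baseline RBI used here is 565, the figure implied by summing the position baselines plus two average utility players. Rounding each category to a whole percent before summing is an assumption, and the function names are invented for this example.

```python
# Floors for the rate categories, as described in the note above.
RATE_FLOORS = {"OBP": 0.275, "SLG": 0.350}

BASELINE = {"R": 578, "HR": 132, "RBI": 565, "SB": 74, "OBP": 0.341, "SLG": 0.435}
WITH_BRANYAN = {"R": 581, "HR": 143, "RBI": 580, "SB": 75, "OBP": 0.340, "SLG": 0.441}
WITH_WRIGHT = {"R": 639, "HR": 149, "RBI": 611, "SB": 89, "OBP": 0.350, "SLG": 0.450}


def pct_improvement(category, new, base):
    """Percent change over the baseline, measured from the category's floor."""
    floor = RATE_FLOORS.get(category, 0.0)  # counting stats use a floor of zero
    return 100.0 * (new - base) / (base - floor)


def mpv(team_with_player, baseline):
    """Sum of whole-percent improvements, each category weighted equally."""
    return sum(round(pct_improvement(cat, team_with_player[cat], baseline[cat]))
               for cat in baseline)


print(mpv(WITH_BRANYAN, BASELINE))  # 18
print(mpv(WITH_WRIGHT, BASELINE))   # 84
```

For Wright's OBP, for example, this works out to (.350 - .341) / (.341 - .275), or about 14%.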
Personally, I also note if the MPV for a player is heavily weighted by one single category, such as SB for hitters or SV for pitchers, where the majority of points scored come from just a single player or two. A player like Jacoby Ellsbury may be overvalued, because most of his benefit will come from stolen bases. If your team also has another stealing threat (like Jose Reyes), the additional benefit of Ellsbury is not as great.
After calculating the MPV for every player, we create a sorted list for each position. In the position list, we include all players that are eligible at the position or will likely be eligible. For example, Russell Branyan was not eligible at 1B to start the season, but it was pretty certain in Spring Training that he would likely be the starting first baseman. Next to their name, we put down the MPV score for that player in that position as well as his utility role MPV. When it is our turn to select in the draft, we simply look at the top of the list of all positions and select the player with the highest MPV on any position list. See the table below for an example of the top of the lists at a few positions this year.
1B                   2B                   SS
Pujols,A. (76,91)    Utley,C. (59,52)     Ramirez,H. (104,91)
Berkman,L. (44,58)   Kinsler,I. (58,52)   Reyes,J. (98,91)
Howard,R. (39,53)    Roberts,B. (51,44)   Rollins,J. (72,59)
Cabrera,M. (36,50)   Phillips,B. (43,37)  Tulowitzki (31,18)
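The selection rule can be sketched in a few lines of Python. This is one reading of the rule, not a full draft tool: a player is valued at his positional MPV while that roster spot is open and at his utility MPV once it is filled. It also assumes each player appears on only one position list, and the names `BOARD` and `best_pick` are invented for this example.

```python
# (positional MPV, utility MPV) pairs copied from the lists above.
BOARD = {
    "1B": {"Pujols,A.": (76, 91), "Berkman,L.": (44, 58),
           "Howard,R.": (39, 53), "Cabrera,M.": (36, 50)},
    "2B": {"Utley,C.": (59, 52), "Kinsler,I.": (58, 52),
           "Roberts,B.": (51, 44), "Phillips,B.": (43, 37)},
    "SS": {"Ramirez,H.": (104, 91), "Reyes,J.": (98, 91),
           "Rollins,J.": (72, 59), "Tulowitzki": (31, 18)},
}


def best_pick(board, filled=(), taken=()):
    """Return (name, MPV, position) for the highest-valued available player.

    filled: positions already covered on our roster (use utility MPV there).
    taken:  players already drafted by anyone.
    """
    best = None
    for pos, players in board.items():
        for name, (pos_mpv, util_mpv) in players.items():
            if name in taken:
                continue
            value = util_mpv if pos in filled else pos_mpv
            if best is None or value > best[1]:
                best = (name, value, pos)
    return best


print(best_pick(BOARD))
print(best_pick(BOARD, filled={"SS"}, taken={"Ramirez,H."}))
```

With an empty roster the top choice is Ramirez at 104. Once he is drafted and shortstop is filled, Reyes is valued at his utility MPV of 91, which still outranks Pujols at 76 on the first-base list.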
There are two distinct decision-making processes regarding the draft. We have discussed the first, the preparation before the draft, which lends itself to a sabermetric-type analysis. The second is the adjustments you make as the draft unfolds. The drivers that affect your adjustments are more like a poker game, which is about reading your opponents, understanding what they are trying to accomplish, and making tweaks to your own strategy to compensate. If you’ve done good prep work, the number of tweaks you make will be minimal.
Next Time on Fantasy Baseball DIY
By looking at the counter in Microsoft Word, it seems that I’m out of words today. Please keep an eye out for the next installments of Fantasy Baseball DIY, where we will:
- Describe the MPV calculation for pitchers
- Adjust the MPV calculation to produce maximum bid prices for auction leagues
- Incorporate team health reports and Beta values from the PECOTA projections in our MPV calculation so that we understand the risk in each of our scoring categories.
Good bye and hope to see you next time at Fantasy Baseball DIY.
Tim's piece is, to my mind, very good. He explains a *very* valuable fantasy tool clearly and with ease. Especially for fantasy players who are relatively new to the game, this article has the potential to have a pretty dramatic effect on how they approach a season. It's hard to see how that doesn't meet the goal of the week's competition. While the more technically naive among us will have to do some work to make use of Tim's method, it's not too much work: given the tools available, calculating percentages (basically) is easy.
So, I don't get Will's problem: Tim compliments BP readers by assuming they're smart, but he doesn't go so far as to assume they're degree students in a statistics program. As I understand, that's been the tone BP has been trying to strike for as long as it has been around. What's the complaint?
Thumbs up.
But it's a bit discouraging (and puzzling) to see him so hell-bent on criticizing every article that is even the least bit statistically sophisticated. I'm all for analytical diversity at BP, but I actually think there is currently a shortage of top-shelf statistical analysis among the regular contributors to the website and this contest could be a great opportunity to hire someone who can make up for that deficiency.
For this piece: I don't like the tone at all but the analysis is great. When I considered entering this competition, I thought that I'd write a similar piece on how I use various spreadsheets for my fantasy drafts. Tim, you've vindicated my decision to not write it! Not only do you articulate things much better than I ever could, you provide new tools that I'll definitely try to use in the future.
Thumbs up.
For those who only draft people from their local team or their favorite players, then neither this kind of analysis nor method will work for them. But then again, the people who draft along those lines probably won't come to BP to read about better drafting techniques.
Of course, the real draft wizards will also bring a copy of Will's latest UTK and a list of those he says are TRIPping. :) The nice thing about that is all I have to look for is the number of days the player's theorized to be out and whether the injury issue will continue to affect the player's performance.
I absolutely hear what you are saying. I am not the most stats-driven person. I have to work my way through research articles. However, I think you have taken the non-stats analysis stance to a new extreme. You seem to have commented against any type of math so far. This seems odd to me on BP, just for the record.
That said, I liked the writing on both articles. I just think there's a 'sweet spot' like what Matt Swartz did this week. It's a VERY technical article and yet is readable for someone like me.
I'm probably more of a casual reader than the majority of the BP reader base but this article missed the mark with me since I'm looking for something that is more easily digestible and that I don't have to spend time working toward the conclusion away from the actual article.
This was a great article. I had never really considered putting together my own fantasy program until this article and it gave me a useful framework to use. I also probably wouldn't have used Excel since I typically use other statistical programs, but Tim told me the exact commands that I would use in Excel. I'm much more likely to use Excel now. Should he have detailed the VLOOKUP command? Definitely not. I'm happy to research that myself. There really wasn't anything complicated in the article so I don't understand the complaint.
(obscure grade school arithmetic reference)
However, you do realize that BP is kind of the "opera" of baseball analysis, right? I understand your point that articles need to be accessible, but I pay for a BP subscription instead of reading a free si.com/espn.com because I want something more. Heck, I love your injury column, but that information certainly isn't "pop," know what I mean?
Perhaps the Japanese and Spanish-speaking sabermetricians aren't reaching us because they aren't writing in English.
As far as the other leagues go, I would think people from other countries who follow sabermetrics would've picked up a bit of English. I also acknowledge that the math itself is pretty universal regardless of native language.
The Japanese are probably way ahead of us on computers, but look at baseball differently than we do – generally. To them, baseball is more of an audience participatory event. Although, they are enthusiastic strategists. Perhaps, Japanese just don't make time in their lives for such hobbies as studying baseball stats.
I don't know, I still think it is more of a language issue mixed with it being a cultural issue. People are people. Bill James started the Sabermetric explosion here in North America thanks to that SI article on him in 1982(?). I'm sure there are thousands of young fantasy players and Sabermetric baseball fans who never read him, but his popularity perfectly coincided with the exponential growth of Rotisserie leagues. That snowballed thanks to USA Today, then everyone's access to websites and the ease of website building. At least, that's my take on history. The Japanese are generally xenophobic. The Sabermetric snowball started in the U.S. from an American who writes in English. It may just not have rolled over the Japanese, yet.
Either that, or their work is in Japanese (or Spanish, Dutch, Korean, or Mandarin) and we haven't read it.
And I thought this article was brilliant. Easiest vote this week.
Given that for a fantasy draft we only need to evaluate under 2000 players, I'll take the quick mark-up of Excel. However, if we're doing some Retrosheet work, then Excel is going to be useless due to its size limitations, and I'll pop over to Access.
Thanks for the compliments!
Tim's contribution addresses the 2nd of these.
Many of us are always on the lookout for ways to place overall weighted values on players.
I've developed a number of Excel-based algorithms over the years to address issues just like the ones he raises.
Tim's approach seems quite useful to me and I give him the thumb!!
Side note: Pivot tables are WAY easier to create and use than most people realize. Find 15 minutes, a dataset you're interested in and someone to walk you through them. You'll be thankful you do.
A step-by-step article for people who haven't been to math grad school would be really helpful.
Since I haven't had that kind of training (though I'd like to), I agree with Will that Tim's article lost me at the VLOOKUP line. I read through to the end and appreciated Tim's insight. But telling readers that "you can do it" without telling them *how* to do it is, imho, a mistake.
Still, he did make the idea of MPV quite clear, and that's good.
Similar to "This Old House" or the dozens of home improvement shows, there are sometimes things they show me that I just don't have the proficiency to do given my current set of skills. However, it does give me ideas of what skills will be helpful for me to develop in the near future. My use of the terms Pivot Table and VLOOKUP (which would typically be around Lesson 9 in a 12-part Beginning How to Use Excel course) was not simply a throw-away for me to look smart or whatever, but a way for me to ground the article so readers could gauge whether what I was proposing was in their skill set or not.
An article (or a few articles) on Excel for the Fantasy Owner would be great, but for the purposes of this contest (in terms of a 1500 word-limit) I tried to shoot for where I thought the middle would be in terms of technical proficiency. I may have missed my mark.
With that said, with a few days of work, I could easily write a quick step-by-step tutorial of how to do what I did in this article. Maybe offline somehow (or through this comment section), people can put down an e-mail address and I'll put something together and send something off for people who are interested.
If you'd be willing to do a service to everyone in the community who could use the assist, just throw me your tutorial, and I'll run it in Unfiltered, where it'll be published in the free and clear for anyone to use, and where you'll be free to add on comments and respond to (perhaps inevitable) questions. Not that I can speak for the audience at large, but I would definitely appreciate it if you were game for something like this.
Christina
I'm completely game for writing a tutorial on this, plus attaching a Basic Excel spreadsheet with the Pivot Tables and VLOOKUPs etc that is at least set up initially.
Keep in mind that a standardized sheet might be hard because the input data can vary from one database to another... for example, you might set your database to vlookup to home runs from the PECOTA download, but a different database might have home runs in a different column.
Very minor quibbles:
Many people consider Pujols a top three or top pick in almost any format. I think you needed to elaborate on why you think he is not a top three pick or else you'll lose a lot of readers.
It might have helped the word count to just send the "Note to Baseball Prospectus: " to customer service. Either way, it's not a good idea to mention a word count as being a limiting factor.
Since some statistics are more scarce than others, I think you could've discussed the possibility of the reader to add a weight to MPV by adding in a multiplier to a category like stolen bases.
I know one of the finalists got some negative comments for using a title and subtitle to start up an article theme but I didn't mind it there, nor here.
A bigger quibble
I'll admit I'm a bit confused which "team" you are using to compare Branyan/Wright.... I assume it's the "Team Total 4552" line, but then I don't see how the difference between a team+wright OBP of .350 and a Team Total OBP of .341 comes out to a "14%" difference. Is that a .009/.350 or something?
Personally, I found the article easy to understand and think I would've understood the methodology behind MPV even if I didn't know Excel. I think the tone of the writing was strong, fun, and a pleasure to read. I like how you indicate that databases would be downloaded. I would even recommend that you could create a mock spreadsheet to attach to the article that people could download. That way, you spend more time discussing MPV without scaring people off with Excel. Even then, the concepts discussed were easy to understand and most of the examples (besides the math on the Wright/Branyan section) were well presented. I also like that you defined the kind of league you used based on style, roster size, and categories used.
Another great job, and another thumbs-up from me.
On the "word count", I'm not sure if you are referring to my closing in the article or to one of my comments. Obviously, in my closing, I was trying to mimic a TV show where they always (annoyingly) say "Well, by the clock on the wall, it seems we are out of time". As for my comment, I was simply referring to the fact that also including an Excel tutorial is going to be a 5000 word thing (also with lots of screen shots to help). I was just saying that it is "a completely different kettle of fish" from a 1500 word article.
Once again, thanks for your comments, and I can't wait to see your updated Idol Hit List on Tuesday...
The actual word count line I was referring to was "By looking at the counter in Microsoft Word, it seems that I'm out of words today." but I can see what you mean. I think you can get away with that a bit easier once the competition is over, but in the context of a competition, it almost comes across as an apology. As I said though, with your further clarification, I understand what you were driving at.
The Hit List thing was fun and I'll probably do it again :) I'm glad you liked it.
http://baseballprospectus.com/article.php?articleid=8942
The Team+Wright or Team+Branyan comparison is to the baseline team (the Team Total line). I really could have made that a bit clearer.
So the next question is how is .009/.350 = 14%? It's not, but as I said, I'm comparing to a "zero point" of .275 (another thing I could have made a little clearer) so it's really more like (.350 - .341) / (.341 - .275) = 13.6% ~ 14%.
Based on this comment and those of ryneestabrook, I definitely could see some benefit to maybe rezeroing the whole thing compared to a baseline of some type of replacement. I'm always a little reluctant to change as this has served me very well over the last few years.
As far as I can tell much of the comparison between Wright and Branyan was based on at bats which resulted in much of the difference as clearly in a daily league a replacement level player even in those spots for 200 ABs would have greatly boosted the rate stats.
Secondly I do have a problem with using this value as the only tool as first of all you do not necessarily maximise the value of your team by taking the best ranked player and it would be better to use this system to tier the players and look where the drop off from one player to another is which isn't necessarily the same as the one with the highest score.
Overall though I thought the stats were fine and followed through well but I did feel that perhaps more effort could have gone into what happens if you take say Josh Hamilton over Ian Kinsler say for the rest of the draft to give people a flavour of how it was used in practice.
So some criticism, but I enjoyed the article and voted it up.
But those 200 ABs are important. Even in my league where you have 7 bench players, I like to fill these up with extra starters so for a week where I'm slipping on wins or Ks I can rotate in a lot more starters.
If I've got two position players and one is going to get 200 less ABs which show a decrease in rates, that's possibly one more position player forced to be a 3B (or 1B) on the bench. I guess I'm someone who really likes his bench to load up on pitchers.
For the uninitiated, pivot tables allow a very quick summary of data based on fields you choose. VLOOKUPs are a way to link two datasets together when you have fields in one that you want to use with fields from another, or if you simply want to combine the data. Both are relatively simple to use and worthwhile to learn if you have any inclination to do statistical analysis - they will make the task much less daunting.
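For readers who think more easily in code than in spreadsheets, the two operations can be sketched in plain Python. This is only an analogy, not how Excel does it internally. The Branyan and Wright numbers come from the article, "Bench Guy" is a made-up row added to show the 200 PA filter, and the separate `positions` table and the function name are likewise invented for the example.

```python
from collections import defaultdict

# One dataset: projections keyed by player name.
projections = {
    "Branyan": {"PA": 463, "HR": 26},
    "Wright": {"PA": 694, "HR": 32},
    "Bench Guy": {"PA": 150, "HR": 3},  # hypothetical player, below the PA cutoff
}
# A second dataset: position eligibility keyed by the same names.
positions = {"Branyan": "3B", "Wright": "3B", "Bench Guy": "3B"}

# VLOOKUP analogue: pull each player's position from the second dataset
# into a combined row, matching on the shared key.
rows = [{"player": name, "pos": positions[name], **stats}
        for name, stats in projections.items()]


def average_by_position(rows, stat, min_pa=200):
    """Pivot-table analogue: average a stat per position for players over min_pa."""
    groups = defaultdict(list)
    for row in rows:
        if row["PA"] > min_pa:
            groups[row["pos"]].append(row[stat])
    return {pos: sum(vals) / len(vals) for pos, vals in groups.items()}


print(average_by_position(rows, "HR"))  # {'3B': 29.0}
```

The same grouping, run over a full projection file, would produce the kind of per-position baseline the article describes.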
I use pivot tables in particular frequently to help in my Diamond Mind league - help decisions around lineup construction (both who to start and in what batting order), what opposing hitters to pitch around, and pinch hitting and bullpen decisions. Extremely valuable tools.
As a quick example, consider the value of a full-time but poor-hitting catcher in either a one-catcher or a two-catcher 12-team league. In the one-catcher league, this catcher should have very little if any value, as the replacement-level catcher in this league is probably full-time (or close) and similarly useless without his chest protector. In a two-catcher league, this player is relatively valuable, as whomever you can find on the waiver wire is likely a part-time or reserve catcher.
This issue will also come to a head when there are differences in the variances of performance across positions, when one player is hands-down above the rest at his position, and a variety of other situations. The rest of the article was good, but if a fantasy player followed your advice in any league where the average (200 PA+) player was too far off from that league's replacement level, they'd just plain draft the wrong players.
#1) Once you draft a catcher, I suggest that you immediately switch the MPV you look at for the remaining catchers from the one compared to the average catcher to the one compared to the average player.
#2) In your specific example of catcher, one of the things that happens is that a lot of the players that come to the top are going to be your better catchers, so that you don't get into the position of having to choose the full-time, poor-hitting catcher. For example, in my 20-team league, I had the 7th and the resulting 34th pick. At the 34th pick, I noticed catchers were not being taken, so I was choosing McCann because his MPV was so high. Once I took McCann though the remaining catchers dropped significantly because I started using their MPV compared to the average utility player.
#3) So I guess, if you think the issue is the qualifying level versus replacement level, your recommended fix (which is pretty simple) is to change the qualifying PA so that the number of players who make up the qualifying list is roughly equal to the number of players who would be drafted? Am I correct in thinking that?
This is of importance when comparing players across positions (within any position, there is *relatively* little impact). Pretend it's late in a draft, and I only have two spots to fill: 2B and catcher. Whichever position I fill last should be almost exactly replacement level, so assume that player/position has no value.
Whatever valuation system I use should tell me who to draft next; the best available 2B or the best available catcher. Regardless of the number of catchers each team must roster, the 2B ratings won't change. If I'm in a one catcher league (and thus have not drafted a catcher yet), the choices at catcher should have much higher MPVs than the choices in a two catcher league (where I've already drafted one of my two catchers). The problem occurs that MPV doesn't move with the number of players on rosters in each league, and thus doesn't provide a true zero point to compare players across positions.
Regarding your third point, I think the use of the mean is part of the problem. In a 10 (or 12) team league, replacement level for any given position should be the 10th (or 12th) best player at that position, with caveats made for utility and other flexible position spots. I'd either move to a non-parametric approach (i.e., using ranks), or at least adjust the MPVs for each position such that a player with no value at each position (10th best player at a position in a 10 team league, assuming no players at that position get used in a UT spot) has a value of zero.
My personal experience (maybe yours is different) has been that the mid rounds are the key, with an occasional issue in the top few rounds (when someone chases numbers and not MPV). In the first 3-4 rounds, you're almost always drafting solid players, so it's not a big issue. The only bad decisions I've seen in the first 3-4 rounds are when someone panics because the top SPs are off the board and reaches for a mid-level SP that isn't worth it. However, the mid rounds are HUGE because that's when some good ranking system tied to your league's specific scoring is worth its weight in gold. This is when the lesser owners draft names or numbers, but aren't thinking as much about position.
The one issue I have with replacement in fantasy league which is a little different than for VORP is a matter of sequence.
There is a little bit of a chicken and egg here, NO? How do I know who is the 10th best catcher in the league necessarily, until I do SOME type of valuation.
Also let's take an example of a 2B to determine the replacement level, but which is the replacement level, the poor hitter with 18 SB but has a .310 / .375 OBP SLG (a la Willie Bloomquist) or the one who has 2 SB but is more like .325 /.400 or a little pop. I could see how you designate "replacement" level having just as many problems.
I guess one could fix it with taking the 10th best R of 2B, the 10th best RBI, the 10th best SB, etc.
There is a chicken and egg issue, because we have to deal with multiple categories for each player. We have to project arrays (vectors) of data for each player into a single (scalar) value, which includes at least two components: how the different statistical categories relate to winning, and what the baseline or replacement level of each category is for each position. As you suggested, you can't really figure one out without the other.
Your Bloomquist example is a good one. I see two options to get around this, though there are certainly more. Both start with a preliminary valuation. The first is to use the MPV of the replacement level player at each position. Then adjust your MPVs such that freely available players have a value of zero, tweak to your heart's content, and go!
Another is to use the preliminary valuation simply to find the replacement level players, and come up with a new valuation based on who those players are. I typically get around the Bloomquist problem by using a kernel smoother; instead of taking the 10th best player, I'll use a weighted average of the 7th-13th best, for example. If you properly value each category (i.e., put them on a scale such that moving 1 unit in runs or SB equals x wins or x points), it will end up not mattering; Bloomquist and the slow but better hitter will have exactly the same value, so either can be used.
I've liked your responses, and your original entry. I made the point originally because a reader might have assumed this method adjusted for league context because you mentioned league context is important. Good luck in the contest.
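As a rough illustration of the smoothing idea, the Python sketch below averages the values ranked 7th through 13th with weights that peak at the 10th. The triangular weights (1, 2, 3, 4, 3, 2, 1) are an assumption chosen for simplicity, as are the function name and its parameters; any kernel that favors the middle rank would serve the same purpose.

```python
def smoothed_replacement(values, rank=10, half_width=3):
    """Weighted average of the values ranked around `rank` (1 = best).

    Uses triangular weights: the player at `rank` counts most, and the
    weight falls off linearly out to `half_width` ranks on either side.
    """
    ordered = sorted(values, reverse=True)
    total = 0.0
    weight_sum = 0.0
    for r in range(rank - half_width, rank + half_width + 1):
        if 1 <= r <= len(ordered):  # skip ranks that fall outside the list
            weight = half_width + 1 - abs(r - rank)
            total += weight * ordered[r - 1]
            weight_sum += weight
    if weight_sum == 0:
        raise ValueError("not enough players to estimate a replacement level")
    return total / weight_sum


# Toy preliminary valuations for 20 players at one position: 20, 19, ..., 1.
prelim = list(range(1, 21))
replacement = smoothed_replacement(prelim)
# Re-zero so that a freely available player is worth roughly nothing.
adjusted = [value - replacement for value in sorted(prelim, reverse=True)]
print(replacement, adjusted[0])
```

On the evenly spaced toy list the smoothed level equals the 10th best value exactly. On real projections it would drift toward whichever neighbors are closest in value, which is what dampens the effect of one odd player sitting at the cutoff.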
If you went through an entire draft and did not get a 2B or catcher, then the general MPV does not matter as much. What matters more is the highest MPV at each position that you have a need in.
And yes, the baseline might be a bit low if all players with at least 200 PA are counted, but that also depends on how deep your league is. If it's a shallow 5x5 casual league, the 200 PA types wouldn't have enough MPV to outrank a full-time player because the full-time player will generate more counting stats and thus, more MPV.
As an addendum, I might have to patent the usage of "quibble" :)
For a lot of us, BP was an introduction to sabermetrics, and taught a lot of us to think differently, expand our thought processes, realize that there were more than a few ways to do and/or analyze something, etc. We didn't have Bill James, because James wasn't writing as much anymore, but we did have BP, Neyer, Shandler, etc.
I didn't get everything on the technical side. But I did realize I'd be able to learn, and it made it seem not so scary. That gave it a thumbs up on its own.
It requires a fair amount of work to stay on top of your necessary updates. I’m not sure how practical that is, but if your Pivot Tables make it a cinch, then I guess I should sign up for those lessons.
The part I didn’t understand is where you get your percent improvement over the baseline. I didn’t come up with the same percentages as you. For example, calculating Team-Wright’s improvement in OBP: T-W’s .350 – baseline .341 is .009. .009 / .341 is .026 or a rounded 3%. You say 14%.
You also say, “Note that the percent improvement calculation for OBP is compared to increasing the OBP over a reasonable low-level of .275.” No, I do not see that either. .350 - .275 = .075 which is 28% of .275. That’s not 14% either. How about .075/.341? No, that’s 22%.
The other problem I have with this essay is that your baseline is a composite of average starters for your league rather than the bottom of the possibilities. That may well be more suitable, but you do not explain why. Most of us are used to looking at Value Over Replacement, so to do otherwise requires an explanation.
"Note that the percent improvement calculation for OBP is compared to increasing the OBP over a reasonable low-level of .275. We do the same for SLG percent as a percent increase over .350. I find that these help put these rate categories on a similar scale to the counting categories."
(and it is explained again in the comments above too)
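One reading of the quoted note does reproduce the 14% figure: measure both the team's OBP and the baseline OBP as amounts above the .275 floor, then take the percent change between those two amounts. This is my assumption about what the author did, not something confirmed in this thread, and the function name is hypothetical.

```python
def pct_improvement_over_floor(team_rate, baseline_rate, floor):
    """Percent improvement of the above-floor amount (assumed reading)."""
    return (team_rate - floor) / (baseline_rate - floor) - 1.0


# Figures quoted in the comments above: Team-Wright .350, baseline .341, floor .275.
obp_gain = pct_improvement_over_floor(0.350, 0.341, 0.275)
print(f"{obp_gain:.1%}")  # about 13.6%, which rounds to 14%

# The plain percent change, for comparison, gives the 3% the commenter found.
naive_gain = (0.350 - 0.341) / 0.341
print(f"{naive_gain:.1%}")
```

Under this reading, .075 / .066 is about 1.136, so the floor stretches a small OBP edge into something closer in scale to the counting categories.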
Clearly the rate categories need to be treated differently than the counting categories, since a guy who isn't playing doesn't hurt your rate stats. This is one way to address that - maybe not particularly elegant, but certainly some thought was given to the issue. However, there is a LOT of space in the article given to cutesy stuff that could have been used for explanations instead, and this is one point that needed an example such as that given by the author in the comments.
The author took a big risk I thought in terms of the scope of the article - "player evaluation" is a lot to take on in 1500 words, compared to say "here's some guys who might be undervalued in your league, or not". But ultimately it worked.
The proper way to do this would be, for example, to take a random number generator between 1 and 300,000 (or better, the number of Yahoo leagues in existence) and extract a sample of a few hundred leagues. Then, observe the stats each team has in each category and use those to estimate a distribution for a given year (I'd assume they're all distributed ~ normal with different parameters). Average those parameter values across a few years to get a reasonably more robust picture (although you may decide to keep it only last year, for example, to account for various changes in league context) and estimate an ordered probit regression of points in each category on the raw statistics from that category. A statistics package such as STATA would then allow you to compute partial effects on scoring for a representative team.
There are a number of problems with this approach, though, and I don't know if there's a good way to deal with them. First, converting stats into rotisserie points is a non-linear function (going from 0 SB to 1 doesn't help you at all, but going from 70-71 may have a relatively larger impact on the probability of gaining a point) and in order to understand the partial effects of a given player, you have to make assumptions about what the rest of your team will look like, which is, of course, what the valuation formula is supposed to tell you in the first place. Furthermore, this sort of exercise is completely invalid if you're constructing the estimated distribution from leagues different than the one you play in, for fairly obvious reasons. Finally, the marginal effects can be inaccurate if people in your league play unconventional strategies (such as punting categories) since the distributions you estimated in the first step wouldn't take those into account.
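To make the non-linearity concrete, here is a small sketch of how a category total turns into rotisserie points. The scoring rule (1 point, plus 1 per rival beaten and half a point per tie) is the usual roto convention, but the rival stolen-base totals are invented purely for illustration.

```python
def roto_points(my_total, rival_totals):
    """Category points: 1 for showing up, +1 per rival beaten, +0.5 per tie."""
    points = 1.0
    for rival in rival_totals:
        if my_total > rival:
            points += 1.0
        elif my_total == rival:
            points += 0.5
    return points


# Hypothetical SB totals for the other nine teams in a 10-team league.
rival_sb = [55, 70, 82, 90, 101, 110, 118, 125, 140]

for sb in (0, 1, 70, 71):
    print(sb, roto_points(sb, rival_sb))
```

With these made-up rivals, going from 0 to 1 SB leaves you at 1.0 point, while going from 70 to 71 breaks a tie and moves you from 2.5 to 3.0. The value of one extra steal depends entirely on where the rest of your roster already puts you.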
Excellent job, and I look forward to your future pieces.
That being said, I'm going to offer some advice: You have to have a better opening. The opening of this piece reminds me of the first few lines of a pamphlet that would get handed to me at a work conference. Don't say things like "Welcome", and "Hello New Readers". It just doesn't work. Granted a lot of BP readers will look past a weak opening and judge you on overall content, but not all of them. Just ask Byron.
But I do have one question. In calculating the position list you put Branyan at third but also put him at first because he was slotted there by the Mariners for 2009. Branyan isn't going to be drafted at both positions--you can only draft him once. And since he's likely going to be drafted as a third baseman, why should he be included in the 1B list at all? Including him in both spots treats him like two separate players for the analysis.
Branyan may be a bad example, since the position difference between first and third is close enough that he could be moved to first depending on how the draft proceeded. But what about guys like Russell Martin or Pablo Sandoval? Do they get included at third base even if they're never going to play there on anyone's team?
I would be curious about your take on value in keeper leagues, based on growth, attrition, and decline rate. My two major leagues are long-term keeper with prospect drafting, and I have had a heck of a time assigning value to this sort of thing.
An analogy I can come up with is Golf. Tons of people play golf. There are those who are perfectly content (and I've been one of them for many years) to just head out to the golf course and swing the wrenches. Then there is the next level of player who likely subscribes to a golf magazine or two, and goes out to the driving range a few times a year. Beyond that, there are those who actively take lessons at the course or at a place like GolfTec (this is the year I decided to be a bit more into Golf and went down this step).
Definitely, my goal with this article was to try to appeal to what I think may be a significant (though admittedly, not all-inclusive) group of fantasy leaguers who are in the middle ground of wanting something a little more than the hundreds of magazines and standard websites offer. Heck, the #1 (maybe even sole) reason that I started my BP subscription was for the "one-stop shopping" aspect of collecting some usable data to build the better mousetrap of customized fantasy analysis, because I truly was annoyed with most of what was out there.
As opposed to looking at the negative of "this may alienate some readers who don't have the skill set," I think articles like this could be a positive gateway of saying "For those who want better analysis tools for your fantasy teams -- We have the data you need, we have the instruction manual you need [hence an article series like Fantasy DIY], and we are helping you make what we provide even better (Matt Swartz's article)"
Just my two cents
In a similar analogy to golf, people do progress, and some don't like to just stay at the same level. To have something like this article and ONLY at this level wouldn't necessarily be a smart strategy for BP, but if they had content at multiple levels of fantasy owner sophistication (some that do appeal to the general masses, but others that appeal to more skilled owners), it seems like this would make it more likely to keep people on the subscription rolls as their skill level and sophistication progresses.
I think that the base content itself is pretty good for readers who are new to valuation, but as a stand-alone piece of work, it is lacking a smoothly-flowing intro > body > conclusion structure that properly sets the reader's expectations, delivers against them, and leaves them with something tangible to “do on their own” (as promised).
All that said, Tim showed enough potential with this that he earned my third of three thumbs up votes for this week. Even though he explained only a small fraction of what a person REALLY needs to know about the DIY process, he at least convinced me that he understands the DIY process well enough to potentially serve as an instructor on the topic. As someone who has played this role (informally) elsewhere, I know that mastering the process yourself is only part of the job – being able to fully break down the custom valuation components and clearly communicate the technical how-to is the only way to wean a reader off of pre-published or systematically-generated dollar values. I would be curious to read a more in-depth “training” from Tim on this.
First, as another reader pointed out, it fails to establish a proper replacement level player. It simply compares players to those at the same position who are projected to have 200 ABs. Where does this number come from? Is it right? Answer: no. To see why it's wrong, let's use a simple example of a league that only counts HRs. Using Branyan and Wright as above, Wright is projected to have 32 HRs to Branyan's 26. If we used 200 ABs as our baseline and 30 hitters qualified, let's say the average 3B would get 15 HRs. This would make Wright 113% better than the average 3B, and Branyan isn't too far behind at 73%. In relative terms, Wright is about 50% better than Branyan. But this is a 20 team league, so there should only be 20 3Bs (ignoring the UT/CI spot). So if we now only use the top 20 3Bs, we should probably expect an average of 20 HRs. Now Wright is still 60% better than average, but Branyan is a much smaller 30% better than average. And in relative terms Wright is now two times better than Branyan. So as you can see, what you set the replacement value at has a great effect on the relative value of players.
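The arithmetic in this HR-only example is easy to check. The sketch below uses only the numbers given in the comment (32 and 26 projected HRs, baselines of 15 and 20); the function name is just for illustration.

```python
def pct_above_baseline(stat, baseline):
    """Fraction by which a player's stat exceeds the positional baseline."""
    return (stat - baseline) / baseline


wright_hr, branyan_hr = 32, 26

for baseline_hr in (15, 20):
    wright = pct_above_baseline(wright_hr, baseline_hr)
    branyan = pct_above_baseline(branyan_hr, baseline_hr)
    print(
        f"baseline {baseline_hr} HR: Wright {wright:.0%}, "
        f"Branyan {branyan:.0%}, ratio {wright / branyan:.2f}"
    )
```

The ratio of the two players' margins grows from roughly 1.5 to 2.0 as the baseline rises from 15 to 20 HRs, even though neither projection changed.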
The second complaint is that it appears to ignore the effect of plate appearances on OBP/SLG. A player who puts up an OPS of .900 over 600 PA can actually improve a team's OPS more than a player with an OPS of 1.000 over 200 PA, depending on the team's OPS and PAs. Also, the fact that you arbitrarily choose .275 and .350 as baselines for OBP and SLG seems to suggest that there is very little rational statistical basis for this method.
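A rough sketch of the playing-time point follows. It treats OPS as a simple PA-weighted average, which is a simplification (OBP and SLG have different denominators), and the team line of .750 over 5,000 PA is a made-up starting point.

```python
def combined_rate(team_rate, team_pa, player_rate, player_pa):
    """PA-weighted rate after adding a player's line to the team's line."""
    total = team_rate * team_pa + player_rate * player_pa
    return total / (team_pa + player_pa)


# Hypothetical team line before adding either player.
team_ops, team_pa = 0.750, 5000

with_full_timer = combined_rate(team_ops, team_pa, 0.900, 600)
with_part_timer = combined_rate(team_ops, team_pa, 1.000, 200)

print(f"{with_full_timer:.3f} vs {with_part_timer:.3f}")
```

Under these assumptions the .900 hitter over 600 PA lifts the team rate more than the 1.000 hitter over 200 PA, because the bigger share of plate appearances outweighs the better rate.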
As far as 200 AB, you'd have to be in a very deep fantasy league (or maybe, dealing with rookies) to find someone worthy of a fantasy roster spot that wasn't projected to get 200 AB. Even a generic pinch runner will end up with about 200 AB. Go any lower than 200 AB, and I imagine the overall MPV difference would be minimal or zero with respect to a "replacement level" fantasy player that'd be on a waiver wire.
That's my point, using his method he is comparing Branyan and Wright to a bunch of players that won't even be rostered on anyone's team. Compared to a 200AB player Branyan is almost as much a stud as Wright. However as you reduce the player pool to the proper size for the league the difference between Branyan and Wright becomes more pronounced; Wright is still a stud but Branyan would simply be average.
The .275 and .350 are somewhat arbitrary, but to find the right levels I believe is also a complicated enough process that for someone who is just beginning to create their own analysis system, this is something that would be addressed later on.
I disagree with your other comment because when you insert a player into your roster and remove the "average player" it will take into account the fact that .900 OPS over 600 PA will have the greater impact than 1.000 OPS over 200 PAs as one does calculate what the new OBP and SLG will be.
Oh and in reference to this, "Personally, I like combining two BP data sets, the downloadable PECOTA projections and the depth charts, which refine the expected playing time.
Note to Baseball Prospectus: a great addition for next year would be a downloadable file of the plate appearances and innings pitched used in the depth charts, preferably with the HoweID key for each player for easy joining with PECOTA projections." I can't believe no one pointed it out above, I meant to yesterday, BP already does this for us.
Here you go: http://www.baseballprospectus.com/fantasy/pfm/index.php?raw
Since the Depth Charts were updated on 06/01 these projections are from then to the rest of the season. It's too bad the raw PECOTA projections themselves couldn't be updated. Does BP still really think C. Young is going to hit 62R 17 HR 59 RBI 13 SB .269 from here on out?
My point is that BP has updated their depth charts and playing time projections for the season but not their actual projections for playing ability. I would say that C. Young has solidly established a true talent level significantly below PECOTA's projections.
I'm finding it surprising that BP is unable to update their projections in-season. For $20 a year I expect more features than I can get for free at a site like FanGraphs (which DOES offer in-season updated projections based on ZIPS). I'm just really not sure if I'm going to renew my subscription next year if they're unable to keep up with features that are free and just as good, if not better, at sites like FanGraphs, Hardball Times and LastPlayerPicked.
That's debatable.
"they have no team tracker"
Not true. Theirs even includes multiple projection systems, including inseason projections. http://www.fangraphs.com/blogs/index.php/testing-some-stuff http://www.fangraphs.com/blogs/index.php/my-team-now-with-projections
"probably none of those projections are as reliable as PECOTA during the off-season"
PECOTA maybe used to be the best but the other systems are becoming just as good if not better. http://www.hardballtimes.com/main/article/so-how-did-tht-projections-do/
"but is FanGraphs making any money?"
Is that relevant? Until a few months ago they didn't even bother to have ads. It seems to be more of a labor of love than a source of income and they are doing quite an excellent job. I haven't seen any indication that they are going to fold shop.
Ultimately I've enjoyed BP, mostly just for their PECOTA projections and their PFM but if they aren't able to step it up this year I'm gone and I imagine a bunch of others will be as well.