Bio: I’ve been a stats geek since before Bill James started self-publishing his Abstracts, and have been a Prospectus subscriber for about ten years now. Recently, I’ve written for Seamheads, StatSpeak and FanGraphs, and am waiting for the call-up to the big leagues. My personal interests and writing have focused on statistical analysis, and I must admit I’ve been somewhat disappointed in the number of research articles published at BP since Dan Fox left last year. Eric has been a welcome addition, and I hope that I can also contribute to BP’s publishing of analysis.
Entry: Major League Equivalencies
Major League Equivalencies (MLEs) are a set of formulas that translate a player’s minor league statistics into those he would be expected to produce if he were in the major leagues. They form a part of most projection systems, including BP’s PECOTA and Davenport Translations. Modeling the level of competition at each stop in the minors can prove much more daunting than dealing strictly with major league data, as there are several potential selection biases that can markedly affect the accuracy of the projections.
- How much elapsed time should be allowed between samples?
- Does including bench players, who may suffer a pinch hit penalty, bias the factors?
- Should all players be sampled, or only those who advance all the way to MLB?
- Should the lower minors be compared directly to MLB, or to the next highest level?
The results show a wide variance in HR and SO rates, and increasingly large overall discrepancies in projections from Double-A and High-A. The best approach might not be the one you would expect.
To test these scenarios, I created matched pairs of batting data under different selection criteria and calculated the ratio between major and minor league performance for each method. In the following tables, each factor is the expected ratio of a player’s major league rate to his minor league rate. For example, if a player has a BB% of .100 in High-A and the factor is 0.71, he would be expected to have a BB% of (.100 x .71) = .071 in the majors.
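As a minimal sketch, the translation step amounts to multiplying each minor league rate by its level factor. The function name `translate_rates` and the dictionary layout are hypothetical, chosen only for illustration:

```python
def translate_rates(minor_rates: dict, factors: dict) -> dict:
    """Return MLB-equivalent rates: each minor league rate times its level factor.

    Hypothetical helper for illustration; keys are stat abbreviations such as "BB".
    """
    return {stat: rate * factors[stat] for stat, rate in minor_rates.items()}


if __name__ == "__main__":
    # The High-A walk-rate example from the text: .100 x 0.71 = .071
    mle = translate_rates({"BB": 0.100}, {"BB": 0.71})
    print(f"{mle['BB']:.3f}")  # prints 0.071
```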
By recording statistics for each player in each season, we are taking a sample, over a given period of time, which estimates the player’s “true talent” in each of the various categories. As a player ages through his twenties, he will on average lose speed, but gain power, strike zone judgement and contact skills. If it takes two years of statistics to get a good measure of a player, at the end of the two years he is likely not exactly the same player he was before. It would then make sense not to let too much time elapse between the two sets of stats being compared.
With such a time constraint, there are not enough players at Double-A or lower who can be compared to their Major League stats. In my first test, I set a time restriction of one year and collected all batting stats for Class A Advanced (A+), comparing them to what the same players did in Double-A (AA) in the year before, the same year, or the year after. Double-A was compared to Triple-A, and Triple-A to MLB. To calculate the factors for the lower minors, the results must be “chained”: to get the factor from High-A to MLB, take (A+ to AA) times (AA to AAA) times (AAA to MLB). This first test, with all players and using chaining, is labeled “All Chained”.
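Chaining is just a running product of the level-to-level factors for one stat. The step values below are hypothetical, picked only to show the arithmetic, and `chain_factor` is an illustrative name:

```python
from math import prod


def chain_factor(step_factors):
    """Chain level-to-level factors by multiplication, e.g.
    (A+ to AA) x (AA to AAA) x (AAA to MLB) gives A+ to MLB."""
    return prod(step_factors)


if __name__ == "__main__":
    # Hypothetical step factors for a single stat; illustrative values only.
    a_plus_to_aa, aa_to_aaa, aaa_to_mlb = 0.95, 0.97, 0.90
    print(round(chain_factor([a_plus_to_aa, aa_to_aaa, aaa_to_mlb]), 3))  # prints 0.829
```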
SDT = (H–HR)/(AB-SO-HR) {Singles, doubles, triples}
DO = DO/(AB-SO-HR) {Doubles}
TR = TR/(AB-SO-HR) {Triples}
HR = HR/(AB-SO) {Home runs}
HP = HP/(AB+HP+BB) {Hit by Pitch}
BB = BB/(AB+HP+BB) {Walks}
SO = SO/(AB+HP+BB) {Strikeouts}
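The seven rate definitions above can be computed from a season line roughly as follows. The `BattingLine` class and the sample numbers are made up for illustration; the field names simply reuse the abbreviations in the formulas:

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Season counting stats, using the abbreviations from the formulas above."""
    AB: int
    H: int
    DO: int  # doubles
    TR: int  # triples
    HR: int  # home runs
    BB: int  # walks
    HP: int  # hit by pitch
    SO: int  # strikeouts


def component_rates(b: BattingLine) -> dict:
    """Compute the seven rates defined above for one batting line."""
    in_play = b.AB - b.SO - b.HR  # at-bats not ending in a strikeout or home run
    trips = b.AB + b.HP + b.BB    # PA-style denominator; sacrifices are not included
    return {
        "SDT": (b.H - b.HR) / in_play,
        "DO": b.DO / in_play,
        "TR": b.TR / in_play,
        "HR": b.HR / (b.AB - b.SO),
        "HP": b.HP / trips,
        "BB": b.BB / trips,
        "SO": b.SO / trips,
    }


if __name__ == "__main__":
    # Made-up season line for illustration only
    line = BattingLine(AB=500, H=140, DO=30, TR=3, HR=20, BB=50, HP=5, SO=100)
    for stat, rate in component_rates(line).items():
        print(f"{stat}: {rate:.3f}")
```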
All Chained
Level  SDT   DO    TR    HR    HP    BB    SO
AAA    0.90  0.94  1.02  0.78  0.85  0.82  1.20
AA     0.88  0.91  1.03  0.75  0.74  0.73  1.26
A+     0.84  0.92  1.03  0.73  0.66  0.71  1.33
A bias that exists when using all players is the “pinch hit penalty”. It has been shown that most players do not hit as well coming off the bench as they do when starting and playing regularly. The factors will be depressed by a disproportionate number of players at the higher level (particularly in the majors) playing sparingly. To account for this, I used only players who averaged more than 2.5 plate appearances per game at each level. This test is labeled “Min PA Chained.” Restricting playing time this way makes the HR and SO factors, and to a lesser extent SDT, more favorable to the batter.
Min PA Chained
Level  SDT   DO    TR    HR    HP    BB    SO
AAA    0.91  0.94  1.02  0.80  0.84  0.82  1.18
AA     0.89  0.91  1.03  0.76  0.73  0.73  1.23
A+     0.85  0.92  1.02  0.74  0.66  0.71  1.30
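The playing-time screen can be sketched as a check on both halves of a matched pair. The dictionary keys `PA` and `G` are assumptions. The comparison is strict because the paragraph above says “more than 2.5”; the later description says “2.5 or more”, so `>=` may be what was actually used:

```python
def pa_per_game(season: dict) -> float:
    """Plate appearances per game; 0.0 if no games were played."""
    return season["PA"] / season["G"] if season["G"] else 0.0


def keep_pair(minor: dict, major: dict, threshold: float = 2.5) -> bool:
    """Keep a matched pair only if the player averaged more than `threshold`
    PA per game at BOTH levels, screening out part-time and bench roles."""
    return pa_per_game(minor) > threshold and pa_per_game(major) > threshold


if __name__ == "__main__":
    regular = {"PA": 520, "G": 130}  # 4.0 PA/G
    bench = {"PA": 120, "G": 80}     # 1.5 PA/G
    print(keep_pair(regular, regular), keep_pair(regular, bench))  # prints True False
```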
The first two tests included many players who never advanced through every level and failed to reach the majors. If the MLEs are being used to judge how well a player will perform if and when he makes the majors, is it correct to base the factors partly on the records of players who failed to advance? In the third test, I produced a list of 368 MLB “rookies” from 2003 to 2008. I define a rookie season as one in which a player entered with 150 or fewer career major league plate appearances and collected more than 150 during that season. Factors were calculated using only the records of these 368 players, in seasons where they had 2.5 or more PA per game at each level. These results are labeled “MLB Chained,” and show virtually the same factors as “Min PA Chained” at Triple-A (MLB Chained is limited to players who achieved more than 150 PA in their rookie season, while Min PA only requires 2.5 PA per game). It is in the lower minors that larger factors favoring the batter appear across the board.
MLB Chained
Level  SDT   DO    TR    HR    HP    BB    SO
AAA    0.91  0.97  1.01  0.82  0.82  0.82  1.18
AA     0.90  0.94  1.02  0.81  0.71  0.76  1.22
A+     0.87  0.99  1.00  0.88  0.66  0.76  1.24
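The rookie definition above translates into a short scan over a player’s MLB plate appearances by year. The function name and the year-to-PA dictionary are hypothetical; restricting to rookies from 2003 to 2008 would be a separate filter on the returned year:

```python
from typing import Dict, Optional


def rookie_season(mlb_pa_by_year: Dict[int, int], limit: int = 150) -> Optional[int]:
    """Return the first year in which the player entered with `limit` or fewer
    career MLB PA and then collected more than `limit` PA in that year."""
    career_pa = 0
    for year in sorted(mlb_pa_by_year):
        season_pa = mlb_pa_by_year[year]
        if career_pa <= limit and season_pa > limit:
            return year
        career_pa += season_pa
    return None  # never had a qualifying rookie season


if __name__ == "__main__":
    print(rookie_season({2002: 40, 2003: 90, 2004: 400}))   # prints 2004
    print(rookie_season({2002: 120, 2003: 100, 2004: 500}))  # prints None
```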
In case any bias or distortion remained from chaining the factors through multiple levels, the fourth and final test compared the minor league records at each level directly to the MLB records compiled no later than one year after the player’s rookie season, with no other limit on elapsed time. This test is labeled “MLB Direct,” and uses the same list of players and the same playing time criteria as “MLB Chained.” The differences are direct comparison instead of chaining and, for the lower minors, a longer elapsed time between the records being compared, which builds more aging effects into the level factors. The Triple-A factors differ from “MLB Chained” because the samples do not need to be within a year of each other. Again, the HR and SO factors at Triple-A improve slightly for the batter, with larger gains in every category except triples in the lower minors.
MLB Direct
Level  SDT   DO    TR    HR    HP    BB    SO
AAA    0.92  0.98  1.01  0.85  0.85  0.82  1.15
AA     0.94  0.98  1.01  0.94  0.75  0.83  1.16
A+     0.93  1.03  0.99  1.08  0.72  0.83  1.14
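The article does not say exactly how matched pairs are aggregated into a factor. One plausible approach, offered only as an assumption, is to pool events and opportunities on each side and divide the pooled MLB rate by the pooled minor league rate. Under “MLB Direct” the minor league side of each pair is the player’s record at the level being translated and the MLB side is his major league record, with no chaining. The function `pooled_factor` and the tuple layout are hypothetical:

```python
def pooled_factor(pairs):
    """Estimate a level factor for one stat from matched pairs.

    Each pair is ((minor_events, minor_opps), (mlb_events, mlb_opps)).
    Pooling weights each player by his playing time on each side.
    """
    minor_events = sum(minor[0] for minor, _ in pairs)
    minor_opps = sum(minor[1] for minor, _ in pairs)
    mlb_events = sum(mlb[0] for _, mlb in pairs)
    mlb_opps = sum(mlb[1] for _, mlb in pairs)
    return (mlb_events / mlb_opps) / (minor_events / minor_opps)


if __name__ == "__main__":
    # Hypothetical walk counts (BB, AB+HP+BB) for two players: minor level vs. MLB
    pairs = [((40, 400), (21, 300)), ((55, 550), (28, 400))]
    print(round(pooled_factor(pairs), 2))  # prints 0.7
```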
At all levels, “All Chained” has the least favorable factors for batters, while “MLB Direct” is the most favorable. Using only players who reached MLB is always more favorable than using all players. There is little difference between methods for Triple-A, except in HR and SO. Going into the lower minors, the differences between chaining and direct comparison become more pronounced, as each level requires another multiplication to generate the final factors, which also then multiplies any biases that exist between each level.
AAA
Method          SDT   DO    TR    HR    HP    BB    SO
All Chained     0.90  0.94  1.02  0.78  0.85  0.82  1.20
Min PA Chained  0.91  0.94  1.02  0.80  0.84  0.82  1.18
MLB Chained     0.91  0.97  1.01  0.82  0.82  0.82  1.18
MLB Direct      0.92  0.98  1.01  0.85  0.85  0.82  1.15

AA
Method          SDT   DO    TR    HR    HP    BB    SO
All Chained     0.88  0.91  1.03  0.75  0.74  0.73  1.26
Min PA Chained  0.89  0.91  1.03  0.76  0.73  0.73  1.23
MLB Chained     0.90  0.94  1.02  0.81  0.71  0.76  1.22
MLB Direct      0.94  0.98  1.01  0.94  0.75  0.83  1.16

A+
Method          SDT   DO    TR    HR    HP    BB    SO
All Chained     0.84  0.92  1.03  0.73  0.66  0.71  1.33
Min PA Chained  0.85  0.92  1.02  0.74  0.66  0.71  1.30
MLB Chained     0.87  0.99  1.00  0.88  0.66  0.76  1.24
MLB Direct      0.93  1.03  0.99  1.08  0.72  0.83  1.14
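The compounding effect described above can be illustrated with a hypothetical bias: if every level-to-level factor understated performance by the same 3%, the chained error would grow with each step. The 3% figure and the function name are assumptions for illustration, not measured values:

```python
def compounded_bias(step_bias: float, steps: int) -> float:
    """Relative error of a chained factor when every level-to-level factor
    carries the same relative bias: (1 + bias) ** steps - 1."""
    return (1 + step_bias) ** steps - 1


if __name__ == "__main__":
    # Hypothetical 3% understatement per step; AAA needs one step, A+ three.
    for level, steps in [("AAA", 1), ("AA", 2), ("A+", 3)]:
        print(level, f"{compounded_bias(-0.03, steps):+.1%}")
    # prints AAA -3.0%, AA -5.9%, A+ -8.7% (one per line)
```

A direct comparison, by contrast, uses one factor per level, so a per-step bias of this kind is not multiplied.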
Now that we see how the factors compare to one another, how can we judge their relative accuracy? The purpose of MLEs is to show how well a player in the minors will perform if and when he reaches the majors. I took the list of 368 rookies from 2003-2008 to see how well each method translated their statistics at each level, compared to each player’s MLB record.
Tom Tango’s Marcel system was used to generate the baseline MLB records. Marcel uses three years of data, weighted 5/4/3. I generated the Marcels one year after each player’s rookie season, giving more time for the player to collect a sufficient sample size, while not going too far into the future, when the player’s skills might be somewhat different from when he entered the majors.
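The 5/4/3 weighting can be sketched as below. This covers only the weighting mentioned here: Tango’s full Marcel also regresses each player toward the league average and applies an age adjustment, both omitted. Applying the weights to both events and opportunities, and the name `weighted_rate`, are choices made for this illustration:

```python
def weighted_rate(seasons, weights=(5, 4, 3)):
    """Weighted rate over up to three seasons, most recent first.

    seasons: list of (events, opportunities) tuples. The weights are applied
    to both numerator and denominator, so seasons with more playing time
    count for more.
    """
    num = sum(w * events for w, (events, _) in zip(weights, seasons))
    den = sum(w * opps for w, (_, opps) in zip(weights, seasons))
    return num / den


if __name__ == "__main__":
    # Made-up HR totals with AB-SO denominators, most recent season first
    print(round(weighted_rate([(20, 500), (15, 400), (10, 300)]), 3))  # prints 0.038
```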
I used three methods to test the accuracy:
- Comparing the weighted means of each player’s projections and Marcel
- Calculating the root mean square error between each player’s projections and Marcel
- Calculating a similarity score, where the difference between each player’s projection and Marcel is expressed as a fraction of the standard deviation of all players’ stats in that category (a t-score), and then using the Pythagorean theorem to determine the combined “distance” in t-scores, across all categories, from the projection to the observed.
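The three accuracy measures can be sketched as follows. Weighting each player (for example by plate appearances) is an assumption, since the article does not state how players were weighted, and the function names are hypothetical:

```python
import math


def weighted_mean_error(expected, projected, weights):
    """Weighted mean of (expected - projected); the sign shows the direction of bias."""
    total = sum(w * (e - p) for e, p, w in zip(expected, projected, weights))
    return total / sum(weights)


def weighted_rmse(expected, projected, weights):
    """Root mean square error, so over- and under-projections both count."""
    total = sum(w * (e - p) ** 2 for e, p, w in zip(expected, projected, weights))
    return math.sqrt(total / sum(weights))


def similarity_score(expected_rows, projected_rows, sds, weights):
    """Pythagorean distance in standard-deviation units across all categories.

    Each row holds one player's rates (e.g. SDT, XBH, HR, BB, SO); each
    difference is divided by that category's standard deviation before squaring.
    """
    total = 0.0
    for e_row, p_row, w in zip(expected_rows, projected_rows, weights):
        total += w * sum(((e - p) / sd) ** 2 for e, p, sd in zip(e_row, p_row, sds))
    return math.sqrt(total / sum(weights))


if __name__ == "__main__":
    # Two made-up players, one category: errors of +.02 and -.02
    print(weighted_mean_error([0.30, 0.20], [0.28, 0.22], [1, 1]))
    print(weighted_rmse([0.30, 0.20], [0.28, 0.22], [1, 1]))
```

The mean error can be near zero even when individual projections miss badly, which is why the RMSE and similarity score are reported alongside it.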
p = projected, e = expected (Marcel)

AAA
Method          pSDT  pXBH  pHR   pBB   pSO   eSDT  eXBH  eHR   eBB   eSO   vSDT  vXBH  vHR   vBB   vSO   Sim
All Chained     0.301 0.247 0.030 0.073 0.201 0.307 0.255 0.037 0.078 0.175 0.029 0.052 0.016 0.025 0.068 1.212
MinPA Chained   0.305 0.248 0.031 0.073 0.197 0.307 0.255 0.037 0.078 0.175 0.029 0.052 0.016 0.025 0.065 1.154
MLB Chained     0.304 0.255 0.032 0.073 0.197 0.307 0.255 0.037 0.078 0.175 0.029 0.053 0.017 0.025 0.065 1.158
MLB Direct      0.309 0.258 0.033 0.073 0.192 0.307 0.255 0.037 0.078 0.175 0.029 0.054 0.017 0.026 0.061 1.096

AA
Method          pSDT  pXBH  pHR   pBB   pSO   eSDT  eXBH  eHR   eBB   eSO   vSDT  vXBH  vHR   vBB   vSO   Sim
All Chained     0.295 0.242 0.028 0.067 0.214 0.308 0.255 0.037 0.078 0.177 0.034 0.053 0.017 0.024 0.072 1.287
MinPA Chained   0.298 0.243 0.028 0.067 0.209 0.308 0.255 0.037 0.078 0.177 0.033 0.053 0.017 0.024 0.068 1.217
MLB Chained     0.299 0.250 0.030 0.069 0.207 0.308 0.255 0.037 0.078 0.177 0.033 0.053 0.018 0.024 0.066 1.184
MLB Direct      0.313 0.260 0.035 0.075 0.197 0.308 0.255 0.037 0.078 0.177 0.033 0.056 0.020 0.026 0.059 1.047

A+
Method          pSDT  pXBH  pHR   pBB   pSO   eSDT  eXBH  eHR   eBB   eSO   vSDT  vXBH  vHR   vBB   vSO   Sim
All Chained     0.286 0.235 0.024 0.065 0.228 0.309 0.255 0.037 0.078 0.177 0.038 0.061 0.019 0.025 0.089 1.580
MinPA Chained   0.290 0.236 0.024 0.065 0.222 0.309 0.255 0.037 0.078 0.177 0.036 0.061 0.019 0.025 0.084 1.495
MLB Chained     0.294 0.252 0.028 0.070 0.212 0.309 0.255 0.037 0.078 0.177 0.034 0.063 0.019 0.026 0.075 1.340
MLB Direct      0.315 0.262 0.035 0.076 0.195 0.309 0.255 0.037 0.078 0.177 0.034 0.066 0.021 0.028 0.061 1.102

0.302 0.254 0.040 0.082 0.164
All of the methods underprojected HR and BB and overprojected SO at Triple-A, with “MLB Direct” slightly high on base hits and extra base hits while the others were a little low. At Double-A and High-A, “MLB Direct” gives much the same projections for the test group as it did at Triple-A, while the others, which employ chaining, give progressively worse projections for the same players the more steps removed they are from MLB. I believe this is because each multiplication from one level to the next in the chaining process also multiplies any biases found between levels.
Despite the greater passage of time inherent in comparing minor league statistics directly to major league statistics, as opposed to chaining comparisons of data from consecutive seasons, the direct comparison method consistently gives the closest estimate of future MLB performance. In addition, direct comparison produces virtually the same MLB projection regardless of which minor league level is used in the calculation, whereas chaining produces projections that are increasingly in error the further down the minors one goes.
This is a free article. If you enjoyed it, consider subscribing to Baseball Prospectus. Subscriptions support ongoing public baseball research and analysis in an increasingly proprietary environment.
“As a player ages through his twenties, he will on average lose speed, but gain power, strike zone judgement… It would then make sense not to let too much time elapse between the two sets of stats being compared.” Would it not make more sense to adjust his statistics based on appropriate aging patterns? Of course, with limited amateur resources that may have been too difficult, but aren’t there some reasonable aging algorithms publicly available?
“The factors will be depressed by a disproportionate number of players at the higher level (particularly in the majors) playing sparingly.” Nice catch. That’s a nuance I never considered before.
“. . . is it correct to base the factors partly on the records of players who failed to advance?†Sure it is. That makes it more useful for assessing if a player could cut it in the majors. If your MLE is accurate, it should reflect that ability.
“The purpose of the MLEs is to show how well a player in the minors will perform, if and when he reaches the majors.” That may be so, but in plain English “Major League equivalents” would be a direct translation of what that player would have done in the Majors, not what he will do when he gets there. We need a better word for this.
“. . . the direct comparison method consistently gives the closest estimate of future MLB performance.” Given that the “MLB direct” method shows the MLE of an A+ level player to have deflated indications of power (doubles and home runs on balls in play), while AA and AAA show inflated indications of power by the same measure, does this pass the smell test?
Whether it does or not, this is an interesting exploration. Perhaps this study shows that players of future major league ability already demonstrate in A+ whether they are capable or not, while their time in AA and AAA is a waste! Their hit rates on balls in play (SDT or BA/BiP) go down at each level on their way up, because fielding improves at each level. Their walk rates drop and strikeout rates rise, because the pitchers they face are better at each higher level. The share of that due to improvement in a player’s hitting may be very small, as small as the improvement in his power rates from A+ to MLB over the number of years it takes.
Yes, a finished projection system should include aging patterns, but these tests were meant to determine which selection criteria work best for a matched-pairs comparison, which is the first of several steps before applying age corrections. In those tests I was attempting to minimize other influences such as aging. Several projection systems that I am familiar with deliberately limit their comparisons to adjacent seasons, or to the same season only, in order to avoid an aging bias. I wanted to find out whether this is a wise approach, or whether it creates more problems than it solves.
Wouldn't the final accuracy test with players who actually made the Major Leagues produce a bias against the first two MLE methods which include minor leaguers who did not reach the Majors?
Who were the players you sampled for these data sets? You stated whom their MLEs were compared against in the final test, but I don't see where you stated where the samples come from in the first place other than "matched pairs of batting data". I know you were limited in word count, but the data selection process is rather critical in such an analysis.
I also wonder about the level of precision with the calculations. If you multiply a .00002 by a .00003, are you getting .00006, .0001 or .0000 (depending on whether you are calculating or rounding up/down). In this kind of article, where a difference of .0001 is significant, it'd be important to include a statement about what rounding, if any, you are using.
I really had to take my time reading your piece, mainly because I wasn't clear on what your actual goal was until the very end--not really a big deal, because you did make it clear eventually.
I was happy to see that you included both types of comparisons, and not just the ones that worked. I believe this kind of reporting/analysis is the most informative.
Lastly, I think it is important to include all of the players at the lower levels, and not just the handful that make it to the MLB. After all, we are trying to forecast prospects, and a team won't know which of their prospects will make the majors. If I understand your piece correctly, the numbers have been adjusted using only those that made the majors. But if we then apply these numbers to all prospects, won't it make some look better than they actually are? If I were a GM, I would rather err on the side of a prospect doing better than I projected (using deflated numbers) than spending money, time, etc. on a prospect who was not as good as I thought (because of inflated numbers).
Overall I enjoyed this article, and can't wait to see more!
I didn't do any rounding other than telling Access to display 3 decimals.
The purpose of the MLE factors is to translate minor league stats into 'equivalent' major league stats. How can I test the accuracy of the translation on players who did not perform in both situations?
Say I have a set of ten x,y pairs, each in two different coordinate systems. I can compare the two sets of data to derive a matrix to convert from one to the other. Then each of the points is run through this matrix to compare its predicted location to its observed one; the difference is the residual error. Once the residuals are sufficiently small, the conversion is deemed reliable, and any point from coordinate system one can be converted to system two. The complication in MLEs is that there are more than two systems, with six levels of minors below MLB.
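The coordinate-system analogy can be made concrete with a small least-squares fit. This sketch assumes an affine transform (scale, rotation and shift) between two 2-D systems and solves the normal equations in pure Python; it illustrates the fit-then-check-residuals idea only, not anything used to build the MLEs, and all names are hypothetical:

```python
import math


def _det3(a):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def _solve3(m, v):
    """Solve the 3x3 system m @ c = v with Cramer's rule."""
    d = _det3(m)
    solution = []
    for col in range(3):
        mc = [row[:] for row in m]  # copy, then replace one column with v
        for r in range(3):
            mc[r][col] = v[r]
        solution.append(_det3(mc) / d)
    return solution


def fit_affine(src, dst):
    """Least-squares affine map from src (x, y) points to dst (X, Y) points.

    Returns [[a, b, c], [d, e, f]] with X = a*x + b*y + c and Y = d*x + e*y + f.
    """
    rows = [(x, y, 1.0) for x, y in src]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    coeffs = []
    for k in range(2):  # k = 0 fits X, k = 1 fits Y
        xty = [sum(r[i] * q[k] for r, q in zip(rows, dst)) for i in range(3)]
        coeffs.append(_solve3(xtx, xty))
    return coeffs


def apply_affine(coeffs, point):
    x, y = point
    return tuple(c[0] * x + c[1] * y + c[2] for c in coeffs)


def residuals(coeffs, src, dst):
    """Distance between each predicted point and its observed location."""
    return [math.dist(apply_affine(coeffs, s), q) for s, q in zip(src, dst)]


if __name__ == "__main__":
    # Ten made-up control points and their exact images under a known transform
    src = [(i, (i * i) % 7) for i in range(10)]
    dst = [(2 * x - 0.5 * y + 10, 0.5 * x + 2 * y - 3) for x, y in src]
    fit = fit_affine(src, dst)
    print(max(residuals(fit, src, dst)))  # close to zero for error-free points
```

With real survey points the residuals would not be zero; their size is what tells you whether the conversion can be trusted.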
What is the method of modeling the talent levels of the various minor leagues that will create the smallest residual errors? I listed four different methods, and calculated the error rates for each. We need to avoid picking a method because it intuitively sounds good without first testing it against the alternatives.
I make my living creating digital maps from aerial photography. Surveyors go into the field and measure a small but sufficient number of points that we can see in the photography, reporting to us each point's east, north and elevation. When I put the cursor on each of these points in the photos on the screen, the software reports back the pixel location, then compares the ground coordinates to the photo coordinates. Once the translation is established, I can go to any point in the photography and the software will give me a calculated ground coordinate. That coordinate translation is the basis of the entire map compilation process, and it's the same concept I used here in the results test.
The theory is that if his system compares well to Marcel's system, but is more accurate, then his system is a better system.
In any case, thanks for sharing the research, and best of luck: I know your work, and I want the winner of this contest to be a writer capable of bringing to BP the sort of insight and analysis that one always finds in your articles.
Cartwright, Brian -- 7. I think this is a BP quality piece. Most of it is way above my head and therein lies my problem with it. I just don't grasp most of it and he makes little effort to bring it down to a level where I get any sort of takeaway or even an explanation. As an example he starts off with a question about "elapsed time" - I'm not sure what it means and I don't see where he answered it in the piece. There's going to be an element of BP readers that loves this and I don't blame them, but there's going to be a lot of people that just don't get it.
How much elapsed time should be allowed between samples?
...By recording statistics for each player in each season, we are taking a sample, over a given period of time, which estimates the player's "true talent" in each of the various categories. As a player ages through his twenties, he will on average lose speed, but gain power, strike zone judgement and contact skills. If it takes two years of statistics to get a good measure of a player, at the end of the two years he is likely not exactly the same player he was before. It would then make sense not to let too much time elapse between the two sets of stats being compared.
With such a time constraint...
First, in order to do so, that would require three more articles of this size to get me where I need to be.
Second, it appears as if Brian knows what he is talking about, but that doesn't work if I, being the moronic reader, do not "get" the numbers.
Third, I would like to be able to understand what he is trying to convey. I think your challenge is going to be breaking down your knowledge through words. You are a professor teaching in middle school at this point. For me at least, I still laugh at farts.
and the time constraint? heck, you've got until friday to get the next piece done. and then the next week ... and the next week ...
'we do this every day' - earl weaver
I'd be more interested in what his projected numbers would be in time... and thus, I'd also be interested in predicted changes in true talent.
And Will, I believe the "With such a time constraint..." hanger was the lead of the next paragraph, not a complaint about working conditions. Although I suppose it could be both.
Though that sentiment doesn't mean a writer shouldn't be as clear and concise as possible.
Thanks, Brian. I'm realizing that this is not just a great contest entry, but that it's also going to be one of those articles that might influence thought in the community for some months to come. It took me a while to appreciate what you'd written, but it was well worth the time.
I find it similar to what I see going on in my academic field of Operations Research. One of the key building block problems is the Travelling Salesman Problem (TSP). Simply stated if I have n cities, a distance matrix from each city to each other city, what is the shortest distance tour that visits each city only once. It's a simple problem that has numerous applications (semi-conductor fabrication, vehicle routing, etc.), but also becomes incredibly hard to solve optimally once the number of cities starts getting up into the hundreds or thousands.
It seems that a significant portion of the research (and countless academic articles) is spent on slightly improving the solution time/quality for this one single problem and some of its minor variations. It's very important research, but it can get mind-numbing reading yet another approach to improve something by a percentage point or two.
I think it comes down to readers voicing their preference between research that finds slightly better ways of doing what's been done before and research that applies techniques and analysis to questions that have yet to be addressed.
Other than that, refreshing new topics unrelated to winning, but related to our lives in some way, explainable or not, are most welcome. Although the sheer adventure of tackling a subject with a mathematical approach can be interesting, too. We need to find the right balance.
Also, a quick question:
To test the accuracy of the various projection methods, you use the actual accumulated MLB statistics from a group of players that, by necessity, have reached the bigs. Wouldn't this bias your results in favor of the projection methods that, similarly, only look at players that reached the big leagues?
Similarly, any system for translating minor league statistics to the big leagues will necessarily be optimized for translating minor league statistics for players that reach the big leagues. Since in practice, we will inevitably end up using such projections on players who won't reach the majors, won't this introduce unaccounted for variation into the model? Is there any way to deal with this? Is it even important to?
To check how well the means of the predicted and expected compared, sum(e-p)/sum(n)
To check root mean square (error is error, regardless of negative or positive) sqrt(sum((e-p)^2)/sum(n))
To use Pythagoras to calculate similarity score sqrt(sum((e1-p1)^2+(e2-p2)^2+(e3-p3)^2)/sum(n))
As I said in my last comment, the control group hit 270 in their first 2-3 seasons of MLB. Based on only High-A or lower stats, MLB Direct projects them to hit 267. All Chained projects 229. Even if MLB Direct has some bias, All Chained has other biases which clearly outweigh it and make the results unusable. If a large group of players hits 270, and a projection says they are 229 hitters, I would say that projection is wrong.
I've been a BP subscriber for about 10 years, and this is the fourth site where I have published articles. One of the problems when writing for a new audience is knowing the type of article they expect. When I first started reading BP, it was a stats site, but apparently is not so much anymore, although there are some (many?) of us who wish it still was. I am still primarily a stats guy, and would like to write mainly stats articles for BP, but wherever I write I will be learning how to tailor my message to the audience.
As someone who works with the stats virtually every day, I found the numbers quoted in the article were in my language, and had lots of meaning for me. When someone says "superior projection of a few hundredths of OBA+Slg" I realize I have not made it clear in terms the readers are familiar with.
Restating the test results:
BA OB SA
270 327 427 Marcel of MLB of 368 players in test sample
267 323 419 A+ using MLB Direct
242 295 368 A+ using MLB chained
234 284 345 A+ using MinPA Chained
229 279 338 A+ using All Chained
Collectively the 368 players had an MLB line of 270/327/427 in their first 2-3 seasons. Using only batting data from High-A and below, the MLB Direct approach would tell us that they project to 267/323/419, very close. A projection system that uses an All Chained method would project these same players, based on the same set of stats, to have a 229/279/338 line instead of roughly MLB average, which is clearly not good enough to even play in the majors; it is therefore not a useful method.
Now, if you came up with a similar system that could separate the "star" prospect wheat from the chaff given different minor league environments, I'd find that more interesting. If you could project the kinds of prospects that do well in A ball but tend to flame out at Double-A, that would also be interesting (and useful to a major league team)... or the reverse, a player profile that tends to do horribly in A ball but matures or "fills out" into a star player, that'd also be interesting.
In other words, I don't care as much about the average A baller since I won't see most of them at Coors Field or read about them on ESPN... but I would be interested in the "news" names from the draft that flame out, or the "sleepers".
I realize there is a fine line on responding to comments. I don't like to go back and forth too many times on the same point. I'll clarify, but would like to move on to another question. Sometimes the reader doesn't get it - I'll look to see if it's my fault for not explaining it well enough.
Just my personal taste. I didn't say I disliked the article, it just didn't do much for me. Others appreciated it more than I did and perhaps they're right. I can admit it when I'm wrong :)
I bring this up because the different methods described above are different ways to select the sample. You're not just changing the "elapsed time" or accounting for the "pinch hit penalty." You're also changing the bias due to selection. By selecting based on plate appearances per game, you're dropping players that started in Triple A but were only good enough to pinch hit in the majors. But you're keeping players that started in Triple A but were good enough to play regularly in the majors. Your sample ends up only including the successful players. Similar points can be made for the other permutations.
This isn't a fatal flaw, but I think the differences in the results need to also be discussed in this light. Thoughts?
A control group has to consist of people or objects that exist in both circumstances. In this case, there are more than two circumstances, so you have a choice between chaining and direct comparison. The tests show that chaining multiplies biases.
If there are two players who hit the same at a given level, and one got promoted and the other didn't, you can look at other things like speed, defense and defensive position.
Take your method to a different (admittedly, strange) degree. Say I take the Triple A stats and major league stats for everyone who makes an All-Star team and calculate conversion rates. Then I test the accuracy of this prediction by looking at the sample of All-Star players. You'll inevitably get a strong prediction, but it doesn't make it useful. I'm interested in figuring out who - given their Triple A stats - is going to make the All-Stars. Instead, I have the conversion rate of the players who make All-Star teams. That's probably not representative for the average Triple A player.
In your context, I want to know who should play in the majors given their minor league stats. Not the conversion rates of those who did play in the majors. This isn't your cross to bear, obviously, but it should be addressed when altering the method to get MLEs. A better correlation between predicted stats and actual stats doesn't mean it's more useful if it loses some of its "representative-ness."
2) Just to help explain how you practically did this - why would the MLB Chained and MLB Direct methods produce different results for AAA? It seems like those should be exactly the same.
3) Here's the broader point: You start out saying, "Major League Equivalencies (MLEs) are a set of formulas that will translate a player's minor league statistics into those that he would be expected to produce if he was in the major leagues." I agree with this - we want to know the expected production of a randomly promoted AAA hitter given his stats. This is different from what you later say, "If the MLEs are being used to judge how well a player will perform if and when he makes the majors, is it correct to base the factors partly on the records of players who failed to advance?" The promoted players are not random, biasing the results. All MLEs (that I know of) have this problem, but you exacerbate it by selecting the sample based on the best-of-the-best, the ones that eventually make the majors (or, worse, the ones that get > 2.5 PA/G). Yes, the eventual translations are better predictors for that group, but that would be true no matter how you sliced the sample - the average for a group is a better predictor for that group than some other average. But is it useful outside of that group in some way? Improvements in MLEs need to answer that question.
2) Slight differences between MLB Chained and MLB Direct for AAA to MLB are because Direct did not have a requirement that the two seasons be within a year of each other.
3) When I said "is it correct to base the factors partly on the records of players who failed to advance?" I was proposing an alternate scenario, predicting an argument that could be used to back another method. I was simply trying to find several different methods to test.
There will always be some biases. When you get rid of one, you create another. By empirically testing the results I was trying to find the method that was least affected. This was not a case of trying to prove my method was right. Last year when I was designing my projections, I spent a lot of time looking for a method I felt was accurate enough, more so than the other projections available. This article describes my process of discovery.
Good questions.
2) Thanks.
3) To summarize my position - I don't think your test tells us anything about biases since it's self-reinforcing. Given that, we have to evaluate the methods on their own. I'm sure there will always be some bias at least, but we should try to move towards less bias. Your selection method seems to make it much, much worse. Now, for a projection system, this isn't necessarily bad. You don't want translation ratios for random players - you want ones for promoted players because you want something that's descriptive of what does happen. If you were a GM and you wanted to start promoting people based primarily on stats, then you want the ratios for a randomly-promoted player (your first sentence). I would think this would be an interesting avenue for research - finding random (non-performance) shocks that cause players to get promoted (injuries at higher levels?).
So I guess we each get to vote for one article, and whichever article has the fewest votes goes away?