Rob Neyer has some interesting comments stemming from an argument in Moneyball over the relative importance of on-base percentage and slugging percentage, and on the validity of OPS (on-base plus slugging).
Here's an abridged version of the Moneyball text:
OPS was the simple addition of on-base and slugging percentages. Crude as it was, it was a much better indicator than any other offensive statistic of the number of runs a team would score. Simply adding the two statistics together, however, implied that they were of equal importance...An extra point of on-base percentage was clearly more valuable than an extra point of slugging percentage -- but by how much? ... In [the resulting] model an extra point of on-base percentage was worth three times an extra point of slugging percentage.
But three-to-one at what point? Clearly, as Neyer opines, they are not saying that a player with a .200 on-base percentage is equal to a .600 slugging hitter. Neyer states that he "came to the conclusion that while OPS ain't bad, a better measure would be the sum of slugging percentage and OBP*1.4 (or thereabouts)... So yes, OPS is a crude tool, a blunt object that shouldn't be used when precision is critical."
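To see what Neyer's 1.4 weight actually does, here's a quick sketch comparing plain OPS with the weighted version for two hypothetical hitters (the slash lines are made-up numbers, not real players):

```python
# Compare plain OPS with Neyer's weighted OPS' for two hypothetical hitters.
# The 1.4 multiplier on OBP is Neyer's rough estimate quoted above.

def ops(obp, slg):
    return obp + slg

def ops_prime(obp, slg, obp_weight=1.4):
    return obp_weight * obp + slg

# Made-up (OBP, SLG) lines: a patient on-base machine vs. a free-swinging slugger.
patient = (0.400, 0.420)
slugger = (0.310, 0.510)

print(ops(*patient), ops(*slugger))              # plain OPS rates them dead even
print(ops_prime(*patient), ops_prime(*slugger))  # OPS' favors the on-base skill
```

Both hitters come out at .820 in plain OPS, but OPS' separates them in favor of the higher on-base percentage, which is exactly the distinction Neyer is after.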
However, we have to use something as a yardstick, or Mario Mendoza would look like Babe Ruth. Well, maybe not. It got me thinking about how well the various batting averages have correlated with runs historically. I compared batting average, on-base percentage, slugging percentage, OPS, and Neyer's modified OPS' (OBP*1.4 + Slug) against runs scored for all major-league teams to determine which correlated best.
Here's what I got. The higher the correlation coefficient the better:
So Neyer's OPS' is best historically, and regular OPS edges out on-base percentage. That all seems to make intuitive sense.
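The comparison above boils down to computing a Pearson correlation coefficient between each team-level stat and team runs. Here's a minimal sketch of that calculation; the team seasons below are made-up numbers for illustration, not the actual historical data:

```python
# Sketch of the stat-vs-runs comparison: Pearson correlation of each
# team-level batting stat against team runs scored.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical team seasons: (AVG, OBP, SLG, runs) -- illustrative only.
teams = [
    (0.270, 0.340, 0.420, 780),
    (0.255, 0.320, 0.390, 700),
    (0.280, 0.355, 0.445, 845),
    (0.248, 0.310, 0.405, 690),
    (0.262, 0.335, 0.430, 770),
]

runs = [t[3] for t in teams]
stats = {
    "AVG":  [t[0] for t in teams],
    "OBP":  [t[1] for t in teams],
    "SLG":  [t[2] for t in teams],
    "OPS":  [t[1] + t[2] for t in teams],
    "OPS'": [1.4 * t[1] + t[2] for t in teams],
}

for name, xs in stats.items():
    print(f"{name:5s} r = {pearson(xs, runs):.3f}")
```

Run this over every team-season in a given period and the stat with the highest r is the best linear predictor of runs for that period.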
I next did the same thing broken down by decades:
Note that initially batting average was the best predictor of runs scored. Then on-base percentage ruled in the 1880s. Ever since then, OPS (or OPS') has shown the best correlation to runs scored.
But it's odd how wildly the correlation coefficients fluctuate. One would think that a stat would predict well from decade to decade, or at least that the relationship would evolve gradually rather than swing wildly back and forth.
I think there is some way to use linear regression to weight the different averages properly based on era, but figuring out what constitutes an era may be the difficult part. The data could be split up by decade, but that's sort of an artificial rule being imposed on the system. Perhaps runs per game could be used as a means to stratify the major-league seasons, thereby chunking them into like groups.
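The regression idea can be sketched directly: fit runs = a + b_obp*OBP + b_slg*SLG by ordinary least squares over the team seasons in an era, and the ratio b_obp/b_slg is the data's own answer to the "how much is a point of OBP worth" question. Here's a self-contained version using the normal equations; the team numbers are toy data I generated from an exact 1.4-to-1 weighting, so the fit should recover it:

```python
# Sketch: fit runs = a + b_obp*OBP + b_slg*SLG by ordinary least squares
# (normal equations solved by small-scale Gaussian elimination), then read
# off the implied OBP-to-SLG weight ratio.

def solve(A, b):
    """Solve the small square system A x = b by Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Toy team seasons (OBP, SLG, runs), generated from runs = 1400*OBP + 1000*SLG - 100
# so the fitted OBP weight should come out near 1.4-to-1.
teams = [
    (0.340, 0.420, 796),
    (0.320, 0.390, 738),
    (0.355, 0.445, 842),
    (0.310, 0.405, 739),
    (0.335, 0.430, 799),
    (0.345, 0.400, 783),
]

# Normal equations: (X^T X) beta = X^T y, with design rows [1, OBP, SLG].
X = [[1.0, obp, slg] for obp, slg, _ in teams]
y = [r for _, _, r in teams]
XtX = [[sum(X[k][i] * X[k][j] for k in range(len(X))) for j in range(3)] for i in range(3)]
Xty = [sum(X[k][i] * y[k] for k in range(len(X))) for i in range(3)]
a, b_obp, b_slg = solve(XtX, Xty)
print(f"OBP weight relative to SLG: {b_obp / b_slg:.2f}")
```

Fit this separately within each era (however the seasons end up being grouped) and the ratio itself becomes a function of the run-scoring environment.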
I'll have to think about this a bit more, but I think it's doable. Maybe I'll wait until after Amazon gets around to sending me my copy of Moneyball.