Baseball Toaster Mike's Baseball Rants
This is my site with my opinions, but I hope that, like Irish Spring, you like it, too.
Grandeur of the Perfect Sphere
2004-06-10 13:11
by Mike Carminati

Grandeur of the perfect sphere
Thanks the atoms that cohere.
—Ralph "Branca" Waldo Emerson

Perfect behavior is born of complete indifference.
—Cesare "Carl" Pavese

So who’s perfect? … Washington had false teeth. Franklin was nearsighted. Mussolini had syphilis. Unpleasant things have been said about Walt Whitman and Oscar Wilde. Tchaikovsky had his problems, too. And Lincoln was constipated.
—John "Kid" O'Hara

In the wake of Randy Johnson's perfect game a number of analysts have remarked on the fact that there have been eleven perfect games since 1961, the year of baseball's first foray into expansion. In the previous ninety years of major-league baseball, there were just five regular-season perfectos (plus Don Larsen's 1956 World Series gem).

Of course, they argue, expansion diluted the player pool, making it easier for a dominant pitcher to shut down an inferior team 27 times in a row. Keith Emmer wrote a very good article on the topic that cuts through the bluster and uses common sense to look at the issue. He uses on-base percentage to evaluate this claim and finds that most of the post-expansion perfect games have been thrown against teams with better OBPs than those in previous perfect games.

Here is a table of the overall ratios (batting average, on-base percentage, slugging, and OPS) in the majors per decade. Also listed is the number of perfect games (PG) per decade. (Note: Don Larsen's World Series game is included with the regular-season stats for the 1950s.) There is a grand total and a breakdown by pre-expansion and post-expansion era:

Decade    BA    OBP   SLUG  OPS   PG
1870s     .269  .282  .333  .616  0
1880s     .251  .298  .338  .636  2
1890s     .275  .345  .369  .714  0
1900s     .253  .311  .328  .639  2
1910s     .256  .322  .338  .659  0
1920s     .285  .347  .397  .743  1
1930s     .279  .342  .399  .742  0
1940s     .260  .332  .368  .700  0
1950s     .259  .331  .391  .723  1
1960s     .249  .314  .374  .688  3
1970s     .256  .323  .377  .700  0
1980s     .259  .324  .388  .712  3
1990s     .265  .334  .410  .743  4
2000s     .265  .335  .426  .761  1
Total     .262  .327  .379  .706  17
Pre-Exp   .265  .328  .367  .695  6
Post-Exp  .259  .326  .393  .718  11

Yeah, there's a big difference in OBP since expansion. Well, two points of OBP at least (.328 versus .326), is that it? It's also worth mentioning that decades with low OBPs tend to have more perfect games (1880s, 1900s, and 1960s) than those with high OBPs (1890s, 1930s and 1940s). However, the decade with the highest overall OBP, the 1920s, did have a perfect game and the high-OBP Nineties had four. So there is something more than just how often players get on base that affects the likelihood of a perfect game.

The next big factor is one so obvious that it's remarkable that it hardly gets mentioned: the number of games. As Emmer points out, the number of games per year has nearly doubled in the expansion era. There are 30 teams now as opposed to 16, and the schedule is 162 games as opposed to 154. If one views the likelihood of a perfect game being thrown as a probability problem, each game pitched (except those pitched by Brian Anderson) has a shot at being a perfect game. The more games there are, the better the chance that a perfect game is thrown.
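The "nearly doubled" figure can be checked with quick arithmetic. This is a minimal sketch that counts team-games (one pitching opportunity per team per game) and assumes every team plays its full schedule; the variable names are illustrative only.

```python
# Team-games per season: each team's game is one chance for its pitcher
# to throw a perfect game. Assumes full schedules (no rainouts or strikes).
pre_expansion_team_games = 16 * 154    # 16 teams, 154-game schedule
post_expansion_team_games = 30 * 162   # 30 teams, 162-game schedule

ratio = post_expansion_team_games / pre_expansion_team_games
print(pre_expansion_team_games, post_expansion_team_games, round(ratio, 2))
# prints: 2464 4860 1.97
```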

I thought that I would use the number of games and on-base percentage yearly to calculate the probability of a perfecto being thrown in that year. However, I wasn't happy using the standard OBP. For those of you unfamiliar with the stat, it is the sum of a player's hits, walks, and hit-by-a-pitch divided by his total plate appearances, i.e., the sum of his at-bats, walks, hit-by-a-pitch, and sacrifice flies. It was a decent predictor but had some obvious issues.

First, it counts sacrifice flies as a plate appearance (actually, initially the devisors of OBP went back and forth on this issue for a few years before settling on including sac flies). Now, obviously in a perfect game, there is no possibility of sac flies since whenever a given batter is at the plate there could not possibly be someone already on base, let alone at third, for him to drive in with a sac fly.

I thought about taking sac flies out of the plate appearance total, but decided against it for two reasons: 1) Sac flies were not officially recorded prior to the 1954 season. Prior to that they were just fly outs. If I removed them from the post-1954 plate appearances, it wouldn't be consistent with the pre-sac fly era statistical record.

2) I don't have a lot of faith in sac flies. Sure, if it's the bottom of the ninth of a tie ballgame and there's a runner at third with fewer than two outs, the batter is trying to drive the ball deep to the outfield to score the runner and win the game. That clearly is a sac fly (though often the outfield is drawn in and the ball drops for a hit).

However, let's say there are runners at first and third with one out in the ninth and the team at bat down by three. The batter drives a ball deep to left to score the runner, is credited with a sac fly, and receives high fives all around in the dugout. But that batter wasn't trying to sacrifice himself to score the run: he was trying to hit the ball hard possibly for a home run or a gap double to score the runners.

I just don't buy that sac flies accurately capture a batter sacrificing himself for a run. The framers of on-base percentage didn't buy it either. That's why they eventually included sac flies but not bunts in the total plate appearances.

Anyway, I kept sac flies in the total plate appearance totals.

That then leads us to sacrifice bunts. They are not in the plate appearance totals and I decided to keep it that way. Again, in a perfect game, there is no possibility of a bunt except for a base hit and then the at-bat would result in a hit, an error, or an out, but not a sacrifice bunt. Bunts have been recorded since 1895, and more importantly unlike sac flies they do capture an intentional sacrifice on the batter's part.

So bunts are not included in the total plate appearances calculations.

Next there is the issue of intentional walks. No one is going to give someone a free first-base pass when he is pitching a perfect game. Intentional walks have only been recorded since 1955, but I decided to subtract them from the walk totals, which affects both parts of the equation. Intentional walks will not be included in total plate appearances but will also be subtracted from the calculations for times on base.

Lastly and possibly the biggest issue with using on-base percentage, OBP ignores errors. Well, more precisely, if a player reaches on an error, it counts as an at-bat and therefore, a plate appearance but not as a time on base. However, an error would break up a perfect game. Therefore, I am including errors in the equation, which I will no longer call on-base percentage, since it is far afield of the official definition. I'll just call the new equation reached percentage—why not?

By the way, I will not be including passed balls, wild pitches, balks, or any other stat since they, for the most part, occur when there are already runners on base. (I know that a balk could be called with no runner, thereby charging the pitcher with a ball. Also, a dropped third strike could be called a passed ball if the batter reaches. The same goes for a wild pitch. But I think those instances are so rare that they can be ignored.)

The final "reached" equation combines batting and fielding stats: It's the sum of hits plus errors plus hit-by-a-pitch plus walks minus intentional walks, all divided by "modified" plate appearances (the sum of at-bats, hit-by-a-pitch, sac flies, and walks minus intentional walks). Given that it combines the offensive stats for the team at bat and the defensive stats for the team pitching, it can only be done at the league level. (Note that interleague play complicates all of this, but instead of going to the major-league level as a result, I decided to keep it at the league.)
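To make the definition concrete, here is a minimal sketch of standard OBP next to the reached percentage, with errors counted as times reached as described in the paragraph on errors above. The function names and the league totals are hypothetical, made up only to show the arithmetic; they are not taken from any season.

```python
def on_base_pct(h, bb, hbp, ab, sf):
    """Standard OBP: (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def reached_pct(h, bb, ibb, hbp, errors, ab, sf):
    """Reached percentage as described in the text.

    Intentional walks are removed from both the numerator and the
    denominator; league-wide fielding errors are added to the numerator.
    Sac bunts are left out of plate appearances; sac flies are kept in.
    """
    unintentional_bb = bb - ibb
    times_reached = h + errors + hbp + unintentional_bb
    modified_pa = ab + hbp + sf + unintentional_bb
    return times_reached / modified_pa


# HYPOTHETICAL league totals, for illustration only.
totals = dict(h=14000, bb=5000, hbp=400, ab=55000, sf=400)
obp = on_base_pct(**totals)
reached = reached_pct(ibb=400, errors=1200, **totals)
print(f"OBP {obp:.3f}, reached {reached:.3f}")
```

With these made-up totals the reached percentage comes out higher than OBP, because the added errors outweigh the removed intentional walks.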

I then took the reached percentage and subtracted it from one to get the percentage of times that the player did not reach. I then raised it to the 27th power to represent the full complement of plate appearances in a nine-inning perfect game. That gave me the probability that a given game in a given year could result in a perfect game. I then multiplied that by the total number of games (one game per team) to arrive at the expected number of perfect games for a given league in a given year.
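The calculation just described can be sketched in a few lines. The function names are my own; the inputs (31.45% reached, 1,624 team-games) are the 1968 AL figures from the results table, and the sketch assumes, as the method does, that all 27 plate appearances are independent and share the league-average rate.

```python
def perfect_game_probability(reached_pct, batters=27):
    """Chance that `batters` consecutive hitters all fail to reach base,
    assuming independent plate appearances at the same reach rate."""
    return (1.0 - reached_pct) ** batters


def expected_perfect_games(reached_pct, team_games):
    """Expected perfect games: per-game probability times team-games."""
    return perfect_game_probability(reached_pct) * team_games


# 1968 AL inputs as listed in the results table.
prob_1968_al = perfect_game_probability(0.3145)
expected_1968_al = expected_perfect_games(0.3145, 1624)
print(f"{prob_1968_al:.2e} {expected_1968_al:.4f}")
# prints roughly 3.73e-05 and 0.0606, in line with the 1968 AL row
```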

Now here are the results:

Yr        Lg       G  Reached %  PG Prob   Exp # PG  Actual PG
1871      NA     254  48.40%     1.74E-08  0.0000
1872      NA     366  47.02%     3.56E-08  0.0000
1873      NA     398  48.66%     1.52E-08  0.0000
1874      NA     464  47.76%     2.43E-08  0.0000
1875      NA     690  43.80%     1.75E-07  0.0001
1876      NL     520  43.01%     2.55E-07  0.0001
1877      NL     360  42.16%     3.81E-07  0.0001
1878      NL     368  40.56%     7.96E-07  0.0003
1879      NL     642  39.74%     1.15E-06  0.0007
1880      NL     680  38.48%     2.01E-06  0.0014    2
1881      NL     672  39.96%     1.04E-06  0.0007
1882      AA     468  40.87%     6.91E-07  0.0003
1882      NL     676  39.71%     1.17E-06  0.0008
1883      AA     780  41.22%     5.88E-07  0.0005
1883      NL     790  41.53%     5.11E-07  0.0004
1884      AA    1318  39.73%     1.15E-06  0.0015
1884      NL     914  40.12%     9.69E-07  0.0009
1884      UA     856  42.41%     3.38E-07  0.0003
1885      AA     890  39.50%     1.28E-06  0.0011
1885      NL     890  39.10%     1.53E-06  0.0014
1886      AA    1114  41.05%     6.36E-07  0.0007
1886      NL     990  39.46%     1.30E-06  0.0013
1887      AA    1100  43.93%     1.64E-07  0.0002
1887      NL    1016  42.07%     3.97E-07  0.0004
1888      AA    1096  39.21%     1.46E-06  0.0016
1888      NL    1088  37.48%     3.11E-06  0.0034
1889      AA    1118  42.54%     3.18E-07  0.0004
1889      NL    1062  41.74%     4.63E-07  0.0005
1890      AA    1080  41.29%     5.70E-07  0.0006
1890      NL    1078  40.66%     7.58E-07  0.0008
1890      PL    1058  44.38%     1.32E-07  0.0001
1891      AA    1118  42.22%     3.70E-07  0.0004
1891      NL    1104  40.35%     8.74E-07  0.0010
1892      NL    1842  39.58%     1.24E-06  0.0023
1893      NL    1570  42.83%     2.78E-07  0.0004
1894      NL    1586  45.44%     7.88E-08  0.0001
1895      NL    1592  43.53%     1.99E-07  0.0003
1896      NL    1584  42.10%     3.91E-07  0.0006
1897      NL    1618  41.90%     4.29E-07  0.0007
1898      NL    1842  39.78%     1.13E-06  0.0021
1899      NL    1842  40.84%     7.00E-07  0.0013
1900      NL    1138  40.37%     8.67E-07  0.0010
1901      AL    1098  40.23%     9.24E-07  0.0010
1901      NL    1122  37.92%     2.57E-06  0.0029
1902      AL    1106  38.79%     1.76E-06  0.0019
1902      NL    1124  37.05%     3.75E-06  0.0042
1903      AL    1108  35.71%     6.60E-06  0.0073
1903      NL    1120  39.18%     1.48E-06  0.0017
1904      AL    1252  34.28%     1.19E-05  0.0150    1
1904      NL    1246  36.45%     4.82E-06  0.0060
1905      AL    1234  34.88%     9.34E-06  0.0115
1905      NL    1240  36.78%     4.20E-06  0.0052
1906      AL    1226  35.27%     7.96E-06  0.0098
1906      NL    1230  35.76%     6.48E-06  0.0080
1907      AL    1234  35.17%     8.28E-06  0.0102
1907      NL    1232  35.50%     7.20E-06  0.0089
1908      AL    1244  34.51%     1.09E-05  0.0135    1
1908      NL    1244  34.56%     1.07E-05  0.0133
1909      AL    1240  35.37%     7.60E-06  0.0094
1909      NL    1242  36.10%     5.61E-06  0.0070
1910      AL    1256  35.95%     5.96E-06  0.0075
1910      NL    1242  37.53%     3.04E-06  0.0038
1911      AL    1228  39.13%     1.51E-06  0.0019
1911      NL    1246  38.15%     2.32E-06  0.0029
1912      AL    1238  38.76%     1.78E-06  0.0022
1912      NL    1226  38.42%     2.06E-06  0.0025
1913      AL    1228  37.32%     3.33E-06  0.0041
1913      NL    1240  36.77%     4.22E-06  0.0052
1914      AL    1262  36.63%     4.48E-06  0.0057
1914      FL    1248  37.50%     3.08E-06  0.0038
1914      NL    1250  36.51%     4.70E-06  0.0059
1915      AL    1242  37.21%     3.49E-06  0.0043
1915      FL    1238  36.34%     5.07E-06  0.0063
1915      NL    1248  35.10%     8.53E-06  0.0106
1916      AL    1250  36.21%     5.36E-06  0.0067
1916      NL    1244  34.70%     1.01E-05  0.0125
1917      AL    1244  35.93%     6.02E-06  0.0075
1917      NL    1250  34.71%     1.00E-05  0.0125
1918      AL    1016  36.48%     4.76E-06  0.0048
1918      NL    1016  35.21%     8.15E-06  0.0083
1919      AL    1120  37.25%     3.44E-06  0.0039
1919      NL    1116  34.99%     8.93E-06  0.0100
1920      AL    1234  38.46%     2.03E-06  0.0025
1920      NL    1234  36.13%     5.54E-06  0.0068
1921      AL    1232  39.36%     1.36E-06  0.0017
1921      NL    1226  37.50%     3.08E-06  0.0038
1922      AL    1236  38.13%     2.34E-06  0.0029    1
1922      NL    1240  38.43%     2.05E-06  0.0025
1923      AL    1232  38.51%     1.98E-06  0.0024
1923      NL    1234  37.98%     2.50E-06  0.0031
1924      AL    1234  39.08%     1.54E-06  0.0019
1924      NL    1228  37.02%     3.79E-06  0.0047
1925      AL    1232  39.47%     1.30E-06  0.0016
1925      NL    1224  38.35%     2.12E-06  0.0026
1926      AL    1232  38.41%     2.07E-06  0.0026
1926      NL    1236  37.34%     3.30E-06  0.0041
1927      AL    1238  38.74%     1.80E-06  0.0022
1927      NL    1234  37.27%     3.40E-06  0.0042
1928      AL    1234  37.72%     2.81E-06  0.0035
1928      NL    1228  37.54%     3.03E-06  0.0037
1929      AL    1226  38.19%     2.28E-06  0.0028
1929      NL    1232  38.68%     1.84E-06  0.0023
1930      AL    1232  38.38%     2.10E-06  0.0026
1930      NL    1236  39.17%     1.49E-06  0.0018
1931      AL    1236  37.66%     2.88E-06  0.0036
1931      NL    1236  36.44%     4.86E-06  0.0060
1932      AL    1230  37.69%     2.83E-06  0.0035
1932      NL    1236  35.80%     6.37E-06  0.0079
1933      AL    1216  37.15%     3.58E-06  0.0043
1933      NL    1236  34.68%     1.02E-05  0.0125
1934      AL    1230  38.05%     2.42E-06  0.0030
1934      NL    1216  36.18%     5.43E-06  0.0066
1935      AL    1222  37.92%     2.57E-06  0.0031
1935      NL    1234  36.36%     5.01E-06  0.0062
1936      AL    1236  39.17%     1.48E-06  0.0018
1936      NL    1240  36.70%     4.34E-06  0.0054
1937      AL    1244  38.32%     2.16E-06  0.0027
1937      NL    1234  36.22%     5.33E-06  0.0066
1938      AL    1226  38.68%     1.84E-06  0.0023
1938      NL    1220  35.81%     6.33E-06  0.0077
1939      AL    1230  38.26%     2.22E-06  0.0027
1939      NL    1232  36.51%     4.72E-06  0.0058
1940      AL    1238  37.26%     3.42E-06  0.0042
1940      NL    1234  35.57%     7.00E-06  0.0086
1941      AL    1244  37.00%     3.83E-06  0.0048
1941      NL    1244  35.59%     6.93E-06  0.0086
1942      AL    1222  35.90%     6.09E-06  0.0074
1942      NL    1226  34.65%     1.03E-05  0.0126
1943      AL    1234  34.95%     9.07E-06  0.0112
1943      NL    1242  35.24%     8.06E-06  0.0100
1944      AL    1238  35.56%     7.04E-06  0.0087
1944      NL    1246  35.50%     7.21E-06  0.0090
1945      AL    1224  35.35%     7.67E-06  0.0094
1945      NL    1236  36.26%     5.24E-06  0.0065
1946      AL    1242  35.57%     7.01E-06  0.0087
1946      NL    1242  35.56%     7.05E-06  0.0088
1947      AL    1246  35.61%     6.89E-06  0.0086
1947      NL    1240  36.26%     5.23E-06  0.0065
1948      AL    1236  37.25%     3.43E-06  0.0042
1948      NL    1238  35.96%     5.94E-06  0.0074
1949      AL    1236  37.62%     2.93E-06  0.0036
1949      NL    1244  35.95%     5.98E-06  0.0074
1950      AL    1240  37.98%     2.50E-06  0.0031
1950      NL    1236  36.14%     5.52E-06  0.0068
1951      AL    1234  36.68%     4.38E-06  0.0054
1951      NL    1244  35.68%     6.70E-06  0.0083
1952      AL    1242  35.38%     7.59E-06  0.0094
1952      NL    1236  34.82%     9.57E-06  0.0118
1953      AL    1236  35.92%     6.06E-06  0.0075
1953      NL    1244  36.04%     5.75E-06  0.0072
1954      AL    1242  35.48%     7.26E-06  0.0090
1954      NL    1232  35.97%     5.93E-06  0.0073
1955      AL    1236  35.50%     7.22E-06  0.0089
1955      NL    1232  34.65%     1.03E-05  0.0127
1956      AL    1236  36.16%     5.46E-06  0.0067
1956      NL    1242  33.76%     1.48E-05  0.0184
1957      AL    1232  34.25%     1.21E-05  0.0149
1957      NL    1238  34.01%     1.34E-05  0.0166
1958      AL    1238  33.98%     1.35E-05  0.0168
1958      NL    1232  34.52%     1.09E-05  0.0134
1959      AL    1236  34.30%     1.18E-05  0.0146
1959      NL    1240  34.24%     1.22E-05  0.0151
1960      AL    1234  34.68%     1.01E-05  0.0125
1960      NL    1238  33.70%     1.52E-05  0.0188
1961      AL    1622  35.04%     8.74E-06  0.0142
1961      NL    1238  34.60%     1.05E-05  0.0130
1962      AL    1618  34.37%     1.15E-05  0.0187
1962      NL    1624  34.72%     9.98E-06  0.0162
1963      AL    1616  32.99%     2.02E-05  0.0327
1963      NL    1622  32.60%     2.37E-05  0.0384
1964      AL    1628  33.00%     2.02E-05  0.0328
1964      NL    1624  33.09%     1.95E-05  0.0316    1
1965      AL    1620  32.76%     2.22E-05  0.0360
1965      NL    1626  32.88%     2.12E-05  0.0344    1
1966      AL    1612  32.33%     2.63E-05  0.0424
1966      NL    1618  33.12%     1.92E-05  0.0310
1967      AL    1620  31.96%     3.06E-05  0.0495
1967      NL    1620  32.41%     2.55E-05  0.0414
1968      AL    1624  31.45%     3.73E-05  0.0606    1
1968      NL    1626  31.52%     3.63E-05  0.0590
1969      AL    1946  33.78%     1.47E-05  0.0286
1969      NL    1946  33.62%     1.57E-05  0.0305
1970      AL    1946  33.86%     1.42E-05  0.0277
1970      NL    1942  34.49%     1.10E-05  0.0213
1971      AL    1932  33.22%     1.85E-05  0.0357
1971      NL    1944  33.18%     1.88E-05  0.0365
1972      AL    1858  32.21%     2.77E-05  0.0514
1972      NL    1860  33.11%     1.93E-05  0.0358
1973      AL    1944  34.69%     1.01E-05  0.0196
1973      NL    1942  33.79%     1.46E-05  0.0284
1974      AL    1946  34.20%     1.24E-05  0.0241
1974      NL    1944  34.34%     1.17E-05  0.0227
1975      AL    1926  34.83%     9.52E-06  0.0183
1975      NL    1942  34.50%     1.09E-05  0.0212
1976      AL    1934  33.95%     1.37E-05  0.0265
1976      NL    1944  33.78%     1.47E-05  0.0286
1977      AL    2262  34.93%     9.16E-06  0.0207
1977      NL    1944  34.46%     1.11E-05  0.0216
1978      AL    2262  34.53%     1.08E-05  0.0244
1978      NL    1942  33.57%     1.60E-05  0.0310
1979      AL    2256  35.31%     7.82E-06  0.0177
1979      NL    1942  34.02%     1.33E-05  0.0259
1980      AL    2264  34.88%     9.33E-06  0.0211
1980      NL    1946  33.63%     1.56E-05  0.0303
1981      AL    1500  33.68%     1.53E-05  0.0229    1
1981      NL    1288  33.60%     1.58E-05  0.0204
1982      AL    2270  34.44%     1.12E-05  0.0254
1982      NL    1944  33.52%     1.63E-05  0.0317
1983      AL    2270  34.45%     1.12E-05  0.0254
1983      NL    1948  33.74%     1.49E-05  0.0290
1984      AL    2268  34.35%     1.16E-05  0.0264    1
1984      NL    1942  33.54%     1.62E-05  0.0315
1985      AL    2264  34.39%     1.14E-05  0.0259
1985      NL    1942  33.38%     1.73E-05  0.0336
1986      AL    2268  34.65%     1.03E-05  0.0233
1986      NL    1938  33.71%     1.51E-05  0.0292
1987      AL    2268  34.96%     9.04E-06  0.0205
1987      NL    1942  34.27%     1.20E-05  0.0233
1988      AL    2262  33.92%     1.39E-05  0.0313
1988      NL    1938  32.50%     2.47E-05  0.0478    1
1989      AL    2266  34.15%     1.26E-05  0.0286
1989      NL    1946  32.66%     2.30E-05  0.0448
1990      AL    2266  34.24%     1.22E-05  0.0276
1990      NL    1944  33.48%     1.66E-05  0.0322
1991      AL    2268  34.31%     1.18E-05  0.0268
1991      NL    1940  33.18%     1.87E-05  0.0364    1
1992      AL    2268  34.27%     1.20E-05  0.0272
1992      NL    1944  32.82%     2.16E-05  0.0421
1993      AL    2268  35.03%     8.78E-06  0.0199
1993      NL    2270  34.34%     1.17E-05  0.0265
1994      AL    1594  35.88%     6.16E-06  0.0098    1
1994      NL    1606  34.62%     1.04E-05  0.0167
1995      AL    2020  35.76%     6.46E-06  0.0130
1995      NL    2014  34.62%     1.04E-05  0.0209
1996      AL    2266  36.36%     5.02E-06  0.0114
1996      NL    2268  34.54%     1.08E-05  0.0244
1997      AL    2264  35.41%     7.48E-06  0.0169
1997      NL    2268  34.80%     9.64E-06  0.0219
1998      AL    2268  35.49%     7.24E-06  0.0164    1
1998      NL    2596  34.54%     1.07E-05  0.0279
1999      AL    2265  36.21%     5.34E-06  0.0121
1999      NL    2591  35.70%     6.62E-06  0.0172    1
2000      AL    2265  36.31%     5.13E-06  0.0116
2000      NL    2593  35.55%     7.05E-06  0.0183
2001      AL    2266  34.79%     9.68E-06  0.0219
2001      NL    2592  34.32%     1.18E-05  0.0305
2002      AL    2264  34.45%     1.11E-05  0.0252
2002      NL    2588  34.26%     1.21E-05  0.0313
2003      AL    2270  34.63%     1.04E-05  0.0235
2003      NL    2590  34.38%     1.15E-05  0.0297
2004      AL     796  35.50%     7.22E-06  0.0057
2004      NL     805  34.07%     1.30E-05  0.0105    1
Total         365015  36.01%     5.83E-06  3.2379    16
Pre-Exp       191062  37.68%     2.85E-06  0.8599    5
Post-Exp      173953  34.17%     1.25E-05  2.3780    11

I even ran the numbers for the postseason, but won't try your patience by posting it all here. I'll just list 1956 and the overall numbers:

Yr        G  Reached %  PG Prob   Exp # PG  Actual PG
1956      7  30.83%     4.76E-05  0.0003    1
Total  1140  32.65%     2.32E-05  0.0453    1

Anyway, if you look at the regular-season table totals, you'll notice that even though OBP has remained about the same in the expansion era as it was in the pre-expansion era, fewer men reach base after expansion, about 3.5 percentage points fewer (34.17% versus 37.68%), due to improved defensive play. The result is that a perfect game is 4-5 times more likely to be thrown in any given game. Add in the extra games, and the expected number of perfect games in the expansion era nearly trebles the pre-expansion expectation (about 2.38 versus 0.86).
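Both ratios can be read straight off the Pre-Exp and Post-Exp rows of the regular-season table. A quick check, using only the rounded figures printed there (the variable names are mine):

```python
# Per-game perfect-game probability, from the "PG Prob" column.
pre_prob, post_prob = 2.85e-06, 1.25e-05
# Expected perfect games, from the "Exp # PG" column.
pre_expected, post_expected = 0.8599, 2.3780

prob_ratio = post_prob / pre_prob
expected_ratio = post_expected / pre_expected
print(round(prob_ratio, 1), round(expected_ratio, 1))
# prints: 4.4 2.8
```

The expected-games ratio is smaller than the per-game ratio because the post-expansion era covers fewer team-games in the table (173,953 versus 191,062).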

Actual perfect games do seem to occur when the expectation numbers have a sudden increase (e.g., 1880, 1904, 1908, 1968, and 1988), but also seem to happen when the expectation level is relatively low (1922 and 1994). I take that to mean that probability plays a guiding role but that there's still a great deal of randomness to the whole thing.

You may also notice that the expected number of perfect games is only about a fifth of the actual (about 3.25 to 16). Obviously, basing the calculations on the league average for defense and offense was not an entirely accurate model for reality. It appears that the advantage of a very good defensive team facing a very poor offensive team outweighs the disadvantage of a very poor defensive team facing a very good offense.
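One way to see why a league-average model could undershoot: the chance of retiring 27 straight batters falls off very steeply as the reach rate rises, so lopsided matchups add more probability than tough matchups take away. The sketch below uses HYPOTHETICAL matchup reach rates, invented purely for illustration and centered on 35%; it is not the team-level study suggested in the text.

```python
from statistics import mean

BATTERS = 27


def perfect_game_probability(reach_rate):
    """Chance of 27 straight outs at a given per-batter reach rate."""
    return (1.0 - reach_rate) ** BATTERS


# HYPOTHETICAL reach rates for five matchups, best defense vs. worst
# offense at the low end; made-up numbers that average out to 0.35.
matchup_reach_rates = [0.29, 0.32, 0.35, 0.38, 0.41]
league_average = mean(matchup_reach_rates)

# Model in the text: plug the league average into the formula.
prob_at_league_average = perfect_game_probability(league_average)

# Alternative: compute each matchup's probability, then average.
average_prob_over_matchups = mean(
    [perfect_game_probability(r) for r in matchup_reach_rates]
)

print(f"{prob_at_league_average:.2e} {average_prob_over_matchups:.2e}")
```

With these made-up rates, averaging over matchups gives a noticeably larger probability than plugging in the average, and the gap runs in the same direction as the shortfall noted above. Whether it accounts for the full difference is not something this sketch can show.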

A study for another day might be to base the calculations not at the league level but at the team level and then formulate the expected number of perfect games based on the matchups in that year. Take, for example, the 1950 Phils offense against the 1950 Braves defense over the 22 games they played against each other, as teams did in the days of the 154-game schedule. Maybe that would give us a more accurate picture.

Then again there is the possibility that pitchers exceed expectations when there is a possibility of a perfect game. Look at that Kevin Costner movie after all (although how difficult could it be to pitch a perfect game if Kelly Preston is the prize that you'd win).

Whatever the cause, given that actual results generally follow expectations, I have to believe that there's a bit more than randomness in the mix.
