Baseball Toaster was unplugged on February 4, 2009.
Ken: catfish AT zombia d.o.t. com
Ryan: rarmbrust AT gmail d.o.t. com
Philip: kingchimp AT alamedanet d.o.t net
Ryan has a post about optimizing the A's lineup over on The Pastime, using PECOTA projections and a formula from Cyril Morong over at Beyond the Boxscore.
Ryan didn't have the programming nerdiness to work through all 362,880 (9!) lineup permutations. But I happened to be cursed with such geekdom, so I wrote a Perl script to churn out the calculations. I ran it twice, once with Frank Thomas in the lineup, and once with Jay Payton in place of Thomas.
Here are the best and worst lineups. The number is runs/162 games.
Five best lineups with Thomas:
853.45: Bradley Chavez Ellis Thomas Johnson Crosby Swisher Kotsay Kendall
853.44: Bradley Chavez Ellis Thomas Johnson Swisher Crosby Kotsay Kendall
853.13: Bradley Johnson Ellis Thomas Chavez Crosby Swisher Kotsay Kendall
853.12: Bradley Johnson Ellis Thomas Chavez Swisher Crosby Kotsay Kendall
852.90: Ellis Chavez Bradley Thomas Johnson Swisher Crosby Kotsay Kendall
Five best lineups with Payton:
834.91: Bradley Johnson Ellis Chavez Swisher Payton Crosby Kotsay Kendall
834.80: Bradley Johnson Ellis Chavez Crosby Payton Swisher Kotsay Kendall
834.78: Bradley Swisher Ellis Chavez Johnson Payton Crosby Kotsay Kendall
834.63: Bradley Crosby Ellis Chavez Johnson Payton Swisher Kotsay Kendall
834.50: Bradley Chavez Ellis Swisher Johnson Payton Crosby Kotsay Kendall
A few interesting notes:
With Thomas: 852.58: Ellis Johnson Bradley Thomas Chavez Crosby Swisher Kotsay Kendall
With Payton: 834.36: Ellis Johnson Bradley Chavez Swisher Payton Crosby Kotsay Kendall
Both come within a run per season of the best lineups above, which is evidence that Zachary's preference for Ellis and Johnson at the top of the order is a good one.
Now for some fun: the worst lineups...
With Thomas:
816.79: Crosby Kotsay Johnson Kendall Swisher Ellis Bradley Chavez Thomas
816.84: Swisher Kotsay Johnson Kendall Crosby Ellis Bradley Chavez Thomas
816.92: Crosby Kotsay Johnson Kendall Swisher Bradley Ellis Chavez Thomas
816.97: Swisher Kotsay Johnson Kendall Crosby Bradley Ellis Chavez Thomas
817.05: Kotsay Ellis Swisher Kendall Crosby Bradley Johnson Chavez Thomas
With Payton:
799.02: Payton Kotsay Swisher Kendall Crosby Ellis Bradley Johnson Chavez
799.11: Payton Kotsay Crosby Kendall Swisher Ellis Bradley Johnson Chavez
799.15: Payton Kotsay Swisher Kendall Crosby Bradley Ellis Johnson Chavez
799.24: Payton Kotsay Crosby Kendall Swisher Bradley Ellis Johnson Chavez
799.59: Payton Kotsay Swisher Kendall Crosby Ellis Bradley Chavez Johnson
The perl code is below, for those of you with the Unixness for these things...
#!/usr/bin/perl
use Algorithm::Permute;

# put players and their obp/slgs here
my @pname = ('Ellis','Bradley','Chavez','Payton','Johnson','Crosby','Swisher','Kotsay','Kendall');
my @pobp = (.351,.355,.354,.322,.353,.346,.347,.332,.333);
my @pslg = (.426,.447,.479,.432,.462,.453,.455,.414,.338);

# formulae from http://www.beyondtheboxscore.com/story/2006/2/12/133645/296
my @obpx = (2.997,2.255,2.141,1.670,2.254,1.346,1.528,1.188,2.550);
my @slgx = (.931,1.263,.933,1.504,1.146,1.237,1.164,.825,.539);
my $constant = -5.261;
my $slots = 9;
my @array = (0..($slots-1));

Algorithm::Permute::permute {
    my $lineup = "";
    $rpg = $constant;
    for (my $i=0; $i<$slots; $i++) {
        $rpg += ($obpx[$i] * $pobp[$array[$i]]) + ($slgx[$i] * $pslg[$array[$i]]);
        $lineup .= $pname[$array[$i]] . " ";
    }
    print 1.00*(int($rpg*16200)/100) . " " . $lineup . "\n";
} @array;

# run the program from the command line like this:
# ./permute.pl | sort -n >somefilename.txt
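If you just want to check one batting order against the tables above without churning through every permutation, something like the following should do it. This is a rough sketch, not part of the original script: the lineup_runs helper and the hash layout are made up for illustration, while the OBP/SLG values, per-slot coefficients, and constant are copied from the script above. It needs only core Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# OBP and SLG per player, copied from the script above
my %obp = (Ellis => .351, Bradley => .355, Chavez => .354, Payton => .322,
           Johnson => .353, Crosby => .346, Swisher => .347, Kotsay => .332,
           Kendall => .333);
my %slg = (Ellis => .426, Bradley => .447, Chavez => .479, Payton => .432,
           Johnson => .462, Crosby => .453, Swisher => .455, Kotsay => .414,
           Kendall => .338);

# per-slot coefficients and constant, also copied from the script above
my @obpx = (2.997, 2.255, 2.141, 1.670, 2.254, 1.346, 1.528, 1.188, 2.550);
my @slgx = (.931, 1.263, .933, 1.504, 1.146, 1.237, 1.164, .825, .539);
my $constant = -5.261;

# lineup_runs is a hypothetical helper: runs per 162 games for one batting order
sub lineup_runs {
    my @order = @_;
    my $rpg = $constant;
    for my $i (0 .. $#order) {
        $rpg += $obpx[$i] * $obp{ $order[$i] } + $slgx[$i] * $slg{ $order[$i] };
    }
    return int($rpg * 16200) / 100;    # truncate to two decimals, as the script does
}

# the top Payton lineup from the table above
my @best = qw(Bradley Johnson Ellis Chavez Swisher Payton Crosby Kotsay Kendall);
printf "%.2f: %s\n", lineup_runs(@best), "@best";
# prints "834.91: Bradley Johnson Ellis Chavez Swisher Payton Crosby Kotsay Kendall"
```

Swap in any other order from the Payton tables to check its number. For the Thomas tables you would need to add his OBP and SLG, which are not listed in the script as posted.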
"I'm smarter than any stinkin' formula!"
--Ken Macha
I would really like to see data involving pitches per plate appearance (P/PA). The OBP and SLG statistics are great starting points.
The comments are great for leading up to a few posts this week.
That's really weird, so I'm starting to question those numbers. Either that, or we need to have a radically different view of batting orders when there's a DH than we're used to.
The question is whether to believe the model in the first place. It looks to me like the noise in the data is so great that there isn't much that can be salvaged here. In any case, this seems to be a somewhat perverse way of trying to solve the problem of optimizing a batting order. Simulations are simpler and likely to give more useful results.
The drawback of simulations is that it takes around 100K games with a fixed lineup to get precision on the order of .01 runs/game, so it would be somewhat prohibitive to do it for all possible permutations. But that fact should also tell you why it's so hard to draw any useful conclusions from a few years of historical data.
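The 100K figure lines up with the usual standard-error rule of thumb: the uncertainty in an average over N games shrinks like sigma/sqrt(N). Here is a back-of-the-envelope sketch in Perl. Note that the per-game standard deviation of about 3 runs is my assumption for illustration, not a number from this thread, and std_error is just a made-up helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $sigma = 3.0;        # ASSUMED standard deviation of one team's runs per game
my $games = 100_000;    # simulated games per lineup, from the comment above

# number of batting orders: 9!
my $lineups = 1;
$lineups *= $_ for 1 .. 9;

# hypothetical helper: standard error of a mean over $n independent games
sub std_error {
    my ($sd, $n) = @_;
    return $sd / sqrt($n);
}

printf "standard error after %d games: %.4f runs/game\n",
    $games, std_error($sigma, $games);
printf "games needed to cover all %d orderings: %.3e\n",
    $lineups, $games * $lineups;
```

Under that assumption the standard error after 100,000 games comes out a bit under .01 runs/game, and covering every ordering would take on the order of tens of billions of simulated games.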
Something like the following might be interesting, though: run a simulation with a typical lineup (based on league averages for each slot), and then vary OBP/SLG in each slot slightly to get a table like Morong's. The coefficients should look considerably less random than what we have here. Then you could apply Ken's script to any actual group of players. There would be some circularity in logic here (it would generate lineups that are optimal, given the constraint that they look something like the lineups that managers actually use), and this can be seen as either a bug (it might miss a better answer) or a feature (it gives answers that have some chance of actually being implemented).
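The "vary each slot slightly" step amounts to a finite-difference estimate of each slot's coefficient. The sketch below shows only the mechanics. Everything in it is hypothetical: sim_runs_per_game is a toy deterministic stand-in for a real game simulator (the formula is invented so the example runs), and the .340/.430 "league average" inputs are placeholders, not real data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# HYPOTHETICAL stand-in for a real simulator. Takes array refs of per-slot
# OBP and SLG and returns runs per game. The formula is a toy, not a model.
sub sim_runs_per_game {
    my ($obp, $slg) = @_;
    my $total = 0;
    for my $i (0 .. $#$obp) {
        my $pa_weight = 1 - 0.02 * $i;    # toy: earlier slots bat a bit more often
        $total += $pa_weight * $obp->[$i] * $slg->[$i];
    }
    return 4 * $total;
}

# Bump OBP (then SLG) in one slot by $delta and measure the change in runs.
sub slot_coefficients {
    my ($obp, $slg, $delta) = @_;
    my $base = sim_runs_per_game($obp, $slg);
    my (@obp_coef, @slg_coef);
    for my $i (0 .. $#$obp) {
        my @o = @$obp; $o[$i] += $delta;
        my @s = @$slg; $s[$i] += $delta;
        push @obp_coef, (sim_runs_per_game(\@o, $slg) - $base) / $delta;
        push @slg_coef, (sim_runs_per_game($obp, \@s) - $base) / $delta;
    }
    return (\@obp_coef, \@slg_coef);
}

my @league_obp = (.340) x 9;    # placeholder values, not real league averages
my @league_slg = (.430) x 9;
my ($oc, $sc) = slot_coefficients(\@league_obp, \@league_slg, 0.001);
printf "slot %d: OBP coef %.3f, SLG coef %.3f\n", $_ + 1, $oc->[$_], $sc->[$_]
    for 0 .. 8;
```

With a real simulator the output is noisy, so each difference would need a very large number of games (or the same random seed for the base and bumped runs) before the coefficients mean anything.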
I'm also looking forward to seeing what mgl/tango have to say about this subject in The Book.
I get what you're saying, Turnstiles. Still, it seems a waste to disregard real game data, and use simulated data instead. Maybe there really is something going on here that we wouldn't capture in simulation. Maybe a hybrid solution would be better?