Baseball Toaster Catfish Stew
STOP CASTING POROSITY! An Oakland Athletics blog.
Email Us

Ken: catfish AT zombia d.o.t. com
Ryan: rarmbrust AT gmail d.o.t. com
Philip: kingchimp AT alamedanet d.o.t net

Must. Bat. Kendall. Ninth.
2006-02-21 12:40
by Ken Arneson

Ryan has a post about optimizing the A's lineup over on The Pastime, using PECOTA projections and a formula from Cyril Morong over at Beyond the Boxscore.

Ryan didn't have the programming nerdiness to work through all 362,880 (that's 9!) lineup permutations. But I happened to be cursed with such geekdom, so I wrote a Perl script to churn out the calculations. I ran it twice, once with Frank Thomas in the lineup, and once with Jay Payton in place of Thomas.

Here are the best and worst lineups. The number is runs/162 games.

Five best lineups with Thomas:

853.45: Bradley Chavez Ellis Thomas Johnson Crosby Swisher Kotsay Kendall
853.44: Bradley Chavez Ellis Thomas Johnson Swisher Crosby Kotsay Kendall
853.13: Bradley Johnson Ellis Thomas Chavez Crosby Swisher Kotsay Kendall
853.12: Bradley Johnson Ellis Thomas Chavez Swisher Crosby Kotsay Kendall
852.90: Ellis Chavez Bradley Thomas Johnson Swisher Crosby Kotsay Kendall

Five best lineups with Payton:

834.91: Bradley Johnson Ellis Chavez Swisher Payton Crosby Kotsay Kendall
834.80: Bradley Johnson Ellis Chavez Crosby Payton Swisher Kotsay Kendall
834.78: Bradley Swisher Ellis Chavez Johnson Payton Crosby Kotsay Kendall
834.63: Bradley Crosby Ellis Chavez Johnson Payton Swisher Kotsay Kendall
834.50: Bradley Chavez Ellis Swisher Johnson Payton Crosby Kotsay Kendall

A few interesting notes:

  • This formula insists on batting Kotsay eighth and Kendall ninth. The other players switch around a lot at the top of the list, but that configuration is solid. If there is one conclusion to draw from this exercise, this is it.
     
  • The A's are about 19 runs/year better with Thomas in the lineup than with Payton (853.45 vs. 834.91 for the best lineup of each).
     
  • It likes Bradley leading off and Ellis batting third. That's probably not going to happen in real life, but the presumed order with Ellis leading off also works pretty well.
     
  • Given that Ellis is probably going to lead off, and Chavez will bat either third, fourth, or fifth, the ideal lineups with that configuration are:
    With Thomas:  852.58: Ellis Johnson Bradley Thomas Chavez Crosby Swisher Kotsay Kendall
    With Payton:  834.36: Ellis Johnson Bradley Chavez Swisher Payton Crosby Kotsay Kendall

    Both are within a run of the best possible lineup, which supports Zachary's preference for Ellis and Johnson at the top of the order.

  • When Thomas is in the lineup, it tends to like Chavez batting second. When Thomas is out of the lineup, it tends to like Chavez batting cleanup.
     
  • Crosby and Swisher are pretty much interchangeable. Swapping them between any two lineup spots produces almost exactly the same result.
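
If you want to check one of these notes without churning through every permutation, here is a minimal sketch that scores a single named lineup. The projections, coefficients and constant are copied from the script at the end of this post (the Payton version). The %proj hash and the lineup_runs helper are names made up for this illustration, so treat it as a rough sketch rather than the script that produced the tables above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Projections (OBP, SLG) copied from the script at the end of this post.
my %proj = (
    Ellis   => [.351, .426],
    Bradley => [.355, .447],
    Chavez  => [.354, .479],
    Payton  => [.322, .432],
    Johnson => [.353, .462],
    Crosby  => [.346, .453],
    Swisher => [.347, .455],
    Kotsay  => [.332, .414],
    Kendall => [.333, .338],
);

# Per-slot coefficients (slots 1 through 9) and constant, from the same script.
my @obpx = (2.997, 2.255, 2.141, 1.670, 2.254, 1.346, 1.528, 1.188, 2.550);
my @slgx = ( .931, 1.263,  .933, 1.504, 1.146, 1.237, 1.164,  .825,  .539);
my $constant = -5.261;

# lineup_runs (hypothetical helper): takes nine player names in batting
# order and returns projected runs per 162 games under the linear formula.
sub lineup_runs {
    my @names = @_;
    my $rpg = $constant;
    for my $i (0 .. $#names) {
        my ($obp, $slg) = @{ $proj{ $names[$i] } };
        $rpg += $obpx[$i] * $obp + $slgx[$i] * $slg;
    }
    return $rpg * 162;
}

# The top two Payton lineups differ only by swapping Crosby and Swisher.
my @a = qw(Bradley Johnson Ellis Chavez Swisher Payton Crosby Kotsay Kendall);
my @b = qw(Bradley Johnson Ellis Chavez Crosby Payton Swisher Kotsay Kendall);

# Truncate to two decimals, the same way the full script does.
printf "%.2f  %s\n", int(lineup_runs(@$_) * 100) / 100, "@$_" for \@a, \@b;
```

Run as is, this prints 834.91 and 834.80 for the two lineups, matching the first two entries in the Payton list: swapping Crosby and Swisher between the fifth and seventh spots moves the total by about a tenth of a run per season.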
     

Now for some fun: the worst lineups...

With Thomas:

816.79: Crosby Kotsay Johnson Kendall Swisher Ellis Bradley Chavez Thomas
816.84: Swisher Kotsay Johnson Kendall Crosby Ellis Bradley Chavez Thomas
816.92: Crosby Kotsay Johnson Kendall Swisher Bradley Ellis Chavez Thomas
816.97: Swisher Kotsay Johnson Kendall Crosby Bradley Ellis Chavez Thomas
817.05: Kotsay Ellis Swisher Kendall Crosby Bradley Johnson Chavez Thomas

With Payton:

799.02: Payton Kotsay Swisher Kendall Crosby Ellis Bradley Johnson Chavez
799.11: Payton Kotsay Crosby Kendall Swisher Ellis Bradley Johnson Chavez
799.15: Payton Kotsay Swisher Kendall Crosby Bradley Ellis Johnson Chavez
799.24: Payton Kotsay Crosby Kendall Swisher Bradley Ellis Johnson Chavez
799.59: Payton Kotsay Swisher Kendall Crosby Ellis Bradley Chavez Johnson

The Perl code is below, for those of you with the Unixness for these things...




#!/usr/bin/perl
use strict;
use warnings;
use Algorithm::Permute qw(permute);

# put players and their obp/slgs here
# (this is the Payton run; for the Thomas run, swap in Thomas's name and projections)
my @pname = ('Ellis','Bradley','Chavez','Payton','Johnson','Crosby','Swisher','Kotsay','Kendall');
my @pobp = (.351,.355,.354,.322,.353,.346,.347,.332,.333);
my @pslg = (.426,.447,.479,.432,.462,.453,.455,.414,.338);

# formulae from http://www.beyondtheboxscore.com/story/2006/2/12/133645/296
# one OBP coefficient and one SLG coefficient per lineup slot, 1 through 9
my @obpx = (2.997,2.255,2.141,1.670,2.254,1.346,1.528,1.188,2.550);
my @slgx = (.931,1.263,.933,1.504,1.146,1.237,1.164,.825,.539);
my $constant = -5.261;

my $slots = 9;
my @array = (0..($slots-1));    # player indices; permute reorders this in place

permute {
        my $lineup = "";
        my $rpg = $constant;    # projected runs per game for this ordering
        for (my $i=0; $i<$slots; $i++) {
                $rpg += ($obpx[$i] * $pobp[$array[$i]]) + ($slgx[$i] * $pslg[$array[$i]]);
                $lineup .= $pname[$array[$i]] . " ";
        }
        # runs per 162 games, truncated to two decimals
        printf "%.2f %s\n", int($rpg*16200)/100, $lineup;
} @array;

# run the program from the command line like this:  ./permute.pl | sort -n >somefilename.txt
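
If installing Algorithm::Permute from CPAN is a hurdle, the permutation step can be done in core Perl. The following is a minimal sketch of a stand-in, not part of the module: for_each_permutation is a hypothetical helper written for this illustration, and it will be slower than the compiled module, though 362,880 orderings is still a small job.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# for_each_permutation (hypothetical helper, not part of Algorithm::Permute):
# calls $callback once for every ordering of @items, passing the ordering
# as the argument list. That is n! calls for n items.
sub for_each_permutation {
    my ($callback, @items) = @_;
    my $recurse;
    $recurse = sub {
        my ($done, $rest) = @_;
        unless (@$rest) {
            # nothing left to place: $done is one complete ordering
            $callback->(@$done);
            return;
        }
        for my $i (0 .. $#$rest) {
            my @remaining = @$rest;
            my ($pick) = splice(@remaining, $i, 1);   # take item $i out
            $recurse->([@$done, $pick], \@remaining);
        }
    };
    $recurse->([], \@items);
    undef $recurse;    # break the closure's reference to itself
    return;
}

# Small demonstration: three slots give 3! = 6 orderings.
my $count = 0;
for_each_permutation(sub { $count++; print "@_\n" }, 0 .. 2);
print "$count orderings\n";
```

To use it with the script above, one option is to pass 0 .. 8 as the items and read the current ordering from @_ inside the callback, where the original loop reads @array.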

Comments
2006-02-21 13:07:45
1.   Bob Timmermann
If I can get that program to run from the terminal emulation on my Mac, I will officially become a geek won't I?
2006-02-21 13:16:11
2.   Ken Arneson
Yes. Especially if you can figure out how to download and install Algorithm::Permute from cpan.org.
2006-02-21 14:51:16
3.   Roman
This formula insists on batting Kotsay eighth and Kendall ninth.

"I'm smarter than any stinkin' formula!"

--Ken Macha

2006-02-21 14:58:08
4.   Zachary D Manprin
Cool.

I would really like to see data involving pitches per plate appearance (P/PA). The OBP and SLG statistics are great starting points.

The comments are great for leading up to a few posts this week.

2006-02-21 15:47:50
5.   Ken Arneson
Beyond the Boxscore has a new formula based on DH-only leagues, which for some reason really minimizes the value of the #3 spot in the order. Using those numbers puts Kotsay in the #3 slot most of the time.

That's really weird, so I'm starting to question those numbers. Either that, or we need to have a radically different view of batting orders when there's a DH than we're used to.

2006-02-21 15:52:53
6.   Ken Arneson
P/PA would be cool, as would having L/R splits. I don't know of any existing projections with L/R splits, but I suppose you could calculate a L/R Marcel projection.
2006-02-21 16:50:53
7.   For The Turnstiles
It's pretty easy to see what's going on here. If the model says that the #9 hitter has the smallest effect on run production (which is probably true, but not to the extent that the original version claimed), then you'll certainly want to hide your least productive hitter there. And if it further claims (as in the revised version) that, of the 1-8 slots, slugging matters least for the #3 hitter, then you'll probably want to put Kotsay, with the lowest projected slugging other than Kendall, at #3, especially on a team with such a small range of expected OBPs.

The question is whether to believe the model in the first place. It looks to me like the noise in the data is so great, that there isn't much that can be salvaged here. In any case, this seems to be a somewhat perverse way of trying to solve the problem of optimizing a batting order. Simulations are simpler and likely to give more useful results.

The drawback of simulations is that it takes around 100K games with a fixed lineup to get precision on the order of .01 runs/game, so it would be somewhat prohibitive to do it for all possible permutations. But that fact should also tell you why it's so hard to draw any useful conclusions from a few years of historical data.

Something like the following might be interesting, though: run a simulation with a typical lineup (based on league averages for each slot), and then vary OBP/SLG in each slot slightly to get a table like Morong's. The coefficients should look considerably less random than what we have here. Then you could apply Ken's script to any actual group of players. There would be some circularity in logic here (it would generate lineups that are optimal, given the constraint that they look something like the lineups that managers actually use), and this can be seen as either a bug (it might miss a better answer) or a feature (it gives answers that have some chance of actually being implemented).

I'm also looking forward to seeing what mgl/tango have to say about this subject in The Book.

2006-02-22 10:09:02
8.   Ken Arneson
Simulations are simpler than a 10-line script?

I get what you're saying, Turnstiles. Still, it seems a waste to disregard real game data, and use simulated data instead. Maybe there really is something going on here that we wouldn't capture in simulation. Maybe a hybrid solution would be better?

2006-03-16 11:43:33
9.   Dennis
What season stats did you use? Or are you using career stats? I started writing a simulator in C to churn through the permutations that is similar to the one that salb918 over at beyondtheboxscore.com wrote in MATLAB, but when I plugged in 2005 stats for your above lineup (using Payton), my estimated runs per season clock in way lower than what you got. I'm double checking to make sure I didn't make any typos in the stats (I'm using PAs, BBs, hits, 2Bs, 3Bs, HRs, SOs), but I wouldn't have expected to be 100 runs lower than what you got.

Comment status: comments have been closed. Baseball Toaster is now out of business.