ABOUT COSMIC LOG

Quantum fluctuations in space, science, exploration and other cosmic fields... served up regularly by MSNBC.com science editor Alan Boyle since 2002.

Alan Boyle covers the physical sciences, anthropology, technological innovation and space science and exploration for MSNBC.com. He is a winner of the AAAS Science Journalism Award, the NASW Science-in-Society Award and other honors; a contributor to "A Field Guide for Science Writers"; and a member of the board of the Council for the Advancement of Science Writing.

Check out Boyle's biography or send a message to Cosmic Log via cosmiclog@msnbc.com.



The science of baseball stats

Posted: Monday, March 24, 2008 11:50 AM by Alan Boyle


NIST / Notre Dame: A wind-tunnel test shows the turbulent air flow around a baseball.

Was there ever a team sport better suited for statistical modeling than baseball? The heart of the game involves one pitcher vs. one batter at a time, allowing for a dizzying array of individual statistics. The regular season, as well as the typical player's career, generally lasts long enough to build up an encyclopedia's worth of those statistics.

No wonder so many statisticians and physicists love to theorize about the game's winning factors - and no wonder new statistics are being created on a regular basis.

Batting averages and earned-run averages were just the start: Nowadays, you can track win shares and win probability, defense-independent ERA and range factor. But there's always a farther frontier for baseball analysis, and a couple of new twists came to light at last month's annual meeting of the American Association for the Advancement of Science.

As Major League Baseball kicks off the new season this week, here are a few Web links you just might get a kick out of - even if you're not a fan of the game:

SAFE or out?
The University of Pennsylvania's Shane Jensen caused a stir with his proposed method for judging fielding performance, involving a new statistic called spatial aggregate fielding evaluation, or SAFE.

"Things like hitting or pitching are a little bit easier to quantify ... because they're easy to tabulate. There's a finite number of outcomes," Jensen said. "Fielding is a much more challenging endeavor because you're trying to estimate people ranging toward a ball in play on a continuous surface."

SAFE uses mathematical modeling to determine the "overall measure of fielding quality" for each player in the 2002-2005 period - that is, how many runs each fielder saved or cost his team over the course of a season. The stir came about when Jensen noted that Yankees star shortstop Derek Jeter ended up rated as one of the worst fielders in the majors.

That sparked some choice headlines in the home of the Red Sox, the Yankees' archrivals: The Boston Herald headlined its blog item "Science Proves Derek Jeter Does Indeed STINK." Of course, Jeter's spot is still safe - if not because of his fielding, then because of his hitting and his history. Nevertheless, Baseball Musings' David Pinto explained that statistics like SAFE could make a difference when it comes time to evaluate trades and negotiate contracts.

"You're spending money, where can you get the runs?" Pinto said. "If we do fielding better than we've done in the past, here's a way of saying, 'Oh, I can have an edge over some other team by knowing that this person can save me 10 runs.' And 10 runs is usually a win."
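Pinto's rule of thumb lends itself to a quick back-of-the-envelope calculation. The Python sketch below simply divides runs saved by a runs-per-win figure; the 10-runs-per-win constant comes from Pinto's remark, while the function name and the two sample fielders are hypothetical. It is not part of SAFE, which estimates the runs themselves with a far more elaborate model.

```python
# Back-of-the-envelope conversion from runs to wins, assuming Pinto's
# rule of thumb that roughly 10 runs equal one win. Not part of SAFE.
RUNS_PER_WIN = 10.0  # rough rule of thumb, not an exact constant


def runs_to_wins(runs_saved: float, runs_per_win: float = RUNS_PER_WIN) -> float:
    """Convert runs saved (negative = runs cost) into approximate wins."""
    if runs_per_win <= 0:
        raise ValueError("runs_per_win must be positive")
    return runs_saved / runs_per_win


# Hypothetical fielders: one saves his team 10 runs, the other costs it 15.
for name, runs in [("Fielder A", 10.0), ("Fielder B", -15.0)]:
    print(f"{name}: {runs:+.0f} runs -> {runs_to_wins(runs):+.1f} wins")
```

Under the same assumption, the 25-run gap between the two hypothetical fielders works out to roughly 2.5 wins over a season.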

The model manager
OK then, how do you evaluate the contribution of the team manager? Swarthmore College's Steve Wang delved into the machinations of managers - how long they left their starting pitchers in the game, for example, or how many different lineups they used in the course of a season.

"Certain styles might be more effective with certain kinds of teams," he explained in a news release about his research. "A manager who prefers to stay with his starters might be best suited for a team with veteran starting pitching, whereas a team with fragile young arms might do best with a manager who uses his bullpen aggressively."

Wang grouped managers together into clusters, based on the similarities he saw in the statistics. Last year's division-leading managers in the American League - the Red Sox's Terry Francona, the Angels' Mike Scioscia and the Indians' Eric Wedge - clustered together as moderate managers in the pitching-related categories. They didn't get too hot about their pitchers, nor did they play things too cool. But in a follow-up e-mail, Wang cautioned that you can't read too much into that.

"I would be hesitant to put too much weight on that conclusion," he told me, "since I was not systematically looking for such correlations, and it's also not clear which way the cause-and-effect runs (i.e., whether being moderate causes success, or whether being successful enables the manager to be moderate, or yet some other relationship)."
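The post doesn't say which clustering technique Wang used, so the following Python sketch only illustrates the general idea with plain k-means on two pitching-related statistics. The manager labels, the two features (average innings per start and relief appearances per game) and every number are made up for illustration. The features are standardized first so that neither one dominates the distance calculation.

```python
from math import dist
from statistics import mean, pstdev

# Hypothetical data, not Wang's: (avg innings per start, relief appearances per game)
MANAGERS = {
    "Manager A": (6.6, 2.4),
    "Manager B": (6.5, 2.6),
    "Manager C": (5.9, 3.1),
    "Manager D": (6.0, 3.0),
    "Manager E": (5.3, 3.8),
    "Manager F": (5.2, 3.9),
}


def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1 (columns must vary)."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [tuple((x - m) / s for x, (m, s) in zip(row, stats)) for row in rows]


def init_centroids(points, k):
    """Farthest-point seeding: start with the first point, then repeatedly add
    the point whose distance to the nearest chosen centroid is largest."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Plain k-means; returns a cluster label (0..k-1) for each point."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(mean(col) for col in zip(*members))
    return labels


def cluster_managers(stats, k):
    """Group manager names by k-means on their standardized statistics."""
    names = list(stats)
    labels = kmeans(standardize([stats[n] for n in names]), k)
    return [[n for n, lab in zip(names, labels) if lab == j] for j in range(k)]


for group in cluster_managers(MANAGERS, k=3):
    print(group)
```

With these made-up numbers, the sketch should pair off the managers into three groups: two who stay with their starters, two moderates and two who lean on the bullpen. Farthest-point seeding keeps the result reproducible without random restarts. As Wang cautioned, a grouping like this only says which managers look alike; it says nothing about what causes success.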

Statistics vs. steroids?
A similar caveat would apply when considering whether statistics could be used to sniff out steroids. Jensen said it would be "incredibly difficult to infer any causation from a statistical analysis."

In an analysis written for The New York Times, Jensen and three of his colleagues at Penn take a look at the Roger Clemens steroid case and confirm that there was definitely something unusual about the pitcher's late-career surge. However, it would be impossible to attribute the surge definitively to steroid use, they said.

Part of the problem is that the information about steroid use in the majors is so murky. Perhaps if the players who used steroids could detail exactly when they used them - say, as part of an amnesty program - statisticians could check for correlations in the performance data.

"I think we can look for people who we might want to test more," Pinto said. "I think if we now see someone in their 30s having a huge career surge, that should raise a red flag. It can happen, but if it happens over two or three years, I might want to test him every month rather than twice a year."

Make your virtual pitch
For more about the science of bats and balls, check out this archived article, and this one, plus this report on last year's "gyroball" controversy.

Elsewhere on the Web, you should visit Alan Nathan's compendium of baseball physics and take a glance at this infographic from the San Francisco Chronicle. The Exploratorium has a great home page for "The Science of Baseball." If you want to find out for yourself just how hard it is to throw a strike, you can try your hand at pitching virtual curveballs, courtesy of NASA's "Aerodynamics of Baseball" Web site. The same site offers a HitModeler applet that shows you how different factors affect the trajectory of a batted ball.

Can you shed additional light on the scientific curiosities of the national pastime? Or would you care to come up with an alternative answer to my opening question, and expound upon the scientific glories of other sports? Basketball, perhaps ... or hockey? Feel free to call 'em as you see 'em in the comment section below.


Comments

there's really no reason to even bother playing 162 games anymore...the last couple of seasons, stats predicted the teams' win/loss numbers with startling accuracy...just go directly to the playoffs...which, thankfully, still provide anomaly...
Steve--Over the entire season, the stats are very good at predicting who will be in first place. But they still can't predict who will win an individual game. Each game is won on the merits of that particular game, and it is still fun to watch and enjoy. Yeah, the Yankees will always have a good team because they can afford to spend the $. But that doesn't mean the Brewers can't put on a good show.
Are you serious? The NL's best team, Arizona, won 90 games despite a negative run differential. They were a .500 team that played themselves into the best record in the league. There is plenty of variation in a regular season - that's a blanket statement you've made. And I would argue that any well-informed fan could make pre-season picks for standings with quite a bit of accuracy without the numbers.
Ummmm, maybe golf is as good or better with statistics. The reason it might be better is that it has a built-in normalization system: par. But as in baseball, golf has many not-quite-helpful statistics. And golf lacks some helpful ones. In baseball, you get a slugging %. In golf, we all know that Tiger has a huge slugging %, but there are no stats to back it up.
just funnin' witcha fellas...besides, as a Red Sox fan, I wouldn't get to experience Manny bein' Manny...which is drama, pathos, and a miracle of nature in and of itself...oh, yeah...Sox/A's are in the third...Matsuzaka showing some nerves...baseball in the AM...gotta love it, eh?
i agree that most well-informed fans can pick the winners without the numbers, but why the rage against the numbers? that's what i'll never understand. as for the dbacks, what they did is highly unusual, and there's some luck involved as well as maybe some determination from the players. but teams that pull that off will be few and far between.

You are all wrong! Until there is one division per league and all teams play the same schedule, you will never know who is the best. Anyone can get hot and win a tournament.
It's hard to use statistics about sport when the bets are laid on people that are buying the best statistics for business purposes, funny how they try and stay in front of things!
I still think the Detroit Red Wings will take the World Series, the b******s . . .
Too bad statistics can't predict injuries. As a Cubs fan, I've seen more than my share of season-ruining injuries from glass-bodied players such as Kerry Wood, Mark Prior, Moises Alou and Nomar Garciaparra. All the positive statistics in the world can't save a team that suffers injuries to crucial players at critical points in the season. Maybe it's time for some biological analysis as well...


