
Jesse Krailler runs Modern Box Score, which is experimenting with a new way of visualizing everything that happened in a baseball game. We're big fans of it, so we asked him to take us through the designs he discarded and the decisions he made.

Data visualization is hot right now. Very hot. DataVis, as the cool kids call it, is so hot that there seems to be a conference on the subject in a major city once a week. I am in an airport, returning from one such conference. That fact is relevant to this story only because this week’s conference was the second I’ve had the pleasure of attending in person, and the first is where this whole thing got started.

In the fall of 2013, I attended a large Data Visualization conference that had a small afternoon session on sports visualizations. Most of the presenters were discussing work they had done with sports primarily popular in Europe – namely soccer and rugby. The only U.S.-based presenters had done their work in either basketball or football. With baseball having, by far, the most publicly available historic data, I was surprised that it didn’t even make an appearance in this session.

One of the presentations I watched that day was an attempt to illustrate the individual performance of soccer players over the course of a game. The developers were using a series of glyphs representing players to show things like time of possession, shots and fouls. There's little variety to events that happen in a soccer match, so the amount of information to glean from the visualization wasn’t huge.

I got back to my room that night with these presentations stuck in my head. Baseball, I thought, is desperately lacking in interesting visualizations. I’m willing to bet that 95 percent of baseball analytics articles contain, at most, heat maps and/or scatter plots. Most of them are just text and tables. While these articles contain great analysis and insights, many don’t keep my attention. I like pictures. Pretty ones, at that.

Between major-league games (almost all in my city of residence, Cincinnati) and minor league games within driving distance, I usually get to watch 5-15 games live each season. Contrary to my anti-hoarding approach to life, I have a file drawer full of scorecards from games that I’ve attended. What I like about these scorecards, and the reason I save them, is that they give me enough information to recreate the game in my head. When I become senile, my only earthly possessions will be these scorecards, and I will play these games over and over in my mind’s eye. I’ll pretend that I remember being there, but I can’t possibly, and it won’t matter because I have all the information I need to keep that illusion alive.

All that leads me to the challenge I set for myself: create a picture that shows the action of an individual baseball game.

Sounds easy enough. It certainly doesn’t sound like something that could take the better part of two years.

I am a programmer, not an artist, so naturally my first creation attempt came with a screen in front of me instead of a piece of paper. One of the first things I learned about data visualization is that people like circles. Curves are, in general, more appealing to the human eye than corners. So naturally, I started with a circle.

Draft 1.

The most basic way to divide up a game is by inning, so I think it makes sense to divide the circle up into innings. I wanted my plot to get larger as the game went on, so I split the circle into rings, one for each inning, starting with the innermost ring and working outward. We already have this language of ‘top’ and ‘bottom’ when we talk about half-innings, so dividing the circle in half like an inning makes sense.

The next natural divisions of a game, after innings, are batters within an inning. For each half inning, we now have half of a ring, so let’s take that area and divide it equally into sections, one for each batter. This gives you some information visually about performance, since more plate appearances should increase the likelihood of a run-scoring inning.
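The geometry of this first circular draft can be sketched in a few lines. The following is a minimal Python illustration, not the author's actual code: the function name `batter_wedges`, the `ring_width` parameter, and the choice to put the top half on the angles 0 to π are all assumptions made for the example.

```python
import math

def batter_wedges(n_batters, inning, top=True, ring_width=1.0):
    """Return one (r_inner, r_outer, theta_start, theta_end) tuple per batter.

    Assumptions for this sketch: inning 1 is the innermost ring, every ring
    has the same width, the top half covers angles 0..pi and the bottom half
    covers pi..2*pi.
    """
    if n_batters < 1:
        raise ValueError("a half-inning has at least one batter")
    r_inner = (inning - 1) * ring_width
    r_outer = inning * ring_width
    start = 0.0 if top else math.pi
    # Each batter gets an equal share of the half ring.
    step = math.pi / n_batters
    return [
        (r_inner, r_outer, start + i * step, start + (i + 1) * step)
        for i in range(n_batters)
    ]
```

With equal shares, a half-inning with many plate appearances shows up as many narrow wedges, which is the visual cue for a busy inning described above.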

But now we need some way of knowing what each batter did. More circles? Sure, more circles.

Only, let’s call them dots this time. Let’s add a black dot to the end of each batter’s section that resulted in an out. We know there are a fixed number of them, so let’s start there. Now let’s add a white dot if the batter got a hit.

This first plot is really ugly, mostly because of the color choices, but from a visualization standpoint there's another problem: marking outs is, in a lot of cases, redundant. We know that the final plate appearance of an inning will always result in an out. We also know that if there are only three plate appearances in an inning, those are all outs. There are only a few marks that can be functionally used in a small space, so to waste the most obvious one, the black dot, on outs didn’t seem to be the best solution.

Draft 1b.

For a time, I continued pursuing the circular plot. I thought about using diagonal lines to show base hits. If I were purely concerned with aesthetics, I might have continued down this road, but in practice it was just too hard to read.

One thing I always want to know when I read a traditional box score is, “Where in the order did the production come from?” For some reason I love when the seven, eight and nine batters are responsible for a win. Maybe it’s just cheering for the underdog. If I used different shades for each batter, getting darker as the order progressed and then reverting when the lineup turns over, I could easily see if the hits were coming from the top, middle, or bottom of the order.
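The shading rule amounts to a small mapping from plate-appearance order to a gray level. Here is a hedged Python sketch of that idea; the function name `lineup_shade` and the starting gray value of 0.85 are assumptions for illustration, not values taken from the finished plots.

```python
def lineup_shade(pa_index, lightest=0.85):
    """Gray level for a team's pa_index-th plate appearance (0-based).

    1.0 would be white and 0.0 is black. The lead-off spot gets the lightest
    gray, the ninth spot gets black, and the shade reverts when the lineup
    turns over.
    """
    spot = pa_index % 9  # position in the batting order, 0 through 8
    return lightest * (1 - spot / 8)
```

A run of dark bars carrying hit marks would then point to production from the bottom of the order.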

Draft 2.

I don’t remember the exact moment I decided to change the form from circles to rectangles, but I almost feel that it’s important enough to the story to fabricate an event around the occasion. Let’s just say I was cleaning the bathtub, slipped, hit my head on the back of the toilet and there it was.

The corners of the rectangle fit perfectly with this plot. The shape naturally has two ninety degree ‘turns’ in the middle of each half. An out changes the dynamic of an inning completely, and should be emphasized. One thing I wish I could see in a traditional box score is production with no outs, one out or two outs, which this three-sided rectangular inning shape makes easy to digest at a glance.

Now that I had the outs issue settled, the marks fell easily into place. One, two or three black dots for a single, double or triple. A diamond for a home run. A white dot for a walk.

I still hadn’t tackled the issue of runs, though. I decided that dark outlines around the boxes would be a good choice, but I had two options. When I keep score live, I circle an action that results in a run, in addition to marking that the runner scored in his own box. It’s mostly like an RBI, but less rigid. I could go this route and draw the boxes around the plays on which a runner scored, but then I would be required to draw multiple lines around a single box, up to four.

This could have been a space issue, but more importantly it was a baseball philosophy issue. Less emphasis should be placed on RBIs and more on total bases, correct? I doubt you’d be reading this publication if you thought otherwise.

So that settles it. The boxes would be drawn around the batter who would eventually score. Each rectangle would have a maximum of one outline, and counting the outlines will give you the final score. I may or may not have confirmed this decision with a doodle made during a conference call that had nothing to do with baseball.

Draft 2b.

The white-board doodle above also shows an early attempt at showing pitching changes. I love the drama of a mid-inning pitching change. I also love knowing how ugly a situation is when a pitcher leaves or enters a game. It’s really hard, if not impossible, to decipher this from a traditional box score. Eventually the dark lines in this drawing changed to triangles. The triangles are also nice because they naturally remind you which direction you’re moving in the plot; they rotate with each out so they're always pointing toward the next batter.

While simultaneously making all of these design choices, I was also struggling with the technology of this whole undertaking. I needed a way to get the data, some basic math to figure out where all the elements were to be placed in relation to each other, and a plotting system that could handle all this.

Luckily, my day job requires me to be well versed in R, which can handle all of these steps. It’s easy to doodle rectangles, but it’s much harder to take a table full of words and letters and translate them to coordinates. I ran into all sorts of programming bugs along the way, and am still finding them. Pinch runners who eventually scored were a real challenge. Also, if the play description involved a player named “Walker,” my code would see the “Walk” and put a white dot there. Jarrod Saltalamacchia’s name is too long for Baseball-Reference's code, apparently. It randomly shortens it to Saltalamacchi depending on which table you’re looking at. And I will never again cheer for a player with a space in his last name. I’m looking at you, Scott Van Slyke.
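The “Walker” problem is a classic substring-matching bug, and one common remedy is to match on word boundaries. The sketch below is in Python rather than the R used for the project, and the play-description strings are invented for illustration; neither the function names nor the text format should be read as the actual code or data.

```python
import re

# Match "Walk" only as a whole word, so "Walker" does not count.
WALK_EVENT = re.compile(r"\bWalk\b")

def naive_is_walk(description):
    """Buggy version: any description containing 'Walk' looks like a walk."""
    return "Walk" in description

def is_walk(description):
    """Stricter version: 'Walk' must stand alone as a word."""
    return WALK_EVENT.search(description) is not None
```

On a made-up description such as `"Groundout: 2B-1B; Walker to 2B"`, the naive check reports a walk while the word-boundary check does not. A name that is itself a whole word matching an event would still need separate handling, for example by checking only the leading event field of the description.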

Final Draft.

This is the final product, a Modern Box Score. This is the opening game of the 2015 season.

To remind you of what this means, now that you have a final product to map it against:

Each batter is one rectangle and each “ring” of rectangles is an inning (starting from the inside, moving out). If an out is made, the bars turn right and continue on the next side of the ring. If the result of the plate appearance is not an out, the next bar is placed next to the previous one. If the batter eventually scores, a box is drawn around their plate appearance. The color of the bars represents the spot in the batting order (lightest gray = lead-off, black = ninth spot). Walks are white dots; doubles are double dots; homers are diamonds. You can see how many pitchers there were and the situation they faced entering and leaving; you can see the rhythm of scoring, both within innings and within the game. And, as the game nears its conclusion, in the late innings where leverage so often goes up, the bars get bigger, too. In a close game, the bars match the sense of importance on each plate appearance.
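The placement rule in that description (stay on the same side after a non-out, turn the corner after an out) can be written as a short loop. This is a simplified Python sketch under stated assumptions: the function name `place_half_inning` and the boolean input format are hypothetical, each plate appearance is assumed to produce at most one out, and outs on the bases such as a caught stealing are ignored.

```python
def place_half_inning(results):
    """Assign each plate appearance a (side, slot) position.

    results: list of booleans, True if the plate appearance ended in an out.
    side: number of outs before the plate appearance (0, 1 or 2), which is
          the side of the three-sided half ring the bar sits on.
    slot: position of the bar along that side, counting from 0.
    """
    placements = []
    outs = 0
    slot = 0
    for made_out in results:
        placements.append((outs, slot))
        if made_out:
            # Turn the corner: the next bar starts a new side.
            outs += 1
            slot = 0
        else:
            # Same side: the next bar sits beside this one.
            slot += 1
    return placements
```

For a half-inning that goes single, out, out, walk, out, the bars land at side 0 (two bars), side 1 (one bar) and side 2 (two bars), so production with no outs, one out and two outs can be read off the three sides.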

I’m still tweaking things on the programming side, but for the most part, the only hurdle I have to overcome now is my impatience while waiting for Baseball-Reference.com to update every day. I’m very happy with the product I have now, but that doesn’t mean that it won’t change over time.

The whole point of this product was to make an image of a baseball game that’s readable and digestible for baseball fans, not just one baseball fan (me), so I’d love to hear your thoughts.

Jesse Krailler is a life-long resident of Cincinnati, Ohio, where he has a day job working as a statistical programmer. He has a lovely wife, two small children, a sentimental affinity for Marty Brennaman and likes making pretty pictures out of baseball data. Contact him here.

apbadogs
4/14
I honestly don't know how to figure out that graph.
GreenvilleGent
4/14
Pokeball?
GreenvilleGent
4/14
"Mookie Betts, I choose you!!!"
Chesty
4/14
The acronym KISS would be a fine ending to this Modern Box Score. Let's put the box aside and get Pete Rose in the Hall of Fame.
Kongos
4/14
Interesting idea, but I think that FanGraphs has already done the same thing, in a much more easily understood way: http://www.fangraphs.com/wins.aspx?date=2015-04-13&team=Cubs&dh=0&season=2015 In a way, the problem with your version is that there is still too much data -- and the wrong kind of data. What does it matter how many outs there are? Why is that the variable that defines the shape of the graphic? In the FanGraphs version, the score drives the graphic. You can see that the game was all Reds until Soler hit his home run and Votto got his double.
redguy12588
4/14
The picture for this post on the home page looks like the album art from the new modest mouse album
whatzitmather
4/14
I'll offer a contrary opinion and say that I think this is a cool way to visualize a game, if you're willing to spend a little time getting familiar with it (it gets easier when you've seen a few of them). This isn't intended to give you a complete play by play summary or show you swings in Win Expectancy. It's a replacement for a boxscore - something you can look without scrolling/clicking around and get an idea of where the action happened. Most people would probably look at a scorecard and say that they "don't know how to figure it out". It takes a modicum of effort to know how to keep score and interpret someone's card - same thing is true here. Nice work Jesse - I'm sure you'll come up with further improvements, but thanks for sharing the creative process.
GoTribe06
4/14
Completely agree! This graphic seems to make it easy to recreate a game's important events, which is exactly what I do with my old scorecards. Thanks for the look behind the design. I think a good litmus for a design is how obvious it seems once it is complete. I imagine starting with a circle would be the common approach, but the square, with outs as the corners, just seems so obvious. As a sometimes HMI developer, I appreciate the balance of complexity (shading the batter, indicating pitching changes) with a clean and attractive appearance.
swarmee
4/14
I get it; but without a companion lineup, knowing who the 3rd batter for the bottom team is omits important data. And the greyscale look, while clean, is not attractive. Maybe it was done for future printing in newspapers, but most people seeing it would look right past. In other words, I can see how the game unfolded, but I don't know which relievers came in, or if Mike Trout hit a home run unless I have something else to go on. So it's less effective than a 19th century box score in that way.
swarmee
4/14
I'd also recommend adding inning numbers on the X-axis. Also considering making the third out of the bottom of the inning line up with the first batter of the second inning, in a sort of spiral?
swarmee
4/14
Something like this? Paste to text if formatting doesn't work.
  ____________
 |            |
 |  ________  |
 | |        | |
 | |        | |
 2 1        1 2
          | |
  ________| | 
             |
   __________|
swarmee
4/14
I had "" for the third outs in the first and second innings, but they are not displaying.
swarmee
4/14
Your software is omitting backslashes.
mbabramo
4/14
The beauty of this is that every play is visualized, but relative to traditional scoring, it sacrifices the ability to see how any particular player performed. There should be a version that has the list of players on the right side (including notes about substitutions). To make it easy to follow a particular player, a color gradient instead of black and white should be used. This would make it easier than with traditional scoring to see, e.g., exactly what situations Mike Trout batted in and how he performed. Right now, it's a lot harder. Also, this would make the top/bottom half distinction much more intuitive. It took me a while to understand that. Another objection is that you don't use the diagonals. It seems to me that this would be an excellent place to note how the out was recorded, perhaps with just one letter (K, F, G, etc.). The third out could be recorded in a line dividing the top half and bottom half, with lines or spacing used to associate them with the graphics below or above them. I understand, of course, that you want to avoid too much information, but I think a discrete letter on the diagonals would be fairly minimalist. (By the way, you could also add a couple more glyphs. I think it odd that someone who reaches on an error or gets hit by a pitch gets no glyph.) The most radical suggestion I would make would be to turn this into a hexagon rather than a square. The top of the inning first out would go from 9:00 to 11:00, second out from 11:00 to 1:00, third out from 1:00 to 3:00; the bottom of the inning would go from 3:00 to 5:00, 5:00 to 7:00, then 7:00 to 9:00. This would serve two purposes. First, it would again highlight the bottom half/top half distinction. One thing that makes this counterintuitive now is that the top of the inning looks visually like a continuation of the bottom half. (Denoting the third out would help but not solve this problem.) 
Second, you are devoting much more space to one out situations than situations with zero or two outs. This makes no sense to me. If you made an equilateral hexagon, that would no longer be an issue. Of course, an alternative would be to make the box score more vertical (twice as vertical as horizontal). A further step would be to make this interactive, so that rolling the mouse over or touching gives more detail about a play (and maybe the players).
ravenight
4/14
I think this visualization is pretty awesome, and the suggestions above (especially the hexagonal shape to de-emphasize one-out situations, and the use of the diagonals / dividing line to denote how the out was made) would improve it. I also think that the spiral idea is a potentially interesting one, emphasizing the flow of the plays. One way to add the type of outs might be to extend each box into the adjacent space (perhaps with an extra trapezoidal section in the case of the hexagon layout) to visually tie the indicator (F, G, K) to the AB during which it occurred. This would also help display mid-AB outs like pick-offs or CS (you could extend the color of the AB across both edges, with a CS or PO in the spot for out type) and double-play outs (though I suppose that those could be represented with a gap on the corresponding edge - in other words, if there's a DP with 1 out, then there was no AB with 2 outs). It's true that this display makes it tough to pick out the performance of an individual. I wonder if it would be useful to cycle through some set of background textures for the boxes to allow the shading to be more distinct. For example, if you have empty texture, a light cross-hatch, and a light pinstripe or herring bone, then you could have 3 shades of gray / black, and perhaps switch to a dark blue for subs. Then even a pinch runner could be displayed by swapping the color/texture combo in the middle of a box. So given all of that, perhaps the display would now be too cluttered. It would also still fail to indicate how and when runners advanced on plays like a wild pitch, sac fly/bunt, SB, etc. In most cases that's not too important, but it certainly loses some of the cool details of a game if it isn't there. There's also the question of reached-on-error vs. hit. 
If you really want to go nuts, you could potentially treat each AB as a timeline, putting a little tick on the top/bottom for each pitch (to indicate ball/strike), and drawing a line of colored squares to indicate base-state at the start of an AB and at any moment it changes during an AB. So an AB might look like |,'',** where the line at the front shows the base-state, the commas are strikes, the quotes are balls, and the two stars indicate a double. Of course, this could become quite cluttered and downright horrible if there was, say, a 7-run first inning; this would work better as the "expanded display" of an AB. So perhaps the real answer is that an interactive display is the only way to keep the nuances but leave a clean presentation - if something looks odd, you can drill down, but otherwise you leave the detail out.
tylersnotes
4/14
i think the promise of this is to be fulfilled when this can be an interactive graphic. touch or mouseover each outcome to get the play in a highlight, for instance. the box score itself was a novel invention and purposely included (and excluded) items at the creator's discretion (or whimsy, as some may argue). It was a great format for newsprint. it's also not particularly intuitive if you aren't already familiar with baseball, and its numbers don't tell you anything about the cause, just the outcome (runs, hits, errors). I think there are 2 potentially conflicting arguments here. One is that with the amount of information at our fingertips, the box score idea itself is unnecessary; we can get the game recap in other ways that allow for more information to be processed but maybe don't allow as easily for quick scanning. The second is that most of us want things in smaller and smaller tidbits. I think this tool attempts to find the balance there. It looks weird at first but if you spend some time with it, it makes a lot more sense.
whatzitmather
4/14
Completely agree with there being a lot of potential in a mouseover/interactive version. Just one thought, mouseover to see something like LI at start of event or WPA added from event. Or a dropdown of options that would change what was displayed when each event box was clicked/hovered over... One way around the lack of names would be do as some scorekeeping methods do - just use uniform number. A legend on the side could supply the actual names.
DetroitDale
4/14
There's nothing wrong with the current box score. Things don't have to change every few years just for the sake of change.
tylersnotes
4/14
I think this is objectively not true. The current box score was created some time around 1850 and does not include anything that would reflect some of the skills that we now know to be a major part of the game-- on base skills being the most obvious. Hits, Runs, and Errors may have been Henry Chadwick's favored measures of productivity but they can't give a complete picture of a game, which is the goal of a box score. How many times did a team get on base? How did they get on base? What happened once they got there? This info can be achieved without too much effort, and it can give us a ton of information about the game that the current box score doesn't provide until you dig down into the individual line scores for pitchers and players and start inferring outcomes for each inning. Whether or not this approach is an adequate solution to the issue can be debated, but people have been pointing out what's wrong with the current box score at least since Bill James. You can actually draw a pretty clear line through the creation of Chadwick's box score to the under-emphasis on on base skills through baseball history.
lsommerf
4/15
It all depends what you are looking for, of course, but Bill James suggested a revised box score 30 years ago that captured every at bat about as well as a scorecard, using about as little space as a newspaper box score. His method, meant for newspapers at the time, was not graphical, yet captured more information than what is suggested here. So while this graphical representation is interesting, and the possibilities as it further develops are intriguing, for pure usefulness and picturing years later what happened in the game, the Bill James method still seems superior to me.
84538411
4/14
but I'm not even on your lawn
worldtour
4/14
Wait. Where are the names again? Nice effort but not even Olt fits, forget Saltalamacchia!
davescottofakron
4/14
Word missing in second paragraph? "work they done"
Hookalakah
4/14
The utility of this box score versus the traditional boxscore depends, of course, on what information the creator wants to share. This is more of a story than a summary. It tells the story of a game in an appealing, easy-to-understand way, I think. I'd like more ability to summarize myself, though. For example, simply put the player's uniform number in each box, and forget the name. We can look that up. I'd also like to know how each out was made, in a clean typeface (like Helvetica). An unobtrusive K or 4-3, maybe shaded a little, wouldn't crowd the box. Nice job, Jesse.
MartnAR
4/14
I like this though I do agree that there are some things that can be tweaked to make it more friendly. I had to look at three or four of these with the game recap in order to understand what information was being displayed. That said, having the lineups and pitchers used would help to understand the situation. Also, switching to a hexagon sounds like a great idea - not so sure about the spiral though. Finally, maybe you could colour code it to match the teams that are playing rather than have them all go from grey to black. Other than this, I think you're onto something great Jesse! Keep us posted about how your boxscore keeps evolving.
ptakers
4/14
My main issue is this - how in the world can I possibly see at a glance how Anthony Rizzo did in that game? A traditional boxscore takes milliseconds to get that info.
adamsternum
4/14
this is crazy and silly and why I subscribe to BP. One day it would be cool to have a drag and drop design your own box score tool, which would then be updated with the feed of MLB results.
hotstatrat
4/14
Perhaps I'm an old coot, but this doesn't do it for me. A regular scoresheet doesn't take up much more room than this - gives more details of how each inning developed, is intrinsically much easier to follow, and allows you to see what each individual did. That's what we use to recall a game - a boxscore is mainly for seeing how individuals did and this doesn't address that at all.
matrueblood
4/14
I wonder if folks are getting hung up on the name for this thing, instead of appreciating what it really is. It's not a box score; it doesn't begin with the same intention or achieve the same things that a box score does. But under a different name, I hope everyone can see that it's a neat shorthand version of the game. Like a more richly detailed *line* score.
kvamlnk
4/14
Exactly. It's an alternative to a score sheet, NOT the newspaper box score. That's clear from the opening paragraphs. The title is a bit misleading. And I totally agree with the comments above that this makes the most sense as an interactive framework for "viewing" a game as an alternative to a score sheet.
edwinblume
4/14
What would a double play look like? A triple play? In the top of the second, the Cardinals inning ends with a hit. That's not the way it usually works, unless someone is thrown out on the bases. Indeed, the play-by-play shows the inning ending on a caught stealing. That's not really shown well in the graph. This is an intriguing concept. I hope you keep going with the idea.
lipitorkid
4/15
I suggest the following four items for the diagonal. Starting at the top left and working clockwise: 1. Drinks consumed: dot for a beer, empty circle for anything else 2. Food consumed: dot for a hotdog, circle for nachos or peanuts, triangle for meal items, square for anything sweet. with both of the above you can quickly see how boring/nervous a game was 3. A dot every time something happened in the inning that a Cardinal fan would clap for. A double dot for a standing ovation, a disapproving face if it reminded you of Puig or anything fun. 4. A star for every inning where you wish a friend could have seen what you saw. A face with the eyes crossed out when you saw an inning that made you wish your friend and not you had gone to the game.
kalimantan
4/15
I'll just say that I love the idea, I love different ways of representing information in graphical form. The hexagon / rainbow colours approach sounds an interesting adaptation.
tylersnotes
4/15
i would like to see this using numbers instead of player names on jackie robinson day