Jesse Krailler runs Modern Box Score, which is experimenting with a new way of visualizing everything that happened in a baseball game. We're big fans of it, so we asked him to take us through the designs he discarded and the decisions he made.
Data visualization is hot right now. Very hot. DataVis, as the cool kids call it, is so hot that there seems to be a conference on the subject in a major city once a week. I am in an airport, returning from one such conference. That fact is relevant to this story only because this week’s conference was the second I’ve had the pleasure of attending in person, and the first is where this whole thing got started.
In the fall of 2013, I attended a large Data Visualization conference that had a small afternoon session on sports visualizations. Most of the presenters were discussing work they had done with sports primarily popular in Europe – namely soccer and rugby. The only U.S.-based presenters had done their work in either basketball or football. With baseball having, by far, the most publicly available historic data, I was surprised that it didn’t even make an appearance in this session.
One of the presentations I watched that day was an attempt to illustrate the individual performance of soccer players over the course of a game. The developers were using a series of glyphs representing players to show things like time of possession, shots and fouls. There’s little variety in the events of a soccer match, so there wasn’t much information to glean from the visualization.
I got back to my room that night with these presentations stuck in my head. Baseball, I thought, is desperately lacking in interesting visualizations. I’m willing to bet that 95 percent of baseball analytics articles contain, at most, heat maps and/or scatter plots. Most of them are just text and tables. While these articles contain great analysis and insights, many don’t keep my attention. I like pictures. Pretty ones, at that.
Between major-league games (almost all in my city of residence, Cincinnati) and minor-league games within driving distance, I usually get to watch 5 to 15 games live each season. Contrary to my anti-hoarding approach to life, I have a file drawer full of scorecards from games that I’ve attended. What I like about these scorecards, and the reason I save them, is that they give me enough information to recreate the game in my head. When I become senile, my only earthly possessions will be these scorecards, and I will play these games over and over in my mind’s eye. I’ll pretend that I remember being there, but I can’t possibly, and it won’t matter because I have all the information I need to keep that illusion alive.
All that leads me to the challenge I set for myself: create a picture that shows the action of an individual baseball game.
Sounds easy enough. It certainly doesn’t sound like something that could take the better part of two years.
I am a programmer, not an artist, so my first attempt naturally happened in front of a screen instead of a piece of paper. One of the first things I learned about data visualization is that people like circles: curves are, in general, more appealing to the human eye than corners. So I started with a circle.
The most basic way to divide up a game is by inning, so it made sense to divide the circle into innings. I wanted my plot to grow larger as the game went on, so I split the circle into rings, one for each inning, starting with the innermost ring and working outward. We already talk about the ‘top’ and ‘bottom’ of an inning, so splitting each ring in half, one half per half-inning, made sense too.
The next natural division of a game, after innings, is batters within an inning. Each half-inning now has half of a ring, so let’s divide that area equally into sections, one for each batter. This already says something about performance at a glance, since more plate appearances should increase the likelihood of a run-scoring inning.
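As a rough sketch of that geometry (in Python rather than the R the author actually used), each half-ring can be split into equal angular wedges. The layout conventions here are assumptions for illustration only: inning k occupies radii k-1 to k, the top half-inning spans angles 0 to π, and the bottom half spans π to 2π. The helper name `batter_wedges` is hypothetical.

```python
import math


def batter_wedges(inning, n_batters, half):
    """Split one half-ring into equal wedges, one per plate appearance.

    Hypothetical layout: inning k occupies radii k-1 to k; the top half-inning
    spans angles [0, pi) and the bottom half spans [pi, 2*pi).
    Returns (start_angle, end_angle, inner_radius, outer_radius) tuples.
    """
    if half not in ("top", "bottom"):
        raise ValueError("half must be 'top' or 'bottom'")
    if n_batters < 1:
        raise ValueError("a half-inning needs at least one batter")
    start = 0.0 if half == "top" else math.pi
    width = math.pi / n_batters  # each half-ring covers pi radians
    inner, outer = inning - 1, inning
    return [
        (start + i * width, start + (i + 1) * width, inner, outer)
        for i in range(n_batters)
    ]


# Bottom of the 3rd with four batters: four wedges of pi/4 radians each.
for wedge in batter_wedges(inning=3, n_batters=4, half="bottom"):
    print(wedge)
```

Because later innings sit on rings with larger radii, their wedges cover more area, which matches the goal of having the plot grow as the game goes on.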
But now we need some way of knowing what each batter did. More circles? Sure, more circles.
Only, let’s call them dots this time. Let’s add a black dot to the end of the section of each batter whose plate appearance resulted in an out. We know there are a fixed number of them, so let’s start there. Now let’s add a white dot if the batter got a hit.
This first plot is really ugly, mostly because of the color choices, but from a visualization standpoint there’s another problem: marking outs is, in a lot of cases, redundant. We know that the final plate appearance of an inning will always result in an out. We also know that if there are only three plate appearances in an inning, those are all outs. Only a few kinds of marks can be used effectively in a small space, so wasting the most obvious one, the black dot, on outs didn’t seem like the best solution.
For a time, I continued pursuing the circular plot. I thought about using diagonal lines to show base hits. If I were purely concerned with aesthetics, I might have continued down this road, but it was just too hard to read practically.
One thing I always want to know when I read a traditional box score is, “Where in the order did the production come from?” For some reason I love when the seven, eight and nine batters are responsible for a win. Maybe it’s just cheering for the underdog. If I used different shades for each batter, getting darker as the order progressed and then reverting when the lineup turns over, I could easily see if the hits were coming from the top, middle, or bottom of the order.
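A minimal Python sketch of the shading idea, assuming a team’s plate appearances are numbered from 0 in game order and that substitutes take over the lineup spot of the player they replace. The 0.85 starting gray level and the helper names are hypothetical choices, not taken from Modern Box Score; only the rule (lead-off lightest, ninth spot black, reset when the lineup turns over) comes from the article.

```python
def lineup_spot(pa_number):
    """Batting-order spot (1-9) for a team's plate appearance number (0-based).

    Assumes substitutes inherit the spot of the player they replace, so the
    order simply cycles back to 1 after the ninth hitter.
    """
    return pa_number % 9 + 1


def gray_for_spot(spot, lightest=0.85):
    """Gray level for a lineup spot, where 1.0 is white and 0.0 is black.

    Spot 1 (lead-off) gets the lightest gray and spot 9 gets black; the
    0.85 default is an arbitrary illustrative choice.
    """
    if not 1 <= spot <= 9:
        raise ValueError("spot must be between 1 and 9")
    return lightest * (9 - spot) / 8


def to_hex(level):
    """Convert a 0-1 gray level to an RGB hex string for plotting."""
    v = round(level * 255)
    return f"#{v:02x}{v:02x}{v:02x}"


# The shade darkens through the order, then resets when the lineup turns over.
for pa in range(11):
    spot = lineup_spot(pa)
    print(pa, spot, to_hex(gray_for_spot(spot)))
```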
I don’t remember the exact moment I decided to change the form from circles to rectangles, but I almost feel that it’s important enough to the story to fabricate an event around the occasion. Let’s just say I was cleaning the bathtub, slipped, hit my head on the back of the toilet and there it was.
The corners of the rectangle fit this plot perfectly. Each half of the shape naturally has two ninety-degree ‘turns’ in it. An out changes the dynamic of an inning completely, and should be emphasized. One thing I wish I could see in a traditional box score is production with no outs, one out or two outs, and this three-sided rectangular inning shape makes that easy to digest.
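The turn-on-each-out rule can be sketched as grouping a half-inning’s plate appearances by how many outs there were when each batter came up. This Python sketch assumes each plate appearance arrives as a (batter, outs recorded on the play) pair and ignores outs that are not part of a plate appearance’s result, such as a runner caught stealing; `assign_sides` is a hypothetical name.

```python
def assign_sides(plays):
    """Place a half-inning's plate appearances on the three sides of its shape.

    plays: (batter, outs_on_play) pairs in order, where outs_on_play is how
    many outs the plate appearance produced (0, 1, or 2 for a double play).
    Side 0 holds batters who came up with nobody out, side 1 with one out,
    and side 2 with two out; each out "turns the corner" onto the next side.
    """
    sides = {0: [], 1: [], 2: []}
    outs = 0
    for batter, outs_on_play in plays:
        if outs >= 3:
            raise ValueError("plate appearance listed after the third out")
        sides[outs].append(batter)
        outs += outs_on_play
    return sides


# Leadoff single, strikeout, walk, inning-ending double play.
print(assign_sides([("A", 0), ("B", 1), ("C", 0), ("D", 2)]))
```

In this sketch a double play leaves one side empty; how the published plot handles that case isn’t covered in the text.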
Now that I had the outs issue settled, the marks fell easily into place. One, two or three black dots for a single, double or triple. A diamond for a home run. A white dot for a walk.
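Those mark rules map naturally onto a lookup table. A minimal Python sketch, assuming hypothetical result labels such as "double" and "walk" (the article does not describe its underlying data format):

```python
# Hypothetical result labels; the glyph rules follow the article's description.
MARKS = {
    "single": ("black dot", 1),
    "double": ("black dot", 2),
    "triple": ("black dot", 3),
    "home run": ("diamond", 1),
    "walk": ("white dot", 1),
}


def marks_for(result):
    """Glyphs to draw on a batter's bar.

    Outs get no glyph because the turn in the shape already shows them;
    results missing from the table also get none in this sketch.
    """
    glyph, count = MARKS.get(result, (None, 0))
    return [glyph] * count


print(marks_for("double"))     # two black dots
print(marks_for("strikeout"))  # no glyph
```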
I still hadn’t tackled the issue of runs, though. I decided that dark outlines around the boxes would be a good choice, but there were two ways to use them. When I keep score live, I circle the play that results in a run, in addition to marking in the runner’s own box that he scored. It’s a lot like an RBI, but less rigid. I could go this route and draw outlines around the plays on which a runner scored, but then a single box could need multiple outlines, up to four.
This could have been a space issue, but more importantly it was a baseball philosophy issue. Less emphasis should be placed on RBIs and more on total bases, correct? I doubt you’d be reading this publication if you thought otherwise.
So that settles it. The outlines would be drawn around the batter who eventually scored. Each rectangle would have at most one outline, and counting the outlines gives you the final score. I may or may not have confirmed this decision with a doodle made during a conference call that had nothing to do with baseball.
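Outlining “the batter who eventually scored” means tracing each run back to the plate appearance where that runner reached base, including the pinch-runner case the author calls out as a challenge later on. A minimal Python sketch, assuming hypothetical inputs: a list of the team’s plate appearances in game order, a list of (plate appearance index, runner) run events, and a map from each pinch runner to the player he replaced (assumed free of cycles). None of these names come from the author’s code.

```python
from dataclasses import dataclass


@dataclass
class PlateAppearance:
    batter: str
    outlined: bool = False  # True once this batter (or his pinch runner) scores


def mark_runs(pas, run_events, pinch_ran_for):
    """Outline the plate appearance of every batter who eventually scored.

    pas: a team's plate appearances in game order.
    run_events: (pa_index, runner) pairs; runner crossed the plate during
        plate appearance pa_index (the batter himself, for a home run).
    pinch_ran_for: maps a pinch runner's name to the player he replaced.
    Returns the number of outlines, which should equal the team's runs.
    """
    for pa_index, runner in run_events:
        # Follow pinch-runner substitutions back to the original batter.
        while runner in pinch_ran_for:
            runner = pinch_ran_for[runner]
        # The run belongs to that batter's most recent plate appearance.
        for i in range(pa_index, -1, -1):
            if pas[i].batter == runner:
                if pas[i].outlined:
                    raise ValueError(f"plate appearance {i} already scored")
                pas[i].outlined = True
                break
        else:
            raise ValueError(f"no plate appearance found for {runner}")
    return sum(pa.outlined for pa in pas)


pas = [PlateAppearance(n) for n in ["A", "B", "C", "D"]]
# A singles and is replaced by pinch runner "P"; B walks; C homers.
runs = mark_runs(pas, [(2, "P"), (2, "B"), (2, "C")], {"P": "A"})
print(runs, [pa.outlined for pa in pas])  # 3 [True, True, True, False]
```

Since each run lands on exactly one plate appearance and no plate appearance can score twice, the outline count equals the run total, which is the property the design relies on.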
The whiteboard doodle above also shows an early attempt at marking pitching changes. I love the drama of a mid-inning pitching change. I also love knowing how ugly a situation is when a pitcher leaves or enters a game. It’s really hard, if not impossible, to decipher this from a traditional box score. Eventually the dark lines in this drawing changed to triangles. The triangles are also nice because they naturally remind you which direction you’re moving in the plot; they rotate with each out so they’re always pointing toward the next batter.
While making all of these design choices, I was also struggling with the technology behind the whole undertaking. I needed a way to get the data, some basic math to figure out where each element should be placed relative to the others, and a plotting system that could handle all of it.
Luckily, my day job requires me to be well versed in R, which can handle all of these steps. It’s easy to doodle rectangles, but it’s much harder to take a table full of words and letters and translate them into coordinates. I ran into all sorts of programming bugs along the way, and am still finding them. Pinch runners who eventually scored were a real challenge. Also, if a play description mentioned a player named “Walker,” my code would see the “Walk” and put a white dot there. Jarrod Saltalamacchia’s name is apparently too long for Baseball-Reference's code: it inconsistently shortens it to Saltalamacchi, depending on which table you’re looking at. And I will never again cheer for a player with a space in his last name. I’m looking at you, Scott Van Slyke.
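One common fix for the “Walker” problem is whole-word matching with a regular expression instead of a plain substring search. A minimal Python sketch; the play descriptions below are hypothetical examples, and real Baseball-Reference wording may differ.

```python
import re

# Match "walk" only as a whole word, case-insensitively, so a runner named
# "Walker" in a play description no longer looks like a base on balls.
WALK = re.compile(r"\bwalk\b", re.IGNORECASE)


def is_walk(description):
    return WALK.search(description) is not None


# Hypothetical play descriptions; real Baseball-Reference wording may differ.
for text in ["Walk", "Intentional Walk", "Single to RF; Walker to 3B"]:
    print(text, "->", is_walk(text))
```

The `\b` anchors require a word boundary on both sides, so “Walker” fails because an “e” follows “Walk”. This addresses only that specific false positive, not the other name-parsing issues described above.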
This is the final product, a Modern Box Score. This is the opening game of the 2015 season.
To remind you of what this means, now that you have a final product to map it against:
Each batter is one rectangle and each “ring” of rectangles is an inning (starting from the inside, moving out). If an out is made, the bars turn right and continue on the next side of the ring. If the result of the plate appearance is not an out, the next bar is placed next to the previous one. If the batter eventually scores, a box is drawn around their plate appearance. The color of the bars represents the spot in the batting order (lightest gray = lead-off, black = ninth spot). Walks are white dots; doubles are double dots; homers are diamonds. You can see how many pitchers there were and the situation they faced entering and leaving; you can see the rhythm of scoring, both within innings and within the game. And, as the game nears its conclusion, in the late innings where leverage so often goes up, the bars get bigger, too. In a close game, the bars match the sense of importance on each plate appearance.
I’m still tweaking things on the programming side, but for the most part, the only hurdle I have to overcome now is my impatience while waiting for Baseball-Reference.com to update every day. I’m very happy with the product I have now, but that doesn’t mean that it won’t change over time.
The whole point of this product was to make an image of a baseball game that’s readable and digestible for baseball fans, not just one baseball fan (me), so I’d love to hear your thoughts.
Jesse Krailler is a lifelong resident of Cincinnati, Ohio, where he works a day job as a statistical programmer. He has a lovely wife, two small children, a sentimental affinity for Marty Brennaman and likes making pretty pictures out of baseball data. Contact him here.