May 11, 2005
After a number of years as a software engineer and database designer for Silicon Valley technology firms, Mark Johnson jumped into sports with a consulting job for the MLB commissioner's office. He explored the statistical implications of various potential rule changes on a committee that included John Henry, owner of the Red Sox, and Rob Manfred, head of the Labor Relations Department at MLB. From that experience Johnson got hooked on applying his mathematical background to sports management and eventually ended up serving as Senior Analyst of Baseball Development for the 2004 NL Champion St. Louis Cardinals. Now back in Silicon Valley, Johnson sat down with BP during a recent San Jose Giants game to discuss his background, his experience with the Cardinals, and where he sees the most valuable applications of sabermetrics, both now and in the future.
Baseball Prospectus: Could you begin by telling our readers a little bit about your academic background? In the baseball industry it is a bit atypical.
Mark Johnson: I was an undergraduate at Indiana University, where I studied computer science and mathematics. During five years there I earned a bachelor's degree in computer science and a master's in mathematics. Immediately following that I went to Princeton University, where I earned a master's and a PhD in applied and computational mathematics. My areas of research were mostly in theoretical mathematics, but I tried to bring as much computational work as I could to bear on the theoretical problems I was solving. Following that I went to the University of Minnesota for one year as a post-doctoral researcher before deciding to leave the academic world and jump into the Silicon Valley workforce.
BP: You first became involved in sports by doing some consulting for the commissioner's office, is that correct?
MJ: In a professional sense, yes. I actually have experience from my undergraduate years; I was a student manager for Bob Knight during my four years at Indiana. In fact, I worked closely with, and was roommates with, Lawrence Frank and Calbert Cheaney. I also, of course, worked closely with Coach Knight and some of the assistants. That was my first real exposure to organized sports, but in a professional sense my first experience was working as a consultant to the commissioner's office in the summer of 2003.
BP: What exactly were you working on for the commissioner's office?
MJ: My brother (Matthew S. Johnson, a professor of Statistics at Baruch College of the City University of New York) and I were hired as consultants to the commissioner's initiative, a group organized to evaluate possible changes to baseball, including changes to the rules. We were brought in to add some statistical analysis to decisions that people in the commissioner's office and others within the initiative were looking to address. They had plenty of evidence to support some of these decisions, but they wanted to add a statistical component to the final decision-making process. So we helped in that regard.
BP: Did you find the commissioner's office was receptive to that statistical point of view, and was any of the work you did put into action, anything fans might have seen?
MJ: They absolutely were receptive. I was working on a subcommittee consisting of Rob Manfred and John Henry, the owner of the Boston Red Sox, who, as many people know, is a very analytically minded person. He was very interested in some of the analytical results that we came up with, and those results were definitely weighed heavily in the final decision.
I can't really speak about the details of what we worked on; however, it did have to do with some scheduling questions they were asking, and the outcome was that nothing changed. The rules never changed, and whether or not our work is what led to those decisions, I can't say.
BP: How did you end up working with the St. Louis Cardinals? Who brought you in, and what was your position supposed to entail?
MJ: After I worked for the commissioner's office, I put my best effort into carving out a niche for bringing analytics into sports. I spent a lot of time on the phone with athletic directors at universities and with commissioners of several major collegiate conferences, spoke with the NFL commissioner's office, obviously continued speaking with Major League Baseball's commissioner's office, and started to think about how I could start talking to actual clubs. Through some colleagues I was put in touch with Jeff Luhnow, who had recently been hired by the St. Louis Cardinals. After meeting him and talking with him about some of my ideas and the approaches I could bring to the table, Jeff eventually hired me in January of 2004.
The original charter for my position was, at a really broad-stroke level, to try to bring some innovative technology into baseball decision-making. The position drew on my software engineering and database management skills as well as my mathematical skills, both to help collect large amounts of data and to disseminate the information drawn from it. So, in a sense, part of the role was mathematical and another part was software engineering and database related.
BP: How did your first experiences with the Cardinals go?
MJ: Jeff was relatively new at the time I started, so it was a learning process to understand where we would have the most effect within the organization. As you know, the baseball schedule has a lot of very distinct decision-making points throughout the season. When I started in St. Louis at the beginning of February, the first real decision-making point on the horizon, in my eyes, was the June draft. Between the time I started and the beginning of June, I first had to learn more about the actual process--how the scouting organization was put together--and also try to understand and collect as much information as I could about high school and collegiate players, and then try to distill something from that information that could help us make better decisions on draft day, or in the months leading up to draft day.
So that was really the first objective, to understand the scouting process, and the draft itself, to try to bring in as much of my mathematical and software skills as I could in time for that draft. Beyond that, the other important deadlines that I played a little bit of a role in were the trade deadline in July as well as the minor league free agent season in October and subsequently the major league free agent season.
BP: It sounds like you were using your mathematical background in player evaluation...
MJ: In the baseball analytical world, at least the way I look at it, there are three categories where statistical analyses can help decision makers. One is at the commissioner's office level, when you evaluate rules to try to understand the impact that rule changes might have on the game. The second level is player evaluation, which is more of a front office decision-making process: putting together rosters, trying to understand the value of players to your team or the impact of certain contracts. The third level is actual in-game strategy, what sorts of strategies you can bring to the field.
My experience with the commissioner's office definitely fell into that first category, and my experience with St. Louis definitely fell into the second: helping the front office put together rosters, which, as you said, falls into the area of player evaluation techniques.
BP: How have the Cardinals balanced various approaches to baseball operations decision-making? There have been guys like you and Jeff Luhnow internally, others on the Board of Advisors, and then there are the more traditional scouts and player development people. How does the front office balance these voices?
MJ: Each of the decisions that I was somehow involved with was made by different people in the organization, who combined information in different ways. My objective was basically to be the "mathematical scout," and to give these decision-makers the mathematical scouting report, so to speak. In a lot of the decisions that were made last season there was a mix of scouting reports and statistical scouting reports, and each person handles the combination of that information differently.
In the case of the amateur draft, we have a lot of players that we don't know a lot about. We have scouts that go watch these players only a few times in the course of their careers, so in that sense the statistical information has more of an impact. It's, relatively speaking, a lot more information than we had beforehand. When you look at major league player evaluation, people have a lot of information about those players already. They've been watching them for years. They have many scouting reports on these players, and they have a lot of videotape on them. In my opinion, the statistical information is still valuable in those cases, absolutely it's still valuable, but maybe not, relatively speaking, as valuable as having additional statistical information on the larger pool of players from whom you need to make your draft selections. So the conclusion I've come to is that statistical evaluation has more impact at the amateur draft than at the major league level.
BP: Without getting into any areas that are covered by your non-disclosure agreement, is the way that a team like the Cardinals goes about evaluating players radically different from the statistical evaluation that you would see at Baseball Prospectus or at other venues that do that type of analysis? Is there stuff that's just a generation ahead, or is it relatively similar?
MJ: I would say it's more of the latter. There is a real limitation in this business that lies in the granularity of data that's available. I think most organizations doing any sort of statistical analysis, whether that be Baseball Prospectus or the St. Louis Cardinals, are working with fundamentally the same core data, which have certain limitations. I think if you put a few smart people on top of that data, they are going to reach some pretty similar conclusions. Definitely with different approaches, but similar conclusions come out the other end.
I think the real difference-maker is in establishing ways to get better granularity of data at every level. For example, at the major league level defense is the next chapter that people want to write and frankly, with all the data that I've seen available to clubs like the St. Louis Cardinals, the data aren't really at a granularity that's fine enough to make any real solid conclusions. At the other end of the spectrum, at the amateur draft, one would have a really hard time finding significant amounts of, let's say, high school statistical information. There's still a question of whether that information would be useful, but the story there is still that the data is relatively difficult to find.
I've digressed. To get back to your first question: I think that most organizations are probably doing slight variations of the same thing. I think the real difference-maker will be in collecting more data. Certain organizations may be investing money in finding a greater granularity of data, I don't know, but if they are, they're probably at the next level.
BP: Especially as it regards major league free agents, do you find that teams are spending much energy or time trying to figure out "the market"--the finances, the competition, what a guy is worth, quantifying value in dollar values? Every off-season we hear about the deals that "blew the market" with this offer for that guy that set up new comps for those players down the road. Is there time spent on this sort of analysis?
MJ: Clubs definitely do a significant amount of research on the market conditions before free agent offers are made and during the preparation for arbitration. I haven't really seen much compelling work in the publicly available literature (Ed note: Check out the work done by Baseball Prospectus' Ben Murphy, following the work of the late Doug Pappas), and I am sure that some clubs would stand to gain from understanding this market a bit better, but yes, clubs definitely do some work at evaluating the market.
BP: The original reason you came into the organization was to build a database for the team. Is that just a collection of statistics or is it more than that? What type of data warehousing does a major league team do to help them make decisions?
MJ: The most crucial thing, as far as the front office is concerned, is simply being able to collect scouting reports and find information in them. There is a process that begins out at the ballpark, with a scout writing down notes about a specific player. That information needs to find its way back to the front office. In St. Louis, we used some third-party software to help in that process, but at the end, before decisions are made, we sometimes need to collect all of our information and put it into a sensible and accessible presentation. So in addition to helping construct player evaluation models, I was also involved in collecting information from these scouting reports and putting it into final reports that the decision makers would use.
BP: As someone with a thorough professional background in mathematics and analytical approaches to decision-making, how do you view the current state of publicly available baseball analysis? Are there strengths and weaknesses in the body of work out there that jump out at someone like yourself, who has used the same sorts of approaches on problems in other fields?
MJ: This is a great question. To be completely honest, when I first started doing a background study on publicly available baseball research papers, I grew pretty frustrated pretty fast. I quickly convinced myself that I would need to reinvent the wheel, so to speak, in any area where real decisions were going to be made as the result of my work. I encountered cases where people who had developed "predictive" models never tested how well their models actually predicted; I encountered cases where the contents of a paper contradicted the conclusions it stated, and on and on. I really do believe that in some areas it is more efficient to start from scratch than to try to separate the signal from the noise in the literature. Don't get me wrong, there is some quality work out there; you just have to know where to look.
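Johnson's point about "predictive" models that were never checked for predictive accuracy comes down to a basic discipline: fit a model on one set of observations, then score it on observations it has never seen, against a naive baseline. The Python sketch below is a minimal illustration of that idea under stated assumptions, not a description of any method Johnson or the Cardinals used. The data are synthetic (random numbers from a fixed seed standing in for a prior-season and a next-season rate stat), the model is a plain least-squares line, and the helper names `fit_line`, `rmse`, and `holdout_rmse` are hypothetical.

```python
import math
import random
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def rmse(preds: List[float], actual: List[float]) -> float:
    """Root-mean-square error between predictions and actual values."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(preds, actual)) / len(actual))


def holdout_rmse(xs: List[float], ys: List[float],
                 train_frac: float = 0.7) -> Tuple[float, float]:
    """Fit on the first train_frac of the rows and score on the rest.

    Rows are assumed to be in chronological order, so the held-out rows
    stand in for "future" seasons. Returns (model_rmse, baseline_rmse),
    where the baseline always predicts the training-set mean.
    """
    cut = int(len(xs) * train_frac)
    a, b = fit_line(xs[:cut], ys[:cut])
    test_x, test_y = xs[cut:], ys[cut:]
    model_err = rmse([a + b * x for x in test_x], test_y)
    train_mean = sum(ys[:cut]) / cut
    baseline_err = rmse([train_mean] * len(test_y), test_y)
    return model_err, baseline_err


# Synthetic data (hypothetical, not real players): each "player" has a
# hidden skill level, and both seasons' stats are skill plus noise.
random.seed(0)
skill = [random.gauss(0.0, 1.0) for _ in range(200)]
prior_season = [s + random.gauss(0.0, 1.0) for s in skill]
next_season = [s + random.gauss(0.0, 1.0) for s in skill]

model_err, baseline_err = holdout_rmse(prior_season, next_season)
print(f"held-out RMSE: model {model_err:.3f}, mean-only baseline {baseline_err:.3f}")
```

A model that cannot beat the mean-only baseline on the held-out rows is not predicting much, however well it fits the rows it was trained on.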
BP: As someone who has worked on analytical approaches to baseball decision-making from both the outside and inside of a major league club, where do you see the next frontiers in sabermetrics? What questions do you think are most desperately in need of answers? Are there approaches that have been heretofore ignored? Is there anything that you yourself are working on independently?
MJ: There are two parts to this answer, in my mind. On the one hand, as I mentioned earlier, I think a next frontier lies in the collection of finer-granularity data on major league baseball events, as well as more thorough injury databases. Beyond better major league data, there is definitely a need for better collegiate, high school, and international information.
On the other hand, I think there really is great potential in bringing some cutting-edge machine learning technologies from the academic world into building more robust predictive tools in baseball. Academics continue to push the envelope with machine learning in many scientific applications, so staying on top of these technologies, or working with people who are on top of them, should shine some new light on separating the random components of baseball observations from the informative ones.
As far as what I am doing now, I have started assembling a consortium of a few statistics and applied math professors who I know from my graduate school days. Several of us will be meeting in a few weeks to talk about where we want to take this work, but I do believe sports and academics can be brought together in a practical way.
BP: What do you mean by "machine-learning technologies?"
MJ: Machine learning refers to computational approaches to finding patterns and structure in data. Voice recognition, as used in phone menu systems, and character recognition, as used when scanning entire libraries into a computer database, are built using machine-learning methods.
In baseball analysis, once we establish how to measure a player's contribution towards a team's success, we can start to think about the next step: how to predict it. In a simplified view, we take every piece of information about every person who's played the game and throw it into a big pot. We then let a machine learning system loose in the pot, sifting through all the data to find the information that is most useful in predicting future success, which is what statistical player evaluation techniques are aiming for. Neural nets, decision trees, support vector machines...those are some of the machine-learning buzzwords people might have heard of.
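Johnson lists decision trees among the machine-learning tools. The simplest decision tree is a single split, often called a decision stump: search every available variable and every cut point for the one rule that best separates successes from failures, which is a miniature version of "sifting through the pot" for the most useful information. The Python sketch below shows that search under loudly stated assumptions. It is a toy, not anything from the Cardinals: the feature names (`age`, `k_rate`), the six rows of data, and the 0/1 "success" label are all invented for illustration, and `best_stump` and `predict` are hypothetical helpers.

```python
from typing import Dict, List, NamedTuple


class Stump(NamedTuple):
    feature: str
    threshold: float
    predict_high: bool  # True: predict 1 when value > threshold; False: when <= threshold
    errors: int         # misclassified rows on the data the stump was fit to


def best_stump(rows: List[Dict[str, float]], labels: List[int],
               features: List[str]) -> Stump:
    """Exhaustively search features and cut points for the single split
    that misclassifies the fewest rows (a one-level decision tree).
    Assumes at least one row and one feature; ties keep the first found."""
    best = None
    n = len(labels)
    for f in features:
        values = sorted({r[f] for r in rows})
        # Candidate cuts: one below the minimum (everyone on the same side)
        # plus the midpoint between each pair of adjacent distinct values.
        cuts = [values[0] - 1.0] + [(lo + hi) / 2 for lo, hi in zip(values, values[1:])]
        for t in cuts:
            # Errors for the rule "predict 1 above t, 0 at or below t".
            err_high = sum((r[f] > t) != (y == 1) for r, y in zip(rows, labels))
            # The mirror-image rule gets exactly the other rows wrong.
            for predict_high, err in ((True, err_high), (False, n - err_high)):
                if best is None or err < best.errors:
                    best = Stump(f, t, predict_high, err)
    return best


def predict(stump: Stump, row: Dict[str, float]) -> int:
    """Apply the stump's rule to one row, returning 0 or 1."""
    above = row[stump.feature] > stump.threshold
    return int(above == stump.predict_high)


# Invented toy data: six hypothetical prospects, two made-up features,
# and a made-up 0/1 "success" label.
rows = [
    {"age": 22.0, "k_rate": 0.30},
    {"age": 21.0, "k_rate": 0.18},
    {"age": 19.0, "k_rate": 0.25},
    {"age": 20.0, "k_rate": 0.32},
    {"age": 23.0, "k_rate": 0.15},
    {"age": 18.0, "k_rate": 0.28},
]
labels = [0, 1, 1, 0, 1, 0]

stump = best_stump(rows, labels, ["age", "k_rate"])
print(stump)
print([predict(stump, r) for r in rows])
```

A real system would grow deeper trees (or use one of the other methods Johnson names), work from far more data, and judge the result on held-out players rather than on the rows it was fit to.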
BP: You left the Cardinals at the end of the season and returned to Silicon Valley, where you've gone back to doing the sorts of tech work that you did before you got into baseball. Do you see yourself getting back into sports, perhaps going back to work for another baseball club in the future, or is that it for you?
MJ: I definitely see myself being involved with sports in some capacity in the future. I continue to talk with executives in baseball as well as other sports. I've returned to Silicon Valley for now, and until November I'm sitting out a one-year non-compete clause that was part of my employment in St. Louis. I do hope at some point in the future to be involved with a baseball organization again.