After a number of years as a software engineer and database designer for
Silicon Valley technology firms, Mark Johnson jumped into sports with a
consulting job for the MLB commissioner’s office. On a committee that
included John Henry, owner of the Red Sox, and Rob Manfred, head of the
Labor Relations Department at MLB, he explored the statistical implications
of various potential rule changes. From that experience
Johnson got hooked on applying his mathematical background to sports
management and eventually ended up serving as Senior Analyst of Baseball
Development for the 2004 NL Champion St. Louis Cardinals. Now back in
Silicon Valley, Johnson sat down with BP during a recent San Jose Giants
game to discuss his background, his experience with the Cardinals, and
where he sees the most valuable applications of sabermetrics, both now
and in the future.

Baseball Prospectus: Could you begin by telling our readers a little bit
about your academic background? In the baseball industry it is a bit

Mark Johnson: I was an undergraduate at Indiana University where I studied
computer science and mathematics. During five years there I earned a
bachelor’s degree in computer science and a master’s in mathematics.
Immediately following that I went to Princeton University, where I earned
a master’s and a PhD in applied and computational mathematics. My areas of
research were mostly in theoretical mathematics, but I tried to mix in as
much computational work as I could into solving theoretical mathematical
problems. Following that I went to the University of Minnesota for one
year as a post-doctoral researcher before deciding to leave the academic
world and jump into the Silicon Valley workforce.

BP: You first became involved in sports by doing some consulting for the
commissioner’s office, is that correct?

MJ: In a professional sense, yes. I actually have experience from my
undergraduate years; I was a student manager for Bob Knight during my four
years at Indiana. In fact, I worked closely with, and was roommates with,
Lawrence Frank and Calbert Cheaney. I also, of course, worked closely with Coach Knight and some of the assistants. That was my first real exposure
to organized sports, but in a professional sense my first experience was
working as a consultant to the commissioner’s office in the summer of

BP: What exactly were you working on for the commissioner’s office?

MJ: My brother (Matthew S. Johnson, a professor of Statistics at Baruch
College of the City University of New York) and I were hired as
consultants to the commissioner’s initiative, which is a group organized
to evaluate possible changes to baseball, and changes to rules. We were
brought in to add a little statistical analysis to some of the decisions
that the people in the commissioner’s office as well as other people
within the initiative were looking to address. They had plenty of evidence
to support some of these decisions but they wanted to add a statistical
component to the final decision-making process. So we helped in that

BP: Did you find the commissioner’s office was receptive to that
statistical point of view, and was any of the work you did put into
action, anything fans might have seen?

MJ: They absolutely were receptive. I was working on a subcommittee
consisting of Rob Manfred and John Henry, the owner of the Boston Red Sox,
who, as many people know, is a very analytically minded person. He was
definitely very interested in some of the analytical results that we came
up with, and these results were definitely weighed heavily in the final

I can’t really speak about the details of what we worked on; however, it
did have to do with some scheduling questions that they were asking, and
the outcome of all those results was that nothing changed. The rules never
changed, and whether or not it was the result of our work that led to
those decisions is beyond my understanding.

BP: How did you end up working with the St. Louis Cardinals? Who brought
you in, and what was your position supposed to entail?

MJ: After I worked for the commissioner’s office I was putting in my best
effort to try to create a niche of moving analytics into sports, so I
spent a lot of time on the phone talking with athletic directors from
universities, with commissioners from several different major collegiate
conferences, speaking with the NFL commissioner’s office, obviously
continued speaking with Major League Baseball’s commissioner’s office, and
started to think about how I could start talking to actual clubs. I was
put in touch through some colleagues with Jeff Luhnow, who had recently
been hired by the St. Louis Cardinals. Upon meeting him and talking with
him about some of the ideas that I had and some of the approaches I could
bring to the table, Jeff eventually hired me on in January of 2004.

The original charter for my position was to, at a really broad-stroke
level, try to bring some innovative technology into baseball
decision-making. So there was a combination of both my software
engineering skills and database management skills as well as my
mathematical skills that were used in the position, both to help collect
and disseminate information from large amounts of data. So, in a sense,
part of the role was mathematical and another part was software
engineering and database related.

BP: How did your first experiences with the Cardinals go?

MJ: Jeff was relatively new at the time that I started, so it was a learning
process to understand where we would have the most effect within the
organization. As you know, the baseball schedule has a lot of very
distinct decision-making points throughout the season. For me, when I
started in St. Louis in the beginning of February, the first real
decision-making point on the horizon, in my eyes, was the June draft.
Between the time I started and the beginning of June I first had to learn
more about the actual process–how the scouting organization was put
together–and also try to understand and collect as much information as I
could about high school and collegiate players, and then to try to
distill something from that information that could help us make better
decisions on draft day, or in the months leading up to draft day.

So that was really the first objective, to understand the scouting
process, and the draft itself, to try to bring in as much of my
mathematical and software skills as I could in time for that draft. Beyond
that, the other important deadlines that I played a little bit of a role
in were the trade deadline in July as well as the minor league free agent
season in October and subsequently the major league free agent season.

BP: It sounds like you were using your mathematical background in player

MJ: In the baseball analytical world, at least the way that I look at
it, I think there are three categories where statistical analyses can help
decision makers. One is at the commissioner’s office level, when you
evaluate rules to try to understand the impact that rule changes might
have on the game. The second level is looking at player evaluation; it’s
more of a front office decision-making process. That’s putting together
rosters, trying to understand the value of players to your team or the
impact of certain contracts. The third level is the actual in-game
strategy: what sort of strategies you can bring to the field.

My experience with the commissioner’s office definitely fell into that
first category, and my experience with St. Louis definitely fell into the
second, which was helping the front office put together rosters, which,
like you said, falls into the area of player evaluation techniques.

BP: How have the Cardinals balanced various approaches to baseball operations
decision-making? There have been guys like you and Jeff Luhnow internally, others on the Board of Advisors, and then there are the more traditional scouts and player development people. How does the front office balance these voices?

MJ: Each one of the decisions that I was somehow involved with was being
made by different people in the organization who combined information in
different ways. My objective was to basically be the “mathematical scout,”
and to give these decision-makers the mathematical scouting report, so to
speak. In a lot of the decisions that were made last season there was a
mix of scouting reports and statistical scouting reports and each person
handles the combination of that information differently.

In the case of the amateur draft we have a lot of players that we don’t
know a lot about. We have scouts that go watch these players only a few
times in the course of their careers, so in that sense the statistical
information has more of an impact. It’s, relatively speaking, a lot more
information than we had beforehand. When you look at major league player
evaluation, people have a lot of information about those players already.
They’ve been watching them for years. They have many scouting reports on
these players, they have a lot of video tape on these players. In my
opinion, the statistical information is still valuable in those cases,
absolutely it’s still valuable, but maybe not, relatively speaking, as
valuable as having additional statistical information on the larger pool
of players from whom you need to make your selections. So the conclusion
I’ve come to is that statistical evaluation has more impact at the amateur
draft than at the major league level.

BP: Without getting into any areas that are covered by your non-disclosure
agreement, is the way that a team like the Cardinals goes about evaluating
players radically different from the statistical evaluation that you would
see at Baseball Prospectus or at other venues that do that type of
analysis? Is there stuff that’s just a generation ahead, or is it
relatively similar?

MJ: I would say it’s more of the latter. There is a real limitation in
this business that lies in the granularity of data that’s available. I
think that most organizations that are doing any sort of statistical
analysis, whether that be Baseball Prospectus or the St. Louis Cardinals,
are all working with fundamentally the same core data, which has
certain limitations. I think if you put a few smart people on top of
that data, they are going to reach some pretty similar conclusions.
Definitely with different approaches, but some similar conclusions come
out the other end.

I think the real difference-maker is in establishing ways to get better
granularity of data at every level. For example, at the major league level
defense is the next chapter that people want to write and frankly, with
all the data that I’ve seen available to clubs like the St. Louis
Cardinals, the data aren’t really at a granularity that’s fine enough to
make any real solid conclusions. At the other end of the spectrum, at the
amateur draft, one would have a really hard time finding significant
amounts of, let’s say, high school statistical information. There’s still
a question of whether that information would be useful, but the story
there is still that the data is relatively difficult to find.

I’ve digressed. To get back to your first question: I think that most
organizations are probably doing slight variations of the same thing. I
think the real difference maker will be in collecting more data. Certain
organizations may be investing money in finding a greater granularity of
data. I don’t know, but if they are, they’re probably at the next level.

BP: Especially as it regards major league free agents, do you find that
teams are spending much energy or time trying to figure out “the market”–the finances, the competition, what a guy is worth, quantifying value in dollar values? Every off-season we hear about the deals that “blew the
market” with this offer for that guy that set up new comps for those
players down the road. Is there time spent on this sort of analysis?

MJ: Clubs definitely do a significant amount of research on the market
conditions before free agent offers are made and during the preparation
for arbitration. I haven’t really seen much compelling work
in the publicly available literature (Ed note: Check out the work done by Baseball Prospectus’ Ben Murphy, following the work of the late Doug Pappas), and I am sure that some clubs would
stand to gain from understanding this market a bit better, but yes, clubs
definitely do some work at evaluating the market.

BP: The original reason you came into the organization was to build a
database for the team. Is that just a collection of statistics or is it
more than that? What type of data warehousing does a major league team do
to help them make decisions?

MJ: The most crucial pieces of information, as far as the front office is
concerned, are just being able to collect scouting reports and being able
to find information from scouting reports. There is a process that begins
out at the ballyard, with a scout writing down notes about a specific
player. That information needs to find its way back to the front office.
Now in St. Louis, we used some third-party software to help in that
process, but at the end, before decisions are made, we sometimes need to
collect all of our information and put it into a sensible and accessible
presentation before the decisions are finally made. So in addition to
helping with constructing player evaluation models, I was also involved in
collecting some information from these scouting reports and putting that
into final reports that the decision makers would use.

BP: As someone with a thorough professional background in mathematics and
analytical approaches to decision-making, how do you view the current
state of publicly available baseball analysis? Are there strengths and
weaknesses to the body of work that’s out there that jump out to someone
like yourself who has used the same sorts of approaches with other
problems in different fields?

MJ: This is a great question. To be completely honest, when I first
started doing a background study on publicly available baseball research
papers, I grew pretty frustrated pretty fast. I quickly convinced myself
that I would need to reinvent the wheel, so to speak, in any area where
real decisions were going to be made as the result of my work. I
encountered cases where people who had developed “predictive” models never tested how well their
models actually predicted; I encountered cases where the contents of the
paper contradicted the conclusions that were stated, and on and
on. I really do believe that in some areas, it is more efficient
to start from scratch than to try to separate the signal from the noise in
the literature. Don’t get me wrong: there is some
quality work out there; you just have to know where to look.
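
(Ed note: As a rough illustration of what testing a “predictive” model
involves, the Python sketch below fits a simple line on one group of
synthetic player seasons and then scores it on seasons that were held back
from the fit, against a baseline that always predicts the training average.
The data, the 0.6 carry-over factor, and every name in the code are
hypothetical and are not taken from Johnson’s work.)

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(predictions, actuals):
    """Average squared gap between predictions and what actually happened."""
    return sum((p - a) ** 2 for p, a in zip(predictions, actuals)) / len(actuals)


# Hypothetical data: each player's on-base percentage in consecutive seasons.
rng = random.Random(0)
prev_season = [rng.gauss(0.330, 0.030) for _ in range(400)]
next_season = [0.6 * p + 0.4 * 0.330 + rng.gauss(0, 0.020) for p in prev_season]

# Fit on the first 300 players only; the last 100 are held out of the fit.
train_x, test_x = prev_season[:300], prev_season[300:]
train_y, test_y = next_season[:300], next_season[300:]
slope, intercept = fit_line(train_x, train_y)

# Score the fitted line and a naive baseline on the held-out players.
model_mse = mean_squared_error([slope * x + intercept for x in test_x], test_y)
train_mean = sum(train_y) / len(train_y)
baseline_mse = mean_squared_error([train_mean] * len(test_y), test_y)

print(f"held-out MSE, fitted line: {model_mse:.6f}")
print(f"held-out MSE, baseline:    {baseline_mse:.6f}")
```

(A model that cannot beat the naive baseline on held-out seasons has not
shown that it predicts anything, however well it fits the seasons it was
built from.)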

BP: As someone who has worked on analytical approaches to baseball
decision-making from both the outside and inside of a major league club,
where do you see the next frontiers in sabermetrics? What questions do you
think are most desperately in need of answers? Are there approaches that
have been heretofore ignored? Is there anything that you yourself are
working on independently?

MJ: There are two parts to this answer, in my mind. On the one hand, as I
mentioned earlier, I think a next frontier lies more in the collection of
finer-granularity data of major league baseball events as well as more
thorough injury databases. Beyond better major league data, there
definitely is a need for better collegiate, high school, and
international information as well.

On the other hand, I think that there really is great possibility for
bringing some cutting-edge machine learning technologies from the academic
world into building more robust predictive tools in baseball. The
academicians continue to push the envelope with machine learning in many
scientific applications, so staying on top of these technologies, or
working with people who are on top of them, should shine some new light on
separating the random components of baseball observations from the

As far as what I am doing now, I have started assembling a consortium of a
few statistics and applied math professors who I know from my graduate
school days. Several of us will be meeting in a few weeks to talk about
where we want to take this work, but I do believe sports and academics can
be brought together in a practical way.

BP: What do you mean by “machine-learning technologies?”

MJ: Machine learning refers to computational approaches toward finding
patterns and structure in data. Voice recognition, as used in phone menu
systems, or character recognition, as used when scanning entire libraries
into a computer database, are built using machine learning methods.

In baseball analysis, once we establish how to measure a player’s
contribution towards a team’s success, we can start to think about the
next step of how to predict it. In a simplified view, we take every piece
of information about every person who’s played the game and throw it into
a big pot. We then let a machine learning system loose in the pot, sifting
through all the data to find the information that is most useful in
predicting future success, which is what statistical player evaluation
techniques are aiming for. Neural nets, decision trees, support vector
machines…those are some of the machine-learning buzzwords people might
have heard of.
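
(Ed note: A minimal sketch of the “big pot” idea, assuming the simplest
possible decision tree, a single split or “decision stump,” as the learner
and wholly synthetic data. Feature names such as walk_rate and
jersey_number are invented for illustration; this is not the model Johnson
built.)

```python
import random


def best_split_error(xs, ys):
    """Lowest total squared error from predicting ys with one threshold on xs."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    total = sum(y for _, y in pairs)
    total_sq = sum(y * y for _, y in pairs)
    best = total_sq - total * total / n  # error when no split is made
    left_sum = left_sq = 0.0
    for i in range(n - 1):
        x, y = pairs[i]
        left_sum += y
        left_sq += y * y
        if pairs[i + 1][0] == x:
            continue  # cannot place a threshold between identical values
        k = i + 1  # number of rows on the left side of the split
        right_sum = total - left_sum
        right_sq = total_sq - left_sq
        err = (left_sq - left_sum ** 2 / k) + (right_sq - right_sum ** 2 / (n - k))
        best = min(best, err)
    return best


def rank_features(features, target):
    """Order feature names from most to least useful for a single split."""
    return sorted(features, key=lambda name: best_split_error(features[name], target))


# Hypothetical "pot" of player information; only two columns truly matter.
rng = random.Random(1)
n_players = 500
features = {
    "walk_rate": [rng.uniform(0.02, 0.20) for _ in range(n_players)],
    "strikeout_rate": [rng.uniform(0.10, 0.30) for _ in range(n_players)],
    "jersey_number": [rng.randint(1, 99) for _ in range(n_players)],
}
# Assumed relationship, chosen purely so the example has a known answer.
future_value = [
    10 * w - 5 * k + rng.gauss(0, 0.1)
    for w, k in zip(features["walk_rate"], features["strikeout_rate"])
]

ranking = rank_features(features, future_value)
print(ranking)
```

(Real systems grow many such splits and must be checked on data they have
not seen, but the principle is the same: the algorithm, not the analyst,
works out which columns carry predictive information.)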

BP: You left the Cardinals at the end of the season and returned to Silicon Valley, where you’ve gone back to doing the sorts of tech work that you did before you got into baseball. Do you see yourself getting back
into sports, perhaps going back to work for another baseball club in the
future, or is that it for you?

MJ: I definitely see myself being involved with sports in some capacity in
the future. I continue to talk with executives in baseball as well as
other sports. I currently have returned to Silicon Valley; until November,
I’m sitting out a one year non-compete clause that was a part of my
employment in St. Louis. I do hope at some point in the future to be
involved with a baseball organization again.