
I’d like to do something entirely theoretical. I’d like to propose some studies that will likely never actually see the pages of Baseball Prospectus, but would be fun to do anyway. Since it’s the offseason, let’s have some fun.

Our theme today is “things I thought of while catching up on back episodes of the 'Freakonomics' podcast” mostly because the “things I thought of while watching the Gilmore Girls revival episodes” article ended up not working out so well. (Did you know that Luke, the diner guy, was played by former Braves and Yankees minor-league pitcher Scott Patterson?)

Warning! Theoretical Mathematical Details Ahead!

One of the episodes that I listened to actually had a baseball angle. The show was on “how to make a bad decision” and talked about how the gambler’s fallacy–the faulty belief that, if an event hasn’t happened for a while, it's “due” to happen–can affect people’s decision-making. On this episode, they reviewed a study by Toby Moskowitz which suggested (using PitchF/X data) that when umpires called a strike on one pitch, their likelihood of calling a strike on the next pitch actually went down, even controlling for where the pitch went.

Moskowitz also found similar effects for loan officers (they were less likely to approve two loans in a row) and judges in political asylum cases (they were less likely to approve two asylum applications in a row). Indeed, I found evidence of a similar effect in how people respond to stressful life events in my own dissertation.

The consequences of that sort of bias are fairly obvious in the strike zone. We’ve known for a long time that umpires don’t always call the same strike zone on all counts, although the reasons why are up for debate. The effect sizes aren’t huge: the researchers suggest that, ceteris paribus, the chances of a taken pitch being called a strike go down by about a percentage point after a called strike. The idea is that the umpire, like most humans, is basing his opinions on what came before, rather than on what is in front of him.
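As a sketch of what that test looks like on pitch-level data, here’s a toy simulation. The baseline strike rate and the one-percentage-point dip are assumptions for illustration, not the study’s actual numbers, and the real analysis controls for pitch location, which this skips entirely:

```python
# Toy sketch: testing for negative recency in umpire calls on taken
# pitches. We simulate calls with a built-in ~1 percentage point
# "gambler's fallacy" dip after a called strike, then recover it by
# comparing conditional strike rates.
import random

random.seed(42)

TRUE_STRIKE_RATE = 0.40   # assumed baseline rate for a taken pitch
BIAS = 0.01               # assumed dip after a called strike

calls = []
prev = None
for _ in range(200_000):
    p = TRUE_STRIKE_RATE - (BIAS if prev == "S" else 0)
    call = "S" if random.random() < p else "B"
    calls.append(call)
    prev = call

def strike_rate_after(prev_call):
    """Strike rate on pitches immediately following a given call."""
    pairs = [(a, b) for a, b in zip(calls, calls[1:]) if a == prev_call]
    return sum(b == "S" for _, b in pairs) / len(pairs)

after_strike = strike_rate_after("S")
after_ball = strike_rate_after("B")
print(f"P(strike | previous strike) = {after_strike:.3f}")
print(f"P(strike | previous ball)   = {after_ball:.3f}")
```

With real PitchF/X data, the comparison would be made within bins of pitch location (or with location as a regression control), since pitchers also change where they throw after a strike.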

While listening to that, my thoughts turned to another area where this sort of effect might show up. Scouting. Teams rely on their scouts as the primary data input into their search for amateur (and professional) talent. Statistics are practically useless for high school players as a means of comparison and even college stats can deceive, so the only way to really get a point of reference on a player is to have a scout put him into context.

Those scouts are human. If loan officers and asylum judges and major-league umpires show evidence of the gambler’s fallacy, then perhaps scouts do, too? A scout might see a kid who's a rare gem one day. He actually has a decent shot at making the major leagues and might be worth an early-round draft pick. That’s not something you see every day. Our scout writes down an OFP of 50 on the form (or types it into the iPad). The next day, that same scout sees another kid who also has an honest chance to be something useful. Now, knowing that this second kid is (also) a potential future major leaguer is important. But will our scout write that on the form? It works the other way, too. If a scout sees a few duds in a row, will he subconsciously bump the next player up a bit?

Teams, of course, don’t make their scouting report databases public. If they did, we could test this directly. In the original study of loan officers, Dr. Moskowitz and friends were not able to control for the quality of the loan applications. It’s entirely possible that the loan officers were correctly rejecting certain applicants and that it had nothing to do with sequencing effects. However, they did notice more rejections after approvals than they might otherwise expect. Even if scouts are only shading players by a single grade, it matters: most scouting departments use a 20-80 scale, and many don’t like “5s” (i.e., 55). That leaves seven possible “grades” a scout can give, and there are big differences between a 30 and a 40, or a 40 and a 50. The problem with a seven-point ordinal scale is that you have to use it with incredible precision.

If the data were available, it would be possible to look at the logs of individual scouts sequentially and see whether these sorts of ordering effects were popping up. If they were, it would be worth sitting down with the scouts and pointing this out. Even some slight drift in the rating system could have real consequences for the team’s talent acquisition stream. And even if the bias turned out to have only a small impact, the cost of running the test, and of addressing the issue if it’s there, is minimal. Given how valuable identifying and signing amateur talent is, it’s probably best to make sure that everything’s running at peak efficiency.
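If a scouting database ever did become available, the ordering check could be as simple as the sketch below. Everything here is invented for illustration: the grade distribution, the “shade down after a gem” behavior baked into the simulated log, and the 50-grade cutoff. With real data, the log would be one scout’s reports in chronological order.

```python
# Hypothetical sketch: looking for ordering effects in one scout's
# sequential grade log, using a permutation test. The log is simulated
# with a deliberate downward shade after any 50+ grade.
import random
from statistics import mean

random.seed(7)
GRADES = [20, 30, 40, 45, 50, 60, 70, 80]
WEIGHTS = [5, 30, 30, 15, 12, 5, 2, 1]   # assumed frequency of each grade

# Simulate a log of reports with a built-in shade-down after a gem.
log = []
for _ in range(2_000):
    g = random.choices(GRADES, weights=WEIGHTS)[0]
    if log and log[-1] >= 50 and g > 20 and random.random() < 0.3:
        g = GRADES[GRADES.index(g) - 1]   # bump down one grade step
    log.append(g)

after_gem = [b for a, b in zip(log, log[1:]) if a >= 50]
after_dud = [b for a, b in zip(log, log[1:]) if a < 50]
observed = mean(after_dud) - mean(after_gem)

# Permutation test: shuffle the log and see how often a gap this
# large appears when the ordering is random.
pooled = log[:]
count = 0
N = 1_000
for _ in range(N):
    random.shuffle(pooled)
    a_g = [b for a, b in zip(pooled, pooled[1:]) if a >= 50]
    a_d = [b for a, b in zip(pooled, pooled[1:]) if a < 50]
    if mean(a_d) - mean(a_g) >= observed:
        count += 1
print(f"grade gap after gems vs. duds: {observed:.2f}, p ~ {count / N:.3f}")
```

The permutation test matters here because, as in the loan-officer study, you can’t control for the quality of the player in front of the scout; shuffling at least tells you whether the sequencing pattern is bigger than chance ordering would produce.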

***

Then there’s another episode that caught my ear, this one about social trust. How is it that people in a society can learn to trust each other? The episode veered into a couple of different directions, but eventually came to a point where they were discussing a concept called social capital. This is the idea that interpersonal connections between people are actually resources. For example, the fact that you “know someone” can help you to get a job. But social capital has other uses. For example, if two people are in conflict and there is someone or some organization who can mediate between them, the conflict might be resolved.

On a more positive note, when organizations have a great deal of social capital, those mediators might play the role of connecting one person with an idea to another person who could improve that idea. When the nodes are functioning well, that’s a healthy and trusting society. Listening to this brought me back to one of my favorite topics to think about in baseball: team chemistry. Mostly, I only ever get to think about it, because it’s not really possible to examine with any precision using the kind of data that we have available. What if we had better data?

During the podcast, the host (Freakonomics co-author Stephen Dubner) asked how something as elusive as social trust can be measured. His guest, David Halpern, replied:

Yes, you can do it in a number of ways. You can ask people how many names have they got in their Filofaxes or in their phones, which will give you some sense of their social networks. You can also measure more subtly with asking a question around social trust: “Do you think other people can be trusted?” essentially. That’s the question we’ve been asking, in fact, for decades.

In the past, I’ve written about the difficulties in measuring clubhouse chemistry. To do a full analysis of how all of the intricate webs of interpersonal relationships unfolded over the course of a season, it would probably take 30 teams of researchers, all doing a proper social network analysis. It’s theoretically possible, but logistically prohibitive. Listening to this, it struck me that perhaps we don’t need that level of detail.

What if we just asked two questions of a bunch of players? How many of your teammates can you talk to about “baseball stuff,” and how many of them can you talk to about “personal stuff”? It’s not perfect research methodology, but data collection would take about a minute per person. It wouldn’t answer the kind of molecular-level “how” questions around team chemistry that we’d love to get at, but assuming that we got a critical mass of the players in MLB, and assuming that we could attach player and team names to the data, we could look both at the size of an individual player’s support network and at his performance over time.

We could also look at the average size of a support network on a team-by-team basis and look for any effects there. Maybe teams with more evidence of connectivity show better results. We could perhaps parse out at least the boundaries of how much value a well-functioning clubhouse provides. When we think about clubhouse chemistry having an effect, it’s worth asking the question of how we think that mechanically works. Certainly, it’s nice to go to work each day at a place where you generally like the people, but what if it’s more than that?
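Here’s roughly what the team-level end of that analysis would look like. The survey responses, team names, and winning percentages below are all made up; the point is just the shape of the data and the aggregation.

```python
# Hypothetical sketch of the two-question survey analysis: average
# support-network size per team, correlated with team winning
# percentage. All data below is invented for illustration.
from statistics import mean

# (player, team, "baseball stuff" contacts, "personal stuff" contacts)
survey = [
    ("Player A", "Sharks", 18, 6), ("Player B", "Sharks", 15, 4),
    ("Player C", "Sharks", 20, 9), ("Player D", "Jets", 7, 2),
    ("Player E", "Jets", 10, 3),   ("Player F", "Jets", 6, 1),
    ("Player G", "Owls", 12, 5),   ("Player H", "Owls", 14, 4),
]
team_wpct = {"Sharks": 0.560, "Jets": 0.457, "Owls": 0.470}

def team_avg(team):
    """Average total network size among a team's respondents."""
    return mean(b + p for _, t, b, p in survey if t == team)

xs = [team_avg(t) for t in team_wpct]
ys = [team_wpct[t] for t in team_wpct]

# Pearson correlation between average network size and winning pct.
mx, my = mean(xs), mean(ys)
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
var_x = sum((x - mx) ** 2 for x in xs)
var_y = sum((y - my) ** 2 for y in ys)
r = cov / (var_x * var_y) ** 0.5
print(f"team averages: {dict(zip(team_wpct, [round(x, 1) for x in xs]))}")
print(f"correlation with win pct: r = {r:.2f}")
```

In a real version, you’d want far more than three teams, a control for payroll and talent, and separate looks at the “baseball stuff” and “personal stuff” networks, since the social-capital argument suggests they do different work.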

The idea of social capital is that those connections allow players to access resources that live in the minds of other people. If a hitter has been having trouble with a certain pitcher, there might be someone on the team who has an idea to share. But he needs to feel that he can talk to that player and feel comfortable with being vulnerable in front of that player. He needs to be able to say, “I don’t know what I’m doing here. Can you help?” It’s one thing to be able to talk about the weather. It’s another thing to be able to admit weakness, even though there’s nothing wrong with not knowing something. (No one knows everything.)

And if the issue is about something more than baseball, it can be twice as hard. It’s not a guarantee that the other player you approach will have something useful to say. He might be just as confused as you are, but the wider the network that the player feels comfortable enough to access, the more chances someone might know something and share it. It’s in the best interest of teams to facilitate this sort of networking. In some cases, that takes the form of players just doing the obvious work of taking the time to gain someone’s trust. But teams also use bonding rituals.

The oft-discussed “rookie dress-up day”–which happened to be in the news yesterday for MLB-mandated changes–is an obvious one. Done correctly, it asks the rookies to do something mildly embarrassing where they break some small societal norm (running around in superhero costumes!), and after everyone has a good laugh, the rookies realize that even when they are mildly embarrassed, no one is being evil to them. It’s a way to build trust. In fact, I’d propose that you can tell a lot about a team’s chemistry simply by looking at the pictures from dress-up day and seeing whether the rookies are smiling and obviously into it.

If the veterans are being harsh or if they are doing demeaning or downright evil or racist or homophobic stuff, then that’s the sign of a clubhouse in which people aren’t safe. The point of the exercise is to make sure that everyone can feel at ease, so people who are good at facilitating it will pick something that everyone will see the humor in.

The idea of asking players how big their social networks are isn’t foolproof. Players can lie. Players can exaggerate. Answers might depend on what day the question was asked. But it would be more data than we have now, which is none at all. If the chemistry effect is worth pursuing further, it should show up in that data set. If nothing emerges, there might still be an effect lurking in there somewhere, but it probably isn’t all that big. That statement alone might be worth all the trouble.

Of course, the problem is still access. To do this properly would require buy-in from multiple teams and cooperation from multiple players on each team to build a data set robust enough to do the kinds of analyses that would be interesting. Hey, if someone wants a fun dissertation topic, though …

***

Once in a while, I think it’s a good exercise to stretch the limits of possibility. It’s easy to get constrained within the limits of the data sets that we work with on a regular basis and focus only on questions that those data can answer. Playing that sort of game can open you up to possibilities that you hadn’t considered before, and while you might not get the data that you are dreaming of, sometimes you end up realizing that you could get something similar and perhaps answer the question from a somewhat different angle.

So, even if nothing ever comes of this, I still recommend the exercise. Take a moment to ask a question that you can’t answer.