Prospectus Feature: A Better Personality Test

There’s a section in Mike Stadler’s The Psychology of Baseball (2007) that discusses the 1980 draft from the New York Mets’ perspective. The Mets had the first pick and were considering a pair of five-tool prospects: Darryl Strawberry and Billy Beane. The Mets ended up with both, with Strawberry going first.

Stadler then highlights the difference in confidence between the two men. After striking out three times in his big-league debut, Strawberry brushed it off. Same thing when he was hitting .198 two months into his career. Around that time, he hit a two-run homer in the eighth inning to break a tie against the Reds:

When asked if the homer boosted his confidence at all, Strawberry answered, “I’d say encouragement rather than confidence. I’ve always got the confidence. I know I’m going to hit plenty more in situations like that.”

This stands in contrast to the well-known accounts of Billy Beane’s early-career confidence, detailed at great length by Michael Lewis in Moneyball. “Inside a batter’s box he experienced a kind of claustrophobia,” Lewis wrote. “The batter’s box was a cage designed to break his spirit.” Strawberry became a great ballplayer; Beane became a great administrator.

Most teams don’t end up with both of their top draft-day targets. For the many teams that value confidence, wouldn’t it be ideal to identify which of two players looks like a ballplayer not just in a uniform, but between the ears?

In this article, I want to explore the current state of “personality testing” in baseball, and what other sports are doing that baseball might be able to emulate. I put the term in quotes because, while it might mean something to you, it has a very different meaning in the world of psychology. So that we’re all on the same page, I’ll use the term to refer to quantifying non-physical traits in order to (attempt to) assess future baseball performance.

Tests like these are already being given in major-league baseball, but they don’t get the same attention that performance-based sabermetric stats do. One reason is that each team that does this testing has its own examinations for prospects; there isn’t a centralized test given to all players before they’re drafted.

If you look hard enough, you’ll see some descriptions here and there. When asked about the recruitment process leading up to the draft, Drew Storen indicated that most teams he spoke with asked him to take personality tests—and he also pointed out they were rather time-consuming. Aaron Cook revealed that some questions were pretty weird (“Have you ever harmed small animals?”). In the latter piece, I found a clue as to why this type of testing isn’t a bigger topic of conversation in baseball:

What the NFL is finding out through testing and, in some cases, conducting background checks at draft time, a good Major League scout would know because he has developed a personal relationship.

In this writer’s humble opinion, asking your scouts to objectively diagnose personality traits on top of all the other work they’re doing is going a bit too far. (Depending on how useful the on-paper tests turn out to be, though, scouts might still be the best option.)

Here’s the short list of the most common personality tests: DISC, Predictive Index, Myers-Briggs, and the Athletic Motivation Inventory.

That last one is pretty popular among major-league teams. The Orioles started using it to test their players back in 1973. And while it seems to give teams some sense of the player, it’s so broad that it’s used by several different sports.

If the goal is to figure out which players might perform best in any given sport, the test needs to be designed specifically for that sport. A quarterback needs to make decisions very quickly and adjust on the fly; a pitcher, meanwhile, must keep his composure while standing alone on a mound, still and staring into an invisible strike zone that might seem to shrink in inverse proportion to the size of the moment. So, two problems: lack of consistency and lack of validity.

The most famous psychological test in sports is the NFL’s Wonderlic, which purports to measure the ability to learn and solve problems. No meaningful correlation has been found between Wonderlic scores and performance at the NFL level. So in terms of validity for the sport itself, the Wonderlic gets an F; it’s not a model that baseball should be emulating.

But one thing the Wonderlic does do is test all players in a consistent, systematic way; it might be a bad fit for the NFL, but danged if it’s not at least standardized. Every year, all draft prospects go to the combine, where they’re measured on how quickly they can run, how high they can jump, how much weight they can lift, and how much Wonder they can Lic.

Problem solved: All the players take the same test at the same time in the same manner. This eliminates the inconsistency of one pick taking the AMI while another is taking the Myers-Briggs, and maybe checking his Facebook feed at the same time.

There’s a relatively new test being administered by the NFL, too. It’s called the Player Assessment Test (PAT for short; clever, I know), and it’s basically what the Wonderlic would look like if a bunch of really smart people decided to build something better suited to the sport. Not only do its creators report a valid correlation between test scores and on-field results, but it can also help baseball solve that second problem: making sure the test is designed specifically for baseball, and thus valid.

After reading up on the PAT and talking to the people who designed and implemented it, I’m convinced that baseball teams would benefit by implementing a baseball-specific version of the test.

The PAT was created by a lawyer named Cyrus Mehri (the man behind the Rooney Rule) and two professors (Harold Goldstein and Kenneth Yusko), whose main goal was to create a test that wasn’t socioeconomically biased. Tests like the Wonderlic lean pretty heavily on things like vocabulary: if the subject doesn’t know a word, it’s impossible to tell whether he is any good at making connections between words.

Like the Wonderlic, the PAT is given in a systematic way—the NFL has been administering this test at the combine every year since 2013.

When I spoke to Mehri, he stressed that the test works in large part because it’s built “by football, for football.” To do that, the creators met with NFL GMs to find out what traits they value in elite players. Based on those conversations, Mehri and his colleagues wrote questions that zero in on those specific traits.

(Goldstein and Yusko are professors and consultants working in a field called industrial psychology, which studies workplace behavior inside organizations. As consultants, they work with companies all over the world across different industries.)

But the most important part of this test (and the reason why I’m going on and on about it) is that it’s working. Or, at least, they tell us it’s working. While we don’t have access to the data to confirm those claims, the creators will be presenting a paper at the Society for Industrial and Organizational Psychology Annual Conference in April. That presentation is peer reviewed. They’re also in the final year of a three-year validation study that they will subsequently submit for peer review and publication.

“A lot of the performance data we have has allowed us to validate the PAT,” Yusko told me. The NFL has shared data with the test’s creators so they can validate it and keep refining it. While many teams seem to like the AMI and the results they’re seeing, Goldstein points out a major difference between it and the PAT: the PAT measures, and can then adjust for, variables that show socioeconomic biases.

“Bottom line is nobody else has a validated system in sports,” Mehri said. While that statement may come off as a bit bold, Goldstein and Yusko agree that nobody else has a test that’s been as comprehensively validated on a league-wide basis as the PAT.

While the Big Brother ramifications of tests such as these might have already turned off some readers, testing need not be primarily about filtering out players, but about working with them. Perhaps Billy Beane could have become a great ballplayer with the right understanding of his needs.

“What I didn’t anticipate was how valuable the player development part of the test has been,” Mehri told me. After a team makes its pick, it gets a confidential coaching summary specifically about that player. This summary helps the team understand what makes the player tick, and it can help them coach and develop the player. It also helps the players understand things about themselves they might not have known. Picking the raw material is one thing, but it’s something else entirely to mold and coach that player so he can fulfill his potential.

This could be incredibly useful for baseball teams. In a sport where the draft lasts 40 rounds and there are several levels of minor-league play, player development is far more important than in sports like football and basketball, where player development is outsourced to the NCAA and careers are more front-loaded. Baseball players are drafted as young as 17 and often spend a half-decade focused on nothing but becoming major leaguers. If baseball teams could make that process more efficient, how much would that be worth?

I asked Mehri why a baseball version of the PAT doesn’t already exist. While he and his colleagues have had conversations about tailoring the process to other sports, nothing is imminent. “Nobody has called us from Major League Baseball yet,” Mehri said.

I raised some of the key differences between baseball and football, which might make it difficult to transfer this model across sports. First off, because baseball players are drafted at such a young age (and signed internationally even younger), is there really any predictive value to be gleaned? Stories abound of players who mature late. Can you really test kids at 15 and learn anything about what sort of 25-year-old they’ll be?

Mehri didn’t think it was an issue. “Cognitive stuff is pretty stable, and it doesn’t change much over time.”

What about cultural differences? Can one test account for the differences between Dominican, Japanese, Puerto Rican, urban American and suburban American upbringings? Mehri pointed out that the original point of the PAT was to overcome those precise kinds of issues. Goldstein and Yusko are consultants for companies all around the world, and this is their wheelhouse. If their tests couldn’t adjust for these types of cultural differences, they would be out of business by now.

When I discussed these complications with Goldstein and Yusko, they mentioned how excited they were to be talking about the PAT in the context of baseball. To be honest, I was a little surprised at how easy it was to get them on the phone.

There’s a reason for that.

“What’s really cool about baseball is it might work even better because it’s a much more individual sport than football,” Goldstein told me. I told Goldstein and Yusko about the Cubs and how they’ve decided to develop catchers by converting players who have the right type of personality for the position, and they were on board with that line of thinking. “We’re starting to go beyond just picking successful athletes and looking at what works at the position level,” Yusko added.

Wrap your head around that for a second: Instead of just measuring whether or not a player has the personality type that succeeds in pro ball (which we don’t even have yet), now we’re potentially talking about measuring whether or not a player might have the traits of a successful starting pitcher. Or catcher. Or bench player. Instead of drafting “filler” players toward the end of the draft, a team could zero in on personality types that are perfect for utility players and grab more value that way.

So the big question is: Why hasn’t anybody asked Mehri & Co. to bring their talents to baseball?

Part of it might be secrecy. Right now some teams are weighing this type of information more heavily than others, and they don’t want to lose that edge if, all of a sudden, every team is getting information it never cared about before. Teams treat information the way explorers once treated distant lands: as untapped resources.

“There's a lot of unexplored territory here,” BP’s resident psych-guy Russell Carleton told me. I imagine any number of GMs drooling at the thought of any “unexplored territory” left in baseball.

It’s unlikely a 50-minute test is ever going to cause GMs to completely change the way they pick players in the draft. “This is not the end all be all, it’s more of a tiebreaker,” said Mehri. But a tiebreaker has a fair amount of value when the stakes are this high—for the team, and for the player.

Last year, teams spent around $220 million on bonuses during the 2015 draft. The international spending pools tell us they’re going to spend around $80 million on international amateurs. The vast majority of those players aren’t going to make it to the majors. Wouldn’t it behoove teams to implement a validated, systematic process that gives them some insight into the personalities of the players they’re about to select, and whether those players will be a good fit for their organizations?

If so, Major League Baseball could pay Mehri and his colleagues a visit—they’re expecting you.

mhmosher

3/02

Interesting topic for sure. Good read.

portocac

Thanks amazon! It'll be interesting when this kind of data is freely available to see what kind of traits become over and under valued.

jwyllys

Great work, Carlos. This is something that will be very interesting to follow closely in the coming years. Nicely done.

Thanks Jared!

jrmayne

I enjoyed the piece. The initial anecdote strikes me as at least a partial fallacy, though.

While I think confidence does help in some fields, a substantial confidence/competence ratio - where confidence outstrips ability by a substantial amount - is a problem.

Take some of your readers. They may be quite confident in some things - perhaps math or chess. But, to stereotype from one example, they may be unconfident in their dancing.

The direction is competence --> confidence, as it should be. Assuming that the correlation means (say) I should be more confident in my keen fashion sense in order to have more strikes me as misguided.

Thinking you're a little or somewhat better than you are may have advantages. But thinking you're way better than you are ends too often in failure modes, right?

(Oh, and the amount you can think you are better than you are? 40% Because I've been calling it the 40% rule for 30 years, and I am *so much better* at quantifying excess confidence than other people that my number should stand despite its resistance to objective analysis.)

Good point, but it's a very small example of the types of things that could impact performance from the psych POV.

BeplerP

Outstanding! Filled with useful information not generally seen elsewhere in baseball coverage- many thanks. I think you are right to hint that this could be the next evaluation tool that delivers "Moneyball"- type results, if done well.

Thanks Peter!

bryanherr

I found this article interesting. I have always thought that teams could benefit from personality testing like the Myers-Briggs for roster construction in an attempt to cultivate team chemistry.

pizzacutter

3/03

Myers-Briggs is a four-letter word.

BeltwayTraffic

ENTP?

Carlos Portocarrero