February 26, 2014
Sabermetrics Goes MOOC
It was a year much better known for a different baseball breakthrough a few miles away, but 2004 was also the year that Andy Andres first offered his class in sabermetrics at Tufts University, just outside of Boston. Ten years later, the Tufts and Boston University-affiliated professor is bringing his course to a more populist place as a MOOC - a massive open online course - through Boston University and the online platform EdX.
Andres' announcement of Sabermetrics 101 on Tuesday sparked a minor frenzy in some corners of Twitter and, as he put it when reached Tuesday night, "a good day" of signups. The six-week course, which begins May 8, will be free for auditors and will carry a fee - modest by the standards of his more traditional employers - for those who would like formal grading and a certificate of completion.
Andres joined us for a Q&A Tuesday night and discussed what a sabermetrics MOOC will look like, how the course evolved, and the challenges of teaching online and of teaching students who come into the class with such different levels of saber-literacy.
As usual, answers have been slightly modified for length - mostly rambling on the part of the questioner.
ZL: You've been teaching this course in a more traditional format for a while now. What made you want to try the MOOC now?
AA: So many people have emailed me over the years saying they want to take it. The only way to take it before this was being a Tufts undergraduate and that's a very limited population. I've always had in the back of my mind that this should be open to anybody, mainly just to give people the opportunity to do the work - to explore data science, explore sabermetrics and do more research and analytics if they felt like it.
I know the story of many people who would tell me 'I tried to learn these things on my own and it was frustrating and there wasn't enough help.' I hope this course will grease the skids where I give the opportunity for folks to clear those hurdles and learn some basics that they need to do their own sabermetrics and their own research.
ZL: How did you go about getting it started through BU?
AA: It was basically a grant process that BU had. They were going to join EdX and offer some MOOCs - some online courses. They put a call out to the faculty and asked if anyone was interested in teaching one of these courses, so I did. They did that last spring, and there was a fairly complicated process to get to the point where BU was willing to teach the sabermetrics course as part of their offering.
ZL: One of the things that strikes me as difficult - and it's something you mentioned - about the MOOC environment is that it's different from just raising your hand in a classroom if you're falling behind or have a question. How do you plan to deal with students who are struggling?
AA: It's completely different from residential face-to-face learning. The way you get through that problem of students learning on their own in their own little silos is the discussion forums that are created with each class. Typically, like a lot of discussion forums on the Internet, people will ask questions and people who are on the forum will answer them. So that's one of the better models.
When the course is running, there will be some times too for some sort of office hours, and that will be an online format too where we may have a Google Hangout to answer questions. We'd do that routinely like I might do office hours on campus.
ZL: How do you handle the fact that some of the people in your class might be regular Baseball Prospectus or Fangraphs readers - people who consume this every day - while others are new to baseball, or have never looked at it analytically, and would need to start from basically on-base percentage?
AA: Yeah, we talked a lot about that problem. What we've decided on is something called 'tracks.' There will be a sabermetrics track where we talk about the development of these different metrics that seem to help valuation of different players.
We'll also have a technology track. A lot of people know what true average is and weighted runs created is, and they're pretty comfortable with those things. But maybe they don't know R code or they don't know SQL or don't know how to get data and they might be more interested in that track. So when I discuss OPS, they might skip through that lecture quickly and take the assessment and move on, and they'll go through those tracks easily if they're a savvy Fangraphs reader or BP reader. But maybe they don't know R and that's where they'll learn something.
Another track we're going to try to get through is some very basic statistical concepts. We're going to try to teach regression and sample size. A lot of students who take this course will know statistics really well. They'll learn nothing from my statistics track. They had it. A lot of programmers might want to take this class, so they might know R and SQL very well, and that track they might be able to skip through pretty easily and they might be more interested in the sabermetrics track.
We'll also have a sabermetrics history track. By tracking these different topics, people will be able to decide which ones they can focus on and which ones they can go through quicker.
ZL: Have you heard from anyone with any teams looking to get their scouts or their non-statistical front office people involved?
AA: No, I haven't heard anything in particular. One vision I have as I think about how to teach this stuff is for scouts. The idea of teaching a real baseball observer some of this material is a good vision of a student for me. A real baseball insider - knowledgeable but wants to know some basic ways to think about and look at data.
ZL: What did you learn from your 10 years of running the Tufts University sabermetrics class, and how did that change over the years?
AA: That's definitely changed over the years. In the beginning, it was much more of a lecture format, and it was hardly popular at all in 2004. We did something on win probability added, and when we taught that material students said 'this is really cool.' People hadn't really understood it - there were a couple papers by (Jay) Bennett that talked about WPA - but it was very obscure. So a lecture format covered a lot of this material for the students.
But by a few years in, students really understood most of the stuff. They'd be interested enough that they'd read the websites and understand the material, so the course transitioned pretty quickly to a research course where I'd guide the students through their own research.
I can't do that hands-on sort of thing with thousands of students, so we have a much different format we're trying to cover in this version of the course.
ZL: Do you have any plans for guest lectures?
AA: Nothing is nailed down yet, but we'll have every week a supplemental interview with industry insider types.
ZL: Are you going to be a tough grader?
AA: (Sympathy laugh) The grading is all done by computer. I don't grade anything.
But yes, part of this is to really challenge the students, even the students who come in and say 'yeah, I know sabermetrics.' It's important to write challenging assessments. I think that's the way to make sure students understand the material.
ZL: Are there any plans for a Sabermetrics 201?
AA: We have sort of separated it out into a 101 and a 201. We'll do more advanced work in 201, looking at event data and looking at PITCHf/x data more completely. 101 will really be an intro look at sabermetrics, where 201 will be doing a lot more with more complicated databases and even higher-level theoretical sabermetrics.
Disclosure and more information: Dan Brooks of BrooksBaseball and BP has been involved in some of the planning of the course and answered some questions about the course on Reddit.