Surveying is A Beautifully Foolish Endeavor

I binged Hank Green’s A Beautifully Foolish Endeavor yesterday and it is, let me tell you, just fantastic. I don’t plan on writing a longer review because I didn’t really write any notes. I didn’t want to take notes. Not because it was bad, but because I didn’t want to peel my eyes away from the page for the short seconds it’d take for me to grab my pen and notebook. I was too engrossed to want to do anything but live in its pages for however long it’d let me. So, no review—but maybe that ardor could stand in place of one.

But aside from just being human, well-written, and the right mix of cynical and optimistic, it is also just an eminently and endlessly quotable book. There are quotes here that I plan on using when I talk about machine learning in my upcoming data science class. Quotes that just stabbed me in the chest with self-recognition (cough cough Miranda being a workaholic cough cough). And then there are quotes that make me feel that Green is one of the few science fiction authors who actually understand the sociology and psychology of being a practicing scientist (which, ya know, SciShow and his M.S.—so it makes sense). But then there’s this quote near the end of the book that so perfectly described being a survey researcher that it made me laugh out loud:

[We wanted to] dive deep into our survey responses. Except there were just too many…no matter how we filtered, there just wasn’t a good way.

April and I were griping about this around the black marble countertop in the kitchen when Carl came in and overheard us.

“Just do a search,” Carl said.

“The searches take forever and we don’t know what to search for,” April replied. “It’s just a bunch of dumb data. Most of the useful stuff is in text responses, which is impossible to parse.”

I loved and loathed the reductionism of “dumb data,” mostly because it’s right. Until we data geeks decide what to model, how to model it, and how to frame and embed our results in a digestible narrative, it is literally just that. Dumb data, (sometimes) intelligently collected. And it’s also so, so spot-on to note that almost all of the really juicy stuff is in the open-ended responses, where people just let you know what’s actually going on. Or at least their inwardly biased interpretation of their messy cognitive processes, which is still super cool and really interesting!

As an example: for my dissertation, I had completely written off games like Hearts as sociopolitically unimportant until a respondent mentioned that there are online lobbies where people use the game to chat about politics. I never would’ve learned that if I hadn’t bothered to read what my respondents had taken the time to write. And there have been a few papers making the point that we should really pay more attention to the nuance people write in, rather than (often incorrectly) discretizing their responses on the basis of hardline, ex ante coding rules.
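To make the ex ante coding problem concrete, here is a minimal sketch of how a fixed codebook can miss exactly the kind of thing I stumbled on. Everything in it is an assumption for illustration: the codebook categories, the stopword list, the helper names, and the three responses are all invented, not my dissertation data or coding scheme. The idea is simply to code each response against the fixed keywords, then list the terms the codebook never anticipated.

```python
import re
from collections import Counter

# Hypothetical ex ante codebook: categories fixed before reading any responses.
CODEBOOK = {
    "shooter": {"shooter", "fps"},
    "strategy": {"strategy", "civilization"},
}

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "i", "in", "we", "the", "and", "to", "of", "play"}

# Invented responses, for illustration only.
responses = [
    "I mostly play strategy games.",
    "We talk politics in a Hearts lobby.",
    "Hearts lobby chat gets into politics a lot.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def code_response(text, codebook=CODEBOOK):
    """Return the sorted codebook categories whose keywords appear in the text."""
    tokens = set(tokenize(text))
    return sorted(cat for cat, keywords in codebook.items() if tokens & keywords)


def uncoded_terms(responses, codebook=CODEBOOK, top_n=3):
    """Count, once per response, the terms that no codebook category covers."""
    known = set().union(*codebook.values())
    counts = Counter()
    for text in responses:
        counts.update(set(tokenize(text)) - known - STOPWORDS)
    return counts.most_common(top_n)


if __name__ == "__main__":
    for text in responses:
        print(code_response(text), "<-", text)
    print(uncoded_terms(responses))
```

With these made-up responses, the two Hearts answers get no code at all, while "hearts", "lobby", and "politics" are the most common uncoded terms. That is a crude nudge to go read the actual text, not a substitute for reading it.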

I’m envious that April and Maya had an alien superintelligence to parse through all of that data and get to all the juicy bits. Sure, we have NLP and algorithms for market segmentation, but do they come from a near-omniscient monkey? Didn’t think so.