Matching (and Scoring) Content to Culture and Predicting the Academy Awards

Thoughts and Reflections on the Process

We’ve spent our spare time in the last six weeks participating in the 538 Academy Awards Prediction Challenge. On Sunday, we’ll find out how we did. But even though we expect to crash and burn on the acting awards and are probably no better than 1-3 in a very close movie race, we ended up quite satisfied with our unique process and the model that emerged. You can get a full and deep description of our culture-matching model, with its combination of linguistic analysis and machine learning, in this previous post.

What I love about projects like this is that they give people a glimpse into how analytics actually works. Analysis doesn’t get made the way people think. In most cases there is far more human intuition and direction than people realize, or than anyone reading screeds on big data and predictive analytics would believe. Our culture-matching analysis pushes the envelope more than most of the work we do in the for-pay world, so it’s probably an exaggerated case. But think about the places where this analysis relied on human judgment:

  1. Deciding on the overall approach: Obviously, the approach was pretty much created out of whole cloth. What’s more, we lacked any data to show that culture matching might be an effective technique for predicting the Oscars. We may have used some machine learning, but this approach didn’t and wouldn’t have come from throwing a lot of data into a machine learning system.
  2. Choosing potentially relevant corpora for Hollywood and each movie: This process was wholly subjective in the initial selection of possible corpora, was partly driven by practical concerns (ease of access to archival stories), and was largely subjective in the analyst review stage. In addition to selecting our sources, we further rejected categories like “local”, “crime” and “sports”. Might we have chosen otherwise? Certainly. In some cases, we tuned the corpora by running the full analysis and judging whether the themes were interesting. That may be circular, but it’s not wrong. Nearly every complex analysis has elements of circularity.
  3. Tuning themes: Our corpora had both obvious and subtle biases. To get crisp themes, we had to eliminate words we thought were too common or were used in different senses. I’m pretty confident we missed lots of these. I hope we caught most. Maybe we eliminated something important. Likely, we’ll never know.
  4. Choosing our model: If you only build one model, you don’t have this issue. But when you have multiple models, it’s not always easy to tell which one is better. With more time and more data, we could test each approach against past years. But lots of analytic techniques don’t even generate predictions (clustering, for example). The analyst has to decide which clustering scheme looks better, and the answer isn’t always obvious. Even within a single approach (text analytics/linguistics), we generated two predictions depending on which direction we used to match themes. Which one was better? That was a topic of considerable internal debate with no “right” answer except to test against the real world (which in this case will be a very long test).
  5. Deciding on Black-Box Validity: This one is surprisingly hard. When you have a black-box system, you generally rely on being able to measure its predictions against a set of fairly well-known decisions before you apply it to the real world. We didn’t have that, and it was HARD to decide how, and whether, our brute-force machine-learning system was working at all. But even in cases where external measurement comparisons exist, it’s the unexpected predictions that cause political problems with analytics adoption. If you’ve ever tried to convince a skeptical organization that a black-box result is right, you know how hard this is.
  6. Explaining the model: There’s an old saying in philosophy (from James) that a difference that makes no difference is no difference. If a model has an interesting result but nobody believes it, does it matter? A big part of how interesting, important and valid we think a model is comes from how well it’s explained.

This long litany is why, in the end, the quality of your analysis is always about the quality of your people. We had access to some great tools (Sysomos, Boilerpipe, Java, SPSS, R and Crimson Hexagon), but interesting approaches and interesting results don’t come from tools.

That being said, I can’t resist special call-outs to Boilerpipe, which did a really nice job of text extraction, and SPSS Text Analytics, which did a great job facilitating our thematic analysis and matching.


Thoughts on the Method and Results

So is culture matching a good way to predict the Oscars?

It might be a useful variable, but I’m sure it’s not a complete prediction system. That’s really no different from what we expected going into this exercise. And we’ll learn a little (but not much) more on Awards night. It would be better if we got the full vote to see how close our rank ordering was.
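If the full vote were ever available, one simple way to measure how close a rank ordering came is Spearman’s rank correlation. The sketch below is a hypothetical illustration that was not part of the original analysis; the function name is ours, and it assumes both rankings contain exactly the same items with no ties. A value of 1.0 means identical orderings and -1.0 means exactly reversed ones.

```python
def spearman_rho(predicted, actual):
    """Spearman rank correlation between two orderings of the same items.

    Assumes no ties: each list is a full ranking, best first.
    """
    if len(predicted) != len(actual) or set(predicted) != set(actual):
        raise ValueError("both rankings must contain exactly the same items")
    n = len(predicted)
    if n < 2:
        raise ValueError("need at least two items to compare rankings")
    actual_position = {item: i for i, item in enumerate(actual)}
    # Sum of squared differences between each item's two rank positions.
    d_squared = sum((i - actual_position[item]) ** 2 for i, item in enumerate(predicted))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Best Picture ordering from the culture-matching model (best match first).
predicted = ["The Big Short", "Spotlight", "Brooklyn", "Room",
             "Bridge of Spies", "The Revenant", "Mad Max", "The Martian"]
```

Given a hypothetical `actual` list built from the full vote, `spearman_rho(predicted, actual)` would summarize agreement in a single number rather than only checking whether the top pick was right.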

Either way, the culture-matching approach is promising as a technique. Looking through the results, I’m confident that it passes the analyst sniff test – there’s something real here. There are a number of extensions to the system we haven’t tried (and probably won’t) – at least for this little challenge. We’d like to incorporate sentiment around themes, not just matching. We generated a number of analyst-driven cultural dimensions for machine training that we haven’t used. We’d like to try some different machine-learning techniques that might be better suited to our source material. There is a great deal of taxonomic tuning around themes that might drive better results. It’s rare that an ambitious analytics project is ever really finished, though the world often says otherwise.

In this case, I was pleased with the themes we were able to extract by movie, and a little less pleased with the themes in our Hollywood corpus. Why? I suspect it’s because long-form movie reviews are unusually rich in elaborating the types of cultural themes we were interested in. In addition, a lot of the themes that we pulled out of the culture corpus are topical. It’s (kind of) interesting to know that terrorism or the presidential campaign were hot topics this last year, but that isn’t the type of theme we’re looking for. I’m particularly interested in whether, and how successfully, we can deepen themes beyond the obvious ones. Themes around race, inequality and wealth are fairly easy to pick out. But if The Martian scores poorly because Hollywood isn’t much about engineering and science (and I’m pretty sure that’s true), what about its human themes around exploration, courage and loneliness? Those topics emerged as key themes from the movie reviews, but they are hard to discover in the Hollywood corpus. That might be because they aren’t very important in the culture – that’s certainly plausible – but it also seems possible that our analysis wasn’t rich enough to find their implicit representations.

Regardless, I’m happy with the outcome. It seems clear to me that this type of culture matching can be successful and brings analytic rigor to a topic that is otherwise mostly hot air. What’s more, it can be successful in a reasonable timeframe and for a reasonable amount of money (which is critical for non-academic use cases). From start to finish, we spent about four weeks on this problem – and while we had a large team, it was all part-timers.

This was definitely a problem to fall in love with, and we’d kill to do more, expand the method, and prove it out on more substantial and testable data. If you have a potential use for culture matching, give us a call. We probably can’t do it for free, but we will do it for less than cost. And, of course, if you just need an incredible team of analysts who can dream up a creative solution to a hard, real-world problem, pull data from almost anything, bring to bear world-class tools across traditional stats, machine learning and text analytics, and deliver interesting and useful results…well, that’s fine too.


Torture is Bad – Don’t Waterboard your Models even when you know they are Wrong

Predicting the Best Actor and Actress Categories

My Analytics Counseling Family here at EY has been participating in the 538 Academy Awards Challenge. Our project involved creating a culture-matching engine – a way to look at pieces of content (in this case, obviously, movies) and determine how well they match a specific community’s worldview. The hypothesis is that the more a movie matches the current Hollywood zeitgeist, the more likely it is to win. In my last post, I described in some detail the way we did that and our results for predicting Best Picture (The Big Short). We were pretty happy with the way the model worked and the intuitive fit between the movies and our culture-matching engine. Of course, nothing in what we’ve done proves that culture matching is a great way to predict the Oscars (and even if we’re right, it won’t prove much in a single year), but that wasn’t really the point. Culture matching is a general technique built on interesting analytic methods, and if the results are promising in terms of our ability to make a match, we think that’s pretty great.

The second part of our task, however, was to predict the Best Actor and Actress awards. Our method for doing this was similar to our method for predicting the best movie award but there were a few wrinkles. First, we extracted language specific to each character in the nominated movie. This is important to understand. We aren’t looking at how Hollywood talks about DiCaprio or Cranston or Lawrence as people and actors. We aren’t looking at how they are reviewed. We’re entirely focused on how their character is described.

This is the closest analogue we could think of to culture matching movies. However, this was a point of considerable debate within our team. To me, it seems intuitively less likely that people will prefer an actor or actress because their character matches our worldview than that they will prefer a movie as a whole for that reason. We all understood that and agreed that our approach was less compelling when it came to ANY of the secondary awards. However, our goal was to focus on culture matching more than it was to find the best method for predicting acting awards. We could have predicted screenplay, I suppose, but there’s no reason to think the analysis would deviate in the slightest from our Best Picture prediction.

Once we had key themes around each nominated role, we matched those themes to our Hollywood corpus. In our first go-round, we matched to the entire corpus, matching actor themes to broad cultural themes. This didn’t work well. It turned out that we were conflating themes about people with themes about other things in ways that didn’t make much sense. So for our second pass, we restricted the themes in the Hollywood corpus to only those which were associated with people.

In essence, we’re saying which roles best correspond to the way Hollywood talks about people and picking the actor/actress who played that role.

So here’s how it came out:

Best Actor:

  1. Bryan Cranston
  2. Michael Fassbender
  3. Leonardo DiCaprio
  4. Eddie Redmayne
  5. Matt Damon

Best Actress:

  1. Jennifer Lawrence
  2. Brie Larson
  3. Cate Blanchett
  4. Saoirse Ronan
  5. Charlotte Rampling


Do I think we’re going to be right? Not a chance.

But that doesn’t mean the method isn’t working pretty well. In fact, I think it worked about as well as we could have hoped. Here, for example, are the themes we extracted for some of the key actors and actresses (by which I mean their nominated roles):

For Matt Damon in The Martian: Humor, Optimism, Engineer, Scientist, and Leadership.

For Leonardo DiCaprio in The Revenant: Survival, Endurance, Tragedy, Individual, Unrelenting, Warrior, Physicality

For Bryan Cranston in Trumbo: Idealist, Humanity, Drinking, Liberal, Civil Rights

If you’ve seen these movies, I think you can agree that the thematic pulls are reasonable. And is it any surprise, as you read the list, that Cranston is our predicted winner? I think not. To me, this says more about whether our method is applicable to this kind of prediction – and the answer is probably not – than whether the method itself is working well. Take away what we know about the actors and the process, and I think you’d probably agree that the model has done the best possible job of culture matching to Hollywood.

I was a bit concerned about the Jennifer Lawrence prediction. I saw the logic of Cranston’s character immediately, but Joy didn’t immediately strike me as an obvious fit to Hollywood’s view of people. When I studied the themes that emerged around her character, though, I thought it made reasonable sense:

Lawrence in Joy: Forceful, Personality, Imagination, Friendship, Heroine

WDYT? There are other themes I might have expected to emerge that didn’t, but these seem like a fairly decent set and you can see where something like forceful, in particular, might match well (it did).

In the end, it didn’t make me think the model was broken.

We tried tuning these models, but while different predictions can be forced from the model, nothing we did convinced us that, when it came to culture matching, we’d really improved our result. When you start torturing your model to get the conclusions you think are right, it’s probably time to stop.

It’s all about understanding two critical items: what your model is for and whether or not you think the prediction could be better. In this case, we never expected our model to be able to predict the Academy Awards exactly. If we understand why our prediction isn’t aligned to likely outcomes, that may well be good enough. And, of course, even the best model won’t predict most events with anything like 100% accuracy. If you try too hard to fit your model to the data or – even worse – to your expectations, you remove the value of having a model in the first place.

Just like in the real world, with enough pain you can make your model say anything. That doesn’t make it reliable.

So we’re going down with this particular ship!


Machine Learning

We’ve been experimenting with a second method that focuses on machine learning. Essentially, we’re training a machine learning system with reviews about each movie, then categorizing the Hollywood corpus and seeing which movie gets the most hits. Unfortunately, real work has gotten in the way of some of our brute-force machine learning work, and we haven’t progressed as much on this as we hoped.

To date, it hasn’t done a great job. Well, that’s being kind. Really it kind of sucks. Our results look pretty random and where we’ve been able to understand the non-random results, they haven’t captured real themes but only passing similarities (like a tendency to mention New York). With all due respect to Ted Cruz, we don’t think that’s a good enough cultural theme to hang our hat on.

As of right now, our best conclusion is that the method doesn’t work well.

We probably won’t have time to push this work further, but right now I’d say that if I was doing this work again I’d concentrate on the linguistic approach. I think our documents were too long and complex and our themes too abstract to work well with the machine learning systems we were using.

In my next post, I have some reflections on the process and what it tells us about how analytics works.

Bet your Shirt on The Big Short

Early Results

We’re still tweaking the machine learning system and the best actor and actress categories. But our text/linguistic culture-matching model produced the following rank ordering for the best picture category:


So if you don’t know, now you know…The Big Short wins it.

Incidentally, we also scored movies that had Best Actor/Actress nominees (since they were in our corpus). The Big Short still won, but some of those movies (such as Trumbo) scored very well. You can read that any way you like – it might indicate that the Best Actor and Actress nominations are heavily influenced by how much voters liked the type of movie (which is certainly plausible), or it might indicate that our model is a pretty bad predictor, since those movies didn’t even garner Best Picture nominations. And, of course, given our sample size, it probably means nothing at all.

I think the list makes intuitive sense – which is always something of a relief when you’ve gone the long way around with a methodology. I particularly think the bottom of the list makes sense with The Martian and Mad Max. Both movies feel well outside any current Hollywood zeitgeist (except maybe the largely silent super-model refugees in MMFR). If a system can pick the losers, perhaps it can pick the winners as well. But more important to me, it suggests that our method is doing a credible job of culture matching.

With a few more weeks, we’ll probably take a closer look at some of the classifications and see if there are any biasing words/themes that are distorting the results. This stuff is hard and all too easy to get wrong – especially in your spare time. We’ll also have results from the black-box machine learning system, though we’re not confident about it, as well as what I hope will be interesting results for the actor/actress category. We’ve never believed that the method is as applicable to that problem (predicting acting awards) but we’re fairly satisfied with the initial themes that emerged from each actor/actress so we’re a little more optimistic that we’ll have an interesting solution.

Stay tuned…

Building a Unique Cultural Prediction Engine for the Academy Awards

Beauty is in the eye of the beholder. But what determines beauty is behind the eye of the beholder.

We know that if two people watch the same debate, they nearly always think the candidate who is closest to their opinion won. That’s why debates seldom move minds. The same is surely true for movies. How often does a scene or character in a movie resonate with something that’s going on in your life?  You’ve probably had that happen and when it does, it makes the movie more memorable and impactful.  Given a roughly similar level of artistry and accomplishment, the movie that Hollywood insiders will likely prefer is the one that feels closest to their heart. But how to measure that? Our goal was to build a method for understanding a community’s culture by understanding the whole of what they read and then to develop methods for matching specific pieces of content to culture. Our hypothesis is that, other things being roughly equal, the movie that is closest to the Hollywood worldview will win.


Who is this “we”, Kemosabe?

I lead the digital analytics practice at Ernst & Young (EY). As part of that, I also lead a “Counseling Family” of west coast analysts (I’m based in SF). A counseling family isn’t so much a corporate reporting structure as a community of interest and support group. We try to do fun stuff on the side, support each other, and build careers. Since my CF is all analysts (but not all digital), our idea of fun isn’t limited to surfing, skiing and room escapes (though it does include those). We try to sprinkle in some fun analytics projects we can do on the side – things that give us a chance to pursue special interests, work together, and have some deeply geeky fun. So when 538 – a site we all love – announced their Academy Awards prediction challenge, I signed us up. We have a much larger team and a much larger family than you’ll see here, but not everybody always has time for the fun stuff. Clients come first. For all of us, this is a side project we squeezed in between the billable hours. So special thanks to all the members of the team who contributed (mightily) to this effort. This is far more their effort than mine, and I’ve tried to call out the team members who worked on each step of the analysis. And what did I contribute? Well, you know that FedEx commercial where the senior guy kind of adds the arm chop? Seriously, the broad analytics approach was mine, but all the real work came from the teams I’ve named below.


Why we think this is interesting

It’s unlikely that matching content to community culture will outperform other prediction methods that are focused on things like earlier voting results. However, such methods are of interest only with respect to the problem of predicting this specific award. And how much do we really care about that? I’ll just say this: if you’re betting on the Oscars, “culture matching” probably isn’t the best bet to punch your winning ticket.

Our goal was to develop an approach that might be interesting and applicable to a broad range of problems and that would require interesting analytic methods (our idea of fun). Wouldn’t it be nice to be able to map a TV drama to an audience’s culture? To understand which social media content would most appeal to a targeted community? To know which arguments will play best in Iowa vs. New Hampshire? These, and hundreds of other applications, involve matching content to a community culture. So let’s dispel the myth that this is just about predicting the Oscars. There are many, many problems where having a “culture matching score of content to community” might significantly improve analytic models, and the Oscars are just one (interesting) case of that broad problem set.

Methodology – High Level

To make our culture matching method work, we needed three basic components: a way to describe the Hollywood worldview and capture whatever zeitgeist was current, a way to describe the key themes in a movie, and a way to match and score the two sets of themes. Here’s how we went about developing these three components.

[Figure: Academy Awards Process]

Within this broad method, we tried several different sub-approaches and several different technology solutions. Below is a more detailed break-out of each step.

Steps 1 & 2: Identify a Hollywood Corpus and Extract

One of the challenges to predicting the Academy Awards is uncertainty around the exact community of voters. And, of course, even if you know the community, you don’t necessarily know what (or if) they read. We looked at a number of different potential sources in developing a Hollywood corpus. We considered industry-specific sources like Variety and American Cinematographer, general-purpose sources like the LA or NY Times, and broader sources like Vanity Fair and the Atlantic Monthly. With more time, we might have been able to find ways to analytically identify which corpus or combination was most reflective. For this exercise, however, we simply pulled each data source, categorized them, and reviewed them. The review included study of word/phrase frequency counts and analysts reading the source material. We eliminated the industry-specific sources because the text wasn’t thematically interesting enough. Though filled with Hollywood-specific material, most of it was technical in nature (jobs, films in process, etc.) and too thin to establish broader cultural themes. The LA Times proved more accessible for large amounts of content than the NY Times and gave us a more focused geography. Vanity Fair turned out to be our favorite corpus. It blended lots of opinion and culture with a healthy serving of Hollywood-specific content. For our analysis, we ended up using selected VF and LA Times categories, with Vanity Fair dominating. For both these sources, we extracted 12 months of articles using a standard listening tool, filtered them by category and to eliminate duplicates, and then loaded them into our analysis tools.

Data Extraction Team: Jesse Gross, Abhay Khera
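The category filtering, de-duplication and frequency review described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the listening tool’s API: it assumes each article is a dict with `category` and `text` keys, drops the categories rejected during analyst review (local, crime, sports), removes duplicates by hashing case- and whitespace-normalized text, and produces the kind of word-frequency counts an analyst would read through.

```python
import hashlib
import re
from collections import Counter

# Categories rejected during analyst review (an analyst choice, not a rule).
EXCLUDED_CATEGORIES = {"local", "crime", "sports"}


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants hash identically."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def build_corpus(articles, excluded=EXCLUDED_CATEGORIES):
    """Keep articles outside the excluded categories, dropping duplicates."""
    seen = set()
    corpus = []
    for article in articles:
        if article["category"].lower() in excluded:
            continue
        digest = hashlib.sha1(normalize(article["text"]).encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        corpus.append(article)
    return corpus


def top_terms(corpus, k=10):
    """Most frequent lowercase word tokens across the corpus, for analyst review."""
    counts = Counter()
    for article in corpus:
        counts.update(re.findall(r"[a-z']+", article["text"].lower()))
    return counts.most_common(k)
```

Hashing only catches exact duplicates after normalization; syndicated stories with small edits would need a fuzzier comparison.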


Steps 3 & 4: Identify a Movie Corpus and Extract

Our initial thought was that we could use movie reviews to create a corpus specific to each movie. A good movie review will not only capture topic themes, but is likely to capture more abstract themes and also to tie those to broader cultural issues (like race, fear, or wealth inequality). We expected to be able to use sites like IMDb, Metacritic or Rotten Tomatoes to quickly identify and pull reviews. We were right – and wrong – about this. Movie reviews did turn out to be a really rich, highly focused source of language about each movie. And the sites above gave us a great list of movie reviews to pull from. But we couldn’t pull full-text reviews from the APIs on those sites. Instead, we pulled the URLs of the reviews from those sites, filtered them for English-language only, and then wrote a Java program using Boilerpipe’s text extraction library to actually extract the review from its original site. Boilerpipe did a really nice job extracting core document text, and with our program and the URLs, we were able to quickly pull a large library of movie reviews for each nominated movie. This turned out to be more work than we expected, but we ended up pretty satisfied with our Movie corpus.

Movie Corpus Data Team: Emanuel Rasolofomasoandro, Michael Yeluashvili, Jin Liu, Tony Perez, Yilong Wang
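The extraction described above ran in Java with Boilerpipe. For readers who want to see the general idea of boilerplate removal without a JVM, here is a deliberately crude sketch using only Python’s standard library. It is not Boilerpipe’s algorithm or API: it simply keeps `<p>` blocks that are long enough and not dominated by link text, and both thresholds (`min_words`, `max_link_density`) are illustrative assumptions.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collects the text of each <p> block and how many of its words sit inside <a> tags."""

    def __init__(self):
        super().__init__()
        self.blocks = []  # list of (text, linked_word_count)
        self._in_p = False
        self._in_a = 0
        self._words = []
        self._linked = 0

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._words, self._linked = [], 0
        elif tag == "a":
            self._in_a += 1

    def handle_endtag(self, tag):
        if tag == "a" and self._in_a:
            self._in_a -= 1
        elif tag == "p" and self._in_p:
            self.blocks.append((" ".join(self._words), self._linked))
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            words = data.split()
            self._words.extend(words)
            if self._in_a:
                self._linked += len(words)


def extract_main_text(html, min_words=20, max_link_density=0.33):
    """Keep paragraphs that are long and mostly unlinked (navigation tends to be short and linky)."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    kept = []
    for text, linked in parser.blocks:
        n = len(text.split())
        if n and n >= min_words and linked / n <= max_link_density:
            kept.append(text)
    return "\n\n".join(kept)
```

Real review pages put text in many kinds of containers, which is exactly why a trained extractor like Boilerpipe did better than simple rules like these.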

Text & Linguistic Analysis vs. Machine Learning

At this point, we had two alternative approaches to matching the “Movie” corpus to the “Hollywood” corpus. The first method was to use IBM’s SPSS Text Analytics to extract and match themes. The second approach was to use a machine-learning tool to auto-match the two corpora.

Text & Linguistic Analysis Method

Step 5: Extracting Top Themes from each Movie

We started with a set of about 150 movie reviews per movie (all Best Picture nominees and those featuring a Best Actor or Actress nominee). First, we used R and SPSS to analyze which word themes occurred frequently in that set. For example, some of 45 Years’ themes included “marriage”, “secrets”, “aging” and “jealousy”. We gathered about 20 themes for each movie and each actor. Second, we used SPSS to count how often these themes occurred in our 2015 Hollywood corpus. The total number of occurrences gave us an initial score for each movie or actor. Third, we adjusted the initial score by examining context. We looked at a theme’s context in movie reviews. For example, in 45 Years, the husband receives a letter with important news, so a letter, in this context, is a personal communication sent from one person to another. In our Hollywood corpus, there were frequent occurrences of “letters to the editor”. That’s clearly a textual distortion, not a cultural theme. We tried to make sure that thematic concepts were true matches, and when we judged a match to be spurious, we adjusted the score by removing it.
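The counting and context adjustment in Step 5 can be sketched as follows. This is a simplified, hypothetical illustration: the actual scoring was done in SPSS with analyst review, whereas here a theme matches as a whole word with an optional plural “s”, and spurious contexts are supplied by hand as phrases (such as “letters to the editor” for the theme “letter”). The code assumes each spurious phrase contains exactly one match of its theme, so each phrase occurrence cancels one match.

```python
import re


def count_theme(text, theme):
    """Whole-word, case-insensitive matches of a theme, allowing a plural 's'."""
    return len(re.findall(rf"\b{re.escape(theme)}s?\b", text.lower()))


def score_movie(themes, hollywood_docs, spurious=None):
    """Sum theme matches across the Hollywood corpus, minus analyst-flagged contexts.

    `spurious` maps a theme to phrases judged not to be real cultural matches;
    each phrase occurrence is assumed to contain exactly one theme match.
    """
    spurious = spurious or {}
    total = 0
    for doc in hollywood_docs:
        lowered = doc.lower()
        for theme in themes:
            hits = count_theme(lowered, theme)
            for phrase in spurious.get(theme, []):
                hits -= lowered.count(phrase.lower())
            total += max(hits, 0)
    return total
```

For 45 Years, a call might look like `score_movie(["marriage", "secret", "aging", "jealousy", "letter"], hollywood_docs, {"letter": ["letters to the editor"]})`.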

We did try some alternative approaches. For example, we also asked ourselves whether the process worked in reverse. If we took key themes from the Hollywood corpus and then matched them to each movie, would we get similar results? If you think about it, you’ll see that this is a rather different question. There’s no guarantee that the top overall themes in Hollywood will match the top themes from ANY of our movies – so it’s possible that the answer to which movies match Hollywood themes isn’t the same as the answer to which movie themes resonated most strongly in Hollywood. Our lead analyst on the SPSS text analytics, Brian Kaemingk, described these questions this way:

[Figure: Academy Awards Questions]

In the end, the models for each question produced quite similar results, but there were a couple of movies (e.g. Bridge of Spies) that moved position significantly between the two methods. We decided that Question #1 worked better for our analysis, since the theme identification in the Movie corpus was richer and more specific than the theme identification in the Hollywood corpus. We think those more specific themes are probably better at capturing real aspects of the Hollywood worldview and creating that feeling of resonance we’re hoping to capture.
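To make the difference between the two directions concrete, here is a toy sketch of both. It is hypothetical and simplified to single-word themes. The movie-to-Hollywood direction scores each movie by how often its themes appear in the Hollywood corpus; the Hollywood-to-movie direction takes a list of top Hollywood themes and scores each movie by how densely those themes appear in its reviews. The made-up data at the end shows how the two can produce different orderings.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens; themes are assumed to be single words here."""
    return re.findall(r"[a-z]+", text.lower())


def rank_movie_to_hollywood(movie_themes, hollywood_docs):
    """Score each movie by how often its themes occur in the Hollywood corpus."""
    counts = Counter(t for doc in hollywood_docs for t in tokens(doc))
    scores = {movie: sum(counts[t] for t in themes) for movie, themes in movie_themes.items()}
    return sorted(scores, key=scores.get, reverse=True)


def rank_hollywood_to_movie(hollywood_themes, movie_reviews):
    """Score each movie by the density of top Hollywood themes in its reviews."""
    scores = {}
    for movie, reviews in movie_reviews.items():
        words = [t for doc in reviews for t in tokens(doc)]
        counts = Counter(words)
        # Normalize by review length so movies with more review text are not favored.
        scores[movie] = sum(counts[t] for t in hollywood_themes) / max(len(words), 1)
    return sorted(scores, key=scores.get, reverse=True)


# Toy, made-up data: the two directions disagree.
movie_themes = {"Movie A": ["wealth", "greed"], "Movie B": ["space", "engineer"]}
hollywood_docs = ["wealth and greed and wealth", "space"]
movie_reviews = {"Movie A": ["wealth greed"], "Movie B": ["space space engineer"]}
print(rank_movie_to_hollywood(movie_themes, hollywood_docs))  # ['Movie A', 'Movie B']
print(rank_hollywood_to_movie(["space"], movie_reviews))      # ['Movie B', 'Movie A']
```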

We also used this method to make our predictions for Best Actor and Actress. Instead of using the whole review corpus, however, we first extracted concept maps around the character/actor. For Matt Damon in The Martian, that looked something like this:

[Figure: Academy Awards Actor Concepts]

We then matched these Concept Maps back to the Hollywood corpus. In our first try, we simply matched to the entire Hollywood corpus. However, we decided this confused concepts since optimism about the weather isn’t quite the same as being an optimistic person. So we decided to extract just people-themed concepts from the Hollywood corpus and then match those. The idea is that, just as we are matching the movie to broader cultural themes, we matched the character to the way Hollywood talks and reads about real people. Does Hollywood resonate to optimistic, imaginative scientists?

Well, at least Matt Damon’s handsome…
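The people-only matching can be illustrated with a rough sketch. The real step used SPSS concept extraction to isolate people-themed concepts; here, purely as a hypothetical stand-in, a sentence counts as being about a person if it contains a word from a tiny hand-made cue list, and character traits are counted only in those sentences. The example mirrors the weather case above: “optimistic” in a weather sentence no longer counts.

```python
import re

# Hypothetical, deliberately tiny list of cues that a sentence is about a person.
# The post used SPSS concept extraction for this step, not a word list.
PERSON_CUES = {"he", "she", "him", "her", "his", "actor", "actress",
               "director", "writer", "man", "woman", "person"}


def split_sentences(text):
    """Naive sentence splitter on ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def people_sentences(docs):
    """Keep only sentences that contain at least one person cue."""
    kept = []
    for doc in docs:
        for sentence in split_sentences(doc):
            if set(re.findall(r"[a-z]+", sentence.lower())) & PERSON_CUES:
                kept.append(sentence)
    return kept


def score_character(traits, hollywood_docs):
    """Count whole-word trait matches, restricted to people-focused sentences."""
    return sum(
        len(re.findall(rf"\b{re.escape(trait)}\b", sentence.lower()))
        for sentence in people_sentences(hollywood_docs)
        for trait in traits
    )
```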

On the technical side of things, we used R to pre-process data and count theme frequency. R also helped to remove stop and non-thematic words and apply document stemming to make sure that themes were counted correctly. Stemming significantly boosts the accuracy of matching and theme consolidation. Most of our work, however, was done using IBM SPSS. We used SPSS to score themes and examine context using co-occurrence, semantic network, concept root derivation, concept inclusion, and text link analysis NLP techniques.

Text Analytics Team: Brian Kaemingk, Miguel Campo Rembado, Mohit Shroff, Jon Entwistle and Sarah Aiello
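The pre-processing above was done in R. Purely as an illustration of what stop-word removal and stemming buy you, here is a small Python sketch. The stop-word list is a tiny sample and `crude_stem` is a toy suffix stripper we made up for this example, not the Porter stemmer or any R package’s implementation; it only shows why stemming lets “secret” and “secrets” count as the same theme.

```python
import re

# Tiny sample stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are",
              "was", "it", "that", "this", "with", "for", "on", "as", "by"}

# Toy suffixes, checked longest first; not a real stemming algorithm.
SUFFIXES = ("ingly", "edly", "ing", "ed", "s")


def crude_stem(word):
    """Strip one common suffix, keeping at least three characters of stem."""
    if word.endswith("ss"):  # leave words like "loneliness" alone
        return word
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Tokenize, drop stop words, and stem what remains."""
    words = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(w) for w in words if w not in STOP_WORDS]
```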


Machine Learning Method

Steps 5-7: Training, Categorization and Scoring

We are experimenting with different methods of using our machine learning tools. But our first attempt is very much a brute-force method. We loaded the Movie and the Hollywood corpus into a workset. We then created training categories for each movie and trained the tool using the movie reviews for that film. After the training, we simply let the tool categorize every article in the Hollywood corpus and counted which movie it was categorized as most resembling. The category into which the most Hollywood posts were sorted was the winner.

This approach is asking a lot of the machine learning tool, but it was simple and potentially interesting. The hard part was trying to figure out if the resulting categorization made sense! That’s often the difficulty when working with a Black Box tool. Even if you believe the results, it can be hard to make skeptics into converts with black-box systems. It was particularly challenging in this case because we weren’t at all confident that this brute force method would produce good results AND we really had no outside view of a plausible rank ordering of movies. Even if the assignment of posts to movies was completely random, it would be hard to tell if it was wrong.

Machine Learning Team: Emanuel Rasolofomasoandro, Michael Yeluashvili, Jesse Gross, Jin Liu, Mohit Shroff
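The brute-force approach can be sketched with a from-scratch multinomial naive Bayes classifier. This is a hypothetical stand-in: the post does not say which algorithm the machine learning tool used internally, and naive Bayes is chosen here only because it is short and self-contained. Each movie is a category trained on its reviews, every Hollywood article is assigned to its most likely movie, and the movie with the most assigned articles wins.

```python
import math
import re
from collections import Counter


def tokens(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, labeled_docs):
        self.word_counts = {}
        self.doc_counts = Counter()
        self.vocab = set()
        for label, doc in labeled_docs:
            self.doc_counts[label] += 1
            words = tokens(doc)
            self.word_counts.setdefault(label, Counter()).update(words)
            self.vocab.update(words)
        self.totals = {label: sum(c.values()) for label, c in self.word_counts.items()}
        self.n_docs = sum(self.doc_counts.values())
        return self

    def predict(self, doc):
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            score = math.log(self.doc_counts[label] / self.n_docs)  # class prior
            for w in tokens(doc):
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log((counts[w] + 1) / (self.totals[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def brute_force_winner(movie_reviews, hollywood_docs):
    """Train one category per movie, classify every Hollywood article, tally the results."""
    training = [(movie, r) for movie, reviews in movie_reviews.items() for r in reviews]
    model = NaiveBayes().fit(training)
    tally = Counter(model.predict(doc) for doc in hollywood_docs)
    return tally.most_common(1)[0][0], tally
```

A classifier like this happily latches onto incidental shared words, which is one way to end up with the “passing similarities” problem described in the results.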



Isn’t it awful when you get all the way through the hour of something like Dancing with the Stars and then the actual selection is carried over into the next episode? Totally sucks!

Unfortunately, it’s a 538 challenge and we owe them first shot at the actual prediction. I’ll push it as soon as we post there. The good news? You can see it there Tuesday and I’ll even update this post to include the prediction.


We’ve released the predictions. Here’s the initial rank ordering of Best Picture nominees by match to Hollywood themes:

  1. The Big Short
  2. Spotlight
  3. Brooklyn
  4. Room
  5. Bridge of Spies
  6. The Revenant
  7. Mad Max
  8. The Martian

So The Big Short wins it – and if you didn’t know, now you know!

Here’s the 538 article on our method (it also includes our probably disastrous picks in the acting categories)…