Tag Archives: EY

Torture is Bad – Don’t Waterboard your Models even when you know they are Wrong

Predicting the Best Actor and Actress Categories

My Analytics Counseling Family here at EY has been participating in the 538 Academy Award Challenge. Our project involved creating a culture-matching engine – a way to look at pieces of content (in this case, obviously, movies) and determine how well they match a specific community’s worldview. The hypothesis is that the more a movie matches the current Hollywood zeitgeist, the more likely it is to win. In my last post, I described in some detail the way we did that and our results for predicting the Best Movie (The Big Short). We were pretty happy with the way the model worked and the intuitive fit between the movies and our culture-matching engine. Of course, nothing in what we’ve done proves that culture matching is a great way to predict the Oscars (and even if we’re right it won’t prove much in a single year), but that wasn’t really the point. Culture-matching is a general technique built on interesting analytic methods, and if the results are promising in terms of our ability to make a match, we think that’s pretty great.

The second part of our task, however, was to predict the Best Actor and Actress awards. Our method for doing this was similar to our method for predicting the best movie award but there were a few wrinkles. First, we extracted language specific to each character in the nominated movie. This is important to understand. We aren’t looking at how Hollywood talks about DiCaprio or Cranston or Lawrence as people and actors. We aren’t looking at how they are reviewed. We’re entirely focused on how their character is described.

This is the closest analogue we could think of to culture matching movies. However, this was a point of considerable debate internal to our team. To me, it seems intuitively less likely that people will prefer an actor or actress because their character matches our worldview than when discussing a movie as a whole. We all understood that and agreed that our approach was less compelling when it came to ANY of the secondary awards. However, our goal was to focus on culture-matching more than it was to find the best method for predicting acting awards. We could have predicted screenplay, I suppose, but there’s no reason to think the analysis would deviate in the slightest from our prediction around movie.

Once we had key themes around each nominated role, we matched those themes to our Hollywood corpus. In our first go round, we matched to the entire corpus matching actor themes to broad cultural themes. This didn’t work well. It turned out that we were conflating themes about people with themes about other things in ways that didn’t make much sense. So for our second pass, we tightened the themes in the Hollywood corpus to only those which were associated with people.

In essence, we’re saying which roles best correspond to the way Hollywood talks about people and picking the actor/actress who played that role.

So here’s how it came out:

Best Actor:

  1. Bryan Cranston
  2. Michael Fassbender
  3. Leonardo DiCaprio
  4. Eddie Redmayne
  5. Matt Damon

Best Actress:

  1. Jennifer Lawrence
  2. Brie Larson
  3. Cate Blanchett
  4. Saoirse Ronan
  5. Charlotte Rampling


Do I think we’re going to be right? Not a chance.

But that doesn’t mean the method isn’t working pretty well. In fact, I think it worked about as well as we could have hoped. Here, for example, are the themes we extracted for some of the key actors and actresses (by which I mean their nominated roles):

For Matt Damon in The Martian: Humor, Optimism, Engineer, Scientist, and leadership.

For Leonardo DiCaprio in The Revenant: Survival, Endurance, Tragedy, Individual, Unrelenting, Warrior, Physicality

For Bryan Cranston in Trumbo: Idealist, humanity, drinking, liberal, civil rights

If you’ve seen these movies, I think you can agree that the thematic pulls are reasonable. And is it any surprise, as you read the list, that Cranston is our predicted winner? I think not. To me, this says more about whether our method is applicable to this kind of prediction – and the answer is probably not – than whether the method itself is working well. Take away what we know about the actors and the process, and I think you’d probably agree that the model has done the best possible job of culture matching to Hollywood.

I was a bit concerned about the Jennifer Lawrence prediction. I saw the logic of Cranston’s character immediately, but Joy didn’t immediately strike me as an obvious fit to Hollywood’s view of people. When I studied the themes that emerged around her character, though, I thought it made reasonable sense:

Lawrence in Joy: Forceful, personality, imagination, friendship, heroine

WDYT? There are other themes I might have expected to emerge that didn’t, but these seem like a fairly decent set and you can see where something like forceful, in particular, might match well (it did).

In the end, it didn’t make me think the model was broken.

We tried tuning these models, but while different predictions can be forced from the model, nothing we did convinced us that, when it came to culture matching, we’d really improved our result. When you start torturing your model to get the conclusions you think are right, it’s probably time to stop.

It’s all about understanding two critical items: what your model is for and whether or not you think the prediction could be better. In this case, we never expected our model to be able to predict the Academy Awards exactly. If we understand why our prediction isn’t aligned to likely outcomes, that may well be good enough. And, of course, even the best model won’t predict most events with anything like 100% accuracy. If you try too hard to fit your model to the data or – even worse – to your expectations, you remove the value of having a model in the first place.

Just like in the real world, with enough pain you can make your model say anything. That doesn’t make it reliable.

So we’re going down with this particular ship!


Machine Learning

We’ve been experimenting with a second method that focuses on machine learning. Essentially, we’re training a machine learning system with reviews about each movie and then categorizing the Hollywood corpus and seeing which movie gets the most hits. Unfortunately, real work has gotten in the way of some of our brute-force machine learning work and we haven’t progressed as much on this as we hoped.

To date, it hasn’t done a great job. Well, that’s being kind. Really it kind of sucks. Our results look pretty random and where we’ve been able to understand the non-random results, they haven’t captured real themes but only passing similarities (like a tendency to mention New York). With all due respect to Ted Cruz, we don’t think that’s a good enough cultural theme to hang our hat on.

As of right now, our best conclusion is that the method doesn’t work well.

We probably won’t have time to push this work further, but right now I’d say that if I was doing this work again I’d concentrate on the linguistic approach. I think our documents were too long and complex and our themes too abstract to work well with the machine learning systems we were using.

In my next post, I have some reflections on the process and what it tells us about how analytics works.

Bet your Shirt on The Big Short

Early Results

We’re still tweaking the machine learning system and the best actor and actress categories. But our text/linguistic culture-matching model produced the following rank ordering for the best picture category:


So if you don’t know, now you know…The Big Short wins it.

Incidentally, we also scored movies that had best actor/actress nominees (since they were in our corpus). Big Short still won, but some of those movies (such as Trumbo) scored very well. You can read that any way you like – it might indicate that the best actor and actress nominations are heavily influenced by how much voters liked the type of movie (which is certainly plausible) or it might indicate that our model is a pretty bad predictor since those movies didn’t even garner nominations. And, of course, given our sample size, it probably means nothing at all.

I think the list makes intuitive sense – which is always something of a relief when you’ve gone the long way around with a methodology. I particularly think the bottom of the list makes sense with The Martian and Mad Max. Both movies feel well outside any current Hollywood zeitgeist (except maybe the largely silent super-model refugees in MMFR). If a system can pick the losers, perhaps it can pick the winners as well. But more important to me, it suggests that our method is doing a credible job of culture matching.

With a few more weeks, we’ll probably take a closer look at some of the classifications and see if there are any biasing words/themes that are distorting the results. This stuff is hard and all too easy to get wrong – especially in your spare time. We’ll also have results from the black-box machine learning system, though we’re not confident about it, as well as what I hope will be interesting results for the actor/actress category. We’ve never believed that the method is as applicable to that problem (predicting acting awards) but we’re fairly satisfied with the initial themes that emerged from each actor/actress so we’re a little more optimistic that we’ll have an interesting solution.

Stay tuned…

Building a Unique Cultural Prediction Engine for the Academy Awards

Beauty is in the eye of the beholder. But what determines beauty is behind the eye of the beholder.

We know that if two people watch the same debate, they nearly always think the candidate who is closest to their opinion won. That’s why debates seldom move minds. The same is surely true for movies. How often does a scene or character in a movie resonate with something that’s going on in your life? You’ve probably had that happen and when it does, it makes the movie more memorable and impactful. Given a roughly similar level of artistry and accomplishment, the movie that Hollywood insiders will likely prefer is the one that feels closest to their heart. But how to measure that? Our goal was to build a method for understanding a community’s culture by understanding the whole of what they read and then to develop methods for matching specific pieces of content to culture. Our hypothesis is that, other things being roughly equal, the movie that is closest to the Hollywood worldview will win.


Who is this “we” Kemosabe?

I lead the digital analytics practice at Ernst & Young (EY). As part of that, I also lead a “Counseling Family” of west coast analysts (I’m based in SF). A counseling family isn’t so much a corporate reporting structure as a community of interest and support group. We try to do fun stuff on the side, support each other, and build careers. Since my CF is all analysts (but not all digital), our idea of fun isn’t limited to surfing, skiing and room escapes (though it does include those). We try to sprinkle in some fun analytics projects we can do on the side – things that give us a chance to pursue special interests, work together, and have some deeply geeky fun. So when 538 – a site we all love – announced their Academy Awards prediction challenge, I signed us up. We have a much larger team and a much larger family than you’ll see here, but not everybody always has time for the fun stuff. Clients come first. For all of us, this is a side-project we squeezed in between the billable hours. So special thanks to all the members of the team who contributed (mightily) to this effort. This is far more their effort than mine and I’ve tried to call out the team members who worked on each step of the analysis. And what did I contribute? Well, you know that FedEx commercial where the senior guy kind of adds the arm chop? Seriously, the broad analytics approach was mine but all the real work came from the teams I’ve named below.


Why we think this is interesting

It’s unlikely that matching content to community culture will out-perform other prediction methods that are focused on things like earlier voting results. However, such methods are of interest only with respect to the problem of predicting this specific award. And how much do we really care about that? I’ll just say this, if you’re betting on the Oscars, “culture matching” probably isn’t the best bet to punch your winning ticket.

Our goal was to develop an approach that might be interesting and applicable to a broad range of problems and that would require interesting analytic methods (R idea of fun). Wouldn’t it be nice to be able to map a TV drama to an audience’s culture? To understand which social media content would most appeal to a targeted community? To know which arguments will play best in Iowa vs. New Hampshire? These, and hundreds of other applications, involve matching content to a community culture. So let’s dispel with the myth that this is just about predicting the Oscars. There are many, many problems where having a “culture matching score of content to community” might significantly improve analytic models and the Oscars is just one (interesting) case of that broad problem set.

Methodology – High Level

To make our culture matching method work, we needed three basic components: a way to describe the Hollywood worldview and capture whatever zeitgeist was current, a way to describe the key themes in a movie, and a way to match and score the two sets of themes. Here’s how we went about developing these three components.

[Figure: Academy Awards Process 1]

Within this broad method, we tried several different sub-approaches and several different technology solutions. Below is a more detailed break-out of each step.

Steps 1 & 2: Identify a Hollywood Corpus and Extract

One of the challenges to predicting Academy Awards is uncertainty around the exact community of voters. And, of course, even if you know the community you don’t necessarily know what (or if) they read. We looked at a number of different potential sources in developing a Hollywood corpus. We considered industry specific sources like Variety and American Cinematographer, general purpose sources like the LA or NY Times, and broader sources like Vanity Fair and the Atlantic Monthly. With more time, we might have been able to find ways to analytically identify which corpus or combination was most reflective. For this exercise, however, we simply pulled each data source, categorized them, and reviewed them. The review included study of word/phrase frequency counts and analysts reading the source material. We eliminated the industry specific sources because the text wasn’t thematically interesting enough. Though filled with Hollywood specific materials, most of that material was technical in nature (jobs, films in process, etc.) and too thin to establish broader cultural themes. The LA Times proved more accessible for large amounts of content than the NY Times and gave us a more focused geography. Vanity Fair turned out to be our favorite corpus. It blended lots of opinion and culture with a healthy serving of Hollywood specific content. For our analysis, we ended up using selected VF and LA Times categories with Vanity Fair dominating. For both these sources, we extracted 12 months of articles using a standard listening tool, filtered them by category, removed duplicates, and then loaded them into our analysis tools.
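
For readers who want to see the filter-and-deduplicate step in concrete terms, here is a minimal Python sketch. It is not the listening tool described above. The `Article` fields, the category names, and the rule for spotting duplicates (identical text once case and whitespace are ignored) are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    source: str    # hypothetical field, e.g. "Vanity Fair"
    category: str  # hypothetical field, e.g. "Hollywood"
    text: str


def fingerprint(text: str) -> str:
    """Hash of the text with case and whitespace differences ignored."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_corpus(articles, keep_categories):
    """Keep articles in the selected categories, dropping duplicate texts."""
    seen = set()
    corpus = []
    for article in articles:
        if article.category not in keep_categories:
            continue
        key = fingerprint(article.text)
        if key in seen:
            continue  # same text already loaded, e.g. a syndicated copy
        seen.add(key)
        corpus.append(article)
    return corpus


# Invented sample data, not real articles.
articles = [
    Article("Vanity Fair", "Hollywood", "The studio bet big on finance."),
    Article("LA Times", "Hollywood", "the studio  bet big on FINANCE."),
    Article("LA Times", "Sports", "The Dodgers won again."),
]
corpus = build_corpus(articles, keep_categories={"Hollywood", "Culture"})
print(len(corpus), corpus[0].source)
```

A real pipeline would probably want fuzzier duplicate detection (near-identical wire copy with different headlines, for instance), but exact matching on normalized text is the simplest place to start.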

Data Extraction Team: Jesse Gross, Abhay Khera


Steps 3 & 4: Identify a Movie Corpus and Extract

Our initial thought was that we could use movie reviews to create a corpus specific to each movie. A good movie review will not only capture topic themes, but is likely to capture more abstract themes and also to tie those to broader cultural issues (like race, fear, or wealth inequality). We expected to be able to use sites like IMDb, Metacritic or Rotten Tomatoes to quickly identify and pull reviews. We were right – and wrong – about this. Movie reviews did turn out to be a really rich, highly-focused source of language about each movie. And the sites above gave us a great list of movie reviews to pull from. But we couldn’t pull full-text reviews from the APIs on those sites. Instead, we pulled the URLs of the reviews from those sites, filtered them for English-language only, and then wrote a Java program using Boilerpipe’s text extraction library to actually extract the review from its original site. Boilerpipe did a really nice job extracting core document text and with our script and the URLs, we were able to quickly pull a large library of movie reviews for each nominated movie. This turned out to be more work than we expected but we ended up pretty satisfied with our Movie corpus.
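
To give a feel for what "extracting core document text" means, here is a deliberately crude Python stand-in. It is not Boilerpipe and does not use Boilerpipe's heuristics: it simply keeps the text inside `<p>` tags and drops very short paragraphs. The `min_words` threshold and the sample page are assumptions for illustration, and the sketch works on an HTML string rather than fetching a URL.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects text found inside <p> tags, ignoring everything else."""

    def __init__(self):
        super().__init__()
        self._depth = 0       # greater than 0 while inside a <p> element
        self._buffer = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace before storing the paragraph.
                text = " ".join("".join(self._buffer).split())
                if text:
                    self.paragraphs.append(text)
                self._buffer = []

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


def extract_review_text(html: str, min_words: int = 5) -> str:
    """Return paragraph text, dropping short fragments such as bylines."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    kept = [p for p in parser.paragraphs if len(p.split()) >= min_words]
    return "\n".join(kept)


# Invented sample page, not a real review.
page = """
<html><body>
<nav><a href="/">Home</a> <a href="/reviews">Reviews</a></nav>
<p>By A. Critic</p>
<p>The Big Short turns the housing collapse into a dark, furious comedy.</p>
<script>trackPageView();</script>
<p>It is angry about greed and oddly hopeful about outsiders.</p>
</body></html>
"""
print(extract_review_text(page))
```

Navigation links, scripts, and the byline are dropped, and the two review sentences survive. Real pages are much messier than this, which is why a purpose-built library is worth using.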

Movie Corpus Data Team: Emanuel Rasolofomasoandro, Michael Yeluashvili, Jin Liu, Tony Perez, Yilong Wang

Text & Linguistic Analysis vs. Machine Learning

At this point, we had two alternative approaches to matching the “Movie” corpus to the “Hollywood” corpus. The first method was to use IBM’s SPSS Text Analytics to extract and match themes. The second approach was to use a machine-learning tool to auto-match the two corpora.

Text & Linguistic Analysis Method

Step 5: Extracting Top Themes from each Movie

We started with a set of about 150 movie reviews per movie (all Best Picture nominees and those featuring a Best Actor or Actress nominee), and used R and SPSS to do an analysis of which word themes frequently occurred in that set. For example, some of 45 Years’ themes included “marriage”, “secrets”, “aging”, “jealousy”. We gathered about 20 themes for each movie and each actor. Second, we used SPSS to count how often these themes occurred in our 2015 Hollywood corpus. The total number of occurrences gave us an initial score for each movie or actor. Next, we adjusted the initial score by examining context. We looked at a theme’s context in movie reviews. For example, in 45 Years, the husband receives a letter with important news. Therefore, a letter, in this context, is a personal communication sent from one person to another. In our Hollywood corpus, there were frequent occurrences of “letters to the editor”. That’s clearly a textual distortion not a cultural theme. We tried to make sure that thematic concepts were truly matches. When we judged the match to be spurious, we adjusted the score by removing the match.
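
The count-then-adjust logic is easy to sketch in Python. This is an illustration of the idea only, not the SPSS workflow: the function name, the tiny corpus, and the choice to handle spurious matches by blanking out a fixed phrase list are all assumptions. In the real analysis the spurious matches were judged by analysts reading context.

```python
import re
from collections import Counter


def count_theme_hits(themes, documents, spurious_phrases=()):
    """Count whole-word theme hits (optional plural 's'), skipping noise phrases."""
    scores = Counter()
    for doc in documents:
        text = doc.lower()
        # Blank out phrases judged to be textual noise before counting.
        for phrase in spurious_phrases:
            text = text.replace(phrase.lower(), " ")
        for theme in themes:
            pattern = r"\b" + re.escape(theme.lower()) + r"s?\b"
            scores[theme] += len(re.findall(pattern, text))
    return scores


# Themes loosely based on the 45 Years example; the corpus is invented.
themes = ["marriage", "secret", "aging", "jealousy", "letter"]
hollywood_corpus = [
    "Letters to the editor: readers respond to our marriage equality coverage.",
    "A long marriage survives secrets, but jealousy is harder.",
    "She wrote a letter to her sister about aging in Hollywood.",
]

raw = count_theme_hits(themes, hollywood_corpus)
adjusted = count_theme_hits(
    themes, hollywood_corpus, spurious_phrases=["letters to the editor"]
)
print("initial score:", sum(raw.values()))
print("adjusted score:", sum(adjusted.values()))
```

In this toy corpus the "letters to the editor" hit is the only difference between the initial and adjusted scores, which is exactly the kind of distortion the manual context review was meant to catch.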

We did try some alternative approaches. For example, we also asked ourselves whether the process worked in reverse. If we took key themes from the Hollywood corpus and then matched them to each movie, would we get similar results? If you think about it, you’ll see that this is a rather different question. There’s no guarantee that the top overall themes in Hollywood will match the top themes from ANY of our movies – so it’s possible that the answer to which movies match Hollywood themes isn’t the same as the answer which movie themes resonated most strongly in Hollywood. Our lead analyst on the SPSS text analytics, Brian Kaemingk, described these questions this way:

[Figure: Academy Awards Questions]

In the end, the models for each question produced quite similar results but there were a couple of movies (e.g. Bridge of Spies) that moved position significantly between the two methods. We decided that Question #1 worked better for our analysis, since the theme identification in the Movie corpus was richer and more specific than the theme identification in the Hollywood corpus. We think those more specific themes are probably better in terms of capturing real aspects of the Hollywood worldview and creating that feeling of resonance we’re hoping to capture.
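
One simple way to spot movies that "moved position significantly" between two rankings is to compare positions directly. The Python sketch below uses placeholder titles and invented orderings, not our actual results, and the cutoff of two places for a "big" move is an arbitrary assumption.

```python
def rank_shifts(ranking_a, ranking_b):
    """Position in A minus position in B; positive means the item ranks higher in B."""
    pos_b = {item: i for i, item in enumerate(ranking_b, start=1)}
    return {item: i - pos_b[item] for i, item in enumerate(ranking_a, start=1)}


# Placeholder rankings, invented for illustration.
question_1 = ["Movie A", "Movie B", "Movie C", "Movie D", "Movie E"]
question_2 = ["Movie A", "Movie E", "Movie B", "Movie C", "Movie D"]

shifts = rank_shifts(question_1, question_2)
big_movers = [movie for movie, shift in shifts.items() if abs(shift) >= 2]
print(shifts)
print("big movers:", big_movers)
```

With only a handful of nominees, a check like this is mostly a prompt to go read the themes behind the movie that jumped, rather than a statistical test.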

We also used this method to make our predictions around best actor and actress. Instead of using the whole review corpus, however, we first extracted concept maps around the character/actor. For Matt Damon in the Martian, that looked something like this:

[Figure: Academy Awards Actor Concepts 2]

We then matched these Concept Maps back to the Hollywood corpus. In our first try, we simply matched to the entire Hollywood corpus. However, we decided this confused concepts since optimism about the weather isn’t quite the same as being an optimistic person. So we decided to extract just people-themed concepts from the Hollywood corpus and then match those. The idea is that, just as we are matching the movie to broader cultural themes, we matched the character to the way Hollywood talks and reads about real people. Does Hollywood resonate to optimistic, imaginative scientists?
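
The difference between matching against the whole corpus and matching against people-themed text can be shown with a rough Python sketch. This is not how SPSS extracted people-themed concepts. The cue-word list, the sentence splitter, and the sample text are assumptions chosen to keep the example small.

```python
import re

# Hypothetical cue words suggesting a sentence is about a person.
PERSON_CUES = {
    "he", "she", "his", "her", "him", "who",
    "actor", "actress", "director", "person", "man", "woman",
}


def split_sentences(text):
    """Very rough sentence splitter on ., ! and ? followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", text.strip())


def people_sentences(text):
    """Keep only sentences containing at least one person cue word."""
    kept = []
    for sentence in split_sentences(text):
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        if words & PERSON_CUES:
            kept.append(sentence)
    return kept


def count_theme(theme, sentences):
    """Count occurrences of a theme word at word starts across sentences."""
    pattern = re.compile(r"\b" + re.escape(theme.lower()))
    return sum(len(pattern.findall(s.lower())) for s in sentences)


# Invented text, not from the Hollywood corpus.
text = (
    "The forecast is optimistic about the weekend weather. "
    "She is an optimistic director who trusts her crew. "
    "Optimistic box office projections rarely hold."
)

print("whole corpus:", count_theme("optimistic", split_sentences(text)))
print("people only:", count_theme("optimistic", people_sentences(text)))
```

Against the whole text, "optimistic" matches three times; restricted to sentences about people, it matches once. A cue-word list this short will miss plenty of real cases, so treat it as a picture of the idea rather than a usable filter.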

Well, at least Matt Damon’s handsome…

On the technical side of things, we used R to pre-process data and count theme frequency. R also helped to remove stop and non-thematic words and apply document stemming to make sure that themes were counted correctly. Stemming significantly boosts the accuracy of matching and theme consolidation. Most of our work, however, was done using IBM SPSS. We used SPSS to score themes and examine context using co-occurrence, semantic network, concept root derivation, concept inclusion, and text link analysis NLP techniques.
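
The pre-processing was done in R, but the idea translates directly. Here is a small Python sketch of stop-word removal and stemming feeding a frequency count. The stop-word list is a tiny hypothetical sample, and `crude_stem` is a naive suffix stripper written for this example, not the stemmer used in the project. A proper stemming library handles far more cases.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are", "was", "by", "on"}
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip one common suffix if at least three letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def theme_counts(text):
    """Lowercase, tokenize, drop stop words, stem, and count."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = [crude_stem(t) for t in tokens if t not in STOP_WORDS]
    return Counter(stems)


# Invented sentence for illustration.
text = (
    "Engineers engineered a plan. The engineer kept engineering "
    "in secret, and the secrets held."
)
counts = theme_counts(text)
print(counts.most_common(2))
```

Without stemming, "engineer", "engineers", "engineered" and "engineering" would each be counted once as separate themes; with it, they consolidate into a single theme with four hits.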

Text Analytics Team: Brian Kaemingk, Miguel Campo Rembado, Mohit Shroff, Jon Entwistle and Sarah Aiello


Machine Learning Method

Steps 5-7: Training, Categorization and Scoring

We are experimenting with different methods of using our machine learning tools. But our first attempt is very much a brute force method. We loaded the Movie and the Hollywood corpus into a workset. We then created training categories for each movie and trained the tool using the movie reviews for that film. After the training, we simply let the tool categorize every article in the Hollywood corpus and counted which movie it was categorized as most resembling. The category in which the most Hollywood posts were sorted was the winner.
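
The brute-force procedure (train one category per movie on its reviews, classify every Hollywood article, count the hits) can be sketched in a few dozen lines of Python. The tool we used is a black box, so the classifier here is an assumption: a small multinomial Naive Bayes with add-one smoothing, swapped in purely for illustration. The film labels, reviews, and articles are invented.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Minimal multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> training docs seen
        self.vocab = set()

    def train(self, label, text):
        tokens = tokenize(text)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def classify(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[label] / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training reviews, one category per (placeholder) film.
reviews = {
    "Film A": ["bankers greed housing collapse fraud", "wall street greed and fraud"],
    "Film B": ["survival wilderness bear revenge", "frontier survival and endurance"],
}
# Invented stand-ins for articles in the Hollywood corpus.
hollywood_articles = [
    "Studio executives accused of fraud and greed",
    "A story of survival on a remote shoot",
    "Greed on wall street returns to the screen",
]

model = NaiveBayes()
for film, texts in reviews.items():
    for review in texts:
        model.train(film, review)

hits = Counter(model.classify(article) for article in hollywood_articles)
winner, _ = hits.most_common(1)[0]
print(hits, "winner:", winner)
```

Note that a classifier like this always assigns every article to some film, even when the article has nothing to do with any of them. That is part of why raw hit counts from this approach are hard to interpret.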

This approach is asking a lot of the machine learning tool, but it was simple and potentially interesting. The hard part was trying to figure out if the resulting categorization made sense! That’s often the difficulty when working with a Black Box tool. Even if you believe the results, it can be hard to make skeptics into converts with black-box systems. It was particularly challenging in this case because we weren’t at all confident that this brute force method would produce good results AND we really had no outside view of a plausible rank ordering of movies. Even if the assignment of posts to movies was completely random, it would be hard to tell if it was wrong.

Machine Learning Team: Emanuel Rasolofomasoandro, Michael Yeluashvili, Jesse Gross, Jin Liu, Mohit Shroff



Isn’t it awful when you get all the way through the hour of something like Dancing with the Stars and then the actual selection is carried over into the next episode? Totally sucks!

Unfortunately, it’s a 538 challenge and we owe them first shot at the actual prediction. I’ll push it as soon as we post there. The good news? You can see it there Tuesday and I’ll even update this post to include the prediction.


We’ve released the predictions. Here’s the initial rank ordering of Best Picture nominees by match to Hollywood themes:

  1. The Big Short
  2. Spotlight
  3. Brooklyn
  4. Room
  5. Bridge of Spies
  6. The Revenant
  7. Mad Max
  8. The Martian

So the Big Short wins it – and if you didn’t know, now you know!

Here’s the 538 article on our method (it also includes our probably disastrous picks in the acting categories)…


Digital Transformation of the Enterprise (with a side of Big Data)

Since I finished Measuring the Digital World and got back to regular blogging, I’ve been writing an extended series on the challenges of digital in the enterprise. Like many analysts, I’m often frustrated by the way our clients approach decision-making. So often, they lack any real understanding of the customer journey, any effective segmentation scheme, any real method for either doing or incorporating analytics into their decisioning, anything more than a superficial understanding of their customers, and anything more than the empty façade of a testing program. Is it any surprise that they aren’t very good at digital? This would be frustrating but understandable if companies simply didn’t invest in these capabilities. They aren’t magic, and no large enterprise can do these things without making a significant investment. But, in fact, many companies have invested plenty with very disappointing results. That’s maddening. I want to change that – and this series is an extended meditation on what it takes to do better and how large enterprises might truly gain competitive advantage in digital.

I hope that reading these posts is useful to people, but I know, too, that it’s hard to get the time. Heaven knows I struggle to read the stuff I’d like to. So I took advantage of the slow time over the holidays to do something that’s been on my wish list for about 2 years now – take some of the presentations I do and turn them into full online webinars. I started with a whole series that captures the core elements of this series – the challenge of digital transformation.

There are two versions of this video series. The first is a set of fairly short (2-4 minute) stories that walk through how enterprise decision-making gets done, what’s wrong with the way we do it, and how we can do better. It’s a ten(!) part series and meant to be tackled in order. It’s not really all that long…like I said, most of the videos are just 2-4 minutes long. I’ve also packaged up the whole story (except Part 10) in a single video that runs just a little over 20 minutes. It’s shorter than viewing all 10 of the others, but you need a decent chunk of uninterrupted time to get at it. If you’re really pressed and only want to get the key themes without the story, you can just view Parts 8-10.

Here’s the video page that has all of these laid out in order:

Digital Transformation Video Series

Check it out and let me know what you think! To me it seems like a faster, better, and more enjoyable way to get the story about digital transformation and I’m hoping it’s very shareable as well. If you’re struggling to get analytics traction in your organization, these videos might be an easy thing to share with your CMO and digital channel leads to help drive real change.

I have to say I enjoyed doing these a lot and they aren’t really hard to do. They aren’t quite professional quality, but I think they are very listenable and I’ll keep working to make them better. In fact, I enjoyed doing the digital transformation ones so much that I knocked out another this last week – Big Data Explained.

This is one of my favorite presentations of all time – it’s rich in content and intellectually interesting. Big data is a subject that is obscured by hype, self-interest, and just plain ignorance; everyone talks about it but no one has a clear, cogent explanation of what it is and why it’s important. This presentation deconstructs the everyday explanation about big data (the 4Vs) and shows why it misses the mark. But it isn’t designed to merely expose the hype, it actually builds out a clear, straightforward and important explanation of why big data is real, why it challenges common IT and analytics paradigms, and how to understand whether a problem is a big data problem…or not. I’ve written about this before, but you can’t beat a video with supporting visuals for this particular topic. It’s less than fifteen minutes and, like the digital transformation series, it’s intended for a wide audience. If you have decision-makers who don’t get big data or are skeptical of the hype, they’ll appreciate this straightforward, clear, and no-nonsense explication of what it is.

You can get it on my video page or directly on YouTube.

This is also a significant topic toward the end of Measuring the Digital World where I try to lay out a forward looking plan for digital analytics as a discipline.

I’m planning to do a steady stream of these videos throughout the year so I’d love thoughts/feedback if you have suggestions!

Next week I hope to have an update on my EY Counseling Family’s work in the 538 Academy Awards challenge. We’ve built our initial Hollywood culture models – it’s pretty cool stuff and I’m excited to share the results. Our model may not be as effective as some of the other challengers (TBD), but I think it’s definitely more fun.