Tag Archives: machine learning

A Guided Tour through Digital Analytics (Circa 2016)

I’ve been planning my schedule for the DA Hub in late September and while I find it frustrating (so much interesting stuff!), it’s also enlightening about where digital analytics is right now and where it’s headed. Every conference is a kind of mirror to its industry, of course, but that reflection is often distorted by the needs of the conference – to focus on the cutting-edge, to sell sponsorships, to encourage product adoption, etc.  With DA Hub, the Conference agenda is set by the enterprise practitioners who are leading groups – and it’s what they want to talk about. That makes the conference agenda unusually broad and, it seems to me, uniquely reflective of the state of our industry (at least at the big enterprise level).

So here’s a guided tour of my DA Hub – including what I thought was most interesting, what I chose, and why. At the end I hope that, like Indiana Jones picking the Holy Grail from a murderers’ row of drinking vessels, I chose wisely.

Session 1 features conversations on Video Tracking, Data Lakes, the Lifecycle of an Analyst, Building Analytics Community, Sexy Dashboards (surely an oxymoron), Innovation, the Agile Enterprise and Personalization. Fortunately, while I’d love to join either Twitch’s June Dershewitz to talk about Data Lakes and Data Swamps or Intuit’s Dylan Lewis for When Harry (Personalization) met Sally (Experimentation), I didn’t have to agonize at all, since I’m scheduled to lead a conversation on Machine Learning in Digital Analytics. Still, it’s an incredible set of choices and represents just how much breadth there is to digital analytics practice these days.

Session 2 doesn’t make things easier. With topics ranging across Women in Analytics, Personalization, Data Science, IoT, Data Governance, Digital Product Management, Campaign Measurement, Rolling Your Own Technology, and Voice of Customer…Dang. Women in Analytics gets knocked off my list. I’ll eliminate Campaign Measurement even though I’d love to chat with Chip Strieff from Adidas about campaign optimization. I did Tom Betts’s (Financial Times) conversation on rolling your own technology in Europe this year – so I guess I can sacrifice that. Normally I’d cross the data governance session off my list. But not only am I managing some aspects of a data governance process for a client right now, I’ve known Verizon’s Rene Villa for a long time and had some truly fantastic conversations with him. So I’m tempted. On the other hand, retail personalization is of huge interest to me. So talking over personalization with Gautam Madiman from Lowe’s would be a real treat. And did I mention that I’ve become very, very interested in certain forms of IoT tracking? Getting a chance to talk with Vivint’s Brandon Bunker around that would be pretty cool. And, of course, I’ve spent years trying to do more with VoC and hearing Abercrombie & Fitch’s story with Sasha Verbitsky would be sweet. Provisionally, I’m picking IoT. I just don’t get a chance to talk IoT very much and I can’t pass up the opportunity. But personalization might drag me back in.

In the next session I have to choose between Dashboarding (the wretched state of as opposed to the sexiness of), Data Mining Methods, Martech, Next Generation Analytics, Analytics Coaching, Measuring Content Success, Leveraging Tag Management and Using Marketing Clouds for Personalization. The choice is a little easier because I did Kyle Keller’s (Vox) conversation on Dashboarding two years ago in Europe. And while that session was probably the most contentious DA Hub group I’ve ever been in (and yes, it was my fault but it was also pretty productive and interesting), I can probably move on. I’m not that involved with tag management these days – a sign that it must be mature – so that’s off my list too. I’m very intrigued by Akhil Anumolu’s (Delta Airlines) session on Can Developers be Marketers? The Emerging Role of MarTech. As a washed-up developer, I still find myself believing that developers are extraordinarily useful people and vastly under-utilized in today’s enterprise. I’m also tempted by my friend David McBride’s session on Next Generation Analytics. Not only because David is one of the most enjoyable people that I’ve ever met to talk with, but because driving analytics forward is, really, my job. But I’m probably going to go with David Williams’s session on Marketing Clouds. David is brilliant and ASOS is truly cutting edge (they are a giant in the UK and global in reach but not as well known here), and this also happens to be an area where I’m personally involved in steering some client projects. David’s topical focus on single-vendor stacks to deliver personalization is incredibly timely for me.

Next up we have Millennials in the Analytics Workforce, Streaming Video Metrics, Breaking the Analytics Glass Ceiling, Experimentation on Steroids, Data Journalism, Distributed Social Media Platforms, Customer Experience Management, Ethics in Analytics(!), and Customer Segmentation. There are several choices in here that I’d be pretty thrilled with: Dylan’s session on Experimentation, Chip’s session on CEM and, of course, Shari Cleary’s (Viacom) session on Segmentation. After all, segmentation is, like, my favorite thing in the world. But I’m probably going to go with Lynn Lanphier’s (Best Buy) session on Data Journalism. I have more to learn in that space, and it’s an area of analytics I’ve never felt that my practice has delivered on as well as we should.

In the last session, I could choose from more on Customer Experience Management, Driving Analytics to the C-Suite, Optimizing Analytics Career Paths, Creating High-Impact Analytics Programs, Building Analytics Teams, Delivering Digital Products, Calculating Analytics Impact, and Moving from Report Monkey to Analytics Advisor. But I don’t get to choose. Because this is where my second session (on driving Enterprise Digital Transformation) resides. I wrote about doing this session in the EU early this summer – it was one of the best conversations around analytics I’ve had the pleasure of being part of. I’m just hoping this session can capture some of that magic. If I didn’t have hosting duties, I think I might gravitate toward Theresa Locklear’s (NFL) conversation on Return on Analytics. When we help our clients create new analytics and digital transformation strategies, we have to help them justify what always amounts to significant new expenditures. So much of analytics is exploratory and foundational, however, that we don’t always have great answers about the real return. I’d love to be able to share thoughts on how to think (and talk) about analytics ROI in a more compelling fashion.

All great stuff.

We work in such a fascinating field with so many components to it. We can specialize in data science and analytics method, take care of the fundamental challenges around building data foundations, drive customer communications and personalization, help the enterprise understand and measure its performance, optimize relentlessly in and across channels, or try to put all these pieces together and manage the teams and people that come with that. I love that at a Conference like the Hub I get a chance to share knowledge with (very) like-minded folks and participate in conversations where I know I’m truly expert (like segmentation or analytics transformation), areas where I’d like to do better (like Data Journalism), and areas where we’re all pushing the outside of the envelope (IoT and Machine Learning) together. Seems like a wonderful trade-off all the way around.

See you there!



The State of the Art in Analytics – EU Style

(You spent your vacation how?)

I spent most of the last week at the fourth annual Digital Analytics Hub Conference outside London, talking analytics. And talking. And talking. And while I love talking analytics, thank heavens I had a few opportunities to get away from the sound of my own voice and enjoy the rather more pleasing absence of sounds in the English countryside.


With X Change no more, the Hub is the best conference going these days in digital analytics (full disclosure – the guys who run it are old friends of mine). It’s an immensely enjoyable opportunity to talk in-depth with serious practitioners about everything from cutting edge analytics to digital transformation to traditional digital analytics concerns around marketing analytics. Some of the biggest, best and most interesting brands in Europe were there: from digital and bricks-and-mortar behemoths to cutting-edge digital pure-plays to a pretty good sampling of the biggest consultancies in and out of the digital world.

As has been true in previous visits, I found the overall state of digital analytics in Europe to be a bit behind the U.S. – especially in terms of team-size and perhaps in data integration. But the leading companies in Europe are as good as anybody.

Here’s a sampling from my conversations:

Machine Learning

I’ve been pushing my team to grow in the machine learning space using libraries like TensorFlow to explore deep learning and see if it has potential for digital. It hasn’t been simple or easy. I’m thinking that people who talk as if you can drop a digital data set into a deep learning system and have magic happen have either:

  1. Never tried it
  2. Been trying to sell it

We’ve been having a hard time getting deep learning systems to out-perform techniques like Random Forests. We have a lot of theories about why that is, including problem selection, certain challenges with our data sets, and the ways we’ve chosen to structure our input. I had some great discussions with hardcore data scientists (and some very bright hacker analysts more in my mold) that gave me some fresh ideas. That’s lucky because I’m presenting some of this work at the upcoming eMetrics in Chicago and I want to have more impressive results to share. I’ve long insisted on the importance of structure to digital analytics and deep learning systems should be able to do a better job parsing that structure into the analysis than tools like random forests. So I’m still hopeful/semi-confident I can get better results.

In broader group discussion, one of the most controversial and interesting discussions focused on the pros-and-cons of black-box learning systems. I was a little surprised that most of the data scientist types were fairly negative on black-box techniques. I have my reservations about them and I see that organizations are often deeply distrustful of analytic results that can’t be transparently explained or which are hidden by a vendor. I get that. But opacity and performance aren’t incompatible. Just try to get an explanation of Google’s AlphaGo! If you can test a system carefully, how important is model transparency?

So what are my reservations? I’m less concerned about the black-boxness of a technique than I am its completeness. When it comes to things like recommendation engines, I think enterprise analysts should be able to consistently beat a turnkey blackbox (or not blackbox) system with appropriate local customization of the inputs and model. But I harbor no bias here. From my perspective it’s useful but not critical to understand the insides of a model provided we’ve been careful testing to make sure that it actually works!
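To make the "test it carefully" point concrete, here is a minimal sketch of scoring an opaque model against held-out data and a naive baseline. Everything in it is an assumption for illustration: the `opaque_model` function stands in for a vendor system, the feature names are invented, and the holdout data is synthetic rather than anything from our own work.

```python
import random


def evaluate_black_box(predict, holdout):
    """Accuracy of an opaque predict() callable on held-out (features, label) pairs."""
    correct = sum(1 for features, label in holdout if predict(features) == label)
    return correct / len(holdout)


def majority_baseline(holdout):
    """Accuracy you would get by always guessing the most common label."""
    labels = [label for _, label in holdout]
    most_common = max(set(labels), key=labels.count)
    return labels.count(most_common) / len(labels)


def opaque_model(features):
    # Hypothetical stand-in for a vendor system: callers see only inputs and outputs.
    return 1 if features["visits"] + 3 * features["cart_adds"] > 8 else 0


# Synthetic holdout data (invented for illustration, not real results).
rng = random.Random(42)
holdout = []
for _ in range(500):
    features = {"visits": rng.randint(1, 10), "cart_adds": rng.randint(0, 3)}
    noisy_score = features["visits"] + 3 * features["cart_adds"] + rng.gauss(0, 2)
    holdout.append((features, 1 if noisy_score > 8 else 0))

model_accuracy = evaluate_black_box(opaque_model, holdout)
baseline_accuracy = majority_baseline(holdout)
print(f"black box: {model_accuracy:.2f}  majority baseline: {baseline_accuracy:.2f}")
```

The test never looks inside the model. If it beats the baseline on data it has not seen, you have some evidence it works, whether or not anyone can explain why.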

Another huge discussion topic, and one that I was more in accord with, was around the importance of not over-focusing on a single technique. Not only are there many varieties of machine learning – each with some advantages to specific problem types – but there are powerful analytic techniques outside the sphere of machine learning that are used in other disciplines and are completely untried in digital analytics. We have so much to learn and I only wish I had more time with a couple of the folks there to…talk!

New Technology

One of the innovations this year at the Hub was a New Technology Showcase. The showcase was kind of like spending a day with a Silicon Valley VC and getting presentations from the technology companies in their portfolio (which is a darn interesting way to spend a day). I didn’t know most of the companies that presented but there were a couple (Piwik and Snowplow) I’ve heard of. Snowplow, in particular, is a company that’s worth checking out. The Snowplow proposition is pretty simple. Digital data collection should be de-coupled from analysis. You’ve heard that before, right? It’s called Tag Management. But that’s not what Snowplow has in mind at all. They built a very sophisticated open-source data collection stack that’s highly performant and feeds directly into the cloud. The basic collection strategy is simple and modern. You send JSON objects that pass a schema reference along with the data. The schema references are versioned and updates are handled automatically for both backward-compatible and incompatible updates. You can pass a full range of strongly-typed data and you can create cross-object contexts for things like visitors. Snowplow has built a whole bunch of simple templates to make it easier for folks used to traditional tagging to create the necessary calls. But you can pass anything to Snowplow – not just Web data. It’s very adaptable for mobile (far more so than traditional digital analytics systems) and really for any kind of data at all. Snowplow supports both real-time and batch – it’s a true lambda architecture. It seems to do a huge amount of the heavy lifting for you when it comes to creating a modern cloud-based data collection system. And did I mention it’s open-source? Free is a pretty good price. If you’re looking for an independent data collection architecture and are okay with the cloud, you really should give it a look.
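For a feel of what "JSON objects that pass a schema reference along with the data" looks like, here is a rough sketch of a self-describing event with a visitor context attached. The `schema`/`data` envelope and the `iglu:vendor/name/format/MODEL-REVISION-ADDITION` reference follow Snowplow's self-describing JSON convention as I understand it. The vendor `com.example`, the event and context names, the fields, and the overall payload shape are all made up for illustration and are not Snowplow's actual wire format.

```python
import json
import re

# Schema references look like iglu:vendor/name/format/MODEL-REVISION-ADDITION.
SCHEMA_URI = re.compile(
    r"^iglu:(?P<vendor>[\w.\-]+)/(?P<name>[\w\-]+)/(?P<format>[\w\-]+)/"
    r"(?P<model>\d+)-(?P<revision>\d+)-(?P<addition>\d+)$"
)


def self_describing(schema, data):
    """Wrap a payload with the schema reference that describes it."""
    return {"schema": schema, "data": data}


def parse_schema(uri):
    """Split a schema reference into vendor, name, format and version numbers."""
    match = SCHEMA_URI.match(uri)
    if match is None:
        raise ValueError(f"not a schema reference: {uri!r}")
    parts = match.groupdict()
    for key in ("model", "revision", "addition"):
        parts[key] = int(parts[key])
    return parts


def is_breaking_change(old_uri, new_uri):
    """Assumption: a bump in the first version number marks an incompatible update."""
    return parse_schema(old_uri)["model"] != parse_schema(new_uri)["model"]


# Hypothetical event plus a cross-object "visitor" context.
payload = {
    "event": self_describing(
        "iglu:com.example/add_to_cart/jsonschema/1-0-0",
        {"sku": "ABC-123", "quantity": 2, "unit_price": 19.99},
    ),
    "contexts": [
        self_describing(
            "iglu:com.example/visitor/jsonschema/2-1-0",
            {"visitor_id": "v-001", "logged_in": True},
        )
    ],
}

print(json.dumps(payload, indent=2))
```

Because every object names its own schema and version, the collector can validate strongly-typed fields and route old and new versions without the analysis side caring how the tag was written.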

Cloud vs. On-Premise

DA Hub’s keynote featured a panel with analytics leaders from companies like Intel, ASOS and the Financial Times. Every participant was running analytics in the cloud (with both AWS and Azure represented though AWS had an unsurprising majority). Except for barriers around InfoSec, it’s unclear to me why ANY company wouldn’t be in the cloud for their analytics.

Rolling your own Technology

We are not sheep

Here in the States, there’s been widespread adoption of open-source data technologies (Hadoop/Spark) to process and analyze digital data. But while I do see companies that have completely abandoned traditional SaaS analytics tools, it’s pretty rare. Mostly, the companies I see run both a SaaS solution to collect data and (perhaps) satisfy basic reporting needs as well as an open-source data platform. Among the people I talked to in the EU, there was more interest in a complete swap-out, including data collection and reporting. I even talked to folks who roll most of the visualization stack themselves with open-source solutions like D3. There are places where D3 is appropriate (you need complete customization of the surrounding interface, for example, or you need widespread but very inexpensive distribution), but I’m very far from convinced that rolling your own visualization solutions with open-source is the way to go. I would have said the same thing about data collection but…see above.

Digital Transformation

I had an exhilarating discussion group centered around digital transformation. There were a ton of heavy hitters in the room – huge enterprises deep into projects of digital transformation, major consultancies, and some legendary industry vets. It was one of the most enjoyable conference experiences I’ve ever had. I swear that we (most of us anyway) could have gone on another 2 hours or more – since we just scratched the surface of the problems. My plan for the session was to cover what defines excellence in digital (what do you have to be able to do to do digital well), then tackle how a large enterprise that wants to transform in digital needs to organize itself. Finally, I wanted to cover the change management and process necessary to get from here to there. If you’re reading this post that should sound familiar!

It’s a long path

Well, we didn’t get to the third item and we didn’t finish the second. That’s no disgrace. These are big topics. But the discussion helped clarify my thinking – especially around organization and the very real challenges in scaling a startup model into something that works for a large enterprise. Much of the blending of teams and capabilities that I’ve been recommending in these posts on digital transformation are lessons I’ve gleaned from seeing digital pure-plays and how they work. But I’ve always been uncomfortably aware that the process of scaling into larger teams creates issues around corporate communications, reporting structures, and career paths that I’m not even close to solving. Not only did this discussion clarify and advance my thinking on the topic, I’m fairly confident that it was of equal service to everyone else. I really wish that same group could have spent the whole day together. A big THANKS to everyone there, you were fantastic!

I plan to write more on this in a subsequent post. And I may drop another post on Hub learnings after I peruse my notes. I’ve only hit on the big stuff – and there were a lot of smaller takeaways worth noting.

See you there!

As I mentioned in my last post, the guys who run DA Hub are bringing it to Monterey, CA (first time in the U.S.) this September. Do check it out. It’s worth the trip (and the venue is  pretty special). I think I’m on the hook to reprise that session on digital transformation. And yes, that scares me…you don’t often catch lightning in a bottle twice.

Space 2.0

The New Frontier of Commercial Satellite Imagery for Business

One of my last speaking gigs of the spring season was, for me, both the least typical and one of the most interesting. Space 2.0 was a brief glimpse into a world that is both exotic and fascinating. It’s a gathering of high-tech, high-science companies driving commercialization of space.

Great stuff, but what the heck did they want with me?

Well, one of the many new frontiers in the space industry is the commercialization of geo-spatial data. For years now, the primary consumer of satellite data has been the government. But the uses for satellite imagery are hardly limited to intel and defense. For the array of Space startups and aggressive tech companies, intel and defense are relatively mature markets – slow moving and difficult to crack if you’re not an established player. You ever tried selling to the government? It’s not easy.

So the big opportunity is finding ways to open up the information potential in geo-spatial data and satellite imagery to the commercial marketplace. Now I may not know HyperSpectral from IR but I do see a lot of the challenges that companies face both provisioning and using big data. So I guess I was their doom-and-gloom guy – in my usual role of explaining why everything always turns out to be harder than we expect when it comes to using or selling big data.

For me, though, attending Space 2.0 was more about learning than educating. I’ve never had an opportunity to really delve into this kind of data and hearing (and seeing) some of what is available is fascinating.

Let’s start with what’s available (and keep in mind you’re not hearing an expert view here – just a fanboy with a day’s exposure). Most commercial capture is visual (other bands are available and used primarily for environmental and weather related research). Reliance on visual spectrum has implications that are probably second-nature to folks in the industry but take some thought if you’re outside it. One speaker described their industry as “outside” and “daytime” focused. It’s also very weather dependent. Europe, with its abundant cloudiness, is much more challenging than much of the U.S. (though I suppose Portland and Seattle must be no picnic).

Images are either panchromatic (black and white), multi-spectral (like the RGB we’re used to but with an IR band as well and sometimes additional bands) or hyperspectral (lots of narrow bands on the spectrum). Perhaps even more important than color, though, is resolution. As you’d probably expect, black and white images tend to have the highest resolution – down to something like a 30-40cm square. Color and multi-band images might be more in the meter range but the newest generation take the resolution down to the 40-50cm range in full color. That’s pretty fine grained.

How fine-grained? Well, with a top-down 40cm square per pixel it’s not terribly useful for things like people. But here’s an example that one of the speakers gave in how they are using the data. They pick selected restaurant locations (Chipotle was the example) and count cars in the parking lot during the day. They then compare this data to previous periods to create estimates of how the location is doing. They can also compare competitor locations (e.g. Panera) to see if the trends are brand specific or consistent.
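A little back-of-the-envelope arithmetic shows why 40cm pixels work for cars but not people, and what the period-over-period comparison amounts to. This is only a sketch: the car and person dimensions are rough assumptions of mine, and the car counts are invented numbers, not anything the speakers shared.

```python
from statistics import mean


def pixels_across(object_size_m, ground_resolution_m):
    """How many pixels an object of a given size spans at a given resolution."""
    return object_size_m / ground_resolution_m


def percent_change(current, previous):
    """Period-over-period change, in percent."""
    return (current - previous) / previous * 100.0


GROUND_RES_M = 0.40      # 40cm square per pixel, as described above
CAR_LENGTH_M = 4.5       # assumed typical car length
CAR_WIDTH_M = 1.8        # assumed typical car width
PERSON_WIDTH_M = 0.5     # assumed shoulder width seen from directly above

car_px = (pixels_across(CAR_LENGTH_M, GROUND_RES_M),
          pixels_across(CAR_WIDTH_M, GROUND_RES_M))
person_px = pixels_across(PERSON_WIDTH_M, GROUND_RES_M)
print(f"car: about {car_px[0]:.1f} x {car_px[1]:.1f} pixels")
print(f"person: about {person_px:.1f} pixels across")

# Invented daily car counts for one location and one competitor location.
location_prev = [50, 47, 52, 51]
location_now = [42, 38, 45, 40]
competitor_prev = [30, 32, 31, 31]
competitor_now = [31, 30, 32, 31]

location_change = percent_change(mean(location_now), mean(location_prev))
competitor_change = percent_change(mean(competitor_now), mean(competitor_prev))
print(f"location: {location_change:+.1f}%  competitor: {competitor_change:+.1f}%")
```

A car covers a few dozen pixels, enough to detect and count, while a person is barely more than a single pixel. And a drop at one brand with a flat competitor is the kind of brand-specific signal the comparison is meant to surface.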

Now, if you’re Chipotle, this data isn’t all that interesting. There are easier ways to measure your business than trying to count cars in satellite images. But if you’re a Fund Manager looking to buy or sell Chipotle stock in advance of earnings reports, this type of intelligence is extremely valuable. You have hard data on how a restaurant or store is performing before everyone else. That’s the type of data that traders live for.

Of course, that’s not the only way to get that information. You may have heard about the recent Foursquare prediction targeted to exactly the same problem. Foursquare was able to predict Chipotle’s sales decline almost to the percentage point. As one of the day’s panelists remarked, there are always other options and the key to market success is being cheaper, faster, easier, and more accurate than alternative mechanisms.

You can see how using Foursquare data for this kind of problem might be better than commercial satellite. You don’t have weather limitations, the data is easier to process, it covers walk-in and auto traffic, and it covers a 24hr time band. But you can also see plenty of situations where satellite imagery might have advantages too. After all, it’s easily available, relatively inexpensive, has no sampling bias, has deep historical data and is global in reach.

So how easy is satellite data to use?

I think the answer is a big “it depends”. This is, first of all, big data. Those multi and hyper band images at hi-res are really, really big. And while the providers have made it quite easy to find what you want and get it, it didn’t seem to me that they had done much to solve the real big data analytics problem.

I’ve described what I think the real big data problem is before (you can check out this video if you want a big data primer). Big data analytics is hard because it requires finding patterns in the data and our traditional analytics tools aren’t good at that. This need for pattern recognition is true in my particular field (digital analytics), but it’s even more obviously true when it comes to big data applications like facial recognition, image processing, and text analytics.

On the plus side, unlike digital analytics, the need for image (and linguistic) processing is well understood and relatively well-developed. There are a lot of tools and libraries you can use to make the job easier. It’s also a space where deep-learning has been consistently successful so that libraries from companies like Microsoft and Google are available that provide high-quality deep-learning tools – often tailor made for processing image data – for free.

It’s still not easy. What’s more, the way you process these images is highly likely to be dependent on your business application. Counting cars is different than understanding crop growth which is different than understanding storm damage. My guess is that market providers of this data are going to have to develop very industry-specific solutions if they want to make the data reasonably usable.

That doesn’t necessarily mean that they’ll have to provide full on applications. The critical enabler is providing the ability to extract the business-specific patterns in the data – things like identifying cars. In effect, solving the hard part of the pattern recognition problem so that end-users can focus on solving the business interpretation problem.

Being at Space 2.0 reminded me a lot of going to a big data conference. There are a lot of technologies (some of them amazingly cool) in search of killer business applications. In this industry, particularly, the companies are incredibly sophisticated technically. And it’s not that there aren’t real applications. Intelligence, environment and agriculture are mature and profitable markets with extensive use of commercial satellite imagery. The golden goose, though, is opening up new opportunities in other areas. Do those opportunities exist? I’m sure they do. For most of us, though, we aren’t thinking satellite imagery to solve our problems. And if we do think satellite, we’re likely intimidated by the difficulty of solving the big data problem inherent in getting value from the imagery for almost any new business application.

That’s why, as I described it to the audience there, I suspect that progress with the use and adoption of commercial satellite imagery will seem quite fast to those of us on the outside – but agonizingly slow to the people in the industry.

Matching (and Scoring) Content to Culture and Predicting the Academy Awards

Thoughts and Reflections on the Process

We’ve spent our spare time in the last six weeks participating in the 538 Academy Awards Prediction Challenge. On Sunday, we’ll find out how we did. But even though we expect to crash and burn on the acting awards and are probably no better than 1-3 in a very close movie race, we ended up quite satisfied with our unique process and the model that emerged. You can get a full and deep description of our culture matching model with its combination of linguistic analysis and machine learning in this previous post.

What I love about projects like this is that they give people a glimpse into how analytics actually works. Analysis doesn’t get made at all the way people think and in most cases there is far more human intuition and direction than people realize or that anyone reading screeds on big data and predictive analytics would believe. Our culture-matching analysis pushes the envelope more than most we do in the for-pay world, so it’s probably an exaggerated case. But think about the places where this analysis relied on human judgment:

  1. Deciding on the overall approach: Obviously, the approach was pretty much created whole-cloth. What’s more, we lacked any data to show that culture matching might be an effective technique for predicting the Oscars. We may have used some machine learning, but this approach didn’t and wouldn’t have come from throwing a lot of data into a machine learning system.
  2. Choosing potentially relevant corpora for Hollywood and each movie: This process was wholly subjective in the initial selection of possible corpora, was partly driven by practical concerns (ease of access to archival stories), and was largely subjective in the analyst review stage. In addition to selecting our sources, we further rejected categories like “local”, “crime” and “sports”. Might we have chosen otherwise? Certainly. In some cases, we tuned the corpora by running the full analysis and judging whether the themes were interesting. That may be circular, but it’s not wrong. Nearly every complex analysis has elements of circularity.
  3. Tuning themes: Our corpora had both obvious and subtle biases. To get crisp themes, we had to eliminate words we thought were too common or were used in different senses. I’m pretty confident we missed lots of these. I hope we caught most. Maybe we eliminated something important. Likely, we’ll never know.
  4. Choosing our model: If you only do 1 model, you don’t have this issue. But when you have multiple models it’s not always easy to tell which one is better. With more time and more data, we could try each approach against past years. But lots of analytic techniques don’t even generate predictions (clustering, for example). The analyst has to decide which clustering scheme looks better, and the answer isn’t always obvious. Even within a single approach (text analytics/linguistics), we generated two predictions based on which direction we used to match themes. Which one was better? That was a topic of considerable internal debate with no “right” answer except to test against the real-world (which in this case will be a very long test).
  5. Deciding on Black-Box Validity: This one is surprisingly hard. When you have a black-box system, you generally rely on being able to measure its predictions against a set of fairly well known decisions before you apply it to the real-world. We didn’t have that and it was HARD to decide how and whether our brute force machine-learning system was working at all. But even in cases where external measurement comparisons exist, it’s the unexpected predictions that cause political problems with analytics adoption. If you’ve ever tried to convince a skeptical organization that a black-box result is right, you know how hard this is.
  6. Explaining the model: There’s an old saying in philosophy (from James) that a difference that makes no difference is no difference. If a model has an interesting result but nobody believes it, does it matter? A big part of how interesting, important and valid we think a model is comes from how well it’s explained.

This long litany is why, in the end, the quality of your analysis is always about the quality of your people. We had access to some great tools (Sysomos, Boilerpipe, Java, SPSS, R and Crimson Hexagon), but interesting approaches and interesting results don’t come from tools.

That being said, I can’t resist special call-outs to Boilerpipe which did a really nice job of text extraction and SPSS Text Analytics which did a great job facilitating our thematic analysis and matching.
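On the model-choice point in the list above, the direction of the match really does change the answer. Here is a toy sketch of why. It is not our actual model: the scoring function is a simple set-overlap measure I'm assuming for illustration, the two movies are hypothetical, and the theme sets are invented.

```python
def coverage(source_themes, target_themes):
    """Share of the source's themes that also show up in the target."""
    if not source_themes:
        return 0.0
    return len(source_themes & target_themes) / len(source_themes)


# Invented theme sets (hypothetical movies, illustrative culture corpus).
culture = {"race", "inequality", "wealth", "terrorism", "presidential campaign"}
movies = {
    "Movie A": {"wealth", "inequality"},
    "Movie B": {"race", "inequality", "wealth",
                "exploration", "courage", "loneliness"},
}

# Direction 1: how much of each movie is reflected in the culture?
movie_to_culture = {title: coverage(themes, culture)
                    for title, themes in movies.items()}

# Direction 2: how much of the culture is reflected in each movie?
culture_to_movie = {title: coverage(culture, themes)
                    for title, themes in movies.items()}

pick_1 = max(movie_to_culture, key=movie_to_culture.get)
pick_2 = max(culture_to_movie, key=culture_to_movie.get)
print("movie -> culture:", movie_to_culture, "pick:", pick_1)
print("culture -> movie:", culture_to_movie, "pick:", pick_2)
```

A narrow movie that sits entirely inside the culture's themes wins in one direction. A broad movie that touches more of the culture wins in the other. Neither is obviously "right", which is exactly why the analyst has to choose.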


Thoughts on the Method and Results

So is culture matching a good way to predict the Oscars?

It might be a useful variable but I’m sure it’s not a complete prediction system. That’s really no different than we hoped going into this exercise. And we’ll learn a little (but not much) more on Awards night. It would be better if we got the full vote to see how close our rank ordering was.

Either way, the culture-matching approach is promising as a technique. Looking through the results, I’m confident that it passes the analyst sniff test – there’s something real here. There are a number of extensions to the system we haven’t tried (and probably won’t) – at least for this little challenge. We’d like to incorporate sentiment around themes, not just matching. We generated a number of analyst-driven cultural dimensions for machine training that we haven’t used. We’d like to try some different machine-learning techniques that might be better suited to our source material. There is a great deal of taxonomic tuning around themes that might drive better results. It’s rare that an ambitious analytics project is ever really finished, though the world often says otherwise.

In this case, I was pleased with the themes we were able to extract by movie. A little less with the themes in our Hollywood corpus. Why? I suspect because long-form movie reviews are unusually rich in elaborating the types of cultural themes we were interested in. In addition, a lot of the themes that we pulled out of the culture corpus are topical. It’s (kind of) interesting to know that terrorism or the presidential campaign were hot topics this last year, but that isn’t the type of theme we’re looking for. I’m particularly interested in whether and how successful we can be in deepening themes beyond the obvious ones. Themes around race, inequality and wealth are fairly easy to pick out. But if The Martian scores poorly because Hollywood isn’t much about engineering and science (and I’m pretty sure that’s true), what about its human themes around exploration, courage and loneliness? Those topics emerged as key themes from the movie reviews, but they are hard to discover in the Hollywood corpus. That might be because they aren’t very important in the culture – that’s certainly plausible – but it also seems possible that our analysis wasn’t rich enough to find their implicit representations.

Regardless, I’m happy with the outcome. It seems clear to me that this type of culture matching can be successful and brings analytic rigor to a topic that is otherwise mostly hot air. What’s more, it can be successful in a reasonable timeframe and for a reasonable amount of money (which is critical for non-academic use-cases). From start to finish, we spent about four weeks on this problem – and while we had a large team, it was all part-timers.

This was definitely a problem to fall in love with and we’d kill to do more, expand the method, and prove it out on more substantial and testable data. If you have a potential use for culture matching, give us a call. We probably can’t do it for free, but we will do it for less than cost. And, of course, if you just need an incredible team of analysts who can dream up a creative solution to a hard, real-world problem, pull data from almost anything, bring to bear world-class tools across traditional stats, machine-learning and text analytics, and deliver interesting and useful results…well, that’s fine too.


Torture is Bad – Don’t Waterboard your Models even when you know they are Wrong

Predicting the Best Actor and Actress Categories

My Analytics Counseling Family here at EY has been participating in the 538 Academy Award Challenge. Our project involved creating a culture-matching engine – a way to look at pieces of content (in this case, obviously, movies) and determine how well they match a specific community’s worldview. The hypothesis is that the more a movie matches the current Hollywood zeitgeist, the more likely it is to win. In my last post, I described in some detail the way we did that and our results for predicting the Best Movie (The Big Short). We were pretty happy with the way the model worked and the intuitive fit between the movies and our culture-matching engine. Of course, nothing in what we’ve done proves that culture matching is a great way to predict the Oscars (and even if we’re right it won’t prove much in a single year), but that wasn’t really the point. Culture matching is a general technique built on an interesting analytic method, and if the results are promising in terms of our ability to make a match, we think that’s pretty great.

The second part of our task, however, was to predict the Best Actor and Actress awards. Our method for doing this was similar to our method for predicting the best movie award but there were a few wrinkles. First, we extracted language specific to each character in the nominated movie. This is important to understand. We aren’t looking at how Hollywood talks about DiCaprio or Cranston or Lawrence as people and actors. We aren’t looking at how they are reviewed. We’re entirely focused on how their character is described.
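To make the distinction concrete, here is a minimal Python sketch of pulling character-focused language out of a review. It is an illustration only, not the extraction we actually ran: the function name `character_sentences`, the naive sentence splitter, and the sample review text are all made up for the example.

```python
import re


def character_sentences(review, character_names):
    """Keep only the sentences that mention the character by name."""
    # Naive sentence split on ., ! or ? followed by whitespace (an assumption;
    # a real pipeline would use a proper sentence tokenizer).
    sentences = re.split(r"(?<=[.!?])\s+", review.strip())
    names = [name.lower() for name in character_names]
    return [s for s in sentences if any(name in s.lower() for name in names)]


# Made-up review text, for illustration only.
review = (
    "DiCaprio gives a committed performance. "
    "Hugh Glass is an unrelenting survivor. "
    "The cinematography is stunning. "
    "Glass endures tragedy after tragedy."
)

# Match on the character's name, not the actor's.
kept = character_sentences(review, ["Hugh Glass", "Glass"])
```

Sentences about the actor or the film in general drop out, and only the ones describing the role are left for theme extraction. Plain substring matching is crude (a short name like "Glass" would also match the ordinary noun), so treat this as a starting point.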

This is the closest analogue we could think of to culture matching movies. However, this was a point of considerable debate internal to our team. To me, it seems intuitively less likely that people will prefer an actor or actress because their character matches our worldview than that they will prefer a movie as a whole for that reason. We all understood that and agreed that our approach was less compelling when it came to ANY of the secondary awards. However, our goal was to focus on culture-matching more than it was to find the best method for predicting acting awards. We could have predicted screenplay, I suppose, but there’s no reason to think the analysis would deviate in the slightest from our prediction around movie.

Once we had key themes around each nominated role, we matched those themes to our Hollywood corpus. In our first go-round, we matched to the entire corpus, pairing actor themes with broad cultural themes. This didn’t work well. It turned out that we were conflating themes about people with themes about other things in ways that didn’t make much sense. So for our second pass, we tightened the themes in the Hollywood corpus to only those which were associated with people.

In essence, we’re saying which roles best correspond to the way Hollywood talks about people and picking the actor/actress who played that role.
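A rough Python sketch of that second pass might look like the following. Everything here is assumed for illustration: the function name `score_roles`, the additive scoring rule, the corpus weights and the set of person-related themes are hypothetical placeholders, not output from our engine. Only the role themes themselves are taken from this post.

```python
from typing import Dict, List, Set


def score_roles(
    role_themes: Dict[str, List[str]],
    corpus_theme_weights: Dict[str, float],
    person_themes: Set[str],
) -> Dict[str, float]:
    """Score each role by the summed corpus weight of its person-related themes."""
    scores = {}
    for role, themes in role_themes.items():
        # Second-pass rule: skip any theme not associated with people.
        scores[role] = sum(
            corpus_theme_weights.get(theme.lower(), 0.0)
            for theme in themes
            if theme.lower() in person_themes
        )
    return scores


# Role themes as listed in this post.
role_themes = {
    "Cranston (Trumbo)": ["Idealist", "humanity", "drinking", "liberal", "civil rights"],
    "Damon (The Martian)": ["Humor", "Optimism", "Engineer", "Scientist", "leadership"],
}

# Placeholder numbers, invented purely to make the example run.
corpus_theme_weights = {
    "idealist": 3.0,
    "liberal": 2.0,
    "civil rights": 2.5,
    "humor": 1.0,
    "new york": 4.0,
}
person_themes = {"idealist", "liberal", "civil rights", "humor", "optimism", "leadership"}

scores = score_roles(role_themes, corpus_theme_weights, person_themes)
ranking = sorted(scores, key=scores.get, reverse=True)
```

The `person_themes` filter is the part that corresponds to the tightening described above: a theme that is strong in the corpus but is not about people (here, the placeholder "new york") never contributes to a role's score.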

So here’s how it came out:

Best Actor:

1. Bryan Cranston
2. Michael Fassbender
3. Leonardo DiCaprio
4. Eddie Redmayne
5. Matt Damon

Best Actress:

1. Jennifer Lawrence
2. Brie Larson
3. Cate Blanchett
4. Saoirse Ronan
5. Charlotte Rampling


Do I think we’re going to be right? Not a chance.

But that doesn’t mean the method isn’t working pretty well. In fact, I think it worked about as well as we could have hoped. Here, for example, are the themes we extracted for some of the key actors and actresses (by which I mean their nominated roles):

For Matt Damon in The Martian: Humor, Optimism, Engineer, Scientist, Leadership

For Leonardo DiCaprio in The Revenant: Survival, Endurance, Tragedy, Individual, Unrelenting, Warrior, Physicality

For Bryan Cranston in Trumbo: Idealist, Humanity, Drinking, Liberal, Civil Rights

If you’ve seen these movies, I think you can agree that the thematic pulls are reasonable. And is it any surprise, as you read the list, that Cranston is our predicted winner? I think not. To me, this says more about whether our method is applicable to this kind of prediction – and the answer is probably not – than whether the method itself is working well. Take away what we know about the actors and the process, and I think you’d probably agree that the model has done the best possible job of culture matching to Hollywood.

I was a bit concerned about the Jennifer Lawrence prediction. I saw the logic of Cranston’s character immediately, but Joy didn’t immediately strike me as an obvious fit to Hollywood’s view of people. When I studied the themes that emerged around her character, though, I thought it made reasonable sense:

Lawrence in Joy: Forceful, personality, imagination, friendship, heroine

WDYT? There are other themes I might have expected to emerge that didn’t, but these seem like a fairly decent set and you can see where something like forceful, in particular, might match well (it did).

In the end, it didn’t make me think the model was broken.

We tried tuning these models, but while different predictions can be forced from the model, nothing we did convinced us that, when it came to culture matching, we’d really improved our result. When you start torturing your model to get the conclusions you think are right, it’s probably time to stop.

It’s all about understanding two critical items: what your model is for and whether or not you think the prediction could be better. In this case, we never expected our model to be able to predict the Academy Awards exactly. If we understand why our prediction isn’t aligned to likely outcomes, that may well be good enough. And, of course, even the best model won’t predict most events with anything like 100% accuracy. If you try too hard to fit your model to the data or – even worse – to your expectations, you remove the value of having a model in the first place.

Just like in the real world, with enough pain you can make your model say anything. That doesn’t make it reliable.

So we’re going down with this particular ship!


Machine Learning

We’ve been experimenting with a second method that focuses on machine learning. Essentially, we’re training a machine learning system with reviews about each movie and then categorizing the Hollywood corpus and seeing which movie gets the most hits. Unfortunately, real work has gotten in the way of some of our brute-force machine learning work and we haven’t progressed as much on this as we hoped.
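For readers who want to picture the train-then-count-hits loop, here is a minimal, self-contained Python sketch. It swaps in a tiny bag-of-words naive Bayes classifier as a stand-in, since this post doesn't describe the actual system we used; the function names and the toy review and corpus texts are all invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(reviews_by_movie):
    """Count words per movie label from that movie's review texts."""
    word_counts = defaultdict(Counter)
    for movie, reviews in reviews_by_movie.items():
        for review in reviews:
            word_counts[movie].update(tokenize(review))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, vocab


def classify(doc, word_counts, vocab):
    """Return the movie with the highest add-one-smoothed log-likelihood."""
    tokens = [word for word in tokenize(doc) if word in vocab]
    if not tokens:
        return None  # nothing recognizable, so no hit for any movie
    best_label, best_score = None, -math.inf
    for movie, counts in word_counts.items():
        total = sum(counts.values())
        score = sum(
            math.log((counts[word] + 1) / (total + len(vocab))) for word in tokens
        )
        if score > best_score:
            best_label, best_score = movie, score
    return best_label


def count_hits(corpus_docs, word_counts, vocab):
    """Classify every corpus document and tally hits per movie."""
    labels = (classify(doc, word_counts, vocab) for doc in corpus_docs)
    return Counter(label for label in labels if label is not None)


# Toy stand-ins for the review set and the Hollywood corpus (made up).
reviews_by_movie = {
    "The Big Short": ["bankers greed and the housing collapse on wall street"],
    "The Martian": ["a stranded astronaut uses science to survive on mars"],
}
corpus_docs = [
    "wall street greed is back in the news",
    "bankers and the housing market",
    "science funding for mars",
]

word_counts, vocab = train(reviews_by_movie)
hits = count_hits(corpus_docs, word_counts, vocab)
```

Note that a classifier like this only ever sees shared surface vocabulary, so a corpus document can count as a hit for a movie without sharing anything you would call a theme.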

To date, it hasn’t done a great job. Well, that’s being kind. Really it kind of sucks. Our results look pretty random and where we’ve been able to understand the non-random results, they haven’t captured real themes but only passing similarities (like a tendency to mention New York). With all due respect to Ted Cruz, we don’t think that’s a good enough cultural theme to hang our hat on.

As of right now, our best conclusion is that the method doesn’t work well.

We probably won’t have time to push this work further, but right now I’d say that if I were doing this work again I’d concentrate on the linguistic approach. I think our documents were too long and complex and our themes too abstract to work well with the machine learning systems we were using.

In my next post, I have some reflections on the process and what it tells us about how analytics works.

Bet your Shirt on The Big Short

Early Results

We’re still tweaking the machine learning system and the best actor and actress categories. But our text/linguistic culture-matching model produced the following rank ordering for the best picture category:


So if you don’t know, now you know…The Big Short wins it.

Incidentally, we also scored movies that had best actor/actress nominees (since they were in our corpus). Big Short still won, but some of those movies (such as Trumbo) scored very well. You can read that any way you like – it might indicate that the best actor and actress nominations are heavily influenced by how much voters liked the type of movie (which is certainly plausible) or it might indicate that our model is a pretty bad predictor since those movies didn’t even garner nominations. And, of course, given our sample size, it probably means nothing at all.

I think the list makes intuitive sense – which is always something of a relief when you’ve gone the long way around with a methodology. I particularly think the bottom of the list makes sense with The Martian and Mad Max. Both movies feel well outside any current Hollywood zeitgeist (except maybe the largely silent super-model refugees in MMFR). If a system can pick the losers, perhaps it can pick the winners as well. But more important to me, it suggests that our method is doing a credible job of culture matching.

With a few more weeks, we’ll probably take a closer look at some of the classifications and see if there are any biasing words/themes that are distorting the results. This stuff is hard and all too easy to get wrong – especially in your spare time. We’ll also have results from the black-box machine learning system, though we’re not confident about it, as well as what I hope will be interesting results for the actor/actress category. We’ve never believed that the method is as applicable to that problem (predicting acting awards) but we’re fairly satisfied with the initial themes that emerged from each actor/actress so we’re a little more optimistic that we’ll have an interesting solution.

Stay tuned…