
The State of the Art in Analytics – EU Style

(You spent your vacation how?)

I spent most of the last week at the fourth annual Digital Analytics Hub Conference outside London, talking analytics. And talking. And talking. And while I love talking analytics, thank heavens I had a few opportunities to get away from the sound of my own voice and enjoy the rather more pleasing absence of sounds in the English countryside.


With X Change no more, the Hub is the best conference going these days in digital analytics (full disclosure – the guys who run it are old friends of mine). It’s an immensely enjoyable opportunity to talk in-depth with serious practitioners about everything from cutting edge analytics to digital transformation to traditional digital analytics concerns around marketing analytics. Some of the biggest, best and most interesting brands in Europe were there: from digital and bricks-and-mortar behemoths to cutting-edge digital pure-plays to a pretty good sampling of the biggest consultancies in and out of the digital world.

As has been true in previous visits, I found the overall state of digital analytics in Europe to be a bit behind the U.S. – especially in terms of team size and perhaps in data integration. But the leading companies in Europe are as good as anybody.

Here’s a sampling from my conversations:

Machine Learning

I’ve been pushing my team to grow in the machine learning space using libraries like TensorFlow to explore deep learning and see if it has potential for digital. It hasn’t been simple or easy. I’m thinking that people who talk as if you can drop a digital data set into a deep learning system and have magic happen have either:

  1. Never tried it
  2. Been trying to sell it

We’ve been having a hard time getting deep learning systems to outperform techniques like Random Forests. We have a lot of theories about why that is, including problem selection, certain challenges with our data sets, and the ways we’ve chosen to structure our input. I had some great discussions with hardcore data scientists (and some very bright hacker analysts more in my mold) that gave me some fresh ideas. That’s lucky because I’m presenting some of this work at the upcoming eMetrics in Chicago and I want to have more impressive results to share. I’ve long insisted on the importance of structure to digital analytics, and deep learning systems should be able to do a better job of parsing that structure into the analysis than tools like Random Forests. So I’m still hopeful/semi-confident I can get better results.

In broader group discussion, one of the most controversial and interesting discussions focused on the pros-and-cons of black-box learning systems. I was a little surprised that most of the data scientist types were fairly negative on black-box techniques. I have my reservations about them and I see that organizations are often deeply distrustful of analytic results that can’t be transparently explained or which are hidden by a vendor. I get that. But opacity and performance aren’t incompatible. Just try to get an explanation of Google’s AlphaGo! If you can test a system carefully, how important is model transparency?

So what are my reservations? I’m less concerned about the black-boxness of a technique than I am its completeness. When it comes to things like recommendation engines, I think enterprise analysts should be able to consistently beat a turnkey blackbox (or not blackbox) system with appropriate local customization of the inputs and model. But I harbor no bias here. From my perspective it’s useful but not critical to understand the insides of a model provided we’ve been careful testing to make sure that it actually works!
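To make the "test it carefully" point concrete, here is a minimal sketch of scoring a model purely through its predictions on held-out data, with no access to its internals. Everything in it is hypothetical: the `opaque_model` function stands in for whatever vendor or deep learning system you can't inspect, and the tiny hand-built holdout set is invented for illustration.

```python
def holdout_accuracy(predict, examples):
    """Score any model through its predictions alone (no access to internals)."""
    hits = sum(1 for features, label in examples if predict(features) == label)
    return hits / len(examples)


def opaque_model(features):
    # Hypothetical stand-in for a model we cannot inspect.
    return features["visits"] >= 3


def baseline(features):
    # Transparent baseline: always predict "no conversion".
    return False


# Invented held-out examples: (features, did the visitor convert?)
holdout = [
    ({"visits": 1}, False),
    ({"visits": 2}, False),
    ({"visits": 3}, False),
    ({"visits": 4}, True),
    ({"visits": 5}, True),
    ({"visits": 6}, True),
    ({"visits": 3}, True),
    ({"visits": 2}, False),
]

print("opaque model:", holdout_accuracy(opaque_model, holdout))
print("baseline:    ", holdout_accuracy(baseline, holdout))
```

The same harness works for a transparent model, which is rather the point: the comparison against a baseline doesn't care what is inside the box.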

Another huge discussion topic, and one that I was more in accord with, was the importance of not over-focusing on a single technique. Not only are there many varieties of machine learning – each with some advantages to specific problem types – but there are powerful analytic techniques outside the sphere of machine learning that are used in other disciplines and are completely untried in digital analytics. We have so much to learn and I only wish I had more time with a couple of the folks there to…talk!

New Technology

One of the innovations this year at the Hub was a New Technology Showcase. The showcase was kind of like spending a day with a Silicon Valley VC and getting presentations from the technology companies in their portfolio (which is a darn interesting way to spend a day). I didn’t know most of the companies that presented but there were a couple (Piwik and Snowplow) I’d heard of. Snowplow, in particular, is a company that’s worth checking out.

The Snowplow proposition is pretty simple. Digital data collection should be de-coupled from analysis. You’ve heard that before, right? It’s called Tag Management. But that’s not what Snowplow has in mind at all. They built a very sophisticated open-source data collection stack that’s highly performant and feeds directly into the cloud. The basic collection strategy is simple and modern. You send JSON objects that pass a schema reference along with the data. The schema references are versioned and updates are handled automatically for both backward-compatible and incompatible changes. You can pass a full range of strongly-typed data and you can create cross-object contexts for things like visitors.

Snowplow has built a whole bunch of simple templates to make it easier for folks used to traditional tagging to create the necessary calls. But you can pass anything to Snowplow – not just Web data. It’s very adaptable for mobile (far more so than traditional digital analytics systems) and really for any kind of data at all. Snowplow supports both real-time and batch – it’s a true lambda architecture. It seems to do a huge amount of the heavy lifting for you when it comes to creating a modern cloud-based data collection system. And did I mention it’s open-source? Free is a pretty good price. If you’re looking for an independent data collection architecture and are okay with the cloud, you really should give it a look.
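For a feel of what "JSON objects that pass a schema reference along with the data" looks like, here is a rough sketch in Python. It follows the self-describing JSON shape as I understand it (a `schema` URI of the form `iglu:vendor/name/format/MODEL-REVISION-ADDITION` alongside a `data` payload). The vendor `com.example`, the event and context names, and the helper functions are all made up for illustration and are not part of any Snowplow library. Treating a change in the first version number as the incompatible case is my simplified reading of the versioning convention.

```python
import json
import re

# Schema URI shape assumed here: iglu:vendor/name/format/MODEL-REVISION-ADDITION
SCHEMA_URI = re.compile(
    r"^iglu:(?P<vendor>[\w.\-]+)/(?P<name>[\w.\-]+)/(?P<format>[\w.\-]+)/"
    r"(?P<model>\d+)-(?P<revision>\d+)-(?P<addition>\d+)$"
)


def self_describing(vendor, name, version, data):
    """Wrap a payload with a reference to the schema that describes it."""
    return {"schema": f"iglu:{vendor}/{name}/jsonschema/{version}", "data": data}


def parse_schema(uri):
    """Split a schema URI into vendor, name, format and integer version parts."""
    match = SCHEMA_URI.match(uri)
    if match is None:
        raise ValueError(f"not a recognised schema URI: {uri}")
    parts = match.groupdict()
    for key in ("model", "revision", "addition"):
        parts[key] = int(parts[key])
    return parts


def is_breaking_change(old_uri, new_uri):
    """Simplification: only a change in the MODEL number counts as incompatible."""
    return parse_schema(old_uri)["model"] != parse_schema(new_uri)["model"]


# A hypothetical event plus a cross-object "visitor" context attached to it.
event = self_describing("com.example", "add_to_cart", "1-0-0",
                        {"sku": "ABC-123", "quantity": 2})
visitor = self_describing("com.example", "visitor", "1-0-1",
                          {"segment": "returning"})
payload = {"event": event, "contexts": [visitor]}

print(json.dumps(payload, indent=2))
```

Because every object names its own schema and version, whatever sits downstream can decide how to validate and store it without the sender and the analysis layer having to agree on anything out of band.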

Cloud vs. On-Premise

DA Hub’s keynote featured a panel with analytics leaders from companies like Intel, ASOS and the Financial Times. Every participant was running analytics in the cloud (with both AWS and Azure represented though AWS had an unsurprising majority). Except for barriers around InfoSec, it’s unclear to me why ANY company wouldn’t be in the cloud for their analytics.

Rolling your own Technology

We are not sheep

Here in the States, there’s been widespread adoption of open-source data technologies (Hadoop/Spark) to process and analyze digital data. But while I do see companies that have completely abandoned traditional SaaS analytics tools, it’s pretty rare. Mostly, the companies I see run both a SaaS solution to collect data and (perhaps) satisfy basic reporting needs as well as an open-source data platform. Among the people I talked to in the EU, there was more interest in a complete swap-out, including data collection and reporting. I even talked to folks who roll most of the visualization stack themselves with open-source solutions like D3. There are places where D3 is appropriate (you need complete customization of the surrounding interface, for example, or you need widespread but very inexpensive distribution), but I’m very far from convinced that rolling your own visualization solutions with open-source is the way to go. I would have said the same thing about data collection but…see above.

Digital Transformation

I had an exhilarating discussion group centered around digital transformation. There were a ton of heavy hitters in the room – huge enterprises deep into projects of digital transformation, major consultancies, and some legendary industry vets. It was one of the most enjoyable conference experiences I’ve ever had. I swear that we (most of us anyway) could have gone on another 2 hours or more – since we just scratched the surface of the problems. My plan for the session was to cover what defines excellence in digital (what do you have to be able to do to do digital well), then tackle how a large enterprise that wants to transform in digital needs to organize itself. Finally, I wanted to cover the change management and process necessary to get from here to there. If you’re reading this post that should sound familiar!

It’s a long path

Well, we didn’t get to the third item and we didn’t finish the second. That’s no disgrace. These are big topics. But the discussion helped clarify my thinking – especially around organization and the very real challenges in scaling a startup model into something that works for a large enterprise. Much of the blending of teams and capabilities that I’ve been recommending in these posts on digital transformation comes from lessons I’ve gleaned from seeing digital pure-plays and how they work. But I’ve always been uncomfortably aware that the process of scaling into larger teams creates issues around corporate communications, reporting structures, and career paths that I’m not even close to solving. Not only did this discussion clarify and advance my thinking on the topic, I’m fairly confident that it was of equal service to everyone else. I really wish that same group could have spent the whole day together. A big THANKS to everyone there, you were fantastic!

I plan to write more on this in a subsequent post. And I may drop another post on Hub learnings after I peruse my notes. I’ve only hit on the big stuff – and there were a lot of smaller takeaways worth noting.

See you there!

As I mentioned in my last post, the guys who run DA Hub are bringing it to Monterey, CA (first time in the U.S.) this September. Do check it out. It’s worth the trip (and the venue is pretty special). I think I’m on the hook to reprise that session on digital transformation. And yes, that scares me…you don’t often catch lightning in a bottle twice.