What is Data Science and (closely related) what is a Data Scientist?

I came across an interesting read recently on the definition of both data scientist and data science. Now, even though I’m about to disagree with almost everything in the article, that doesn’t mean I think it’s wrong-headed or not worth a read. It’s a fairly conventional, industry-standard view of the world and provides a common-sense and reasonable set of definitions for both data scientist and data science. I’d encourage you to take a look if you’re interested in this type of question.

Meanwhile, if you’re willing to rely on my summary, here’s what I take to be the gist of the article:

  1. Data Science is about finding insights in data to make better decisions
  2. Data Scientists bring to bear three primary skills: subject matter expertise, programming and data manipulation skills, and statistical knowledge to find those insights.
  3. Using survey techniques and asking data professionals to classify their skills, there are four major styles of data scientist. Three styles (business management professionals, developers, and researchers) map directly to the three key skills elaborated above (subject matter expertise, programming and statistics). Then there’s a fourth category appropriately titled “Creatives” who aren’t good at any of these skills…okay I jest…perhaps it’s more fair to say they are balanced fairly equally across the skill sets.
  4. Popular analytics methods (SMART and CRISP-DM) are essentially no more than variants of the “Scientific Method” and, when you get right down to it, data science is nothing more (or less since the diminutive is not meant to imply anything) than the application of that method to whatever problem a data professional is trying to solve. In other words, and here I quote directly, “data science just is science”.
  5. Science works via the “Scientific Method” described as:
    1. Formulate a question or problem statement
    2. Generate a hypothesis that is testable
    3. Gather/Generate data
    4. Analyze data to test the hypotheses / Draw conclusions
    5. Communicate results to interested parties or take action

That’s it. And you’re probably wondering how or why I would disagree with any of this since it’s pretty innocuous stuff. Yes, I’ve written in the past about my suspicions around the whole ‘data science’ term – though heaven knows I use it myself since the market seems to reward it. Taken as it generally is, it’s either a cunning replacement for the label statistician (since we all “know” statisticians aren’t much use when it comes to driving business value) or a demand that analysts should have “full-stack” skills. I don’t necessarily buy the idea that full-stack skills are critical or that there’s a huge benefit in combining them in a single person instead of spreading them across a team, but it’s not something I lose sleep over.

What’s more, once you start flavoring data scientists based on their real proficiencies inside that three-part set, you’re really just back to having analysts (the subject matter expertise folks), programmers, and statisticians. The same people you always had except now they call themselves data scientists and charge you quite a bit more for doing the same stuff they’ve always done. Since I’m one of those people, I’m not deeply opposed to the whole trend.

None of which is really worth bothering to disagree about though. It’s semantics of a fairly uninteresting sort.

No, what really bothers me about this conventional view is encapsulated in the last two claims:  #4 and #5. The idea that data science is science and that the scientific method is applicable to business analytics. I’m not at all sure that business analytics is or should aspire to be science and I’m quite sure that the scientific method won’t save us.

On the other hand, I agree with the first part of the claim in #4. Namely, that methodologies like CRISP-DM are just faintly warmed over versions of the scientific method.

Despite what most people would assume, that’s not a good thing and here I’m going to go all “philosophy guy” on you to explain why, and also why I think this is actually a pretty important point.

 

Debunking the Scientific Method

In the past five hundred years, the dominant theme in Western culture has been the continuing and astonishing success of the scientific endeavor. Only the most hardened skeptic could doubt the importance and success of scientific disciplines like physics, chemistry and biology in dramatically improving our understanding of the natural world. When it comes to the success of the scientific endeavor, I’m not skeptical at all. It’s worked and it’s worked amazingly well.

But why is that?

The popular conception is that science works because scientists apply the scientific method – testing theories experimentally and proving or refuting them. It’s the five step process enumerated above.

And it just isn’t right. Since way back in the day when I was studying philosophy of science, there’s been a broad consensus that the “scientific method” is a deeply flawed account of the scientific endeavor. Karl Popper provided the best and most influential account of the traditional scientific method and the importance of refutation as opposed to proof. Thomas Kuhn pretty much debunked that explanation as an historical account of how science actually works (despite having his own deeply unsuccessful explanation) and Quine absolutely destroyed it as an intellectual model. It turns out that it’s basically impossible to refute a single hypothesis in isolation with an experiment. Quine actually influenced my thinking on why KPIs, taken in isolation, are always useless. Depending on the background assumptions, any change of a KPI (and in any direction) can have diametrically opposed meanings. It’s pretty much the same thing with a hypothesis. You can rescue any hypothesis from experimental refutation by changing the background assumptions. What’s more, Kuhn showed that this happens all the time in science – punctuated by dramatic cases where it doesn’t.

I doubt there is a single working historian or philosopher of science who would accept the “scientific method” as a reasonable explanation for how science works from either an historical or intellectual perspective.

What’s more, the scientific method as popularly elaborated is almost contentless. Strip away the fancy language and it translates into something like this:

  1. Decide what problem you want to solve
  2. Think about the problem until you have an idea of how it might be solved
  3. Try it out and see if it works
  4. Repeat until you solve the problem

Does this feel action guiding and powerful?

It feels to me like the sort of thing you might sell on late-night TV. Available now, limited time only – a one stop absolutely foolproof method for solving any problem of any sort in any field! The Scientific Method! Buy!

The only part of the scientific method that feels significant in any respect is the requirement that your idea should be capable of specific refutation (testable) via experiment. Sadly, that’s exactly the concept that Quine showed to be impossible. So the scientific method as popularly understood is pretty much a bunch of boilerplate with one mistaken idea bolted on.

The idea that this type of general problem solving procedure is the explanation for the success of science seems implausible on its face and is contradicted by experience.

Implausible because the method as described is so contentless. How do I pick which problems to tackle from the infinite set available? The method is silent. How do I generate hypotheses? The method is silent. How do I know they are testable? The method is silent. How do I test them? The method is silent. How do I know what to do when a test doesn’t refute a hypothesis? The method is silent. How many failures to refute a hypothesis is enough to prove it? The method is silent. How do I communicate the results? The method is silent.

If what we want in a methodology is a massively generalized process that provides zero guidance on how to accomplish the tasks it lays out and has one impossible to meet demand, then the scientific method is great.

Hence the implausibility of the claim that the scientific method is a reasonable explanation for why science works. The scientific endeavor is neither defined, nor described, by the scientific method.

On a less important note, I’m not at all sure that it’s correct to think of data science as even potentially a scientific endeavor – at least when it comes to business analytics. The belief that the scientific endeavor works in general is broadly contradicted by experience – it doesn’t work for everything. Yes, the scientific endeavor has worked extraordinarily well in physics and biology. But smart people have tried to emulate the scientific approach in lots of other places too. Fields like history, sociology, philosophy and psychology have all drunk the “scientific method” moonshine with a conspicuous absence of success. Clearly something about the scientific endeavor makes it very effective for some types of problems and not effective at all for others. That seems to me a pretty important fact to keep in mind when we claim that business analytics and data science are “just science”. It’s comforting to think we can re-cast business as science, but it’s not clear why we should think that’s true. I’ve never thought of business analytics as a truly scientific enterprise and renaming it data science doesn’t make it seem any more likely to be so.

 

Why CRISP-DM and most other generalized analytics models are the scientific method…and LESS

Unfortunately, methods specific to analytics like CRISP-DM are worse not better. They lack even the idea of specific testability which, though incorrect, at least made some sense as a driver of a method. CRISP-DM lays out a process for analytics that essentially says it works like this: figure out what your problem is, figure out what data you need, setup your data, build your model, check your model, deploy your model.

Wow. That’s very helpful.

Here’s a CRISP-DM like method for becoming President of the United States.

  1. Decide which political party to join
  2. Register as a candidate for president
  3. Create lots of positive press about yourself and your positions
  4. Raise a lot of money
  5. Convince people to vote for you

Armed with a cutting-edge method like this, your path to power is assured. Donald Trump beware!

Really, how different is CRISP-DM from this? It adds a few little flourishes and some academic language but it lives at the same level of empty generality. I suppose it’s good to know that you deploy models only after you build them, but I’m thinking a formal methodology should give us a little more utility than that.

Methodologies like Six Sigma or SPEED (which I laid out last week and which is why this topic is much on my mind and seems important) provide something real and essential – they provide enough guidance to actually drive a process.

As a side note, I’d point out that successful methodologies are nearly always domain specific (SPEED is entirely specific to digital analytics and Six Sigma has been mostly successful in a very specific range of manufacturing production problems) for the simple reason that generality destroys utility when it comes to method.

 

So is Business Analytics a “Science”?

It’s a real question, then, whether business analytics can reasonably be considered a science and, in fact, it’s a much more ambitious claim than most people would realize (at least when it’s cloaked in the idea that data science is a science – after all, it says science right there in the title). I’m highly skeptical of the idea that data science is science because I’m highly skeptical that business analytics problems are scientific problems.

They don’t seem like it to me. Business analytics problems map very poorly indeed to the natural sciences and only very partially to the social sciences where the track record of the scientific endeavor is, to say the least, mixed.

So claiming that data science is about using the scientific method on data problems might seem like a “Mom and Apple Pie” kind of thing, but I think it’s wrong on two counts.

It’s wrong because business analytics problems are not obviously the types of problems that are scientific. I can’t say for sure that they aren’t – and I might be persuaded otherwise – but at first glance I think there are strong reasons for skepticism and little reason to think that advocates of this view really understand what they are saying or have good reasons to back their claim.

It’s especially wrong because the scientific method as popularly understood is neither meaningful nor a method. This is important. In fact, this is the one really important thing you really should take away from this post. If you think hiring data scientists ensures you have a method (and not just a method but a “scientific” one), you’re going to be sadly disappointed. Data scientists don’t arrive at your doorstep complete with a real method for continuous improvement in digital.  It doesn’t matter how data sciencey they are. And if you believe that telling your analysts to use the “scientific method” is going to make your analytics more successful…well that, my friend, is even more absurd.

I have strong reasons for thinking that Six Sigma (for example) isn’t an appropriate methodology for digital analytics. But at least it’s a real method. Flawed as it is when applied to digital analytics, it’s rather more likely to drive results than the “scientific” method. And, of course, I have my own axe to grind. The methodology I described in SPEED is purpose-built for digital and is action-guiding. I’d love to have people adopt and use it. But even if you don’t like SPEED, the importance of having a real method and using that method to drive continuous improvement shouldn’t be discounted.

Go ahead, build your own. Just make sure it’s not of the “figure out your problem, then solve your problem, then iterate” variety; unless, of course, you want an analytics method to sell on late-night TV.

 

I promise there’s no (well…very little) philosophy in ‘Measuring the Digital World’ – but I do think there is some good method! It’s available for pre-order now on Amazon.

SPEED: A Process for Continuous Improvement in Digital

Everyone always wants to get better. But without a formal process to drive performance, continuous improvement is more likely to be an empty platitude than a reality in the enterprise. Building that formal process isn’t trivial. Existing methodologies like Six Sigma illustrate the depth and the advantages of a true improvement process versus an ad hoc “let’s get better” attitude, but those methodologies (largely birthed in manufacturing) aren’t directly applicable to digital. In my last post, I laid out six grounding principles that underlie continuous improvement in digital. I’ll summarize them here as:

  • Small is measurable. Big changes (like website redesigns) alter too much to make optimization practical
  • Controlled Experiments are essential to measure any complex change
  • Continuous improvement will broadly target reduction in friction or improvement in segmentation
  • Acquisition and Experience (Content) are inter-related and inter-dependent
  • Audience, use-case, prequalification and target content all drive marketing performance
  • Most content changes shift behavior rather than drive clear positive or negative outcomes

Having guiding principles isn’t the same thing as having a method, but a real methodology can be fashioned from this sub-structure that will drive true continuous improvement. A full methodology needs a way to identify the right areas to work on and a process for improving those areas. At minimum, that process should include techniques for figuring out what to change and for evaluating the direction and impact of those changes. If you have that, you can drive continuous improvement.

I’ll start where I always start: segmentation. Specifically, 2-tiered segmentation. 2-tiered segmentation is a uniquely digital approach to segmentation that slices audiences by who they are (traditional segmentation) and what they are trying to accomplish (this is the second tier) in the digital channel. This matrixed segmentation scheme is the perfect table-set for continuous improvement. In fact, I don’t think it’s possible to drive continuous improvement without this type of segmentation. Real digital improvement is always relative to an audience and a use-case.

But segmentation on its own isn’t a method for continuous improvement. 2-tiered segmentation gives us a powerful framework for understanding where and why improvement might be focused, but it doesn’t tell us where to target improvements or what those improvements might be. To have a real method, we need that.

Here’s where pre-qualification comes in. One of the core principles is that acquisition and experience are inter-related and inter-dependent. This means that if you want to understand whether or not content is working (creating lift of some kind), then you have to understand the pre-existing state of the audience that consumes that content. Content with a 100% success rate may suck. Content with a 0% success rate may be outstanding. It all depends on the population you give them. Every single person in line at the DMV will stay there to get their license. That doesn’t mean the experience is a good one. It just means that the self-selected audience is determined to finish the process. We need that license! Similarly, if you direct garbage traffic to even the best content, it won’t perform at all. Acquisition and content are deeply interdependent. It’s impossible to measure the latter without understanding the former.

Fortunately, there’s a simple technique for measuring the quality of the audience sourced for any given content area that we call pre-qualification. To understand the pre-qualification level of an audience at a given content point, we use a very short (typically no more than 3-4 questions) pop-up survey. The pre-qualification survey explores what use-case visitors are in, where they are in the buying cycle, and how committed they are to the brand. That’s it.
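Because the survey is so short, collapsing its answers into a qualification level can be equally simple. Here’s a minimal sketch: each answer is coded on a 0-2 scale and the total is banded. The question names, coding scheme, and thresholds are all illustrative assumptions, not part of the SPEED specification.

```python
# Hypothetical scoring of a 3-question pre-qualification survey.
# Answer coding (0-2) and band thresholds are illustrative assumptions.

def qualification_level(use_case_fit: int, buying_stage: int, brand_commitment: int) -> str:
    """Each answer is coded 0 (low) to 2 (high). Returns a coarse band."""
    score = use_case_fit + buying_stage + brand_commitment  # ranges 0-6
    if score >= 5:
        return "high"
    if score >= 3:
        return "medium"
    return "low"

# A late-stage, well-matched, somewhat committed visitor:
print(qualification_level(use_case_fit=2, buying_stage=2, brand_commitment=1))
```

The point of the banding is just to make qualification a usable third dimension alongside visitor type and use-case; any consistent scoring scheme would do.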

It may be simple, but pre-qualification is one of the most powerful tools in the digital analytics arsenal and it’s the key to a successful continuous improvement methodology.

First we segment. Then we measure pre-qualification. With these two pieces we can measure content performance by visitor type, use-case and visitor quality. That’s enough to establish which content and which marketing campaigns are truly underperforming.

How?

Hold the population, use-case and pre-qualification level constant and measure the effectiveness of content pieces and sequences in creating successful outcomes. You can’t effectively measure content performance unless you hold these three variables constant, but when you control for these three variables you open up the power of digital analytics.

We now have a way to target potential improvement areas – just pick the content with the worst performance in each cell (visitor type x visit type x qualification level).
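The cell-based evaluation can be sketched in a few lines of pandas. The visit records and column names below are invented for the example (a real implementation would pull from your analytics data): success rates are computed per (visitor type, visit type, qualification, content) combination, and the worst performer within each cell becomes the improvement target.

```python
import pandas as pd

# Made-up visit-level data; columns are illustrative assumptions.
visits = pd.DataFrame({
    "visitor_type":  ["new", "new", "new", "new", "returning", "returning"],
    "use_case":      ["research"] * 6,
    "qualification": ["high"] * 6,
    "content":       ["A", "A", "B", "B", "A", "B"],
    "success":       [1, 1, 0, 1, 1, 0],
})

# Success rate for each piece of content within each cell.
rates = (visits
         .groupby(["visitor_type", "use_case", "qualification", "content"])["success"]
         .mean())

# The worst-performing content within each cell is the improvement target.
worst = rates.groupby(level=["visitor_type", "use_case", "qualification"]).idxmin()
print(worst)
```

In practice you’d also want minimum sample sizes per cell before trusting any of these rates, but the grouping logic is the heart of the step.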

But there is much more that we can do with these essential pieces in place. By evaluating whether content underperforms across all pre-qualification levels equally or is much worse for less qualified visitors, you can determine if the content problem is because of friction (see guiding principle #3).

Friction problems tend to impact less qualified visitors disproportionately. So if less qualified visitors within each visitor type perform even worse than expected after consuming a piece of content, then some type of friction is likely the culprit.
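That friction check can be sketched the same way: within a single visitor type and use-case, compare each content piece’s success rate for highly qualified versus less qualified visitors. The rates and the cutoff below are illustrative assumptions, not empirical values.

```python
import pandas as pd

# Hypothetical per-content success rates within one visitor type / use-case.
rates = pd.DataFrame({
    "content":       ["A", "A", "B", "B"],
    "qualification": ["high", "low", "high", "low"],
    "success_rate":  [0.60, 0.50, 0.58, 0.20],
})

pivot = rates.pivot(index="content", columns="qualification", values="success_rate")
pivot["drop"] = pivot["high"] - pivot["low"]

# Flag content where less-qualified visitors do disproportionately worse --
# the friction signal. The cutoff is an assumed value for the example.
FRICTION_THRESHOLD = 0.25
pivot["friction_suspect"] = pivot["drop"] > FRICTION_THRESHOLD
print(pivot)
```

Here content B loses 38 points of success rate for less-qualified visitors while A loses only 10, so B is the friction suspect.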

Further, by evaluating content performance across visitor type (within use-case and with pre-qualification held constant), you have strong clues as to whether or not there are personalization opportunities to drive segmentation improvement.

Finally, where content performs well for qualified audiences but receives a disproportionate share of unqualified visitors, you know that you have to go upstream to fix the marketing campaigns sourcing the visits and targeting the content.

Segment. Pre-Qualify. Evaluate by qualification for friction and acquisition, and by visitor type for personalization.

Step four is to explore what to change. How do you do that? Often, the best method is to ask. This is yet another area for targeted VoC, where you can explore what content people are looking for, how they make decisions, what they need to know, and how that differs by segment. A rich series of choice/decision questions should create the necessary material to craft alternative approaches to test.

You can also break up the content into discrete chunks (each with a specific meta-data purpose or role) and then create a controlled experiment that tests which content chunks are most important and deliver the most lift. This is a sub-process for testing within the larger continuous improvement process. Analytically, it should also be possible to do a form of conjoint analysis on either behavior or preferences captured in VoC.
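For reading any single controlled experiment in that sub-process, a two-proportion z-test is one conventional choice. The counts below are invented, and in practice each visitor-type/use-case/qualification cell would be evaluated separately rather than pooled.

```python
from math import sqrt

def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """z-statistic comparing variant B's conversion rate to control A's.

    Positive z favors the variant; |z| > 1.96 is roughly significant
    at the 5% level for a two-sided test.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Invented counts: control converts 120/1000, variant converts 150/1000.
z = two_proportion_z(conv_a=120, n_a=1000, conv_b=150, n_b=1000)
print(round(z, 2))
```

Nothing about the method depends on this particular test; any sound experiment-evaluation procedure slots into the same place.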

Segment. Pre-Qualify. Evaluate. Explore.

Now you’re ready to decide on the next round of tests and experiments based on a formal process for finding where problems are, why they exist, and how they can be tackled.

Segment, Pre-Qualify. Evaluate. Explore. Decide.

SPEED.

Sure, it’s just another consulting acronym. But underneath that acronym is a real method. Not squishy and not contentless. It’s a formal procedure for identifying where problems exist, what class of problems they are, what type of solution might be a fit (friction reduction or personalization), and what that solution might consist of. All wrapped together in a process that can be endlessly repeated to drive measurable, discrete improvement for every type of visitor and every type of visit across any digital channel. It’s also specifically designed to be responsive to the guiding principles enumerated above that define digital.

If you’re looking for a real continuous improvement process in digital, there’s SPEED and then there’s…

Well, as far as I know, that’s pretty much it.

 

Interested in knowing more about 2-Tiered Segmentation and Pre-Qualification, the key ingredients to SPEED? “Measuring the Digital World” provides the most detailed descriptions I’ve ever written of how to do both and is now available for pre-order on Amazon.