Tag Archives: analytics methodology

Burning Down the House

Nowhere is the challenge of getting people to understand how to use data better illustrated than in the methodology wars being fought in the discipline of psychology. If you haven’t heard of the methodology wars, be assured that the battlefields – studies in psychological research – are being fought over like blocks of Stalingrad; and like that famous battle, not much is left standing in the aftermath.

I’m not sure exactly how the methodology wars started. Somehow, somewhere, someone decided to actually re-test a “classic” study in psychology. A study that’s been accepted into the core of the discipline – that established somebody’s reputation, made somebody a career. Only it didn’t replicate. They re-did the experiment as carefully as they could and it didn’t show the same result. Didn’t, usually, show any result at all. Increase the sample size to fix the problem and the signal becomes even clearer. Alas, the signal always seems to be that there is no signal.

Pretty soon people started calling into question nearly every Psych study done over the last fifty years and testing them. And many – and I mean many – have failed.

Slate’s article (and it’s really good – giving a great overview of the issue – so give it a read) recounts the latest block to burn down in psychology’s methodology wars. The research in question centered on the idea that our facial states feed back into our emotions. If we smile (even inadvertently), we will feel happier.

It’s an interesting idea – intuitively plausible – and apparently widely supported by a huge variety of studies in the field. It’s an idea which strikes me as perfectly reasonable and in which I have zero vested interest one way or the other.

But when it was submitted to rigorous simultaneous validation in a number of different labs, it failed. Completely.

The original test involved 32 participants, and the average subjective score differed between the two groups: 4.4 versus 5.5. That means each group had just sixteen participants. The improved second test had 92 total participants and showed a scoring difference of 4.3 versus 5.1.

That’s a pretty small sample.

Especially for something that became received wisdom. A classic.
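To see why sixteen people per group is thin, here is a minimal Python simulation; nothing in it comes from the Slate article or the original study. It assumes ratings are roughly normal with a standard deviation of 1.5 points and, crucially, that smiling has no effect at all. Both the normality and the 1.5-point spread are illustrative assumptions. It then counts how often pure chance produces a gap between groups of 1.1 points or more, about the size of the original 4.4-versus-5.5 difference.

```python
import math
import random
import statistics


def null_mean_differences(n_per_group, n_studies, sd=1.5, mu=5.0, seed=0):
    """Simulate studies in which both groups share the SAME true mean (no effect).

    Returns the observed difference in group means for each simulated study.
    The sd and mu defaults are illustrative assumptions, not data from the study.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_studies):
        control = [rng.gauss(mu, sd) for _ in range(n_per_group)]
        treated = [rng.gauss(mu, sd) for _ in range(n_per_group)]
        diffs.append(statistics.fmean(treated) - statistics.fmean(control))
    return diffs


def theoretical_se(n_per_group, sd=1.5):
    """Standard error of a difference between two independent group means."""
    return sd * math.sqrt(2 / n_per_group)


def share_at_least(diffs, threshold):
    """Fraction of simulated studies whose gap is at least `threshold` either way."""
    return sum(abs(d) >= threshold for d in diffs) / len(diffs)


# 16 per group (the original), 46 per group (92 total), and a much larger study.
for n in (16, 46, 250):
    diffs = null_mean_differences(n, 1000)
    print(n, round(theoretical_se(n), 3), share_at_least(diffs, 1.1))
```

Under these assumed numbers, a gap as large as the original turns up by chance in a few percent of sixteen-per-group studies and essentially never at 250 per group, because the standard error shrinks with the square root of the sample size.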

So how would people react if it turned out to be wrong?

Well, the Slate article answers that question pretty definitively. Because when the multi-lab tests came back, here’s what happened. Seventeen different labs replicated the experiment with nearly 2,000 subjects. In about half the participating labs, participants who smiled recorded a slightly higher average on the resulting happiness test (though the gap was much smaller than in the original experiment). In the other half, it went the other way.

Net, net, there was no correlation at all. Zero.

Okay, so far you have just another sad story of a small sample size failure.

That’s not what really attracted my attention. Nope. What really made me laugh in utter disbelief was the comment of the “scientist” who had done the original research. Here it is, and I quote in full lest you think I’m about to exaggerate:

“Fritz Strack has no regrets about the RRR, but then again, he doesn’t take its findings all that seriously. “I don’t see what we’ve learned,” he said.

Two years ago, while the replication of his work was underway, Strack wrote a takedown of the skeptics’ project with the social psychologist Wolfgang Stroebe. Their piece, called “The Alleged Crisis and the Illusion of Exact Replication,” argued that efforts like the RRR reflect an “epistemological misunderstanding,” since it’s impossible to make a perfect copy of an old experiment. People change, times change, and cultures change, they said. No social psychologist ever steps in the same river twice. Even if a study could be reproduced, they added, a negative result wouldn’t be that interesting, because it wouldn’t explain why the replication didn’t work.

So when Strack looks at the recent data he sees not a total failure but a set of mixed results. Nine labs found the pen-in-mouth effect going in the right direction. Eight labs found the opposite. Instead of averaging these together to get a zero effect, why not try to figure out how the two groups might have differed? Maybe there’s a reason why half the labs could not elicit the effect.

[Bolding is mine]

So here’s a “scientist” who, despite presumably being familiar with the extensive literature on statistics and the methodology wars, somehow believes that because about half the labs found a small effect in the predicted direction, the key thing to look at is why the other half didn’t. Apparently, the only thing that would satisfy him is if all the labs reported exactly the opposite result. Which, presumably, would result in a new classic paper showing that frowning makes you happier!

Ring, Ring!

Clue Phone.

It’s random variation calling for you, Dr. Strack!

’Cause here’s the thing…you’d expect about half the labs to show a positive result when there is no correlation. If most labs didn’t report a positive result, then the correlation would pretty much have to be negative, right?
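The arithmetic behind that expectation is a simple binomial calculation. This Python sketch assumes each of the 17 labs is independent and, with no true effect, equally likely to land on either side of zero; the function name and that 50/50 assumption are illustrative, not taken from the replication report.

```python
from math import comb


def prob_at_least_k_positive(n_labs, k, p_positive=0.5):
    """P(at least k of n_labs show a positive difference).

    Assumes labs are independent and each one lands on the positive side with
    probability p_positive; 0.5 models a true effect of zero.
    """
    return sum(
        comb(n_labs, j) * p_positive**j * (1 - p_positive) ** (n_labs - j)
        for j in range(k, n_labs + 1)
    )


# Nine or more "positive" labs out of seventeen, when nothing is going on:
print(prob_at_least_k_positive(17, 9))  # 0.5
```

With an odd number of labs and no effect, a nine-versus-eight split (or better) in the "right" direction happens exactly half the time, which is what a coin would give you.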

This isn’t, as he appears to believe, half-corroboration. It’s the way every null result ever found actually looks out here in the real world. I’d advise him to try flipping a coin 100 times, repeatedly, and see how often it comes out 50 heads and 50 tails. He might be surprised to learn that about half the time this test will yield more heads than tails. This does not mean that heads is more likely than tails, and it does not suggest that researchers should focus on why some trials yielded more heads and others yielded more tails.
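For anyone who would rather not flip a coin a few thousand times, the exact odds take a few lines of Python. This is a sketch using only the standard library's `math.comb`:

```python
from math import comb

N = 100          # flips per trial
total = 2**N     # number of equally likely heads/tails sequences

# Exact binomial probabilities for a fair coin.
p_exact_tie = comb(N, N // 2) / total
p_more_heads = sum(comb(N, k) for k in range(N // 2 + 1, N + 1)) / total

print(f"P(exactly 50 heads)      = {p_exact_tie:.4f}")   # 0.0796
print(f"P(more heads than tails) = {p_more_heads:.4f}")  # 0.4602
```

A perfect 50-50 split turns up only about 8% of the time; the remaining 92% divides evenly between "more heads" and "more tails", so roughly 46% of trials come out heads-heavy with a perfectly fair coin.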


Okay, I get it. You published a study. You made a career out of it. It’s embarrassing that it turns out to be wrong. But it’s hard to know in this case which is the worse response – intellectual dishonesty or sheer stupidity. Frankly, I think the latter. Because I don’t care how dishonest you are, some explanations should be too embarrassing to try on for size. And the idea that the right interpretation of these results would be to look for why some labs had slightly different results than others clearly belongs in that category.

I find the defense based on the difficulties of true replication more respectable. And yet, what are we to make of an experiment so delicate that it can’t be replicated AT ALL even with the most careful controls? How important can any inference we make from such an experiment plausibly be? By definition, it could only fit the narrowest range of cases imaginable. And the idea that replication of an experiment doesn’t matter seems…you know…a tad unscientific.

From my perspective, it isn’t the original study that illustrates the extraordinary problem we have getting people to use data well. Yes, over-reliance on small sample sizes is all too common and all too easy. That’s unfortunate, not shameful. But the deeper problem is that even when data is used well, a lethal combination of self-interest and a near-total lack of understanding of basic statistics makes it all too possible for people to ignore the data whenever they wish.

As Simon and Garfunkel plaintively observed, “A man hears what he wants to hear, and disregards the rest”.

If it weren’t so sad, it would be funny.

Dammit. It is funny.

For it’s easy to see that in this version of the psych methodology wars, the defenders have their own unique version of a foxhole – with posterior high in the air and head firmly planted in the sand.

[Getting close to the Digital Analytics Hub. If you love talking analytics, check it out. Would be great to see you there!]

Digital Transformation of the Enterprise (with a side of Big Data)

Since I finished Measuring the Digital World and got back to regular blogging, I’ve been writing an extended series on the challenges of digital in the enterprise. Like many analysts, I’m often frustrated by the way our clients approach decision-making. So often, they lack any real understanding of the customer journey, any effective segmentation scheme, any real method for either doing or incorporating analytics into their decisioning, anything more than a superficial understanding of their customers, and anything more than the empty façade of a testing program. Is it any surprise that they aren’t very good at digital? This would be frustrating but understandable if companies simply didn’t invest in these capabilities. They aren’t magic, and no large enterprise can do these things without making a significant investment. But, in fact, many companies have invested plenty with very disappointing results. That’s maddening. I want to change that – and this series is an extended meditation on what it takes to do better and how large enterprises might truly gain competitive advantage in digital.

I hope that reading these posts is useful to people, but I know, too, that it’s hard to get the time. Heaven knows I struggle to read the stuff I’d like to. So I took advantage of the slow time over the holidays to do something that’s been on my wish list for about 2 years now – take some of the presentations I do and turn them into full online webinars. I started with a set of videos that captures the core elements of this blog series – the challenge of digital transformation.

There are two versions of this video series. The first is a set of fairly short (2-4 minute) stories that walk through how enterprise decision-making gets done, what’s wrong with the way we do it, and how we can do better. It’s a ten(!)-part series and meant to be tackled in order. It’s not really all that long…like I said, most of the videos are just 2-4 minutes long. I’ve also packaged up the whole story (except Part 10) in a single video that runs just a little over 20 minutes. It’s shorter than viewing all ten of the others, but you need a decent chunk of uninterrupted time to get at it. If you’re really pressed and only want the key themes without the story, you can just view Parts 8-10.

Here’s the video page that has all of these laid out in order:

Digital Transformation Video Series

Check it out and let me know what you think! To me it seems like a faster, better, and more enjoyable way to get the story about digital transformation and I’m hoping it’s very shareable as well. If you’re struggling to get analytics traction in your organization, these videos might be an easy thing to share with your CMO and digital channel leads to help drive real change.

I have to say I enjoyed doing these a lot and they aren’t really hard to do. They aren’t quite professional quality, but I think they are very listenable and I’ll keep working to make them better. In fact, I enjoyed doing the digital transformation ones so much that I knocked out another this last week – Big Data Explained.

This is one of my favorite presentations of all time – it’s rich in content and intellectually interesting. Big data is a subject that is obscured by hype, self-interest, and just plain ignorance; everyone talks about it but no one has a clear, cogent explanation of what it is and why it’s important. This presentation deconstructs the everyday explanation about big data (the 4Vs) and shows why it misses the mark. But it isn’t designed to merely expose the hype, it actually builds out a clear, straightforward and important explanation of why big data is real, why it challenges common IT and analytics paradigms, and how to understand whether a problem is a big data problem…or not. I’ve written about this before, but you can’t beat a video with supporting visuals for this particular topic. It’s less than fifteen minutes and, like the digital transformation series, it’s intended for a wide audience. If you have decision-makers who don’t get big data or are skeptical of the hype, they’ll appreciate this straightforward, clear, and no-nonsense explication of what it is.

You can get it on my video page or directly on YouTube.

This is also a significant topic toward the end of Measuring the Digital World, where I try to lay out a forward-looking plan for digital analytics as a discipline.

I’m planning to do a steady stream of these videos throughout the year so I’d love thoughts/feedback if you have suggestions!

Next week I hope to have an update on my EY Counseling Family’s work in the 538 Academy Awards challenge. We’ve built our initial Hollywood culture models – it’s pretty cool stuff and I’m excited to share the results. Our model may not be as effective as some of the other challengers (TBD), but I think it’s definitely more fun.

Building Analytics Culture – One Decision at a Time

In my last post, I argued that much of what passes for “building culture” in corporate America is worthless. It’s all about talk. And whether that talk is about diversity, ethics or analytics, it’s equally arid. Because you don’t build culture by talking. You build culture through actions. By doing things right (or wrong if that’s the kind of culture you want). Not only are words not effective in building culture, they can be positively toxic. When words and actions don’t align, the dishonesty casts other – possibly more meaningful – words into disrepute. Think about which is worse – a culture where bribery is simply the accepted and normal way of getting things done (and is cheerfully acknowledged) or one where bribery is ubiquitous but is cloaked behind constant protestations of disinterest and honesty? If you’re not sure about your answer, take it down to a personal level and ask yourself the same question. Do we not like an honest villain better than a hypocrite? If hypocrisy is the compliment vice pays to virtue, it is a particularly nasty form of flattery.

What this means is that you can’t build an analytics culture by telling people to be data driven. You can’t build an analytics culture by touting the virtues of analysis. You can’t even build an analytics culture by hiring analysts. You build an analytics culture by making good (data-driven) decisions.

That’s the only way.

But how do you get an organization to make data-driven decisions? That’s the art of building culture. And in that last post, I laid out seven (a baker’s half-dozen?) tactics for building good decision-making habits: analytic reporting, analytics briefing sessions, hiring a C-Suite analytics advisor, creating measurement standards, building a rich meta-data system for campaigns and content, creating a rapid VoC capability and embracing a continuous improvement methodology like SPEED.

These aren’t just random parts of making analytic decisions. They are tactics that seem to me particularly effective in driving good habits in the organization and building the right kind of culture. But seven tactics don’t come close to exhausting my list. Here’s another set of techniques that are equally important in helping drive good decision-making in the organization (my original list wasn’t in any particular order, so it’s not as if the previous list had all the important stuff):

Yearly Agency Performance Measurement and Reviews

What it is: Having an independent annual analysis of your agency’s performance. This should include review of goals and metrics, consideration of the appropriateness of KPIs and analysis of variation in campaign performance along three dimensions (inside the campaign by element, over time, and across campaigns). This must not be done by the agency itself (duh!) or by the owners of the relationship.

Why it builds culture: Most agencies work by building strong personal relationships. There are times and ways that this can work in your favor, but from a cultural perspective it both limits and discourages analytic thinking. I see many enterprises where the agency is so strongly entrenched you literally cannot criticize them. Not only does the resulting marketing nearly always suck, but this drains the life out of an analytics culture. This is one of many ways in which building an analytic culture can conflict with other goals, but here I definitely believe analytics should win. You don’t need an overly cozy relationship with your agency. You do need objective measurement of their performance.
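As a rough sketch of the variation analysis described for the yearly agency review, the Python below groups campaign results by each dimension (campaign, element, month) and reports how far the group averages spread. The record layout, the conversion-rate metric, and every number in it are hypothetical placeholders; grouping by element here pools across campaigns, which simplifies "inside the campaign by element", and the spread of group means is just one simple way to summarize variation.

```python
from collections import defaultdict
from statistics import fmean, pstdev

# Hypothetical results: (campaign, element, month, conversion_rate).
# Made-up numbers for illustration only.
RESULTS = [
    ("spring_sale", "banner", "2024-03", 0.020),
    ("spring_sale", "email", "2024-03", 0.035),
    ("spring_sale", "banner", "2024-04", 0.018),
    ("spring_sale", "email", "2024-04", 0.031),
    ("loyalty", "banner", "2024-03", 0.012),
    ("loyalty", "email", "2024-03", 0.040),
    ("loyalty", "banner", "2024-04", 0.010),
    ("loyalty", "email", "2024-04", 0.044),
]
DIMENSIONS = {"campaign": 0, "element": 1, "month": 2}


def spread_by(records, key_index, value_index=3):
    """Average the metric within each group, then measure how far those averages spread."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key_index]].append(rec[value_index])
    means = {key: fmean(values) for key, values in groups.items()}
    return means, pstdev(list(means.values()))


for name, idx in DIMENSIONS.items():
    means, spread = spread_by(RESULTS, idx)
    print(f"{name:>8}: spread={spread:.4f}", {k: round(v, 4) for k, v in means.items()})
```

In this made-up data the creative element drives far more of the variation than timing does, which is the kind of finding an independent review should surface.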


Analytics Annotation / Collaboration Tool like Insight Rocket

What it is: A tool that provides a method for rich data annotation and the creation and distribution of analytic stories across the analytics team and into the organization. In Analytic Reporting, I argued for a focus on democratizing knowledge not data. Tools like Insight Rocket are a part of that strategy, since they provide a way to create and rapidly disseminate a layer of meaning on top of powerful data exploration tools like Tableau.

Why it builds culture: There aren’t that many places where technology makes much difference to culture, but there are a few. As some of my other suggestions make clear, you get better analytics culture the more you drive analytics across and into the organization (analytic reporting, C-Suite Advisor, SPEED, etc.). Tools like Insight Rocket have three virtues: they help disseminate analytics thinking not just data, they boost analytics collaboration making for better analytic teams, and they provide a repository of analytics which increases long-term leverage in the enterprise. Oh, and here’s a fourth advantage: they force analysts to tell stories – meaning they have to engage with the business. That makes this piece of technology a really nice complement to my suggestion about a regular cadence of analytics briefings and a rare instance of technology deepening culture.



What it is: Building analytics expertise internally instead of hiring it out and, most especially, instead of off-shoring it.

Why it builds culture: I’d be the last person to tell you that consulting shouldn’t have a role in the large enterprise. I’ve been a consultant for most of my working life. But we routinely advise our clients to change the way they think about consulting – to use it not as a replacement for an internal capability but as a bootstrap and supplement to that capability. If analytics is core to digital (and it is) and if digital is core to your business (which it probably is), then you need analytics to be part of your internal capability. Having strong, capable, influential on-shore employees who are analysts is absolutely necessary to analytics culture. I’ll add that while off-shoring, too, has a role, it’s a far more effective culture killer than normal consulting. Off-shoring creates a sharp divide between the analyst and the business that is fatal to good performance and good culture on EITHER side.


Learning-based Testing Plan

What it is: Testing plans that include significant focus on developing best design practices and resolving political issues instead of on micro-optimizations of the funnel.

Why it works: Testing is a way to make decisions. But as long as its primary use is to decide whether to show image A or image B, or a button in this color or that color, it will never be used properly. To illustrate learning-based testing, I’ve used the example of video integration – testing different methods of on-page video integration, different lengths, different content types and different placements against each key segment and use-case to determine UI parameters for ALL future videos. When you test this way, you resolve hundreds of future questions and save endless future debate about what to do with this or that video. That’s learning-based testing. It’s also about picking key places in the organization where political battles determine design – things like home page real estate and the amount of advertising load on a page – and resolving them with testing; that’s learning-based testing, too. Learning-based testing builds culture in two ways. First, in and of itself, it drives analytic decision-making. Almost as important, it demonstrates the proper role of experimentation and should help set the table for decision-makers to ask for more interesting tests.
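One way to picture a learning-based plan like the video example is as a full factorial grid: every combination of the design factors, crossed with each key segment. The Python sketch below enumerates such a grid with `itertools.product`; the factor names, levels, and segments are hypothetical stand-ins for whatever your own plan would cover.

```python
from itertools import product

# Hypothetical factor levels for a learning-based video test (illustrative only).
FACTORS = {
    "integration": ["inline", "lightbox"],
    "length": ["short", "long"],
    "content_type": ["demo", "testimonial"],
    "placement": ["above_fold", "below_fold"],
}
SEGMENTS = ["new_prospect", "existing_customer"]


def build_test_plan(factors, segments):
    """Enumerate every combination of factor levels, crossed with each segment."""
    names = list(factors)
    cells = []
    for segment in segments:
        for levels in product(*(factors[name] for name in names)):
            cells.append({"segment": segment, **dict(zip(names, levels))})
    return cells


plan = build_test_plan(FACTORS, SEGMENTS)
print(len(plan))  # 2 * 2 * 2 * 2 factor combinations x 2 segments = 32
print(plan[0])
```

Even this small grid has 32 cells, and each added factor or level multiplies it, so in practice you would likely prioritize cells or use a fractional factorial design rather than test every combination at once.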


Control Groups

What it is: Use of control groups to measure effectiveness whenever new programs (operational or marketing) are implemented. Control groups are small subsets chosen randomly from a target population and given either no experience or a neutral (existing) experience instead. Nearly all tests feature a baseline control group as part of the test, but the use of control groups transcends A/B testing tools. Control groups are common in traditional direct-response marketing and can be used in a wide variety of on- and offline contexts (most especially, as I recently saw Elea Feit of Drexel hammer home at the DAA Symposium, as a much more effective approach to attribution).

Why it works: One of the real barriers to building culture is a classic problem in education. When you first teach students something, they almost invariably use it poorly. That can sour others on the value of the knowledge itself. When people in an organization first start using analytics, they are, quite inevitably, going to fall into the correlation trap. Correlation is not causation. But in many cases, it sure looks like it is and this leads to many, many bad decisions. How to prevent the most common error in analytics? Control groups. Control groups build culture because they get decision-makers thinking the right way about measurement and because they protect the organization from mistakes that will otherwise sour the culture on analytics.
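To make the correlation trap concrete, here is a small Python simulation with invented parameters: 30% of users are highly engaged, engaged users convert at 10% versus 2% for everyone else, and the campaign itself has no effect at all. When engaged users are more likely to see the campaign, a naive exposed-versus-unexposed comparison shows a sizable "lift"; when exposure is assigned at random, as with a true control group, the lift shrinks to noise.

```python
import random


def measured_lift(n=100_000, randomized=False, seed=1):
    """Exposed-minus-unexposed conversion rate for a campaign with ZERO true effect.

    All rates below are invented for illustration. Engagement raises conversion;
    exposure never does. Without randomization, engaged users are also more
    likely to be exposed, which is what manufactures the spurious lift.
    """
    rng = random.Random(seed)
    users = {True: 0, False: 0}        # exposed? -> number of users
    conversions = {True: 0, False: 0}  # exposed? -> number of conversions
    for _ in range(n):
        engaged = rng.random() < 0.30
        if randomized:
            exposed = rng.random() < 0.50  # control group chosen by coin flip
        else:
            exposed = rng.random() < (0.80 if engaged else 0.20)  # self-selection
        converted = rng.random() < (0.10 if engaged else 0.02)    # exposure plays no role
        users[exposed] += 1
        conversions[exposed] += converted
    return conversions[True] / users[True] - conversions[False] / users[False]


print(f"naive comparison: {measured_lift():+.4f}")
print(f"random control:   {measured_lift(randomized=True):+.4f}")
```

Under these assumptions the naive comparison reports roughly four percentage points of lift from a campaign that does nothing; only the randomized control group tells the truth.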


Unified Success Framework

What it is: A standardized, pre-determined framework for content and campaign success measurement that includes definition of campaign types, description of key metrics for those types, and methods of comparing like campaigns on an apples-to-apples basis.

Why it works: You may not be able to make the horse drink, but leading it to water is a good start. A unified success framework puts rigor around success measurement – a critical part of building good analytics culture. On the producer side, it forces the analytics team to make real decisions about what matters and, one hopes, pushes them to prove that proxy measures (such as engagement) are real. On the consumer side, it prevents that most insidious destroyer of analytics culture, the post hoc success analysis. If you can pick your success after the game is over, you’ll always win.
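The reason post hoc success analysis always wins is easy to quantify. If a campaign truly did nothing, but you are free to choose afterwards among m metrics and declare victory if any one is "significant" at the conventional 5% level, the chance of a win is 1 - 0.95^m. The Python sketch below computes that and checks it by simulation; treating the metrics as independent is a simplifying assumption (real metrics are usually correlated, which changes the exact number but not the lesson).

```python
import random


def chance_of_a_win(n_metrics, alpha=0.05):
    """P(at least one of n independent, truly-null metrics looks 'significant')."""
    return 1 - (1 - alpha) ** n_metrics


def simulate_post_hoc(n_metrics, alpha=0.05, trials=20_000, seed=7):
    """Monte Carlo check: under the null, each metric's p-value is Uniform(0, 1)."""
    rng = random.Random(seed)
    wins = sum(
        any(rng.random() < alpha for metric in range(n_metrics))
        for trial in range(trials)
    )
    return wins / trials


for m in (1, 5, 20):
    print(m, round(chance_of_a_win(m), 3), simulate_post_hoc(m))
```

With 20 metrics to pick from after the fact, a do-nothing campaign "succeeds" about 64% of the time, which is why the success measures belong in a framework defined before launch.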


The Enterprise VoC Dashboard

What it is: An enterprise-wide state-of-the-customer dashboard that provides a snapshot and trended look at how customer attitudes are evolving. It should include built-in segmentation so that attitudinal views are ALWAYS shown sliced by key customer types, with additional segmentation possible.

Why it works: There are so many good things going on here that it’s hard to enumerate them all. First, this type of dashboard is one of the best ways to instill customer-first thinking in the organization. You can’t think customer-first until you know what the customer thinks. Second, this type of dashboard enforces a segmented view of the world. Segmentation is fundamental to critical thinking about digital problems and this sets the table for better questions and better answers in the organization. Third, opinion data is easier to absorb and use than behavioral data, making this type of dashboard particularly valuable for encouraging decision-makers to use analytics.


Two-Tiered Segmentation

What it is: A method that creates two levels of segmentation in the digital channel. The first level is the traditional “who” someone is – whether in terms of persona or business relationship or key demographics. The second level captures “what” they are trying to accomplish. Each customer touch-point can be described in this type of segmentation as the intersection of who a visitor is and what their visit was for.

Why it works: Much like the VoC Dashboard, Two-Tiered Segmentation makes for dramatically better clarity around digital channel decision-making and evaluation of success. Questions like ‘Is our Website successful?’ get morphed into the much more tractable and analyzable question ‘Is our Website successful for this audience trying to do this task?’. That’s a much better question, and a big part of building analytics culture is getting people to ask better questions. This also happens to be the main topic of my book “Measuring the Digital World,” and in it you can get a full description of both the power and the methods behind Two-Tiered Segmentation.
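In data terms, two-tiered segmentation means keying every visit by both tiers and reporting success per (who, what) cell instead of as a single site-wide rate. The Python below is a minimal sketch; the visit records, segment names, and the idea of a per-visit success flag are hypothetical.

```python
from collections import defaultdict

# Hypothetical visit records: (who the visitor is, what the visit was for, succeeded?).
VISITS = [
    ("prospect", "research_product", True),
    ("prospect", "research_product", False),
    ("prospect", "get_support", False),
    ("customer", "get_support", True),
    ("customer", "get_support", True),
    ("customer", "buy", False),
    ("customer", "buy", True),
]


def success_by_cell(visits):
    """Success rate for each (who, what) intersection rather than one site-wide number."""
    counts = defaultdict(lambda: [0, 0])  # (who, what) -> [visits, successes]
    for who, what, ok in visits:
        counts[(who, what)][0] += 1
        counts[(who, what)][1] += ok
    return {cell: successes / n for cell, (n, successes) in counts.items()}


overall = sum(ok for _, _, ok in VISITS) / len(VISITS)
print("site-wide:", round(overall, 2))
for cell, rate in sorted(success_by_cell(VISITS).items()):
    print(cell, round(rate, 2))
```

The single site-wide figure (about 57% here) hides that, in this toy data, support visits work perfectly for customers and not at all for prospects, which is exactly the kind of answer the sharper question makes possible.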


I have more, but I’m going to roll the rest into my next post on building an agile organization since they are all deeply related to the integration of capabilities in the organization. Still, that’s fifteen different tactics for building culture. None of which include mission statements, organizational alignment or C-Level support (okay, Walking the Walk is kind of that but not exactly and I didn’t include it in the fifteen) and none of which will take place in corporate retreats or all-hands conferences. That’s a good thing and makes me believe they might actually work.

Ask yourself this: is it possible to imagine an organization that does even half these things and doesn’t have a great analytics culture? I don’t think it is. Because culture just is the sum of the way your organization works and these are powerful drivers of good analytic thinking. You can imagine an organization that does these things and isn’t friendly, collaborative, responsible, flat, diverse, caring or even innovative. There are all kinds of culture, and good decision-making isn’t the only aspect of culture to care about*. But if you do these things, you will have an organization that makes consistently good decisions.

*Incidentally, if you want to build culture in any of these other ways, you have to think about similar approaches. Astronomers have a clever technique for seeing very faint objects called averted vision. The idea is that you look just to the side of the object if you want to get the most light-gathering power from your eyes. It’s the same with culture. You can’t tackle it head-on by talking about it. You have to build it just a little from the side!

Is Data Science a Science?

I got a fair amount of feedback through various channels around my argument that data science isn’t a science and that the scientific method isn’t a method (or at least much of one). I wouldn’t consider either of these claims particularly important in the life of a business analyst, and I think I’ve written pieces that are far more significant in terms of actual practice, but I’ve written few pieces about topics which are evidently more fun to argue about. Well, I’m not opposed to a fun argument now and again, so here’s a redux on some of the commentary and my thoughts in response.

There were two claims in that post:

  1. I was somewhat skeptical that data science was correctly described as a science
  2. I was extremely skeptical that the scientific method was a good description of the scientific endeavor

The comment that most engaged me came from Adam Gitzes and really focused on the first claim:

Science is the distillation of evidence into a causal understanding of the world (my definition anyway). In business analytics, we use surveys, data analysis techniques, and experimental design to also understand causal relationships that can be used to drive our business.

On re-reading my initial post, I realized that while I had argued that business analytics wasn’t science (#1 above), I hadn’t really put many reasons on the table for that view – partly because I was too busy demolishing the “Scientific Method” and partly because I think it’s the less important of the two claims and also the more likely to be correct. Mostly, I just said I was skeptical of the idea. So I think Adam’s right to push out a more specific description of science and ask why data science might not be reasonably described as a kind of scientific endeavor.

I’m not going to get into the thicket of trying to define science. Really. I’m not. That’s the work of a different career. If I got nothing else out of my time studying Philosophy, I got an appreciation for how incredibly hard it is to answer seemingly simple questions like “what is science?” For the most part, we know it when we see it. Physics is science. Philosophy isn’t. But knowing it when you see it is precisely what fails when it comes to edge cases like data science or sociology.

When it comes to business analytics and data science, however, there are a couple of things that make me skeptical of applying the term science that I think we might actually agree on and that use our shared, working understanding of the scientific endeavor.

In business analytics, our main purpose isn’t to understand the world. It’s to improve a specific part of it. Science has no such objective.

Does that seem like a small difference? I don’t think it is. Part of what makes the scientific endeavor unique is that there is no axe to grind. Understanding is the goal. This isn’t to say that people don’t get attached to their ideas or that their careers don’t benefit if they are successful advocates for them – it’s done by humans after all. It would be no more accurate to suggest that the goal of a business is always profit. External forces can and often do set the agenda for researchers. But these are corruptions of the process not the process itself. Business analytics starts (appropriately) with an axe to grind and true science doesn’t.

To see why this makes a difference, consider my own domain – digital analytics. If our goal were just to understand the digital world, we’d have a very different research program than we do. If knowledge were our only goal, we’d spend as much time analyzing why people create certain kinds of digital worlds as how people consume them. That’s not the way it works. In reality, our research program is entirely focused on why and how people use a digital property and what will get more of them to take specific actions – not why and how it was created.

We are, rightly I believe, skeptical of the idea that research sponsored by tobacco companies into lung cancer is, properly speaking, science. That’s not because those researchers don’t follow the general outline of the scientific endeavor – it’s because they have an axe to grind and their research program is determined by factors outside the community of science. When it comes to business analytics, we are all tobacco scientists.

Perhaps we’re not so biased as to the findings of our experiments – good analytics is neutral as to what will work – but we’re every bit as biased when it comes to the outcomes desired and the shape of the research program.

Here’s another crucial difference. I think it’s fair to suggest that in data science we sometimes have no interest in causality. If I’m building a forecast model and I can find variables that are predictive, I may have little interest in whether those variables are also causal. If I’m building a look-alike targeting model, for example, it doesn’t matter one whit whether the variables are causal. Now it’s true that philosophers of science hotly debate the role and necessity of causality in science, but I tend to agree with Adam that there is something in the scientific endeavor that makes the demand for causality a part of the process. But in business analytics, we may demand causality for some problems but be entirely and correctly unconcerned with it in others. In business analytics, causality is a tool not a requirement.

There is, also, the nature of the analytics problem – at least in my field (digital). Science is typically concerned with studying natural phenomena. The digital world is not a natural world, it’s an engineered world. It’s created and adapted with intention. Perhaps even worse, it responds to and changes with the measurements we make and those measurements influence our intentions in subsequent building (which is the whole point after all).

This is Heisenberg’s Uncertainty Principle (or, more precisely, the observer effect) with a vengeance! When we measure the digital world, we mean to change it based on the measurement. What’s more, once we change it, we can never go back to the same world. We could restore the HTML, but not a user base that has never seen the alternative experience. In digital, every test we run changes the world in a fundamental way because it changes the users of that world. There is no possibility of conducting a digital test that doesn’t alter the reality we’re measuring – and while this might be true at the quantum level in physics, at the macro level where the scientific endeavor really lives, it seems like a huge difference.

What’s more, each digital property lives in the context of a larger digital world that is being constantly changed, with intention, by a host of other people. When new apps like Uber change our expectations of how things like payment should work, or alter the design paradigm on the Web, these exogenous and intentional changes can have a dramatic impact on our internal measurement. There is, then, little or no possibility of a true controlled experiment in digital. In digital analytics, our goal is to optimize one part of a giant machine for a specific purpose while millions of other people are optimizing other, interrelated parts of the same machine for entirely different and often opposed purposes.

This doesn’t seem like science to me.

There are disciplines that seem clearly scientific yet cannot do controlled experiments. However, no field currently described as scientific is one where the results of an experiment change the measured reality in a clearly significant fashion and are then used to intentionally shape that reality.

So why don’t I think data science is a science – at least in the realm of digital analytics? It differs from the scientific endeavor in several aspects that seem to me to be critical. Unlike science, business analytics and data science start with an agenda that isn’t just understanding, and this fundamentally shapes the research program. Unlike science, business analytics and data science have no fixed commitment to causal explanations – just a commitment to working explanations. Finally, unlike science, business analytics and data science change the world they measure in a clearly significant fashion, and they do so intentionally in response to the measurement.

Given that we have no fixed and entirely adequate definition of science, none of this is proof. I can’t demonstrate to you with the certainty of a logical proof that the definition of science requires X, data science is not X, so data science is not a science.

However, I think I have shown that, by at least many of the core principles we associate with the scientific endeavor, business analytics (which I take to be a proxy in this conversation for data science) is not well described as a science.

This isn’t a huge deal. I’ve done business analytics for many years and never once thought of myself as a scientist. What’s more, once we realize that being scientists doesn’t attach a powerful new methodology to business analytics – which was the rather more important point of my last post – it’s much less clear why anyone would think it makes a difference.



A few other notes on the comments I received. With regard to Nikolaos’ question “why should we care?”, I’m obviously largely in agreement. There is intellectual interest in these questions (at least for me), but I won’t pretend that they are likely to matter in actual practice or will determine ‘what works’. I’m also very much in agreement with Ake’s point about qualitative data. The truth is that nothing in the scientific endeavor precludes the use of qualitative data in addition to behavioral data. But even though there’s no necessary tie between the two, I certainly think that advocates for data science as a science are particularly likely to shun qualitative data (which is a shame). As far as Patrick’s comment goes, I think it dodges the essential question. He’s right to suggest that the term data science is contentless because data is not the subject of science; data is always about something, and that something is the subject of science. But I take the deeper claim to be the one I have tackled here; namely, that business analytics is a scientific endeavor. That claim isn’t contentless, just wrong. I remain, still, deeply unconvinced of the utility of CRISP-DM.


Now is as good a time as any (how’s that for a powerful call to action?) to pre-order my book, ‘Measuring the Digital World’, on Amazon.