Is Data Science a Science?

I got a fair amount of feedback through various channels around my argument that data science isn’t a science and that the scientific method isn’t a method (or at least much of one). I wouldn’t consider either of these claims particularly important in the life of a business analyst, and I think I’ve written pieces that are far more significant in terms of actual practice, but I’ve written few pieces about topics which are evidently more fun to argue about. Well, I’m not opposed to a fun argument now and again, so here’s a redux on some of the commentary and my thoughts in response.

There were two claims in that post:

  1. I was somewhat skeptical that data science was correctly described as a science
  2. I was extremely skeptical that the scientific method was a good description of the scientific endeavor

The comment that most engaged me came from Adam Gitzes and really focused on the first claim:

Science is the distillation of evidence into a causal understanding of the world (my definition anyway). In business analytics, we use surveys, data analysis techniques, and experimental design to also understand causal relationships that can be used to drive our business.

On re-reading my initial post, I realized that while I had argued that business analytics wasn’t science (#1 above), I hadn’t really put many reasons on the table for that view – partly because I was too busy demolishing the “Scientific Method” and partly because I think it’s the less important of the two claims and also the more likely to be correct. Mostly, I just said I was skeptical of the idea. So I think Adam’s right to push out a more specific description of science and ask why data science might not be reasonably described as a kind of scientific endeavor.

I’m not going to get into the thicket of trying to define science. Really. I’m not. That’s the work of a different career. If I got nothing else out of my time studying Philosophy, I got an appreciation for how incredibly hard it is to answer seemingly simple questions like “what is science?” For the most part, we know it when we see it. Physics is science. Philosophy isn’t. But knowing it when you see it is precisely what fails when it comes to edge cases like data science or sociology.

When it comes to business analytics and data science, however, there are a couple of things that make me skeptical of applying the term science that I think we might actually agree on and that use our shared, working understanding of the scientific endeavor.

In business analytics, our main purpose isn’t to understand the world. It’s to improve a specific part of it. Science has no such objective.

Does that seem like a small difference? I don’t think it is. Part of what makes the scientific endeavor unique is that there is no axe to grind. Understanding is the goal. This isn’t to say that people don’t get attached to their ideas or that their careers don’t benefit if they are successful advocates for them – it’s done by humans after all. It would be no more accurate to suggest that the goal of a business is always profit. External forces can and often do set the agenda for researchers. But these are corruptions of the process not the process itself. Business analytics starts (appropriately) with an axe to grind and true science doesn’t.

To see why this makes a difference, consider my own domain – digital analytics. If our goal was just to understand the digital world, we’d have a very different research program than we do. If knowledge was our only goal, we’d spend as much time analyzing why people create certain kinds of digital worlds as how people consume them. That’s not the way it works. In reality, our research program is entirely focused on why and how people use a digital property and what will get more of them to take specific actions – not why and how it was created.

We are, rightly I believe, skeptical of the idea that research sponsored by tobacco companies into lung cancer is, properly speaking, science. That’s not because those researchers don’t follow the general outline of the scientific endeavor – it’s because they have an axe to grind and their research program is determined by factors outside the community of science. When it comes to business analytics, we are all tobacco scientists.

Perhaps we’re not so biased as to the findings of our experiments – good analytics is neutral as to what will work – but we’re every bit as biased when it comes to the outcomes desired and the shape of the research program.

Here’s another crucial difference. I think it’s fair to suggest that in data science we sometimes have no interest in causality. If I’m building a forecast model and I can find variables that are predictive, I may have little interest in whether those variables are also causal. If I’m building a look-alike targeting model, for example, it doesn’t matter one whit whether the variables are causal. Now it’s true that philosophers of science hotly debate the role and necessity of causality in science, but I tend to agree with Adam that there is something in the scientific endeavor that makes the demand for causality a part of the process. But in business analytics, we may demand causality for some problems but be entirely and correctly unconcerned with it in others. In business analytics, causality is a tool not a requirement.

There is, also, the nature of the analytics problem – at least in my field (digital). Science is typically concerned with studying natural phenomena. The digital world is not a natural world, it’s an engineered world. It’s created and adapted with intention. Perhaps even worse, it responds to and changes with the measurements we make and those measurements influence our intentions in subsequent building (which is the whole point after all).

This is Heisenberg’s Uncertainty Principle with a vengeance! When we measure the digital world, we mean to change it based on the measurement. What’s more, once we change it, we can never go back to the same world. We could restore the HTML, but not the absence of users with an alternative experience. In digital, every test we run changes the world in a fundamental way because it changes the users of that world. There is no possibility of conducting a digital test that doesn’t alter the reality we’re measuring – and while this might be true at the quantum level in physics, at the macro level where the scientific endeavor really lives, it seems like a huge difference.

What’s more, each digital property lives in the context of a larger digital world that is being constantly changed with intention by a host of other people. When new Apps like Uber change our expectations of how things like payment should work or alter the design paradigm on the Web, these exogenous and intentional changes can have a dramatic impact on our internal measurement. There is, then, little or no possibility of a true controlled experiment in digital. In digital analytics, our goal is to optimize one part of a giant machine for a specific purpose while millions of other people are optimizing other, inter-related parts of the same machine for entirely different and often opposed purposes.

This doesn’t seem like science to me.

There are disciplines that seem clearly scientific that cannot do controlled experiments. However, no field where the results of an experiment change the measured reality in a clearly significant fashion and are used to intentionally shape the resulting reality is currently described as scientific.

So why don’t I think data science is a science – at least in the realm of digital analytics? It differs from the scientific endeavor in several aspects that seem to me to be critical. Unlike science, business analytics and data science start with an agenda that isn’t just understanding and this fundamentally shapes the research program. Unlike science, business analytics and data science have no fixed commitment to causal explanations – just a commitment to working explanations. Finally, unlike science, business analytics and data science change the world they measure in a clearly significant fashion and do so intentionally with respect to the measurement.

Given that we have no fixed and entirely adequate definition of science, none of this is proof. I can’t demonstrate to you with the certainty of a logical proof that the definition of science requires X, data science is not X, so data science is not a science.

However, I think I have shown that at least by many of the core principles we associate with the scientific endeavor, that business analytics (which I take to be a proxy in this conversation for data science) is not well described as a science.

This isn’t a huge deal. I’ve done business analytics for many years and never once thought of myself as a scientist. What’s more, once we realize that being scientists doesn’t attach a powerful new methodology to business analytics – which was the rather more important point of my last post – it’s much less clear why anyone would think it makes a difference.

Agree?

 

A few other notes on the comments I received. With regards to Nikolaos’ question “why should we care?” I’m obviously largely in agreement. There is intellectual interest in these questions (at least for me), but I won’t pretend that they are likely to matter in actual practice or will determine ‘what works’. I’m also very much in agreement with Ake’s point about qualitative data. The truth is that nothing in the scientific endeavor precludes the use of qualitative data in addition to behavioral data. But even though there’s no determinate tie between the two, I certainly think that advocates for data science as a science are particularly likely to shun qualitative data (which is a shame). As far as Patrick’s comment goes, I think it dodges the essential question. He’s right to suggest that the term data science is contentless because data is not the subject of science, the data is always about something which is the subject of science. But I take the deeper claim to be what I have tackled here; namely, that business analytics is a scientific endeavor. That claim isn’t contentless, just wrong. I remain, still, deeply unconvinced of the utility of CRISP-DM.

 

Now is as good a time as any (how’s that for a powerful call to action?) to pre-order my book, ‘Measuring the Digital World’ on Amazon.