
Store Testing & Continuous Improvement

Continuous improvement is what drives the digital world. Whether applied as a specific methodology or simply present as a fundamental part of the background against which we do business, the discipline of change and measure is a fundamental part of the digital environment. A key part of our mission at Digital Mortar is simply this: to take that discipline of continuous improvement via change and measurement and bring it to stores.

Every part of DM1 – from store visualizations to segmentation to funnel analytics – is there to help measure and illuminate the in-store customer journey. You can’t build an effective strategy or process for continuous improvement without having that basic measurement environment. It provides the context that lets decision-makers talk intelligently about what’s working, what isn’t and what change might accomplish.

But as I pointed out in my last post, some analytic techniques are particularly useful for the role they play in shaping strategy and action. Funnel Analysis, I argued, is particularly good at focusing optimization efforts and making them easily measurable. Funnels help shape decisions about what to change. Equally important, they provide clear guidance about what to measure to judge the success of that change. After all, if you made a change to improve the funnel, you’re going to measure the impact of the change using that same funnel.

That’s a good thing.

One of the biggest mistakes in enterprise measurement (and – surprisingly – even in broader scientific contexts) is failing to commit to your measurement of success when you start an experiment. It turns out that you can nearly always find some measure that improved after an experiment. It just may not be the right measure. If folks are looking for a way to prove success, they’ll surely find it.
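To put a rough number on that, here is a minimal sketch of how quickly the odds of a spurious "win" grow when you go hunting across metrics after the fact. It assumes each metric is independent and has a 5% chance of looking like an improvement by pure chance; both are simplifying assumptions of mine, and `chance_of_false_win` is just an illustrative name.

```python
def chance_of_false_win(num_metrics: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `num_metrics` independent metrics
    appears to improve by chance alone, when each has an `alpha` chance
    of a false positive. Independence is an assumption, not a given."""
    return 1.0 - (1.0 - alpha) ** num_metrics


if __name__ == "__main__":
    for k in (1, 5, 20, 50):
        print(f"{k:>2} metrics checked: {chance_of_false_win(k):.0%} chance of a spurious win")
```

Under those assumptions, checking twenty metrics gives you roughly a two-in-three chance of finding something that "improved" even when the change did nothing. Committing to one metric up front keeps you at the single-metric rate.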

Since we expect our clients to use DM1 to drive store testing, we’ve tried to make it easy on both ends of the process. Tools like funnel analysis help analysts find and target areas for improvement. At the other end of the process, analysts need to be able to easily see whether changes actually generated improvement.

This isn’t just for experimentation. As an analyst, I find that one of the most common tasks I have to do is compare numbers. By store. By page. By time-period. By customer segment. Comparison provides basic measurement of change and context on that change.

Which makes comparison the core capability necessary for analyzing store tests but also applicable to many analytics exercises.

Though comparison is a fundamental part of the analytic process, it’s surprising how often it’s poorly supported in bespoke analytics tools. It took many years for tools like Adobe’s Workspace to evolve – providing comprehensive comparison capabilities. Until quite recently in digital analytics, you had to export reports to Excel if you wanted to lay key digital analytic data points from different reports side-by-side.

DM1’s Comparison tool is simple. It’s not a completely flexible canvas for analysis. It just takes any analytic view DM1 provides and allows you to use it in a side-by-side comparison. Simple. But it turns out to be quite powerful in practice.

Suppose you’re running a test in Store A with Store B as a control. DM1’s comparison view lets you lay those two Stores side-by-side during the testing period and see exactly what’s different. In this view, I’ve compared two similar stores by area looking at which areas drove the most shopper conversions:

Retail Analytics and Store Testing: Store Comparison in DM1

You can use ANY DM1 visualization in the Comparison. The funnel, the Store Viz or traditional reports and charts. In this view, I’ve compared the Shopper Funnel around a single merchandising category at two different stores. Not only can I see which store is more effective, I can see exactly where in the funnel the performance differences occur:

Retail Analytics and Store Testing: Time Comparison

Don’t have a control store? If you’re only measuring the customer journeys in a single store or if your store is a concept store, you won’t have another store to use as a control. No problem, DM1’s comparison view lets you compare the same store across two different time periods. You can compare season over season or consecutive time periods. You don’t even have to evenly match time periods. Here I’ve compared the October Funnel to Pre-Holiday November:

Retail Analytics and Funnels: Store Testing
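If you do have a control store, the store comparison and the time comparison can be combined into a single number. The sketch below is a simple difference-in-differences calculation: the change in the test store minus the change in the control store over the same period. This is my own illustration, not a description of how DM1 calculates anything, and every number and name in it is hypothetical.

```python
def conversion_rate(conversions: int, visits: int) -> float:
    """Share of shopper visits that ended in a conversion."""
    return conversions / visits


def difference_in_differences(test_before: float, test_after: float,
                              control_before: float, control_after: float) -> float:
    """Change in the test store minus the change in the control store."""
    return (test_after - test_before) - (control_after - control_before)


# Hypothetical figures, for illustration only.
store_a_before = conversion_rate(180, 2000)  # test store, before the change
store_a_after = conversion_rate(230, 2000)   # test store, during the test
store_b_before = conversion_rate(170, 2000)  # control store, same "before" period
store_b_after = conversion_rate(190, 2000)   # control store, same test period

lift = difference_in_differences(store_a_before, store_a_after,
                                 store_b_before, store_b_after)

if __name__ == "__main__":
    print(f"Estimated lift attributable to the change: {lift:+.3f}")
```

Subtracting the control store's change strips out whatever moved both stores at once (seasonality, weather, a national promotion), which is exactly the job a control store is there to do.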

Store and Date/Time are the most common types of comparison. But DM1’s comparison tool lets you compare on Segments and Metrics as well. I often want to understand how a single segment is different than other groups of visitors. By setting up a segmentation visualization, I can quickly page through a set of comparison segments while holding my target group constant. In the first screen, I’ve compared shoppers interested in Backpacks with shoppers focused on Team Gear in terms of how effective interactions with Associates are. With one click, I can do the same comparison between Women’s Jacket shoppers and Team Gear:

Funnel Analytics and Store Testing

Store Analytics Comparison: Store Testing Segments

The ability to do this kind of comparison in the context of the visualizations is unusual AND powerful. The Comparison tool isn’t the only part of DM1 that supports comparison and contextualization. The Dashboard capability is surprisingly flexible and allows the analyst to put all sorts of different views side by side. And, of course, standard reporting tools like Charts and Table provide significant ways to do comparisons. But particularly when you want to use bespoke visualizations like Funnels and DM1’s store visualizations, having the ability to lay them side by side and quickly adjust metrics and view parameters is extraordinarily useful.

If you want to create a process of continuous improvement in the store, having measurement is THE essential component. Measurement that can help you identify and drive potential store testing opportunities. And measurement that can help you understand the real-world impact of change in all its complexity.

DM1 does both.


Controlled Experimentation and Decision-Making

The key to effective digital transformation isn’t analytics, testing, customer journeys, or Voice of Customer. It’s how you blend these elements together in a fundamentally different kind of organization and process. In the DAA Webinar (link coming) I did this past week on Digital Transformation, I used this graphic to drive home that point:


I’ve already highlighted experience engineering and integrated analytics in this little series, and the truth is I wrote a post on constant customer research too. If you haven’t read it, don’t feel bad. Nobody has. I liked it so much I submitted it to the local PR machine to be published and it’s still grinding through that process. I was hoping to get that relatively quickly so I could push the link, but I’ve given up holding my breath. So while I wait for VoC to emerge into the light of day, let’s move on to controlled experimentation.

I’ll start with definitional stuff. By controlled experimentation I do mean testing, but I don’t just mean A/B testing or even MVT as we’ve come to think about it. I want it to be broader. Almost every analytics project is challenged by the complexity of the world. It’s hard to control for all the constantly changing external factors that drive or impact performance in our systems. What looks like a strong and interesting relationship in a statistical analysis is often no more than an artifact produced by external factors that aren’t being considered. Controlled experiments are the best tool there is for addressing those challenges.

In a controlled experiment, the goal is to create a test whereby the likelihood of external factors driving the results is minimized. In A/B testing, for example, random populations of site visitors are served alternative experiences and their subsequent performance is measured. Provided the selection of visitors into each variant of the test is random and there is sufficient volume, A/B tests make it very unlikely that external factors like campaign sourcing or day-time parting will impact the test results. How unlikely? Well, taking a random sample doesn’t guarantee randomness. You can flip a fair coin fifty times and get fifty heads so even a sample collected in a fully random manner may come out quite biased; it’s just not very likely. The more times you flip, the more likely your sample will be representative.
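Here is a small simulation of that point, as a sketch rather than anything rigorous. It randomly assigns visitors to variants A and B and looks at how unevenly an external factor (here, the share of visitors arriving from a campaign) can end up split between them. The 30% campaign share, the trial count and the function name are all assumptions I picked for illustration.

```python
import random

# Chance of fifty heads in fifty flips of a fair coin: possible, just absurdly rare.
P_FIFTY_HEADS = 0.5 ** 50


def worst_imbalance(n_visitors: int, campaign_share: float = 0.3,
                    trials: int = 50, seed: int = 7) -> float:
    """Largest gap seen, across repeated random assignments, between the
    campaign-visitor share in variant A and the share in variant B."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        visitors = {"A": 0, "B": 0}
        from_campaign = {"A": 0, "B": 0}
        for _ in range(n_visitors):
            variant = rng.choice("AB")          # random assignment to a variant
            visitors[variant] += 1
            if rng.random() < campaign_share:   # external factor, independent of variant
                from_campaign[variant] += 1
        share_a = from_campaign["A"] / max(1, visitors["A"])
        share_b = from_campaign["B"] / max(1, visitors["B"])
        worst = max(worst, abs(share_a - share_b))
    return worst


if __name__ == "__main__":
    print(f"Chance of fifty straight heads: {P_FIFTY_HEADS:.2e}")
    for n in (50, 500, 5000):
        print(f"{n:>5} visitors: worst campaign-share gap {worst_imbalance(n):.3f}")
```

With a few dozen visitors, random assignment can still leave one variant noticeably heavier on campaign traffic. With thousands, the gap shrinks to a rounding error, which is why volume matters as much as randomness.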

Controlled experiments aren’t just the domain of website testing though. They are a fundamental part of the scientific method and are used extensively in every kind of research. The goal of a controlled experiment is to remove all the variables in an analysis but one. That makes it really easy to analyze.

In the past, I’ve written extensively on the relationship between analytics and website testing (Kelly Wortham and I did a whole series on the topic). In that series, I focused on testing as we think of it in the digital world – A/B and MV tests and the tools that drive those tests. I don’t want to do that here, because the role for controlled experimentation in the digital enterprise is much broader than website testing. In an omni-channel world, many of the most important questions – and most important experiments – can’t be done using website testing. They require experiments which involve the use, absence or role of an entire channel or the media that drives it. You can’t build those kinds of experiments in your CMS or your testing tool.

I also appreciate that controlled experimentation doesn’t carry with it some of the mental baggage of testing. When we talk testing, people start to think about Optimizely vs. SiteSpect, A/B vs. MVT, landing page optimization and other similar issues. And when people think about A/B tests, they tend to think about things like button colors, image A vs. image B and changing the language in a call-to-action. When it comes to digital transformation, that’s all irrelevant.

It’s not that changing the button colors on your website isn’t a controlled experiment. It is; it’s just not a very important one. It’s also representative of the kind of random “throw stuff at a wall” approach to experimentation that makes so many testing programs nearly useless.

One of the great benefits of controlled experimentation is that, done properly, the idea of learning something useful is baked into the process. When you change the button color on your Website, you’re essentially framing a research question like this:

Hypothesis: Changing the color of Button X on Page Y from Red to Yellow will result in more clicks of the button per page view

An A/B test will indeed answer that question. However, it won’t necessarily answer ANY other question of higher generality. Will changing the color of any other button on any other page result in more clicks? That’s not part of the test.
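For the record, here is roughly what "answering that question" looks like in numbers. This is a minimal sketch of a two-proportion z-test on clicks per page view, using only the standard library. The click and view counts are invented, and the two-sided test is my choice; a testing tool may well use a different method.

```python
import math


def two_proportion_z_test(clicks_a: int, views_a: int,
                          clicks_b: int, views_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in click rate
    between variant B and variant A, using a pooled standard error."""
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical results: red button (A) vs. yellow button (B).
z, p = two_proportion_z_test(clicks_a=400, views_a=10_000,
                             clicks_b=460, views_b=10_000)

if __name__ == "__main__":
    print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value tells you yellow probably beat red for Button X on Page Y. It tells you nothing about any other button, page or color, which is the limitation discussed here.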

Even with something as inane as button colors, thinking in terms of a controlled experiment can help. A designer might generalize this hypothesis to something that’s a little more interesting. For example, the hypothesis might be:

Hypothesis: Given our standard color palette, changing a call-to-action on the page to a higher contrast color will result in more clicks per view on the call-to-action

That’s a somewhat more interesting hypothesis and it can be tested with a range of colors with different contrasts. Some of those colors might produce garish or largely unreadable results. Some combinations might work well for click-rates but create negative brand impressions. That, too, can be tested and might perhaps yield a standardized design heuristic for the right level of contrast between the call-to-action and the rest of a page given a particular color palette.

The point is, by casting the test as a controlled experiment we are pushed to generalize the test in terms of some single variable (such as contrast and its impact on behavior). This makes the test a learning experience; something that can be applied to a whole set of cases.

This example could be read as an argument for generalizing isolated tests into generalized controlled experiments. That might be beneficial, but it’s not really ideal. Instead, every decision-maker in the organization should be thinking about controlled experimentation. They should be thinking about it as a way to answer questions analytics can’t AND as a way to assess whether the analytics they have are valid. Controlled experimentation, like analytics, is a tool to be used by the organization when it wants to answer questions. Both are most effective when used in a top-down not a bottom-up fashion.

As the sentence above makes clear, controlled experimentation is something you do, but it’s also a way you can think about analytics – a way to evaluate the data decision-makers already have. I’ve complained endlessly, for example, about how misleading online surveys can be when it comes to things like measuring sitewide NPS. My objection isn’t to the NPS metric, it’s to the lack of control in the sample. Every time you shift your marketing or site functionality, you shift the distribution of visitors to your website. That, in turn, will likely shift your average NPS score – irrespective of any other change or difference. You haven’t gotten better or worse. Your customers don’t like you less or more. You’ve simply sampled a somewhat different population of visitors.

That’s a perfect example of a metric/report which isn’t very controlled. Something outside what you are trying to measure (your customers’ satisfaction or willingness to recommend you) is driving the observed changes.
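A toy calculation makes the NPS problem concrete. In this sketch, two visitor segments each keep exactly the same NPS from one period to the next; the only thing that changes is the mix of visitors. The segment names, scores and mixes are all made up, and it assumes survey respondents arrive in the same proportions as visitors.

```python
def blended_nps(segment_nps: dict[str, float], visitor_mix: dict[str, float]) -> float:
    """Sitewide NPS as the traffic-weighted average of per-segment NPS.
    `visitor_mix` holds each segment's share of respondents (summing to 1)."""
    return sum(segment_nps[segment] * visitor_mix[segment] for segment in segment_nps)


# Hypothetical per-segment NPS, held constant across both periods.
segment_nps = {"existing_customers": 40, "prospects": 10}

# A new campaign shifts the mix toward prospects; nobody's opinion changes.
mix_before = {"existing_customers": 0.6, "prospects": 0.4}
mix_after = {"existing_customers": 0.4, "prospects": 0.6}

nps_before = blended_nps(segment_nps, mix_before)
nps_after = blended_nps(segment_nps, mix_after)

if __name__ == "__main__":
    print(f"Sitewide NPS before: {nps_before:.0f}, after: {nps_after:.0f}")
```

Sitewide NPS drops six points here even though no segment likes you any less. Reporting NPS by segment, or re-weighting to a fixed mix, is one way to put some control back into the metric.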

When decision-makers begin to think in terms of controlled experiments, they have a much better chance of spotting the potential flaws in the analysis and reporting they have, and making more risk-informed decisions. No experiment can ever be perfectly controlled. No analysis can guarantee that outside factors aren’t driving the results. But when decision-makers think about what it would take to create a good experiment, they are much more likely to interpret analysis and reporting correctly.

I’ve framed this in terms of decision-makers, but it’s good advice for analysts too. Many an analyst has missed the mark by failing to control for obvious external drivers in their findings. A huge part of learning to “think like an analyst” is learning to evaluate every analysis in terms of how to best approximate a controlled experiment.

So if controlled experimentation is the best way to make decisions, why not just test everything? Why not, indeed? Controlled experimentation is tremendously underutilized in the enterprise. But having said as much, not every problem is amenable to or worth experimenting on. Sometimes, building a controlled experiment is very expensive compared to an analysis; sometimes it’s not. With an A/B testing tool, it’s often easier to deploy a simple test than try to conduct an analysis of a customer preference. But if you have a hypothesis that involves re-designing the entire website, building all that creative to run a true controlled experiment isn’t going to be cheap, fast or easy.

Media mix analysis is another example of how analysis/experimentation trade-offs come into play. If you do a lot of local advertising, then controlled experimentation is far more effective than mix modeling to determine the impact of media and to tune for the optimum channel blend. But if much of your media buy is national, then it’s pretty much impossible to create a fully controlled experiment that will allow you to test mix hypotheses. So for some kinds of marketing organizations, controlled experimentation is the best approach to mix decisions; for others, mix modeling (analysis in other words – though often supplemented by targeted experimentation) is the best approach.

This may all seem pretty theoretical, so I’ll boil it down to some specific recommendations for the enterprise:

  • Repurpose your A/B testing group as a controlled experimentation capability
  • Blend non-digital analytics resources into that group to make sure you aren’t thinking too narrowly – don’t just have a bunch of people who think in terms of A/B testing tools
  • Integrate controlled experimentation with analytics – they are two sides of the same coin and you need a single group that can decide which is appropriate for a given problem
  • Train your executives and decision-makers in experimentation and interpreting analysis – probably with a dedicated C-Suite resource
  • Create constant feedback loops in the organization so that decision-makers can request new survey questions, new analysis and new experiments at the same time and with the same group

I see lots of organizations that think they are doing a great job testing. Mostly they aren’t even close. You’re doing a great job testing when every decision-maker at every level in the organization is thinking about whether a controlled experiment is possible when they have to make a significant decision. When those same decision-makers know how to interpret the data they have in terms of its ability to approximate a controlled experiment. And when building controlled experiments is deeply integrated into the analytics research team and deployed across digital and omni-channel problems.