Show Notes
In this episode of Startups For The Rest Of Us, Rob and Mike talk about the common mistakes people make when A/B testing.
Items mentioned in this episode:
Transcript
Mike [00:00]: In this episode of “Startups for the Rest of Us,” Rob and I are going to be talking about nine common A-B testing mistakes people make. This is “Startups for the Rest of Us” Episode 258.
[00:07]: [music plays]
Mike [00:15]: Welcome to “Startups for the Rest of Us,” the podcast that helps developers, designers and entrepreneurs be awesome at launching software products, whether you’ve built your first product or you’re just thinking about it. I’m Mike.
Rob [00:24]: And I’m Rob.
Mike [00:25]: And we’re here to share experiences to help you avoid the same mistakes we’ve made. What’s going on this week Rob?
Rob [00:29]: You know, at this point, it’s almost October 1st, and that gives us just about six weeks to get our stuff done before Thanksgiving happens here in the States. And frankly, the world goes into kind of holiday mode, it seems. So I’ve really been thinking about what the next six or seven weeks hold. I know by the time this goes live, it’ll probably be mid-October, so you only have a month, but think about that with whatever you’re doing. If you have another two to three weeks left to develop a big feature that you wanted to launch, you’re probably not going to launch it until January at this point. Maybe you could launch it the first week of December, but it’s just so choppy from here on out. And I know there’s still three months left in the year, but it just starts getting touch-and-go over the next few months as the retail season kicks in – it’s great for eCommerce, but not typically that good for B-to-B apps in terms of growth and in terms of finding new customers. Because our heads are full of plans for hanging out with family and doing other things, and a lot of folks aren’t necessarily thinking about what new app they can sign up for and have a trial running over their Thanksgiving or Christmas break. How about you, what’s going on?
Mike [01:35]: Well, I went on a personal retreat last weekend, and I found it really helpful. I came out of it kind of settled and at peace with myself on a number of different fronts, and more or less raring to go after a couple of key things that came out of the retreat. So I’m kind of looking forward to the next six to twelve weeks to see what happens.
Rob [01:53]: When you say personal retreat, was it, you thought about work stuff or personal as well?
Mike [01:58]: Personal and work. I kind of dedicated some time to think about things that were going on in my personal life, and then also the business – where I want to go and what I want to do next. So I definitely sat down and analyzed a lot of things that have gone on over the past few years, again both on the personal front and on the business front, and that just helps me make decisions on where to go next.
Rob [02:19]: And where are you going next?
Mike [02:21]: Well, it’s going to be validation for a couple of different ideas. And I basically went through a process that I had laid out in my book last year, and started ranking some of the different ideas and trying to figure out which one I should start validating first. So I kind of got it narrowed down a little bit. And right now, I’m going through the validation process with a bunch of people. I’ve had half a dozen conversations, and I’ve got at least half a dozen more scheduled over the next several days. And basically, like every day one of my tasks is to go through and start lining up more people to talk to for those conversations. So they’re going well so far. And I’ll just kind of see where it takes us. Until I get further through the process, I’ll probably just keep it quiet, I guess a little bit quiet for the time being. But I will talk about it probably a lot more on the podcast as things progress.
Rob [03:03]: Very nice. We have a slew of new iTunes reviews, and I won’t go through them all, but we had one from a couple of months ago that says, “I listen while I eat sandwiches. Really enjoy listening to the show, general candor and advice. Your knowledge is being used directly to introduce [Deli?] Empire, which includes a couple of different brands,” he says, “called Sandwich [Knob?] and Seafood [Dial?] to the world. Looking forward to catching up on the newer episodes.” Another review says, “The last couple of months has been like watching Netflix.” And EA760 says, “Binged on all Rob and Mike’s past episodes and now have to wait a whole week for new episodes. Each episode is full of so much great info for the self-funded entrepreneur. It’s the only podcast I listen to at 1x speed, because I’m constantly rewinding to take down notes. Thank you for the consistently invaluable advice and insight.” So thanks so much for your 5-star reviews. Even if you don’t want to write a full commentary like that, if you could log into iTunes, Downcast or Stitcher and plunk us a 5-star if you get any value out of the show, we would really appreciate it.
Mike [04:03]: One other thing to note: about a month ago, I did an interview with Matthew Paulson about “Email Marketing Demystified.” His book just came out, so it’s listed on Amazon right now, and we’ll link it up in the show notes. I just wanted to mention that because the episode did come out a little while ago. So before we dive into today’s episode, there is a listener question that we want to address. This one comes in from Christian, and he says, “Blockers like Ghostery, Privacy Badger, etc. are blocking third-party widgets like Qualaroo, Mixpanel and others. As content blockers grow in popularity, do you think this will kill JavaScript widget products?”
Rob [04:35]: So I think the answer is no, though I do think it can have an impact on them. I also know that some of these third-party blockers are getting a little – I’ll say ambitious – and they’re blocking things that aren’t exactly ad widgets. At one point, the Drip JavaScript got blocked by an ad blocker, and when I reported it, they said, “Oh, you’re not an advertising platform. That’s okay,” and they unblocked it. So I don’t have enough experience with Ghostery or Privacy Badger to know if they’re doing this intentionally or accidentally, because technically most of these guys don’t want to block non-ads. They really just want to block ad networks and that kind of tracking and cookie stuff. I think some of them are getting, like I said, a little ambitious. And if they start to dive in and block things like Qualaroo or Mixpanel and other things, they’re going to start breaking the web at some point, because people are relying on these services, often for mission-critical functionality. So in my estimation, number one, most people aren’t going to install these blockers. It’s a lot of people in our circles, but if you look at the total number of people on the internet versus the total number of people with ad blockers, it’s a tiny, tiny percentage. And secondarily, it’s kind of like browsing the internet with cookies and JavaScript blocked – it breaks a lot of things. That’s where these guys are going to have to find their line, because if they push it too far and it actually starts breaking things the site needs for its fundamental functions, then they’re going to have to back off. And I think there will be backlash if they push it too far. So to answer the original question, I don’t think these are going to kill JS widget products. They may put a small damper on them in the short term as things adjust, but long-term, I feel like these things are going to continue.
Mike [06:17]: Yeah, I’m in agreement with you. I don’t think they’re going to take over the world and get rid of every JavaScript widget product that’s out there. The fact of the matter is there’s a bunch of these products out there, and these types of products have been around easily for at least 10 years. Content blockers are really just a variation on ad blockers, and ad blockers have been out for at least 10 years, and I still don’t use one. I can’t name anybody off the top of my head who does. I know that people do use them, and I know that historically I’ve seen them advertised around and seen different places where they are present. But the bulk of the internet is not using them, and I just don’t think they’re going to become popular enough to make that significant of a dent in the businesses that they’re, I guess, going up against.
Rob [07:02]: So what are we talking about today for the main topic?
Mike [07:04]: Well, today’s topic is nine common A-B testing mistakes that people make. This topic doesn’t come from any specific source, but I did use a bunch of different resources from KISSmetrics, QuickSprout and ConversionXL to essentially put together this list. So some of the things are repeated across their lists – if you were to total up all of theirs, there are probably 25 or so – but it seemed like there was a bunch of low-hanging fruit that overlapped heavily between them, so I essentially concentrated on those for the outline of this episode. The main gist is that a lot of people will advocate that you do A-B testing, but when you start getting into A-B testing, there are a ton of things that people will simply accept as true or fact, and unfortunately those things have a lot of subtle nuances to them. I think the main idea will become a little bit clearer as we start going through these different mistakes that people are making.
Rob [07:54]: Let’s dive in.
Mike [07:55]: So the first one is testing random stuff. Everyone’s probably heard about the 41 shades of blue test at Google, where they were testing 41 different shades of blue to figure out which one would convert better on one of Google’s buttons. I would classify that as essentially random stuff, because I look at something like that and think, why would you even test that? What would possess Google to do that? And the fact is that they’ve got that kind of time on their hands to be able to do something like that, and they don’t know what else to do. My question would be, is that really the best place in your sales funnel to be testing? In their case, it may very well have been, because they’ve only got a one-page website, and they want people to click through on that button. So for them, it kind of makes sense. But if you’re going to start A-B testing, really figure out where you can make the biggest impact on your sales funnel. Don’t just randomly test stuff on your website, changing buttons from one color to another. Really look at your sales funnel and try to analyze where you could make the biggest impact. So for example, say you have a 10-stage sales funnel, you get 10 sales per month, and at the last stage of your sales funnel you have 60 people. Well, if you can increase the conversion rate of that last stage – from the last stage to paying customer – by just ten percentage points, then you’re going to convert an extra six customers. That just raises your revenue by 60%. That ten-point conversion rate increase raises your revenue by 60%. So the key piece is definitely figuring out where you can increase revenue; knowing where to make those tweaks is really, really important.
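To make that funnel arithmetic concrete, here is a minimal sketch using the same hypothetical numbers Mike describes (60 people at the last stage, 10 paying customers, a ten-percentage-point improvement):

```python
# Hypothetical funnel numbers from the example above.
visitors_at_last_stage = 60
current_customers = 10

current_rate = current_customers / visitors_at_last_stage   # ~16.7%
improved_rate = current_rate + 0.10                          # ten percentage points better

improved_customers = visitors_at_last_stage * improved_rate  # 16 customers
extra_customers = improved_customers - current_customers     # 6 extra customers

revenue_lift = extra_customers / current_customers
print(f"Extra customers: {extra_customers:.0f}, revenue lift: {revenue_lift:.0%}")  # 6, 60%
```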
Rob [09:25]: Yeah, and to answer your earlier pondering about Google: I think the reason they can test all those shades of blue is because they have such an enormous volume that they can get definitive results and repeat them. Whereas if you have 10,000 uniques a month, or 20-30,000 uniques a month, you can’t test that many things, and it does become a waste of time. So the point you’re making here is don’t test things that aren’t going to have a really big impact on your numbers – on the numbers that matter, too, right? It’s what’s going to have the biggest impact on revenue or on trial count. Focus on those first, because that is your low-hanging fruit, and you’re not off wandering, trying to test button colors, which at some point may be worthwhile, but with our meager 30-40,000 uniques a month, it might take you quite a long time to get any type of result from that.
Mike [10:13]: Yeah, and all of that is about just having some sort of methodology that you’re following and knowing where you can get the most reward for the things that you’re doing – identifying whether you’re at a local maximum and need to do broad testing, or whether you really need to narrow your focus down to those little things, like, as you mentioned, the button colors. Maybe that is the best place for your time. But again, just testing random things is not a good strategy. You have to have a strategy going into this.
Rob [10:40]: And the second common A-B testing mistake is assuming that tests other people have run are going to turn out the same for your business. That’s basically looking at a website, reading a blog post about someone who ran a test where the red button outperformed the green button, and then just changing all your buttons to red without testing. The point here is that you shouldn’t take the stuff you read on the internet at face value. Not that it’s fake, but it just may not work for your audience. Good examples of this are: do you ask for a credit card up front, or do you allow a trial without a credit card? That’s one thing you might read about, and you hear the rules of thumb – we talk about them here on this show for sure, and there are other discussions. You can take that as a default and then test it. Same thing with button colors. There’s all this debate about which button colors work well, and orange and yellow are often cited as the best. So that’s the kind of design rule of thumb I would start with as my control, and then I would test from there. Even long form versus short form landing pages or home pages – same thing. You’ll hear certain copywriters swear by long form, and then you’ll hear people run tests and have there be no difference, or even have the long form perform worse. We actually had this with the Micropreneur.com landing page, where we had a short form, a long form and basically a medium form, and the middle one outperformed the other two in repeated split tests. That’s never something I’d heard in particular – that a kind of mid-form page would work – but once we tested it, we realized that was the best result for us.
Mike [12:10]: The third common mistake people make is not running tests long enough. Your tests need to run long enough for the results to be statistically significant. I’m not going to go into the math behind that, but essentially you need to be reasonably confident that enough people have gone through the test that you can look at it and say, yes, this is statistically significant. A rule of thumb, if you’re not sure what that really means, is that you should have at least 100 conversions coming through on the test that you’re running – and that’s not 100 visitors, it’s 100 conversions actually going through on each variation. You also want to run those tests for at least one to two weeks. Don’t run a test just over the weekend or for a partial week. Running it longer is generally better – obviously there are exceptions to that, but you want to run it for at least one week, if not at least two weeks, so that you’re not dealing with any sort of variation where you get an influx of traffic from a particular website that then drops off very quickly, because those types of events can radically skew the numbers and the percentages and how those numbers map out inside of your A-B test.
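For anyone who wants to sanity-check “statistically significant” by hand rather than take a tool’s word for it, here is a rough sketch of a two-proportion z-test, one common way to compare two conversion rates. The visitor and conversion counts below are made up for illustration; dedicated tools (and the ConversionXL article Peep links in the comments below) handle sample-size planning and stopping rules more carefully than this.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))  # two-sided
    return z, p_value

# Hypothetical numbers: 120 of 2,400 visitors converted on A, 150 of 2,350 on B.
z, p = two_proportion_z_test(120, 2400, 150, 2350)
print(f"z = {z:.2f}, p = {p:.3f}")  # p below 0.05 is the usual bar for "significant"
```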
Rob [13:13]: Yeah, the formulas for A-B testing and getting to statistical significance are a little bit complicated. There is a website out there – I forget who does this – where he basically split-tested two identical pages against each other, and one will outperform the other by 10 or 20% with apparent statistical significance. And that’s just kind of a downer to think about: split tests are not fool-proof, basically, right? You’re not going to get the exact same response from the same page if you split the audience. It is fallible. That’s how I think about split testing – it’s a good guide, it’s the best we can do, but there’s always room for error. Not in terms of the math itself being wrong, but just that the way the audience is split is not going to be identical, and there’s room for it not to be statistically significant even if it looks like it should be. It’s kind of easy to trip yourself up on that, I think. The fourth common A-B testing mistake is not killing insignificant tests. What you’ll notice is that most of the tests you run are not going to show statistically significant differences. And if you have a bunch of tests running – or even one test that’s running for a long time – and there’s not a significant difference, then it’s basically a distraction, right? It could negatively influence other tests. It could also be a waste of your time, because you only have so many visitors and so much time to run these tests that you really want bigger wins. You don’t want marginal ones that are going to take forever to identify themselves but still not be significant. The nice part about getting something drastically different in terms of results is that those tests tend to run very quickly, and those are the kinds of wins you want to go after. So if I start a test and things are close, I tend to shut it down pretty early and try something more dramatic to get a more dramatic result.
Mike [15:02]: The flip side of that is that you don’t want to ignore small gains if they are significant. If you’re able to get a small, statistically significant gain over the course of two weeks or four weeks or something like that, there’s no reason not to implement it. If you could get a 1% day-over-day gain, then over the course of a year that’s roughly a 37x gain. So there are clearly benefits to doing that, but only if it actually is statistically significant. The fifth mistake is not considering the needs of your prospects. When you’re looking to determine which tests you’re going to run, think about how the test is going to impact your prospects. When they’re looking at a page that you’re changing, what information are you taking away or adding to the site, and specifically why are you doing that? How is it going to help them or benefit them? Is it going to move them closer to a decision faster, or is it going to do so at the expense of making them feel tricked down the road and experiencing some sort of buyer’s remorse? That’s something you have to be a little bit careful of, and you can recognize it if you get higher conversions but also higher churn down the road. Unfortunately you’re not going to see that churn until they get past the point of paying you, and they ask for refunds or cancel your service. So you will see those effects a little further down the road, and you may have to back-pedal on some of your tests – especially if a test showed a significant increase and you implemented it as a permanent change, and then down the road you see that churn. You’re going to have to back that out. But those are the types of things you need to watch for – be mindful of what the person looking at the website is actually seeing.
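The “1% a day is roughly 37x over a year” figure at the top of Mike’s answer is just compound growth; a two-line check (the 1% daily gain is an illustration, not a typical result):

```python
# Compounding a 1% daily gain over a 365-day year.
daily_gain = 0.01
print(f"{(1 + daily_gain) ** 365:.1f}x")  # roughly 37.8x
```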
Rob [16:33]: Our sixth mistake is trying to move too fast. What we mean by that is that testing too many things at the same time on the same page is really hard. That’s multivariate testing; you need a lot of traffic to do it, and you need even more of a plan because of the complexity you run into. Same thing with running too many simultaneous tests, right? If you’re testing headlines on the home page, and then something on the tour page, and then something else on the pricing page, these things can interact with each other. It’s easy to make a mistake and misattribute something, or – like you said with the fifth mistake – it’s easy to make a change and not realize there’s a downside to it that doesn’t show up for two or three months. So if you’re just starting out, or you’re really not an expert in split testing, this is something I always tell people: don’t jump into multivariate testing. Run one test at a time until you feel comfortable with it and feel like you’re able to make progress and interpret the results, before you try to do any type of multivariate stuff.
Mike [17:30]: The seventh mistake is not using any widely accepted A-B testing tools. There are a lot of A-B testing frameworks and libraries out there, and some people will decide that they’re going to build their own. I would question whether that’s a good use of your time, because there are so many good tools out there. In terms of the A-B testing frameworks and libraries themselves, I would be somewhat cautious about using an open-source library or something that was freely available, just in case the math behind it doesn’t work. I did run into a case where I started using a library and then realized after the fact that some of my results didn’t seem to quite match up with what I would have expected, and I started digging. I found out that the way they had implemented the A-B testing framework itself was actually wrong. It wasn’t a real A-B test – I forget the specifics of it, but it literally did not work right. So you do have to be careful when you start going down that road. The common wisdom is to do an A-A test, where you’re testing the same thing against itself, to be sure that the tool you’re using is valid. But there are a lot of different tools you can use – Google Analytics, Optimizely, Visual Website Optimizer, Unbounce – and all of these make your life easier when you’re trying to run A-B tests and make sure you’re getting statistically significant results.
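One way to approach the A-A sanity check Mike describes, if you want to verify your own tooling or analysis: run many simulated tests where both “variations” are identical and confirm that apparently significant winners only show up about as often as chance allows. A rough sketch, where the conversion rate, sample sizes and significance check are all hypothetical and purely for illustration:

```python
import math
import random

def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test (the same kind of check as the earlier sketch)."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(1)
trials, visitors, true_rate = 1000, 5000, 0.05
false_positives = 0
for _ in range(trials):
    # Both "variations" are identical: same true 5% conversion rate.
    a = sum(random.random() < true_rate for _ in range(visitors))
    b = sum(random.random() < true_rate for _ in range(visitors))
    if p_value(a, visitors, b, visitors) < 0.05:
        false_positives += 1

print(f"False positive rate: {false_positives / trials:.1%}")  # should hover around 5%
```

If a tool or library declares “significant” winners in A-A tests much more often than that, it is worth digging into how it does its math before trusting it with real tests.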
Rob [18:48]: Our eighth A-B testing mistake is not having a specific hypothesis – in other words, running a test just to “see what happens.” Run it with a desired result in mind. Typically that’s to increase the number of people who click through to the next page, to increase the number of trials you get out of it, or to increase engagement with a particular page. The problem with running a test just to see what happens is that it’s hard to learn anything unless you actually have a goal in mind. You might see the result, but you aren’t able to match it against your beliefs about what is actually going on. And frankly, if you’re just testing to see what happens, your time is probably better spent elsewhere – rewriting copy on another page or doing some other type of marketing activity.
Mike [19:30]: Yeah, I think the basic idea behind this is to really challenge your own beliefs – to make sure that, one, you’re on the right track, and, two, that if you’re not on the right track, you can be corrected. As Rob said, if you’re just running tests to see what happens, the reason you’re not learning anything is because there’s no opportunity for you to be wrong. And that’s really what you’re trying to do: find those places where you believe something to be true and you can either verifiably prove it or verifiably disprove it. If you just do it to see what happens, there’s no opportunity for that. And the last A-B testing mistake that people make is to ignore the potential for skewed or broken tests. Don’t ever assume that all your data is correct. There are plenty of opportunities for bad tooling, or for things to be blocked somewhere in the communication chain. There can be seasonal shifts – for example, if you run a test in the middle of December, there’s a very high likelihood that the test is going to be dramatically impacted by the amount of online shopping going on. There are definitely seasonal things that happen throughout the year; over the summer, for example, people are searching for different things than they are in the winter. You can also run into issues where, if you’re sending out emails to your email list, those subscribers are going to have a bit of familiarity with who you are and what your website looks like, so those people are going to have different conversion rates than a new visitor to your website. And then of course you have to deal with the fact that sometimes parts of your website are just going to be broken in a particular web browser, and you may or may not know that. Those are the types of things you need to be at least mindful of. Realize that A-B testing does generally work mathematically, but there are always things you have to be careful of, because it’s not a foolproof mechanism. It’s a tool, and like any other tool, it can be broken in certain ways. So you have to be careful that you don’t just accept everything as fact, and dig a little bit to make sure that nothing’s going wrong. So to do a quick recap, the nine common A-B testing mistakes are: 1) testing randomly instead of having a plan; 2) assuming that other people’s tests are going to be valid for your business; 3) not running tests long enough; 4) not killing insignificant tests quickly enough; 5) not considering the needs of your prospects; 6) trying to move too fast; 7) not using widely accepted A-B testing tools; 8) not having a specific hypothesis in mind when doing A-B tests; and 9) ignoring the potential for skewed or broken tests.
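One cheap defense against the skew Mike mentions in that last point – email subscribers, seasonal traffic and new visitors converting at different rates – is to break test results out by segment before trusting the blended number. A toy sketch with made-up per-visitor records (in practice you would pull these rows from whatever analytics tool you use):

```python
from collections import defaultdict

# Made-up per-visitor records: (segment, converted?)
visits = [
    ("email_list", True), ("email_list", False), ("email_list", True),
    ("organic", False), ("organic", False), ("organic", True),
    ("organic", False), ("paid", False), ("paid", True),
]

totals, conversions = defaultdict(int), defaultdict(int)
for segment, converted in visits:
    totals[segment] += 1
    conversions[segment] += converted

for segment in sorted(totals):
    print(f"{segment}: {conversions[segment]}/{totals[segment]} "
          f"({conversions[segment] / totals[segment]:.0%})")
```

If one segment converts very differently from the others, a shift in your traffic mix during the test can masquerade as a winning or losing variation.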
Rob [22:02]: If you have a question for us, call our voicemail number at: (888) 801-9690, or email us at: questions@startupsfortherestofus.com. Our theme music is an excerpt from “We’re Outta Control” by MoOT used under Creative Commons. Subscribe to us on iTunes by searching for “startups.” And visit: startupsfortherestofus.com for a full transcript of each episode. Thanks for listening, we’ll see you next time.
ruben
I recently subscribed to the Startups for the Rest of Us podcast and I must say, there is a world of knowledge that I’ve gained. I’m curious to know if these same principles apply to mobile web experiences.
cheers!
Peep Laja
Guys, you can’t say stuff like “you should have at least 100 conversions”. That’s wrong. This is science, not magic. Forget magic numbers.
Here’s when to stop a test:
http://conversionxl.com/stopping-ab-tests-how-many-conversions-do-i-need/