
The hitchhiker’s guide to email testing

By Justis Saayman, April 27, 2017

So you’ve mastered the A/B subject line test. You know exactly what character count gets you a 0.0001% uplift in opens. So, what now?

Every day I am grateful for the Internet and the ability to search for anything and get a clear answer. Well, almost clear. The point I am trying to make is that there is more to testing than subject lines or the colour of the calls to action in an email. Finding inspiration for your testing, on the other hand, can be a difficult task.

Email testing best practices

First things first: there are some best practices for email testing you must review if you want the good results to keep coming.

  1. Create a hypothesis: Saying ‘I will test subject line A versus B’ teaches us nothing. There has to be a reason for the test.
  2. Thou shalt not test for testing’s sake: A continuation of the above, but I felt it pertinent enough to get its own line.
  3. Make sure it’s significant: Not just significant, but statistically significant.
  4. Give it time: Compare apples to apples in terms of time frames. Give some tests more time and others less, but make sure there is a reason for the difference.
  5. Test one variable per KPI: More on this later, as it’s a big one.

With that done, we can address each of these points in a little more detail, with some examples.

1. Create a hypothesis for your email test

I used to have a boss who kept harping on about the hypothesis, and whilst he and I didn’t really see eye to eye, he had a point in drilling it into my brain, and I am thankful for it.

We drink water or wine because we are thirsty; we eat delicious bacon or barbecue because we are hungry; it’s the human condition! Why, then, test without a reason?

Having ‘because the boss asked’ as a reason is not acceptable. A common theme across all five rules is that everything gets done for a reason. So often I hear companies say, ‘We are testing… subject lines’, and when I ask what they have learnt, all I get in return is that sometimes a stock subject line wins over a personalised one, or vice versa. There is a reason for this: no thought was put behind the test itself.

So to keep things simple, let’s figure out a good hypothesis for a subject line test.

Hypothesis – If I have a promotion and use a percentage discount rather than a monetary value, I will get a higher open rate.

Many times I have seen magic numbers creep up (more on this in point 2) from just this basic test. For a brand selling a lot of discounted items, a monetary value can give customers a false view of the value, making them think the discount is larger than the 10-15% they have had in the last six emails. This might increase open rates dramatically; if it doesn’t, just keep testing with this concept in mind.

Same goes for content:

Hypothesis – Our email clicks are position-led, meaning links at the top will always get more clicks than those at the bottom.

This example forces us to use part of point 4: comparing apples to apples. Use the same content, at the same size, at the bottom (Version B) and at the top (Version A). If the total number of clicks on the content in Version B is far lower than in Version A, the hypothesis holds and you should position important links at the top. If it is proven false, however, that offers you a chance to get people to look at other content before they reach the good stuff at the bottom. Think about it.
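As a rough illustration, here is a minimal sketch of reading out that position test. The click counts and the 20% threshold are assumptions made up for the example; in practice the numbers would come from your ESP’s link-level click report, and the threshold is something you should agree before the test runs.

```python
# Minimal sketch of reading out the position test. The click counts are made up;
# in practice they come from your ESP's link-level click report.
clicks_block_at_top = 620      # Version A: the content block placed at the top
clicks_block_at_bottom = 240   # Version B: the same block, same size, at the bottom

drop = 1 - clicks_block_at_bottom / clicks_block_at_top
if drop > 0.2:  # illustrative threshold - decide on yours before the test runs
    print(f"Position-led: the block lost {drop:.0%} of its clicks at the bottom.")
else:
    print("Position barely matters - use the top slot for content you want seen first.")
```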

2. Thou shalt not test for testing’s sake!

This is a pet hate of mine: companies who test every week to make the most of their time and valuable audience, but for what? Testing personalised versus static subject lines every week on a 10%/10% split and then sending the email to the winner is a HORRID idea. Are you really seeing such a large difference, and if so, which way does it lean on average?

If your personalised subject lines are consistently getting the higher open rate, then just send personalised subject lines to everyone from the get-go. Chances are you are not living rule number 4 in any case.

Adding a button here or there or changing the branding is a waste of time. Hopefully number 1 stops this habit, but think about it. Test when you need to, test when you can, but make sure it’s for an outcome and something you can use in the future.

If you tested yellow buttons a couple of weeks ago and the results were statistically significant (and it’s on brand), then use yellow buttons and retest the theory in six months to see if the audience’s appetite has changed. Doing it again two weeks later is just a waste of your time.

3. Make sure the test is significant

Test, please, dear Lord, test. But if the test fails, don’t inflate how influential the results were.

A result is only significant if the test group’s results will make a real world difference to the numbers when applied to the entire database.

To better explain this, let’s use a little example.

My database is 100,000 people, and my average numbers are:

20% open rate (20,000 opens)

5% click rate (5,000 clicks)

0.2% conversion rate (200 conversions)

On a test group of 10% (10,000 people), I would need to move my numbers quite a bit for them to equate to big results when replicated across the rest of the database.

If I increase my conversion rate to 0.3% on the sample, that equates to 10 extra conversions, or 100 across the total database. On an example average order value of £36, that means an extra £3,600 per email, which could become (based on 3 emails per week and the wind behind our backs) in excess of £500,000 in additional revenue per year.

On the other side, increasing the conversion rate to 0.21% means 1 extra conversion per email on the sample, or 10 across the total database: roughly £360 per email and somewhere around £56,000 per year. Yeah, it’s something, but it is a tenth of the upside and might not be worth the extra effort, time, and resource.
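To make the arithmetic concrete, here is a minimal sketch of the projection above plus a rough two-proportion z-test, using the same illustrative numbers (100,000 contacts, a 10% test cell, £36 average order value, three sends a week). None of this is tied to any particular tool, and the control cell of equal size is an assumption for the sake of the example.

```python
import math

# Illustrative numbers from the example above - not real data.
database_size = 100_000
test_cell = 10_000            # the 10% test group
baseline_cvr = 0.002          # 0.2% conversion rate
observed_cvr = 0.003          # 0.3% conversion rate seen in the test cell
avg_order_value = 36          # pounds
sends_per_year = 3 * 52       # three emails a week

# Project the uplift across the whole database.
extra_conversions = (observed_cvr - baseline_cvr) * database_size
extra_per_send = extra_conversions * avg_order_value
extra_per_year = extra_per_send * sends_per_year
print(f"{extra_conversions:.0f} extra conversions and £{extra_per_send:,.0f} per send, "
      f"roughly £{extra_per_year:,.0f} per year")

# Rough two-proportion z-test: test cell vs an equally sized control cell.
conv_test = round(observed_cvr * test_cell)      # 30 conversions
conv_control = round(baseline_cvr * test_cell)   # 20 conversions
pooled = (conv_test + conv_control) / (2 * test_cell)
se = math.sqrt(pooled * (1 - pooled) * (2 / test_cell))
z = (conv_test - conv_control) / test_cell / se
print(f"z = {z:.2f} (roughly 1.96 or above is significant at the 95% level)")
```

With these illustrative numbers the z-score lands below 1.96, which is exactly the point of this rule: a lift that looks exciting on a 10% cell may not yet be statistically significant, so hold off on the victory lap until it is.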

4. Give it time

Just as we mentioned in point 2, compare apples to apples, but ask whether you should really be comparing oranges to oranges instead. This goes back to experiences in my marketing life where I have heard the words ‘this automated program just isn’t working’ far too many times – not after 6 months, but after a week.

Make sure the testing window fits the purpose. Whether you are testing for opens, clicks, conversion, or a change in behaviour, do not let things creep in that could skew the results. Running a welcome program to get people from their first to their second purchase sooner will give unclear and sometimes erroneous results if you dump a sale into the middle of the time frame.

I also say give it time because testing one thing from week to week is great, but not if you pull reports for last week’s email a week later and compare them to a test you only started a day ago. My rule of thumb is to check once the majority would have completed the action.

If you want to test opens, then we want to see when 50% or more of your openers have completed the action. In essence, if I email 100,000 people, get a 20% open rate (20,000 people), and 10,000 of those open in the first 24 hours, then use that 24-hour mark as the measuring stick for opens. The same goes for clicks.

Each week, set up your open tests and check the total opens 24 hours after the send. Boom, real results!
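For what it’s worth, here is a minimal sketch of finding that measuring stick from a list of open timestamps. The timestamps are made up and the export format is an assumption, not any particular ESP’s API.

```python
import math
from datetime import datetime, timedelta

# Minimal sketch: find how long it takes for 50% of eventual openers to open,
# and use that as the reporting cut-off for every open test going forward.
send_time = datetime(2017, 4, 27, 9, 0)

# One entry per unique opener, as hours after the send (made-up numbers).
hours_to_open = [0.5, 1, 2, 3, 4, 6, 8, 12, 18, 22, 30, 48, 72, 96, 168]
open_times = sorted(send_time + timedelta(hours=h) for h in hours_to_open)

# The open that takes us to at least 50% of all eventual openers.
halfway = open_times[math.ceil(len(open_times) / 2) - 1]
print(f"50% of openers had opened within {halfway - send_time} of the send.")
```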

For automations, make sure a significant number of contacts have gone through the program before you nullify the test or announce the great success. If you are looking to change behaviour – i.e. a re-purchase program that tries to sneak in one extra purchase per year – such a test will show results much later, and you just need to grin and bear it whilst it’s running.

5. Test one variable per KPI

Last but not least is the old multivariate warning. This is email…not web!

Email is a soft, delicate beast which you cannot poke or prod too much, otherwise it fouls the carpet. That means you can test many things at once, but make sure you are testing one thing at a time for each of opens, clicks, and conversions.

Subject lines and pre-headers are used to test opens. Call-to-action links, images and position are used to test clicks and conversions.

This means you can test for opens and clicks in the same email, but make sure you are still comparing apples to apples:

– Subject Lines A and B

– Content X and Z

You cannot run A and X against B and Z alone, as B might influence Z, so you need to double up on the testing. The test cells would look like this:

  1. A and X
  2. B and X
  3. A and Z
  4. B and Z

This gives you a view of whether B consistently did better than A or not. Sometimes it’s a mess; other times it shows you the way!
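Here is a minimal sketch of setting up those four cells and keeping one variable per KPI when reading the results. The recipient list and the round-robin split are illustrative assumptions, not tied to any particular ESP.

```python
from itertools import product
from collections import Counter

# Two variables, two values each: subject line (opens KPI) and content (clicks KPI).
subject_lines = ["A", "B"]      # e.g. static vs personalised subject line
content_blocks = ["X", "Z"]     # e.g. key content at the top vs the bottom

cells = list(product(subject_lines, content_blocks))  # (A,X), (A,Z), (B,X), (B,Z)

# Stand-in audience, split round-robin into four equal cells.
recipients = [f"user{i}@example.com" for i in range(20_000)]
assignment = {email: cells[i % len(cells)] for i, email in enumerate(recipients)}
print(Counter(assignment.values()))  # four cells of 5,000 each

# When reading the results, collapse back to one variable per KPI:
# compare opens for all A cells vs all B cells,
# and clicks for all X cells vs all Z cells.
```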

If you have some other amazing testing examples, theories or comments, please let me know – I’d love to chat about it! Tweet me @justissaayman.
