Meaningful Testing
I love to test. I enjoy the challenge of developing new ideas that aim to lift response, increase the average gift, and add more donors to the constituent file. In fact, I wish I could do it more often. But the truth is sometimes it can be difficult to set up a test that will result in meaningful insights.
What makes a meaningful test?
In a perfect world, a meaningful test is one that generates a statistically significant result. Statistical significance is a calculation that tells us whether the outcome of a test (e.g., a better response rate or a larger average gift) is unlikely to have been caused by chance and is therefore likely to be repeatable.
To help ensure that your next test will generate a significant result, you need to be mindful of two factors: mail volume and expected outcome.
Mail Volume Matters
When developing a test, you must first create a hypothesis about what your outcome may be. If you are conducting an A/B split in an acquisition mailing of 50,000 pieces (25,000 pieces per cell), you might hypothesize that the test will lift response by 10%.
If we assume that the control cell will generate a 1% response rate, then this would result in a total of 250 gifts. If your test produces the desired 10% lift, it would generate 275 gifts. Unfortunately, in this instance, the difference of only 25 gifts between the control and the test is not large enough to generate a statistically significant result.
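To see why a 25-gift difference falls short, here is a minimal sketch of the kind of calculation a significance calculator might run. It assumes a two-proportion z-test with a two-sided 95% threshold, which is a common choice but not one the scenario above specifies. The function name and the 25,000-pieces-per-cell split are illustrative assumptions.

```python
import math

def two_proportion_z_test(gifts_a, pieces_a, gifts_b, pieces_b):
    """Return (z, two-sided p-value) for the gap between two response rates.

    Assumption: a pooled two-proportion z-test (normal approximation).
    """
    rate_a = gifts_a / pieces_a
    rate_b = gifts_b / pieces_b
    # Pooled response rate across both cells, used for the standard error.
    pooled = (gifts_a + gifts_b) / (pieces_a + pieces_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / pieces_a + 1 / pieces_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Control: 250 gifts from 25,000 pieces. Test: 275 gifts from 25,000 pieces.
z, p = two_proportion_z_test(250, 25_000, 275, 25_000)
print(f"z = {z:.2f}, p = {p:.3f}")
print("significant at 95%" if p < 0.05 else "not significant at 95%")
```

Under these assumptions the z-score comes out at roughly 1.1, well short of the 1.96 usually required for 95% confidence, so the result would be reported as not significant.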
However, if you replay this same scenario but assign 100,000 pieces to each acquisition test cell, a 10% lift for the test translates into 100 more total gifts than the control. Statistics tell us that this 100-gift difference is large enough to generate a significant result. We can now have confidence in this outcome being repeated in the future.
The key difference between these two examples is the mail volume. With larger sample sizes, the relative difference between the control and test outcomes does not need to be as great to show with statistical confidence that the result was not a coincidence.
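The effect of volume can be sketched by holding the 1% control rate and the 10% lift fixed and changing only the number of pieces per cell. This is an illustration under assumed conditions (a pooled two-proportion z-test, equal cell sizes, and a 1.96 cutoff for 95% confidence); the function name is hypothetical.

```python
import math

def z_score(control_rate, lift, pieces_per_cell):
    """z-score for a test cell that exactly hits the expected lift.

    Assumption: equal-sized cells and a pooled two-proportion z-test.
    """
    test_rate = control_rate * (1 + lift)
    pooled = (control_rate + test_rate) / 2  # valid because cells are equal
    std_err = math.sqrt(pooled * (1 - pooled) * 2 / pieces_per_cell)
    return (test_rate - control_rate) / std_err

# Same 1% control rate and 10% lift; only the mail volume changes.
for pieces in (25_000, 50_000, 100_000):
    z = z_score(0.01, 0.10, pieces)
    verdict = "significant" if abs(z) >= 1.96 else "not significant"
    print(f"{pieces:>7,} per cell: z = {z:.2f} ({verdict} at 95%)")
```

The z-score grows with the square root of the volume, which is why the same 10% lift clears the threshold at 100,000 pieces per cell but not at 25,000.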
Your Expected Outcome Matters Just as Much
Based on the size of your organization, and on the scenario above, you might think that your mail volume is too small ever to generate a significant result. That is not necessarily true.
Let’s return to our original 50,000-piece scenario. If we predict a 20% lift instead of a 10% lift in response, then our result suddenly becomes statistically significant. With the control generating 250 gifts and our test now generating 300 gifts, the difference in performance between the two groups is much larger, and therefore, we can have confidence that this outcome is not happenstance.
The bottom line is that if you expect a test to make a considerable impact (increasing response by a large percentage, for example), then you can get by with smaller mail volumes to generate a significant result.
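The trade-off between expected lift and mail volume can be estimated before anything is mailed. The sketch below assumes a pooled two-proportion z-test at 95% confidence and a test cell that lands exactly on the expected lift. It does not account for statistical power, so calculators that include power will typically recommend larger volumes. The function name and the 30% lift row are illustrative assumptions.

```python
import math
from statistics import NormalDist

def min_pieces_per_cell(control_rate, lift, confidence=0.95):
    """Smallest equal cell size at which an exactly-achieved lift is significant.

    Assumptions: pooled two-proportion z-test, two-sided, no power adjustment.
    """
    test_rate = control_rate * (1 + lift)
    pooled = (control_rate + test_rate) / 2
    # Critical z value for the chosen two-sided confidence level (about 1.96).
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z_crit ** 2 * 2 * pooled * (1 - pooled) / (test_rate - control_rate) ** 2
    return math.ceil(n)

# 1% control response rate, increasing expected lifts.
for lift in (0.10, 0.20, 0.30):
    pieces = min_pieces_per_cell(0.01, lift)
    print(f"{lift:.0%} lift: about {pieces:,} pieces per cell")
```

Because the required volume shrinks with the square of the difference in response rates, doubling the expected lift cuts the required volume to roughly a quarter.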
Free Online Tools Can Set You Up for Success
Not every test will result in statistical significance, and that’s okay. There are still insights to be garnered from those efforts to guide your way forward. Thankfully, several free and easy-to-use online tools can help predict whether your next test could be statistically significant. You can find a resource that works best for you by using search terms like “statistical significance calculator.”
By using these tools — and keeping mail volume and expected outcome in mind — you’ll be prepared to avoid common pitfalls, such as testing for the sake of testing rather than testing to gain actionable insights that make a difference.