Jun 17 2007
Comparing Population Proportions – A/B Testing
Many metrics in web analytics are conveyed as percentages, or population proportions as statisticians like to call them. As I mentioned in my previous post on Statistics for People Who (Think They) Hate Statistics, percentages are useful in the real world (business data), and I was surprised that it had no section dedicated to this topic. So I thought I would write a post on comparing population proportions, namely conversion rates for landing pages.
Landing page optimization is one aspect of testing in web analytics. It’s great. You can test almost anything: layout, content, color, tag line, call to action, media, etc. In my scenario we were testing the tag line. Since we are testing only one aspect of the page, this methodology is referred to as A/B testing. This is very different from multivariate testing, where numerous “variables” or parts of the page are tested at once; I’ll leave that for another conversation. So for now, we tested one variable, the tag line. The call to action that defined success was the submission of an online lead generation form. Since the form was small, it was part of the landing page itself, and there were no interim steps or pages that might have increased conversion failure.
In testing, you must first define your hypothesis. The hypothesis in this case is that landing page #1 outperformed landing page #2. In metrics terms, we are saying that the conversion rate for landing page #1 was better than that of landing page #2 (with statistical significance).
Null Hypothesis H0: p1 = p2 (or can be written as p1 - p2 = 0)
“conversion was not different”
Alternative Hypothesis Ha: p1 ≠ p2 (or can be written as p1 - p2 ≠ 0)
“conversion is different”
Alpha = .05
Traffic was split roughly evenly between the two pages, but the delivery counts differed slightly, and that difference is accounted for in our calculation.
Landing page #1
Delivered 6,906 times
Conversion yield = 1.71%
Landing page #2
Delivered 6,534 times
Conversion yield = 1.44%
Some might just stop here, say landing page #1 outperformed landing page #2, and move on. But is that really a valid inference? Let’s see.
To test two population proportions you use the following equation:
The p with the ^ on top is referred to as “p-hat”. “P-hat” is the sample proportion (the %’s from your data) and is used to estimate the true population proportion.
All the calculations from the above formula can be easily done in Excel and can be seen in a sample file here.
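For reference, the textbook pooled z-statistic for comparing two proportions can be written as below. Treat this as an assumption about the form used here: n1 and n2 stand for the number of times each page was delivered, and plugging in the rounded rates gives a z-score of about 1.255, close to the 1.2563 reported further down.

```latex
z = \frac{\hat{p}_1 - \hat{p}_2}
         {\sqrt{\hat{p}\,(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}},
\qquad
\hat{p} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}
```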
The calculated z-score is 1.2563. In Excel you can calculate the p-value with the NORMSDIST() function. You can decide whether to reject the null hypothesis from the z-score alone, by checking whether it falls in the critical region (sometimes referred to as the “rejection region”), but from an interpretation standpoint it’s easier to compare the p-value to the previously defined alpha. The p-value tells us whether the difference between the two percentages is statistically significant. Because the alternative hypothesis is two-sided (p1 ≠ p2), the p-value is 2*[1-NORMSDIST(z-score)].
= 2*[1 - NORMSDIST(1.2563)]
= 2*[0.105]
= .209
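If you would rather not use a spreadsheet, the same arithmetic can be sketched in Python. This is a minimal sketch, not the Excel file from this post: it assumes the pooled form of the two-proportion z-test, the function name two_proportion_z_test is made up for illustration, and NormalDist().cdf() from the standard library plays the role of NORMSDIST(). Because it starts from the rounded rates (1.71% and 1.44%), the z-score comes out near 1.255 rather than exactly 1.2563; the p-value still rounds to about .209.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(p1, n1, p2, n2):
    """Pooled two-proportion z-test (assumed form); returns (z, two-tailed p-value)."""
    # Pooled proportion: the overall conversion rate across both pages.
    p_pool = (p1 * n1 + p2 * n2) / (n1 + n2)
    # Standard error of the difference under H0: p1 = p2.
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-tailed p-value, equivalent to 2*[1 - NORMSDIST(|z|)] in Excel.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Rounded figures from the post: rate and delivery count for each landing page.
z, p_value = two_proportion_z_test(0.0171, 6906, 0.0144, 6534)
print(f"z = {z:.4f}, p-value = {p_value:.3f}")
```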
Now, our alpha value was set at .05 per our testing criteria listed previously. Since our p-value > alpha (0.209 > .05), we Fail to Reject the Null Hypothesis. What does that mean? It means that the difference between the two conversion rates is not statistically significant. Thus, even though the conversion rate for landing page #1 was higher than that of landing page #2, the data do not give us enough evidence to say that one page had a “better” tag line.
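The critical-region route mentioned earlier leads to the same decision. The sketch below is an illustration only: it takes the z-score of 1.2563 as given, uses NormalDist().inv_cdf() from the Python standard library to find the two-tailed critical value for alpha = .05 (about 1.96), and checks that both decision rules agree.

```python
from statistics import NormalDist

alpha = 0.05
z = 1.2563  # z-score reported in this post, taken as given

# Two-tailed critical value: H0 is rejected only if |z| exceeds it.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Two-tailed p-value, the same quantity as 2*[1 - NORMSDIST(z)].
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject_by_z = abs(z) > z_crit   # critical-region rule
reject_by_p = p_value < alpha   # p-value rule

print(f"critical value = {z_crit:.2f}, p-value = {p_value:.3f}")
print("reject H0" if reject_by_z and reject_by_p else "fail to reject H0")
```

Both rules are two views of the same comparison, so they should always agree for a given alpha.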
Until next time… safe analyzing.

Hi,
I was going through your article and really like it. I think many people will find it useful.
Just a short question: you use 1 - NORMSDIST(1.2563) and are therefore using a one-tailed test, but it seems that you want to test whether p1 ≠ p2, which would be two-tailed? That is, p1 could be higher than p2, but it could also be lower…
If that’s the case, you might consider using 2*NORMSDIST(-ABS(1.2563)) instead?
Thank you
Thierry
Hi Thierry, you are absolutely correct. You can get the same result as the formula you mention by simply doubling the 1-NORMSDIST() calculation, thus 2*[1-NORMSDIST()]. I will make an edit to the above for better clarification. Thanks.
Wendi,
What happens when you’ve got three tests running: a control and two variations? I can guess what to do with the denominator in your z1 formula, but not the numerator.