The Ultimate Guide to A/B Testing Statistics

Learn how to analyze A/B test results with statistical tools like t-Test, ANOVA, and more. Make confident, data-driven decisions for better outcomes.



Make Sense of Your A/B Test Results.

A/B testing isn't just about launching two versions; it's about knowing with confidence which one is truly better. Statistical tests are the tools that help you separate random chance from a real improvement. This guide will help you choose the right test for your data.

Which Test Should I Use?

The right test depends on two things: what kind of metric you are measuring (a proportion such as conversion rate, a continuous value such as revenue per user, or categorical counts) and how many variants you are comparing. The sections below walk through the most common choices.

Statistical Test Details

Z-Test for Proportions

When to use: When you're comparing two proportions, like the conversion rates of two different landing pages. This is one of the most common tests in A/B testing.

Assumptions: The samples must be independent, and each group needs enough data for the normal approximation to hold; a common rule of thumb is at least 10 conversions and 10 non-conversions per group.

Example: Conversion Rate
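
A minimal sketch in Python (using SciPy), with made-up counts for illustration: 200 conversions from 5,000 visitors for Version A, 240 from 5,000 for Version B:

```python
import math
from scipy.stats import norm

# Hypothetical data: conversions and visitors per variant
conv_a, n_a = 200, 5000   # Version A: 4.0% conversion rate
conv_b, n_b = 240, 5000   # Version B: 4.8% conversion rate

p_a, p_b = conv_a / n_a, conv_b / n_b
# Pooled proportion under the null hypothesis of no difference
p_pool = (conv_a + conv_b) / (n_a + n_b)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))

z = (p_b - p_a) / se
p_value = 2 * norm.sf(abs(z))  # two-sided p-value
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

If you prefer a library call, statsmodels' proportions_ztest performs the same calculation.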

Student's t-Test

When to use: To compare the means of two groups for a continuous metric, like average revenue per user or average session duration. Useful for smaller sample sizes where the population variance is unknown.

Assumptions: Data is normally distributed, samples are independent, and variances between the two groups are approximately equal.

Example: Average Session Time (minutes)
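
A minimal sketch with SciPy, using hypothetical session times. Setting equal_var=True gives the classic Student's test; equal_var=False switches to Welch's t-test, which drops the equal-variance assumption:

```python
from scipy import stats

# Hypothetical session times in minutes for each variant
session_a = [4.2, 5.1, 3.8, 6.0, 4.7, 5.5, 4.9, 5.2]
session_b = [5.0, 6.2, 5.8, 6.5, 5.4, 6.1, 5.9, 6.4]

# equal_var=True is Student's t-test; use equal_var=False
# (Welch's t-test) if the variances may differ.
t_stat, p_value = stats.ttest_ind(session_a, session_b, equal_var=True)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```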

Chi-Squared Test

When to use: To compare the distribution of categorical data between two or more groups. For example, testing if the proportion of users who choose option A, B, or C on a form is different between two website versions.

Assumptions: Data is categorical, observations are independent, and the expected frequency in each cell of the contingency table is at least 5.

Example: Feature Preference

The data form a 2×3 contingency table: counts of users who chose Feature 1, Feature 2, or Feature 3 under Version A versus Version B.
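
A minimal sketch with SciPy's chi2_contingency, filling the table with hypothetical counts:

```python
from scipy.stats import chi2_contingency

# Hypothetical contingency table of feature choices
#                  Feature 1  Feature 2  Feature 3
observed = [
    [120, 90, 60],   # Version A
    [100, 110, 70],  # Version B
]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p_value:.4f}")
```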

ANOVA (Analysis of Variance)

When to use: When comparing the means of three or more groups for a continuous metric. It's like a t-Test for more than two variations (A/B/n testing). ANOVA tells you if there's a significant difference *somewhere* among the groups, but not which specific group is different.

Assumptions: Similar to a t-Test: data is normally distributed, samples are independent, and variances are equal across all groups.

Example: Avg. Purchase Value

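A minimal sketch of a one-way ANOVA with SciPy, using hypothetical purchase values and, for simplicity, equal sample sizes in each group:

```python
from scipy.stats import f_oneway

# Hypothetical average purchase values (in dollars) per variant;
# equal group sizes keep the example simple.
variant_a = [23.5, 27.1, 25.0, 24.2, 26.8, 25.5]
variant_b = [26.0, 28.4, 27.5, 29.1, 26.9, 28.0]
variant_c = [24.8, 25.9, 26.2, 25.1, 27.0, 25.6]

f_stat, p_value = f_oneway(variant_a, variant_b, variant_c)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```

If the p-value is significant, a post-hoc test such as Tukey's HSD (scipy.stats.tukey_hsd, or pairwise_tukeyhsd in statsmodels) identifies which specific pairs of groups differ.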

Key Concepts Glossary

P-value

The p-value represents the probability of observing your data (or something more extreme) if the null hypothesis were true. A small p-value (typically < 0.05) suggests that your observed result is unlikely to be due to random chance alone, allowing you to reject the null hypothesis.
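
To build intuition, here is a small simulation reusing the made-up numbers from the conversion-rate example above: it assumes the null hypothesis is true (both variants convert at 4%) and counts how often chance alone produces a gap at least as large as the 0.8-point difference we observed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate many A/B tests where H0 is TRUE: both variants
# convert at 4% with 5,000 visitors each.
n, p, sims = 5000, 0.04, 100_000
gap = (rng.binomial(n, p, sims) - rng.binomial(n, p, sims)) / n

# Fraction of null-world experiments with a gap as extreme as
# the 0.8-point difference from the z-test example.
print("P(|gap| >= 0.008 | H0) ~", np.mean(np.abs(gap) >= 0.008))
```

The printed fraction should land close to the roughly 0.05 p-value the z-test reported, because that is exactly the probability the p-value approximates.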

Null Hypothesis (H₀)

The null hypothesis is the default assumption that there is no difference between the groups you are testing. For example, "Version A's conversion rate is the same as Version B's." The goal of a statistical test is to see if you have enough evidence to reject this assumption.

Alternative Hypothesis (H₁)

The alternative hypothesis is what you are trying to prove. It states that there *is* a difference between the groups. For example, "Version B's conversion rate is different from Version A's."

Statistical Significance

A result is statistically significant if the p-value is below a predetermined threshold (the significance level, alpha, usually 0.05). It means the observed difference is unlikely to be a fluke of random sampling. It does NOT automatically mean the difference is large or practically important.


