How to decide the sample size for an A/B test using Python?

Deciding the sample size is an essential step before running an A/B test. In this post, we take an example from the Udacity course - Overview of A/B Testing.

We are interested in changing the color of the "Start Now" button on a Udacity-like website to see its effect on the click-through probability, which is measured by

# of users clicked / # of users visited

Out of 1,000 users who visited, 100 clicked, which gives a click-through probability of 10%. We also choose the following parameters:
  • significance level (usually referred to as $\alpha$) of 5% (0.05)
  • practical significance level of 2%, i.e., the minimum effect that we care about
  • power/sensitivity of 80% (fairly standard)
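
The arithmetic above can be written down as a short sketch. The variable names are ours, chosen for illustration; the numbers are the ones quoted in the text, and the 2% is treated as an absolute difference.

```python
# Hypothetical variable names; the counts are the ones quoted above.
visitors = 1000
clicks = 100

baseline_ctp = clicks / visitors   # estimated click-through probability
alpha = 0.05                       # significance level
mde = 0.02                         # practical significance (absolute difference)
power = 0.80                       # power/sensitivity

# Smallest changed rate we would still want to detect.
target_ctp = baseline_ctp + mde
print(f"baseline = {baseline_ctp:.2f}, target = {target_ctp:.2f}")
```

So the experiment should be able to tell a 10% click-through probability apart from one of about 12%.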

First, we use the online calculator introduced in the course: https://www.evanmiller.org/ab-testing/sample-size.html. The calculator's terms map to the ones above as follows:
  • base conversion rate: click-through probability, estimated before making the change (10%)
  • minimum detectable effect: practical significance level (2%); we care about the absolute difference, not the relative one
  • statistical power: power/sensitivity (80%)
  • significance level: significance level $\alpha$ (5%)
With these inputs, the calculator returns a sample size of 3,623 per variation.

Next, we implement the same equation in Python and check that it reproduces the calculator's result.
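
For reference, the equation implemented by the function in this post can be written as follows. The notation is ours, read off the code rather than taken from the course: $p$ is the base conversion rate, $\delta$ is the absolute minimum detectable effect, and $z_q$ is the $q$-quantile of the standard normal distribution.

$$
n = \frac{\left( z_{1-\alpha/2}\,\sqrt{2p(1-p)} + z_{1-\beta}\,\sqrt{p(1-p) + (p+\delta)(1-p-\delta)} \right)^2}{\delta^2}
$$

Here $1-\beta$ is the power, and $n$ is rounded up to the next integer.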

## Calculate required sample size
import numpy as np
from scipy.stats import norm


def calc_sample_size(alpha, power, p, pct_mde, absolute=True):
    """ Based on https://www.evanmiller.org/ab-testing/sample-size.html

    Args:
        alpha (float): How often are you willing to accept a Type I error (false positive)?
        power (float): How often do you want to correctly detect a true positive (1-beta)?
        p (float): Base conversion rate
        pct_mde (float): Minimum detectable effect. Treated as an absolute
            difference if `absolute` is True, otherwise as a fraction of
            the base conversion rate.
        absolute (bool): Whether `pct_mde` is absolute (default) or relative.

    Returns:
        int: Required sample size, rounded up.
    """
    if absolute:
        delta = pct_mde
    else:
        delta = p * pct_mde

    # Standard normal quantiles for the two-sided test and the power
    t_alpha2 = norm.ppf(1.0 - alpha / 2)
    t_beta = norm.ppf(power)

    # Standard deviation terms without and with the effect
    sd1 = np.sqrt(2 * p * (1.0 - p))
    sd2 = np.sqrt(p * (1.0 - p) + (p + delta) * (1.0 - p - delta))

    return int(np.ceil((t_alpha2 * sd1 + t_beta * sd2) ** 2 / delta ** 2))

print(calc_sample_size(alpha=0.05, power=0.8, p=0.1, pct_mde=0.02))

Output:
3623

As we can see, the Python function produces the same result as the online calculator.
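
As a further check, here is a minimal sketch of the same formula using only the Python standard library, so it runs without NumPy or SciPy. The helper name `sample_size` is hypothetical, and `statistics.NormalDist().inv_cdf` stands in for `norm.ppf`. It also shows the relative mode: a 20% relative lift on a 10% baseline is the same 2-point absolute change.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size(alpha, power, p, mde, absolute=True):
    """Stdlib-only version of the formula above (hypothetical helper)."""
    delta = mde if absolute else p * mde
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2)  # two-sided test quantile
    z_beta = NormalDist().inv_cdf(power)             # quantile for the power
    sd1 = sqrt(2 * p * (1.0 - p))
    sd2 = sqrt(p * (1.0 - p) + (p + delta) * (1.0 - p - delta))
    return ceil((z_alpha * sd1 + z_beta * sd2) ** 2 / delta ** 2)


# Absolute 2% and relative 20% describe the same change at a 10% baseline.
n_absolute = sample_size(alpha=0.05, power=0.8, p=0.1, mde=0.02)
n_relative = sample_size(alpha=0.05, power=0.8, p=0.1, mde=0.2, absolute=False)
print(n_absolute, n_relative)

# Smaller effects need more users to detect.
for mde in (0.01, 0.02, 0.05):
    print(mde, sample_size(alpha=0.05, power=0.8, p=0.1, mde=mde))
```

Both calls in the first comparison give 3623, and the loop shows that the required sample size grows quickly as the minimum detectable effect shrinks.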