The p_values quoted in the text may differ slightly from the cell outputs, because many simulation runs were performed after the text was written. None of these reruns changed the p_values in any significant way.

Import required libraries.

In [1]:

```
import pandas as pd
import numpy as np
import random
from scipy import stats
# Compatibility shim: stats.chisqprob was removed from newer SciPy versions
stats.chisqprob = lambda chisq, df: stats.chi2.sf(chisq, df)
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set_style('darkgrid')
random.seed(42)
np.random.seed(42)  # seed NumPy as well, since the simulations below use np.random
```

`1.`

Read in the dataset and take a look at the top few rows.

In [2]:

```
df = pd.read_csv('ab_data.csv')
df.head()
```

Out[2]:

b. Use the cell below to find the number of rows in the dataset.

In [3]:

```
nof_rows = df.shape[0]
nof_rows
```

Out[3]:

c. The number of unique users in the dataset.

In [4]:

```
df.user_id.nunique()
```

Out[4]:

d. The proportion of users converted.

In [5]:

```
df.converted.mean()
```

Out[5]:

e. The number of times the `new_page` and `treatment` don't line up.

In [6]:

```
df.groupby(['group', 'landing_page']).count()
```

Out[6]:

In [7]:

```
dont_line_up = 1928 + 1965
dont_line_up
```

Out[7]:

f. Do any of the rows have missing values?

In [8]:

```
df.isnull().sum()
```

Out[8]:

For the rows where **treatment** is not aligned with **new_page** or **control** is not aligned with **old_page**, we cannot be sure if this row truly received the new or old page. Store new dataframe in **df2**.

In [9]:

```
df2 = df[((df['group'] == 'control') & (df['landing_page'] == 'old_page')) |
         ((df['group'] == 'treatment') & (df['landing_page'] == 'new_page'))]
df2.head(10)
```

Out[9]:

In [10]:

```
# Double Check all of the incorrect rows were removed - this should be 0
df2[((df2['group'] == 'treatment') == (df2['landing_page'] == 'new_page')) == False].shape[0]
```

Out[10]:

In [11]:

```
df2.shape
```

Out[11]:

How many unique **user_id**s are in **df2**?

In [12]:

```
df2['user_id'].nunique()
```

Out[12]:

There is one **user_id** repeated in **df2**.

In [13]:

```
df2[df2['user_id'].duplicated()].count()
```

Out[13]:

In [14]:

```
doubles = df2[df2.user_id.duplicated(keep=False)]  # keep=False flags both copies of the repeat
```

What is the row information for the repeat **user_id**?

In [15]:

```
doubles
```

Out[15]:

In [16]:

```
df2.shape
```

Out[16]:

d. Remove **one** of the rows with a duplicate **user_id**, but keep your dataframe as **df2**.

In [17]:

```
df2 = df2.drop_duplicates(subset='user_id', keep='first')
```

In [18]:

```
df2[df2['user_id'].duplicated()].count()
```

Out[18]:

What is the probability of an individual converting regardless of the page they receive?

In [19]:

```
df2['converted'].mean()
```

Out[19]:

In [20]:

```
df2.head()
```

Out[20]:

Given that an individual was in the `control` group, what is the probability they converted?

In [21]:

```
ccr = df2.query('group == "control"')['converted'].mean()
ccr
```

Out[21]:

Given that an individual was in the `treatment` group, what is the probability they converted?

In [22]:

```
tcr = df2.query('group == "treatment"')['converted'].mean()
tcr
```

Out[22]:

In [23]:

```
obs_diff = tcr - ccr
obs_diff
```

Out[23]:

What is the probability that an individual received the new page?

In [24]:

```
df2_groupby = df2.groupby('landing_page').count()
df2_groupby
```

Out[24]:

In [25]:

```
new_page_prob = df2_groupby.user_id.iloc[0]/(df2_groupby.user_id.iloc[0] + df2_groupby.user_id.iloc[1])
new_page_prob
```

Out[25]:

e. Use the results in the previous two portions of this question to suggest if you think there is evidence that one page leads to more conversions? Write your response below.

**Your answer goes here.**

- The overall conversion rate BEFORE cleaning the data is 0.11965919355605512.
- AFTER cleaning the data, the overall conversion rate is 0.11959708724499628.
- The conversion rate in the control group (old page) is 0.1203863045004612.
- The conversion rate in the treatment group (new page) is 0.11880806551510564.

These values are all very close. The new page MAY or MAY NOT result in a different conversion rate; further study needs to be done. So far, however, the results do not look promising for the new page.

Notice that because of the time stamp associated with each event, you could technically run a hypothesis test continuously as each observation was observed.

However, then the hard question is do you stop as soon as one page is considered significantly better than another or does it need to happen consistently for a certain amount of time? How long do you run to render a decision that neither page is better than another?

These questions are the difficult parts associated with A/B tests in general.

`1.`

For now, consider you need to make the decision just based on all the data provided. If you want to assume that the old page is better unless the new page proves to be definitely better at a Type I error rate of 5%, what should your null and alternative hypotheses be? You can state your hypothesis in terms of words or in terms of **$p_{old}$** and **$p_{new}$**, which are the converted rates for the old and new pages.

**Put your answer here.**
$$H_0: p_{new} - p_{old} \leq 0$$
$$H_1: p_{new} - p_{old} > 0$$

`2.`

Assume under the null hypothesis, $p_{new}$ and $p_{old}$ both have "true" success rates equal to the **converted** success rate regardless of page - that is $p_{new}$ and $p_{old}$ are equal. Furthermore, assume they are equal to the **converted** rate in **ab_data.csv** regardless of the page.

Use a sample size for each page equal to the ones in **ab_data.csv**.

Perform the sampling distribution for the difference in **converted** between the two pages over 10,000 iterations of calculating an estimate from the null.

Use the cells below to provide the necessary parts of this simulation. If this doesn't make complete sense right now, don't worry - you are going to work through the problems below to complete this problem. You can use **Quiz 5** in the classroom to make sure you are on the right track.

In [26]:

```
# Exercise states "Assume under the null hypothesis, p_new and p_old both have
# "true" success rates equal to the converted success rate regardless of page -
# that is p_new and p_old are equal.
proportion = df.converted.mean()
proportion
```

Out[26]:

In [27]:

```
df2.head()
```

Out[27]:

In [28]:

```
treatment_df = df2.query('group == "treatment"')
treatment_cr = treatment_df.converted.mean()
treatment_cr
```

Out[28]:

In [29]:

```
control_df = df2.query('group == "control"')
control_cr = control_df.converted.mean()
control_cr
```

Out[29]:

In [30]:

```
difference = treatment_cr - control_cr
difference
```

Out[30]:

a. What is the **convert rate** for $p_{new}$ under the null?

In [31]:

```
p_new = df2['converted'].mean() # Same as Part 1 4a
p_new
```

Out[31]:

b. What is the **convert rate** for $p_{old}$ under the null?

In [32]:

```
p_old = df2['converted'].mean() # Same as Part 1 4a
p_old
```

Out[32]:

c. What is $n_{new}$?

In [33]:

```
n_new = treatment_df.shape[0]
n_new
```

Out[33]:

d. What is $n_{old}$?

In [34]:

```
n_old = control_df.shape[0]
n_old
```

Out[34]:

e. Simulate $n_{new}$ transactions with a convert rate of $p_{new}$ under the null. Store these $n_{new}$ 1's and 0's in **new_page_converted**.

In [35]:

```
new_page_converted = np.random.binomial(1, p_new, n_new)
tsm = new_page_converted.mean()
tsm
```

Out[35]:

f. Simulate $n_{old}$ transactions with a convert rate of $p_{old}$ under the null. Store these $n_{old}$ 1's and 0's in **old_page_converted**.

In [36]:

```
old_page_converted = np.random.binomial(1, p_old, n_old)
csm = old_page_converted.mean()
csm
```

Out[36]:

g. Find $p_{new}$ - $p_{old}$ for your simulated values from part (e) and (f).

In [37]:

```
# compute simulated difference in conversion rate
sim_diff = tsm - csm
# display simulation difference
sim_diff
```

Out[37]:

h. Simulate 10,000 $p_{new}$ - $p_{old}$ values using this same process similarly to the one you calculated in parts **a. through g.** above. Store all 10,000 values in **p_diffs**.

In [38]:

```
# The numpy way to simulate the above. Does not require a for loop.
new_converted_simulation = np.random.binomial(n_new, p_new, 10000)/n_new
old_converted_simulation = np.random.binomial(n_old, p_old, 10000)/n_old
p_diffs = new_converted_simulation - old_converted_simulation
```

In [98]:

```
p_diffs = np.array(p_diffs)
pdm = (p_diffs > obs_diff).mean()
pdm
```

Out[98]:

In [40]:

```
p_diffs.std()
```

Out[40]:

i. Plot a histogram of the **p_diffs**. Does this plot look like what you expected? Use the matching problem in the classroom to assure you fully understand what was computed here.

In [41]:

```
plt.hist(p_diffs);
```

This is what I expected: a very symmetrical, normal-looking distribution!

j. What proportion of the **p_diffs** are greater than the actual difference observed in **ab_data.csv**?

In [42]:

```
# Same as Part I 4c minus Part I 4b.
ccr = df2.query('group == "control"')['converted'].mean()
tcr = df2.query('group == "treatment"')['converted'].mean()
obs_diff = tcr - ccr
obs_diff
```

Out[42]:

In [43]:

```
# compute p value
pvalue = (p_diffs > obs_diff).mean()
pvalue
```

Out[43]:

In [44]:

```
# plot line for observed statistic
plt.hist(p_diffs, alpha=.5)
plt.axvline(x=obs_diff, color='red');
```

k. In words, explain what you just computed in part **j.**. What is this value called in scientific studies? What does this value mean in terms of whether or not there is a difference between the new and old pages?

**Put your answer here.**

- In part j we calculated what proportion of the p_diffs array (the simulated differences in conversion rates) is greater than the observed difference, i.e. the actual difference in the dataset between the treatment and control group conversion rates. This is NOT the single simulated difference calculated above (sim_diff). The p_value is 0.9038.
- In scientific studies, the p-value is defined as follows (from Investopedia, https://www.investopedia.com/terms/p/p-value.asp): "The p-value is the level of marginal significance within a statistical hypothesis test representing the probability of the occurrence of a given event. The p-value is used as an alternative to rejection points to provide the smallest level of significance at which the null hypothesis would be rejected."
- We computed a p_value of .9038, which means that, if both pages had the same true conversion rate, 90.38% of Part II 2h's simulated differences (p_diffs) would be at least as large as the difference observed in the dataset. Put another way, the observed difference is higher than only ~10% of the simulated differences under the null. Since .9038 is far above the .05 significance level, this is not a significant result and we fail to reject the null hypothesis.

l. We could also use a built-in to achieve similar results. Though using the built-in might be easier to code, the above portions are a walkthrough of the ideas that are critical to correctly thinking about statistical significance. Fill in the below to calculate the number of conversions for each page, as well as the number of individuals who received each page. Let `n_old` and `n_new` refer to the number of rows associated with the old page and new pages, respectively.

In [45]:

```
df2.head()
```

Out[45]:

In [46]:

```
import statsmodels.api as sm
convert_old = df2.query('group == "control" and converted == 1').shape[0]
convert_new = df2.query('group == "treatment" and converted == 1').shape[0]
n_old = control_df.shape[0]
n_new = treatment_df.shape[0]
print(convert_old, convert_new, n_old, n_new)
```

m. Now use `stats.proportions_ztest` to compute your test statistic and p-value. Here is a helpful link on using the built-in.

In [47]:

```
z_score, p_value = sm.stats.proportions_ztest([convert_old, convert_new], [n_old, n_new], alternative='smaller')
z_score, p_value
```

Out[47]:

n. What do the z-score and p-value you computed in the previous question mean for the conversion rates of the old and new pages? Do they agree with the findings in parts **j.** and **k.**?

**Put your answer here.**

- A z_score is the number of standard deviations an observation lies from the mean; here the observation is the difference in conversion proportions. A z_score of 1.31 lies between the 1st and 2nd standard deviations (±1 standard deviation contains ~68% of a normal distribution, ±2 contain ~95%), so the observed difference sits comfortably inside the null distribution.
- .9051 > .05. With a p_value of .9051 we fail to reject the null hypothesis: the old_page is the same as or better than the new_page.
- Yes, the p_values are virtually identical: .9051 from the ztest versus .9038 from the simulation. The ztest agrees with the findings in Part II j and k.
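The agreement is expected, since `proportions_ztest` implements the standard pooled two-proportion z-test. Here is a minimal sketch of the same computation done by hand; the counts are made up for illustration, not the notebook's actual values:

```python
import numpy as np
from scipy import stats

def two_prop_ztest(convert_old, n_old, convert_new, n_new):
    """Pooled two-proportion z-test, matching alternative='smaller' (p_old < p_new)."""
    p_old_hat = convert_old / n_old
    p_new_hat = convert_new / n_new
    p_pool = (convert_old + convert_new) / (n_old + n_new)       # pooled rate under H0
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_old + 1 / n_new))
    z = (p_old_hat - p_new_hat) / se
    p_value = stats.norm.cdf(z)  # one-tailed: P(Z <= z)
    return z, p_value

# Illustrative counts only -- substitute convert_old, n_old, etc. from the cell above
z, p = two_prop_ztest(120, 1000, 100, 1000)
```

With the notebook's actual counts this reproduces the 1.31 / .9051 result reported above.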

`1.`

In this final part, you will see that the result you achieved in the previous A/B test can also be achieved by performing regression.

a. Since each row is either a conversion or no conversion, what type of regression should you be performing in this case?

**Put your answer here.**

We would use a logistic regression model. The reason we would use this rather than Ordinary Least Squares (OLS) is that we have a categorical (binary) dependent variable: whether or not a user converted. We will use the statsmodels Logit method.
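As a quick illustration of how Logit coefficients relate to conversion rates: the model estimates log-odds, and the logistic (sigmoid) link converts them back to probabilities. This sketch plugs in the coefficients this notebook reports further below (-1.9888 for the intercept, -0.0150 for ab_page):

```python
import numpy as np

def sigmoid(z):
    """Logistic link: converts log-odds into a probability in (0, 1)."""
    return 1 / (1 + np.exp(-z))

intercept, ab_page_coef = -1.9888, -0.0150        # coefficients from the fit below
p_control = sigmoid(intercept)                    # ab_page = 0
p_treatment = sigmoid(intercept + ab_page_coef)   # ab_page = 1
```

These recover (to rounding) the control and treatment conversion rates computed in Part I, a useful sanity check on the fitted model.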

b. The goal is to use **statsmodels** to fit the regression model you specified in part **a.** to see if there is a significant difference in conversion based on which page a customer receives. However, you first need to create a column for the intercept, and create a dummy variable column for which page each user received. Add an **intercept** column, as well as an **ab_page** column, which is 1 when an individual receives the **treatment** and 0 if **control**.

In [48]:

```
# Painful to rerun simulation if I screw up df2. Make a copy. Also avoids hidden
# memory change (or not change) issues.
df2c = df2.copy()
df2c.head(2)
```

Out[48]:

In [49]:

```
# One dummy column suffices: ab_page is 1 for treatment, 0 for control
df2c['ab_page'] = pd.get_dummies(df2c['group'])['treatment']
df2c = df2c.drop(['timestamp', 'group', 'landing_page'], axis=1)
```

In [50]:

```
df2c.head()
```

Out[50]:

In [51]:

```
df2c['intercept'] = 1
```

c. Use **statsmodels** to import your regression model. Instantiate the model, and fit the model using the two columns you created in part **b.** to predict whether or not an individual converts.

In [52]:

```
log_mod = sm.Logit(df2c['converted'], df2c[['intercept', 'ab_page']])
results = log_mod.fit()
results.summary()
```

Out[52]:

d. Provide the summary of your model below, and use it as necessary to answer the following questions.

In [53]:

```
# Need to exponentiate in order to interpret them.
np.exp(-1.9888), np.exp(-0.0150)
```

Out[53]:

In [54]:

```
# Since they are negative, easier to explain as 1/np.exp()
1/np.exp(-1.9888), 1/np.exp(-0.0150)
```

Out[54]:

Each of these exponentiated values is the multiplicative change in the odds of conversion occurring.

- When an exponentiated value is less than one, it is often clearer to report its reciprocal. Since ab_page is binary, 1/np.exp(-0.0150) ≈ 1.015 means users in the control group are 1.015 times as likely to convert as users in the treatment group, holding all else constant. Not much of an impact.

e. What is the p-value associated with **ab_page**?

- p-value = .19

- In Part II the p_value ranged from .9038 to .9051. In Part III the p_value is .19.
- The reason is that in Part II we constructed a one-sided hypothesis: we only wanted to know whether the new_page has a significantly *higher* conversion rate than the old_page. We were not interested in the possibility that the new_page is much worse; if it is, no big deal, we just keep the old_page. Our hypothesis statements affect the p-value because they determine which tail(s) of the distribution the p-value is calculated from. Part II is a one-tailed test; the Part III regression is a two-tailed test.
- You can reconcile the p-values with this math: 1 - (0.19 / 2) = 0.905. We calculated a p_value of .9038 in Part II j and .9051 in Part II l, both close enough to .905 to be equivalent. In Part II we cared which page had the higher conversion rate, so a one-tailed test; in Part III, the regression is not concerned with a positive or negative change, only whether the independent variable (ab_page/new_page) had any effect at all, so a two-tailed test.
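The relationship between the one-tailed and two-tailed p-values can be checked directly against the normal distribution; this sketch uses the z-score of 1.31 reported above:

```python
from scipy import stats

z = 1.31  # z-score from the proportions_ztest above

p_two_sided = 2 * stats.norm.sf(abs(z))  # regression-style two-tailed p (~ .19)
p_one_sided = stats.norm.cdf(z)          # 'smaller' alternative, one-tailed (~ .905)

# For a positive z these are related by: p_one_sided == 1 - p_two_sided / 2
```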

f. Now, you are considering other things that might influence whether or not an individual converts. Discuss why it is a good idea to consider other factors to add into your regression model. Are there any disadvantages to adding additional terms into your regression model?

**Put your answer here.**

- One idea alluded to within this project is looking at the date and time information to see whether it contains correlations that change the conversion rates in favour of the new_page. The timing of behaviour within an A/B test is often quite important: different people respond differently to change. Some people like change almost no matter what; others have a lot of inertia when it comes to change, even when it is positive. This data is time- and date-stamped, and we explore that idea below.
- It is always good to reflect on what your data seems to be telling you. Simpson's paradox is an excellent example: looking at the data within categories sometimes obscures the fact that, overall, the answer to the question you are asking is the opposite of what you observe in the specific categories.
- Simpson's paradox (https://en.wikipedia.org/wiki/Simpson%27s_paradox), or the Yule–Simpson effect, is a phenomenon in probability and statistics in which a trend appears in several different groups of data but disappears or reverses when these groups are combined. It is sometimes given the descriptive title reversal paradox or amalgamation paradox.
- From https://en.wikipedia.org/wiki/Multiple_comparisons_problem: in statistics, the multiple comparisons, multiplicity or multiple testing problem occurs when one considers a set of statistical inferences simultaneously, or infers a subset of parameters selected based on the observed values. In certain fields it is known as the look-elsewhere effect. The more inferences are made, the more likely erroneous inferences become. This is a potential problem here.
- There might be behaviours associated with visiting the webpage that we are not capturing. Given that this is a big decision for people, they are likely to visit the conversion page several times. Perhaps the question should be: "For those who are likely to convert (e.g. defined as more than one visit to the conversion page), which page (old_page vs new_page) had more conversions?" Unfortunately, this dataset does not seem to contain that information. In the real world it almost certainly would be captured: identifying unique visitors, how many times they visited a page, what they clicked on, visit duration, etc., is a requirement for virtually all data gathering by organizations on their websites.
- One likely disadvantage of adding additional variables to the model is multicollinearity, i.e. two or more predictors being correlated. Multicollinearity makes coefficients difficult to interpret and can skew R values, which creates challenges in constructing and interpreting appropriate statistical tests.
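As a small illustration of that last point: the variance inflation factor (VIF), 1/(1 - r²), quantifies how well one predictor is explained by the others. This sketch uses synthetic data (not the notebook's dataset) to show a near-collinear pair triggering a large VIF:

```python
import numpy as np

rng = np.random.default_rng(42)
x1 = rng.normal(size=1000)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=1000)  # nearly a copy of x1 (collinear)
x3 = rng.normal(size=1000)                     # independent predictor

r12 = np.corrcoef(x1, x2)[0, 1]  # close to 1
r13 = np.corrcoef(x1, x3)[0, 1]  # close to 0
vif = 1 / (1 - r12 ** 2)  # VIFs above ~5-10 are commonly treated as problematic
```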

g. Now along with testing if the conversion rate changes for different pages, also add an effect based on which country a user lives in. You will need to read in the **countries.csv** dataset and merge together your datasets on the appropriate rows. Here are the docs for joining tables.

Does it appear that country had an impact on conversion? Provide the statistical output as well as a written response to answer this question.

**Answer provided after results.summary() below**

In [55]:

```
cdf = pd.read_csv('countries.csv')
```

In [56]:

```
cdf.head(2)
```

Out[56]:

In [57]:

```
cdf2 = df2c.merge(cdf, on='user_id', how='inner')
cdf2.head(2)
```

Out[57]:

In [58]:

```
cdf2.isnull().sum()
```

Out[58]:

In [59]:

```
cdf2.country.unique()
```

Out[59]:

In [60]:

```
cdf2[['CA', 'UK', 'US']] = pd.get_dummies(cdf2['country'])
cdf2 = cdf2.drop(['country', 'CA'], axis=1)
```

In [61]:

```
cdf2.head()
```

Out[61]:

In [62]:

```
log_mod = sm.Logit(cdf2['converted'], cdf2[['intercept', 'ab_page', 'UK', 'US']])
results = log_mod.fit()
results.summary()
```

Out[62]:

In [63]:

```
# Need to exponentiate in order to interpret them.
np.exp(-0.0149), np.exp(0.0506), np.exp(0.0408)
```

Out[63]:

In [64]:

```
# Explain negative, easier to explain as 1/np.exp()
1/np.exp(-0.0149)
```

Out[64]:

Each of these exponentiated values is the multiplicative change in the odds of conversion occurring.

- For the UK variable (if the user was from the UK) conversion is 1.052 times as likely, holding all else constant.
- For the US variable (if the user was from the US) conversion is 1.042 times as likely, holding all else constant.
- When an exponentiated value is less than one it is clearer to report its reciprocal: users who received the old_page are 1.015 times as likely to convert as those who received the new_page, holding all else constant.

None of these p_values are significant; they are all above .05. We fail to reject the null hypothesis: the new_page is not significantly better than the old_page.

h. Though you have now looked at the individual factors of country and page on conversion, we would now like to look at an interaction between page and country to see if there significant effects on conversion. Create the necessary additional columns, and fit the new model.

Provide the summary results, and your conclusions based on the results.

In [65]:

```
df_int = cdf2.copy()
df_int.head()
```

Out[65]:

In [66]:

```
log_mod = sm.Logit(df_int['converted'], df_int[['intercept', 'ab_page', 'UK', 'US']])
results = log_mod.fit()
results.summary()
```

Out[66]:

- None of the p_values are significant.

In [67]:

```
df_int['UK_ab_page'] = df_int['UK'] * df_int['ab_page']
df_int['US_ab_page'] = df_int['US'] * df_int['ab_page']
df_int = df_int.drop('intercept', axis=1)
df_int.head()
```

Out[67]:

In [68]:

```
df_int.corr(method='spearman')
```

Out[68]:

This question asks about the interaction between country and page: for example, do people from the UK like the new_page while people from the US like the old_page? Here are my findings.

- None of the p_values are significant.
- Looking at ab_page (the conversion page), there is a strong correlation with the US interaction term (.734811), though this largely reflects that most users in the dataset are from the US. It is still possible that, cutting the data to US visitors only, there would be strong reason to show US visitors the new_page. I do a quick test of this below.

In [69]:

```
df2.head(2)
```

Out[69]:

In [70]:

```
df_usa = df2.merge(cdf, on='user_id', how='inner')
df_usa = df_usa.query('country == "US"')
df_usa.head(2)
```

Out[70]:

In [71]:

```
n_old = df_usa.query('group == "control"').shape[0]
n_old
```

Out[71]:

In [72]:

```
convert_old = df_usa.query('group == "control" and converted == 1').shape[0]
convert_old
```

Out[72]:

In [73]:

```
n_new = df_usa.query('group == "treatment"').shape[0]
n_new
```

Out[73]:

In [74]:

```
convert_new = df_usa.query('group == "treatment" and converted == 1').shape[0]
convert_new
```

Out[74]:

In [75]:

```
z_score, p_value = sm.stats.proportions_ztest([convert_old, convert_new], [n_old, n_new], alternative='smaller')
z_score, p_value
```

Out[75]:

The ztest yields a p_value of .9339, which is > the .05 level of significance. We fail to reject the null hypothesis for the US subset: US visitors are no more likely to convert on the new_page than visitors from other countries, and the old_page is as good as or better than the new_page for them.

I am Canadian. I want to see this question explicitly answered from a Canadian point of view.

In [76]:

```
cdf = pd.read_csv('countries.csv')
```

In [77]:

```
cdf.head(2)
```

Out[77]:

In [78]:

```
cdf2 = df2c.merge(cdf, on='user_id', how='inner')
cdf2.head(2)
```

Out[78]:

In [79]:

```
cdf2.isnull().sum()
```

Out[79]:

In [80]:

```
cdf2.country.unique()
```

Out[80]:

In [81]:

```
cdf2[['CA', 'UK', 'US']] = pd.get_dummies(cdf2['country'])
cdf2 = cdf2.drop(['country', 'US'], axis=1)
```

In [82]:

```
cdf2.head()
```

Out[82]:

In [83]:

```
log_mod = sm.Logit(cdf2['converted'], cdf2[['intercept', 'ab_page', 'CA', 'UK']])
results = log_mod.fit()
results.summary()
```

Out[83]:

In [84]:

```
# Need to exponentiate in order to interpret them.
np.exp(-0.0149), np.exp(-0.0408), np.exp(0.0099)
```

Out[84]:

In [85]:

```
# Explain negative, easier to explain as 1/np.exp()
1/np.exp(-0.0149), 1/np.exp(-0.0408)
```

Out[85]:

Each of these exponentiated values is the multiplicative change in the odds of conversion occurring.

- For the UK variable (if the user was from the UK) conversion is 1.0099 times as likely, holding all else constant.
- When an exponentiated value is less than one it is clearer to report its reciprocal: users who received the old_page are 1.015 times as likely to convert as those who received the new_page, holding all else constant.
- The CA coefficient is negative: users in the baseline (US) group are 1.0416 times as likely to convert as Canadian users, holding all else constant.

None of these p_values are significant; they are all above .05. We fail to reject the null hypothesis: the new_page is not significantly better than the old_page.

h. Though you have now looked at the individual factors of country and page on conversion, we would now like to look at an interaction between page and country to see if there significant effects on conversion. Create the necessary additional columns, and fit the new model.

Provide the summary results, and your conclusions based on the results.

In [86]:

```
df_int = cdf2.copy()
df_int.head()
```

Out[86]:

In [87]:

```
log_mod = sm.Logit(df_int['converted'], df_int[['intercept', 'ab_page', 'CA', 'UK']])
results = log_mod.fit()
results.summary()
```

Out[87]:

- None of the p_values are significant.

In [88]:

```
df_int['UK_ab_page'] = df_int['UK'] * df_int['ab_page']
df_int['CA_ab_page'] = df_int['CA'] * df_int['ab_page']
df_int = df_int.drop('intercept', axis=1)
df_int.head()
```

Out[88]:

In [89]:

```
df_int.corr(method='spearman')
```

Out[89]:

This question asks about the interaction between country and page: for example, do people from the UK like the new_page while people from the US like the old_page? Here are my findings.

- None of the p_values are significant.
- Looking at ab_page (the conversion page), there is little correlation between Canadians receiving the new_page and converting (.160519).

The process is:

- Find the range of days
- If manageable, bar chart each day for successful conversions.
- Hopefully it looks significant.
- Conduct z_tests in a for loop that yields the p_value for each day.
- Sort the p_values
- See if there are any significant p_values and whether any patterns surround them, for example a steady build-up or deterioration of conversions.

In [90]:

```
df2.head(2)
```

Out[90]:

In [91]:

```
df2s = df2.sort_values('timestamp')
df2s.iloc[0], df2s.iloc[-1]
```

Out[91]:

In [92]:

```
df2s['day'] = df2s['timestamp'].apply(lambda x: x[8:10])
df2s.head()
```

Out[92]:

In [93]:

```
df2g = df2s.groupby(['day', 'group']).count().reset_index()
df2g = df2g.drop(['user_id', 'timestamp', 'landing_page'], axis=1)
df2g.head()
```

Out[93]:

In [94]:

```
locations = df2g.day.unique()
heights_control = df2g.query('group == "control"')['converted'].tolist()
heights_treatment = df2g.query('group == "treatment"')['converted'].tolist()
labels = range(2, 25)
plt.bar(locations, height=heights_control, tick_label=labels, color = 'red', alpha=.25)
plt.bar(locations, height=heights_treatment, tick_label=labels, color='blue', alpha=.25);
```

Not much to see here.

In [95]:

```
zp = []
for value in locations:
    convert_old = df2s.query('group == "control" and converted == 1 and day == @value').shape[0]
    convert_new = df2s.query('group == "treatment" and converted == 1 and day == @value').shape[0]
    n_old = df2s.query('group == "control" and day == @value').shape[0]
    n_new = df2s.query('group == "treatment" and day == @value').shape[0]
    z_score, p_value = sm.stats.proportions_ztest([convert_old, convert_new], [n_old, n_new], alternative='smaller')
    zp.append((value, z_score, p_value))
```

In [96]:

```
pd.DataFrame(zp, columns = ['day', 'zscore', 'pvalue']).sort_values('pvalue')
```

Out[96]:

For one brilliant day, January 10, 2017, the new_page had a significant p_value of .008, which is < .05. However, no other day exhibited this behaviour, there is no pattern of conversions building toward or subsiding from that day, and no weekend or weekday pattern emerged. We fail to reject the null hypothesis: there are no significant time-related changes in conversion rates in this dataset.
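That lone significant day is exactly what the multiple-comparisons problem (discussed in Part III f) predicts: with roughly 23 daily tests at α = .05, about one false positive is expected even when no day is truly different. A Bonferroni-style correction, which is my addition and not applied in this notebook, makes this concrete:

```python
alpha = 0.05
n_tests = 23  # one z-test per day in the dataset's date range (Jan 2 - Jan 24)

# Bonferroni: each individual test must beat alpha / n_tests (~ .0022)
bonferroni_threshold = alpha / n_tests

jan_10_p = 0.008  # the one "significant" day found above
significant_after_correction = jan_10_p < bonferroni_threshold
```

Under this correction the January 10 result is no longer significant, consistent with it being a chance finding.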

Based on simulations, logistic regression models (Logit), and z-tests, we do not see any significant results. The p_values in all cases except one (January 10, 2017) were not significant. We fail to reject the null hypothesis: the old_page is just as good, if not better, than the new_page.

In [97]:

```
from subprocess import call
call(['python', '-m', 'nbconvert', 'Analyze_ab_test_results_notebook.ipynb'])
```

Out[97]:
