One-sample z-test & one-sample t-test

The purpose of this notebook is to implement a one-sample z-test and a one-sample t-test, in order to demonstrate the mechanics of running such tests in Python, using SciPy and statsmodels.

Q: What's the difference between a Z-test and a t-test? Under which circumstances should I use one over the other?

Taken from https://www.analyticsvidhya.com/blog/2020/06/statistics-analytics-hypothesis-testing-z-test-t-test/:

Z-Tests can be employed when:

z-Test Equation:
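In its standard one-sample form (the symbol names are an assumption made here, not taken from the linked article):

$$
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
$$

where $\bar{x}$ is the sample mean, $\mu_0$ is the population mean under the null hypothesis, $\sigma$ is the population standard deviation, and $n$ is the sample size. When $\sigma$ is unknown but $n$ is large, the sample standard deviation $s$ is commonly substituted for it.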

t-tests are a statistical way of testing a hypothesis when:

t-Test Equation:
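In its standard one-sample form (again, the symbol names are an assumption made here):

$$
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad \text{dof} = n - 1
$$

where $\bar{x}$ is the sample mean, $\mu_0$ is the population mean under the null hypothesis, $s$ is the sample standard deviation (computed with $n - 1$ in the denominator), and $n$ is the sample size. The statistic is compared against a t-distribution with $n - 1$ degrees of freedom.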

Nifty flow-chart for which test to use when:

Get some sample data

One Sample Inference

Looks somewhat normal to me.

We don't know the population variance (we're trying to make an inference about a larger population from this study), but our sample size is large (n >= 30), so the sample standard deviation is a reasonable stand-in for the population value and using a Z-test is valid.

Z-Test

The p-value is < 0.05, meaning we can reject the null hypothesis in favor of the alternative: that the mean of the population the sample was drawn from is > 80.
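A minimal sketch of a one-sided, one-sample z-test of this kind, computed by hand with NumPy and SciPy. The data are synthetic and purely hypothetical (`rng`, `sample`, and the chosen mean, spread and size are assumptions for illustration, not the notebook's real sample), so the printed numbers will not match the result discussed above.

```python
import numpy as np
from scipy import stats

# Hypothetical data: a synthetic stand-in for the notebook's sample.
rng = np.random.default_rng(42)
sample = rng.normal(loc=82.0, scale=5.0, size=100)

mu_0 = 80.0  # population mean under the null hypothesis
n = sample.size

# Standard error of the mean, using the sample standard deviation (ddof=1)
# in place of the unknown population standard deviation.
std_err = sample.std(ddof=1) / np.sqrt(n)
z_stat = (sample.mean() - mu_0) / std_err

# One-sided p-value for the alternative "population mean > mu_0":
# the upper-tail area of the standard normal distribution.
p_value = stats.norm.sf(z_stat)

print(f"z = {z_stat:.3f}, one-sided p = {p_value:.4g}")
```

With statsmodels installed, `ztest(sample, value=mu_0, alternative="larger")` from `statsmodels.stats.weightstats` should report the same statistic and p-value, since it also uses the sample standard deviation and the normal distribution.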

t-Test

We can also use a t-test, which should give very similar results: the number of degrees of freedom (n_samples - 1) is so large that the t-distribution closely resembles a standard normal distribution.

As we can see, they overlap almost completely.

What about the PDF of t distributions with different degrees of freedom?

So the PDF of the t-distribution with dof=30 almost completely matches the PDF of the standard normal distribution.
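To put a number on that visual impression, the sketch below measures the largest gap between the t-distribution PDF and the standard normal PDF for a few degrees of freedom. The grid range and the particular dof values are arbitrary choices made here for illustration.

```python
import numpy as np
from scipy import stats

# Evaluate both densities on a common grid around zero.
x = np.linspace(-4, 4, 801)
normal_pdf = stats.norm.pdf(x)

# Largest absolute difference between t(dof) and N(0, 1) on the grid.
max_gap = {}
for dof in (1, 5, 30, 100):
    max_gap[dof] = np.max(np.abs(stats.t.pdf(x, df=dof) - normal_pdf))
    print(f"dof = {dof:>3}: max |t pdf - normal pdf| = {max_gap[dof]:.4f}")
```

The gap shrinks steadily as the degrees of freedom grow, which is why the two tests give nearly identical answers for large samples.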

Look at CDF briefly:

Calculate the t statistic to test the hypothesis that the mean of the population the sample was drawn from is greater than a given population mean

With p < 0.05, we can reject the null at the 5% significance level (1 - 0.95 = 0.05, i.e. the 95% confidence level) in favor of the alternative hypothesis that the sample represents a population with a mean that is LARGER than the hypothetical population (that has a mean value of 80).
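A minimal sketch of the one-sided calculation, done by hand so each step is visible. The data are synthetic and hypothetical (`rng` and `sample` are assumptions for illustration, not the notebook's real sample), so the printed numbers will differ from the result discussed above.

```python
import numpy as np
from scipy import stats

# Hypothetical data: a synthetic stand-in for the notebook's sample.
rng = np.random.default_rng(42)
sample = rng.normal(loc=82.0, scale=5.0, size=100)

mu_0 = 80.0      # population mean under the null hypothesis
n = sample.size
dof = n - 1      # degrees of freedom for a one-sample t-test

# t statistic: distance of the sample mean from mu_0 in standard errors.
t_stat = (sample.mean() - mu_0) / (sample.std(ddof=1) / np.sqrt(n))

# One-sided p-value for the alternative "population mean > mu_0":
# the upper-tail area of the t-distribution with n - 1 degrees of freedom.
p_one_sided = stats.t.sf(t_stat, df=dof)

print(f"t = {t_stat:.3f}, dof = {dof}, one-sided p = {p_one_sided:.4g}")
```

Recent SciPy versions (1.6 and later) also accept `alternative="greater"` in `stats.ttest_1samp`, which should return the same one-sided p-value directly.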

Use the two-sided one-sample t-test function to check whether the mean of the sample differs significantly from a given population mean

With p < 0.05, we can reject the null at the 5% significance level (1 - 0.95 = 0.05, i.e. the 95% confidence level) in favor of the alternative hypothesis that the sample represents a different population than the hypothetical population with a mean value of 80.
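A minimal sketch of the two-sided test using `scipy.stats.ttest_1samp`, whose default alternative is two-sided. As before, the data are synthetic and hypothetical (`rng`, `sample` and `alpha` are names chosen here for illustration), so the output will not reproduce the notebook's numbers.

```python
import numpy as np
from scipy import stats

# Hypothetical data: a synthetic stand-in for the notebook's sample.
rng = np.random.default_rng(42)
sample = rng.normal(loc=82.0, scale=5.0, size=100)

mu_0 = 80.0   # population mean under the null hypothesis
alpha = 0.05  # significance level

# Two-sided one-sample t-test: is the population mean different from mu_0?
result = stats.ttest_1samp(sample, popmean=mu_0)
t_stat, p_two_sided = result.statistic, result.pvalue

reject_null = p_two_sided < alpha
print(f"t = {t_stat:.3f}, two-sided p = {p_two_sided:.4g}, reject null: {reject_null}")
```

The two-sided p-value counts both tails, so it is twice the one-sided tail area beyond |t|.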