The one sample t-test is very similar to the one sample z-test. A sample mean is being compared to a claimed population mean. The t-test is required when the population standard deviation is unknown. The t-test uses the sample’s standard deviation (not the population’s standard deviation) and the Student t-distribution as the sampling distribution to find a p-value.
While the z-test uses the normal distribution, which is only dependent on the mean and standard deviation of the population. Any of the various t-tests [one-sample, independent, dependent] use the t-distribution, which has an extra parameter over the normal distribution: degrees of freedom (df). The theoretical basis for degrees of freedom deserves a lot of attention, but for now for the one-sample t-test, df = n – 1.
The distributions above show how the degrees of freedom affect the shape of t distribution. The gray distribution is the normal distribution. Low df causes the tails of the distribution to be fatter, while a higher df makes the t-distribution become more like the normal distribution.
The practical outcome of this will be that samples with smaller n will need to be further from the population mean to reject the null hypothesis than samples with larger n. And compared to the z-test, the t-test will always need to have the sample mean further from the population mean.
Just like the one-sample z-test, we have to define our null hypothesis and alternate hypothesis. This time I’m going to show a two-tailed test. The null hypothesis will be that there is NO difference between the sample mean and the population mean. The alternate hypothesis will test to see if the sample mean is significantly different from the population mean. The null and alternate hypotheses are written out as:
The graphic above shows a t-distribution with a df = 5 with the critical regions highlighted. Since the shape of the distribution changes with degrees of freedom, the critical value for the t-test will change as well.
The t-stat for this test is calculated the same way as the z-stat for the z-test, except for the σ term [population standard deviation] in the z-test is replaced with s [sample standard deviation]:
Like the z-stat, the higher the t-stat is the more certainty there is that the sample mean and the population mean are different. There are three things make the t-stat larger:
- a bigger difference between sample mean and population mean
- a small sample standard deviation
- a larger sample size
Example in R
Since the one-sample t-test follows the same process as the z-test, I’ll simply show a case where you reject the null hypothesis. This will also be a two-tailed test, so we will use the null and alternate hypotheses found earlier on this page.
Once again using the height and weight data set from UCLA’s site, I’ll create a tall-biased sample of 50 people for us to test.
#reads data set
data <- read.csv('data/Height_data.csv')
height <- data$Height
#N - number in population
#n - number in sample
N <- length(height)
n <- 50
pop_mean <- mean(height)
cut <- 1:25000
weights <- cut^.6
sorted_height <- sort(height)
height_sample_biased <- sample(sorted_height, size=n, prob=weights)
This sample would represent something like athletes, CEOs, or maybe a meeting of tall people. After creating the sample, we use R's mean() and sd() functions to get the parameters for the t-stat formula from above.
sample_mean <- mean(height_sample_biased)
sample_sd <- sd(height_sample_biased)
Now using the population mean, the sample mean, the sample standard deviation, and the number of samples (n = 50) we can calculate the t-stat.
t <- (sample_mean - pop_mean) / (sample_sd/sqrt(n))
Now you could look up the critical value for the t-test with 49 degrees of freedom [50-1 = 49], but this is R, so we can find the area under the tail of the curve [the blue area from the critical region diagram] and see if it’s under 0.025. This will be our p-value, which is the probability that the value was achieved by random chance.
#p-value for t-test
The answer should be 0.006882297, which is well below 0.025, so the null hypothesis is rejected and the difference between the tall-biased sample and the general population is statistically significant.
You can find the full R code including code to create the t-distribution and normal distribution data sets on my GitHub .