Data Science with Python: T-Test VS Z-Test Comparison

Hypothesis Test (t-test Vs z-test):

Hypothesis: A supposition which must be accepted or rejected is called as a hypothesis.

Hypothesis testing procedures are classified in to 2 types:

1) Parametric test (Is based on the fact, that the variables are measured on an interval scale)

a) t-test

b) z-test

2) Non-parametric test (Is assumed to be measured on an ordinal scale)

T-Test and Z-test Comparison:

	T-Test	Z-Test
Definition:	T-test refers to a univariate hypothesis test based on t-statistic, wherein the mean is known, and population variance is approximated from the sample. è More precisely, a t-test is used to examine how the means taken from two independent samples differ. è T-test follows t-distribution, which is appropriate when the sample size is small, and the population standard deviation is not known. è The shape of a t-distribution is highly affected by the degree of freedom. The degree of freedom implies the number of independent observations in a given set of observations. Paired t-test: A statistical test applied when the two samples are dependent and paired observations are taken.	Z-test is also a univariate test that is based on standard normal distribution. è Z-test refers to a univariate statistical analysis used to test the hypothesis that proportions from two independent samples differ greatly. è It determines to what extent a data point is away from its mean of the data set, in standard deviation. è Z-test can be adopted when the population variance is known when there is a large sample size, sample variance is deemed to be approximately equal to the population variance.
Key Differences:	è Based on t-distribution. è T-test can be applied when the means of the two population is different from one another. è Population variance is unknown. è Can be applied when Sample Size is Small (<30)	è Based on Normal-distribution. è Z-test can be applied when the standard deviation is known, to determine, if the means of the two datasets differ from each other. è Population variance is known. è Can be applied when Sample Size is large (>30)
Assumptions:	All data points are independent. The sample size is small. Generally, a sample size exceeding 30 sample units is regarded as large, otherwise small but that should not be less than 5, to apply t-test. Sample values are to be taken and recorded accurately. The test statistic is: x ̅is the sample mean s is sample standard deviation n is sample size and μ is the population mean	All sample observations are independent Sample size should be more than 30. Distribution of Z is normal, with a mean zero and variance 1. The test statistic is: x ̅is the sample mean σ is population standard deviation n is sample size μ is the population mean

Useful Flow Charts:

Examples:

Z-Test Example:

Suppose, the mean height of women is 65″ with a standard deviation of 3.5″. What is the probability of finding a random sample of 50 women with a mean height of 70″, assuming the heights are normally distributed?

z = (x – μ) / (σ / √n) = (70 – 65) / (3.5/√50) = 5 / 0.495 = 10.1

The key here is that we’re dealing with a sampling distribution of means, so we know we have to include the standard error in the formula.

We also know that 99% of values fall within 3 standard deviations from the mean in a normal probability distribution.

Therefore, there’s less than 1% probability that any sample of women will have a mean height of 70″.

Example for Independent t-test:

A research study was conducted to examine the differences between older and younger adults on perceived life satisfaction. A pilot study was conducted to examine this hypothesis. Ten older adults (over the age of 70) and ten younger adults (between 20 and 30) were give a life satisfaction test (known to have high reliability and validity). Scores on the measure range from 0 to 60 with high scores indicative of high life satisfaction; low scores indicative of low life satisfaction. The data are presented below.

Older Adults Younger Adults

45 34

38 22

52 15

48 27

25 37

39 41

51 24

46 19

55 26

46 36

Mean = 44.5 Mean = 28.1,

S = 8.682677518 S = 8.543353492

S2 = 75.388888888 S2 = 72.988888888

By using the formula for t-test, the appropriate t-test value is = 4.257

Data Science with Python

Wednesday, September 18, 2019

T-Test VS Z-Test Comparison

1 comment:

ML-Model DecisionTree Example-IncomePrediction

Report Abuse

Labels