Chi-Square: Tests for Goodness of Fit and Independence
Week 12 Chapter 18
Parametric vs. Nonparametric
Tests that concern population parameters and require assumptions about those
parameters (t tests and ANOVAs) are called parametric tests. These tests require
a numerical score for each individual; thus the data come from an interval or
ratio scale.
When assumptions of a test are violated, the test may lead to errors in
interpretation. What can researchers do when their situations do not meet
assumptions required for parametric tests?
Nonparametric Tests
Hypotheses are not stated in terms of a specific parameter, and they make few
assumptions about the distribution; sometimes called distribution-free tests.
Nonparametric tests use categorical variables measured on nominal or ordinal
scales; since means and variances cannot be calculated for such data, the tests
are usually based on frequencies.
Parametric tests are more sensitive at detecting differences; thus researchers
should prefer a parametric test over a nonparametric one whenever the data
permit.
Chi-Square Tests
Chi-square tests are intended for research questions concerning the proportion
of the population in different categories.
For a chi-square test, there is not a numerical score for each individual and
you do not compute a sample mean or a sample variance.
Instead, each individual is simply classified into a category and you count the
number of individuals in each category. The resulting data are called observed
frequencies.
Chi-Square Test for Goodness of Fit
The data for the chi-square test for goodness of fit consist of a sample of
individuals who have been classified into categories of one variable.
The numbers of individuals in each category are called the observed frequencies.
The null hypothesis for the chi-square test for goodness of fit typically falls
into one of two types:
(1) a no-preference hypothesis which states that the population is distributed
evenly across the categories
(2) a no-difference hypothesis which states that the population distribution is
not different from an established distribution.
In either case, the proportions from the null hypothesis are used to construct
an ideal sample distribution, called expected frequencies, and then the
chi-square statistic is computed to determine how well the data (observed
frequencies) fit the hypothesis (expected frequencies).
A big discrepancy between the data and the hypothesis results in a large value
for chi-square, which leads to rejecting the null hypothesis.
Observed Frequencies
The observed frequency is the number of individuals from the sample who are
classified in a particular category. Each individual is counted in one and only
one category.
In the following example, each person is classified into one of three
personality categories: A, B, or C (n = 40).

Personality    f
A             15
B             19
C              6
Expected Frequencies
The expected frequency is the value that is predicted from the null hypothesis
and the sample size (n). This defines an ideal, hypothetical sample distribution
that would be obtained if the sample proportions were in perfect agreement with
the proportions specified in the null.
First, construct the hypothetical sample based on the null:
Personality    Proportion (n = 40)
A              25%
B              50%
C              25%
Then, calculate the raw numbers of individuals based on sample size n = 40 for
each of the categories. Where p is the proportion stated in the null and n is
the sample size,
expected frequency = fe = pn
25% of 40 = 0.25(40) = 10 individuals in category A
50% of 40 = 0.50(40) = 20 individuals in category B
25% of 40 = 0.25(40) = 10 individuals in category C
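The fe = pn computation above can be sketched in a few lines of Python (an illustrative sketch, not part of the text; category names and proportions are those of the personality example):

```python
# Proportions specified by the null hypothesis (no-preference example)
null_proportions = {"A": 0.25, "B": 0.50, "C": 0.25}
n = 40  # sample size

# Expected frequency for each category: fe = p * n
expected = {category: p * n for category, p in null_proportions.items()}
print(expected)  # {'A': 10.0, 'B': 20.0, 'C': 10.0}
```

Note that expected frequencies need not be whole numbers; they describe an ideal, hypothetical sample, not actual counts.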
Chi-Square Test
First, find the difference between fo (data) and fe (hypothesis) for each
category.
Second, square the difference to ensure that all values are positive.
Next, divide the squared differences by fe.
Finally, add the values from all the categories.
Chi-square = Χ² = ∑ (fo − fe)² / fe
When there are large differences between fo and fe, the value of chi-square is
large, and we conclude that the data do not fit the hypothesis. Thus, a large
chi-square value leads us to reject the null.
When there are small differences between fo and fe, the chi-square is small, and
we conclude that there is a very good fit between the data and the hypothesis.
Thus, a small chi-square value leads us to fail to reject the null.
To decide whether a value is large or small we must refer to a chi-square
distribution.
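The four steps above translate directly into a short Python function (a hypothetical helper for illustration, applied here to the personality example with observed frequencies 15, 19, 6 and expected frequencies 10, 20, 10):

```python
def chi_square(f_obs, f_exp):
    """Sum (fo - fe)^2 / fe over all categories."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(f_obs, f_exp))

# Personality example: observed A=15, B=19, C=6; expected 10, 20, 10
stat = chi_square([15, 19, 6], [10, 20, 10])
print(round(stat, 2))  # 4.15
```

For these data Χ² = 4.15 with df = C − 1 = 2; since 4.15 is below the .05 critical value of 5.99, the null would not be rejected.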
Chi-Square Distribution and Degrees of Freedom
Chi-square distribution has the following characteristics:
The formula for chi-square involves adding squared values, so you can never
obtain a negative value. Thus, all chi-square values are zero or larger.
When the null is true, you expect the data (observed values) to be close to the
hypothesis (expected values). Thus, we expect chi-square values to be small when
the null is true.
These 2 factors suggest that the typical chi-square distribution will be
positively skewed.
The degrees of freedom are determined by df = C − 1, where C is the number of
categories.
How should we hang the abstract painting? Observed frequencies (n = 50) are as
follows:

Orientation          f
top up (correct)    18
bottom up           17
left side up         7
right side up        8
State the hypotheses and select an alpha level
Null: There is no preference for an orientation.
Alternative: One or more orientations is preferred over the others.
alpha level = .05
Locate the critical region in table B.8: df = C − 1 = 4 − 1 = 3
Critical value = 7.81
Calculate the chi-square statistic
First, compute fe; null specifies 25% for each category with n = 50.
fe = pn = 0.25(50) = 12.5
Second, calculate chi-square
chi-square = Χ² = ∑ (fo − fe)² / fe = 8.08
State a decision and a conclusion: The obtained chi-square value is in the
critical region. Therefore, the null is rejected and the researcher may conclude
that the four orientations are not equally likely to be preferred.
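As a check on the arithmetic, the same computation in plain Python (scipy.stats.chisquare would return the same statistic along with an exact p-value):

```python
f_obs = [18, 17, 7, 8]        # top up, bottom up, left side up, right side up
n = sum(f_obs)                # 50
f_exp = [n / 4] * 4           # 12.5 per orientation under the no-preference null

chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(f_obs, f_exp))
print(round(chi2, 2))  # 8.08, which exceeds the critical value of 7.81
```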
Results
APA style formatting for reporting chi-square statistic in scientific journals
is as follows:
The participants showed significant preferences among the four orientations
for hanging the painting, Χ²(3, n = 50) = 8.08, p < .05.
The chi-square test for goodness of fit is a nonparametric version of the
single-sample t-test. Both use data from a single sample to test hypotheses
about a single population.
Chi-Square Test for Independence
The chi-square statistic may also be used to test whether there is a
relationship between 2 variables.
For the chi-square test for independence, each individual in the sample is
classified into one category for each of two different variables.
The categories of one variable form the rows of a data matrix and the categories
of the second variable form the columns. The number of individuals in each cell
of the matrix is the observed frequency for that cell.
Null Hypotheses
Ho version 1: For the general population of students, there is no relationship
between color preference and personality.
Evaluates relationship between 2 variables
Demonstrates similarity with correlation
Ho version 2: In the population of students, there is no difference between the
distribution of color preferences for introverts and the distribution of color
preferences for extroverts. The two distributions have the same shape
(proportions).
Determines significant differences
Demonstrates similarity with t test or ANOVA.
The two versions are essentially equivalent: a relationship between the
variables means the distributions differ across groups.
Observed and Expected Frequencies
Using the table of observed frequencies, remove all numbers within cells but
keep the column and row totals.
Calculate the % of individuals choosing each color, regardless of personality
type.
100 out of 200 = 50% prefer red
20 out of 200 = 10% prefer yellow
40 out of 200 = 20% prefer green
40 out of 200 = 20% prefer blue
Find fe for both personality types by applying the overall distribution of
color preferences to each.
Table 18.4 (p. 595)
For the 50 introverts:
.5 choose red: = 0.50(50) = 25
.1 choose yellow: = 0.10(50) = 5
.2 choose green: = 0.20(50) = 10
.2 choose blue: = 0.20(50) = 10
For the 150 extroverts:
.5 choose red: = 0.50(150) = 75
.1 choose yellow: = 0.10(150) = 15
.2 choose green: = 0.20(150) = 30
.2 choose blue: = 0.20(150) = 30
Chi-Square Statistic
Another form for determining expected frequencies is:
fe = fcfr / n
where fc is the column total and fr is the row total for a given cell.
The chi-square test for independence uses exactly the same formula as the test
for goodness of fit:
chi-square = Χ² = ∑ (fo − fe)² / fe
The degrees of freedom are df = (R − 1)(C − 1), where R is the number of rows
and C is the number of columns.
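The fe = fcfr/n shortcut can be sketched with the marginal totals from the color-preference example (a minimal illustration; dictionary keys are just labels):

```python
# Row and column totals from the color-preference example (n = 200)
row_totals = {"introvert": 50, "extrovert": 150}
col_totals = {"red": 100, "yellow": 20, "green": 40, "blue": 40}
n = 200

# fe = (column total * row total) / n for every cell
expected = {
    (row, col): rt * ct / n
    for row, rt in row_totals.items()
    for col, ct in col_totals.items()
}
print(expected[("introvert", "red")])    # 25.0
print(expected[("extrovert", "green")])  # 30.0
```

These match the values obtained earlier by applying the overall proportions (.50, .10, .20, .20) to each group.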
Example
In a research situation involving academic performance and self-esteem of
10-year-old children, consider the differences between a Pearson correlation, a
t test, and a chi-square analysis.
Pearson: numerical scores for both variables, compute means, SS values, and SP;
measures relationship
t test: numerical scores for 1 variable, categorical variable for the other;
compute means and SS for each group; measures significant differences between
groups
chi-square: categories for both variables; frequencies are used; measures
significant differences between groups.
Step 1: State hypotheses
Version 1:
Null: There is no relationship
between academic performance and self-esteem.
Alternative: There is a consistent,
predictable relationship between performance and self-esteem.
Version 2:
Null: The distribution of self-esteem
is the same for low and high academic performers.
Alternative: The distribution of
self-esteem for high performers is different from the distribution for low
academic performers.
Step 2: Determine degrees of freedom, and locate the critical region
df = (R − 1)(C − 1) = (2 − 1)(3 − 1) = 2
Determine expected frequencies
High performers:
20% of 60 = 12 students with high self-esteem
50% of 60 = 30 students with medium self-esteem
30% of 60 = 18 students with low self-esteem
Low performers:
20% of 90 = 18 students with high self-esteem
50% of 90 = 45 students with medium self-esteem
30% of 90 = 27 students with low self-esteem
Table 18.8 (p. 600)
Compute chi-square:
chi-square = Χ² = ∑ (fo − fe)² / fe
= 2.08 + 0.13 + 2.72 + 1.39 + 0.09 + 1.81
= 8.22
Make a decision: The obtained chi-square exceeds the critical value (5.99), so
the null is rejected. This result would be reported as Χ²(2, n = 150) = 8.22,
p < .05.
Version 1: There is a significant relationship between
academic performance and self-esteem
Version 2: There is a significant difference in self-esteem
between high and low academic performers
Effect Size
The measures of effect size are interpreted the same as correlations, and
provide a measure of the strength of the relationship.
When the chi-square test involves a 2 x 2 matrix, use the phi-coefficient.
When the chi-square test involves a matrix larger than 2 x 2, use Cramér's V, a
modification of the phi-coefficient.
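Both measures follow directly from the chi-square statistic: phi = √(Χ²/n), and Cramér's V = √(Χ²/(n(k − 1))), where k is the smaller of R and C. A small sketch (function names are illustrative), applied to the academic-performance example:

```python
import math

def phi_coefficient(chi2, n):
    """Effect size for a 2 x 2 chi-square test: sqrt(chi2 / n)."""
    return math.sqrt(chi2 / n)

def cramers_v(chi2, n, rows, cols):
    """Cramer's V for larger matrices: sqrt(chi2 / (n * (k - 1))),
    where k is the smaller of the number of rows and columns."""
    k = min(rows, cols)
    return math.sqrt(chi2 / (n * (k - 1)))

# Academic-performance example: chi-square = 8.22, n = 150, 2 x 3 matrix
print(round(cramers_v(8.22, 150, 2, 3), 2))  # 0.23
```

Like a correlation, the result (here about .23) indicates a small-to-medium strength of relationship, independent of sample size.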
Assumptions and Restrictions
Independence of observations: Each observed frequency is generated by a
different individual. A chi-square test would be inappropriate if a person
could produce responses classified in more than one category or contribute
more than one frequency count to a single category.
Size of expected frequencies: A chi-square test should not be performed when
the expected frequency of any cell is less than 5; small expected frequencies
can distort the statistic.
Special Applications
Chi-square and Pearson correlation
Categorical vs. continuous data
Significance vs. strength (effect size)
Chi-square and Phi-coefficient
Phi-coefficient measures the relationship between 2 dichotomous variables;
this can also be done with a 2 x 2 chi-square test for independence
Significance vs. strength (effect size)
Independent-measures t test and ANOVA require that the DV is continuous.
Chi-square can be substituted when the DV is categorical (nominal or ordinal).