Chi-Square: Tests for Goodness of Fit and Independence
Week 12 Chapter 18
Parametric vs. Nonparametric
Tests that concern population parameters and require assumptions about those
parameters (t tests and ANOVAs) are called parametric tests. These tests require
a numerical score for each individual; thus the data come from an interval or
ratio scale.
When assumptions of a test are violated, the test may lead to errors in
interpretation. What can researchers do when their situations do not meet
assumptions required for parametric tests?
Nonparametric Tests
Hypotheses are not stated in terms of a specific parameter, and they make few
assumptions about the distribution; sometimes called distribution-free tests.
Nonparametric tests use categorical variables measured on nominal or ordinal
scales; since means and variances cannot be calculated for such data, the tests
are usually based on frequencies.
Parametric tests are more sensitive at detecting differences; thus researchers
should prefer a parametric test over a nonparametric one whenever the data
permit.
Chi-Square Tests
Chi-square tests are intended for research questions concerning the proportion
of the population in different categories.
For a chi-square test, there is not a numerical score for each individual and
you do not compute a sample mean or a sample variance.
Instead, each individual is simply classified into a category and you count the
number of individuals in each category. The resulting data are called observed
frequencies.
Chi-Square Test for Goodness of Fit
The data for the chi-square test for goodness of fit consist of a sample of
individuals who have been classified into categories of one variable.
The numbers of individuals in each category are called the observed frequencies.
The null hypothesis for the chi-square test for goodness of fit typically falls
into one of two types:
(1) a no-preference hypothesis which states that the population is distributed
evenly across the categories
(2) a no-difference hypothesis which states that the population distribution is
not different from an established distribution.
In either case, the proportions from the null hypothesis are used to construct
an ideal sample distribution, called expected frequencies, and then the
chi-square statistic is computed to determine how well the data (observed
frequencies) fit the hypothesis (expected frequencies).
A big discrepancy between the data and the hypothesis results in a large value
for chi-square, which leads to rejecting the null hypothesis.
Observed Frequencies
The observed frequency is the number of individuals from the sample who are
classified in a particular category. Each individual is counted in one and only
one category.
In the following example, each person is classified into one of three
personality categories: A, B, or C (n = 40).

Personality    f
A             15
B             19
C              6
Expected Frequencies
The expected frequency is the value that is predicted from the null hypothesis
and the sample size (n). This defines an ideal, hypothetical sample distribution
that would be obtained if the sample proportions were in perfect agreement with
the proportions specified in the null.
First, construct the hypothetical sample based on the null:
Personality    Proportion (n = 40)
A              25%
B              50%
C              25%
Then, calculate the raw numbers of individuals based on sample size n = 40 for
each of the categories. Where p is the proportion stated in the null and n is
the sample size,
expected frequency = fe = pn
25% of 40 = 0.25(40) = 10 individuals in category A
50% of 40 = 0.50(40) = 20 individuals in category B
25% of 40 = 0.25(40) = 10 individuals in category C
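The fe = pn computation above can be sketched in a few lines of Python (an illustrative sketch, not part of the text; category names and proportions are those of the personality example):

```python
# Proportions specified by the null hypothesis (no-preference example)
null_proportions = {"A": 0.25, "B": 0.50, "C": 0.25}
n = 40  # sample size

# Expected frequency for each category: fe = p * n
expected = {category: p * n for category, p in null_proportions.items()}
print(expected)  # {'A': 10.0, 'B': 20.0, 'C': 10.0}
```

Note that expected frequencies need not be whole numbers; they describe an ideal, hypothetical sample, not actual counts.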
Chi-Square Test
First, find the difference between fo (data) and fe (hypothesis) for each
category.
Second, square the difference to ensure that all values are positive.
Next, divide the squared differences by fe.
Finally, add the values from all the categories.
Chi-square = Χ² = ∑ (fo − fe)² / fe
When there are large differences between fo and fe, the value of chi-square is
large, and we conclude that the data do not fit the hypothesis. Thus, a large
chi-square value leads us to reject the null.
When there are small differences between fo and fe, the chi-square is small, and
we conclude that there is a very good fit between the data and the hypothesis.
Thus, a small chi-square value leads us to fail to reject the null.
To decide whether a value is large or small we must refer to a chi-square
distribution.
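The four steps above translate directly into a short Python function (a hypothetical helper for illustration, applied here to the personality example with observed frequencies 15, 19, 6 and expected frequencies 10, 20, 10):

```python
def chi_square(f_obs, f_exp):
    """Sum (fo - fe)^2 / fe over all categories."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(f_obs, f_exp))

# Personality example: observed A=15, B=19, C=6; expected 10, 20, 10
stat = chi_square([15, 19, 6], [10, 20, 10])
print(round(stat, 2))  # 4.15
```

For these data Χ² = 4.15 with df = C − 1 = 2; since 4.15 is below the .05 critical value of 5.99, the null would not be rejected.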
Chi-Square Distribution and Degrees of Freedom
Chi-square distribution has the following characteristics:
The formula for chi-square involves adding squared values, so you can never
obtain a negative value. Thus, all chi-square values are zero or larger.
When the null is true, you expect the data (observed values) to be close to the
hypothesis (expected values). Thus, we expect chi-square values to be small when
the null is true.
These 2 factors suggest that the typical chi-square distribution will be
positively skewed.
The degrees of freedom are determined by df = C − 1, where C is the number of
categories.
How should we hang the abstract painting? Observed frequencies (n = 50) are as
follows:

Orientation          f
top up (correct)    18
bottom up           17
left side up         7
right side up        8
State the hypotheses and select an alpha level
Null: There is no preference for an orientation.
Alternative: One or more orientations is preferred over the others.
alpha level = .05
Locate the critical region in table B.8: df = C − 1 = 4 − 1 = 3
Critical value = 7.81
Calculate the chi-square statistic
First, compute fe; null specifies 25% for each category with n = 50.
fe = pn = 0.25(50) = 12.5
Second, calculate chi-square
chi-square = Χ² = ∑ (fo − fe)² / fe = 8.08
State a decision and a conclusion: The obtained chi-square value is in the
critical region. Therefore, the null is rejected and the researcher may conclude
that the four orientations are not equally likely to be preferred.
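As a check on the arithmetic, the same computation in plain Python (scipy.stats.chisquare would return the same statistic along with an exact p-value):

```python
f_obs = [18, 17, 7, 8]        # top up, bottom up, left side up, right side up
n = sum(f_obs)                # 50
f_exp = [n / 4] * 4           # 12.5 per orientation under the no-preference null

chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(f_obs, f_exp))
print(round(chi2, 2))  # 8.08, which exceeds the critical value of 7.81
```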
Results
APA style formatting for reporting chi-square statistic in scientific journals
is as follows:
The participants showed significant preferences among the four orientations
for hanging the painting, Χ²(3, n = 50) = 8.08, p < .05.
The chi-square test for goodness of fit is a nonparametric version of the
single-sample t-test. Both use data from a single sample to test hypotheses
about a single population.
Chi-Square Test for Independence
The chi-square statistic may also be used to test whether there is a
relationship between 2 variables.
For the chi-square test for independence, each individual in the sample is
classified into one category for each of two different variables.
The categories of one variable form the rows of a data matrix and the categories
of the second variable form the columns. The number of individuals in each cell
of the matrix is the observed frequency for that cell.
Null Hypotheses
Ho version 1: For the general population of students, there is no relationship
between color preference and personality.
Evaluates relationship between 2 variables
Demonstrates similarity with correlation
Ho version 2: In the population of students, there is no difference between the
distribution of color preferences for introverts and the distribution of color
preferences for extroverts. The two distributions have the same shape
(proportions).
Determines significant differences
Demonstrates similarity with t test or ANOVA.
The two versions are essentially equivalent: a relationship between the
variables means the distributions differ across groups.
Observed and Expected Frequencies
Using the table of observed frequencies, remove all numbers within cells but
keep the column and row totals.
Calculate the % of individuals choosing each color, regardless of personality
type.
100 out of 200 = 50% prefer red
20 out of 200 = 10% prefer yellow
40 out of 200 = 20% prefer green
40 out of 200 = 20% prefer blue
Find fe for both personality types by applying the overall distribution of
color preferences to each.
Table 18.4 (p. 595)
For the 50 introverts:
.5 choose red: = 0.50(50) = 25
.1 choose yellow: = 0.10(50) = 5
.2 choose green: = 0.20(50) = 10
.2 choose blue: = 0.20(50) = 10
For the 150 extroverts:
.5 choose red: = 0.50(150) = 75
.1 choose yellow: = 0.10(150) = 15
.2 choose green: = 0.20(150) = 30
.2 choose blue: = 0.20(150) = 30
Chi-Square Statistic
Another form for determining expected frequencies is:
fe = fcfr / n
where fc is the column total and fr is the row total for a given cell.
The chi-square test for independence uses exactly the same formula as the test
for goodness of fit:
chi-square = Χ² = ∑ (fo − fe)² / fe
The degrees of freedom are df = (R − 1)(C − 1), where R is the number of rows
and C is the number of columns.
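The fe = fcfr/n shortcut can be sketched with the marginal totals from the color-preference example (a minimal illustration; dictionary keys are just labels):

```python
# Row and column totals from the color-preference example (n = 200)
row_totals = {"introvert": 50, "extrovert": 150}
col_totals = {"red": 100, "yellow": 20, "green": 40, "blue": 40}
n = 200

# fe = (column total * row total) / n for every cell
expected = {
    (row, col): rt * ct / n
    for row, rt in row_totals.items()
    for col, ct in col_totals.items()
}
print(expected[("introvert", "red")])    # 25.0
print(expected[("extrovert", "green")])  # 30.0
```

These match the values obtained earlier by applying the overall proportions (.50, .10, .20, .20) to each group.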
Example
In a research situation involving academic performance and self-esteem of
10-year-old children, consider the differences between a Pearson correlation, a
t test, and a chi-square analysis.
Pearson: numerical scores for both variables, compute means, SS values, and SP;
measures relationship
t test: numerical scores for 1 variable, categorical variable for the other;
compute means and SS for each group; measures significant differences between
groups
chi-square: categories for both variables; frequencies are used; measures
significant differences between groups.
Step 1: State hypotheses
Version 1:
Null: There is no relationship
between academic performance and self-esteem.
Alternative: There is a consistent,
predictable relationship between performance and self-esteem.
Version 2:
Null: The distribution of self-esteem
is the same for low and high academic performers.
Alternative: The distribution of
self-esteem for high performers is different from the distribution for low
academic performers.
Step 2: Determine degrees of freedom, and locate the critical region
df = (R − 1)(C − 1) = (2 − 1)(3 − 1) = 2
Determine expected frequencies
High performers:
20% of 60 = 12 students with high self-esteem
50% of 60 = 30 students with medium self-esteem
30% of 60 = 18 students with low self-esteem
Low performers:
20% of 90 = 18 students with high self-esteem
50% of 90 = 45 students with medium self-esteem
30% of 90 = 27 students with low self-esteem
Table 18.8 (p. 600)
Compute chi-square:
chi-square = Χ² = ∑ (fo − fe)² / fe
= 2.08 + 0.13 + 2.72 + 1.39 + 0.09 + 1.81
= 8.22
Make a decision: The obtained chi-square exceeds the critical value (5.99), so
the null is rejected. This result would be reported as Χ²(2, n = 150) = 8.22,
p < .05.
Version 1: There is a significant relationship between
academic performance and self-esteem
Version 2: There is a significant difference in self-esteem
between high and low academic performers
Effect Size
The measures of effect size are interpreted the same as correlations, and
provide a measure of the strength of the relationship.
When the chi-square test involves a 2 x 2 matrix, use the phi-coefficient.
When the chi-square test involves a matrix larger than 2 x 2, use Cramér's V, a
modification of the phi-coefficient.
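Both measures follow directly from the chi-square statistic: phi = √(Χ²/n), and Cramér's V = √(Χ²/(n(k − 1))), where k is the smaller of R and C. A small sketch (function names are illustrative), applied to the academic-performance example:

```python
import math

def phi_coefficient(chi2, n):
    """Effect size for a 2 x 2 chi-square test: sqrt(chi2 / n)."""
    return math.sqrt(chi2 / n)

def cramers_v(chi2, n, rows, cols):
    """Cramer's V for larger matrices: sqrt(chi2 / (n * (k - 1))),
    where k is the smaller of the number of rows and columns."""
    k = min(rows, cols)
    return math.sqrt(chi2 / (n * (k - 1)))

# Academic-performance example: chi-square = 8.22, n = 150, 2 x 3 matrix
print(round(cramers_v(8.22, 150, 2, 3), 2))  # 0.23
```

Like a correlation, the result (here about .23) indicates a small-to-medium strength of relationship, independent of sample size.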
Assumptions and Restrictions
Independence of observations: Each observed frequency is generated by a
different individual. A chi-square test would be inappropriate if a person
could produce responses classified in more than one category or contribute
more than one frequency count to a single category.
Size of expected frequencies: A chi-square test should not be performed when
the expected frequency of any cell is less than 5; small expected frequencies
can distort the statistic.
Special Applications
Chi-square and Pearson correlation
Categorical vs. continuous data
Significance vs. strength (effect size)
Chi-square and Phi-coefficient
Phi-coefficient measures the relationship between 2 dichotomous variables;
this can also be done with a 2 x 2 chi-square test for independence
Significance vs. strength (effect size)
Independent-measures t test and ANOVA require that the DV is continuous.
Chi-square can be substituted when the DV is categorical (nominal or ordinal).