Reliability
Psychology of Testing & Measurements
Lecture, Chapter 4
Introduction
What makes measurement in psychology so difficult?
Psychologists must measure abstract qualities, such as grief, trauma,
intelligence, and/or aggressiveness.
Because of the lack of precise instruments, such as a ruler, psychologists have
become concerned with measurement error.
What exactly is error?
Measurement error is the difference between a person’s observed score and
their true score.
Some degree of measurement error occurs in most physical, social, and biological
sciences.
Tests that are relatively free of measurement error are deemed to be reliable.
What factors might cause error in measurement?
Situational – testing methods
Illness of participant
Degree of participant’s physical comfort (e.g., temperature)
Time factor (between pre-test and post-test)
Unclear items (questions)
Disagreement among observers (in interview or observation settings)
Classical Test Score Theory
Assumes that each person has a “true” score that would be obtained without
error.
Each person’s observed score almost always differs from their true score.
X (observed) = T (true) + E (error)
Assumes that errors of measurement are random.
Assumes that a person’s true score will not change with repeated testing;
different scores are produced because of random error.
Uses the standard error of measurement, the standard deviation of the errors,
to estimate how far an observed score is likely to fall from the true score.
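The standard error of measurement can be illustrated with a short sketch. It assumes the conventional formula SEM = SD × sqrt(1 − r), where SD is the standard deviation of the test scores and r is the reliability coefficient; the function names and the example numbers (SD of 15, reliability of .91, observed score of 100) are hypothetical and chosen only for illustration.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r), the conventional classical test theory formula."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)

def score_band(observed, sem, z=1.96):
    """Approximate band around an observed score (z = 1.96 for roughly 95%)."""
    return (observed - z * sem, observed + z * sem)

# Hypothetical test: standard deviation 15, reliability .91
sem = standard_error_of_measurement(15, 0.91)
low, high = score_band(100, sem)
print(round(sem, 2), round(low, 2), round(high, 2))
```

The higher the reliability, the smaller the SEM and the narrower the band around any observed score.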
Models of Reliability
Time Sampling: Test-Retest Method
Parallel or equivalent forms reliability
Internal consistency: Split-Half Method
KR20 Formula
Coefficient Alpha
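The models listed above can each be computed from a table of scores. The following is a minimal sketch, not a definitive implementation: the function names and the small data sets are hypothetical, items are assumed to be scored 0/1 for KR-20, and the split-half version assumes an odd/even split corrected with the Spearman-Brown formula. Population variances are used throughout so that KR-20 and coefficient alpha agree on 0/1 items.

```python
from statistics import mean, pvariance

def pearson_r(x, y):
    """Pearson correlation; used for test-retest and parallel-forms reliability."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def split_half(items):
    """Correlate odd-item and even-item half scores, then apply Spearman-Brown."""
    odd = [sum(row[0::2]) for row in items]
    even = [sum(row[1::2]) for row in items]
    r_half = pearson_r(odd, even)
    return 2 * r_half / (1 + r_half)

def coefficient_alpha(items):
    """Alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(items[0])
    item_vars = [pvariance([row[j] for row in items]) for j in range(k)]
    total_var = pvariance([sum(row) for row in items])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def kr20(items):
    """KR-20 for 0/1 items: k/(k-1) * (1 - sum(p*q) / variance of total scores)."""
    k, n = len(items[0]), len(items)
    pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in items) / n  # proportion passing item j
        pq += p * (1 - p)
    total_var = pvariance([sum(row) for row in items])
    return k / (k - 1) * (1 - pq / total_var)

# Hypothetical data: the same five people tested on two occasions
time1 = [10, 12, 15, 18, 20]
time2 = [11, 12, 14, 19, 19]

# Hypothetical data: six people (rows) answering four 0/1 items (columns)
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

print("test-retest:", round(pearson_r(time1, time2), 3))
print("split-half :", round(split_half(responses), 3))
print("KR-20      :", round(kr20(responses), 3))
print("alpha      :", round(coefficient_alpha(responses), 3))
```

Parallel-forms reliability uses the same correlation as test-retest, with scores on the second form in place of scores from the second occasion.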
How Reliable is Reliable?
A reliability of .70 to .80 is suggested as good enough in most cases.
When tests are used to make important decisions about someone’s future,
evaluators must be certain to minimize any error in classification.
As a result, for a decision that could affect someone’s future, evaluators
should attempt to find tests with a reliability greater than .95.
Low reliability can be addressed by increasing the length of the test or by
discarding items that lower the reliability.
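The effect of lengthening a test can be estimated with the Spearman-Brown prophecy formula. The sketch below assumes the added items are comparable in quality to the existing ones; the function names and the starting reliability of .70 are hypothetical.

```python
def spearman_brown(r, n):
    """Predicted reliability when a test is made n times as long."""
    return n * r / (1 + (n - 1) * r)

def length_factor(r, target):
    """How many times longer the test must be to reach the target reliability."""
    return target * (1 - r) / (r * (1 - target))

# Hypothetical test with a current reliability of .70
doubled = spearman_brown(0.70, 2)      # reliability if the test is doubled
needed = length_factor(0.70, 0.95)     # lengthening needed to reach .95
print(round(doubled, 3), round(needed, 2))
```

Under these assumptions, doubling the test raises the estimate only modestly, while reaching .95 would require a test several times its current length, which is one reason discarding weak items is also worth considering.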