Reliability
Psychology of Testing & Measurements
Lecture, Chapter 4

Introduction
What makes measurement in psychology so difficult?
Psychologists must measure abstract qualities, such as grief, trauma, intelligence, and aggressiveness.
Because psychology lacks precise instruments comparable to a ruler, psychologists are especially concerned with measurement error.

What exactly is error?
Measurement error is the difference between a person’s true score and their observed score.
Some degree of measurement error occurs in most physical, social, and biological sciences.
Tests that are relatively free of measurement error are deemed to be reliable.

What factors might cause error in measurement?
Situational factors (testing methods)
Illness of the participant
Participant’s degree of physical comfort (e.g., temperature)
Time factor (in pre-test/post-test designs)
Unclear items (questions)
Disagreement among observers (in interview or observation settings)

Classical Test Score Theory
Assumes that each person has a “true” score that would be obtained if there were no measurement error.
A person’s observed score almost always differs from their true score.
X (observed) = T (true) + E (error)
Assumes that errors of measurement are random.
Assumes that a person’s true score will not change with repeated testing; different scores are produced because of random error.
Uses the standard error of measurement, which is the standard deviation of the errors, to estimate how far an observed score is likely to fall from the true score.
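
The standard error of measurement is commonly computed from the standard deviation of observed scores and a reliability coefficient as SEM = SD * sqrt(1 - r). The sketch below applies that formula and builds a band around an observed score. The SD of 15, the reliability of .91, the observed score of 100, and the function names are all hypothetical, chosen only for illustration; the 1.96 multiplier assumes normally distributed errors.

```python
import math


def standard_error_of_measurement(sd_observed: float, reliability: float) -> float:
    """Return SEM = SD * sqrt(1 - r) for a reliability r between 0 and 1."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd_observed * math.sqrt(1.0 - reliability)


def score_band(observed: float, sem: float, z: float = 1.96) -> tuple[float, float]:
    """Return (low, high) limits of observed +/- z * SEM."""
    return (observed - z * sem, observed + z * sem)


# Hypothetical test: SD = 15, reliability = .91, observed score = 100.
sem = standard_error_of_measurement(15.0, 0.91)
low, high = score_band(100.0, sem)
print(f"SEM = {sem:.2f}; 95% band = {low:.2f} to {high:.2f}")
```

A more reliable test gives a smaller SEM and therefore a narrower band around each observed score.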

Models of Reliability
    Time Sampling: Test-Retest Method
    Parallel or Equivalent Forms Reliability
    Internal consistency: Split-Half Method
    KR20 Formula
    Coefficient Alpha
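
As an illustration of the last two entries in the list, the sketch below computes KR20 and coefficient alpha from a small matrix of item scores (rows are test takers, columns are items). The response data and function names are hypothetical. Population variances are used throughout, which is an assumption of this sketch; under it, alpha and KR20 give the same value for items scored 0 or 1.

```python
from statistics import pvariance


def coefficient_alpha(scores):
    """Alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(scores[0])
    item_var_sum = sum(pvariance(item) for item in zip(*scores))
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


def kr20(scores):
    """KR20 = k/(k-1) * (1 - sum of p*q / variance of total scores).

    Valid only for dichotomous items scored 0 (wrong) or 1 (right);
    p is the proportion passing an item and q = 1 - p.
    """
    n = len(scores)
    k = len(scores[0])
    pq_sum = 0.0
    for item in zip(*scores):
        p = sum(item) / n
        pq_sum += p * (1 - p)
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - pq_sum / total_var)


# Hypothetical responses: 6 test takers, 4 right/wrong items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

print(f"KR20  = {kr20(responses):.3f}")
print(f"alpha = {coefficient_alpha(responses):.3f}")
```

Coefficient alpha also works for items that are not scored right/wrong (for example, rating scales), where KR20 does not apply.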

How Reliable is Reliable?
A reliability of .70 to .80 is generally considered good enough in most cases.
When tests are used to make important decisions about someone’s future, evaluators must be certain to minimize any error in classification.
For such decisions, evaluators should look for tests with a reliability greater than .95.
Low reliability can be addressed by increasing the length of the test or by removing items that lower its reliability.
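
The effect of lengthening a test is usually estimated with the Spearman-Brown prophecy formula, r_new = n*r / (1 + (n - 1)*r), where n is the factor by which the test length changes. The sketch below applies it in both directions. The starting reliability of .70 and the function names are hypothetical, and the formula assumes the added items are comparable in quality to the existing ones.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Projected reliability when the test is length_factor times as long."""
    r, n = reliability, length_factor
    return (n * r) / (1 + (n - 1) * r)


def length_factor_needed(reliability: float, target: float) -> float:
    """Length factor required to raise the current reliability to the target."""
    r, t = reliability, target
    if not (0 < r < 1 and 0 < t < 1):
        raise ValueError("reliabilities must be strictly between 0 and 1")
    return (t * (1 - r)) / (r * (1 - t))


# Hypothetical test with reliability .70.
doubled = spearman_brown(0.70, 2)          # reliability after doubling the length
factor = length_factor_needed(0.70, 0.95)  # length needed to reach .95
print(f"Doubled length: r = {doubled:.3f}")
print(f"To reach .95, the test must be {factor:.2f} times as long")
```

The second calculation shows why a .95 standard is demanding: a test that starts at .70 would need to be several times longer to reach it.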