Correlation and Regression
Psychology of Testing & Measurements
Lecture, Chapter 3

Parts of a Relationship between 2 Variables
Scatter Diagram – a picture of a relationship between 2 variables that allows visual inspection

Correlation – a mathematical description of whether or not two variables covary

See p. 66 for graphs of the following relationships:
    Positive correlation – As X increases, Y increases.
    Negative correlation – As X increases, Y decreases.
    No correlation – As X increases, Y shows no consistent tendency to increase or decrease.
Correlation Coefficient – a mathematical index describing the direction and magnitude of a relationship, ranging from -1.0 to 1.0 (0 = no correlation)

Regression/Correlation
Predictions can be made about one variable from another using a technique called regression.

A regression line is the best-fitting straight line through a set of points in a scatter plot.

Regression
• The regression line is found by minimizing the sum of the squared deviations of the points around the regression line.
• The regression line is the running mean, or the line of least squares, in two dimensions.
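• Predictions about one variable (Y) can be made from a score on another variable (X) by using the regression equation:

    Y' = a + bX

  where b is the slope (the regression coefficient) and a is the intercept (the value of Y' when X = 0). The intercept is computed from the two means:

    a = \bar{Y} - b\bar{X}

• Using this equation, you can predict performance on Y from a score on X; the predicted value is Y' (read "Y prime").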
• Correlation is a special case of regression in which the scores for both variables are in standardized, or Z, units.
– Because both variables are in Z scores, each has a mean of 0; thus, the intercept is always 0 (when Z_X is 0, the predicted Z_Y is also 0), and the slope equals r.
– The equation becomes Z'_Y = rZ_X, where Z'_Y is the predicted Z score on Y.
• Actual and predicted scores are rarely the same; the difference between an actual score and its predicted score is called the residual.
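A minimal Python sketch of the least-squares fit described above, using invented scores (the names `fit_line`, `z_scores`, `hours`, and `score` are hypothetical). The slope is computed as the sum of cross-products of deviations divided by the sum of squared X deviations, a standard least-squares result that the slide does not spell out; the intercept then comes from the two means. Refitting on Z scores shows the intercept dropping to 0 and the slope becoming r.

```python
from math import sqrt

def fit_line(xs, ys):
    """Least-squares slope b and intercept a for Y' = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Deviation scores around each mean
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    b = sum(p * q for p, q in zip(dx, dy)) / sum(p * p for p in dx)
    a = mean_y - b * mean_x  # a = Ybar - b * Xbar
    return a, b

def z_scores(values):
    """Standardize using the population SD (N in the denominator)."""
    n = len(values)
    m = sum(values) / n
    sd = sqrt(sum((v - m) ** 2 for v in values) / n)
    return [(v - m) / sd for v in values]

# Hypothetical scores, invented for illustration only
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]

a, b = fit_line(hours, score)
predicted = [a + b * x for x in hours]                   # Y'
residuals = [y - yp for y, yp in zip(score, predicted)]  # Y - Y'

# Refit in Z units: intercept is (numerically) 0 and the slope equals r
a_z, b_z = fit_line(z_scores(hours), z_scores(score))
print(a, b, residuals)
print(a_z, b_z)
```

With these made-up numbers the fitted equation is Y' = 47.7 + 4.1X, and the residuals sum to zero (up to floating-point rounding), as they always do for a least-squares line with an intercept.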

 

Terminology in Correlations 
• Standard error of estimate = a measure of the accuracy of prediction; it is the standard deviation of the residuals.
• Coefficient of determination = the proportion of the total variation in Y that is explained by variation in X; it equals r^2.
• Coefficient of alienation = a measure of nonassociation between two variables, equal to \sqrt{1 - r^2} (the square root of 1 minus the coefficient of determination; note that this is not a direct inverse of r).
• Shrinkage = the amount of decrease in predictive accuracy observed when a regression equation created for one population is applied to another.
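The quantities above can be computed together in a short sketch. The function name `regression_accuracy` and the data are hypothetical. The standard error of estimate here divides the squared residuals by N − 2, a common textbook convention (some sources divide by N instead). The code also checks that r^2 matches the proportion of Y's variation explained by the regression line.

```python
from math import sqrt

def regression_accuracy(xs, ys):
    """Return r, standard error of estimate, r^2, and coefficient of alienation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)  # total variation in Y
    b = sxy / sxx
    a = mean_y - b * mean_x
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    ss_residual = sum(e * e for e in residuals)
    r = sxy / sqrt(sxx * syy)
    return {
        "r": r,
        # SD of the residuals; N - 2 because a and b were estimated (assumed convention)
        "standard_error_of_estimate": sqrt(ss_residual / (n - 2)),
        "determination": r ** 2,
        # Same quantity computed as explained variation: 1 - SS_residual / SS_total
        "explained_proportion": 1 - ss_residual / syy,
        "alienation": sqrt(1 - r ** 2),
    }

# Hypothetical scores, invented for illustration only
stats = regression_accuracy([1, 2, 3, 4, 5], [52, 55, 61, 64, 68])
print(stats)
```

With these invented scores, r^2 is about 0.989, the coefficient of alienation about 0.106, and the standard error of estimate about 0.80 points.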
 
Causation versus Correlation
• Inferring causation from correlation carries a serious risk of drawing incorrect conclusions.

• One study found a correlation between ice cream sales and drownings.
– One does not cause the other; both are related to a third variable, summer.
– Even then, we cannot infer that summer causes ice cream eating and/or drowning, because both also happen in the winter.

• Another study found a significant association between IQ score and academic achievement; this association alone does not show that IQ causes achievement.
 
Pearson Correlation
• Pearson product moment correlation coefficient
– a ratio used to determine the degree of variation in one variable based on knowledge of variation in the other (both variables continuous)

    r = \frac{\sum d_x d_y}{N S_x S_y}

 

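where
    r = correlation coefficient
    ∑ = sum of the values
    d_x = deviation from the mean of the first set of scores
    d_y = deviation from the mean of the second set of scores
    N = number of paired scores
    S_x, S_y = standard deviations of the first and second sets of scores (computed with N in the denominator)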
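A direct translation of the deviation-score formula into Python, as a sketch assuming S_x and S_y are population standard deviations (N in the denominator), so that the N in the formula turns the sum of cross-products into an average. The function name is hypothetical. The three calls show a perfect positive relationship, a perfect negative one, and a U-shaped pattern whose linear correlation is 0.

```python
from math import sqrt

def pearson_deviation_formula(xs, ys):
    """r = sum(dx * dy) / (N * Sx * Sy), with Sx and Sy computed using N."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists of at least 2 scores")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]  # deviations from the mean of X
    dy = [y - mean_y for y in ys]  # deviations from the mean of Y
    s_x = sqrt(sum(d * d for d in dx) / n)
    s_y = sqrt(sum(d * d for d in dy) / n)
    return sum(p * q for p, q in zip(dx, dy)) / (n * s_x * s_y)

x = [1, 2, 3, 4, 5]
print(round(pearson_deviation_formula(x, [2, 4, 6, 8, 10]), 3))  # prints 1.0
print(round(pearson_deviation_formula(x, [10, 8, 6, 4, 2]), 3))  # prints -1.0
print(round(pearson_deviation_formula(x, [4, 1, 0, 1, 4]), 3))   # prints 0.0
```

The last case is a reminder that r describes linear covariation: Y depends perfectly on X there, but not along a straight line.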

Other Correlation Coefficients
Pearson product moment correlation coefficient – a ratio used to determine the degree of variation in one variable based on knowledge of variation in the other (both continuous)

Dichotomous variables – variables with only 2 levels (a naturally occurring dichotomy is true; a continuous scale forced into two categories is artificial)

Spearman’s rho – association between two sets of ranks (ordinal variables).
Biserial correlation – 1 continuous and 1 artificial dichotomous variable
Point biserial correlation – 1 continuous and 1 true dichotomous variable
Phi coefficient – both are dichotomous and at least 1 is true
Tetrachoric correlation – both dichotomous variables are artificial
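Three of the coefficients listed above are ordinary Pearson r computed on recoded data: Spearman's rho is r on ranks, the point biserial is r with the true dichotomy coded 0/1, and phi is r on two 0/1 variables. The sketch below relies on those identities; the function names and data are hypothetical, and the ranking helper assumes no tied scores. The biserial and tetrachoric coefficients are not shown because they estimate the correlation of assumed underlying normal variables and need different formulas.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson r from deviation scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank scores from 1 (lowest) upward; assumes there are no tied scores."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman_rho(xs, ys):
    # Pearson r computed on the two sets of ranks
    return pearson(ranks(xs), ranks(ys))

def point_biserial(scores, group):
    # Continuous scores vs. a true dichotomy coded 0/1
    return pearson(scores, group)

def phi(a, b):
    # Two dichotomous variables, each coded 0/1
    return pearson(a, b)

# Hypothetical data, invented for illustration only
print(spearman_rho([10, 20, 30, 40, 50], [1, 3, 2, 5, 4]))
print(point_biserial([3, 5, 4, 8, 9, 7], [0, 0, 0, 1, 1, 1]))
print(phi([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1]))
```

For the ranked example, rho is 0.8, the same value the shortcut 1 − 6Σd²/(N(N² − 1)) gives for these untied ranks.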

Multivariate Analysis
Multivariate analysis considers the relationship between a linear combination of two or more predictor variables and one criterion.
    Multiple regression
    Discriminant Analysis
Factor Analysis – a type of analysis used to study the interrelationships among a set of variables without reference to a criterion.
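Multiple regression finds the weights of the linear combination of predictors that best predicts the criterion in the least-squares sense. As a teaching sketch (not how a statistics package would do it), the code below builds the normal equations (X'X)b = X'y and solves them with Gaussian elimination. The function names are hypothetical, and the data are invented so that Y = 2 + 3X1 − X2 exactly. Discriminant analysis and factor analysis are not illustrated.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def multiple_regression(predictors, criterion):
    """predictors: rows [x1, x2, ...]; returns [a, b1, b2, ...]."""
    X = [[1.0] + list(row) for row in predictors]  # leading 1 for the intercept
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * y for row, y in zip(X, criterion)) for i in range(k)]
    return solve(XtX, Xty)

# Hypothetical predictor rows (X1, X2) and criterion scores
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 4)]
y = [3, 7, 6, 11, 13]
coef = multiple_regression(rows, y)
print([round(c, 6) for c in coef])  # close to [2.0, 3.0, -1.0]
```

Because the invented criterion is an exact linear combination of the predictors, the fit recovers the intercept 2 and the weights 3 and −1, up to floating-point rounding; with real data the residuals would not vanish.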