MICROCASE
BASIC STATISTICS
CORRELATION
CORRELATION describes
the direction and strength of the relationship(s) between two (or more)
variables. In MicroCase the
correlation coefficient is the Pearson product-moment correlation;
there are other correlation coefficients, but MicroCase does not
calculate them under the CORRELATION heading.
Eta-squared is calculated under the ANOVA option; many of the
measures of association appropriate for nominal or ordinal data are
calculated under the statistics option of the CROSS TABULATION option.
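To make the coefficient concrete, here is a minimal sketch of how a Pearson product-moment correlation can be computed by hand. It is not MicroCase's own code; the function name pearson_r and the education/income figures are made up for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Deviations of each score from its variable's mean.
    dev_x = [v - mean_x for v in x]
    dev_y = [v - mean_y for v in y]
    cross = sum(a * b for a, b in zip(dev_x, dev_y))  # sum of cross-products
    ss_x = sum(a * a for a in dev_x)                  # sum of squares for x
    ss_y = sum(b * b for b in dev_y)                  # sum of squares for y
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable never varies")
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical data: years of education and an income score for five people.
education = [10, 12, 14, 16, 18]
income = [20, 25, 27, 35, 40]
r = pearson_r(education, income)
print(f"r = {r:.3f}")  # the sign gives the direction, the size gives the strength
```

A positive r means the two variables rise together, a negative r means one falls as the other rises, and values near 0 mean little linear relationship.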
1.
When you select the CORRELATION option, a window will appear for
entering the variables you want correlated.
You must enter at least two variables; you may enter a much
longer list.
A.
CORRELATION is appropriate only when the variable attributes can
be ordered along a numerical scale from high to low; ideally the
variables will be measured at the interval level.
B.
Correlation is less useful when there are a lot of tied scores;
that is, when a lot of people or states have the same value on the
variable, e.g., a large survey where a given variable has only a few
response codes (e.g., agree/disagree).
C.
Correlation is inappropriate with unordered categorical variables
such as sex, race, ethnicity, or religion.
1.
Note: many scholars create "dummy" variables so that categorical data can be
legitimately included in a correlation (and more likely, a regression)
analysis. A dummy variable is one with two values: 1=presence of the
attribute, 0=absence of the attribute. Many times
scholars work with a set of dummy variables.
For example: 1=Protestant, 0=Not-Protestant; 1=Catholic,
0=Not-Catholic; 1=Jew, 0=Non-Jew; 1=None, 0=Not-None. The number
of dummy variables included in an analysis should be one less
than the number of attributes in the original variable.
Including a complete set of dummy variables (i.e., the same number of
dummy variables as attributes in the categorical variable) creates a
problem called "collinearity," which distorts regression
coefficients. Consult a
statistics book for the logic behind the use of dummy variables and
avoidance of collinearity.
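The dummy-variable idea can be sketched in a few lines of Python. This is only an illustration, not something MicroCase does for you: the helper make_dummies, the choice of "None" as the omitted (reference) category, and the six respondents are all assumptions.

```python
def make_dummies(values, reference):
    """Return {category: [0/1, ...]} for every category except `reference`.

    Leaving one category out gives one fewer dummy than there are
    attributes, which avoids the collinearity problem.
    """
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError("reference category not found in the data")
    return {
        cat: [1 if v == cat else 0 for v in values]
        for cat in categories
        if cat != reference
    }

# Hypothetical religion variable for six respondents.
religion = ["Protestant", "Catholic", "Jewish", "None", "Catholic", "Protestant"]
dummies = make_dummies(religion, reference="None")

for name, codes in dummies.items():
    print(name, codes)
```

A respondent in the omitted category is the one who scores 0 on every dummy, so no information is lost by leaving that category out.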
2.
Be sure you have eliminated responses such as "don't know" or "no
answer."
If you have not already done that, you can use the SUBSET option
on the CORRELATION screen to do so.
3.
You may also use the subset option to restrict the correlation to
only a part of the sample, for example, only males, only college
graduates, only young persons, only Southern states.
4.
There is no way to identify outliers in the CORRELATION option.
If outliers are a concern, identify them using the SCATTERPLOT
option; once you have identified the data point you want to exclude, you
may do that in the CORRELATION option by using the SUBSET function.
A.
Be careful here. The results of the correlation analysis will not show
whether you've eliminated inappropriate responses or limited the
analysis to a subset of the population. You need
to keep careful notes of what you've
done, so that you can reproduce your process (and the outcome!) if asked
to. Before you report a
correlation coefficient in a paper, it is a good idea to do the analysis
a second time and be sure you have done it correctly.
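The logic of subsetting in items 2 through 4 can be sketched outside MicroCase as a simple filter. Everything here is hypothetical: the codes 8 and 9 for "don't know" and "no answer", the variable names, and the six cases. The point is that the filter is written down, so the analysis can be reproduced later.

```python
# Assumed codes for "don't know" (8) and "no answer" (9); check your codebook.
MISSING_CODES = {8, 9}

# Hypothetical survey cases.
cases = [
    {"sex": "male", "trust": 1, "efficacy": 2},
    {"sex": "female", "trust": 2, "efficacy": 3},
    {"sex": "male", "trust": 8, "efficacy": 1},    # "don't know" on trust
    {"sex": "male", "trust": 3, "efficacy": 4},
    {"sex": "female", "trust": 4, "efficacy": 9},  # "no answer" on efficacy
    {"sex": "male", "trust": 2, "efficacy": 2},
]

def subset(cases, variables, keep=lambda case: True):
    """Keep cases that pass `keep` and have valid codes on all `variables`."""
    return [
        c for c in cases
        if keep(c) and all(c[v] not in MISSING_CODES for v in variables)
    ]

males = subset(cases, ["trust", "efficacy"], keep=lambda c: c["sex"] == "male")

# Record what was done so the result can be reproduced.
print(f"kept {len(males)} of {len(cases)} cases (males, valid trust and efficacy)")
```

Printing or saving the count of cases kept is a cheap way to keep the "careful notes" recommended above.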
5.
When you've entered the variables you want to correlate and any subset
variables, click on OK in the upper right-hand corner of the window.
6.
The output for the correlation analysis is a table of correlation
coefficients (called a correlation matrix).
If you are using two variables, the table will have two columns
and two rows (a 2x2 table). If you are using five variables, the table will have five
columns and five rows (a 5x5 table).
A.
When a variable is correlated against itself, the coefficient is
always a perfect +1.00. The set of perfect correlation coefficients defines the
diagonal of the correlation matrix; the coefficients above the diagonal
are the same as the coefficients below, because
Pearson correlation coefficients are symmetrical.
B.
The number in parentheses below the correlation coefficient is
the number of units (people, states, etc.) for which there were data on
both variables. Be cautious
in interpreting the coefficients if the number of units used in the
calculation is small. Some
scholars require a sample size of 30; others are less stringent and will
accept any number above 10.
C.
The asterisks attached to some correlation coefficients indicate their
level of statistical significance.
D.
This CORRELATION screen also reports Cronbach's Alpha, a
measure of internal consistency for a set of variables, which is often
used to assess the reliability of a set of questions.
(Cronbach's
alpha is calculated only if all the variables in the correlation
analysis are positively related to one another.)
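For readers who want to see what Cronbach's alpha measures, here is a minimal sketch using the standard textbook formula: alpha = k/(k-1) * (1 - sum of item variances / variance of the total score), where k is the number of items. This is an illustration, not MicroCase's code; the function name and the three survey items are hypothetical, and the sketch does not check that the items are positively related.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of k items.

    `items` holds one list of scores per question; position i in each
    list is the same respondent.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if any(len(item) != n for item in items):
        raise ValueError("every item needs a score for every respondent")
    # Each respondent's total score across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))

# Hypothetical answers from five respondents to three related questions.
q1 = [2, 3, 4, 5, 1]
q2 = [3, 3, 5, 4, 2]
q3 = [2, 4, 4, 5, 1]
alpha = cronbach_alpha([q1, q2, q3])
print(f"alpha = {alpha:.2f}")
```

The closer alpha is to 1, the more the questions move together and the more defensible it is to treat them as measuring one underlying thing.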
E.
Note that you can switch from LISTWISE deletion of missing data
(i.e., if a case is missing data on any variable, it is eliminated from
the analysis of all variables) to PAIRWISE deletion of missing data
(i.e., if a case is missing data on a variable, the case is deleted from
only those relationships including that variable).
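The difference between LISTWISE and PAIRWISE deletion, and the shape of the correlation matrix itself, can be sketched as follows. This is an illustration under stated assumptions, not MicroCase's implementation: missing data are represented by None, and the function names, variable names, and values are made up.

```python
import math

def pearson_r(x, y):
    """Pearson correlation for two equal-length lists with no missing values."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

def correlation_matrix(data, deletion="listwise"):
    """Return {(var_a, var_b): (r, n)} for every pair of variables.

    `data` maps variable names to lists of scores; None marks missing data.
    """
    if deletion not in ("listwise", "pairwise"):
        raise ValueError("deletion must be 'listwise' or 'pairwise'")
    names = list(data)
    rows = range(len(data[names[0]]))
    # Listwise: a case must have data on every variable to be used at all.
    complete = [i for i in rows if all(data[v][i] is not None for v in names)]
    result = {}
    for a in names:
        for b in names:
            if deletion == "listwise":
                keep = complete
            else:
                # Pairwise: a case only needs data on these two variables.
                keep = [i for i in rows
                        if data[a][i] is not None and data[b][i] is not None]
            x = [data[a][i] for i in keep]
            y = [data[b][i] for i in keep]
            result[(a, b)] = (pearson_r(x, y), len(keep))
    return result

# Hypothetical data for six cases; None marks a missing value.
data = {
    "income": [20, 25, None, 35, 40, 30],
    "education": [10, 12, 14, 16, 18, 13],
    "age": [30, None, 45, 50, 28, 39],
}
listwise = correlation_matrix(data, "listwise")
pairwise = correlation_matrix(data, "pairwise")
for pair in [("income", "education"), ("income", "age"), ("education", "age")]:
    print(pair, "listwise N =", listwise[pair][1], "pairwise N =", pairwise[pair][1])
```

Under listwise deletion every coefficient is based on the same cases; under pairwise deletion the N in parentheses can differ from one cell of the matrix to the next, so check it before comparing coefficients.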
7.
The icons in the top tool bar allow you to print or cut and paste
a screen, review and cut and paste the FILE NOTES, review and cut and
paste the variable definitions, return to the basic CORRELATION screen,
or return to the MicroCase MENU screens.
for questions or comments contact me at mduncombe@coloradocollege.edu
last updated on November 25, 2002