CORRELATION

CORRELATION describes the direction and strength of the relationship(s) between two (or more) variables.  In MicroCase the correlation coefficient is the Pearson product-moment correlation, which ranges from -1.00 (a perfect negative relationship) through 0 (no linear relationship) to +1.00 (a perfect positive relationship).  There are other correlation coefficients, but MicroCase does not calculate them under the CORRELATION heading.  Eta-squared is calculated in the ANOVA option, and many of the measures of association appropriate for nominal or ordinal data are calculated under the statistics option of the CROSS TABULATION option.
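MicroCase does this calculation for you, but a minimal Python sketch shows what the Pearson coefficient is: the sum of the cross-products of each variable's deviations from its mean, divided by the square root of the product of the two sums of squared deviations.  The function name pearson_r and the five data points are hypothetical, chosen only for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length (n >= 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the two means
    sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sum of squared deviations for each variable
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("a variable with no variation has no defined correlation")
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical scores for five units (people, states, ...) on two variables
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 3))  # prints 0.775
```

The positive sign gives the direction and the size (0.775 out of a possible 1.00) gives the strength.  If you use Python 3.10 or later, statistics.correlation(x, y) in the standard library computes the same coefficient and can serve as a check.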

1.  When you select the CORRELATION option, a window will appear for entering the variables you want correlated.  You must enter at least two variables; you may enter a much longer list.  
A.  CORRELATION is appropriate only when the variable attributes can be ordered along a numerical scale from low to high; ideally the variables will be measured at the interval (or ratio) level.
B.  Correlation is less useful when there are many tied scores, that is, when many people or states have the same value on a variable, as in a large survey where a variable has only a few response codes (e.g., agree/disagree).
C.  Correlation is inappropriate with unordered categorical variables such as sex, race, ethnicity, or religion.
      1.  Note: many scholars create "dummy" variables so that categorical data can legitimately be included in a correlation (and, more often, a regression) analysis.  A dummy variable has two values: 1 = presence of the attribute, 0 = absence of the attribute.  Scholars often work with a set of dummy variables.  For example, a religion variable could be recoded as 1=Protestant, 0=Not-Protestant; 1=Catholic, 0=Not-Catholic; 1=Jewish, 0=Not-Jewish; 1=None, 0=Not-None.  The number of dummy variables actually used should be one less than the number of attributes in the original variable (with the four attributes above, three dummies); the attribute left out serves as the reference category against which the others are compared.  Including a complete set of dummy variables (i.e., as many dummy variables as there are attributes in the categorical variable) creates perfect "collinearity": every case scores 1 on exactly one dummy, so any one dummy can be predicted exactly from the others, and the regression coefficients cannot be estimated uniquely.  Consult a statistics book for the logic behind the use of dummy variables and the avoidance of collinearity.
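The dummy-variable logic can be made concrete with a short Python sketch (this is not a MicroCase feature).  It turns a hypothetical four-attribute religion variable into three 0/1 dummies, leaving "None" out as the reference category, and then shows why a complete set of four causes trouble: the four dummies add up to 1 for every case.  The function name make_dummies and the six responses are hypothetical.

```python
def make_dummies(values, categories, reference):
    """Return {attribute: list of 0/1 scores} for every attribute except `reference`."""
    if reference not in categories:
        raise ValueError("the reference category must be one of the categories")
    dummies = {}
    for cat in categories:
        if cat == reference:
            continue  # leaving one attribute out avoids perfect collinearity
        dummies[cat] = [1 if v == cat else 0 for v in values]
    return dummies

# Hypothetical responses from six people on a four-attribute religion variable
religion = ["Protestant", "Catholic", "None", "Jewish", "Protestant", "Catholic"]
categories = ["Protestant", "Catholic", "Jewish", "None"]

dummies = make_dummies(religion, categories, reference="None")
for name, scores in dummies.items():
    print(name, scores)

# Why not all four? In a complete set every case scores 1 on exactly one dummy,
# so the four always sum to 1: each is an exact linear function of the others
# and the regression constant.
full_set = {cat: [1 if v == cat else 0 for v in religion] for cat in categories}
row_sums = [sum(scores[i] for scores in full_set.values()) for i in range(len(religion))]
print(row_sums)  # prints [1, 1, 1, 1, 1, 1]
```

A respondent who scores 0 on all three dummies is in the reference category, so no information is lost by omitting the fourth.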

2.  Be sure you have eliminated responses such as "don't know" or "no answer."  If you have not already done that, you can use the SUBSET option on the CORRELATION screen to do so.

3.  You may also use the subset option to restrict the correlation to only a part of the sample, for example, only males, only college graduates, only young persons, only Southern states.
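The idea behind these two uses of SUBSET (setting aside nonsubstantive answers, then restricting the analysis to part of the sample) can be sketched in Python.  The codes 8 ("don't know") and 9 ("no answer"), the variable names, and the helpers usable and subset are all hypothetical; check the codebook of your own data set for the codes it actually uses.

```python
# ASSUMED codes for illustration only; real data sets differ.
NONSUBSTANTIVE = {8, 9}  # 8 = "don't know", 9 = "no answer"

# Hypothetical survey records
records = [
    {"sex": "male",   "educ": 4, "attitude": 2},
    {"sex": "female", "educ": 3, "attitude": 8},  # "don't know"
    {"sex": "male",   "educ": 2, "attitude": 1},
    {"sex": "male",   "educ": 9, "attitude": 3},  # "no answer"
    {"sex": "female", "educ": 1, "attitude": 1},
    {"sex": "male",   "educ": 5, "attitude": 4},
]

def usable(record, variables):
    """True if the record has a substantive answer on every listed variable."""
    return all(record[v] not in NONSUBSTANTIVE for v in variables)

def subset(records, variables, condition=lambda r: True):
    """Keep records with valid answers on `variables` that also meet `condition`."""
    return [r for r in records if usable(r, variables) and condition(r)]

valid = subset(records, ["educ", "attitude"])
males = subset(records, ["educ", "attitude"], lambda r: r["sex"] == "male")
print(len(valid), len(males))  # prints 4 3
```

Only the records that survive both filters would then go into the correlation; leaving codes like 8 and 9 in would treat them as if they were high scores on the scale.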

4.  There is no way to identify outliers in the CORRELATION option.  If outliers are a concern, identify them using the SCATTERPLOT option; once you have identified the data point you want to exclude, you can exclude it in the CORRELATION option by using the SUBSET function.
A.  Be careful here.  The results of the correlation analysis will not show whether you've eliminated inappropriate responses or limited the analysis to a subset of the population.  Keep careful notes of what you've done, so that you can reproduce your process (and the outcome!) if asked to.  Before you report a correlation coefficient in a paper, it is a good idea to run the analysis a second time to make sure you have done it correctly.
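Why a single outlier deserves this attention can be shown with a small hypothetical example in Python: five ordinary cases with a fairly strong positive correlation, plus one extreme case.  The data and the function name pearson_r are invented for illustration; in MicroCase you would spot the extreme case on the SCATTERPLOT and drop it with SUBSET.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical data: five ordinary cases plus one extreme case at (20, 0)
x = [1, 2, 3, 4, 5, 20]
y = [1, 3, 2, 5, 4, 0]

r_all = pearson_r(x, y)

# Exclude the case identified on the scatterplot, as a SUBSET would
keep = [i for i in range(len(x)) if x[i] != 20]
r_trimmed = pearson_r([x[i] for i in keep], [y[i] for i in keep])

print(f"all cases: r = {r_all:.2f}; without the outlier: r = {r_trimmed:.2f}")
# prints all cases: r = -0.52; without the outlier: r = 0.80
```

Here one case reverses the sign of the coefficient, which is also why your notes should record exactly which cases were excluded.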

5.  When you've entered the variables you want to correlate and any subset variables, click OK in the upper right-hand corner of the window.

6.  The output for the correlation analysis is a table of correlation coefficients (called a correlation matrix).  If you are using two variables, the table will have two columns and two rows (a 2x2 table).  If you are using five variables, the table will have five columns and five rows (a 5x5 table).  
A.  When a variable is correlated with itself, the coefficient is always a perfect +1.00.  These perfect coefficients form the diagonal of the correlation matrix.  Because the Pearson correlation is symmetrical (the correlation of X with Y equals the correlation of Y with X), the coefficients above the diagonal are the same as those below it.
B.  The number in parentheses below each correlation coefficient is the number of units (people, states, etc.) for which there were data on both variables.  Be cautious in interpreting a coefficient if the number of units used in its calculation is small.  Some scholars require a sample size of 30; others are less stringent and will accept more than 10.
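C.  The asterisks next to some correlation coefficients report the level of statistical significance, that is, how unlikely it would be to obtain a coefficient this large by chance if the two variables were unrelated in the population.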
D.  The CORRELATION screen also reports Cronbach's alpha, a measure of internal consistency for a set of variables that is often used to assess the reliability of a set of questions intended to measure the same thing.  (Cronbach's alpha is calculated only if all the variables in the correlation analysis are positively related to one another.)
E.  Note that you can switch from LISTWISE deletion of missing data (i.e., if a case is missing data on any variable, it is dropped from every correlation in the matrix) to PAIRWISE deletion of missing data (i.e., if a case is missing data on a variable, it is dropped only from the correlations involving that variable).  Pairwise deletion keeps more cases, but the number of cases (N) can then differ from one coefficient to the next.
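The features of the output described in points A, B, and E can be reproduced in a short Python sketch: it builds a full matrix (diagonal included) from hypothetical data containing missing values (None) and reports each coefficient with its N, once under listwise and once under pairwise deletion.  The function and variable names are hypothetical and say nothing about how MicroCase itself computes or displays the matrix.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

def corr_matrix(data, names, deletion="listwise"):
    """Return {(a, b): (r, n)} for every ordered pair of variables, diagonal included.

    data     -- dict mapping each variable name to a list of scores (None = missing)
    deletion -- "listwise": drop a case that is missing on ANY listed variable
                "pairwise": drop a case only from pairs involving its missing variable
    """
    if deletion not in ("listwise", "pairwise"):
        raise ValueError("deletion must be 'listwise' or 'pairwise'")
    n_cases = len(data[names[0]])
    complete = [i for i in range(n_cases)
                if all(data[v][i] is not None for v in names)]
    matrix = {}
    for a in names:
        for b in names:
            if deletion == "pairwise":
                cases = [i for i in range(n_cases)
                         if data[a][i] is not None and data[b][i] is not None]
            else:
                cases = complete
            r = pearson_r([data[a][i] for i in cases], [data[b][i] for i in cases])
            matrix[(a, b)] = (r, len(cases))
    return matrix

# Hypothetical scores for six cases; None marks missing data
data = {
    "v1": [1, 2, 3, 4, 5, 6],
    "v2": [2, 1, 4, 3, 6, None],
    "v3": [1, 3, None, 2, 5, 4],
}
names = ["v1", "v2", "v3"]

for method in ("listwise", "pairwise"):
    m = corr_matrix(data, names, method)
    print(method)
    for a in names:
        # each cell: the coefficient followed by its N in parentheses
        print("  " + "  ".join(f"{m[(a, b)][0]:5.2f} ({m[(a, b)][1]})" for b in names))
```

Under listwise deletion every cell rests on the same four complete cases; under pairwise deletion N ranges from four (v2 with v3) to six (v1 with itself).  In both versions the diagonal is 1.00 and the matrix is symmetrical.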

7.  The icons in the top tool bar allow you to print or cut and paste a screen, review and cut and paste the FILE NOTES, review and cut and paste the variable definitions, return to the basic CORRELATION screen, or return to the MicroCase MENU screens. 
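Cronbach's alpha (point 6.D above) can be sketched in Python from its usual variance formula, alpha = k / (k - 1) * (1 - (sum of the item variances) / (variance of the total score)), where k is the number of items.  The three items and five respondents are hypothetical.  These notes do not say whether MicroCase reports this raw-score version or the standardized version, k * r / (1 + (k - 1) * r) with r the average inter-item correlation, so the two may differ slightly.

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha; `items` is a list of items, each a list of scores for the same cases."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(case) for case in zip(*items)]  # each case's summed scale score
    # Population variances are used; sample variances would give the same alpha
    # because the common divisor cancels in the ratio.
    item_variance = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1 - item_variance / pvariance(totals))

# Hypothetical answers from five respondents to three related questions
q1 = [1, 2, 3, 4, 5]
q2 = [2, 2, 3, 5, 4]
q3 = [1, 3, 3, 4, 5]
print(f"alpha = {cronbach_alpha([q1, q2, q3]):.2f}")  # prints alpha = 0.95
```

Alpha rises when the items vary together, because the variance of the total then grows faster than the sum of the separate item variances.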

For questions or comments, contact me at mduncombe@coloradocollege.edu
Last updated on November 25, 2002