         MICROCASE BASIC STATISTICS: CORRELATION

         CORRELATION describes
        the direction and strength of the relationship(s) between two (or more)
        variables.  In MicroCase the
        correlation coefficient is the Pearson product moment correlation;
        there are other correlation coefficients, but MicroCase does not
        calculate those under the CORRELATION heading. 
        Eta-squared is calculated under the ANOVA option; many of the
        measures of association appropriate for nominal or ordinal data are
        calculated under the statistics option of the CROSS TABULATION option.
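
        As a rough illustration of what the Pearson coefficient measures, the
        sketch below computes it by hand in Python.  This is not MicroCase
        code: the function name pearson_r and the sample data are invented
        for illustration only.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (at least 2 cases)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: years of education and income (in $1000s) for five people
education = [10, 12, 12, 16, 18]
income = [20, 28, 25, 40, 52]
r = pearson_r(education, income)
print(round(r, 2))
```

        The sign of the result gives the direction of the relationship and
        its absolute value (between 0 and 1) gives the strength.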
        
         1. 
        When you select the CORRELATION option, a window will appear for
        entering the variables you want correlated. 
        You must enter at least two variables; you may enter a much
        longer list.
         A. 
        CORRELATION is appropriate only when the variable attributes can
        be ordered along a numerical scale from high to low; ideally the
        variables will be intervally measured.
 B. 
        Correlation is less useful when there are many tied scores;
        that is, when many people or states have the same value on the
        variable.  This is common in a large survey where a given variable
        has only a few response codes (e.g., agree/disagree).
 C. 
        Correlation is inappropriate with unordered categorical variables
        such as sex, race, ethnicity, or religion.
 1. 
        Note: many scholars create "dummy" variables so that categorical data can be
        legitimately included in a correlation (and, more likely, a regression)
        analysis.  A dummy variable
        is one with two values: 1=presence of the attribute, 0=absence of the
        attribute.  Scholars often
        work with a set of dummy variables. 
        For example: 1=Protestant, 0=Not-Protestant; 1=Catholic,
        0=Not-Catholic; 1=Jew, 0=Non-Jew; 1=None, 0=Not-None.  The number
        of dummy variables created from a categorical variable should be one less
        than the number of attributes in the original variable. 
        Including a complete set of dummy variables (i.e., the same number of
        dummy variables as attributes in the categorical variable) creates a
        problem called "collinearity," which distorts regression
        coefficients.  Consult a
        statistics book for the logic behind the use of dummy variables and
        avoidance of collinearity.
 2. 
        Be sure you have eliminated responses such as "don't know" or "no
        answer." 
        If you have not already done that, you can use the SUBSET option
        on the CORRELATION screen to do so.
         
 3. 
        You may also use the subset option to restrict the correlation to
        only a part of the sample, for example, only males, only college
        graduates, only young persons, only Southern states.
 
 4. 
        There is no way to identify outliers in the CORRELATION option. 
        If outliers are a concern, identify them using the  SCATTERPLOT
        option; once you have identified the data point you want to exclude, you
        may do that in the CORRELATION option by using the SUBSET function.
 A. 
        Be careful here.  The results of the correlation analysis will not show you if
        you've
        eliminated inappropriate responses or limited the analysis to a subset
        of the population.  You need
        to keep careful notes of what you've
        done, so that you can reproduce your process (and the outcome!) if asked
        to.  Before you report a
        correlation coefficient in a paper, it is a good idea to do the analysis
        a second time and be sure you have done it correctly.
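
        The preparation steps described above (dropping "don't know" and "no
        answer" responses, restricting the analysis to a subset, and dummy
        coding a categorical variable) can be sketched outside MicroCase in
        Python.  Everything here is an assumption made for illustration: the
        variable names, the missing-data codes 8 and 9, and the helper
        make_dummies do not come from MicroCase or any of its data files.

```python
# Hypothetical survey records; the variable names and codes are invented
# for illustration and do not come from any MicroCase data file.
cases = [
    {"religion": "Protestant", "sex": "male",   "attend": 4},
    {"religion": "Catholic",   "sex": "female", "attend": 3},
    {"religion": "None",       "sex": "male",   "attend": 9},  # 9 = "no answer"
    {"religion": "Jewish",     "sex": "male",   "attend": 1},
    {"religion": "Protestant", "sex": "female", "attend": 8},  # 8 = "don't know"
]

MISSING_CODES = {8, 9}  # assumed codes for "don't know" and "no answer"

def make_dummies(values, reference):
    """Return one 0/1 list per category except the omitted reference category."""
    categories = sorted(set(values) - {reference})
    return {c: [1 if v == c else 0 for v in values] for c in categories}

# 1. Drop cases with "don't know" / "no answer" responses.
valid = [c for c in cases if c["attend"] not in MISSING_CODES]

# 2. Optionally restrict the analysis to a subset (here, males only).
males = [c for c in valid if c["sex"] == "male"]

# 3. Dummy-code religion, omitting "None" as the reference category,
#    so four attributes yield three dummy variables.
dummies = make_dummies([c["religion"] for c in cases], reference="None")

# Record how many cases survive each step, as a note for later replication.
print(len(cases), len(valid), len(males))
print(sorted(dummies))
```

        Printing the case counts at each step is one simple way to keep the
        kind of notes recommended above, since the correlation output itself
        will not record what was excluded.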
 5. 
        When you've entered the variables you want to correlate and any subset
        variables, click on OK in the upper right-hand corner of the window.
        
         6. 
        The output for the correlation analysis is a table of correlation
        coefficients (called a correlation matrix). 
        If you are using two variables, the table will have two columns
        and two rows (a 2x2 table).  If you are using five variables, the table will have five
        columns and five rows (a 5x5 table).
         A. 
        When a variable is correlated against itself, the coefficient is
        always a perfect +1.00.  The set of perfect correlation coefficients defines the
        diagonal of the correlation matrix; the coefficients above the diagonal
        are the same as the coefficients below, because 
        Pearson correlation coefficients are symmetrical.
 B. 
        The number in parentheses below the correlation coefficient is
        the number of units (people, states, etc.) for which there were data on
        both variables.  Be cautious
        in interpreting the coefficients if the number of units used in the
        calculation is small.  Some
        scholars require a sample size of at least 30; others are less
        stringent and will accept any sample larger than 10.
 C. 
        The asterisks beside some correlation coefficients indicate the level
        of statistical significance.
 D. 
        This CORRELATION screen also reports Cronbach's
        alpha, a
        measure of internal consistency for a set of variables, which is often
        used to assess the reliability of a set of questions. 
        (Cronbach's
        alpha is calculated only if all the variables in the correlation
        analysis are positively related to one another.)
 E. 
        Note that you can switch from LISTWISE deletion of missing data
        (i.e., if a case is missing data on any variable, it is eliminated from
        the analysis of all variables) to PAIRWISE deletion of missing data
        (i.e., if a case is missing data on a variable, the case is deleted from
        only those relationships including that variable).
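
        To make the output described above more concrete, the following
        Python sketch builds a small correlation matrix with the number of
        cases for each pair, under both LISTWISE and PAIRWISE deletion, and
        computes Cronbach's alpha.  It is only an illustration under stated
        assumptions: the data and function names are hypothetical, None
        stands in for missing data, and alpha uses the standard textbook
        formula, which may differ in detail from MicroCase's own calculation.

```python
import math
from statistics import variance

def pearson_r(x, y):
    """Pearson r for two equal-length sequences with no missing values."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(data, deletion="listwise"):
    """Return {(var1, var2): (r, n)} for every pair of variables in `data`.

    `data` maps variable names to equal-length lists; None marks missing data.
    """
    names = list(data)
    rows = list(zip(*(data[n] for n in names)))
    if deletion == "listwise":
        # Drop a case from every pair if it is missing on any variable.
        rows = [row for row in rows if None not in row]
    result = {}
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            # Pairwise: drop a case only if it is missing on a or b.
            pairs = [(row[i], row[j]) for row in rows
                     if row[i] is not None and row[j] is not None]
            x, y = zip(*pairs)
            result[(a, b)] = (pearson_r(x, y), len(pairs))
    return result

def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score sequences (complete cases only)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(scores) for scores in items)
    return (k / (k - 1)) * (1 - item_var / variance(totals))

# Hypothetical three-item scale; None marks a missing response.
data = {
    "q1": [1, 2, 3, 4, 5, 4],
    "q2": [2, 2, 3, 5, 4, None],
    "q3": [1, 3, 2, 4, 5, 5],
}
listwise = correlation_matrix(data, "listwise")
pairwise = correlation_matrix(data, "pairwise")

# Number of cases used for the q1-q3 coefficient under each rule
print(listwise[("q1", "q3")][1], pairwise[("q1", "q3")][1])

# Alpha is computed on the cases with complete data for every item.
complete_rows = [row for row in zip(*data.values()) if None not in row]
alpha = cronbach_alpha(list(zip(*complete_rows)))
print(round(alpha, 2))
```

        Under listwise deletion the case with the missing q2 response is
        dropped from every coefficient; under pairwise deletion it is dropped
        only from the coefficients that involve q2, so the number of cases
        can differ from one cell of the matrix to another.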
 7. 
        The icons in the top tool bar allow you to print or cut and paste
        a screen, review and cut and paste the FILE NOTES, review and cut and
        paste the variable definitions, return to the basic CORRELATION screen,
        or return to the MicroCase MENU screens. 
        
         For questions or comments, contact me at mduncombe@coloradocollege.edu.
         Last updated on November 25, 2002.