SCATTERPLOT

SCATTERPLOT--Graphing the relationship between two variables (both of which should be intervally measured or at least arrayed along a continuum).

1.  When you select the SCATTERPLOT option, a window appears in which you indicate your independent variable (X-axis) and your dependent variable (Y-axis).
A.  You can also indicate if you want a subset of the cases included in the scatterplot.
B.  If you have a large sample or a small number of attributes on either variable, the scatterplot may be difficult to interpret.  Cases with the same scores are plotted on top of each other, and on a two-dimensional screen or printout you cannot tell how many cases sit at the same point.  If this occurs, you might get a clearer scatterplot by restricting the analysis to a subset of the original data set.
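
If you want to check how much overlap there is before deciding to use a subset, you can count the cases that share a point.  The following is only a sketch in Python, not a MicroCase feature; the function name overlapping_points and the data are hypothetical and are there just to illustrate the idea.

```python
from collections import Counter

def overlapping_points(xs, ys):
    """Return {(x, y): count} for every position shared by two or more cases."""
    counts = Counter(zip(xs, ys))
    return {point: n for point, n in counts.items() if n > 1}

# Hypothetical data: both variables have only a few distinct values,
# so several cases fall on the same spot in the scatterplot.
x = [1, 1, 2, 2, 2, 3]
y = [5, 5, 6, 6, 6, 7]
print(overlapping_points(x, y))  # {(1, 5): 2, (2, 6): 3}
```

Here six cases would show up as only three visible points on the screen.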

2.  When you have indicated your independent and dependent variables and any subset variables, click on the OK button in the upper right corner of the screen.
A.  The outcome is a graph of the relationship between the two variables.
B.  Remember that scatterplots are asymmetrical; if you reverse the independent and dependent variables, you will get a different graph.
C.  The correlation coefficient (Pearson r), the number of cases in the analysis, and the number of cases with missing data are printed below the scatterplot. The probability value refers to the statistical significance of the correlation coefficient.
D.  Click on CASE on the left side of the screen to find the point that represents a particular state.
E.  Click on REGRESSION LINE on the left side of the screen.  Doing this will add the regression line to the scatterplot and the regression equation (the slope is unstandardized) to the information below the scatterplot.  You can also look at the residuals (deviations of actual points from the regression line) by clicking on RESIDUAL on the left side of the screen.  To remove the regression line, click a second time on REG.LINE.
F.  Outliers (unusual or extreme scores) exert a powerful distorting effect on the slope of a regression line and on the correlation coefficient.  Note that there is an option to identify and then eliminate the outliers.  Click on OUTLIER on the left side of the screen.  A data point will be highlighted, its values on the independent and dependent variables will be identified, and the correlation coefficient that would result from eliminating it will be calculated.  You then choose whether to leave it in (click a second time on OUTLIER) or take it out (click on REMOVE).  When using Ecological Data you can also click on individual points in the scatterplot and evaluate their impact on the correlation coefficient; once you have highlighted a point, you can take it out by clicking on REMOVE.  Be very careful with this option; be sure to have a rationale for removing a data point--more than just a bigger correlation coefficient.  Once you've removed a data point, you cannot add it back in.  You have to redo the analysis (click on the counterclockwise arrow in the tool bar; then click on OK in the upper right corner of the window).
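
The quantities described in the steps above (Pearson r, the unstandardized regression line, the residuals, and the effect of dropping one case) can be sketched in a few lines of Python.  This is an illustration under stated assumptions, not MicroCase's own procedure: the function names and the data are hypothetical, and the line is fitted by ordinary least squares.

```python
from math import sqrt

def _sums(xs, ys):
    """Return the means and the sums of squares and cross-products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return mx, my, sxy, sxx, syy

def pearson_r(xs, ys):
    """Correlation coefficient (Pearson r)."""
    _, _, sxy, sxx, syy = _sums(xs, ys)
    return sxy / sqrt(sxx * syy)

def regression_line(xs, ys):
    """Least-squares (intercept, slope); the slope is unstandardized."""
    mx, my, sxy, sxx, _ = _sums(xs, ys)
    slope = sxy / sxx
    return my - slope * mx, slope

def residuals(xs, ys):
    """Deviations of the actual points from the regression line."""
    intercept, slope = regression_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

def r_without_each_case(xs, ys):
    """Pearson r recomputed with each case left out in turn."""
    return [pearson_r(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
            for i in range(len(xs))]

# Hypothetical data: five cases on the line y = 2x plus one extreme case.
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 6, 8, 10, 0]
print("r with all cases:", pearson_r(x, y))
print("intercept, slope:", regression_line(x, y))
print("residuals:", residuals(x, y))
print("r with each case dropped:", r_without_each_case(x, y))
```

In this made-up example, dropping the last case brings r back to 1, which shows how strongly a single outlier can pull on the coefficient.  Note also that Pearson r is the same whichever variable you treat as independent, while the regression line is not.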

3.  The icons in the top tool bar allow you to print or cut and paste a screen, review and cut and paste the FILE NOTES, review and cut and paste the variable definitions, return to the basic SCATTERPLOT screen, or return to the MicroCase MENU screens.

For questions or comments, contact me at mduncombe@coloradocollege.edu
Last updated on August 19, 2003