MICROCASE

BASIC STATISTICS
CROSS-TABULATION

CHI-SQUARE
TEST OF INDEPENDENCE

     Chi-Square is a statistical test which determines the probability that any relationship evident in tabular data is due to sampling error alone.  Chi-Square is a test of statistical inference, although there are a number of measures of association derived from it.  There is another Chi-Square test called the "Goodness-of-Fit" test, also called the "Chi-Square one sample test," but this statistic is not available in MicroCase.  Sidney Siegel discusses both Chi-Square tests in his Nonparametric Statistics for the Behavioral Sciences (McGraw-Hill, 1956).  Additionally, Logistic Regression generates a Chi-Square value that tests the significance of likelihood ratios.  Logistic Regression is available in MicroCase under the Advanced Statistics menu.  Consult Fred C. Pampel, Logistic Regression: A Primer (Sage University Paper, Quantitative Applications in the Social Sciences, #132, 2000) for a discussion of Logistic Regression.

     Chi-Square Test of Independence determines the probability that the relationship evident in the sample data could be due to sampling error if the two variables under consideration are indeed independent of one another in the population from which the sample was drawn.  

Table 1.  Self-Reported Marital Happiness by Amount of Housework Done by the Respondent (also self-reported), in Percentages (from the 1996 GSS).

Self-Reported
Extent of Marital   Amount of Housework Done by Respondent (self-reported)
Happiness           ALL/MOST    HALF/SOME   LITTLE/NONE    TOTAL

VERY HAPPY           48.8%       65.0%       67.9%          59.3%
PRETTY HAPPY         48.0%       33.0%       28.6%          38.2%
NOT TOO HAPPY         3.3%        2.0%        3.6%           2.5%
TOTAL               100.0%      100.0%      100.0%         100.0%

(N)                 (244)       (403)       (28)           (675)

     Table 1 shows a moderate relationship between perceived sharing of housework and happiness with one's marriage, but this relationship may be a fluke produced by the bad luck of the GSS having drawn an unrepresentative sample.  The Chi-Square test determines the probability that, if there were NO relationship between shared housework and marital happiness in the population, sampling error alone could have produced a sample unrepresentative enough to show the apparent relationship that the GSS obtained.

Table 0.  Hypothetical Data for the Relationship Between Self-Reported Marital Happiness and Amount of Housework Done by the Respondent.

Self-Reported
Extent of Marital   Amount of Housework Done by Respondent (self-reported)
Happiness           ALL/MOST    HALF/SOME   LITTLE/NONE    TOTAL

VERY HAPPY           59.3%       59.3%       59.3%          59.3%
PRETTY HAPPY         38.2%       38.2%       38.2%          38.2%
NOT TOO HAPPY         2.5%        2.5%        2.5%           2.5%
TOTAL               100.0%      100.0%      100.0%         100.0%

(N)                 (244)       (403)       (28)           (675)

     Table 0 shows NO relationship between perceived sharing of housework and happiness with one's marriage; in other words, Table 0 shows the table that we would have obtained if sharing of housework and marital happiness are independent of each other in the population and if we chose a representative sample of that population. The null hypothesis tested by the Chi-Square Test of Independence is that Table 0 is the relationship in the population.  We reject the null hypothesis when we conclude that the probability is low that if Table 0 were true we could obtain Table 1 by sampling error alone.  We retain the null hypothesis when the probability is high that if Table 0 were true, we could obtain Table 1 by sampling error alone.

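     Chi-Square is calculated by comparing the observed frequencies to the expected frequencies.  To calculate Chi-Square you would first convert both Table 1 and Table 0 to frequencies.  The formula is:

          \chi^2 = \sum \frac{(O - E)^2}{E}

where O is the observed frequency of a cell, E is the expected frequency of that cell, and the sum is taken over all the cells of the table.
(formula reproduced from http://www.zephryus.demon.co.uk/geography/resources/fieldwork/stats/chi.html)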


Observed and Expected Frequencies for the Relationship Between Self-Reported Marital Happiness and Amount of Housework Done by the Respondent.  (In each pair of rows, the first row gives the observed frequencies and the second row the expected frequencies.)

Self-Reported
Extent of Marital             Amount of Housework Done by Respondent (self-reported)
Happiness                     ALL/MOST    HALF/SOME   LITTLE/NONE    TOTAL

VERY HAPPY      observed      119         262         19             400
                expected      144.6       238.8       16.6
PRETTY HAPPY    observed      117         133         8              258
                expected      93.3        154.0       10.7
NOT TOO HAPPY   observed      8           8           1              17
                expected      6.1         10.1        0.7
TOTAL                         244         403         28             675
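The expected frequencies in the table can be reproduced from the row and column totals alone: under independence, each cell's expected count is its row total times its column total, divided by N. The following is a minimal Python sketch of that arithmetic; the names `observed` and `expected_frequencies` are illustrative only and have nothing to do with MicroCase itself.

```python
# Observed frequencies from the table above
# (rows: marital happiness; columns: ALL/MOST, HALF/SOME, LITTLE/NONE).
observed = [
    [119, 262, 19],  # VERY HAPPY
    [117, 133, 8],   # PRETTY HAPPY
    [8, 8, 1],       # NOT TOO HAPPY
]


def expected_frequencies(table):
    """Expected count per cell if the variables are independent:
    (row total * column total) / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_frequencies(observed)
for row in expected:
    print([round(cell, 1) for cell in row])
# [144.6, 238.8, 16.6]
# [93.3, 154.0, 10.7]
# [6.1, 10.1, 0.7]
```

Note that the expected frequencies have the same row and column totals as the observed frequencies; only the distribution of cases within the table differs.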
  

Fortunately, MicroCase has already done that process for you (see "Expected" in the gray menu of tables on the left side of the MicroCase Cross-Tabulation screen).  To calculate Chi-Square:
     1) for each cell of the table, subtract the expected frequency from the observed frequency;
     2) square the result of the subtraction in step 1 (which eliminates negative numbers);
     3) divide the squared difference of step 2 by the expected frequency of that cell;
     4) sum the results of step 3 across all the cells.

The calculations would look like:
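Chi-Square=[(119-144.6)^2/144.6]+[(262-238.8)^2/238.8]+[(19-16.6)^2/16.6]+[(117-93.3)^2/93.3]+
[(133-154)^2/154]+[(8-10.7)^2/10.7]+[(8-6.1)^2/6.1]+[(8-10.1)^2/10.1]+[(1-0.7)^2/0.7]=17.865
(The value 17.865 is obtained with unrounded expected frequencies; the rounded expected frequencies shown here give approximately 17.86.)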
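The four steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a description of how MicroCase computes the statistic internally; the function name `chi_square` is an assumption made for the example. Because the sketch uses unrounded expected frequencies, its result should agree with the 17.865 that MicroCase reports.

```python
# Observed frequencies (rows: VERY HAPPY, PRETTY HAPPY, NOT TOO HAPPY;
# columns: ALL/MOST, HALF/SOME, LITTLE/NONE).
observed = [
    [119, 262, 19],
    [117, 133, 8],
    [8, 8, 1],
]


def chi_square(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n  # expected frequency
            diff = obs - exp                         # step 1: subtract
            total += diff ** 2 / exp                 # steps 2-4: square, divide, sum
    return total


chi2 = chi_square(observed)
print(round(chi2, 3))
```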

Fortunately, MicroCase has done these calculations for you.  To obtain the calculated Chi-Square value, click on the Summary dot under the Statistics menu on the left side of the Cross-Tabulation screen.

     The Chi-Square value is reported as one of the nominal statistics because Chi-Square makes no assumptions about the level of measurement of the two variables.  This makes Chi-Square a very useful statistical technique.  The Chi-Square value is reported as:

           Chi-square: 17.865 (DF=4  Prob.=0.001)

DEGREES OF FREEDOM
     DF refers to "Degrees of Freedom," the number of cells in a chi-square table that are free to vary before all the other cells are determined.

Hypothetical Table A

                     Number Variable
Letter Variable      Attribute 1    Attribute 2    Total

Attribute A          (a)            (b)            54
Attribute B          (c)            (d)            79
Total                67             66             133

Cell "a" can be any number, as can cell "b" or cell "c" or cell "d."  At this point all four cells are free to vary.  The concept of degrees of freedom is how many cells are free to vary before all the other cells are known.

Hypothetical Table B

                     Number Variable
Letter Variable      Attribute 1    Attribute 2    Total

Attribute A          15                            54
Attribute B                                        79
Total                67             66             133

In this table once cell "a" is known, all the other cells are also known.  Cells "b," "c," and "d" are no longer free to vary.  If cell "a" is 15, cell "b" must be 39, and cell "c" must be 52.  If cell "b" is 39 and cell "c" is 52, cell "d" must be 27.  In this case the DF=1. 
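The fill-in reasoning for Hypothetical Table B can be checked with a short Python sketch. The general rule used in `degrees_of_freedom`, DF = (rows - 1) x (columns - 1), is the standard formula for a cross-tabulation; it is not stated explicitly in the text above, but it agrees with both the DF=1 found here and the DF=4 reported for the 3-by-3 housework table. The function names are illustrative.

```python
def degrees_of_freedom(n_rows, n_cols):
    """Standard rule for a cross-tabulation: (rows - 1) * (columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def fill_two_by_two(a, row_totals, col_totals):
    """Given cell a and the fixed marginal totals, the other three cells follow."""
    b = row_totals[0] - a  # the first row must sum to its total
    c = col_totals[0] - a  # the first column must sum to its total
    d = row_totals[1] - c  # the second row must sum to its total
    return [[a, b], [c, d]]


table_b = fill_two_by_two(15, row_totals=(54, 79), col_totals=(67, 66))
print(table_b)                   # [[15, 39], [52, 27]]
print(degrees_of_freedom(2, 2))  # 1
print(degrees_of_freedom(3, 3))  # 4
```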
 


     Degrees of freedom are important because the chi-square sampling distribution is really a family of distributions whose shape depends on the size of the table (see the figure for the curves for four different degrees of freedom).  The Chi-Square test is unreliable if the expected frequencies are small.  The rule of thumb is that at least 80% of cells must have expected frequencies larger than 5.
[Figure: chi-square curves for four different degrees of freedom, reproduced from:
http://www.itl.nist.gov/div898/handbook/eda/section3/gif/chsppf4.gif]
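As a rough illustration, the rule of thumb can be applied to the housework table by counting how many expected frequencies exceed 5. The sketch below uses the rounded expected frequencies shown earlier on this page; the variable names are illustrative.

```python
# Rounded expected frequencies for the housework / marital happiness table.
expected = [
    [144.6, 238.8, 16.6],  # VERY HAPPY
    [93.3, 154.0, 10.7],   # PRETTY HAPPY
    [6.1, 10.1, 0.7],      # NOT TOO HAPPY
]

cells = [value for row in expected for value in row]
large_enough = sum(1 for value in cells if value > 5)
share = large_enough / len(cells)

print(f"{large_enough} of {len(cells)} cells exceed 5 ({share:.1%})")
# 8 of 9 cells exceed 5 (88.9%)
print(share >= 0.80)  # True: the 80% rule of thumb is satisfied
```

Only the NOT TOO HAPPY by LITTLE/NONE cell (expected frequency 0.7) falls below the threshold, so the table meets the 80% guideline as it is stated here.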

PROBABILITY
     The "Prob." value gives the probability that, if the two variables were independent (i.e., unrelated) in the population, a researcher could have obtained the evidence of dependence (i.e., a relationship) seen in the sample data by sampling error alone.  In this case the probability is about one in one thousand.  Although this sample might be that one-in-one-thousand case, it is so unlikely that the researcher will conclude that sampling error probably did not produce the relationship, and will therefore reject the null hypothesis of independence.
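For readers who want to see where a "Prob." value like this comes from, here is a minimal Python sketch. It relies on a closed-form expression for the upper-tail probability of the chi-square distribution that holds only when the degrees of freedom are even (as with DF=4 here); for other cases a statistics library is the better choice, for example `scipy.stats.chi2.sf`. The function name `chi_square_upper_tail` is an assumption made for this example, and this is not necessarily how MicroCase computes the value.

```python
import math


def chi_square_upper_tail(x, df):
    """P(chi-square >= x) for a positive, even number of degrees of freedom.

    For df = 2k the tail probability is
    exp(-x/2) * sum((x/2)**i / i! for i in 0..k-1).
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch handles only positive even df")
    half = x / 2.0
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms


p = chi_square_upper_tail(17.865, 4)
print(round(p, 3))  # 0.001
```

The unrounded result is roughly 0.0013, which is consistent with the Prob.=0.001 shown in the MicroCase output.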

for questions or comments contact me at mduncombe@coloradocollege.edu
last updated on October 25, 2002