| A
      sample is a subset of the population that is under investigation.
      Three considerations about samples shape the decision
      about which statistical test to use:
        Is the sample representative of the population
          from which it was drawn?
        What is the sample size?
        Are the samples independent (unpaired) or
          dependent (paired or related)?
 Representativeness
 If a researcher has data about the entire
      population, descriptive statistics may be used to describe the
      population, and no generalization is involved.  If, however, a researcher has
      data about only a subset of the population, the researcher may want to
      use statistical inference to generalize to the whole population.
 Statistical inference requires that the sample be
      representative of the population from which it was drawn.   The
      only way to know whether a sample is representative is to study the whole
      population as a comparison.  If a researcher had data about the whole
      population, he or she would not bother with a sample at all.  Thus
      researchers hope that samples represent the population from which they are
      drawn, but there is no guarantee.  For this reason, research
      emphasizes the importance of replication.  Researcher A may derive
      important findings, but if Researchers B, and C, and D (and so forth)
      can't replicate the findings, then the scholarly community suspects that
      Researcher A's sample may have been unrepresentative--a fluke.
 The best way to maximize representativeness is to
      draw a probability sample--a sample in which every element in the
      population has a known probability of being selected into the
      sample.  See the Quantitative Research pages for more information
      about probability samples.
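 As a minimal sketch of the idea, assuming the simplest kind of probability
      sample (a simple random sample, in which every element has the same known
      probability n/N of being selected), the Python below draws a sample from a
      hypothetical list of 2000 ID numbers.  The list, the function name, and the
      seed are illustrative assumptions, not part of the original pages.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n elements without replacement; each has selection probability n/N."""
    rng = random.Random(seed)  # seeded generator so the draw can be repeated
    sample = rng.sample(population, n)
    inclusion_probability = n / len(population)
    return sample, inclusion_probability


# Hypothetical sampling frame: ID numbers 1..2000.
frame = list(range(1, 2001))
sample, prob = simple_random_sample(frame, 322, seed=1)
print(len(sample), prob)  # 322 0.161
```

 Other probability designs give different elements different selection
      probabilities; what matters is that each probability is known in advance.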
 Probability samples are the best, but not always
      possible or practical.  A probability sample requires a list of all
      the elements in the population under consideration*; often such lists do
      not exist (e.g., people with low self-esteem).  Even when they do
      exist, contacting every selected element may be a daunting task; a low
      response rate ruins the best probability sample.  The response rate
      is given by the equation: response rate = [(drawn sample size - number who
      refused or could not be reached) / drawn sample size] x 100.  A rate in the
      75% range is considered good; mailed surveys, even with two or three
      follow-ups, often achieve response rates below 50%.
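 The response rate equation can be written as a short Python function.  This
      is only a sketch: the function name and the survey counts are hypothetical,
      chosen to land on the 75% figure mentioned above.

```python
def response_rate(drawn_sample_size, nonrespondents):
    """Percentage of the drawn sample that actually responded."""
    if drawn_sample_size <= 0:
        raise ValueError("drawn_sample_size must be positive")
    if not 0 <= nonrespondents <= drawn_sample_size:
        raise ValueError("nonrespondents must be between 0 and the drawn sample size")
    return (drawn_sample_size - nonrespondents) / drawn_sample_size * 100


# Hypothetical survey: 400 people drawn, 100 refused or could not be reached.
print(response_rate(400, 100))  # 75.0
```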
 Researchers make an argument for the
      representativeness of non-probability samples in two ways:  1) by
      comparing the sample to whatever characteristics are known about the
      population based on the assumption that representativeness on some
      characteristics may translate to representativeness on all
      characteristics; 2) by comparing early responders and late responders
      based on the assumption that nonresponders are more like late responders
      than like early responders.
 Sample Size
 Sample size is most often determined by
      practical considerations--how much time and how many resources does the
      researcher have?  When time and resources are ample, sample
      size is determined by three factors:
 
        The hypothesized distribution of the dependent
          variable, expressed as a dichotomy.  In the 2000 US presidential
          election, for example, approximately 50% of the voters supported
          George Bush and approximately 50% supported other candidates.  In
          satisfaction surveys of college students, approximately 90% of
          graduating seniors say they would probably or definitely choose the
          same school again and 10% say they would probably or definitely not
          choose the same school again.  If a researcher has no idea what
          the distribution of the dependent variable will be, the safest choice is
          50%/50%, as this distribution requires the largest sample size.
        The margin of error the researcher is willing to
          tolerate.  In pre-election, candidate-preference polling, results
          are often reported as 45% support candidate X with a margin of error
          of +/-3%.  In other words, the poll estimates that between 42% and
          48% of the population favors candidate X.  A given researcher may
          be willing to accept a 10% margin of error (35%-55%) or may want the
          margin of error to be as small as 1% (44%-46%).  The smaller the
          margin of error, the larger the required sample size.
        The degree of confidence that the sample results
          represent the population.  Commonly researchers use 95% or 99%
          confidence.  The greater the confidence desired, the larger the
          required sample size.
 There are formulas to
      determine required sample size based on these three factors.  Using
      one of the on-line sample size calculators is easier, however.
      http://www.surveysystem.com/sscalc.htm
 If a researcher wanted to be 95% confident about a
      50/50 percentage with a margin of error of 5% from a population of
      2000, the researcher would need a sample size of 322.  99% confidence
      in the same situation would require a sample of 500.
 Number and Nature of the Samples
 Much social science research involves the
      comparison of two or more groups, for example:
        Are the political attitudes of incoming first
          years and graduating seniors different?
        Do competitive athletes have better mental health
          than recreational athletes? Does either of these groups have better
          mental health than couch potatoes? Are these differences the same for
          men as for women?
        Do states with an above-average income tax rate
          have higher high school graduation rates than states with a
          below-average income tax rate?
 Some statistical tests can
      be used with any number of samples; other statistical tests are
      appropriate for a two-sample comparison only.
 Whether a two-sample or a multi-sample
      comparison, there are different tests for independent and dependent
      samples.
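 Returning briefly to the Sample Size section: the figures quoted there (322
      at 95% confidence and 500 at 99% confidence) can be reproduced with one
      standard formula for estimating a proportion, adjusted for a finite
      population.  The sketch below assumes that formula, rounded z-scores of 1.96
      and 2.58, and rounding to the nearest whole number; the on-line calculator
      may compute its results differently, and the function name is hypothetical.

```python
# Rounded z-scores as found in many textbook tables (an assumption; a more
# precise value such as 2.576 for 99% gives a slightly different result).
Z_SCORES = {0.95: 1.96, 0.99: 2.58}


def required_sample_size(population, margin, confidence=0.95, proportion=0.5):
    """Sample size for estimating a proportion, with finite population correction."""
    z = Z_SCORES[confidence]
    # Sample size needed if the population were effectively infinite.
    n0 = z ** 2 * proportion * (1 - proportion) / margin ** 2
    # Shrink it to account for the finite population.
    n = n0 / (1 + (n0 - 1) / population)
    return round(n)


print(required_sample_size(2000, 0.05, 0.95))  # 322
print(required_sample_size(2000, 0.05, 0.99))  # 500
```

 Passing proportion=0.9 or a larger margin returns a smaller number, in line
      with the three factors listed in that section.  A cautious researcher
      might round up rather than to the nearest whole number.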
 Dependent (or paired or related)
      samples occur whenever there is reason to suspect that the responses of
      a member of one sample depend upon the responses of a specific
      member (or members) of the other sample(s).  For example, the political
      attitudes of a particular graduating senior might be dependent upon or
      related to that person's political attitudes as an incoming first-year
      student.  A husband's marital satisfaction might well depend upon his
      wife's marital satisfaction.  Dependent samples occur when there is a
      good reason to pair members of the samples.
 Independent (or unpaired) samples occur
      when there is no reason to pair a respondent in one sample with a
      particular respondent in the other sample(s). An independent sample test
      might well have a sample of husbands and a sample of wives, but these
      folks would not be married to each other.  Samples of different sizes
      cannot be paired member by member, so unequal sample sizes indicate that
      an independent sample test is in use.
 With dependent samples, the paired scores are
      compared and the differences are summarized.  With independent
      samples, each sample is summarized and the group summaries are compared.
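 The contrast can be sketched in Python using t statistics as the summary.
      This is an illustration under stated assumptions: the satisfaction scores
      are invented, the function names are hypothetical, and the independent
      version uses the unpooled (Welch-style) standard error.

```python
import math
import statistics


def paired_t(first, second):
    """Dependent samples: compare paired scores, then summarize the differences."""
    if len(first) != len(second):
        raise ValueError("paired samples must be the same size")
    diffs = [a - b for a, b in zip(first, second)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error


def independent_t(first, second):
    """Independent samples: summarize each group, then compare the summaries."""
    standard_error = math.sqrt(
        statistics.variance(first) / len(first)
        + statistics.variance(second) / len(second)
    )
    return (statistics.mean(first) - statistics.mean(second)) / standard_error


# Hypothetical marital satisfaction scores (1-10) for five married couples.
husbands = [7, 5, 8, 6, 9]
wives = [6, 5, 7, 4, 8]

print(round(paired_t(husbands, wives), 2))       # 3.16
print(round(independent_t(husbands, wives), 2))  # 1.0
```

 The same ten scores give very different statistics, which is why the
      choice between a dependent and an independent test matters.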
 ____________________
 *National polling organizations draw samples representative
      of the adult, non-institutionalized, English-speaking, mainland US
      population without such a comprehensive list of people.  Such
      organizations start with a list of census tracts and take a probability
      sample of these.  Then they obtain the census tract maps for the
      selected tracts and take a sample of blocks within each census
      tract.  At the block level, they hire an individual who walks the
      block and makes a list of the households.  The organization
      then draws a sample of households in each of the selected blocks.
      Interviewers approach each household and make a list of the eligible
      respondents in the household.  From this list the interviewer selects
      the person to be interviewed using a pre-established probability
      formula.  Such a sampling plan is called a multi-stage cluster
      sample.
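 The stages described in this note can be sketched in Python.  This is a
      simplified illustration, not the procedure of any particular polling
      organization: the nested frame is invented, the function name is
      hypothetical, and every stage uses equal-probability selection, whereas
      real designs often weight selection by size.

```python
import random


def multistage_cluster_sample(tracts, n_tracts, n_blocks, n_households, seed=None):
    """Sample tracts, then blocks, then households, then one person per household."""
    rng = random.Random(seed)
    respondents = []
    for tract in rng.sample(sorted(tracts), n_tracts):
        blocks = tracts[tract]
        for block in rng.sample(sorted(blocks), min(n_blocks, len(blocks))):
            households = blocks[block]
            chosen = rng.sample(households, min(n_households, len(households)))
            for eligible_people in chosen:
                # Stand-in for the pre-established probability formula.
                respondents.append((tract, block, rng.choice(eligible_people)))
    return respondents


# Hypothetical frame: 4 tracts x 3 blocks x 5 households x 2 eligible adults.
frame = {
    f"tract{t}": {
        f"block{b}": [[f"t{t}b{b}h{h}p{p}" for p in range(2)] for h in range(5)]
        for b in range(3)
    }
    for t in range(4)
}
sample = multistage_cluster_sample(frame, n_tracts=2, n_blocks=2, n_households=3, seed=0)
print(len(sample))  # 12 respondents: 2 tracts x 2 blocks x 3 households
```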
 |