MA 117 Introduction to Probability and Statistics 

Block 4, 2001

The Project

 

 



Project Background.

This project is an opportunity for you to apply the statistical methods you will learn in this class to a topic of your own choice. The design features described below are required, but the topic you research is up to you. You will be spending a lot of time gathering and analysing the data, so it’s best to select a topic of personal interest to you - perhaps relevant to your major, or even just ‘campus trivia’ which interests you. You may extend the topic any way you wish.

There will be two preliminary phases to your project which will be due during the block; your completed statistical report will be due at the end of the block. On the second Monday (December 3rd) you will turn in a statement of your topic selection. On the third Monday (December 10th) you will turn in your data together with basic summary statistics.

It is important that you start to think about your project topic soon (mostly because the data collection takes time). The design must include two types of variables - one quantitative and one dichotomous.

The Dichotomous Data

The dichotomous data can be viewed as a YES/NO response to some question. For example: Are you male? Do you drink coffee? Are you an upper-classman? Do you take recreational drugs?

The Quantitative Data

The quantitative data has numeric values. The differences between the variable values should carry comparative meaning. Examples: What is your GPA? How many cups of coffee do you drink per day? How many hours do you sleep per night? A poor example would be "What is your major?" Although you can assign numeric values to students' majors, the relative sizes of those numbers would not mean much.

One aim of the project is to determine what proportion of your ‘population’ falls into each dichotomous group. 

Examples:

Suppose that your population is the CC student body, your dichotomous variable is the response to "Do you drink coffee?" and your quantitative variable is "How much sleep do you get each night?" You would be able to determine the proportion of coffee drinkers on campus. Another goal of the project is to compare the two groups determined by the dichotomous variable. Suppose that the coffee drinkers get 9.6 hours of sleep per night on average, and that the non-coffee drinkers get 9.8 hours of sleep per night. Is this a real difference, or is it just a random fluctuation due to chance error? A hypothesis test will enable you to make such decisions.

More Examples from previous years:

  • (From 1998): What proportion of the population believes that President Clinton should be impeached, and is there a difference in educational level between the group which is for impeachment and the group which is against?

  • What proportion of students attended religious services regularly as children, and how often do they now attend religious services?

  • What proportion of students are living off campus, and do the students living off campus own more potted plants than those living on campus?

  • Do men and women on the swim-team score differently on the Rosenberg self-esteem scale? Is there a correlation between self-esteem and performance?

  • The dichotomous variables for this project: gender, parental income, upper vs. lower classmen. Quantitative question: How much involvement do students have on campus in community service?

  • Does dating affect school performance?

  • Are male students really better at spatial perception, and faster to identify geometric objects which have been rotated?

  • Blood Pressure project: This project had several dichotomous variables (sex, alcoholism, smoking, stress, diet) and attempted to observe differences in blood pressure between groups.

  • Does the practice of martial arts cause more violence, or does it teach self control? Dichotomous question: Have you taken a martial arts class? Quantitative question: How many fist-fights have you been in on campus each year?

  • Of all phone calls to the investigator's dorm room, what proportion are for the investigator and what proportion are for her room-mate? Does the investigator's room-mate spend more minutes on a phone call than the investigator does herself?

  • What proportion of drivers have red cars, and do they run the red light by a greater margin than other drivers?

  • Project on people's perceptions and beliefs about the use of animal research for developing drugs.

  • What are the different levels of alcohol consumption between various campus groups (male/female, upper/lower classmen, etc.)?

  • By interviewing people outside the DMV, determine the proportion of people who drive a four wheel drive vehicle, and whether or not they are less accident prone.

  • Is there a difference in education levels/GPA between creationists and those who believe man evolved from apes?

  • What proportion of CC students are in a sorority or fraternity, and do they have more tattoos?

  • What proportion of CC students have tattoos, and does the number of body piercings differ between the tattooed and non-tattooed groups?

  • What proportion of CC students use marijuana, and do they have more tattoos?

  • Do in-state or out-of-state students spend more money on phone calls?

  • Do male students smoke more than female students?

  • Do people involved in organized campus activities which require 15 or more hours a week of time have a poorer GPA?

  • Do people who do not label themselves as feminists really feel the same way about women’s issues as people who do label themselves as feminists?

  • Did women or men spend more time in the Colorado State Mental Institution, 1879 to 1889?

  • Do people who claim to be at their ideal weight really weigh less than those who do not claim to be at their ideal weight?

 

Examples of How to Take Surveys:

  • Telephoning students from a random sample of 100 registered students obtained from the registrar.

  • Worner box survey ... expect a 20% response rate!

  • Colorado Springs telephone survey.

  • Design a survey and ask people at Worner centre to complete it.

  • Watching and recording data directly - e.g. observing traffic patterns.

 

The size of the sample should be between 50 and 100, but it may be smaller in special cases. Discuss it with me if a sample of size at least 50 is not possible. The method of making the random selection is up to you, but you should decide on a method before you begin and then select all of your subjects consistently. In your eventual write-up you will comment on any biases which may have been introduced by your method. You must select your own sample, but your sources may include college offices, local businesses, government agencies, laboratory experiments or a survey you conduct. Randomly generated lists of 100 CC students (first names and phone numbers) will be available. Your grade on the project will be based predominantly on your final report.
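One way to decide on a selection method in advance and apply it consistently is to draw the whole sample in a single step from a numbered list of the population. The following is only a minimal sketch: the roster of 1900 ID numbers, the sample size of 60 and the seed value are all hypothetical stand-ins for whatever list and size you actually use.

```python
import random

# Hypothetical sampling frame: ID numbers standing in for the population list.
roster = list(range(1, 1901))  # assumed population of 1900 students

# A fixed seed (arbitrary value) lets you reproduce and document the selection.
rng = random.Random(117)

# Draw 60 distinct subjects; random.sample never picks the same entry twice.
sample_ids = sorted(rng.sample(roster, 60))

print(sample_ids[:10])  # the first few selected ID numbers
```

Recording the seed and the list you drew from makes it easier to describe the method, and any biases in the list itself, in the final write-up.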

 

Phase I: Project Proposal, due Monday December 3rd at 5 p.m. Turn in a statement of
  • The population you will study
  • The means by which you will select a sample of that population.
  • Your dichotomous variable.
  • Your quantitative variable.

You can turn in your proposal on recycled paper, or by e-mail if you wish.

 
Phase II: Data Collection and Analysis, due Monday December 10th at 5 p.m. Collect the two types of data (dichotomous and quantitative). Submit a copy only and keep your original! Include the following:

A.   A cover page which restates

  1. The population
  2. The Dichotomous Variable, together with the percentage of your sample in each dichotomous category.
  3. The Quantitative Variable

B.   For each dichotomous group, the average, median, mode, range, and standard deviation of your quantitative variable.

C.   The raw data (use SPSS or a similar statistical program to record your data)

D.   Frequency distributions

  1. one for your quantitative variable over the entire sample
  2. one for your quantitative variable over the 'YES' group
  3. one for your quantitative variable over the 'NO' group

E.   Frequency Histograms

  1. one for your quantitative variable over the entire sample
  2. one for your quantitative variable over the 'YES' group
  3. one for your quantitative variable over the 'NO' group

Histograms and descriptive statistics may be carried out on SPSS. Refer to our first lab session for methods. This part of the project can be turned in on recycled paper.
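If you would like to check the SPSS output for parts B, D and E by hand, the same summaries can be sketched in a few lines of Python. Everything here is illustrative: the sleep data are made up, the class width of one hour is an arbitrary choice, and `summarize` and `frequency_table` are hypothetical helper names. Note that `statistics.stdev` divides by n - 1; use `statistics.pstdev` instead if the class convention is to divide by n.

```python
import statistics
from collections import Counter

# Hypothetical data: hours of sleep per night, split by "Do you drink coffee?"
groups = {
    "YES": [6.0, 7.0, 7.0, 7.5, 8.0, 8.0, 8.0, 9.0],
    "NO": [6.5, 7.0, 8.0, 8.0, 8.5, 9.0, 9.0, 9.0, 10.0],
}

def summarize(values):
    """Part B: average, median, mode(s), range and standard deviation."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "modes": statistics.multimode(values),  # a list, in case of ties
        "range": max(values) - min(values),
        "sd": statistics.stdev(values),  # sample SD (divides by n - 1)
    }

def frequency_table(values, width=1.0):
    """Parts D and E: counts per class interval [low, low + width)."""
    counts = Counter(int(v // width) for v in values)
    return [
        (k * width, (k + 1) * width, counts.get(k, 0))
        for k in range(min(counts), max(counts) + 1)
    ]

everyone = groups["YES"] + groups["NO"]
for label, values in [("ALL", everyone), *groups.items()]:
    print(label, summarize(values))
    # One row of stars per class interval gives a rough text histogram.
    for low, high, count in frequency_table(values):
        print(f"  {low:4.1f} to {high:4.1f}: {'*' * count} ({count})")
```

Each interval includes its left endpoint and excludes its right endpoint, so a value of exactly 8.0 is counted in the 8.0 to 9.0 class.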

 
Phase III: The Paper, due Tuesday December 18th. Guidelines: The paper should be typed and double-spaced, and at least five pages long excluding the charts and computations. It should be turned in as a formal report typed on fresh paper.

Discuss the Data Collection Phase:

We’ve spent quite a bit of time in class discussing methods of data collection. Describe your method in detail, including any practical difficulties you had during the collection phase. If you did a survey, discuss the responses from the subjects of the survey (cooperative, evasive, hostile etc.). Discuss any biases you suspect may have been present in your procedure. What sort of effects do you think they had on the investigation? Do you think your method was truly random? How might you improve your method?

Required Tests and Computations.

  • For the dichotomous variable, find the sample percentage of the ‘yes’ responses. Name the parameter that you are estimating. What is the standard error? Give 68% and 95% confidence intervals for your parameter. Do the same for the ‘no’ group.

  • For the ‘yes’ group of the dichotomous variable, compute the sample average of your quantitative variable. Name the parameter that you are estimating. What is the standard deviation? What is the standard error? Give 68% and 95% confidence intervals for your parameter. Do the same for the ‘no’ group.

  • Carry out a hypothesis test to decide whether the difference in the averages of the quantitative variable for your two groups is significant, or whether it can be explained away by chance error. Be certain that the assumptions for the test you use are satisfied.
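The three computations above can be sketched as follows. This is an illustration under stated assumptions, not the required method: the data are randomly generated stand-ins, the 68% and 95% intervals use the rule of thumb of 1 and 2 standard errors (1.96 is the more exact multiplier for 95%), and the test shown is a two-sample z-test, which assumes two independent and reasonably large random samples. Use the test and conventions taught in class if they differ.

```python
import math
import random
from statistics import NormalDist, mean, stdev

# Hypothetical data: hours of sleep for 32 coffee drinkers and 28 non-drinkers.
rng = random.Random(2001)
yes = [round(rng.gauss(7.4, 1.0), 1) for _ in range(32)]
no = [round(rng.gauss(7.9, 1.0), 1) for _ in range(28)]

def interval(estimate, se, multiple):
    """Estimate plus or minus a multiple of its standard error."""
    return (estimate - multiple * se, estimate + multiple * se)

# Dichotomous variable: sample proportion of 'yes' and its standard error.
# The 'no' proportion is 1 - p_yes and has the same standard error.
n = len(yes) + len(no)
p_yes = len(yes) / n
se_p = math.sqrt(p_yes * (1 - p_yes) / n)
ci68_p = interval(p_yes, se_p, 1)
ci95_p = interval(p_yes, se_p, 2)

def mean_sd_se(values):
    """Sample average, SD, and standard error of the average."""
    sd = stdev(values)
    return mean(values), sd, sd / math.sqrt(len(values))

# Quantitative variable within each group.
m_yes, sd_yes, se_yes = mean_sd_se(yes)
m_no, sd_no, se_no = mean_sd_se(no)
ci95_yes = interval(m_yes, se_yes, 2)
ci95_no = interval(m_no, se_no, 2)

# Two-sample z-test: null hypothesis is that the two population averages are equal.
se_diff = math.sqrt(se_yes ** 2 + se_no ** 2)
z = (m_yes - m_no) / se_diff
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided

print(f"proportion yes = {p_yes:.3f}, SE = {se_p:.3f}")
print(f"68% CI = {ci68_p}, 95% CI = {ci95_p}")
print(f"yes: mean {m_yes:.2f}, SD {sd_yes:.2f}, SE {se_yes:.2f}")
print(f"no:  mean {m_no:.2f}, SD {sd_no:.2f}, SE {se_no:.2f}")
print(f"z = {z:.2f}, two-sided P-value = {p_value:.4f}")
```

A small P-value suggests the difference between the group averages is hard to explain by chance error alone; a large one means chance remains a plausible explanation.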

Correct grammar and correct use of statistical terms and principles are important. Don’t forget that the resources of the writing center are available to help you. There will be a possibility of obtaining extra credit (up to about 10%). You may, for example, look for a correlation between two quantitative variables, or carry out a test for independence over several categories of your sample. You may use SPSS to create additional histograms and calculations. However, organize your SPSS output: include only what is relevant, and explain the purpose, procedures and results of those tests.
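For the extra-credit ideas, a correlation coefficient and a chi-square statistic for a test of independence can both be computed by hand as a check on SPSS. The sketch below uses invented numbers, and `pearson_r` and `chi_square_statistic` are hypothetical helper names. It assumes the chi-square test is the test for independence intended here; the resulting statistic would still need to be compared against a chi-square table for the stated degrees of freedom.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient between two quantitative variables."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def chi_square_statistic(table):
    """Chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical data: cups of coffee per day and hours of sleep per night.
cups = [0, 1, 2, 3, 4]
sleep = [9.0, 8.5, 8.0, 7.0, 6.5]
r = pearson_r(cups, sleep)

# Hypothetical counts: rows are coffee yes/no, columns are on/off campus.
counts = [[20, 10], [10, 20]]
chi2, df = chi_square_statistic(counts)

print(f"r = {r:.3f}")
print(f"chi-square = {chi2:.3f} on {df} degree(s) of freedom")
```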

 

Conclusions:

Comment on any skew or unusual shape in your histogram. Comment on the hypothesis test(s) you used and the significance level(s), and explain why those tests were appropriate. Speculate on anything in the data that might need further investigation.