Unit 4 Concept Map

Samples and Surveys

Observational Studies and Experiments

Non-Sampling Error

Vocabulary of Experiments

Sampling Error

A convenience sample is when people are chosen because they are easy to reach. However, it usually produces unrepresentative data. If I went to a football field and asked everyone what their favorite sport is, it would likely be an unrepresentative sample because most of them probably love football.

Wording Bias is when a question is worded specifically to make people answer the question in a certain way. If I asked people "Do you believe in aliens?" most people would say no. However, if I asked "Do you believe in the potential for intelligent life somewhere in the universe?" the responses would be quite different.

Bias

Bias is the systematic tendency of a sampling method to favor certain outcomes, so that the sample consistently overestimates or underestimates the true value in the population. For example, if you wanted to see how many people enjoy meat and you sampled at a vegetarian restaurant, there would be bias because most of the people in the survey do not eat meat.

Good Sampling Techniques

A Simple Random Sample, or SRS, is when every individual in a population has an equal chance of being chosen, and every possible set of individuals of the sample size has an equal chance of being chosen. If we wanted to take a random 3 kids from this class, we could write everyone's names on slips of paper, mix them up, and draw 3, giving us an SRS. An SRS is a truly random sample because everyone has an equal chance of being chosen. However, it is difficult to use an SRS with a very large group of people because we need a complete list of the population.
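The names-on-paper idea can be sketched in Python. This is only an illustration: the roster names and the helper `simple_random_sample` are made up for the example, and `random.sample` plays the role of drawing slips without replacement.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals so that every set of n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(list(population), n)

# Hypothetical class roster (names invented for illustration).
roster = ["Ava", "Ben", "Cara", "Dev", "Eli", "Fay", "Gus", "Hana"]

chosen = simple_random_sample(roster, 3, seed=1)
print(chosen)
```

Passing a seed only makes the draw repeatable; leaving it out gives a fresh sample each time.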

A Stratified Random Sample is when a large population is first separated into groups of similar individuals called "strata." We then take an SRS from each stratum and combine them to get a representation of the entire population. A benefit of stratifying is that every group is guaranteed to be included and specific groups can be represented more accurately. However, each stratum on its own has less variability because its members come from one specific group, so no single stratum represents the whole population.
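A minimal sketch of taking an SRS from each stratum, assuming the strata are grade levels. The grade names, student labels, and the helper `stratified_sample` are hypothetical and only serve the example.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take a separate SRS of n_per_stratum individuals from every stratum."""
    rng = random.Random(seed)
    return {
        name: rng.sample(members, n_per_stratum)
        for name, members in strata.items()
    }

# Hypothetical strata: 20 labeled students in each grade.
strata = {
    "freshmen": [f"F{i}" for i in range(1, 21)],
    "sophomores": [f"S{i}" for i in range(1, 21)],
    "juniors": [f"J{i}" for i in range(1, 21)],
}

picked = stratified_sample(strata, 5, seed=2)
for grade, students in picked.items():
    print(grade, students)
```

Every grade contributes exactly 5 students here, which is what guarantees that no group is left out.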

A Cluster Sample is when a population is divided into smaller groups, called clusters, that are each meant to represent the total population. We then randomly choose one or more whole clusters and survey everyone in them. Although it is easier to take a cluster sample because we work with smaller groups of people, it is difficult to find clusters that accurately represent the total population.
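A cluster sample is usually taken by randomly choosing whole groups and then including everyone in the chosen groups. The sketch below assumes the clusters are homerooms; the room names, student labels, and the helper `cluster_sample` are invented for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters, then include every member of each one."""
    rng = random.Random(seed)
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    members = [person for name in chosen_names for person in clusters[name]]
    return chosen_names, members

# Hypothetical homerooms, each with 4 students.
homerooms = {
    "Room 101": ["A1", "A2", "A3", "A4"],
    "Room 102": ["B1", "B2", "B3", "B4"],
    "Room 103": ["C1", "C2", "C3", "C4"],
    "Room 104": ["D1", "D2", "D3", "D4"],
}

rooms, surveyed = cluster_sample(homerooms, 2, seed=3)
print(rooms, surveyed)
```

Unlike the stratified design, the randomness here is in which groups are picked, not in who is picked inside a group.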

Example of Survey

A voluntary response sample is when people decide for themselves whether to respond to a general question posed to the public. These samples are usually biased because people who have a strong opinion on the question are more likely to respond than people who don't really have an opinion. For example, people who respond to political phone surveys are not necessarily representative of the population because they tend to feel strongly about political issues.

Under-coverage occurs when some groups in the population are left out of the sampling process or are less likely to be chosen, which makes the sample unrepresentative. For example, if I went to a student council meeting and asked everyone how they feel about participating in school activities, most would say they love to participate. However, this wouldn't accurately represent the total population of Duxbury High School because many students may not do any activity and had no chance of being asked.

Bias always has a direction, or inclination in favor of one thing or another. Bias also always has a source, which is the reason that a sample favors a certain outcome. An example of this is when you run a survey to find out the number of high school students who own a football helmet, and you only ask members of the football team. Bias is evident here, and the direction of the bias is toward overestimating the number of students who own a football helmet. The source of this bias is that you only asked members of the football team, who almost certainly all own a football helmet.
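The direction of the helmet bias can be seen in a small simulation. Every number here is an assumption made up for the sketch (1000 students, 60 on the team who all own a helmet, and a 5% chance of ownership for everyone else); none of it is real data.

```python
import random

rng = random.Random(3)

# Hypothetical school: 60 team members who all own a helmet,
# plus 940 other students who each own one with probability 0.05.
students = [{"on_team": True, "owns_helmet": True} for _ in range(60)]
students += [
    {"on_team": False, "owns_helmet": rng.random() < 0.05}
    for _ in range(940)
]

def proportion_owning(group):
    """Fraction of the group that owns a football helmet."""
    return sum(s["owns_helmet"] for s in group) / len(group)

true_proportion = proportion_owning(students)
team_only = [s for s in students if s["on_team"]]
biased_estimate = proportion_owning(team_only)
srs_estimate = proportion_owning(rng.sample(students, 100))

print("whole school:", true_proportion)
print("football team only:", biased_estimate)
print("SRS of 100:", srs_estimate)
```

Asking only the team gives an estimate far above the school-wide proportion, which is the direction of the bias; the SRS estimate varies from sample to sample but is not pushed in one direction.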

Response Bias happens when people give inaccurate answers during surveys, often by deliberately lying. For example, if people are asked whether they have been pulled over before, they may answer "no" because they want to seem like better drivers.

Nonresponse occurs when certain individuals chosen for a sample can't be reached or refuse to respond to a survey. If a company is doing a phone survey and calls only between 11 and 2 every day, people who work away from home won't be reached.

Should the DHS prom venue be at Gillette Stadium or at the New England Aquarium?
For our survey, we will ask the senior class at Duxbury High School which venue they prefer using a simple random sample. There are 250 students in the senior class, and each student will be given a label from 001 to 250, assigned alphabetically by last name. Next, we will randomly choose a line to start at on the random number table. From there, we will read across each line 3 digits at a time. Every group from 001 to 250 selects the student with that label, while 000 and every group from 251 to 999 is skipped. There cannot be any duplicates, so a repeated label is also skipped. Each selected label will be recorded, and once we have 50 students in the sample we will survey them and ask which venue they prefer.
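The table-reading procedure can be sketched in Python. This is an illustration only: the helper `select_from_digits` is a made-up name, and the string of random digits stands in for a printed random number table.

```python
import random

def select_from_digits(digits, population_size=250, sample_size=50):
    """Read digits three at a time, keeping new labels from 1 to population_size."""
    selected = []
    seen = set()
    for i in range(0, len(digits) - 2, 3):
        label = int(digits[i:i + 3])
        # Skip 000, labels above the population size, and repeats.
        if 1 <= label <= population_size and label not in seen:
            seen.add(label)
            selected.append(label)
            if len(selected) == sample_size:
                break
    return selected

# Stand-in for a random number table: a long string of random digits.
rng = random.Random(4)
table = "".join(rng.choice("0123456789") for _ in range(6000))

labels = select_from_digits(table)
print(labels)

# Tiny hand-checkable example: groups are 001, 300, 001, 250.
print(select_from_digits("001300001250", sample_size=3))  # [1, 250]
```

In the small example, 300 is skipped for being out of range and the second 001 is skipped as a repeat, so only students 1 and 250 are selected.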

Lurking Variables in an Observational Study

A Lurking Variable is a variable that is not the explanatory or response variable in an experiment or study but can still affect the response variable. When two variables both affect a response variable in an observational study, it can be hard to tell which one is affecting the response, and this is called confounding. A lurking variable causes confounding when it is associated with the explanatory variable and also influences the response variable, because its effect on the response cannot be separated from the effect of the explanatory variable.

Experimental Units are the individuals being tested in an experiment. If the experimental units are humans, then they are called subjects.

Factors are the explanatory variables in an experiment. If we are measuring plant height, we can have factors such as the amount of sunlight or the amount of water.

Treatments are the specific conditions that an experimental unit experiences during an experiment. For example, if we were conducting an experiment on the growth of plants, each specific amount of water that a plant receives would be a treatment.

A Response Variable is the measurable outcome of the experiment. If we are measuring the effect of water on plant heights, the response variable is the height.

Observational Studies and Experiments

Observational Studies observe individuals and record their responses without the person conducting the study attempting to influence those responses. Observational studies can't be used as evidence of causation, but sometimes a strong link between the explanatory and response variables can't be ignored.

An experiment is when a researcher deliberately imposes treatments on individuals in order to measure their responses to the treatments. The purpose of an experiment is to determine whether the treatment affects the response.

Example of Lurking Variable leading to Confounding- You want to see how many visits to the vet a dog has per year based on the breed of the dog. You use 5 Golden Retrievers and 5 Black Labs as your sample. The explanatory variable in this study is the breed of the dog, but a lurking variable that can lead to confounding is any pre-existing condition the dogs in the sample have. Pre-existing conditions could affect the number of vet visits in a year in the same way that breed could, so the two effects cannot be told apart.

Three Principles of Experimental Design

Definitions

Experiment Example

  1. Replication. In an experiment, one must use enough experimental units in each treatment group so that natural variation between individuals does not hide the effect of the treatments.
  2. Randomness. In an experiment, the experimental units must be randomly assigned to treatment groups.
  3. Control. In an experiment, it is important to use a control group for comparison and to keep other conditions the same, so that lurking variables are accounted for.

Placebo Effect- A placebo is a fake treatment that has no active effect on the subject. The placebo effect is when subjects respond to the placebo anyway because they believe it is having an effect on them.

Blinding- A Double Blind experiment is when neither the subjects nor the people measuring the response variable know which treatment each subject is receiving. A Single Blind experiment is when only one of these two groups does not know: either the subjects or the people measuring the response variable know which treatment was received, but the other group does not.

For our experiment, we will be looking at which planting material and how much sunlight produce the tallest sunflower plants. Our experimental units are 30 sunflower plants. Using random assignment, 10 plants will be planted in soil, 10 plants in fertilizer, and 10 plants in sand. Within each of these groups, 5 plants will be randomly chosen to get full sun and the other 5 will get half sun. The material that the sunflowers are planted in and the amount of sunlight each plant gets are the factors, or explanatory variables, in our experiment. In the past, we have always planted sunflowers in soil and they have received full sun, so that treatment will serve as our control group. After 1 month, we will measure the height of each plant in centimeters, which is our response variable.
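The random assignment for this design could be sketched as follows. This is one possible way to do it, not the only one: the helper `assign_treatments` is a hypothetical name, the plants are simply numbered 1 to 30, and shuffling all the plants once and cutting the list into 6 equal groups is used in place of the two-step draw, since it also gives every plant the same chance of landing in each treatment.

```python
import random

def assign_treatments(units, media, sun_levels, seed=None):
    """Shuffle the units, then deal equal-sized groups to each treatment."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    # Each treatment is one combination of the two factors.
    treatments = [(m, s) for m in media for s in sun_levels]
    per_group = len(shuffled) // len(treatments)
    assignment = {}
    for k, treatment in enumerate(treatments):
        assignment[treatment] = shuffled[k * per_group:(k + 1) * per_group]
    return assignment

plants = list(range(1, 31))  # plants labeled 1 to 30
design = assign_treatments(
    plants,
    media=["soil", "fertilizer", "sand"],
    sun_levels=["full sun", "half sun"],
    seed=5,
)

for treatment, group in design.items():
    print(treatment, sorted(group))
```

The group for `("soil", "full sun")` is the control group described above, and each of the 6 treatments ends up with 5 plants.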