Unit 4: Producing Data Unit Project
Observational Studies and Experiments
Samples and Surveys
Observational studies observe individuals and measure variables of interest but do not attempt to influence the responses. Treatments are not assigned, often because assigning them would be unethical, so observers can only record the effects of choices people make on their own. For example, a study on the relationship between income and alcohol consumption would have to be an observational study because you could not ethically assign different incomes to people at random.
Experiments deliberately impose specific treatments on experimental units or subjects to study the effects of different factors. Experiments should have a random selection of individuals and a random assignment of treatments. For example, randomly assigning dosages of plutonium to randomly selected mice would be an experiment because the researcher imposes the treatment, and both the selection and the assignment are random.
Sampling Error
Designed Experiment
This experiment will study the effects of SuperFertilizer™ on grass growth.
Twenty plots of dirt in a yard, each sown with grass seed, are assigned a two-digit number from 01 to 20. Slips of paper with the numbers on them are placed into a hat and 10 are drawn at random to receive treatment A. The other 10 will receive treatment B.
10 receive treatment B
10 receive treatment A
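The hat draw described above can be imitated in a few lines of Python. This is only a sketch of the idea, not part of the original project: the function name `assign_treatments` and the `seed` argument are assumptions introduced here so the draw can be repeated.

```python
import random

def assign_treatments(plot_ids, n_treatment_a, seed=None):
    """Draw n_treatment_a plots 'from the hat' for treatment A; the rest get B."""
    rng = random.Random(seed)  # seed is optional and only makes the draw repeatable
    group_a = sorted(rng.sample(plot_ids, n_treatment_a))
    group_b = sorted(p for p in plot_ids if p not in group_a)
    return group_a, group_b

# Two-digit labels 01 through 20, one per plot.
plots = [f"{i:02d}" for i in range(1, 21)]
group_a, group_b = assign_treatments(plots, 10, seed=4)

print("Treatment A (no fertilizer):", group_a)
print("Treatment B (SuperFertilizer):", group_b)
```

Every plot ends up in exactly one group, and chance alone decides which, just as with the slips of paper.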
A factor is an explanatory variable of an experiment. Factors are studied to see their effects on the experimental units. In this experiment, the single factor is the product applied to the plots of dirt.
A treatment is a specific condition applied to the experimental units. In this experiment the two treatments are:
A: No fertilizer
B: SuperFertilizer™
An experimental unit (or a subject if it is human) is an individual that is given treatments and studied in an experiment. The experimental units in this experiment are plots of dirt.
Percentage of plot covered with grass is measured after two weeks.
Convenience sampling
Voluntary Response Sampling
Under-coverage
Non-sampling error
Example: Only surveying your best friends about their favorite color is not going to give you an accurate sample of high school students’ preferences.
Definition: Error in a statistical analysis arising from the unrepresentativeness of the sample taken.
Example: If a survey of college students’ favorite classes is conducted on a day when all the science students are away on a field trip, science students would be left out of the sample.
Example: If a group of people were asked how they liked Disney World on a day when it was raining hard, the weather would affect their answers.
Subtract group A's average coverage from group B's average coverage. If this difference is clearly positive, it suggests that SuperFertilizer™ is more effective than natural grass growth. If the difference is clearly negative, it suggests that SuperFertilizer™ is less effective than natural grass growth. If the difference is very near zero, the experiment gives no evidence that SuperFertilizer™ makes a difference to grass growth. With only 10 plots per group, a small difference could be due to natural variation alone.
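The subtraction described above can be sketched in Python. The coverage percentages below are invented purely for illustration; they are not results from this experiment, and the function name `coverage_difference` is an assumption.

```python
from statistics import mean

def coverage_difference(coverage_a, coverage_b):
    """Return group B's average coverage minus group A's average coverage."""
    return mean(coverage_b) - mean(coverage_a)

# HYPOTHETICAL data: percent of each plot covered with grass after two weeks.
coverage_a = [40, 35, 50, 45, 30, 55, 40, 45, 35, 50]  # A: no fertilizer
coverage_b = [60, 55, 65, 50, 70, 60, 55, 65, 60, 50]  # B: SuperFertilizer

diff = coverage_difference(coverage_a, coverage_b)
print(f"B minus A: {diff:.1f} percentage points")
```

A positive value means the fertilized plots had more coverage on average in this made-up data set.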
Wording Bias
Example: Politicians will make young children the focal point of a policy they want passed in order to sway the minds of voters.
Response bias
Example: A survey on fascism in the United States would likely contain lots of response bias because fascists would not want to admit to their fascism.
Non-response
Example: At a baseball game, we select seats at random to survey, but one of the men sitting in a selected seat leaves the game early and cannot be questioned.
Bias
Example: People at a protest would be more likely to oppose the current politicians in power than the general population.
Definition: Choosing the individuals who are easiest to reach.
Example: A mail-in survey about opinions on immigration will mostly be filled in by those who strongly support or oppose it, because others will not feel that it is worth the time or the postage. Thus, the result will not represent American opinions about immigration, only those of the most passionate.
Example: If you took a survey of a southern evangelical church about who their favorite presidential candidate was in 2016, you would not have a result that is representative of the American population.
Definition: A voluntary response sample consists of people who choose themselves by responding to a general appeal. Voluntary response samples show bias because people with strong opinions (often in the same direction) are most likely to respond.
Definition: Occurs when some groups in the population are left out of the process of choosing the sample.
Definition: Errors that are not dependent on the selection of the sample population.
Definition: Phrasing a question positively or negatively in order to provoke a particular reaction in people.
Definition: People answer a survey incorrectly either on purpose or by accident.
Definition: Occurs when an individual chosen for the sample can’t be contacted or refuses to participate.
Definition: An error made in gathering the data due to poor execution.
Source
The three principles of experimental design were all fulfilled in this design. Replication was met because we had enough experimental units (10 per treatment) to reduce the effect of natural variation. It is important because, with too few units, natural variation among plots could be mistaken for an effect of the treatment.
Randomization was met because we used randomness in assigning treatments to the experimental units. Randomization is important because it spreads natural variation and lurking variables roughly evenly across the treatment groups, which helps prevent false results.
Control was met because one of our treatments was to not touch the plots at all and simply let them grow. There was no blinding or placebo involved in this experiment because the experimental units were not sentient. Control is important because we need to know what happens under normal conditions so that the results of other treatments can be compared against it.
Definition: A systematic tendency to favor certain answers or outcomes. A respondent's social background or personal preferences can affect an answer, as can poor wording of the survey or experiment.
Example: If forced to choose between the two, a person would most likely save a family member over a stranger since they are biased towards their own family.
Direction
Example: A survey conducted of dog trainers about their favorite pet will clearly be biased in favor of dogs.
SRS
Definition: A simple random sample (SRS) of size n consists of n individuals from a population of size N chosen in such a way that every set of n individuals has an equal chance to be the sample actually selected.
Example: Voters in Duxbury are all assigned a 5-digit number and 2,000 are chosen at random to be part of a sample.
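A draw like the one in this example could be sketched as follows. The number of voters is not given in the text, so `population_size` below is a made-up placeholder, and the function name `simple_random_sample` is likewise an assumption.

```python
import random

def simple_random_sample(labels, n, seed=None):
    """Choose n labels without replacement; every set of n is equally likely."""
    rng = random.Random(seed)  # seed is optional and only makes the draw repeatable
    return rng.sample(labels, n)

# ASSUMED population size, for illustration only.
population_size = 12_000
voters = [f"{i:05d}" for i in range(1, population_size + 1)]  # 5-digit labels

sample = simple_random_sample(voters, 2_000, seed=1)
print("First five voters drawn:", sample[:5])
```

Because the draw is made without replacement from the full list, no voter can be picked twice and no group of voters is favored.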
Definition: Whether the bias causes an overestimate or an underestimate of the true result.
Advantages: Gives every individual equal chance to be chosen; gives every sample equal chance to be chosen; doesn’t favor any part of population.
Disadvantages: Hard to complete in a very large or very diverse population and can ignore certain characteristics of subgroups within the study.
A lurking variable is a variable that is not among the explanatory or response variables in a study but that may influence the response variable. A lurking variable can lead people to assume a relationship between two unrelated variables that are both affected by the lurking variable. For example, if an NFC North quarterback in Chicago is observed to have a lower pass completion rate than his NFC South twin in Atlanta, it may be concluded that NFC South quarterbacks are better. However, the lurking variable of Chicago being the "windy city" may affect how accurate the quarterback's passes are.
Example: Anyone in Massachusetts has an equal chance to be selected for a survey on global warming attitudes.
Example: Interviewing everyone in the world about what their favorite fast food is would be very difficult to do.
Stratified SRS
Definition: The population is classified into groups of similar individuals called strata. An SRS is taken within each stratum and then the samples from all the strata are combined to form the full sample.
Example: The Duxbury Interfaith Council brings different faiths together in one body. Sampling members of each faith separately and then combining those samples represents the views of the Duxbury Interfaith Council as a whole.
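The two steps in the definition (an SRS inside each stratum, then combining) can be sketched in Python. The strata names, sizes, and member labels below are hypothetical placeholders, as is the function name `stratified_sample`.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take an SRS of n_per_stratum from each stratum, then combine them."""
    rng = random.Random(seed)
    combined = []
    for name in sorted(strata):  # fixed order keeps seeded runs repeatable
        chosen = rng.sample(strata[name], n_per_stratum)
        combined.extend((name, member) for member in chosen)
    return combined

# HYPOTHETICAL strata: three made-up groups of 40 labeled individuals each.
strata = {
    "group_1": [f"1-{i:02d}" for i in range(1, 41)],
    "group_2": [f"2-{i:02d}" for i in range(1, 41)],
    "group_3": [f"3-{i:02d}" for i in range(1, 41)],
}

sample = stratified_sample(strata, 5, seed=3)
for stratum, member in sample:
    print(stratum, member)
```

Unlike a plain SRS, this guarantees that every stratum is represented in the final sample.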
These differences are the response variables. They are the measured results of the experiments that are then compared and interpreted to make a conclusion about the effects of the factors on the experimental units.
While not used in this experiment, two concepts important in human experiments are the placebo effect and blinding. The placebo effect is a psychological effect in which a subject who is told they are receiving a specific treatment shows the effects of that treatment despite being given an inert one. For example, if somebody was given non-alcoholic beer and told it was high in alcohol content, he might believe himself to be drunk.
Blinding is the practice of not telling the subjects which treatment has been applied. This is important because it keeps the placebo effect from differing between treatment groups. If subjects are unaware of their treatment, their expectations cannot push their responses in one direction. Double blinding is the practice of telling neither the subject nor the observer which treatment was used, so that there is no bias in demonstrating or observing results. For example, if a subject was given a pill and neither the subject nor the observer was told whether it was Advil or Xanax, then it would be an example of double blinding.
Advantages: Gives more precise information about different groups within a population than an SRS.
Disadvantages: Hard to use when populations are very similar to each other or large and spread out over wide areas.
Example: At DHS there are many different types of extracurricular clubs. If we ever wanted to conduct a survey within DHS, it would be easy to gather accurate data because the clubs already organize students into groups.
Example: It would be very difficult to conduct a stratified random sample among Duke Engineering students, as there are no clearly different groups within those students.
Cluster Sampling
Definition: Divide the population into smaller groups called clusters. These clusters should mirror the characteristics of the population. Then choose an SRS of the clusters. All individuals in the chosen clusters are included in the sample.
Example: It would be a cluster sample if attendees at a Duxbury football game are sorted into 10 groups, then 3 groups are chosen at random to have all individuals questioned about their preference for hot dogs or hamburgers.
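The football game example can be sketched in Python. The attendance figure (200) and the way fans are sorted into groups are assumptions made here for illustration; the text only fixes 10 groups and 3 chosen groups. The function name `cluster_sample` is also hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Choose an SRS of clusters, then include everyone in the chosen clusters."""
    rng = random.Random(seed)
    chosen_ids = rng.sample(sorted(clusters), n_clusters)
    sample = [person for cid in chosen_ids for person in clusters[cid]]
    return chosen_ids, sample

# ASSUMED: 200 attendees, dealt into 10 groups of 20 like cards.
attendees = [f"fan{i:03d}" for i in range(1, 201)]
clusters = {g: attendees[g::10] for g in range(10)}

chosen_ids, sample = cluster_sample(clusters, 3, seed=7)
print("Groups chosen:", chosen_ids)
print("People questioned:", len(sample))
```

The randomness is applied to whole groups rather than to individuals, which is what separates a cluster sample from an SRS or a stratified sample.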
Advantages: Cluster samples are used for practical reasons. Because each cluster is meant to resemble the population, surveying a few whole clusters takes less effort than reaching individuals spread across the entire population.
Example: By having people vote in their own individual town, MA can see where each town lies for certain political policies.
Disadvantages: Doesn’t offer the statistical advantage of better information about the population that stratified random samples do.
Example: A cluster sample of high schoolers’ favorite music would not show that jocks prefer different music than goths.
Random Survey Example
There are 30 students in Ms. Heath’s honors physics course. Each has been assigned a 2-digit number from 01 to 30. Those numbers are written on equally sized slips of paper and then placed in a hat. 10 students’ numbers are drawn from the hat at random to be studied. Those students are asked what their favorite subject is in school. The responses from those 10 randomly selected students are used to represent the opinions of the class as a whole.