Chapter 5 - Data collection and sampling
Data
Data are observed values of a variable or variables that are of interest to us
Simple random sample
Every sample of size n has an equal chance of being selected
Selection may be with replacement or without replacement
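A minimal sketch of the two selection modes, assuming the population fits in a Python list. The function name `simple_random_sample` and its parameters are illustrative, not taken from the chapter.

```python
import random

def simple_random_sample(population, n, with_replacement=False, seed=None):
    """Draw n items so that every sample of size n is equally likely."""
    rng = random.Random(seed)
    items = list(population)
    if with_replacement:
        # With replacement: the same individual may be drawn more than once.
        return rng.choices(items, k=n)
    # Without replacement: each individual appears at most once.
    return rng.sample(items, n)
```

For example, `simple_random_sample(range(100), 10)` returns ten distinct values, while passing `with_replacement=True` allows repeats.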
Advantages and disadvantages
Stratified sample
Ensures representation of individuals across the entire population
Cluster sample
More cost effective than simple random sampling
Less efficient: a larger sample is needed for the same precision
Simple random sample and systematic sample
Simple to use
May not be a good representation of the population's underlying characteristics
Experiments (2)
Advantage => Better way to produce data than direct observation
Disadvantage => More expensive than direct observation
Data collected in this manner is said to be experimental data
Methods to collect data
Experiments (2)
Surveys (3)
Direct observation (1)
Direct observation (1)
Advantage => Relatively inexpensive
Disadvantage => Difficult to obtain useful information in this way
Data collected in this manner is said to be observational data
Types of samples used
Non-probability samples
Judgement
Quota
Chunk
Convenience
Probability samples
Simple random
Systematic
Stratified
Cluster
Reasons for drawing a sample
Less time consuming than a census
Less costly to administer than a census
Less cumbersome and more practical to administer than a census of the targeted population
Systematic sample
Decide on sample size, n
Divide frame of N individuals into groups of k individuals; k = N/n
Randomly select one individual from the 1st group
Select every kth individual thereafter
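The systematic procedure can be sketched as follows. This is an illustration only: the name `systematic_sample` is hypothetical, and the code assumes N is a multiple of n (otherwise k is rounded down and the last few individuals in the frame can never be selected).

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start in the first group, then every k-th individual."""
    N = len(frame)
    if n < 1 or n > N:
        raise ValueError("sample size n must be between 1 and N")
    k = N // n                    # group size, k = N/n (rounded down)
    rng = random.Random(seed)
    start = rng.randrange(k)      # random individual within the 1st group
    return [frame[start + i * k] for i in range(n)]
```

With a frame of 100 individuals and n = 10, k is 10, so a random start of 3 would select positions 3, 13, 23 and so on.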
Stratified sample
Divide population into two or more subgroups (called strata) according to some common characteristic
A simple random sample is selected from each subgroup, with sample sizes proportional to strata sizes
Samples from subgroups are combined into one
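A rough sketch of stratified sampling with proportional allocation. The function `stratified_sample` and the `stratum_of` callback are assumptions made for illustration. Because each stratum's share is rounded, the combined sample may differ slightly from n.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, n, seed=None):
    """Sample each stratum in proportion to its size, then combine."""
    rng = random.Random(seed)
    items = list(population)
    strata = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)   # divide into strata
    sample = []
    for members in strata.values():
        # Proportional allocation: stratum share of the total sample size.
        size = round(n * len(members) / len(items))
        sample.extend(rng.sample(members, size))  # simple random sample per stratum
    return sample
```

For a population with 60 individuals in one stratum and 40 in another, a sample of 10 would draw 6 and 4 respectively.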
Selection with PPS
In the case of a random sample, elements of the population are selected without the monetary value on the invoice playing a role
If the correctness of a monetary value must be verified, the magnitude of the monetary value becomes important
In such a case, a selection process that takes the magnitude of the monetary values into account is preferred
We refer to this type of selection process as selection proportional to size (PPS), where size refers to the monetary value on each invoice
With PPS selection an invoice is selected in an indirect manner, because a rand unit is selected first and then the invoice on which it occurs is selected
Note that each rand unit has the same chance of selection, but the chance of selection for each invoice is proportional to the number of rand units that appear on it
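One way to picture the indirect selection is to lay all rand units end to end, pick a unit at random, and look up the invoice it falls on. The sketch below assumes invoice values are whole rands and that an invoice may be selected more than once; `pps_select` is a hypothetical name.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def pps_select(invoice_values, n, seed=None):
    """Return indices of n invoices, chosen proportional to rand value."""
    rng = random.Random(seed)
    cumulative = list(accumulate(invoice_values))  # running total of rand units
    total = cumulative[-1]
    selected = []
    for _ in range(n):
        unit = rng.randrange(total)   # every rand unit is equally likely
        # The invoice on which this rand unit occurs.
        selected.append(bisect_right(cumulative, unit))
    return selected
```

With invoices of R100 and R900, the second invoice holds nine times as many rand units and is therefore nine times as likely to be selected on each draw.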
Sampling and non-sampling errors
Sampling errors
Refers to differences between the sample and the population that exist because only some of the observations are included in the sample
Sampling errors can be reduced by taking larger samples
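A small simulation can illustrate the effect of sample size on sampling error. This is a sketch under assumptions: the error is measured as the average absolute gap between the sample mean and the population mean, and the name `mean_abs_sampling_error` is invented for this example.

```python
import random
import statistics

def mean_abs_sampling_error(population, n, repeats=200, seed=0):
    """Average |sample mean - population mean| over repeated samples of size n."""
    rng = random.Random(seed)
    mu = statistics.fmean(population)
    errors = [
        abs(statistics.fmean(rng.sample(population, n)) - mu)
        for _ in range(repeats)
    ]
    return statistics.fmean(errors)
```

Calling it on the same population with n = 10 and then n = 1000 should show a clearly smaller average error for the larger sample.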
Non-sampling errors
Are due to mistakes made in the acquisition of data or due to incorrect sampling methods
Three types of non-sampling errors
=> Measurement error - arises from the incorrect recording of information or from faulty measurement equipment
=> Non-response error - is introduced when responses are not obtained from all the members of the sample
=> Coverage error - occurs when the sampling plan is such that some members of the target population cannot possibly be selected for inclusion in the sample
Cluster sample
Population is divided into several "clusters", each representative of the population
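A minimal sketch of one-stage cluster sampling. The notes above only describe dividing the population into clusters; the step of randomly choosing whole clusters and keeping all their members is the usual convention and is an assumption here, as is the name `cluster_sample`.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """clusters maps a cluster label to its members; select m whole clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)   # simple random sample of cluster labels
    # Every member of each chosen cluster enters the sample.
    return [member for label in chosen for member in clusters[label]]
```

For example, with clusters keyed by region, `cluster_sample(clusters, 2)` returns everyone in two randomly chosen regions.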
Surveys (3)
One of the best known methods used to collect data
Different types of surveys:
Personal interview
Telephonic interview
Self-administered surveys