Sampling
The first step in the sampling process is defining the target population. A population can be defined as all people or items (units of analysis) with the characteristics that one wishes to study. The unit of analysis may be a person, group, organization, country, object, or any other entity that you wish to draw scientific inferences about.
The second step in the sampling process is to choose a sampling frame. This is an accessible section of the target population (usually a list with contact information) from which a sample can be drawn. Note that a sampling frame may not be entirely representative of the population at large, and if so, inferences derived from such a sample may not be generalizable to the population.
The last step is choosing a sample from the sampling frame using a well-defined sampling technique. Sampling techniques can be grouped into two broad categories: probability (random) sampling and non-probability sampling.
Probability sampling is a technique in which every unit in the population has a chance (non-zero probability) of being selected in the sample, and this chance can be accurately determined.
Non-probability sampling is a sampling technique in which some units of the population have zero chance of selection or where the probability of selection cannot be accurately determined.
Simple random sampling: in this technique, all possible subsets of a population (more accurately, of a sampling frame) of the chosen sample size are given an equal probability of being selected.
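As a minimal sketch, simple random sampling from a sampling frame can be done with Python's standard library. The frame of 100 numbered units, the sample size, and the function name `simple_random_sample` are assumptions made for illustration.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every size-n subset is equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical sampling frame of 100 unit IDs (an assumption for illustration).
frame = list(range(1, 101))
sample = simple_random_sample(frame, 10, seed=42)
print(sample)
```

Fixing the seed only makes the draw repeatable; in a real study the seed would normally be left unset or recorded.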
Systematic sampling: in this technique, the sampling frame is ordered according to some criterion and elements are selected at regular intervals through that ordered list, typically beginning from a randomly chosen starting point.
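One possible way to sketch systematic sampling is to compute a sampling interval k, pick a random start within the first interval, and then take every k-th unit. The list of firm names and the function name `systematic_sample` are hypothetical.

```python
import random

def systematic_sample(ordered_frame, n, seed=None):
    """Select n units at a regular interval from an already ordered frame."""
    k = len(ordered_frame) // n              # sampling interval
    if k < 1:
        raise ValueError("sample size exceeds the size of the sampling frame")
    rng = random.Random(seed)
    start = rng.randrange(k)                 # random start in the first interval
    return [ordered_frame[start + i * k] for i in range(n)]

# Hypothetical frame, assumed to be ordered by some criterion such as firm size.
firms = [f"firm_{i:03d}" for i in range(1, 101)]
picked = systematic_sample(firms, 10, seed=1)
print(picked)
```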
Stratified sampling: in this technique, the sampling frame is divided into homogeneous and non-overlapping subgroups (called “strata”), and a simple random sample is drawn within each subgroup.
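The following sketch illustrates stratified sampling under simple assumptions: units are grouped by a stratum label, and a simple random sample of equal size is drawn within each stratum. The frame of (firm ID, size class) pairs and the function name `stratified_sample` are hypothetical, and equal allocation per stratum is only one possible design (proportional allocation is another).

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_of, n_per_stratum, seed=None):
    """Group units into strata, then draw a simple random sample in each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    return {
        name: rng.sample(units, min(n_per_stratum, len(units)))
        for name, units in strata.items()
    }

# Hypothetical frame: (firm_id, size_class) pairs, 10 large and 30 small firms.
frame = [(i, "large" if i % 4 == 0 else "small") for i in range(1, 41)]
result = stratified_sample(frame, stratum_of=lambda u: u[1], n_per_stratum=3, seed=7)
print(result)
```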
Cluster sampling: if you have a population dispersed over a wide geographic region, it may not be feasible to conduct simple random sampling of the entire population. In such a case, it may be reasonable to divide the population into “clusters” (usually along geographic boundaries), randomly sample a few clusters, and measure all units within the selected clusters.
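A minimal sketch of cluster sampling, assuming the population has already been divided into named clusters: a few clusters are chosen at random and every unit inside them is kept. The region names, unit IDs, and the function name `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose clusters, then include all units in the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical clusters drawn along geographic boundaries.
clusters = {
    "north": ["n1", "n2", "n3"],
    "south": ["s1", "s2"],
    "east": ["e1", "e2", "e3", "e4"],
    "west": ["w1"],
}
chosen, units = cluster_sample(clusters, 2, seed=3)
print(chosen, units)
```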
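Matched-pairs sampling: sometimes, researchers may want to compare two subgroups within one population based on a specific criterion. For instance, why are some firms consistently more profitable than other firms? In this technique, units are sampled from one subgroup and each is matched with a comparable unit from the other subgroup on chosen matching criteria, so that the two groups can be compared.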
Multi-stage sampling: single-stage probability sampling techniques can be combined into multiple stages. For instance, stratifying a sampling frame and then conducting systematic sampling within each stratum is a two-stage combination of stratified and systematic sampling. Likewise, you can start with a cluster of school districts in the state of New York, and within each cluster, select a simple random sample of schools; within each school, select a simple random sample of grade levels; and within each grade level, select a simple random sample of students for study. In this case, you have a four-stage sampling process consisting of cluster and simple random sampling.
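The nesting of stages can be sketched as nested random draws. The example below is a simplified three-stage version of the school example (districts, schools, students) that omits the grade-level stage for brevity; all district, school, and student names, and the function name `multistage_sample`, are hypothetical.

```python
import random

def multistage_sample(districts, n_districts, n_schools, n_students, seed=None):
    """Sample districts, then schools within each district, then students."""
    rng = random.Random(seed)
    sample = []
    for district in rng.sample(sorted(districts), n_districts):        # stage 1
        schools = districts[district]
        for school in rng.sample(sorted(schools), n_schools):           # stage 2
            for student in rng.sample(schools[school], n_students):     # stage 3
                sample.append((district, school, student))
    return sample

# Hypothetical data: 5 districts, 4 schools each, 20 students per school.
districts = {
    f"district_{d}": {
        f"school_{d}{s}": [f"student_{d}{s}_{k}" for k in range(20)]
        for s in range(4)
    }
    for d in range(5)
}
sample = multistage_sample(districts, n_districts=2, n_schools=2, n_students=3, seed=11)
print(len(sample))
```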
Convenience sampling: also called accidental or opportunity sampling, this is a technique in which a sample is drawn from that part of the population that is close to hand, readily available, or convenient. For instance, if you stand outside a shopping center and hand out questionnaire surveys to people or interview them as they walk in, the sample of respondents you obtain will be a convenience sample.
Quota sampling: in this technique, the population is segmented into mutually exclusive subgroups (just as in stratified sampling), and then a non-random set of observations is chosen from each subgroup to meet a predefined quota.
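To illustrate how quota sampling differs from stratified sampling, the sketch below fills each subgroup's quota with whoever arrives first rather than by random selection. The arrival list, the quotas, and the function name `quota_sample` are hypothetical.

```python
def quota_sample(arrivals, group_of, quotas):
    """Accept respondents in arrival order until every subgroup quota is met."""
    remaining = dict(quotas)
    selected = []
    for person in arrivals:
        group = group_of(person)
        if remaining.get(group, 0) > 0:
            selected.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):      # all quotas filled
            break
    return selected

# Hypothetical respondents in the order they were encountered.
arrivals = [
    ("p1", "student"), ("p2", "student"), ("p3", "employed"),
    ("p4", "student"), ("p5", "unemployed"), ("p6", "employed"),
    ("p7", "employed"),
]
quotas = {"student": 2, "employed": 2, "unemployed": 1}
selected = quota_sample(arrivals, lambda p: p[1], quotas)
print([p[0] for p in selected])  # ['p1', 'p2', 'p3', 'p5', 'p6']
```

Because selection depends on arrival order rather than chance, the selection probabilities cannot be determined, which is what makes this a non-probability technique.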
Expert sampling: this is a technique where respondents are chosen in a non-random manner based on their expertise on the phenomenon being studied. For instance, in order to understand the impacts of a new governmental policy such as the Sarbanes-Oxley Act, you can sample a group of corporate accountants who are familiar with this act.
Snowball sampling: in this technique, you start by identifying a few respondents who match the criteria for inclusion in your study, and then ask them to recommend others they know who also meet your selection criteria. For instance, if you wish to survey computer network administrators and you know of only one or two such people, you can start with them and ask them to recommend others who also do network administration.
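The referral process in snowball sampling can be sketched as a breadth-first walk over a referral network, starting from the initial respondents. The referral data, the names, and the function name `snowball_sample` are hypothetical; real referrals would of course be collected from respondents rather than known in advance.

```python
from collections import deque

def snowball_sample(seeds, referrals, max_size):
    """Start from seed respondents and follow their referrals, wave by wave."""
    sampled = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sampled) < max_size:
        person = queue.popleft()
        sampled.append(person)
        for referred in referrals.get(person, []):
            if referred not in seen:         # do not recruit anyone twice
                seen.add(referred)
                queue.append(referred)
    return sampled

# Hypothetical referral network of network administrators.
referrals = {
    "ana": ["ben", "cho"],
    "ben": ["cho", "dev"],
    "cho": ["eli"],
    "dev": [],
}
respondents = snowball_sample(["ana"], referrals, max_size=4)
print(respondents)  # ['ana', 'ben', 'cho', 'dev']
```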
Statistics of Sampling
Responses from different respondents to the same item or observation can be graphed into a frequency distribution based on how often each response value occurs.
A confidence interval is a specific interval of sample statistic values within which a population parameter is estimated to lie with a stated probability (the confidence level, for example 95%).
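As a rough illustration, a 95% confidence interval for a population mean can be approximated from one sample as the sample mean plus or minus a z-multiple of the estimated standard error. The sketch below assumes a large-sample normal approximation (for small samples a t-distribution is commonly used instead); the Likert-style responses and the function name `mean_confidence_interval` are hypothetical.

```python
import math
import statistics

def mean_confidence_interval(responses, confidence=0.95):
    """Approximate confidence interval for the mean using a normal approximation."""
    n = len(responses)
    mean = statistics.mean(responses)
    standard_error = statistics.stdev(responses) / math.sqrt(n)
    # z-value for the chosen confidence level (about 1.96 for 95%).
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * standard_error, mean + z * standard_error

# Hypothetical responses to a 5-point Likert item from 30 respondents.
responses = [4, 5, 3, 4, 4, 5, 2, 4, 3, 5, 4, 4, 3, 5, 4,
             4, 2, 5, 4, 3, 4, 5, 3, 4, 4, 3, 5, 4, 4, 3]
low, high = mean_confidence_interval(responses)
print(round(low, 2), round(high, 2))
```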
The variability or spread of a sample statistic in a sampling distribution (i.e., the standard deviation of a sample statistic) is called its standard error.
A sampling distribution is a frequency distribution of a sample statistic (like the sample mean) from a set of samples, while the commonly referenced frequency distribution is the distribution of a response (observation) from a single sample.
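A sampling distribution and its standard error can be illustrated by simulation: draw many samples from the same population, compute the mean of each, and look at the spread of those means. The population below is synthetic (normally distributed scores generated for illustration), and the sample size and number of replications are arbitrary assumptions.

```python
import random
import statistics

rng = random.Random(0)

# Synthetic population of 10,000 scores (an assumption for illustration).
population = [rng.gauss(50, 10) for _ in range(10_000)]

n = 25                                        # size of each sample
sample_means = [
    statistics.mean(rng.sample(population, n))
    for _ in range(2_000)                     # 2,000 repeated samples
]

# Standard error: the standard deviation of the sampling distribution of the mean.
standard_error = statistics.stdev(sample_means)

# For comparison: population standard deviation divided by sqrt(n).
theoretical = statistics.pstdev(population) / n ** 0.5
print(round(standard_error, 2), round(theoretical, 2))
```

The two printed values should be close to each other, which reflects the usual result that the standard error of the mean shrinks in proportion to the square root of the sample size.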
Sample statistics may differ from population parameters if the sample is not perfectly representative of the population; the difference between the two is called sampling error.
When you measure a certain observation from a given unit, such as a person’s response to a Likert-scaled item, that observation is called a response. In other words, a response is a measurement value provided by a sampled unit.
For a large number of responses in a sample, the frequency distribution of responses tends to resemble a bell-shaped curve called a normal distribution, which can be used to estimate overall characteristics of the entire sample, such as the sample mean (average of all observations in a sample) or standard deviation (variability or spread of observations in a sample).
These sample estimates are called sample statistics (a “statistic” is a value that is estimated from observed data).
Populations also have means and standard deviations that could be obtained if we could sample the entire population. However, since the entire population can rarely be sampled in practice, population characteristics are generally unknown, and are called population parameters (and not “statistics” because they are not statistically estimated from data).