DESCRIPTIVE STATISTICS

Data are numerical evidence. Data are numbers with a context, and we must understand that context to make sense of the numbers. The branch of statistics concerned with examining and describing data is called descriptive statistics.

Statistics are used daily: in newspapers, on television, on the internet, in advertisements, and in ordinary conversations.

The information in any set of data is organized in variables. A variable is a characteristic that can take different values or fall into different categories.

The distribution of a variable tells us what values it takes and how often it takes these values.
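As a minimal sketch of what a distribution looks like in practice, the snippet below tabulates how often each value occurs in a small data set. The data (number of children in 12 families) and the function name distribution are made up for illustration only.

```python
from collections import Counter

# Hypothetical data: number of children in each of 12 families.
children = [0, 2, 1, 2, 3, 0, 2, 1, 1, 2, 4, 2]

def distribution(values):
    """Return {value: (count, relative frequency)}, ordered by value."""
    counts = Counter(values)
    n = len(values)
    return {v: (c, c / n) for v, c in sorted(counts.items())}

dist = distribution(children)
for value, (count, share) in dist.items():
    print(f"{value} children: {count} families ({share:.0%})")
```

The counts say which values the variable takes, and the relative frequencies say how often it takes them.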

A quantitative variable takes numerical values representing counts or measurements for which arithmetic operations such as adding and averaging make sense. Quantitative variables can be split into two types.

Discrete variables have a countable number of possible values (e.g., the number of children in a family or the number of customers waiting to be served).

Continuous variables can take infinitely many possible values, which correspond to points on a continuous scale with no gaps or interruptions (e.g., the amount of milk a cow produces per day).

A qualitative variable places an individual into one of two or more groups or categories.

Descriptive statistics is a range of methods for planning the collection of, collecting, organizing, summarizing, and presenting data.

There are also tools to analyze, interpret, and draw conclusions from the collected data; this branch is known as inferential statistics. At the core of both branches are the concepts of population and sample.

Example (acceptance sampling): A computer company buys a shipment of hard disks from a supplier. The contract with the supplier states that less than 1% of the disks may be defective. The company cannot test the whole shipment, because testing is time-consuming, costly, and partly damages the units. Instead, it takes a sample, say of 100 disks; if more than 1 defective disk is found, the shipment is returned. The whole shipment is the population and the 100 selected disks are the sample.
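The decision rule in this example can be sketched in a few lines. The shipment size (10,000 disks), the true defect rate (0.5%), and the function name inspect_shipment are assumptions made for illustration; only the sample size of 100 and the "more than 1 defective" rule come from the example.

```python
import random

def inspect_shipment(shipment, sample_size=100, max_defects=1, rng=random):
    """Draw a random sample and apply the rule from the example:
    the shipment is returned if more than max_defects disks are defective."""
    sample = rng.sample(shipment, sample_size)
    defects = sum(sample)  # each disk is coded 1 (defective) or 0 (good)
    return {"defects": defects, "returned": defects > max_defects}

# Hypothetical population: 10,000 disks, of which 50 (0.5%) are defective.
rng = random.Random(42)
shipment = [1] * 50 + [0] * 9950
rng.shuffle(shipment)

result = inspect_shipment(shipment, rng=rng)
print(result)
```

Only the 100 sampled disks are tested, yet the decision applies to the whole shipment, which is the population.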

Information contained in the sample is usually used to make an inference concerning a parameter, which is a numerical characteristic of the population.

Example: when predicting the winning party in an election, we are interested in the proportion of voters who choose that party on election day. That proportion is a parameter.

A parameter is estimated by computing the corresponding characteristic of the sample. That sample characteristic is called a statistic.
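The difference between a parameter and a statistic can be illustrated with a small simulation. The population below (100,000 voters, 52% supporting the party) and the sample size of 1,000 are invented for the sketch; in a real study the parameter is unknown and only the statistic can be computed.

```python
import random

rng = random.Random(0)

# Hypothetical population: 1 = votes for the party, 0 = does not.
population = [1] * 52_000 + [0] * 48_000

# Parameter: a numerical characteristic of the whole population.
parameter = sum(population) / len(population)

# Statistic: the same characteristic computed from a random sample.
sample = rng.sample(population, 1_000)
statistic = sum(sample) / len(sample)

print(f"parameter p = {parameter:.3f}, statistic p-hat = {statistic:.3f}")
```

The statistic changes from sample to sample, while the parameter is a fixed property of the population.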

SAMPLING: Any study concerning populations needs data to be collected. Statistical studies usually use data from samples; we do not collect data from the entire population.

How to choose a sample:

  1. Specify the population of interest
  2. Choose an appropriate sampling method
  3. Collect the sample data
  4. Analyze the pertinent information in the sample
  5. Use the results of the sample analysis to make an inference about the population
  6. Provide a measure of the inference's reliability.
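The notes do not name a specific reliability measure for step 6. As one common possibility, the sketch below reports a sample proportion together with an approximate 95% margin of error based on the normal approximation. The function name, the value z = 1.96, and the numbers (540 successes out of 1,000) are assumptions for illustration.

```python
import math

def proportion_with_margin(successes, n, z=1.96):
    """Return the sample proportion and an approximate margin of error.
    Uses the normal approximation; z = 1.96 assumes a 95% confidence level."""
    p_hat = successes / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, z * standard_error

# Hypothetical sample: 540 of 1,000 respondents favor the party.
p_hat, margin = proportion_with_margin(successes=540, n=1000)
print(f"estimate = {p_hat:.3f} +/- {margin:.3f}")
```

A smaller margin indicates a more reliable inference; increasing the sample size n shrinks it.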

Sampling Process

  1. Research problem
  2. Specify population
  3. Choose a sampling method and collect data
  4. Explore data
  5. Use inferential statistics to draw conclusions
  6. Are conclusions reliable? If yes, report results. If no, go back to step 3 (choose a sampling method and collect data).

Why Sampling?

Sampling can save money and time.

Sampling can broaden the scope of the study: with the same resources, more detailed information can be gathered from a sample than from the whole population.

Once sampling is deemed appropriate, it must be decided how to select the sample.

Because the sample will be used to draw conclusions about the entire population, it is crucial that the sample is representative: it should reflect as closely as possible the relevant parameter of the population under consideration.

There are two types of sampling: Random and non-random sampling.

In random sampling (probability sampling), every unit of the population has the same probability of being selected into the sample. Random sampling implies that chance enters into the process of selection.

In non-random sampling, not every unit of the population has the same probability of being selected into the sample.

Random sampling techniques include simple random sampling, stratified random sampling, and systematic random sampling.

In simple random sampling, each unit of the frame (the list of population units) is numbered from 1 to N, where N is the size of the population. Next, a random number generator (or a table of random numbers, an older technique) is used to select n items into the sample.
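A minimal sketch of this procedure, using Python's standard random number generator: the frame is represented only by its unit numbers 1 to N, and the values N = 659, n = 20, and the seed are arbitrary choices for illustration.

```python
import random

def simple_random_sample(N, n, seed=None):
    """Select n distinct unit numbers from a frame numbered 1..N."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, N + 1), n))

chosen = simple_random_sample(N=659, n=20, seed=1)
print(chosen)
```

Because the selection is made without replacement, no unit can appear in the sample twice.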

In stratified random sampling, the population is divided into non-overlapping subpopulations called strata. The researcher then draws a random sample from each of the subpopulations. The main reason for using stratified random sampling is that it has the potential to reduce sampling error.

The potential to match the sample closely to the population is greater than it is with simple random sampling because portions of the total sample are taken from different population subgroups.

However, stratified random sampling is generally more costly than simple random sampling because each unit of the population must be assigned to a stratum before the random selection process begins.

Strata selection is usually based on available information. This information may have been obtained from previous censuses or surveys. The benefits of stratification increase as the strata differ more. Internally, a stratum should be relatively homogeneous; externally, strata should contrast with each other.
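Stratified random sampling can be sketched as follows. The notes do not say how many units to take from each stratum, so this sketch assumes proportional allocation (each stratum contributes in proportion to its size). The student data, the grade labels, and the function name are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n, seed=None):
    """Draw a random sample from every stratum.
    Assumes proportional allocation; rounding may make the total differ
    slightly from n when the stratum shares are not whole numbers."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:  # assign every unit to its stratum first
        strata[stratum_of(unit)].append(unit)
    sample = {}
    for name, members in strata.items():
        size = round(n * len(members) / len(units))
        sample[name] = rng.sample(members, min(size, len(members)))
    return sample

# Hypothetical population: 600 students in three grades.
students = (
    [(i, "grade 9") for i in range(300)]
    + [(i, "grade 10") for i in range(300, 500)]
    + [(i, "grade 11") for i in range(500, 600)]
)
picked = stratified_sample(students, stratum_of=lambda s: s[1], n=60, seed=3)
print({name: len(members) for name, members in picked.items()})
```

The first loop makes the extra cost visible: every unit of the population must be assigned to a stratum before any random selection takes place.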

Unlike stratified random sampling, systematic sampling is not done in an attempt to reduce sampling error. Rather, systematic sampling is used because of its convenience and relative ease of administration. With systematic sampling, every kth item is selected to produce a sample of size n from a population of size N. The value of k, sometimes called the sampling cycle, can be determined using k = N/n.

If k is not an integer value, then a whole-number value should be used.

For example, to draw a sample of n = 20 students from N = 659 high school students using systematic sampling, k = 659/20 = 32.95, which is taken as 32.
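The example above can be reproduced with a short sketch. Rounding k down matches the value of 32 used in the example; choosing the starting unit at random within the first cycle is a common convention that the notes do not state, so it is an assumption here, as are the function name and the seed.

```python
import random

def systematic_sample(N, n, seed=None):
    """Select every kth unit from a frame numbered 1..N."""
    k = N // n                 # sampling cycle, rounded down to a whole number
    rng = random.Random(seed)
    start = rng.randint(1, k)  # assumed: random start within the first cycle
    return k, [start + i * k for i in range(n)]

k, chosen = systematic_sample(N=659, n=20, seed=7)
print(k, chosen)
```

Because k is rounded down, the last selected unit is at most 20 * 32 = 640, so the selection never runs past the end of the frame.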

Besides convenience, systematic sampling has other advantages. Because a systematic sample is spread evenly across the population, a knowledgeable person can easily determine whether a sampling plan has been followed in a study.

Sampling techniques that do not involve a random selection process are called non-random sampling techniques. Because chance is not used to select items for the samples, these techniques are non-probability techniques and are not desirable for use in gathering data to be analyzed by standard methods of inferential statistics.

Convenience sampling: elements for the sample are selected for the convenience of the researcher, who chooses elements that are readily available.

Quota sampling: this appears to be similar to stratified random sampling. However, instead of randomly selecting a sample from each stratum, we use a non-random sampling method to gather data from a stratum until the desired quota is filled.

For example, a convenience sample of homes for door-to-door interviews might include houses where people are at home, houses with no dogs, houses near the street, first-floor apartments, and houses with friendly people.

In contrast, a random sample would require the researcher to gather data only from houses and apartments that have been selected randomly.

Quotas are described by setting the sizes of the samples to be obtained from the subgroups. A quota is based on the proportions of the subclasses in the population.

For example, a company is test marketing a new soft drink and is interested in how different age groups react to it. The researchers go to a shopping mall and interview shoppers aged 16-20, for example, until enough responses are obtained to fill the quota. In quota sampling, an interviewer would begin by asking a few filter questions; if the respondent represents a subclass whose quota has already been filled, the interviewer would stop the interview.
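The interviewing procedure can be sketched as a loop over shoppers in the order they happen to be encountered. The ages, the age groups other than 16-20, the quota sizes, and the function names are all hypothetical. Note that no random selection is involved, which is what makes this a non-random technique.

```python
def age_group(age):
    """Filter question: which subclass does this shopper belong to?"""
    if 16 <= age <= 20:
        return "16-20"
    if 21 <= age <= 30:
        return "21-30"
    return "other"

def quota_sample(respondents, quotas, group_of):
    """Interview people in arrival order until every quota is filled."""
    filled = {group: [] for group in quotas}
    for person in respondents:
        group = group_of(person)
        # Stop the interview if the group is not targeted or its quota is full.
        if group in filled and len(filled[group]) < quotas[group]:
            filled[group].append(person)
        if all(len(filled[g]) >= quotas[g] for g in quotas):
            break  # all quotas filled, no need to approach anyone else
    return filled

# Hypothetical shoppers (ages) in the order they walk past the interviewer.
shoppers = [34, 17, 25, 19, 22, 16, 45, 20, 28, 18, 23, 19]
result = quota_sample(shoppers, {"16-20": 3, "21-30": 2}, age_group)
print(result)
```

Who ends up in the sample depends entirely on who walks past first, not on chance selection from the population.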

Quota sampling can be useful if no previous information is available for the population.

Quota sampling is less expensive than most random sampling techniques because it is essentially a technique of convenience. Another advantage of quota sampling is the speed of data gathering: we do not need to call back or send out a second questionnaire if we do not receive a response; we just move on to the next element.

The problem with quota sampling is that it is a non-random sampling technique. Some researchers believe that a solution to this issue can be achieved if the quota is filled by randomly selecting elements and discarding those not from a stratum. This way quota sampling is essentially a version of stratified random sampling. The object is to gain the benefits of stratification without the high costs of stratification. However, it remains a non-probability sampling method.

Alheamina Abraham