Quantitative Analysis
Descriptive Statistics
Descriptive analysis refers to statistically describing, aggregating, and presenting the constructs of interest or the associations between these constructs.
Inferential analysis refers to the statistical testing of hypotheses (theory testing). In this chapter, we will examine statistical techniques used for descriptive analysis, and the next chapter will examine statistical techniques for inferential analysis.
Data Preparation
Data coding. Coding is the process of converting data into numeric format. A codebook should be created to guide the coding process.
A codebook is a comprehensive document containing a detailed description of each variable in a research study, the items or measures for that variable, the format of each item (numeric, text, etc.), the response scale for each item (i.e., whether it is measured on a nominal, ordinal, interval, or ratio scale, and whether that scale is a five-point, seven-point, or some other type of scale), and how to code each value into a numeric format.
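A codebook entry can be sketched as a small lookup structure that records an item's format, response scale, and the numeric code for each allowed answer. The item name `esteem_1`, the seven response labels, and the helper `code_response` are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical codebook entry for one 7-point Likert item (illustrative only).
CODEBOOK = {
    "esteem_1": {
        "format": "numeric",
        "scale": "7-point Likert",
        "codes": {
            "strongly disagree": 1,
            "disagree": 2,
            "somewhat disagree": 3,
            "neutral": 4,
            "somewhat agree": 5,
            "agree": 6,
            "strongly agree": 7,
        },
    },
}


def code_response(item: str, response: str) -> int:
    """Convert a text response into its numeric code using the codebook."""
    codes = CODEBOOK[item]["codes"]
    # Normalize case and surrounding whitespace before the lookup.
    return codes[response.strip().lower()]


print(code_response("esteem_1", "Strongly Agree"))  # 7
```

Keeping the mapping in one place means every coder applies the same rule, which is the purpose of a codebook.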
Data entry. Coded data can be entered into a spreadsheet, database, text file, or directly into a statistical program like SPSS. Most statistical programs provide a data editor for entering data.
Missing values. Missing data is an inevitable part of any empirical data set. Respondents may not answer certain questions if they are ambiguously worded or too sensitive.
Data transformation. Sometimes, it is necessary to transform data values before they can be meaningfully interpreted.
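One common transformation, used here only as an illustration, is reverse-coding a negatively worded Likert item so that a higher number means the same thing across all items. The sketch below assumes a 1 to 7 scale; `reverse_code` is a hypothetical helper.

```python
def reverse_code(value: int, scale_min: int = 1, scale_max: int = 7) -> int:
    """Reverse-code a response: on a 1-7 scale, 1 becomes 7, 2 becomes 6, etc."""
    if not scale_min <= value <= scale_max:
        raise ValueError(f"{value} is outside the {scale_min}-{scale_max} scale")
    return scale_max + scale_min - value


responses = [1, 2, 4, 6, 7]
print([reverse_code(v) for v in responses])  # [7, 6, 4, 2, 1]
```

The midpoint of the scale (4 on a 1 to 7 scale) maps to itself, so neutral answers are unaffected.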
Univariate Analysis
The frequency distribution of a variable is a summary of the frequency (or percentages) of individual values or ranges of values for that variable.
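A frequency distribution can be computed by counting each distinct value and converting the counts to percentages. The sketch uses Python's `collections.Counter`; the responses are made-up values for illustration, and `frequency_distribution` is a hypothetical helper.

```python
from collections import Counter


def frequency_distribution(values):
    """Return (value, count, percent) rows sorted by value."""
    counts = Counter(values)
    total = len(values)
    return [(v, c, 100.0 * c / total) for v, c in sorted(counts.items())]


# Hypothetical responses to a single 5-point item (illustrative data only).
responses = [3, 4, 4, 5, 2, 4, 3, 5, 4, 1]
for value, count, pct in frequency_distribution(responses):
    print(f"{value}: {count} ({pct:.0f}%)")
```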
Central tendency is an estimate of the center of a distribution of values.
The arithmetic mean (often simply called the “mean”) is the simple average of all values in a given distribution.
The median is the middle value when the values in a distribution are sorted in order (for an even number of values, it is the average of the two middle values).
The mode is the most frequently occurring value in a distribution of values.
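The three measures of central tendency map directly onto functions in Python's standard `statistics` module. The sample values below are hypothetical and chosen only so that the three measures differ.

```python
import statistics

# Hypothetical sample values (illustrative only).
values = [15, 20, 21, 20, 36, 15, 25, 15]

print(statistics.mean(values))    # arithmetic mean: 167 / 8 = 20.875
print(statistics.median(values))  # sorted: 15 15 15 20 20 21 25 36 -> (20 + 20) / 2 = 20.0
print(statistics.mode(values))    # 15 occurs three times -> 15
```

The single large value (36) pulls the mean above the median, which is one reason the median is often reported for skewed distributions.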
Note that any value estimated from a sample, such as the mean, median, mode, or any of the estimates discussed later, is called a statistic.
Dispersion refers to the way values are spread around the central tendency, for example, how tightly or how widely the values are clustered around the mean.
The range is the difference between the highest and lowest values in a distribution.
The standard deviation summarizes how far the values in a distribution lie from the mean, and the square of the standard deviation is called the variance of a distribution.
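The range, standard deviation, and variance can be computed with the standard library. This sketch assumes the data are a sample, so it uses `statistics.stdev` and `statistics.variance`, which divide by n - 1; `statistics.pstdev` and `statistics.pvariance` divide by n and apply when the data cover the whole population. The values are hypothetical.

```python
import statistics

values = [15, 20, 21, 20, 36, 15, 25, 15]  # hypothetical sample

value_range = max(values) - min(values)  # highest minus lowest value
sd = statistics.stdev(values)            # sample standard deviation (n - 1 denominator)
var = statistics.variance(values)        # sample variance, equal to sd squared

print(value_range)  # 21
print(sd, var)
```

Here the squared deviations from the mean sum to 350.875, so the sample variance is 350.875 / 7 = 50.125 and the standard deviation is its square root, about 7.08.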
Bivariate Analysis
The most common bivariate statistic is the bivariate correlation (often simply called “correlation”), which is a number between -1 and +1 denoting the strength and direction of the relationship between two variables.
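The p-value is the probability of obtaining a result at least as extreme as the one observed if chance alone were at work (i.e., if there were no real effect). It indicates how plausibly a statistical inference could be caused by pure chance.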
The degrees of freedom are the number of values that are free to vary in the calculation of a statistic.
When the correlated variables are measured on interval (or ratio) scales, this type of correlation is called the Pearson product-moment correlation.
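The Pearson correlation divides the sum of cross-products of deviations from each mean by the square root of the product of the two sums of squared deviations. The sketch below implements that formula directly; `pearson_r` is a hypothetical helper, and the age and self-esteem values are made up for illustration (they are not the dataset discussed in this chapter). Python 3.10 and later also provide `statistics.correlation`, which computes the same coefficient.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean.
    cross = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical values, not the chapter's dataset.
age = [21, 25, 30, 35, 40, 45]
self_esteem = [4.0, 4.5, 4.2, 5.0, 5.5, 5.3]
print(round(pearson_r(age, self_esteem), 3))
```

A value near +1 means the two variables rise together, a value near -1 means one falls as the other rises, and a value near 0 means no linear relationship.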
A cross-tab is a table that describes the frequency (or percentage) of all combinations of two or more nominal or categorical variables.
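A cross-tab can be built by counting every pair of categories that occurs in two categorical variables. The sketch below uses `collections.Counter` over zipped pairs; `crosstab` is a hypothetical helper, and the gender and grade values are illustrative data only.

```python
from collections import Counter


def crosstab(rows, cols):
    """Count every (row category, column category) combination."""
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    counts = Counter(zip(rows, cols))
    # Counter returns 0 for combinations that never occur.
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, table


# Hypothetical categorical data: gender by grade (illustrative only).
gender = ["F", "M", "F", "F", "M", "M", "F", "M"]
grade = ["A", "B", "A", "B", "B", "A", "A", "B"]
row_levels, col_levels, table = crosstab(gender, grade)
print("columns:", col_levels)
for level, row in zip(row_levels, table):
    print(level, row)
```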
Consider an example dataset with two variables: age (x) and self-esteem (y). Age is a ratio-scale variable, while self-esteem is an average score computed from a multi-item self-esteem scale measured using a 7-point Likert scale, ranging from “strongly disagree” to “strongly agree.”
After computing a bivariate correlation, researchers are often interested in knowing whether the correlation is significant (i.e., a real one) or caused by mere chance. Answering such a question requires testing the following hypotheses:
H0: r = 0
H1: r ≠ 0
H0 is called the null hypothesis, and H1 is called the alternative hypothesis (sometimes also represented as Ha).
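A standard way to test H0: r = 0 for a Pearson correlation, shown here as a swapped-in illustration rather than as this chapter's own procedure, is a t-test with n - 2 degrees of freedom, where t = r * sqrt((n - 2) / (1 - r^2)). The sketch computes t and the degrees of freedom with the standard library only; `correlation_t_test` is a hypothetical helper, and r = 0.5 with n = 20 are made-up inputs.

```python
import math


def correlation_t_test(r, n):
    """Return (t statistic, degrees of freedom) for testing H0: r = 0."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and +1")
    df = n - 2  # degrees of freedom for a bivariate correlation
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df


t, df = correlation_t_test(0.5, 20)
print(round(t, 3), df)  # 2.449 18
```

The p-value is then obtained by comparing |t| with the t distribution for those degrees of freedom, either from a printed t table or from a statistics package (for example, SciPy's `scipy.stats.t.sf(abs(t), df) * 2` gives a two-tailed p-value). H0 is rejected when the p-value falls below the chosen significance level, such as 0.05.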
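The expected count for each cell of a cross-tab (the count expected if the two variables were unrelated) is computed by multiplying the cell's marginal row total by its marginal column total and dividing the product by the total number of observations.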
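The expected-count rule can be sketched in a few lines: compute the row totals, column totals, and grand total, then apply row total × column total ÷ grand total to every cell. These expected counts are what a chi-square test of independence compares against the observed counts. `expected_counts` is a hypothetical helper, and the 2x2 table is made up for illustration.

```python
def expected_counts(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


observed = [[10, 20], [30, 40]]  # hypothetical 2x2 cross-tab
print(expected_counts(observed))  # [[12.0, 18.0], [28.0, 42.0]]
```

Here the row totals are 30 and 70, the column totals are 40 and 60, and the grand total is 100, so the top-left cell's expected count is 30 × 40 / 100 = 12.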