Quantitative Analysis: Descriptive Statistics

Data Preparation

Univariate Analysis

Bivariate Analysis

Descriptive statistics: Statistically describing, aggregating, and presenting the constructs of interest or the associations between these constructs, typically with software such as SPSS or SAS

Data preparation: The steps that convert collected data into a form that can be analyzed: data coding, data entry, handling missing values, and data transformation

Data entry: Entering coded data into a spreadsheet, database, text file, or directly into a statistical program such as SPSS. Most statistical programs save data in their own native format, which other programs may not be able to read. It is therefore often better to enter the data into a spreadsheet or database, which can be shared across programs

Data transformation: Changing data values before analysis so that they can be meaningfully interpreted. Examples include reverse coding, creating a weighted index of observed measures, and collapsing multiple values into fewer categories
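As a rough illustration of two of these transformations, the sketch below reverse codes scores and collapses a numeric variable into categories. The 1-5 scale, the age cut-offs, and the function names are assumptions made for the example, not part of the original notes.

```python
def reverse_code(value, low=1, high=5):
    """Reverse a score on a low..high scale (1 becomes 5 on a 1-5 scale)."""
    return low + high - value


def collapse_age(age):
    """Collapse an age in years into three illustrative categories."""
    if age < 30:
        return "under 30"
    if age < 60:
        return "30-59"
    return "60+"


# Hypothetical responses to a negatively worded item on a 1-5 scale
responses = [1, 2, 5, 4]
reversed_responses = [reverse_code(r) for r in responses]

# Hypothetical ages collapsed into fewer categories
age_groups = [collapse_age(a) for a in [22, 45, 67]]

print(reversed_responses)
print(age_groups)
```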

Missing values: These are inevitable in any empirical data set, and how they are handled varies. Some programs require a specific numeric value to denote a missing value, some drop the entire observation that contains a missing value (listwise deletion), and some allow the missing value to be replaced with an estimate (imputation)
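The difference between listwise deletion and imputation can be sketched as follows. The data rows are made up, `None` stands in for a missing value, and replacing a missing value with the mean of the observed values is only one simple form of imputation chosen for illustration.

```python
from statistics import mean

# Hypothetical observations; None marks a missing value
rows = [
    {"age": 25, "income": 40000},
    {"age": None, "income": 52000},
    {"age": 31, "income": None},
    {"age": 40, "income": 61000},
]


def listwise_delete(rows):
    """Keep only observations with no missing value in any variable."""
    return [r for r in rows if all(v is not None for v in r.values())]


def mean_impute(rows, key):
    """Replace missing values of one variable with the mean of its observed values."""
    observed = [r[key] for r in rows if r[key] is not None]
    fill = mean(observed)
    return [{**r, key: fill if r[key] is None else r[key]} for r in rows]


complete = listwise_delete(rows)   # two observations survive
imputed = mean_impute(rows, "age") # all four observations kept

print(len(complete), "complete observations")
print([r["age"] for r in imputed])
```

Listwise deletion shrinks the sample, while imputation keeps every observation at the cost of introducing estimated values.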

Data coding: The process of converting data into numeric format. A codebook, a comprehensive document that contains a detailed description of each variable, its format, its response scale, and how to code each value into a numeric value, is created to guide this process

The codebook helps keep the coded data consistent and helps others understand and interpret the coded data

Univariate analysis: Analysis of a single variable that describes the general properties of that one variable. It includes frequency distribution, central tendency, and dispersion

Frequency distribution: A summary of the frequency (or percentage) of individual values or ranges of values for the variable. It can be viewed as a bar chart, with the vertical axis representing frequency and the horizontal axis representing the categories of the variable.
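A frequency distribution can be tallied in a few lines. This is a minimal sketch with made-up category labels; the text bar chart is printed sideways (categories down the page) purely for convenience.

```python
from collections import Counter

# Hypothetical categorical responses
values = ["A", "B", "A", "C", "A", "B"]

counts = Counter(values)  # frequency of each value
total = len(values)
percentages = {k: 100 * c / total for k, c in counts.items()}

# Simple text bar chart: one row per category, one '#' per observation
for category, count in sorted(counts.items()):
    print(f"{category} {'#' * count} {count} ({percentages[category]:.1f}%)")
```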

Central tendency: An estimate of the center of a distribution of values. The three major estimates are the mean, median, and mode

Dispersion: Refers to the way values are spread around the central tendency. Two common measures of dispersion are the range and the standard deviation

Median: The middle value determined by ordering the values in increasing order and selecting the middle value. If there is an even number of values, average the two middle values

Mode: The most frequently occurring value.

Mean: The simple average of all values (the arithmetic mean), found by adding up all the values and dividing by the total number of values. Other types of mean include the geometric mean and the harmonic mean
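The three estimates of central tendency, along with the geometric and harmonic means, are available in Python's standard `statistics` module. The sample values below are invented for the example.

```python
from statistics import mean, median, mode, geometric_mean, harmonic_mean

# Hypothetical data with an even number of values
values = [2, 4, 4, 5, 7, 8]

center = {
    "mean": mean(values),      # sum of values divided by their count
    "median": median(values),  # average of the two middle values here
    "mode": mode(values),      # most frequently occurring value
}
print(center)

# Other means; for positive data, harmonic <= geometric <= arithmetic
print(harmonic_mean(values), geometric_mean(values), mean(values))
```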

Range: The difference between the highest and lowest values

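Standard deviation: A measure of dispersion that takes into account how close or how far each value is from the mean. Because it uses every value rather than only the two extremes, it is less sensitive to outliers than the range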
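Both measures of dispersion can be computed directly. In this sketch the data are made up; `pstdev` treats the values as a whole population (dividing by n), while `stdev` treats them as a sample (dividing by n - 1). Which one applies depends on the study, so both are shown.

```python
from statistics import pstdev, stdev

# Hypothetical data with mean 5
values = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(values) - min(values)  # highest minus lowest value
population_sd = pstdev(values)           # square root of the mean squared deviation
sample_sd = stdev(values)                # same idea, dividing by n - 1

print(value_range, population_sd, round(sample_sd, 3))
```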

Bivariate analysis: Analysis that examines how two variables are related to each other. The most common statistic is the bivariate correlation, a number between -1 and +1 that denotes the strength and direction of the relationship between two variables

A scatter plot shows the relationship as either an upward slope (positive relationship) or a downward slope (negative relationship). A zero slope (a horizontal line) would indicate no relationship between the variables

After computing the correlation, researchers can test whether it is statistically significant by setting up a null hypothesis (no correlation) against an alternative hypothesis (a nonzero correlation)

Cross tabulation is another way to present bivariate data. A cross tab is a table that describes the frequency of all combinations of two or more nominal or categorical variables. A chi-square statistic can be computed from the table by summing, across all cells, the squared difference between the observed and expected counts divided by the expected count
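The bivariate correlation most commonly reported is Pearson's r, which is assumed here. The sketch computes it from scratch so that each step is visible; the function name and the two small data sets are invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y scaled to lie in [-1, +1]."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    co_variation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return co_variation / sqrt(ss_x * ss_y)


hours = [1, 2, 3, 4, 5]
score_up = [2, 4, 6, 8, 10]    # rises with hours: upward slope
score_down = [10, 8, 6, 4, 2]  # falls with hours: downward slope

r_up = pearson_r(hours, score_up)
r_down = pearson_r(hours, score_down)
print(r_up, r_down)
```

A perfectly linear upward pattern gives a value at the top of the range and a perfectly linear downward pattern gives a value at the bottom; real data usually fall somewhere in between.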
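For a cross tab, the expected count in each cell under no association is (row total × column total) / grand total, and Pearson's chi-square statistic sums (observed - expected)² / expected over all cells. The sketch below applies that standard formula to a made-up 2 × 2 table; the function name is hypothetical.

```python
def chi_square(observed):
    """Pearson chi-square statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if the two variables were unrelated
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    return stat


# Hypothetical cross tab: rows are one variable, columns the other
table = [
    [20, 30],
    [30, 20],
]

chi2 = chi_square(table)
degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
print(chi2, degrees_of_freedom)
```

The statistic is then compared against a chi-square distribution with (rows - 1) × (columns - 1) degrees of freedom to judge significance.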