Inferential Statistics
British philosopher Karl Popper said that theories can never be proven, only disproven.
A second problem with testing hypothesized relationships in social science research is that the dependent variable may be influenced by an infinite number of extraneous variables, and it is not feasible to measure and control for all of these extraneous effects.
Sir Ronald A. Fisher established the basic guidelines for significance testing: a statistical result may be considered significant if it can be shown that the probability of it occurring by chance is 5% or less.
In inferential statistics, this probability is called the p-value, 5% is called the significance level (α), and the desired relationship between the p-value and α is denoted as p < 0.05.
The significance level is the maximum level of risk that we are willing to accept as the price of our inference from the sample to the population.
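As a minimal sketch of this idea, the snippet below (using SciPy; the sample values and the hypothesized mean are made up for illustration) computes a p-value with a one-sample t-test and compares it against α = 0.05:

```python
# A minimal sketch of significance testing with a one-sample t-test.
# The data values and the hypothesized population mean are hypothetical.
import numpy as np
from scipy import stats

sample = np.array([5.1, 4.8, 5.6, 5.0, 5.4, 4.9, 5.3, 5.2])
hypothesized_mean = 5.0  # null hypothesis: population mean is 5.0

t_stat, p_value = stats.ttest_1samp(sample, popmean=hypothesized_mean)

alpha = 0.05  # significance level (alpha)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
if p_value < alpha:
    print("Result is statistically significant (p < 0.05).")
else:
    print("Fail to reject the null hypothesis.")
```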
We must also understand three related statistical concepts: sampling distribution, standard error, and confidence interval.
A sampling distribution is the theoretical distribution of an infinite number of samples drawn from the population of interest in your study.
However, because a sample is never identical to the population, every sample always has some inherent level of error, called the standard error.
A confidence interval is the estimated range within which the true population parameter is expected to lie at a stated level of confidence (e.g., 95%).
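A short simulation can make these three concepts concrete. The sketch below (NumPy only; the population parameters are arbitrary) draws repeated samples to approximate a sampling distribution, estimates the standard error from a single sample, and builds a 95% confidence interval from it:

```python
# A sketch simulating a sampling distribution: draw many samples from a
# synthetic population, collect their means, and compare the standard
# error estimated from one sample against the spread of sample means.
import numpy as np

rng = np.random.default_rng(42)
population = rng.normal(loc=100, scale=15, size=1_000_000)  # synthetic population

sample_means = [rng.choice(population, size=30).mean() for _ in range(5_000)]

one_sample = rng.choice(population, size=30)
standard_error = one_sample.std(ddof=1) / np.sqrt(len(one_sample))

# 95% confidence interval for the population mean, from the one sample
ci_low = one_sample.mean() - 1.96 * standard_error
ci_high = one_sample.mean() + 1.96 * standard_error

print(f"SD of simulated sample means: {np.std(sample_means):.2f}")
print(f"Standard error from one sample: {standard_error:.2f}")
print(f"95% CI: ({ci_low:.1f}, {ci_high:.1f})")
```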
Most inferential statistical procedures in social science research are derived from a general family of statistical models called the general linear model (GLM).
A model is an estimated mathematical equation that can be used to represent a set of data, and linear refers to a straight line.
Hence, a GLM is a system of equations that can be used to represent linear patterns of relationships in observed data.
Though most variables in the GLM tend to be interval or ratio-scaled, this does not have to be the case.
The GLM is a very powerful statistical tool because it is not a single statistical method, but rather a family of methods that can be used to conduct sophisticated analyses with different types and quantities of predictor and outcome variables.
The most important problem in GLM is model specification, i.e., how to specify a regression equation (or a system of equations) to best represent the phenomenon of interest.
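As a hedged illustration of model specification, the sketch below fits the two-predictor linear model y = β0 + β1·x1 + β2·x2 + ε with statsmodels; the variable names, coefficients, and data are all synthetic:

```python
# A sketch of GLM model specification: a two-predictor linear regression
# y = b0 + b1*x1 + b2*x2 + e, fitted with statsmodels. Data are synthetic.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)          # first predictor
x2 = rng.normal(size=200)          # second predictor
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(size=200)  # outcome with noise

X = sm.add_constant(np.column_stack([x1, x2]))  # add intercept term
model = sm.OLS(y, X).fit()
print(model.summary())              # estimated coefficients, p-values, fit
```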
One of the simplest inferential analyses is comparing the post-test outcomes of treatment and control group subjects in a randomized post-test only control group design, such as whether students enrolled in a special mathematics program perform better than those in a traditional math curriculum.
The t-test was introduced in 1908 by William Sealy Gosset, a chemist working for the Guinness Brewery in Dublin, Ireland, to monitor the quality of stout, a dark beer popular with 19th-century porters in London.
The t-test examines whether the means of two groups are statistically different from each other (non-directional or two-tailed test), or whether one group has a statistically larger (or smaller) mean than the other (directional or one-tailed test).
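A minimal sketch of both forms of the test, using SciPy on made-up post-test scores for the two groups (the `alternative` argument for the one-tailed test requires SciPy 1.6 or later):

```python
# A sketch of a two-sample (independent) t-test comparing treatment and
# control group post-test scores. The scores are hypothetical.
from scipy import stats

treatment = [78, 85, 90, 72, 88, 81, 94, 79]  # special math program
control = [70, 75, 82, 68, 74, 77, 80, 71]    # traditional curriculum

# Two-tailed test: are the group means statistically different?
t_stat, p_two = stats.ttest_ind(treatment, control)
print(f"t = {t_stat:.3f}, p (two-tailed) = {p_two:.3f}")

# One-tailed (directional) test: is the treatment mean larger?
# (the `alternative` keyword needs SciPy >= 1.6)
t_stat, p_one = stats.ttest_ind(treatment, control, alternative="greater")
print(f"p (one-tailed) = {p_one:.3f}")
```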
Factor analysis is a data reduction technique that is used to statistically aggregate a large number of observed measures (items) into a smaller set of unobserved (latent) variables called factors based on their underlying bivariate correlation patterns.
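A minimal factor-analysis sketch using scikit-learn; the six observed "items" are generated from two synthetic latent factors, so the technique should recover a roughly two-factor structure:

```python
# A sketch of factor analysis with scikit-learn: reduce several observed
# items to a smaller number of latent factors. Data are synthetic.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
latent = rng.normal(size=(300, 2))             # two hidden factors
loadings = rng.normal(size=(2, 6))             # how items load on factors
items = latent @ loadings + 0.3 * rng.normal(size=(300, 6))  # six observed items

fa = FactorAnalysis(n_components=2)
scores = fa.fit_transform(items)               # factor scores per observation
print(scores.shape)                            # (300, 2): two factors retained
print(fa.components_.round(2))                 # estimated factor loadings
```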
Discriminant analysis is a classificatory technique that aims to place a given observation in one of several nominal categories based on a linear combination of predictor variables.
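A minimal discriminant-analysis sketch using scikit-learn's LinearDiscriminantAnalysis on the bundled iris dataset, chosen here purely for illustration:

```python
# A sketch of linear discriminant analysis: classify observations into
# nominal categories from a linear combination of predictor variables.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)   # predictors and species labels
lda = LinearDiscriminantAnalysis()
lda.fit(X, y)
print(f"Training accuracy: {lda.score(X, y):.2f}")
print(lda.predict(X[:5]))           # predicted categories for five rows
```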
Logistic regression (or logit model) is a GLM in which the outcome variable is binary (0 or 1) and is presumed to follow a logistic distribution, and the goal of the regression analysis is to predict the probability of the successful outcome by fitting data into a logistic curve.
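A minimal logistic-regression sketch with statsmodels; the binary outcome is simulated from a known logistic curve, so the fitted coefficients can be compared with the true values:

```python
# A sketch of logistic regression: a binary (0/1) outcome fitted to a
# logistic curve with statsmodels. Data are synthetic.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=500)
prob = 1 / (1 + np.exp(-(0.5 + 1.5 * x)))   # true logistic curve
y = rng.binomial(1, prob)                    # binary outcome

X = sm.add_constant(x)
logit = sm.Logit(y, X).fit(disp=False)
print(logit.params)                          # intercept and slope estimates
print(logit.predict(X[:5]))                  # predicted success probabilities
```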
Probit regression (or probit model) is a GLM in which the outcome variable can vary between 0 and 1 (or can assume discrete values 0 and 1) and is presumed to follow a standard normal distribution, and the goal of the regression is to predict the probability of each outcome.
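The probit analogue of the logit sketch above, with the outcome simulated from a standard normal CDF instead of a logistic curve:

```python
# A sketch of probit regression: like the logit model, but the success
# probability follows the standard normal CDF. Data are synthetic.
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(3)
x = rng.normal(size=500)
y = rng.binomial(1, norm.cdf(0.5 + 1.0 * x))  # normal-CDF link

X = sm.add_constant(x)
probit = sm.Probit(y, X).fit(disp=False)
print(probit.params)                           # intercept and slope estimates
```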
Path analysis is a multivariate GLM technique for analyzing directional relationships among a set of variables.
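Dedicated structural equation modeling software is normally used for path analysis; as a minimal sketch under that caveat, a simple path model x → m → y can be estimated as two ordinary regressions (the variable names and data are hypothetical):

```python
# A minimal path-analysis sketch: the path model x -> m -> y estimated
# as two separate OLS regressions. Dedicated SEM software is normally
# used for larger models; data here are synthetic.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.normal(size=400)
m = 0.6 * x + rng.normal(size=400)             # x influences mediator m
y = 0.8 * m + rng.normal(size=400)             # m influences outcome y

path_xm = sm.OLS(m, sm.add_constant(x)).fit()  # path coefficient x -> m
path_my = sm.OLS(y, sm.add_constant(m)).fit()  # path coefficient m -> y
print(path_xm.params[1], path_my.params[1])
```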
Time series analysis is a technique for analyzing time series data, that is, variables that change continually over time.
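A minimal time-series sketch: fitting an ARIMA model from statsmodels to a synthetic random-walk series and forecasting a few steps ahead (the model order here is illustrative, not a recommendation):

```python
# A sketch of time series analysis: fit an ARIMA model to a synthetic
# series and forecast the next few values.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(5)
series = np.cumsum(rng.normal(size=200))      # a simple random-walk series

result = ARIMA(series, order=(1, 1, 0)).fit() # AR(1) on first differences
print(result.forecast(steps=5))               # forecast five steps ahead
```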