Topic 6 Hypotheses & Regression

The simple linear regression model

The simple linear regression model typically takes the form:

Y = β0 + β1x + ϵ

where Y is the response (dependent) variable, x is the explanatory variable (also called the predictor or independent variable), β0 and β1 are the regression coefficients, and ϵ is the random error, with E[ϵ] = 0 and Var[ϵ] = σ^2. The quantities β0, β1, and σ^2 are the parameters of the model.
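To make the model concrete, here is a minimal Python sketch that simulates data from it. The parameter values beta0, beta1, and sigma are hypothetical, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical parameter values, for illustration only
beta0, beta1, sigma = 2.0, 0.5, 1.0

x = np.linspace(0.0, 10.0, 50)             # fixed values of the predictor
eps = rng.normal(0.0, sigma, size=x.size)  # random error: E[eps] = 0, Var[eps] = sigma^2
y = beta0 + beta1 * x + eps                # responses generated by the model
```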

Scatter plots

  • A scatter plot visually summarizes the relationship between two variables, providing insights into correlation or regression.
  • The pattern in the plot indicates the type and strength of the relationship.
  • Regression analysis explores variable relationships through probabilistic models.
  • The basic linear model, y = β0 + β1x, connects the independent variable x to the dependent variable y.
  • Simple linear regression involves two variables (X and Y), often visualized with a scatter plot. This graph helps interpret the relationship before in-depth analysis (see the plotting sketch below).
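As a quick illustration, assuming matplotlib is available, a scatter plot of a small hypothetical data set can be drawn as follows (the data values are made up for the example):

```python
import matplotlib.pyplot as plt

# Small hypothetical data set, for illustration only
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 3.2, 4.1, 4.8, 5.2, 6.1, 6.8]

plt.scatter(x, y)  # each point is one (xi, yi) observation
plt.xlabel("x (independent variable)")
plt.ylabel("y (dependent variable)")
plt.title("Scatter plot for visual inspection of the relationship")
plt.show()
```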


Steps to Solve a Simple Linear Regression Problem

Step 1: Draw the scatter plot of the (X, Y) data for visual inspection of the relationship that may exist between X and Y.

Step 2: Construct a table with columns x, y, xy, x^2, and y^2, together with the column totals ∑x, ∑y, ∑xy, ∑x^2, and ∑y^2, to facilitate the computations that follow.

Step 3: Calculate the least squares estimates of the regression parameters (β0, β1) using the formulas ^β1 = Sxy/Sxx and ^β0 = ȳ − ^β1·x̄, where ȳ = ∑y/n and x̄ = ∑x/n.

Step 4: The fitted linear regression model of the data is given by ^y = ^β0 + ^β1x, obtained by substituting the values of ^β0 and ^β1. Additionally, we can compute SSE = Syy − ^β1·Sxy, and hence an unbiased estimate of σ^2:

^σ^2 = s^2 = SSE/(n − 2), where Syy = ∑y^2 − (∑y)^2/n.

A Python sketch implementing Steps 2–4 is given below.

Estimating σ^2 and σ

The variance parameter σ^2 determines the amount of variability inherent in the regression model. After a regression model has been fitted, the fitted values ^yi are obtained via ^yi = ^β0 + ^β1xi, with residuals ei = yi − ^yi. Then ^σ^2 = s^2 = SSE/(n − 2), where SSE is the error sum of squares:

SSE = Syy − (Sxy/Sxx)·Sxy = Syy − ^β1·Sxy.
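A small sketch of this computation, assuming the estimates ^β0 and ^β1 have already been obtained (e.g. from the steps above); the function name residual_variance is hypothetical:

```python
import math

def residual_variance(x, y, beta0_hat, beta1_hat):
    """Return sigma^2-hat = s^2 and sigma-hat = s for an already fitted line."""
    n = len(x)
    y_hat = [beta0_hat + beta1_hat * xi for xi in x]  # fitted values ^yi
    e = [yi - yhi for yi, yhi in zip(y, y_hat)]       # residuals ei = yi - ^yi
    SSE = sum(ei ** 2 for ei in e)                    # error sum of squares
    s2 = SSE / (n - 2)                                # unbiased estimate of sigma^2
    return s2, math.sqrt(s2)
```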

Correlation

  • Correlation analysis is used to measure the strength of the linear relation between X and Y by means of a single number called the correlation coefficient. For the population, ρ = σXY/√(σXX·σYY); for a sample, r = Sxy/√(Sxx·Syy), or equivalently r = ±√(r^2), with the sign of the slope (a computational sketch follows).
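A minimal sketch of the sample correlation coefficient, assuming plain lists of equal length (the function name correlation is hypothetical):

```python
import math

def correlation(x, y):
    """Sample correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    Sxx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
    Syy = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n
    Sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    return Sxy / math.sqrt(Sxx * Syy)

# A value near +1 indicates a strong positive linear relation
print(correlation([1, 2, 3, 4, 5], [2.1, 2.9, 3.2, 4.1, 4.8]))
```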

The Coefficient of Determination
The sample coefficient of determination r^2 represents the proportion of the total variation of the variable Y that can be explained by a linear relationship with the values of X. A quantitative measure of the total amount of variation in the observed y values is given by the total sum of squares:

  • formula: SST = Syy = ∑(yi − ȳ)^2 = ∑yi^2 − (∑yi)^2/n
    • SSE = the sum of squared deviations about the least squares line ^y = ^β0 + ^β1x,
    • SST = the sum of squared deviations about the horizontal line at height ȳ,
    • SSE/SST = the proportion of total variation that cannot be explained by the simple linear regression model,
    • 1 − SSE/SST = the proportion of observed y variation explained by the model,
    • THUS: r^2 = 1 − SSE/SST (see the sketch below).
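As a worked illustration with hypothetical values of SSE and SST:

```python
def r_squared(SSE, SST):
    """Coefficient of determination: r^2 = 1 - SSE/SST."""
    return 1 - SSE / SST

# Hypothetical values: if SSE = 2.5 and SST = 40.0, then
# r^2 = 1 - 2.5/40.0 = 0.9375, i.e. about 94% of the y variation is explained.
print(r_squared(2.5, 40.0))  # 0.9375
```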

Estimated regression model


Least Squares Method: Consider a given sample of data (x1, y1), (x2, y2), …, (xn, yn). Let yi be the observed value of a random variable Yi, where Yi = β0 + β1xi + ϵi. The errors ϵi are independent random variables. If the line y = β0 + β1x is used to fit the model, the fitted values ^yi are obtained via ^yi = β0 + β1xi. The residual ei = yi − ^yi = yi − β0 − β1xi is the vertical deviation of the point (xi, yi) from the fitted line y = β0 + β1x. The error sum of squares, denoted by SSE, is the sum of the squared residuals: SSE = ∑(yi − ^yi)^2.

In regression analysis, the least squares method identifies the optimal regression line through a scatter plot by minimizing the sum of squared vertical distances. Represented as ^y = ^β0 + ^β1x, this line provides a precise fit, with the estimated coefficients ^β0 and ^β1 capturing the linear relationship between the dependent and independent variables.

The least squares regression line is a statistical method used to find the best-fitting straight line through a set of data points. It minimizes the sum of the squared vertical distances between the observed and predicted values, providing an optimal linear model for predicting outcomes based on input variables. The least squares (regression) line is ^y = ^β0 + ^β1x, with the least squares estimate of the slope ^β1 = Sxy/Sxx, where Sxy = ∑xy − (∑x)(∑y)/n and Sxx = ∑x^2 − (∑x)^2/n, and the least squares estimate of the intercept ^β0 = ȳ − ^β1·x̄.
