Given a set of coordinate points, how
do we find the linear regression using least square method? Imagine the
set of 10 points like this:
Xi | Yi |
215 | 30.8 |
201 | 32.5 |
196 | 35.4 |
226 | 28.1 |
226 | 24.4 |
348 | 24.1 |
226 | 28.5 |
348 | 24.2 |
148 | 32.8 |
226 | 28.0 |
The
regression line will take on the form of y = B0 + B1*x, with variance
σ^2. The equations for the variables B0, B1, and σ^2 are:
- B1 = (Σxi*yi - Σxi*Σyi/n) / (Σxi^2-(Σxi)^2/n)
- B0 = ӯ(n) - B1*x̄(n)
- σ^2 = (Σyi^2-n*ӯ(n)^2-B1*(Σxi*yi-Σxi*Σyi/n)) / (n-2)
Xi | Yi | Xi^2 | Yi^2 | Xi*Yi | ||
215 | 30.8 | 46225 | 948.64 | 6622 | ||
201 | 32.5 | 40401 | 1056.25 | 6532.5 | ||
196 | 35.4 | 38416 | 1253.16 | 6938.4 | ||
226 | 28.1 | 51076 | 789.61 | 6350.6 | ||
226 | 24.4 | 51076 | 595.36 | 5514.4 | ||
348 | 24.1 | 121104 | 580.81 | 8386.8 | ||
226 | 28.5 | 51076 | 812.25 | 6441 | ||
348 | 24.2 | 121104 | 585.64 | 8421.6 | ||
148 | 32.8 | 21904 | 1075.84 | 4854.4 | ||
226 | 28.0 | 51076 | 784 | 6328 | ||
Sum | 2360 | 288.8 | 593458 | 8481.56 | 66389.7 |
- B1 = (66389.7-2360*288.8/10) / (593458-2360^2/10) = -0.0484
- B0 = (288.8/10) - (-0.0484)(2360/10) = 40.3024
- σ^2 = (8481.56-10*(288.8/10)^2-(-0.0484)*(66389.7-2360*288.8/10)) / 8 = 6.9360
Although the value of the coefficient of determination is quite low, the equation of the least-square best-fit line corresponds with the numbers we obtained. Also, using the function =STEYX(C2:C11,B2:B11) in Excel, the sample standard deviation was calculated to be 2.632951. Squaring that number, we get 6.932, nearly identical as the value we calculated above as well.