David Arnold
2013-Nov-24 19:42 UTC
[R] Creating a set that has line of best fit y=3+2x so that SST, SSR, SSE are whole numbers
I wanted to find a set (x,y) of integers whose line of best fit is y = 3 + 2x. I reasoned that I would lose 2 degrees of freedom, so I chose 1, 2, 3, 4, and x for my explanatory data and 3, 8, 8, 12, and y for my response data, leaving x and y unknown.

I then used the slope formula

    b = (n sum(xy) - sum(x)sum(y))/(n sum(x^2) - (sum(x))^2)

with b = 2 and n = 5 to get the equation

    2 = (5(xy+91) - (x+10)(y+31))/(5(x^2+30) - (x+10)^2).

Then, because (mean(x), mean(y)) lies on the line of best fit, and mean(x) = (x+10)/5 and mean(y) = (y+31)/5, substituting them into y = 3 + 2x gave me the equation y = 2x + 4. Substituting that into my first equation gave me x = -1 and y = 2.

Sure enough:

    x <- c(-1,1,2,3,4)
    y <- c(2,3,8,8,12)
    plot(x,y)
    lm.res <- lm(y~x)
    lm.res
    abline(lm.res)

gave me the correct coefficients:

    Coefficients:
    (Intercept)            x
              3            2

Also, it was true that SSY = SSR + SSE, where SSY = sum((y-mean(y))^2), SSR = sum((yhat-mean(y))^2), and SSE = sum((y-yhat)^2).

    yhat <- predict(lm.res)
    tab <- cbind(x,y,yhat,(y-mean(y))^2,(yhat-mean(y))^2,(y-yhat)^2)
    addmargins(tab,1)

The last three columns of the output are the squared deviations (y-mean(y))^2, (yhat-mean(y))^2, and (y-yhat)^2:

         x  y yhat
    1   -1  2    1 21.16 31.36 1
    2    1  3    5 12.96  2.56 4
    3    2  8    7  1.96  0.16 1
    4    3  8    9  1.96  5.76 1
    5    4 12   11 29.16 19.36 1
    Sum  9 33   33 67.20 59.20 8

That is, 67.20 = 59.20 + 8.

However, what I'd like to have is a set of numbers x and y whose line of best fit has equation y = 3 + 2x, but where all of the numbers in the last table are integers (whole numbers). That would give me a good example to show in class to demonstrate this idea without having to do too many calculations with decimals.

Is there a method in R to keep picking choices for x and y until this happens?

D.
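
One possible way to "keep picking" until the table is all whole numbers, offered only as a minimal sketch. It relies on two facts: for non-constant x, the least-squares line is exactly y = 3 + 2x if and only if the residuals e = y - (3 + 2x) satisfy sum(e) = 0 and sum(e*x) = 0, and the squared deviations from mean(y) are whole numbers only if mean(y) is itself a whole number. The function name find_integer_fit and its arguments (e_range, max_tries, seed) are hypothetical names made up for this illustration, not part of any package.

```r
# Hypothetical helper (not from the original post): random search for integer
# data whose least-squares line is exactly y = b0 + b1 * x and whose table of
# squared deviations contains only whole numbers.
find_integer_fit <- function(x, b0 = 3, b1 = 2, e_range = -3:3,
                             max_tries = 10000, seed = 1) {
  set.seed(seed)
  n <- length(x)
  # When sum(e) == 0, sum(y) = n * b0 + b1 * sum(x), so mean(y) is a whole
  # number only if b1 * sum(x) is a multiple of n. Give up early otherwise.
  if ((b1 * sum(x)) %% n != 0) return(NULL)
  for (i in seq_len(max_tries)) {
    # Propose integer residuals; y is then an integer vector as well.
    e <- sample(e_range, n, replace = TRUE)
    # Normal equations: the fit is exactly b0 + b1 * x when both sums are 0.
    # any(e != 0) rejects the trivial perfect fit with SSE = 0.
    if (sum(e) == 0 && sum(e * x) == 0 && any(e != 0)) {
      return(data.frame(x = x, y = b0 + b1 * x + e))
    }
  }
  NULL  # nothing found within max_tries
}

# Example: x values chosen so that sum(x) is a multiple of n = 5.
dat <- find_integer_fit(x = c(-2, -1, 0, 1, 2))
fit <- lm(y ~ x, data = dat)
yhat <- 3 + 2 * dat$x
tab <- with(dat, cbind(x, y, yhat,
                       SSY = (y - mean(y))^2,
                       SSR = (yhat - mean(y))^2,
                       SSE = (y - yhat)^2))
rbind(tab, Sum = colSums(tab))
```

Checking the two sums directly, rather than comparing coef(lm(y ~ x)) with c(3, 2), keeps the test in exact integer arithmetic and avoids floating-point comparisons. A different seed or a wider e_range gives other data sets.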