On 11/02/2015 4:30 PM, Goldschneider, Jill wrote:> I was playing with some examples of piecewise regression using lm() and
have come across a behavior I'm uncertain about.
> Below is simple 3-segment dataset. I compare predicted output of a model
created by one call to lm() to that of 3 models created by 3 calls to lm().
> In case A and B, the results are the same. However, in case C the results
differ for the middle segment.
> Is the output of lm() for case C to be expected or not and if so, why?
Take a look at the fit value, and you'll see what happened:
> lm(y ~ (x<=10)*x + I(x>10 & x<=20) + (x>20)*x)
Call:
lm(formula = y ~ (x <= 10) * x + I(x > 10 & x <= 20) + (x > 20)
*
x)
Coefficients:
(Intercept) x <= 10TRUE
x I(x > 10 & x <= 20)TRUE
35.0741 -15.0733
-0.1644 -17.7344
x > 20TRUE x <= 10TRUE:x x:x >
20TRUE
NA 1.2397 -0.9658
The * in a formula means "main effect plus interaction", not
multiplication. W hat you want is
lm(y ~ I((x<=10)*x) + I(x>10 & x<=20) + I((x>20)*x))
This doesn't give exactly the same results as the segmented regression,
because it uses the same intercept in all three segments; you might want
to add I(x <= 10) as well if you don't want that.
Duncan Murdoch
> Thank you,
> Jill
>
> set.seed(133)
> y <- c(21:30, rep(15,10), 10:1) + runif(30, -2, 2)
> x <- 1:30
>
> # Case A: 3 segments, each fit with a constant value
> plot(x, y)
> # 3 different lm fits
> p.c3 <- c(predict(lm(y[1:10]~1)), predict(lm(y[11:20]~1)),
predict(lm(y[21:30]~1)))
> lines(x, p.c3, col="red")
> # piecewise via lm
> p.lm1 <- predict(lm(y ~ I(x<=10) + I(x>10 & x<=20) +
I(x>20)))
> lines(x, p.lm1, col="blue")
> max(abs(p.c3-p.lm1))
>
> # Case B: 3 segments, each fit with a line
> plot(x, y)
> # 3 different lm fits
> p.c3 <- c(predict(lm(y[1:10]~x[1:10])), predict(lm(y[11:20]~x[11:20])),
predict(lm(y[21:30]~x[21:30])))
> lines(x, p.c3, col="red")
> # piecewise via lm
> p.lm1 <- predict(lm(y ~ (x<=10)*x + (x>10 & x<=20)*x +
(x>20)*x))
> lines(x, p.lm1, col="blue")
> max(abs(p.c3-p.lm1))
>
> # Cast C: 3 segments - the middle fit with a constant value, the outer by a
line
> plot(x, y)
> # 3 different lm fits
> p.c3 <- c(predict(lm(y[1:10]~x[1:10])), predict(lm(y[11:20]~1)),
predict(lm(y[21:30]~x[21:30])))
> lines(x, p.c3, col="red")
> # piecewise via lm
> p.lm1 <- predict(lm(y ~ (x<=10)*x + I(x>10 & x<=20) +
(x>20)*x))
> lines(x, p.lm1, col="blue")
> max(abs(p.c3-p.lm1))
> ## the single call to lm does not have a constant value fit to the middle
section
> plot(x, p.c3-p.lm1 )
>
>
>
> The information contained in this transmission may contain confidential
information. If the reader of this message is not the intended recipient, you
are hereby notified that any review, dissemination, distribution or duplication
of this communication is strictly prohibited. If you are not the intended
recipient, please contact the sender by reply email and destroy all copies of
the original message.
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>