Yunkai Huang
2012-Aug-21 20:23 UTC
[R] Estimating and predicting using "segmented" Package
Hello, Any one who can help me with the problems regrarding segmented regression. 1.When I run the code as following: library(segmented) n <-100 S<-200 x1 <- seq(-0.99,0.99,length=29) MSEPtrm<-matrix(0,S,1) MAEtrm<-matrix(0,S,1) TRM<-matrix(0,S,length(x1)) for (s in seq(1:S)){ set.seed(123*s+123) x<-seq(-1,1,length=n) y0 <- 0.5*x+0.4*exp(-20*(x+0.25)^2)-0.4*exp(-20*(x-0.25)^2) y1 <- 0.5*x1+0.4*exp(-20*(x1+0.25)^2)-0.4*exp(-20*(x1-0.25)^2) epsilon <- rnorm(n, 0, 0.3) epsilon1 <- rnorm(length(x1),0,0.3) y <- y0 + epsilon y1<- y1 + epsilon1 mydata<-data.frame(y,x) mydata1<-data.frame(y=y1,x=x1) ##TRM segmented regregression with 2 breakpoints reg<-lm(y~x,mydata) seg2<-segmented(reg,seg.Z=~x,psi=c(-0.5,0.5)) mydata2<-data.frame(x=x1) mydata2$dummy1<-pmax(mydata2$x-seg2$psi[1,2],0) mydata2$dummy2<-I(mydata2$x>seg2$psi[1,2])*coef(seg2)[3] mydata2$dummy3<-pmax(mydata2$x-seg2$psi[2,2],0) mydata2$dummy4<-I(mydata2$x>seg2$psi[2,2])*coef(seg2)[4] names(mydata2)[-1]<-names(model.frame(seg2))[-c(1,2)] segpredict<-predict(seg2,mydata2) segpe<-(segpredict-mydata1$y)^2 MSEPtrm<-mean((segpredict-mydata1$y)^2) MAEtrm<-mean(abs(segpredict-mydata1$y)) TRM[s,]<-segpe } I found it works fine when s=1:15. However, when s=16, it returns message ?Error in diag(Cov[id, id]) : subscript out of bounds? Actually it returns same message as well when some other numbers being selected for s. But if I run the line: seg2<-segmented(reg,seg.Z=~x,psi=c(-0.5,0.5)) twice individually. No error message returns anymore. I wonder why this happens. 2. When I change x from uniformly spaced to uniform distribution library(segmented) n <-100 S<-200 x1 <- seq(-0.99,0.99,length=29) MSEPtrm<-matrix(0,S,1) MAEtrm<-matrix(0,S,1) TRM<-matrix(0,S,length(x1)) for (s in seq(1:S)){ set.seed(123*s+123) z<-runif(n,-1,1) x<-z[order(z)] y0 <- 0.5*x+0.4*exp(-20*(x+0.25)^2)-0.4*exp(-20*(x-0.25)^2) y1 <- 0.5*x1+0.4*exp(-20*(x1+0.25)^2)-0.4*exp(-20*(x1-0.25)^2) epsilon <- rnorm(n, 0, 0.3) epsilon1 <- rnorm(length(x1),0,0.3) y <- y0 + epsilon y1<- y1 + epsilon1 mydata<-data.frame(y,x) mydata1<-data.frame(y=y1,x=x1) ##TRM segmented regregression with 2 breakpoints reg<-lm(y~x,mydata) seg2<-segmented(reg,seg.Z=~x,psi=c(-0.5,0.5)) mydata2<-data.frame(x=x1) mydata2$dummy1<-pmax(mydata2$x-seg2$psi[1,2],0) mydata2$dummy2<-I(mydata2$x>seg2$psi[1,2])*coef(seg2)[3] mydata2$dummy3<-pmax(mydata2$x-seg2$psi[2,2],0) mydata2$dummy4<-I(mydata2$x>seg2$psi[2,2])*coef(seg2)[4] names(mydata2)[-1]<-names(model.frame(seg2))[-c(1,2)] segpredict<-predict(seg2,mydata2) segpe<-(segpredict-mydata1$y)^2 MSEPtrm<-mean((segpredict-mydata1$y)^2) MAEtrm<-mean(abs(segpredict-mydata1$y)) TRM[s,]<-segpe } Besides the message it may return as problem 1, it may return error message ?Error in segmented.lm(reg, seg.Z = ~x, psi = c(-0.5, 0.5)) : only 1 datum in an interval: breakpoint(s) at the boundary or too close? (ie, when s=9) I checked the data plot when it returned this message. I found the data points are too few between the two change points. I understand why it returns this message technically. However, I do wonder whether the segmented() function can be applied to the data set that x (independent variable as well as threshold variable ) is randomly spaced and with few points between break points. (Since randomly drawn data set cannot guarantee enough points between change points). Since I want do some simulation study, I need to apply this function to a lot of different data sets. Regarding this problem, do you have any suggestions? Thanks a lot! Victor, PhD Candidate Department of Economics, University of Victoria P.O. Box 1700, STN CSC, Victoria, B.C., CANADA, V8W 2Y2 Email: yunkai at uvic.ca