Benedikt Drosse
2013-Jan-12 10:13 UTC
[R] Question on broken-line regression: 'segmented' or alternative
Dear R-Users,
I have a question concerning the determination of breakpoints and
comparison of slopes from broken-line regression models. Although this
is rather a standard problem in data analysis, all information I
gathered so far, did not answer my questions.
I added a subsetted example of my data. Basically it is a timeseries of
recorded phenotypes in three different groups of plants.
You will see in the plot that the phenotype between groups behaves
differently over time. Plotted data of Group 3 look like a nice linear
relationship between phenotype (y) and time (x). Group 1 seems to have 1
breakpoint (at x=31) and Group 3 has 2 breakpoints (at x=c(14,31)) in
the data.
1. I am very interested in finding the breakpoints (x-values), where the
slope of a regression line changes. In the plot I estimated the
break-points by eye, subsetted the data and performed linear regression
on the subsets for each group. However, this is not the way I want to
go, since I would like to have some reasonable criteria (better than my
eye) to determine breakpoints.
2. Additionally I am interested in testing, whether the slopes of
(partial-)regression lines between groups are different from each other.
Considering my questions, the 'segmented' package should exactly fit my
needs. The problem I have is, that on the one hand I would also like to
compare slopes of regression lines, which do not have breakpoints (e.g
group 3 of the example) to lines, which have breakpoints (group 1 and
2). This seems to be not possible with the 'segmented' package.
On the other hand the 'segmented' package seems to have a poor
error-reporting, so it is very hard for me to figure out, what is going
wrong in the analysis (see example).
Can you think of any other way/procedure/package I could use to analyze
my data? Hopefully you can also give me a clue on how to proceed with
the analysis, when the error of the segmented()-function in the creation
of the covariance-matrix occurs.
I would be grateful for any comment on how to determine breakpoints and
compare slopes of regression lines in a different way than subsetting
the data by hand and regress the subsetted parts as I did it for the
example plot. Thanks to anyone, who has an idea, already. Sorry for the
long example code.
Best regards,
Benedikt
#Start of Code
library(car)
library(segmented)
#Data file:
#Timescale:
x <-
c(0,0,0,3,3,3,3,3,3,6,6,6,9,9,9,14,14,14,17,17,17,20,20,20,24,24,24,28,28,28,31,31,31,34,34,34,38,38,38,45,45,45,0,0,0,3,3,3,3,3,3,6,6,6,9,9,9,14,14,14,17,17,17,20,20,20,24,24,24,28,28,28,31,31,31,34,34,34,38,38,38,45,45,45,0,0,0,3,3,3,7,7,7,9,9,9,13,13,13,13,13,16,16,16,20,20,20,24,24,24,28,28,28,34,34,34,41,41,41,50,50,50)
#y-data for three groups (t1, t2, t3):
t1 <-
c(41,35,38,43,47,46,50,41,47,74,61,58,72,87,84,89,114,93,113,99,105,106,98,200,170,145,153,187,202,195,198,177,195,288,274,233,320,348,428,545,520,549)
t2 <-
c(44,40,38,57,42,47,42,57,40,71,75,64,70,83,88,115,95,106,132,145,115,192,172,190,273,300,250,403,392,343,400,402,444,NA,NA,NA,NA,NA,NA,430,430,459)
t3 <-
c(41,35,38,39,43,47,58,33,45,67,60,70,64,78,78,67,62,78,79,71,93,86,82,105,96,102,131,126,124,112,100,133,170,161,163,154,178,185)
group <- as.factor(c(rep(1, times=length(t1)), rep(2,
times=length(t2)), rep(3, times=length(t3))))
data <- data.frame(group, x=x, y=c(t1,t2,t3))
#Groupwise plot of datapoints vs time:
scatterplot(y~x*group, data=data, smooth=FALSE, reg.line=FALSE,
legend.coords="topleft")
#Regressions for group 1 with 1 breakpoints (red, dashed line)
reg_1.1 <- lm(data[which(data$x<=30 &
data$group==1),3]~data[which(data$x<=30 & data$group==1),2])
abline(reg_1.1, col="black")
reg_1.2 <- lm(data[which(data$x>=30 &
data$group==1),3]~data[which(data$x>=30 & data$group==1),2])
abline(reg_1.2, col="black")
#Regressions for group 2 with 2 breakpoints (black, solid line)
reg_2.1 <- lm(data[which(data$x<=14 &
data$group==2),3]~data[which(data$x<=14 & data$group==2),2])
abline(reg_2.1, col="red", lty=2)
reg_2.2 <- lm(data[which(data$x>=14 & data$x<45 &
data$group==2),3]~data[which(data$x>=14 & data$x<45 &
data$group==2),2])
abline(reg_2.2, col="red", lty=2)
reg_2.3 <- lm(data[which(data$x>=31 & data$x<=45 &
data$group==2),3]~data[which(data$x>=31 & data$x<=45 &
data$group==2),2])
abline(reg_2.3, col="red", lty=2)
#Regression for group 3 without breakpoints (green, solid line)
reg_3.1 <- lm(data[which(data$group==3),3]~data[which(data$group==3),2])
abline(reg_3.1, col="green", lty=1)
#Segmented package to estimate breakpoints and determine slopes for
group2 and 3:
subset_data <- subset(data, subset=data$group!=3)
names(subset_data) <- c("sub_group", "sub_x",
"sub_y")
attach(subset_data)
X <- model.matrix(~0+sub_group)*sub_x
time.1 <- X[,1]
time.2 <- X[,2]
time.3 <- X[,3]
olm <- lm(sub_y~0 + sub_group + time.1 + time.2)
os <- segmented(olm, seg.Z= ~ time.1 + time.2,
psi=list(time.1=c(14,30), time.2=40))
#End of code
David Winsemius
2013-Jan-12 21:06 UTC
[R] Question on broken-line regression: 'segmented' or alternative
On Jan 12, 2010, at 2:13 AM, Benedikt Drosse wrote:> Dear R-Users, > > I have a question concerning the determination of breakpoints and > comparison of slopes from broken-line regression models. Although > this is rather a standard problem in data analysis, all information > I gathered so far, did not answer my questions. > > I added a subsetted example of my data. Basically it is a timeseries > of recorded phenotypes in three different groups of plants. > You will see in the plot that the phenotype between groups behaves > differently over time. Plotted data of Group 3 look like a nice > linear relationship between phenotype (y) and time (x). Group 1 > seems to have 1 breakpoint (at x=31) and Group 3 has 2 breakpoints > (at x=c(14,31)) in the data. > > 1. I am very interested in finding the breakpoints (x-values), where > the slope of a regression line changes. In the plot I estimated the > break-points by eye, subsetted the data and performed linear > regression on the subsets for each group. However, this is not the > way I want to go, since I would like to have some reasonable > criteria (better than my eye) to determine breakpoints. > > 2. Additionally I am interested in testing, whether the slopes of > (partial-)regression lines between groups are different from each > other. > > Considering my questions, the 'segmented' package should exactly fit > my needs. The problem I have is, that on the one hand I would also > like to compare slopes of regression lines, which do not have > breakpoints (e.g group 3 of the example) to lines, which have > breakpoints (group 1 and 2). This seems to be not possible with the > 'segmented' package. > On the other hand the 'segmented' package seems to have a poor error- > reporting, so it is very hard for me to figure out, what is going > wrong in the analysis (see example). > > Can you think of any other way/procedure/package I could use to > analyze my data? Hopefully you can also give me a clue on how to > proceed with the analysis, when the error of the segmented()- > function in the creation of the covariance-matrix occurs. > > I would be grateful for any comment on how to determine breakpoints > and compare slopes of regression lines in a different way than > subsetting the data by hand and regress the subsetted parts as I did > it for the example plot. Thanks to anyone, who has an idea, already. > Sorry for the long example code. >Have you looked at the capabilities of the 'strucchange' package? http://cran.r-project.org/web/packages/strucchange/index.html -- David.> Best regards, > Benedikt > > > > > #Start of Code > library(car) > library(segmented) > > #Data file: > #Timescale: > x <- > c(0,0,0,3,3,3,3,3,3,6,6,6,9,9,9,14,14,14,17,17,17,20,20,20,24,24,24,28,28,28,31,31,31,34,34,34,38,38,38,45,45,45,0,0,0,3,3,3,3,3,3,6,6,6,9,9,9,14,14,14,17,17,17,20,20,20,24,24,24,28,28,28,31,31,31,34,34,34,38,38,38,45,45,45,0,0,0,3,3,3,7,7,7,9,9,9,13,13,13,13,13,16,16,16,20,20,20,24,24,24,28,28,28,34,34,34,41,41,41,50,50,50) > > #y-data for three groups (t1, t2, t3): > t1 <- > c > (41,35,38,43,47,46,50,41,47,74,61,58,72,87,84,89,114,93,113,99,105,106,98,200,170,145,153,187,202,195,198,177,195,288,274,233,320,348,428,545,520,549 > ) > t2 <- > c > (44,40,38,57,42,47,42,57,40,71,75,64,70,83,88,115,95,106,132,145,115,192,172,190,273,300,250,403,392,343,400,402,444 > ,NA,NA,NA,NA,NA,NA,430,430,459) > t3 <- > c > (41,35,38,39,43,47,58,33,45,67,60,70,64,78,78,67,62,78,79,71,93,86,82,105,96,102,131,126,124,112,100,133,170,161,163,154,178,185 > ) > > group <- as.factor(c(rep(1, times=length(t1)), rep(2, > times=length(t2)), rep(3, times=length(t3)))) > > data <- data.frame(group, x=x, y=c(t1,t2,t3)) > > #Groupwise plot of datapoints vs time: > scatterplot(y~x*group, data=data, smooth=FALSE, reg.line=FALSE, > legend.coords="topleft") > > > #Regressions for group 1 with 1 breakpoints (red, dashed line) > reg_1.1 <- lm(data[which(data$x<=30 & data$group==1), > 3]~data[which(data$x<=30 & data$group==1),2]) > abline(reg_1.1, col="black") > > reg_1.2 <- lm(data[which(data$x>=30 & data$group==1), > 3]~data[which(data$x>=30 & data$group==1),2]) > abline(reg_1.2, col="black") > > > #Regressions for group 2 with 2 breakpoints (black, solid line) > reg_2.1 <- lm(data[which(data$x<=14 & data$group==2), > 3]~data[which(data$x<=14 & data$group==2),2]) > abline(reg_2.1, col="red", lty=2) > > reg_2.2 <- lm(data[which(data$x>=14 & data$x<45 & data$group==2), > 3]~data[which(data$x>=14 & data$x<45 & data$group==2),2]) > abline(reg_2.2, col="red", lty=2) > > reg_2.3 <- lm(data[which(data$x>=31 & data$x<=45 & data > $group==2),3]~data[which(data$x>=31 & data$x<=45 & data$group==2),2]) > abline(reg_2.3, col="red", lty=2) > > > #Regression for group 3 without breakpoints (green, solid line) > reg_3.1 <- lm(data[which(data$group==3),3]~data[which(data > $group==3),2]) > abline(reg_3.1, col="green", lty=1) > > > #Segmented package to estimate breakpoints and determine slopes for > group2 and 3: > subset_data <- subset(data, subset=data$group!=3) > names(subset_data) <- c("sub_group", "sub_x", "sub_y") > attach(subset_data) > > X <- model.matrix(~0+sub_group)*sub_x > > time.1 <- X[,1] > time.2 <- X[,2] > time.3 <- X[,3] > > olm <- lm(sub_y~0 + sub_group + time.1 + time.2) > os <- segmented(olm, seg.Z= ~ time.1 + time.2, psi=list(time. > 1=c(14,30), time.2=40)) > > #End of code > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.David Winsemius, MD Alameda, CA, USA