Sumitrajit Dhar
2018-Jun-03 16:03 UTC
[R] Running segmented on grouped data and collating model parameters in a data frame
Hi Folks, I am trying to teach myself how to solve the problem described below but am running out of time. Hence the plea for help. Thanks in advance. Here is my data frame.> t# A tibble: 12 x 12 subject ageGrp ear hearingGrp sex freq L2 Ldp Phidp NF SNR Ldp_p <fct> <fct> <fct> <fct> <fct> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> 1 HALAF573 A L A F 2 0 -19.6 197. -28.5 8.88 2.10e-6 2 HALAF573 A L A F 2 2 -18.7 203. -22.0 3.25 2.31e-6 3 HALAF573 A L A F 2 4 -29.1 255. -27.4 -1.64 7.04e-7 4 HALAF573 A L A F 2 6 -12.4 174. -12.2 -0.206 4.78e-6 5 HALAF573 A L A F 4 0 -28.6 232. -26.7 -1.87 7.45e-7 6 HALAF573 A L A F 4 2 -27.2 351. -28.8 1.59 8.68e-7 7 HALAF573 A L A F 4 4 -20.4 26.2 -35.0 14.6 1.92e-6 8 HALAF573 A L A F 4 6 -20.0 85.1 -29.8 9.75 2.00e-6 9 HALAF573 A L A F 8 0 -22.8 39.2 -22.1 -0.689 1.45e-6 10 HALAF573 A L A F 8 2 -14.5 13.4 -20.7 6.23 3.76e-6 11 HALAF573 A L A F 8 4 -17.3 345. -21.6 4.30 2.73e-6 12 HALAF573 A L A F 8 6 -14.1 320. -21.7 7.59 3.94e-6 # Note that there are more levels of L2 (31 in total) and 344 other subjects but I truncated the frame for posting. # I want to do this: t %>% group_by(freq) %>% [run segmented] %>% [create a data frame with [subject, freq, breakpoint1, breakpoint2, slope1, slope2, slope3, L2 when Ldp_p == 0]. # Also note that ultimately I will be grouping by "subject, freq". # I can run the models and get believable results. The following run on a data frame with L2 between 0 and 60. out.lm <- lm(Ldp_p ~ L2, data = t) o <- segmented(out.lm, seg.Z = ~L2, psi = list(L2 = c(20,45)), control = seg.control(display = FALSE) o$psi # Initial Est. St.Err #psi1.L2 20 30.78256 0.5085192 #psi2.L2 45 53.16390 0.4671701 slope(o) slope(o) $L2 # Est. St.Err. t value CI(95%).l CI(95%).u #slope1 1.2060e-06 1.6606e-07 7.2622 8.6397e-07 1.5480e-06 #slope2 1.0708e-05 2.9196e-07 36.6770 1.0107e-05 1.1309e-05 #slope3 -4.5791e-06 1.3694e-06 -3.3439 -7.3995e-06 -1.7588e-06 Thanks again for bailing me out. Regards, Sumit [[alternative HTML version deleted]]
Vito M. R. Muggeo
2018-Jun-04 15:08 UTC
[R] Running segmented on grouped data and collating model parameters in a data frame
dear Sumi, I am not familiar with the dplyr package (%>%..), however if you want to fit the model for each subject times freq interaction, a simple for loop will suffice. Here possible code: Assuming d is the dataframe, something like subj<-levels(d$subject) fr<-unique(d$freq) #new dataframe where the estimates will be stored newd<-expand.grid(subj=subj, freq=fr) newd<-cbind(newd, matrix(-99,nrow(newd),9)) names(newd)[-(1:2)]<-c(paste0("slope",1:3), paste0(c("CI.inf.","CI.sup."), rep(paste0("slope",1:3),each=2))) library(segmented) #fit the model for each subject x freq combination for(i in subj){ for(j in fr){ current.d<-subset(d, subject==i & freq==j) if(nrow(current.d)>0){ o<-lm(Ldp_p~L2, data=current.d) os<-try(segmented(o, ~L2, psi=c(20,50)), silent=TRUE) if(class(os)[1]!="try-error"){ est.slope<-as.vector(slope(os)[[1]][,1]) #point est ci.slope<-as.vector(t(slope(os)[[1]][,4:5])) #CI newd[newd$subj==i & newd$freq==j, -c(1:2)]<-c(est.slope, ci.slope) } } } } However it seems that you can be interested in segmented *mixed* models.. I mean, rather than estimating a segmented model for each subject, you could estimate a single model assuming random effects (for each model parameter, including the breakpoint) for each subject: Have a look to http://journals.sagepub.com/doi/abs/10.1177/1471082X13504721 Let me know (not on the R list) if you are interested in relevant code best, vito Il 03/06/2018 18:03, Sumitrajit Dhar ha scritto:> Hi Folks, > > I am trying to teach myself how to solve the problem described below but am > running out of time. Hence the plea for help. Thanks in advance. > > Here is my data frame. > >> t > # A tibble: 12 x 12 > subject ageGrp ear hearingGrp sex freq L2 Ldp Phidp NF > SNR Ldp_p > <fct> <fct> <fct> <fct> <fct> <int> <int> <dbl> <dbl> <dbl> > <dbl> <dbl> > 1 HALAF573 A L A F 2 0 -19.6 197. -28.5 > 8.88 2.10e-6 > 2 HALAF573 A L A F 2 2 -18.7 203. -22.0 > 3.25 2.31e-6 > 3 HALAF573 A L A F 2 4 -29.1 255. -27.4 > -1.64 7.04e-7 > 4 HALAF573 A L A F 2 6 -12.4 174. -12.2 > -0.206 4.78e-6 > 5 HALAF573 A L A F 4 0 -28.6 232. -26.7 > -1.87 7.45e-7 > 6 HALAF573 A L A F 4 2 -27.2 351. -28.8 > 1.59 8.68e-7 > 7 HALAF573 A L A F 4 4 -20.4 26.2 -35.0 > 14.6 1.92e-6 > 8 HALAF573 A L A F 4 6 -20.0 85.1 -29.8 > 9.75 2.00e-6 > 9 HALAF573 A L A F 8 0 -22.8 39.2 -22.1 > -0.689 1.45e-6 > 10 HALAF573 A L A F 8 2 -14.5 13.4 -20.7 > 6.23 3.76e-6 > 11 HALAF573 A L A F 8 4 -17.3 345. -21.6 > 4.30 2.73e-6 > 12 HALAF573 A L A F 8 6 -14.1 320. -21.7 > 7.59 3.94e-6 > > # Note that there are more levels of L2 (31 in total) and 344 other > subjects but I truncated the frame for posting. > > # I want to do this: t %>% group_by(freq) %>% [run segmented] %>% [create > a data frame with [subject, freq, breakpoint1, breakpoint2, slope1, slope2, > slope3, L2 when Ldp_p == 0]. > > # Also note that ultimately I will be grouping by "subject, freq". > > # I can run the models and get believable results. The following run on a > data frame with L2 between 0 and 60. > > out.lm <- lm(Ldp_p ~ L2, data = t) > o <- segmented(out.lm, seg.Z = ~L2, psi = list(L2 = c(20,45)), > control = seg.control(display = FALSE) > > o$psi > # Initial Est. St.Err > #psi1.L2 20 30.78256 0.5085192 > #psi2.L2 45 53.16390 0.4671701 > > slope(o) > slope(o) > $L2 > # Est. St.Err. t value CI(95%).l CI(95%).u > #slope1 1.2060e-06 1.6606e-07 7.2622 8.6397e-07 1.5480e-06 > #slope2 1.0708e-05 2.9196e-07 36.6770 1.0107e-05 1.1309e-05 > #slope3 -4.5791e-06 1.3694e-06 -3.3439 -7.3995e-06 -1.7588e-06 > > Thanks again for bailing me out. > > Regards, > Sumit > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- =============================================Vito M.R. Muggeo Dip.to Sc Econom, Az e Statistiche Universit? di Palermo viale delle Scienze, edificio 13 90128 Palermo - ITALY tel: 091 23895240 fax: 091 485726 http://dssm.unipa.it/vmuggeo Associate Editor, Statistical Modelling Chair, Statistical Modelling Society