Andrew Dolman
2010-Mar-12 12:15 UTC
[R] using xval in mvpart to specify cross validation groups
Dear R's,

I'm trying to use specific rather than random cross-validation groups in mvpart.

The man page says:

    xval  Number of cross-validations or vector defining cross-validation groups.

And I found this reply to the list by Terry Therneau from 2006:

    The rpart function allows one to give the cross-validation groups explicitly.
    So if the number of observations was 10, you could use

    > rpart( y ~ x1 + x2, data=mydata, xval=c(1,1,2,2,3,3,1,3,2,1))

    which causes observations 1,2,7, and 10 to be left out of the first xval
    sample, 3,4, and 9 out of the second, etc.

        Terry Therneau

I can't see how this string of values, c(1,1,2,2,3,3,1,3,2,1), codes for observations 1,2,7,10 being left out of the 1st and so on.

Can anyone fill me in please?

Thanks,

andydolman at gmail.com
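One way to read the vector in the quoted example: position i holds the fold label of observation i, so grouping the positions by label lists the observations held out in each fold. A minimal sketch in base R (the names xval_groups and held_out are made up for illustration):

```r
# Fold labels from the quoted rpart example: element i is the fold of observation i
xval_groups <- c(1, 1, 2, 2, 3, 3, 1, 3, 2, 1)

# Group the observation indices by fold label
held_out <- split(seq_along(xval_groups), xval_groups)

held_out[["1"]]   # observations left out of the first xval sample
held_out[["2"]]   # observations left out of the second
held_out[["3"]]   # observations left out of the third
```

Each list element holds the observations omitted when fitting that fold; the remaining observations form the training sample for it.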
Andrew Dolman
2010-Mar-12 22:05 UTC
[R] using xval in mvpart to specify cross validation groups
Thank you Dennis, I've got the idea now.

However, a followup question to make sure I'm not wasting my time. If I specify the precise CV folds to use, should I not get the same tree every time?

e.g. here I have an hypothetical time sequence observed with error from 3 sites 's'. If I specify to leave out 1 site each time in a 3-fold CV (leaving aside that 3-fold cv might not be a good idea), should I not get the same tree each time?

library(mvpart)
library(lattice)

y <- rep(sin(seq(0.1,6, 0.1)),3)
y1 <- y+rnorm(length(y), sd=0.5)
x <- rep(1:(length(y)/3),3)
s <- rep(1:3, each=(length(y)/3))
dat <- data.frame(x,y1,s)

xyplot(y1~x|s, data=dat)

(mvpart(y1~x, data=dat, xv="1se", xval=s))

Thank you for your help.

andydolman at gmail.com

On 12 March 2010 18:03, Dennis Murphy <djmuser at gmail.com> wrote:
> Hi:
>
> See inline...
>
> On Fri, Mar 12, 2010 at 4:15 AM, Andrew Dolman <andydolman at gmail.com> wrote:
>>
>> Dear R's
>>
>> I'm trying to use specific rather than random cross-validation groups
>> in mvpart.
>>
>> The man page says:
>> xval Number of cross-validations or vector defining cross-validation
>> groups.
>>
>>
>> And I found this reply to the list by Terry Therneau from 2006
>>
>> The rpart function allows one to give the cross-validation groups
>> explicitly.
>> So if the number of observations was 10, you could use
>>   > rpart( y ~ x1 + x2, data=mydata, xval=c(1,1,2,2,3,3,1,3,2,1))
>> which causes observations 1,2,7, and 10 to be left out of the first xval
>> sample, 3,4, and 9 out of the second, etc.
>>
>>         Terry Therneau
>>
>>
>> I can't see how this string of values, c(1,1,2,2,3,3,1,3,2,1), codes
>> for observations 1,2,7,10 being left out of the 1st and so on.
>
> > x <- c(1,1,2,2,3,3,1,3,2,1)
> > which(x == 1)       # elements left out of the first xval sample
> [1]  1  2  7 10
> > which(x == 2)       # elements left out of the second xval sample
> [1] 3 4 9
> > which(x == 3)       # elements left out of the third xval sample
> [1] 5 6 8
>
> This vector is used to index a response vector/model matrix.
>
> To see how this is applied, consider the following. y is a vector of
> length 10, the same as x:
> > y <- rpois(10, 15)
> > y
>  [1] 12 15 17 11 14 14 12 12 16 16
> > y[x != 1]                  # first xval sample (y[1], y[2], y[7], y[10] removed)
> [1] 17 11 14 14 12 16
> > y[x != 2]                  # second xval sample (y[3], y[4], y[9] removed)
> [1] 12 15 14 14 12 12 16
> > y[x != 3]                  # third xval sample (y[5], y[6], y[8] removed)
> [1] 12 15 17 11 12 16 16
>
> Indexing is one of the most important and powerful features of R.
>
> HTH,
> Dennis
>
>> Can anyone fill me in please?
>>
>> Thanks,
>>
>> andydolman at gmail.com
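On the follow-up question, a minimal sketch of one way to check it, using rpart rather than mvpart (an assumption made here so the example runs with a standard R installation; mvpart's behaviour is not verified). When xval is a vector with one entry per observation, rpart takes the groups as given instead of sampling them, so two fits on the same data should report the same cross-validated errors. The seed and the object names fit1 and fit2 are illustrative only. Note that the simulated data themselves depend on rnorm(), so rerunning the whole script without a seed changes the data, and with it possibly the tree.

```r
library(rpart)

set.seed(1)  # illustrative seed: fixes the simulated data, not the CV folds

# Same layout as the example in the post: 3 sites, 60 time points each
y   <- rep(sin(seq(0.1, 6, 0.1)), 3)
y1  <- y + rnorm(length(y), sd = 0.5)
x   <- rep(seq_len(length(y) / 3), 3)
s   <- rep(1:3, each = length(y) / 3)
dat <- data.frame(x, y1, s)

# Fit twice on identical data with the site vector as explicit CV groups
fit1 <- rpart(y1 ~ x, data = dat, xval = s)
fit2 <- rpart(y1 ~ x, data = dat, xval = s)

# Compare the complexity tables, which include the cross-validated error
identical(fit1$cptable, fit2$cptable)
```

If the comparison holds for rpart but trees still differ between runs in mvpart, the likely places to look are regenerated data or any mvpart option that repeats or re-randomises the cross-validation.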