Hi all,

When I am using mle.cv(wle), I find an interesting problem: I can't do
leave-one-out cross-validation with mle.cv(wle). I will illustrate the
problem as follows:

> xx=matrix(rnorm(20*3),ncol=3)
> bb=c(1,2,0)
> yy=xx%*%bb+rnorm(20,0,0.001)+0
> summary(mle.cv(yy~xx,split=nrow(xx)-1,monte.carlo=2*nrow(xx),verbose=T),num.max=1)[[1]]
mle.cv: dimension of the split subsample set to default value = 9
 (Intercept)          xx1          xx2          xx3           cv
0.000000e+00 1.000000e+00 1.000000e+00 0.000000e+00 1.292513e-06

Even though split is set to nrow(xx)-1 = 19, mle.cv silently resets the
split subsample size to its default of 9, so the fit is not leave-one-out.

So does anybody know how to do linear model selection by leave-one-out
cross-validation? I've written one function, but it runs far too slowly.

Thanks in advance.

This is my super slow function:

library(boot)   # for cv.glm below

####################
## function: rec.comb
## input:  vec -- a vector
## output: a list of all subsets of the elements of vec
##         (including the empty subset, NULL)
####################
rec.comb = function(vec)
{
    if (length(vec) == 0) {
        list(NULL)
    } else {
        tmp  = rec.comb(vec[-1])                      # subsets without vec[1]
        tmp2 = lapply(tmp, function(x) c(vec[1], x))  # subsets with vec[1]
        c(tmp, tmp2)
    }
}

####################
## function: CV1glm -- model selection by leave-one-out CV
##           (cv.glm's default K equals n, i.e. LOO)
## input:  y -- response vector; x -- predictor matrix without intercept
## output: a 0/1 vector saying whether the intercept and each predictor
##         should be selected, e.g. (0,1,0,1) means no intercept while
##         var1 and var3 should be selected
####################
CV1glm = function(y, x) {
    n.var  = ncol(x)
    comb   = rec.comb(1:n.var)
    n.comb = length(comb)

    ## models with an intercept
    pe    = numeric(n.comb)
    data  = data.frame(y = y)
    fit   = glm(y ~ 1, data = data)
    pe[1] = cv.glm(data, fit)$delta[1]
    for (i in 2:n.comb) {
        data  = data.frame(y = y, x = x[, comb[[i]]])
        fit   = glm(y ~ ., data = data)
        pe[i] = cv.glm(data, fit)$delta[1]
    }

    ## models without an intercept (the empty model is meaningless: Inf)
    pe1    = numeric(n.comb)
    pe1[1] = Inf
    for (i in 2:n.comb) {
        data   = data.frame(y = y, x = x[, comb[[i]]])
        fit    = glm(y ~ . - 1, data = data)
        pe1[i] = cv.glm(data, fit)$delta[1]
    }

    var = rep(0, n.var)
    if (min(pe) < min(pe1)) {
        int = 1
        var[comb[[which.min(pe)]]] = 1
    } else {
        int = 0
        var[comb[[which.min(pe1)]]] = 1
    }
    c(int, var)
}

--
Junjie Li, klijunjie@gmail.com
Undergraduate in DEP of Tsinghua University
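For illustration, here is a sketch of how the function above might be
called on the simulated data from the transcript; the seed is arbitrary,
and the indicated result is only what one would expect (the with- and
without-intercept fits are nearly tied on data this noiseless):

library(boot)                 # cv.glm
set.seed(1)                   # arbitrary seed, for reproducibility only
xx <- matrix(rnorm(20 * 3), ncol = 3)
yy <- drop(xx %*% c(1, 2, 0) + rnorm(20, 0, 0.001))
CV1glm(yy, xx)                # e.g. 0 1 1 0: no intercept, x1 and x2 selected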
Prof Brian Ripley
2007-May-11 15:35 UTC
[R] model selection by leave-one-out cross-validation
What is 'cv.glm'? That's almost certainly where the time is going, and
there are fast methods of LOO cross-validation for linear models (as
distinct from glms). If this is the function of that name from package
boot, that comment certainly applies.

Do you know about the relationship between LOO CV and AIC in model
selection?

On Fri, 11 May 2007, Junjie Li wrote:

> Hi all,
>
> When I am using mle.cv(wle), I find an interesting problem: I can't do
> leave-one-out cross-validation with mle.cv(wle). [...]
>
> So does anybody know how to do linear model selection by leave-one-out
> cross-validation? I've written one function, but it runs far too slowly.
>
> [...]

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
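The fast LOO methods for linear models that Ripley alludes to rest on a
standard least-squares identity: the i-th leave-one-out residual equals
the ordinary residual divided by 1 - h_i, where h_i is the i-th hat value
(this is the PRESS statistic), so every candidate model needs only one
fit instead of n. Below is a minimal all-subsets sketch using that
identity; it always includes the intercept for brevity (unlike CV1glm
above), and the names loo.mse and loo.select are hypothetical, not from
the thread or any package:

## LOO mean squared prediction error of one linear model, without
## refitting: e_(i) = e_i / (1 - h_i), the PRESS identity.
loo.mse <- function(formula, dat) {
    fit <- lm(formula, data = dat)
    loo.res <- residuals(fit) / (1 - hatvalues(fit))  # LOO residuals
    mean(loo.res^2)
}

## Exhaustive search over all subsets of predictors, intercept always in.
## 'dat' is a data frame whose response column is named 'y'.
loo.select <- function(dat) {
    vars  <- setdiff(names(dat), "y")
    forms <- list(y ~ 1)                      # intercept-only baseline
    for (k in seq_along(vars)) {
        for (s in combn(vars, k, simplify = FALSE)) {
            rhs <- paste(s, collapse = " + ")
            forms[[length(forms) + 1]] <- as.formula(paste("y ~", rhs))
        }
    }
    scores <- vapply(forms, loo.mse, numeric(1), dat = dat)
    forms[[which.min(scores)]]                # smallest LOO MSE wins
}

## Example with the thread's simulated data:
##   dat <- data.frame(y = yy, x = xx)
##   loo.select(dat)    # should pick y ~ x.1 + x.2

On Ripley's closing question: Stone (1977, JRSS B) showed that model
choice by leave-one-out cross-validation is asymptotically equivalent to
choice by AIC.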