Hi all,

When I am using mle.cv(wle), I find an interesting problem: I can't do
leave-one-out cross-validation with mle.cv(wle). I will illustrate the
problem as follows:

> xx=matrix(rnorm(20*3),ncol=3)
> bb=c(1,2,0)
> yy=xx%*%bb+rnorm(20,0,0.001)+0
> summary(mle.cv(yy~xx,split=nrow(xx)-1,monte.carlo=2*nrow(xx),verbose=T),num.max=1)[[1]]
mle.cv: dimension of the split subsample set to default value = 9
 (Intercept)          xx1          xx2          xx3           cv
0.000000e+00 1.000000e+00 1.000000e+00 0.000000e+00 1.292513e-06

Even though split is set to nrow(xx)-1 = 19, mle.cv silently resets the
split subsample size to its default of 9, so the fit is not leave-one-out.

So does anybody know how to do linear model selection by leave-one-out
cross-validation? I've written one function, but it runs far too slowly.

Thanks in advance.

This is my super slow function:

library(boot)   # for cv.glm below

####################
## function: rec.comb
## input:  vec -- a vector
## output: a list of all subsets of the elements of vec
##         (including the empty subset, NULL)
####################
rec.comb = function(vec)
{
    if (length(vec) == 0) {
        list(NULL)
    } else {
        tmp  = rec.comb(vec[-1])                      # subsets without vec[1]
        tmp2 = lapply(tmp, function(x) c(vec[1], x))  # subsets with vec[1]
        c(tmp, tmp2)
    }
}

####################
## function: CV1glm -- model selection by leave-one-out CV
##           (cv.glm's default K equals n, i.e. LOO)
## input:  y -- response vector; x -- predictor matrix without intercept
## output: a 0/1 vector saying whether the intercept and each predictor
##         should be selected, e.g. (0,1,0,1) means no intercept while
##         var1 and var3 should be selected
####################
CV1glm = function(y, x) {
    n.var  = ncol(x)
    comb   = rec.comb(1:n.var)
    n.comb = length(comb)

    ## models with an intercept
    pe    = numeric(n.comb)
    data  = data.frame(y = y)
    fit   = glm(y ~ 1, data = data)
    pe[1] = cv.glm(data, fit)$delta[1]
    for (i in 2:n.comb) {
        data  = data.frame(y = y, x = x[, comb[[i]]])
        fit   = glm(y ~ ., data = data)
        pe[i] = cv.glm(data, fit)$delta[1]
    }

    ## models without an intercept (the empty model is meaningless: Inf)
    pe1    = numeric(n.comb)
    pe1[1] = Inf
    for (i in 2:n.comb) {
        data   = data.frame(y = y, x = x[, comb[[i]]])
        fit    = glm(y ~ . - 1, data = data)
        pe1[i] = cv.glm(data, fit)$delta[1]
    }

    var = rep(0, n.var)
    if (min(pe) < min(pe1)) {
        int = 1
        var[comb[[which.min(pe)]]] = 1
    } else {
        int = 0
        var[comb[[which.min(pe1)]]] = 1
    }
    c(int, var)
}

--
Junjie Li, klijunjie@gmail.com
Undergraduate in DEP of Tsinghua University
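For illustration, here is a sketch of how the function above might be
called on the simulated data from the transcript; the seed is arbitrary,
and the indicated result is only what one would expect (the with- and
without-intercept fits are nearly tied on data this noiseless):

library(boot)                 # cv.glm
set.seed(1)                   # arbitrary seed, for reproducibility only
xx <- matrix(rnorm(20 * 3), ncol = 3)
yy <- drop(xx %*% c(1, 2, 0) + rnorm(20, 0, 0.001))
CV1glm(yy, xx)                # e.g. 0 1 1 0: no intercept, x1 and x2 selected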
Prof Brian Ripley
2007-May-11 15:35 UTC
[R] model selection by leave-one-out cross-validation
What is 'cv.glm'? That's almost certainly where the time is going, and
there are fast methods of LOO cross-validation for linear models (as
distinct from glms). If this is the function of that name from package
boot, that comment certainly applies.

Do you know about the relationship between LOO CV and AIC in model
selection?

On Fri, 11 May 2007, Junjie Li wrote:

> Hi all,
>
> When I am using mle.cv(wle), I find an interesting problem: I can't do
> leave-one-out cross-validation with mle.cv(wle). [...]
>
> So does anybody know how to do linear model selection by leave-one-out
> cross-validation? I've written one function, but it runs far too slowly.
>
> [...]

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
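The fast LOO methods for linear models that Ripley alludes to rest on a
standard least-squares identity: the i-th leave-one-out residual equals
the ordinary residual divided by 1 - h_i, where h_i is the i-th hat value
(this is the PRESS statistic), so every candidate model needs only one
fit instead of n. Below is a minimal all-subsets sketch using that
identity; it always includes the intercept for brevity (unlike CV1glm
above), and the names loo.mse and loo.select are hypothetical, not from
the thread or any package:

## LOO mean squared prediction error of one linear model, without
## refitting: e_(i) = e_i / (1 - h_i), the PRESS identity.
loo.mse <- function(formula, dat) {
    fit <- lm(formula, data = dat)
    loo.res <- residuals(fit) / (1 - hatvalues(fit))  # LOO residuals
    mean(loo.res^2)
}

## Exhaustive search over all subsets of predictors, intercept always in.
## 'dat' is a data frame whose response column is named 'y'.
loo.select <- function(dat) {
    vars  <- setdiff(names(dat), "y")
    forms <- list(y ~ 1)                      # intercept-only baseline
    for (k in seq_along(vars)) {
        for (s in combn(vars, k, simplify = FALSE)) {
            rhs <- paste(s, collapse = " + ")
            forms[[length(forms) + 1]] <- as.formula(paste("y ~", rhs))
        }
    }
    scores <- vapply(forms, loo.mse, numeric(1), dat = dat)
    forms[[which.min(scores)]]                # smallest LOO MSE wins
}

## Example with the thread's simulated data:
##   dat <- data.frame(y = yy, x = xx)
##   loo.select(dat)    # should pick y ~ x.1 + x.2

On Ripley's closing question: Stone (1977, JRSS B) showed that model
choice by leave-one-out cross-validation is asymptotically equivalent to
choice by AIC.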