> From: Weiwei Shi
>
> Hi, there:
> I made a function to do k-fold cross-validation as
> below. Basically whenever I call cv(test) for example,
> an error message like:
> Fold 1
> Error in model.frame(formula, rownames, variables,
> varnames, extras, extranames, :
> variable lengths differ
>
> please help.
>
> My test dataset has 142 variables, the last one is a
> categorical response variable.
> also, i am not sure how to save the trees into a list
> or something so that I can handle, like pointer array
> or something in C.
>
> Thanks.
>
> Weiwei Shi, Ph.D
>
> cv<- function(all.data,n.folds=10,mcp=0.003) {
>
> n <- nrow(all.data)
> idx <- sample(n,n)
> all.data <- all.data[idx,]
>
> n.each.part <- as.integer(n/n.folds)
> r.model<- vector()
> r.model.prune<- vector()
>
> for(i in 1:n.folds) {
> cat('Fold ',i,'\n')
> out.fold <- ((i-1)*n.each.part+1):(i*n.each.part)
> tmp<-all.data[-(out.fold),1:141]
> r.model[i]<- rpart(all.data$V142~., data=tmp,
^^^^^^^^^^^^^^^^^^^^^^^^^
That ain't gonna work. You take the response from all.data, which has
n rows, but the rest of the variables (found in tmp) have fewer rows,
hence the "variable lengths differ" error. Keep the response column in
tmp and write the formula as V142 ~ . so that everything comes from the
same data frame.
I'd recommend that you consult either `Modern Applied Statistics with S',
4th ed., or `S Programming' (which has a chapter on how to do CV
efficiently) if you really want to learn how to code CV. Otherwise I'd
suggest that you use the errorest() function in the ipred package on CRAN.
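
For what it's worth, here is a minimal sketch of how the loop could be
arranged so that the response and the predictors come from the same data
frame, and so that the fitted trees end up in a list (which also covers
the question about storing them). The names cv_sketch, response, fold,
trees and err are made up for illustration, the fold assignment differs
from the original code (it does not drop leftover rows), and the
held-out misclassification rate is just one possible summary. Untested
on your data, so treat it as a starting point only.

```r
library(rpart)

# Illustrative sketch only: `cv_sketch` and its arguments are hypothetical
# names, not part of rpart or of the original poster's code.
cv_sketch <- function(all.data, response, n.folds = 10, mcp = 0.003) {
  n <- nrow(all.data)
  # Shuffle the rows once so that folds are random.
  all.data <- all.data[sample(n), , drop = FALSE]
  # Assign every row to a fold; works even if n is not a multiple of n.folds.
  fold <- rep(seq_len(n.folds), length.out = n)
  # Build "response ~ ." so all variables are looked up in `data`.
  form <- reformulate(".", response = response)

  trees <- vector("list", n.folds)  # a list can hold arbitrary R objects
  err <- numeric(n.folds)

  for (i in seq_len(n.folds)) {
    cat("Fold ", i, "\n")
    train <- all.data[fold != i, , drop = FALSE]
    test <- all.data[fold == i, , drop = FALSE]
    # The training frame keeps the response column, so lengths agree.
    fit <- rpart(form, data = train, method = "class",
                 parms = list(split = "gini"), cp = 0)
    # Use [[i]] (not [i]) to store a whole object in slot i of the list.
    trees[[i]] <- prune(fit, cp = mcp)
    pred <- predict(trees[[i]], newdata = test, type = "class")
    # Compare as character to avoid factor level mismatches.
    err[i] <- mean(as.character(pred) != as.character(test[[response]]))
  }
  list(trees = trees, error = err)
}
```

With your data the call would presumably look like
res <- cv_sketch(test, "V142"), after which res$trees[[1]] is the pruned
tree from the first fold and res$error holds the per-fold error rates.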
Andy
> parms=list(split='gini'), cp=0)
> #r.model.prune[i]<-prune(r.model[i], cp=mcp)
>
> }
> return (r.model)
> }
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help