This post is NOT a question, but an answer. For readers please disregard all earlier posts by myself about this question. I'm posting for two reasons. First to say thanks, especially to Dimitris, for suggesting the use of errorest in the ipred library. Second, so that the solution to this problem is in the archives in case it gets asked again. If one wants to run a k-fold cross-validation using specified folds, and get misclassification error and root mean squared error this is what you do. Below is a script that will do this returning the results of two errorest results combined into a data frame. Now this script assumes that the variable you are going to fold with is an integer. If you want to have the predicted values or models returned from each fold, the calls to errorest, can be modified. Please see the help page for control.errorest on details. T -- Trevor Wiens twiens at interbaun.com =========================================================myglm <- function(formula, family, data){ ret <- glm(formula, family=binomial, data=data) return(ret) } myfacpred <- function(object, newdata) { ret <- as.factor(ifelse(predict.glm(object, newdata, type='response') < 0.5, 0, 1)) return(ret) } # logerrorest takes four arguments # mdata is a data frame holding the data to be modeled # form is the output of the is.formula function # rvar is the response variable as a string, for example 'birdx' # fvar is the fold variable, for example 'recordyear' logerrorest <- function(mdata, form, rvar, fvar) { require(Hmisc) require(ipred) # determine index of variables rpos <- match(rvar, names(mdata)) fpos <- match(fvar, names(mdata)) # get fold values and count for each group vardesc <- describe(mdata[[fpos]])$values fvarlist <- as.integer(dimnames(vardesc)[[2]]) k <- length(fvarlist) countlist <- vardesc[1,1] for (i in 2:k) { countlist[i] <- vardesc[1,i] } n <- length(mdata[[fpos]]) # get index list for each fold cc <- list() for (i in 1:k) { cc[[i]] <- as.integer(rownames(mdata[mdata[[fpos]] == fvarlist[i], ])) } # determine root mean squared error ee <- errorest(form, mdata, estimator='cv', model=myglm, est.para=control.errorest(list.tindx = cc)) # determine misclassification error # first convert to factor width = length(mdata) response <- as.factor(as.integer(mdata[[rpos]])) newmatrix <- data.frame(cbind(mdata[1:(rpos-1)], response, mdata[(rpos+1):width])) newform <- as.formula(paste('response ~ ', as.character(form)[[3]])) me <- errorest(newform, newmatrix, estimator='cv', model=myglm, predict=myfacpred, est.para=control.errorest(list.tindx = cc)) ret <- data.frame(cbind(ee, me)) return(ret) }
I have problem with legend command. Please look this script: >dcbv.fm Time Series: Start = 1980 End = 2002 Frequency = 1 [1] 2994.023 3388.414 3111.762 2990.967 3077.438 3058.274 3049.934 2974.130 [9] 2889.659 2801.790 2631.391 2661.700 2312.526 2518.968 2567.044 2443.952 [17] 2117.638 2042.461 2025.816 1939.560 1640.775 1583.609 1659.912 > dcbv.ms Time Series: Start = 1980 End = 2002 Frequency = 1 [1] 3700.239 4076.438 3856.495 3680.345 3871.887 3789.770 3831.173 3768.876 [9] 3585.572 3754.374 3372.859 3419.667 3185.194 3319.215 3445.845 3265.214 [17] 2773.961 2661.904 2669.835 2569.190 2187.719 2217.756 2196.378 >plot(dcbv.ms,ylim=c(min(dcbv.fm),max(dcbv.ms))) >lines(dcbv.fm,col=2) >legend(1984,2500,c("DCVB-MS","DCBV-FM"),col=c(1,2),cex=.6,fill=T) At end of script the legend of plot have only one color: black. I think the legend will have two colors: black and red. Where I make mistake? version _ platform i386-pc-mingw32 arch i386 os mingw32 system i386, mingw32 status major 2 minor 0.1 year 2004 month 11 day 15 language R Thanks in advance Bernardo Rangel Tura, MD, MSc National Institute of Cardiology Laranjeiras Rio de Janeiro Brazil -- No virus found in this outgoing message. Checked by AVG Anti-Virus.
Bernardo Rangel Tura wrote:> I have problem with legend command. Please look this script: > > >dcbv.fm > Time Series: > Start = 1980 > End = 2002 > Frequency = 1 > [1] 2994.023 3388.414 3111.762 2990.967 3077.438 3058.274 3049.934 > 2974.130 > [9] 2889.659 2801.790 2631.391 2661.700 2312.526 2518.968 2567.044 > 2443.952 > [17] 2117.638 2042.461 2025.816 1939.560 1640.775 1583.609 1659.912 > > > dcbv.ms > Time Series: > Start = 1980 > End = 2002 > Frequency = 1 > [1] 3700.239 4076.438 3856.495 3680.345 3871.887 3789.770 3831.173 > 3768.876 > [9] 3585.572 3754.374 3372.859 3419.667 3185.194 3319.215 3445.845 > 3265.214 > [17] 2773.961 2661.904 2669.835 2569.190 2187.719 2217.756 2196.378 > > >plot(dcbv.ms,ylim=c(min(dcbv.fm),max(dcbv.ms))) > > >lines(dcbv.fm,col=2) > > >legend(1984,2500,c("DCVB-MS","DCBV-FM"),col=c(1,2),cex=.6,fill=T)So you want filles boxes? Then you should specify the color in the fill argument: legend(1984, 2500, c("DCVB-MS", "DCBV-FM"), cex=.6, fill=1:2) or do you want some lines? legend(1984, 2500, c("DCVB-MS", "DCBV-FM"), cex=.6, col=1:2, lwd=2) Uwe Ligges> > At end of script the legend of plot have only one color: black. I think > the legend will have two colors: black and red. > > Where I make mistake? > > > version > _ > platform i386-pc-mingw32 > arch i386 > os mingw32 > system i386, mingw32 > status > major 2 > minor 0.1 > year 2004 > month 11 > day 15 > language R > > Thanks in advance > > Bernardo Rangel Tura, MD, MSc > National Institute of Cardiology Laranjeiras > Rio de Janeiro Brazil >