This post is NOT a question, but an answer. For readers please disregard all
earlier posts by myself about this question.
I'm posting for two reasons. First to say thanks, especially to Dimitris,
for suggesting the use of errorest in the ipred library. Second, so that the
solution to this problem is in the archives in case it gets asked again.
If one wants to run a k-fold cross-validation using specified folds, and get
misclassification error and root mean squared error this is what you do.
Below is a script that will do this returning the results of two errorest
results combined into a data frame. Now this script assumes that the variable
you are going to fold with is an integer. If you want to have the predicted
values or models returned from each fold, the calls to errorest, can be
modified. Please see the help page for control.errorest on details.
T
--
Trevor Wiens
twiens at interbaun.com
=========================================================myglm <-
function(formula, family, data){
ret <- glm(formula, family=binomial, data=data)
return(ret)
}
myfacpred <- function(object, newdata) {
ret <- as.factor(ifelse(predict.glm(object, newdata, type='response')
< 0.5, 0, 1))
return(ret)
}
# logerrorest takes four arguments
# mdata is a data frame holding the data to be modeled
# form is the output of the is.formula function
# rvar is the response variable as a string, for example 'birdx'
# fvar is the fold variable, for example 'recordyear'
logerrorest <- function(mdata, form, rvar, fvar) {
require(Hmisc)
require(ipred)
# determine index of variables
rpos <- match(rvar, names(mdata))
fpos <- match(fvar, names(mdata))
# get fold values and count for each group
vardesc <- describe(mdata[[fpos]])$values
fvarlist <- as.integer(dimnames(vardesc)[[2]])
k <- length(fvarlist)
countlist <- vardesc[1,1]
for (i in 2:k)
{
countlist[i] <- vardesc[1,i]
}
n <- length(mdata[[fpos]])
# get index list for each fold
cc <- list()
for (i in 1:k)
{
cc[[i]] <- as.integer(rownames(mdata[mdata[[fpos]] == fvarlist[i], ]))
}
# determine root mean squared error
ee <- errorest(form, mdata, estimator='cv', model=myglm,
est.para=control.errorest(list.tindx = cc))
# determine misclassification error
# first convert to factor
width = length(mdata)
response <- as.factor(as.integer(mdata[[rpos]]))
newmatrix <- data.frame(cbind(mdata[1:(rpos-1)], response,
mdata[(rpos+1):width]))
newform <- as.formula(paste('response ~ ', as.character(form)[[3]]))
me <- errorest(newform, newmatrix, estimator='cv', model=myglm,
predict=myfacpred, est.para=control.errorest(list.tindx = cc))
ret <- data.frame(cbind(ee, me))
return(ret)
}
I have problem with legend command. Please look this script:
>dcbv.fm
Time Series:
Start = 1980
End = 2002
Frequency = 1
[1] 2994.023 3388.414 3111.762 2990.967 3077.438 3058.274 3049.934 2974.130
[9] 2889.659 2801.790 2631.391 2661.700 2312.526 2518.968 2567.044 2443.952
[17] 2117.638 2042.461 2025.816 1939.560 1640.775 1583.609 1659.912
> dcbv.ms
Time Series:
Start = 1980
End = 2002
Frequency = 1
[1] 3700.239 4076.438 3856.495 3680.345 3871.887 3789.770 3831.173 3768.876
[9] 3585.572 3754.374 3372.859 3419.667 3185.194 3319.215 3445.845 3265.214
[17] 2773.961 2661.904 2669.835 2569.190 2187.719 2217.756 2196.378
>plot(dcbv.ms,ylim=c(min(dcbv.fm),max(dcbv.ms)))
>lines(dcbv.fm,col=2)
>legend(1984,2500,c("DCVB-MS","DCBV-FM"),col=c(1,2),cex=.6,fill=T)
At end of script the legend of plot have only one color: black. I think the
legend will have two colors: black and red.
Where I make mistake?
version
_
platform i386-pc-mingw32
arch i386
os mingw32
system i386, mingw32
status
major 2
minor 0.1
year 2004
month 11
day 15
language R
Thanks in advance
Bernardo Rangel Tura, MD, MSc
National Institute of Cardiology Laranjeiras
Rio de Janeiro Brazil
--
No virus found in this outgoing message.
Checked by AVG Anti-Virus.
Bernardo Rangel Tura wrote:> I have problem with legend command. Please look this script: > > >dcbv.fm > Time Series: > Start = 1980 > End = 2002 > Frequency = 1 > [1] 2994.023 3388.414 3111.762 2990.967 3077.438 3058.274 3049.934 > 2974.130 > [9] 2889.659 2801.790 2631.391 2661.700 2312.526 2518.968 2567.044 > 2443.952 > [17] 2117.638 2042.461 2025.816 1939.560 1640.775 1583.609 1659.912 > > > dcbv.ms > Time Series: > Start = 1980 > End = 2002 > Frequency = 1 > [1] 3700.239 4076.438 3856.495 3680.345 3871.887 3789.770 3831.173 > 3768.876 > [9] 3585.572 3754.374 3372.859 3419.667 3185.194 3319.215 3445.845 > 3265.214 > [17] 2773.961 2661.904 2669.835 2569.190 2187.719 2217.756 2196.378 > > >plot(dcbv.ms,ylim=c(min(dcbv.fm),max(dcbv.ms))) > > >lines(dcbv.fm,col=2) > > >legend(1984,2500,c("DCVB-MS","DCBV-FM"),col=c(1,2),cex=.6,fill=T)So you want filles boxes? Then you should specify the color in the fill argument: legend(1984, 2500, c("DCVB-MS", "DCBV-FM"), cex=.6, fill=1:2) or do you want some lines? legend(1984, 2500, c("DCVB-MS", "DCBV-FM"), cex=.6, col=1:2, lwd=2) Uwe Ligges> > At end of script the legend of plot have only one color: black. I think > the legend will have two colors: black and red. > > Where I make mistake? > > > version > _ > platform i386-pc-mingw32 > arch i386 > os mingw32 > system i386, mingw32 > status > major 2 > minor 0.1 > year 2004 > month 11 > day 15 > language R > > Thanks in advance > > Bernardo Rangel Tura, MD, MSc > National Institute of Cardiology Laranjeiras > Rio de Janeiro Brazil >