Dear R-help I'm working on a large dataset which I have divided into 20 subsets based on similar features. Each subset consists of observations from different locations and I wish to use the location as a random effect. For each group I want to select regressors by a stepwise procedure and include a random effect on the intercept. I use stepAIC() and lme(). (The lmer()-function doesn't work with the stepAIC()-function.) Since I have many groups, and I wish to do the same thing for each group, I have constructed a function which takes the dataset as input variable and gives a prediction result (here mean absolute error) as output. This is an example using the Orthodont dataset: library(MASS) library(nlme) PredRes<-function(D1) { lmemod=lme(distance~age*Sex, random=~1|Subject, data=subset(D1,age!=14),method="ML") themod=stepAIC(lmemod,dir="both") prs=predict(themod,newdata=subset(D1,age==14)) obs<-subset(D1,age==14)$distance print(mean(obs-prs)) } Using this function with D1=Orthodont gives:> PredRes(Orthodont)Start: AIC=345.12 distance ~ age * Sex Error in subset(D1, age != 14) : object "D1" not found The code works when I take Orthodont in directly:> lmemod=lme(distance~age*Sex, random=~1|Subject, data=subset(Orthodont,age!=14),method="ML") > themod=stepAIC(lmemod,dir="both")Start: AIC=345.12 distance ~ age * Sex Df AIC - age:Sex 1 344.49 <none> 345.12 Step: AIC=344.49 distance ~ age + Sex Df AIC <none> 344.49 + age:Sex 1 345.12 - Sex 1 348.70 - age 1 371.77> prs=predict(themod,newdata=subset(Orthodont,age==14)) > obs<-subset(Orthodont,age==14)$distance > print(mean(obs-prs))[1] 0.2962963 How can I make this code work with dataset av input variable in a function? I'm using R version: Version: platform = x86_64-unknown-linux-gnu arch = x86_64 os = linux-gnu system = x86_64, linux-gnu status major = 2 minor = 6.0 year = 2007 month = 10 day = 03 svn rev = 43063 language = R version.string = R version 2.6.0 (2007-10-03) Locale: LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C Search Path: .GlobalEnv, package:MASS, package:nlme, package:stats, package:graphics, package:grDevices, package:utils, package:datasets, package:methods, Autoloads, package:base By the way the R version 2.7.0 Patched (2008-05-08 r45647) gives the same error message. -- Regards Jorunn Slagstad
It's a known scoping issue in lme -- you are doing this from a function. Make sure your dataset is visible -- e.g. use with(). On Fri, 9 May 2008, Jorunn Slagstad wrote:> Dear R-help > > I'm working on a large dataset which I have divided into 20 subsets based on similar features. Each subset consists of observations from different locations and I wish to use the location as a random effect. > For each group I want to select regressors by a stepwise procedure and include a random effect on the intercept. I use stepAIC() and lme(). (The lmer()-function doesn't work with the stepAIC()-function.) > > Since I have many groups, and I wish to do the same thing for each group, I have constructed a function which takes the dataset as input variable and gives a prediction result (here mean absolute error) as output. > > This is an example using the Orthodont dataset: > > library(MASS) > library(nlme) > PredRes<-function(D1) > { > lmemod=lme(distance~age*Sex, random=~1|Subject, data=subset(D1,age!=14),method="ML") > themod=stepAIC(lmemod,dir="both") > prs=predict(themod,newdata=subset(D1,age==14)) > obs<-subset(D1,age==14)$distance > print(mean(obs-prs)) > } > > Using this function with D1=Orthodont gives: > >> PredRes(Orthodont) > Start: AIC=345.12 > distance ~ age * Sex > > Error in subset(D1, age != 14) : object "D1" not found > > The code works when I take Orthodont in directly: > >> lmemod=lme(distance~age*Sex, random=~1|Subject, data=subset(Orthodont,age!=14),method="ML") >> themod=stepAIC(lmemod,dir="both") > Start: AIC=345.12 > distance ~ age * Sex > > Df AIC > - age:Sex 1 344.49 > <none> 345.12 > > Step: AIC=344.49 > distance ~ age + Sex > > Df AIC > <none> 344.49 > + age:Sex 1 345.12 > - Sex 1 348.70 > - age 1 371.77 >> prs=predict(themod,newdata=subset(Orthodont,age==14)) >> obs<-subset(Orthodont,age==14)$distance >> print(mean(obs-prs)) > [1] 0.2962963 > > How can I make this code work with dataset av input variable in a function? > > I'm using R version: > > Version: > platform = x86_64-unknown-linux-gnu > arch = x86_64 > os = linux-gnu > system = x86_64, linux-gnu > status > major = 2 > minor = 6.0 > year = 2007 > month = 10 > day = 03 > svn rev = 43063 > language = R > version.string = R version 2.6.0 (2007-10-03) > > Locale: > LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C > > Search Path: > .GlobalEnv, package:MASS, package:nlme, package:stats, package:graphics, package:grDevices, package:utils, package:datasets, package:methods, Autoloads, package:base > > By the way the R version 2.7.0 Patched (2008-05-08 r45647) gives the same error message. > > -- > Regards > Jorunn Slagstad > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
DAVID ARTETA GARCIA
2008-May-13 12:16 UTC
[R] upperbound of C index Conf.int. greater than 1
R-users, I am bootstrapping the C Index of a model created using lrm{Design} and boot{boot}, and I get that the upperbound of the confidence interval is greater than 1. Here is my code:> library(HSAUR) > data(plasma)##fit model> fit.design <- lrm (ESR ~ fibrinogen + globulin,data=plasma) > fit.design$stats[6]C 0.8044872 ##bootstrap C Index> cindex <- function(formula,data,indices){+ d=data[indices,] + fit<-lrm(formula,data = d) + return(fit$stats[[6]]) + }> results <- boot(data=w,statistic=cindex,R=500,formula = ESR ~ > fibrinogen + globulin) > resultsORDINARY NONPARAMETRIC BOOTSTRAP Call: boot(data = plasma, statistic = cindex, R = 500, formula = ESR ~ fibrinogen + globulin) Bootstrap Statistics : original bias std. error t1* 0.8044872 0.008834767 0.1574710> boot.ci(results,type="basic")BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS Based on 500 bootstrap replicates CALL : boot.ci(boot.out = results, type = "basic") Intervals : Level Basic 95% ( 0.6090, 1.1443 ) Calculations and Intervals on Original Scale I see that the std.error is rather large and this might be the problem, but how can I explain this for publication purposes? Is such an interval acceptable? Any help would be greatly appreciated David