Dear R-help
I'm working on a large dataset which I have divided into 20 subsets based on
similar features. Each subset consists of observations from different locations
and I wish to use the location as a random effect.
For each group I want to select regressors by a stepwise procedure and include a
random effect on the intercept. I use stepAIC() and lme(). (The lmer()-function
doesn't work with the stepAIC()-function.)
Since I have many groups, and I wish to do the same thing for each group, I have
constructed a function which takes the dataset as input variable and gives a
prediction result (here mean absolute error) as output.
This is an example using the Orthodont dataset:
library(MASS)
library(nlme)
PredRes<-function(D1)
{
lmemod=lme(distance~age*Sex, random=~1|Subject,
data=subset(D1,age!=14),method="ML")
themod=stepAIC(lmemod,dir="both")
prs=predict(themod,newdata=subset(D1,age==14))
obs<-subset(D1,age==14)$distance
print(mean(obs-prs))
}
Using this function with D1=Orthodont gives:
> PredRes(Orthodont)
Start: AIC=345.12
distance ~ age * Sex
Error in subset(D1, age != 14) : object "D1" not found
The code works when I take Orthodont in directly:
> lmemod=lme(distance~age*Sex, random=~1|Subject,
data=subset(Orthodont,age!=14),method="ML")
> themod=stepAIC(lmemod,dir="both")
Start: AIC=345.12
distance ~ age * Sex
Df AIC
- age:Sex 1 344.49
<none> 345.12
Step: AIC=344.49
distance ~ age + Sex
Df AIC
<none> 344.49
+ age:Sex 1 345.12
- Sex 1 348.70
- age 1 371.77> prs=predict(themod,newdata=subset(Orthodont,age==14))
> obs<-subset(Orthodont,age==14)$distance
> print(mean(obs-prs))
[1] 0.2962963
How can I make this code work with dataset av input variable in a function?
I'm using R version:
Version:
platform = x86_64-unknown-linux-gnu
arch = x86_64
os = linux-gnu
system = x86_64, linux-gnu
status major = 2
minor = 6.0
year = 2007
month = 10
day = 03
svn rev = 43063
language = R
version.string = R version 2.6.0 (2007-10-03)
Locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
Search Path:
.GlobalEnv, package:MASS, package:nlme, package:stats, package:graphics,
package:grDevices, package:utils, package:datasets, package:methods, Autoloads,
package:base
By the way the R version 2.7.0 Patched (2008-05-08 r45647) gives the same error
message.
--
Regards
Jorunn Slagstad
It's a known scoping issue in lme -- you are doing this from a function. Make sure your dataset is visible -- e.g. use with(). On Fri, 9 May 2008, Jorunn Slagstad wrote:> Dear R-help > > I'm working on a large dataset which I have divided into 20 subsets based on similar features. Each subset consists of observations from different locations and I wish to use the location as a random effect. > For each group I want to select regressors by a stepwise procedure and include a random effect on the intercept. I use stepAIC() and lme(). (The lmer()-function doesn't work with the stepAIC()-function.) > > Since I have many groups, and I wish to do the same thing for each group, I have constructed a function which takes the dataset as input variable and gives a prediction result (here mean absolute error) as output. > > This is an example using the Orthodont dataset: > > library(MASS) > library(nlme) > PredRes<-function(D1) > { > lmemod=lme(distance~age*Sex, random=~1|Subject, data=subset(D1,age!=14),method="ML") > themod=stepAIC(lmemod,dir="both") > prs=predict(themod,newdata=subset(D1,age==14)) > obs<-subset(D1,age==14)$distance > print(mean(obs-prs)) > } > > Using this function with D1=Orthodont gives: > >> PredRes(Orthodont) > Start: AIC=345.12 > distance ~ age * Sex > > Error in subset(D1, age != 14) : object "D1" not found > > The code works when I take Orthodont in directly: > >> lmemod=lme(distance~age*Sex, random=~1|Subject, data=subset(Orthodont,age!=14),method="ML") >> themod=stepAIC(lmemod,dir="both") > Start: AIC=345.12 > distance ~ age * Sex > > Df AIC > - age:Sex 1 344.49 > <none> 345.12 > > Step: AIC=344.49 > distance ~ age + Sex > > Df AIC > <none> 344.49 > + age:Sex 1 345.12 > - Sex 1 348.70 > - age 1 371.77 >> prs=predict(themod,newdata=subset(Orthodont,age==14)) >> obs<-subset(Orthodont,age==14)$distance >> print(mean(obs-prs)) > [1] 0.2962963 > > How can I make this code work with dataset av input variable in a function? > > I'm using R version: > > Version: > platform = x86_64-unknown-linux-gnu > arch = x86_64 > os = linux-gnu > system = x86_64, linux-gnu > status > major = 2 > minor = 6.0 > year = 2007 > month = 10 > day = 03 > svn rev = 43063 > language = R > version.string = R version 2.6.0 (2007-10-03) > > Locale: > LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C > > Search Path: > .GlobalEnv, package:MASS, package:nlme, package:stats, package:graphics, package:grDevices, package:utils, package:datasets, package:methods, Autoloads, package:base > > By the way the R version 2.7.0 Patched (2008-05-08 r45647) gives the same error message. > > -- > Regards > Jorunn Slagstad > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
DAVID ARTETA GARCIA
2008-May-13 12:16 UTC
[R] upperbound of C index Conf.int. greater than 1
R-users,
I am bootstrapping the C Index of a model created using lrm{Design}
and boot{boot}, and I get that the upperbound of the confidence
interval is greater than 1. Here is my code:
> library(HSAUR)
> data(plasma)
##fit model> fit.design <- lrm (ESR ~ fibrinogen + globulin,data=plasma)
> fit.design$stats[6]
C
0.8044872
##bootstrap C Index
> cindex <- function(formula,data,indices){
+ d=data[indices,]
+ fit<-lrm(formula,data = d)
+ return(fit$stats[[6]])
+ }
> results <- boot(data=w,statistic=cindex,R=500,formula = ESR ~
> fibrinogen + globulin)
> results
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = plasma, statistic = cindex, R = 500, formula = ESR ~ fibrinogen +
globulin)
Bootstrap Statistics :
original bias std. error
t1* 0.8044872 0.008834767 0.1574710> boot.ci(results,type="basic")
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 500 bootstrap replicates
CALL :
boot.ci(boot.out = results, type = "basic")
Intervals :
Level Basic
95% ( 0.6090, 1.1443 )
Calculations and Intervals on Original Scale
I see that the std.error is rather large and this might be the
problem, but how can I explain this for publication purposes? Is such
an interval acceptable?
Any help would be greatly appreciated
David