Gad Abraham
2008-Apr-17 05:24 UTC
[R] Error in Design package: dataset not found for options(datadist)
Hi,

Design isn't strictly an R base package, but maybe someone can explain the following.

When lrm is called within a function, it can't find the dataset dd:

> library(Design)
> age <- rnorm(30, 50, 10)
> cholesterol <- rnorm(30, 200, 25)
> ch <- cut2(cholesterol, g=5, levels.mean=TRUE)
> fit <- function(ch, age)
+ {
+    d <- data.frame(ch, age)
+    dd <- datadist(d)
+    options(datadist="dd")
+    lrm(ch ~ age, data=d, x=TRUE, y=TRUE)
+ }
> fit(ch, age)
Error in Design(eval(m, sys.parent())) :
  dataset dd not found for options(datadist=)

It works outside a function:

> d <- data.frame(ch, age)
> dd <- datadist(d)
> options(datadist="dd")
> l <- lrm(ch ~ age, data=d, x=TRUE, y=TRUE)

Thanks,
Gad

> sessionInfo()
R version 2.6.2 (2008-02-08)
x86_64-pc-linux-gnu

...

attached base packages:
[1] splines stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] Design_2.1-1 survival_2.34 Hmisc_3.4-3

loaded via a namespace (and not attached):
[1] cluster_1.11.9 grid_2.6.2 lattice_0.17-4 rcompgen_0.1-17

--
Gad Abraham
Dept. CSSE and NICTA
The University of Melbourne
Parkville 3010, Victoria, Australia
email: gabraham at csse.unimelb.edu.au
web: http://www.csse.unimelb.edu.au/~gabraham
Frank E Harrell Jr
2008-Apr-17 13:03 UTC
[R] Error in Design package: dataset not found for options(datadist)
Gad Abraham wrote:
> Hi,
>
> Design isn't strictly an R base package, but maybe someone can explain
> the following.
>
> When lrm is called within a function, it can't find the dataset dd:
>
> > library(Design)
> > age <- rnorm(30, 50, 10)
> > cholesterol <- rnorm(30, 200, 25)
> > ch <- cut2(cholesterol, g=5, levels.mean=TRUE)
> > fit <- function(ch, age)
> + {
> +    d <- data.frame(ch, age)
> +    dd <- datadist(d)
> +    options(datadist="dd")
> +    lrm(ch ~ age, data=d, x=TRUE, y=TRUE)
> + }
> > fit(ch, age)
> Error in Design(eval(m, sys.parent())) :
>   dataset dd not found for options(datadist=)
>
> It works outside a function:
> > d <- data.frame(ch, age)
> > dd <- datadist(d)
> > options(datadist="dd")
> > l <- lrm(ch ~ age, data=d, x=TRUE, y=TRUE)
>
> Thanks,
> Gad

My guess is that you'll need to put dd in the global environment, not in fit's environment. At any rate it is inefficient to call datadist every time. Why not call it once for the whole data frame containing all the predictors, at the top of the program?

Also it is inefficient to chop continuous variables. You can use the proportional odds model with continuous ch as a response variable, although it will be slow if ch has more than, say, 100 unique values.

Frank

> > sessionInfo()
> R version 2.6.2 (2008-02-08)
> x86_64-pc-linux-gnu
>
> ...
>
> attached base packages:
> [1] splines stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] Design_2.1-1 survival_2.34 Hmisc_3.4-3
>
> loaded via a namespace (and not attached):
> [1] cluster_1.11.9 grid_2.6.2 lattice_0.17-4 rcompgen_0.1-17

--
Frank E Harrell Jr
Professor and Chair
School of Medicine
Department of Biostatistics
Vanderbilt University
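
The scoping behaviour behind Frank's guess can be illustrated in base R alone. The sketch below is not Design's code: lookup_datadist() is a hypothetical stand-in that assumes the name stored in options(datadist=) is resolved only in the global environment, and range(x) is a placeholder for datadist(d). Under that assumption, a dd created inside a function is invisible to the lookup, while one assigned into the global environment is found.

```r
# Start from a clean slate so the demonstration does not depend on a
# pre-existing global dd.
if (exists("dd", envir = globalenv(), inherits = FALSE)) {
  rm("dd", envir = globalenv())
}

# Hypothetical stand-in (NOT Design's actual code): resolve the name held in
# options(datadist=) by looking only in the global environment.
lookup_datadist <- function() {
  name <- getOption("datadist")
  if (!exists(name, envir = globalenv(), inherits = FALSE)) {
    stop("dataset ", name, " not found for options(datadist=)")
  }
  get(name, envir = globalenv(), inherits = FALSE)
}

# dd lives only in the function's own environment, so the lookup fails.
fit_local <- function(x) {
  dd <- range(x)                 # placeholder for datadist(d)
  options(datadist = "dd")
  lookup_datadist()
}

# dd is assigned into the global environment before the lookup runs.
fit_global <- function(x) {
  assign("dd", range(x), envir = globalenv())
  options(datadist = "dd")
  lookup_datadist()
}

x <- c(3, 1, 2)
res_local  <- tryCatch(fit_local(x), error = function(e) conditionMessage(e))
res_global <- fit_global(x)
```

If the guess holds for Design, the corresponding change in the original fit() would be to replace dd <- datadist(d) with assign("dd", datadist(d), envir = globalenv()). Calling datadist once at the top of the program, as Frank suggests, avoids the issue entirely.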