Ross Boylan
2007-Jan-16 22:03 UTC
[Rd] Problems with checking documentation vs data, and a proposal
I have a single data file inputs.RData that contains 3 objects. I generated an Rd page for each object using prompt(). When I run R CMD check I get

    * checking for code/documentation mismatches ... WARNING
    Warning in utils::data(list = al, envir = data_env) :
      data set 'gold' not found

(gold is one of the objects).

This appears to be coming from the codocData function defined in src/library/tools/R/QC.R (this is in the Debianised source 2.4.1, so the path might be a little different). According to the help on this function, it will only attempt a match when there is a single alias in the documentation file, although I'm not sure that's what the code does (it seems to check only whether there is more than one format section).

At any rate, the central logic appears to gather up names of data objects and then to load them with

    ## Try loading the data set into data_env.
    utils::data(list = al, envir = data_env)
    if(exists(al, envir = data_env, mode = "list", inherits = FALSE)) {
        al <- get(al, envir = data_env, mode = "list")
    }

Since there is no gold.RData, this is failing.

This leads to 2 issues: what should I do now, and how might this work better in the future?

Taking the future first, how about having the code first load all the data files that it finds, somewhere near the beginning? If it did so, the code

    ## Try finding the variable or data set given by the alias.
    al <- aliases[i]
    if(exists(al, envir = code_env, mode = "list", inherits = FALSE)) {

which precedes the earlier snippet, would find the symbol was defined and be happy. I suppose the data could be loaded into code_env, although using it seems to risk deciding that a data symbol is defined when the symbol actually refers to a code object. I'm not sure whether the data objects should still be loaded individually under this scenario when the symbol is not already present.
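To make the proposal concrete, here is a minimal sketch of the "load everything first" idea. It is not the actual codocData code: load_all_data() is a hypothetical helper, the object names silver and bronze are invented for the demonstration (only gold is named above), and it handles only .RData/.rda files, whereas data() also understands other formats.

```r
## Hypothetical helper (not part of the tools package): load every
## .RData/.rda file found in a data directory into one environment.
load_all_data <- function(data_dir, envir = new.env()) {
    files <- list.files(data_dir, pattern = "\\.(RData|rda)$",
                        full.names = TRUE, ignore.case = TRUE)
    for (f in files) load(f, envir = envir)
    envir
}

## Demonstration in a throwaway directory laid out like a package.
data_dir <- file.path(tempfile("pkg"), "data")
dir.create(data_dir, recursive = TRUE)

gold   <- data.frame(x = 1:3)   # 'gold' is from the report above
silver <- data.frame(y = 4:6)   # assumed name, for illustration only
bronze <- data.frame(z = 7:9)   # assumed name, for illustration only
save(gold, silver, bronze, file = file.path(data_dir, "inputs.RData"))
rm(gold, silver, bronze)

## After pre-loading, the same kind of exists() test used for aliases
## finds 'gold' even though there is no gold.RData.
data_env <- load_all_data(data_dir)
found <- exists("gold", envir = data_env, mode = "list", inherits = FALSE)
print(found)
```

Loading into a dedicated environment rather than code_env keeps the concern raised above in view: a data symbol cannot be mistaken for a code object of the same name.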
What can I do in the short run, particularly since I would like to have the code pass R CMD check with versions of R that don't include this possible enhancement? I see several options, none of them beautiful:

1) Delete inputs.RData and create 3 separate data objects. However, I have code that relies on inputs being present, and the 3 data items go together naturally.

2) Make a single document describing inputs.RData. First problem: the page would be awkward, combining all 3 things. Second, it looks as if codocData might still try loading the individual data objects, since it tries to pull data names out of the documentation, even out of individual items inside \describe.

3) Attempt to disable the checks by adding multiple aliases or something else to be revealed by closer inspection of the code. This is a hack that bypasses the checking altogether (unless it turns out I still get a complaint about missing documentation).

4) Create gold.RData and others as symlinks to inputs.RData. Fragile across operating systems, version control systems, and versions of tar. Might get errors about multiple data definitions.

Usual caveats: this is all based on my imperfect understanding of the code.

So, any comments on the possible modification to codocData or the work-arounds?

--
Ross Boylan
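For reference, a variant of option 1 that leaves inputs.RData in place can be sketched as follows. split_data_file() is a hypothetical helper, the names silver and bronze are invented for the demonstration, and whether R CMD check then complains about multiple data definitions is not tested here.

```r
## Hypothetical helper: write each object in a combined .RData file
## to its own <name>.RData next to it, leaving the original untouched.
split_data_file <- function(data_file) {
    env <- new.env()
    objs <- load(data_file, envir = env)   # load() returns the object names
    for (nm in objs) {
        out <- file.path(dirname(data_file), paste0(nm, ".RData"))
        save(list = nm, envir = env, file = out)
    }
    invisible(objs)
}

## Demonstration in a throwaway directory.
split_dir <- tempfile("split")
dir.create(split_dir)
gold   <- data.frame(x = 1:3)   # 'gold' is from the report above
silver <- data.frame(y = 4:6)   # assumed name, for illustration only
bronze <- data.frame(z = 7:9)   # assumed name, for illustration only
inputs_file <- file.path(split_dir, "inputs.RData")
save(gold, silver, bronze, file = inputs_file)

written <- split_data_file(inputs_file)
print(list.files(split_dir))
```

Each per-object file then contains exactly one object, so a lookup by alias name has a matching file, at the cost of storing every object twice.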
Ross Boylan
2007-Jan-16 22:38 UTC
[Rd] Problems with checking documentation vs data, and a proposal
On Tue, 2007-01-16 at 14:03 -0800, Ross Boylan wrote:
> I have a single data file inputs.RData that contains 3 objects. I
> generated an Rd page for each object using prompt().
> When I run R CMD check I get
> * checking for code/documentation mismatches ... WARNING
> Warning in utils::data(list = al, envir = data_env) :
> data set 'gold' not found
> (gold is one of the objects).
...
> What can I do in the short run, particularly since I would like to have
> the code pass R CMD check with versions of R that don't include this
> possible enhancement, what can I do? I see several options, none of
> them beautiful:
...
> 4) Create gold.RData and others as symlinks to inputs.RData. Fragile
> across operating systems, version control systems, and versions of tar.
> Might get errors about multiple data definitions.

Option 4 worked, though the symlinks were converted to regular files by R CMD check.
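Since the symlinks end up as regular files anyway, the same layout can presumably be produced with plain copies, which avoids the portability worries about symlinks. The sketch below assumes that; copy_as_aliases() is a hypothetical helper, silver and bronze are invented names, and the result has not been run through R CMD check here.

```r
## Hypothetical helper: for every object stored in data_file, create
## <name>.RData as a full copy of data_file (a stand-in for a symlink).
copy_as_aliases <- function(data_file) {
    objs <- load(data_file, envir = new.env())   # names of stored objects
    targets <- file.path(dirname(data_file), paste0(objs, ".RData"))
    ok <- vapply(targets,
                 function(to) file.copy(data_file, to, overwrite = TRUE),
                 logical(1))
    names(ok) <- objs
    ok
}

## Demonstration in a throwaway directory.
alias_dir <- tempfile("alias")
dir.create(alias_dir)
gold   <- data.frame(x = 1:3)   # 'gold' is from the thread
silver <- data.frame(y = 4:6)   # assumed name, for illustration only
bronze <- data.frame(z = 7:9)   # assumed name, for illustration only
alias_source <- file.path(alias_dir, "inputs.RData")
save(gold, silver, bronze, file = alias_source)

copied <- copy_as_aliases(alias_source)
print(copied)
```

As with the symlinks, each copy holds all of the objects, so loading gold.RData also defines the others.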