thr3ads.net - R devel - [Rd] Another wishlist for R [Jan 2004]

Kevin Wright

2004-Jan-16 18:45 UTC

[Rd] Another wishlist for R

First, a big thanks to all of the developers and users that have worked to
make R such useful software.  It is only because I find the software so useful
that I have the following opinions.

A recent post to R-devel listed the 'Top 10 Features' for one person.  I
found it to be quite an interesting read.  Over the past couple of years I
have assembled my own lists.

Retrospective.  Some of the things I like most about R (vs. S-Plus)

1. Integration with emacs
2. Nice color handling
3. Wealth of packages, easy package updates
4. HTML help
5. More answers on R-news than S-help
6. Active developer community
7. Package creation tools
8. Functions: setwd, with, apropos

Prospective.  Periodically David Smith at Insightful asks users, "If you
had $100, how would you allocate that money to development?"  Without
listing dollar amounts, these are my personal choices for R.

1. Add "head" and "tail" to R base.  
   Patrick Burns has these: http://www.burns-stat.com/pages/public.html#genutil
   Very handy functions for checking data manipulation.
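
For readers unfamiliar with these functions, here is a minimal sketch of the
kind of check item 1 has in mind.  It assumes a current R release, where
head() and tail() ship in the standard utils package; the data frame is made
up purely for illustration.

```r
# Made-up data: 100 rows, so printing the whole object would be unwieldy
x <- data.frame(id = 1:100, value = (1:100)^2)

# Inspect only the first and last few rows after a data manipulation step
first_rows <- head(x, 3)
last_rows  <- tail(x, 3)

print(first_rows)
print(last_rows)
```

Both functions take the number of rows (or elements) as the second argument,
which keeps a quick sanity check to a single short call.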

2. Strive for self-contained examples in all .Rd files (as far as possible).
   Generally quite good, but there's always room for improvement.
   For R base, if I create examples, to whom should I send them (R-devel?) and
   how (request for change?).
   Here's one example (by P. Dalgaard) for function 'replace'
     # Replace NAs in a data frame with -1
     dd <- data.frame(a=c(1,2,NA,4), b=c(NA,2,3,4))
     dd[] <- lapply(dd, function(x) replace(x, is.na(x), -1))

3. Encourage (more) standards for function names.  
   A prominent link on CRAN to the coding conventions would be good.
   Here is a draft of coding conventions:
     http://www.maths.lth.se/help/R/RCC/
   Partly as a result of the community development of R, the names of
   functions lack consistency.  Consider the following examples: 
     row.names, rownames
     browseURL, contrib.url, fixup.package.URLs
     package.contents, packageStatus
     mahalanobis, TukeyHSD
     getMethod, getS3method
   The sooner that conventions are encouraged, the more consistent future
   function names will be.

4. Increased integration of text and graphics output (for PDF, in particular).
   Sweave is fantastic for quality reporting, but can be a lot of work
   when a quick analysis is all that is needed.
   Often I would like to do something like print a box plot and include an
   anova table, for example: 
     pdf("file")
     boxplot(y~x)
     frame()
     sink.to.pdf()
     frame()
     anova(lm(y~x))
     sink()
     dev.off()
   I know of no such (simple) tools.  Ben Bolker has an idea here: 
     http://maths.newcastle.edu.au/~rking/R/help/02b/4179.html
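
A rough approximation of the workflow in item 4 is possible with base
graphics alone.  The sketch below is one possible approach, not an existing
"sink.to.pdf" tool: it captures the printed anova table with capture.output()
and draws it as text on a second PDF page, using a monospaced font family so
the columns line up.  The data and the output file name are made up.

```r
set.seed(1)
x <- factor(sample(LETTERS[1:5], 50, replace = TRUE))
y <- rnorm(50, mean = as.numeric(x), sd = 1)

# Capture the printed anova table as a character vector, one element per line
out <- capture.output(anova(lm(y ~ x)))

out_file <- tempfile(fileext = ".pdf")  # hypothetical output location
pdf(out_file)
boxplot(y ~ x)                          # page 1: the plot
plot.new()                              # page 2: an empty page for the text
text(0, 1, paste(out, collapse = "\n"), # top-left corner of the plot region
     adj = c(0, 1), family = "mono", cex = 0.8)
invisible(dev.off())
```

Long tables may run off the page with this approach, so it suits quick
checks rather than finished reports.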

5. Drop unused factor levels by default.  (At least as a settable option.) 
   This issue has been debated before--I'm just adding my vote and
   justification.
   The proportion of time I want data to include unused factor levels is close
   to zero.  The amount of time I spend cleaning data to get rid of unused
   factor levels is quite substantial.
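
To make the complaint in item 5 concrete, the sketch below shows an unused
level surviving a subset, and two ways of cleaning it up by hand.  It assumes
a current R release, where droplevels() is available in base; the factor is
made up for illustration.

```r
f <- factor(c("a", "b", "c", "a"))

# Subsetting keeps every original level, including ones no longer present
sub <- f[f != "c"]
kept <- levels(sub)                 # still includes "c"

# Two manual clean-ups: rebuild the factor, or call droplevels()
rebuilt <- levels(factor(sub))
dropped <- levels(droplevels(sub))

print(kept)
print(rebuilt)
print(dropped)
```

Either clean-up has to be repeated after every subsetting step, which is the
overhead the item describes.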

6. Expanded font control for graphics devices.
   This is already being considered, so again I'm just adding my vote.  See:
   http://www.stat.auckland.ac.nz/~paul/R/fonts.html
   
7. Clean up namespace implementation
   The introduction of namespaces has (for me) been a nuisance without
   any benefits that I am aware of.  I speak as a user, not a package
   maintainer.
   I would like to see (1) more education about namespace benefits, (2) more
   discussions about what is the appropriate role for namespaces and (3) 
   improvements to the documentation, which is now often less correct (if not
   broken) due to namespaces.  For example, help(is.function) doesn't say
   how functions hidden behind namespaces will be treated.  Most help files
   completely ignore issues with namespaces.  
   Some people will say, "of course namespaces are working exactly as
   expected"!  But that is only true if you expect functions to be 
   hidden...quite a few versions of R trained users otherwise.
   The quiet introduction of namespaces has broken my modus operandi for:
     args(predict.lme)
     is.function(predict.lme)
     predict.lme(object)
     exists("predict.lme")
   Namespaces may be neat/right from a language-design perspective, but have
   made it more frustrating for me to actually use the software. 
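
For reference, here is a sketch of the usual ways to reach a function that a
namespace hides.  To stay runnable without nlme installed, it uses
t.test.default from the stats package as a stand-in for predict.lme; the same
calls should apply to nlme's method once that package is loaded.

```r
# The method is registered but not exported, so it is not on the search path
visible <- exists("t.test.default")

# Three ways to retrieve it anyway
m1 <- getS3method("t.test", "default")  # look up the registered S3 method
m2 <- stats:::t.test.default            # reach into the namespace directly
m3 <- getAnywhere("t.test.default")     # search everywhere; reports where found

print(visible)
print(args(m1))
print(m3$where)
```

Once retrieved, the function object can be passed to args() or
is.function() just as before namespaces were introduced.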

8. More consistency in the use of na.action and na.rm.
   Compare: mean(..., na.rm=...) and lme(..., na.action=...)
   Maybe na.action could be added to 'mean' and other functions.
   There are issues of compatibility with S-Plus here...
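
The two conventions in item 8 can be seen side by side in the sketch below.
It uses lm() from the stats package as a stand-in for lme(), on the
assumption that na.action is handled the same way there; the data are made up.

```r
x <- c(1, NA, 3)

# Summary functions take a logical switch
m_default <- mean(x)                # NA, because missing values propagate
m_removed <- mean(x, na.rm = TRUE)  # missing values dropped first

# Modelling functions take a function that decides what to do with NAs
d <- data.frame(x = c(1, 2, NA, 4), y = c(2.1, 3.9, 6.2, 8.1))
fit_omit    <- lm(y ~ x, data = d, na.action = na.omit)
fit_exclude <- lm(y ~ x, data = d, na.action = na.exclude)

n_omit    <- length(residuals(fit_omit))     # incomplete row dropped
n_exclude <- length(residuals(fit_exclude))  # padded back to original length

print(c(m_default, m_removed))
print(c(n_omit, n_exclude))
```

The na.action interface is the more flexible of the two, since na.omit and
na.exclude differ only in how results are padded afterwards.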

9. Add 'substitute' to getAnywhere 
   Actually, the code for getAnywhere already contains 'substitute', so
   it looks like the author intended for the function to work without a
   quoted argument.  That would be wonderful.  Then why does
   getAnywhere("predict.lme") work but getAnywhere(predict.lme) does not
   work?
   (Yet another namespace issue)

   I'm not the first person to ask this question.  Obviously I'm
   a member of the "blind" population that can't read help files:
     http://maths.newcastle.edu.au/~rking/R/help/03b/0760.html
   Another possibility is that the help file could be clearer for us blind
   folk that interpret "x: a character string or name" to mean that
   x might not be a character string. (See the help page) 

10. More uniformity in quoting arguments.
    Uniformity outweighs cleverness/exceptions ("The Art of Unix
    Programming").
    Functions accepting non-quoted arguments
      is.function(obj)
      args(predict)
      rm(a)
      help(help)
      find(replace) or find("replace")
    Functions requiring quoted arguments
      get("help")
      exists("predict.lme")       

    Some people have claimed "the designers of S knew what they were
    doing"  because you can do clever things like this:
        i="help"
        exists(i)   
    But we could just as easily be doing other clever things 
    and have more uniform quoting rules.  S is probably too mature for this to
    really be considered.
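
The trade-off in item 10 can be sketched with two hypothetical helpers
(neither is part of R).  One treats its argument as an ordinary value; the
other uses substitute() to capture the expression as typed.  A function
cannot do both unambiguously once a variable holds the name.

```r
# Hypothetical helpers: each returns the name it would go on to look up
lookup_name_std <- function(x) x                            # uses the value of x
lookup_name_nse <- function(x) as.character(substitute(x))  # uses the typed text

i <- "help"

std_result  <- lookup_name_std(i)     # the value stored in i
nse_result  <- lookup_name_nse(i)     # the symbol that was typed
nse_unquote <- lookup_name_nse(help)  # works without quotes

print(std_result)
print(nse_result)
print(nse_unquote)
```

With the substitute() version, the `i="help"; exists(i)` idiom quoted above
would look up an object called "i" instead of "help".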

11. Have 'aggregate' add logical/default names to its value.
    I'm basically echoing this thread:
        http://maths.newcastle.edu.au/~rking/R/help/03b/7517.html
    Using aggregate(x,by,FUN), I would find it very useful if the factor names
    in the "by" list carried through to the final aggregate data.frame.
    Also, when 'x' is a vector (and maybe for other data structures), it
    would be nice to have the original names included in the result.
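
The naming behaviour described in item 11 can be checked with a small
made-up data frame.  The sketch assumes a current R release, in which naming
the elements of the "by" list, or using the formula interface, carries the
grouping name through to the result.

```r
d <- data.frame(yield   = c(10, 12, 20, 22),
                variety = c("A", "A", "B", "B"))

# Unnamed 'by' list: the grouping column gets a generic name
n_unnamed <- names(aggregate(d["yield"], by = list(d$variety), FUN = mean))

# Named 'by' list: the name is kept in the result
n_named <- names(aggregate(d["yield"], by = list(variety = d$variety),
                           FUN = mean))

# Formula interface: names come from the formula terms
n_formula <- names(aggregate(yield ~ variety, data = d, FUN = mean))

print(n_unnamed)
print(n_named)
print(n_formula)
```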

12. Wanted: General-purpose mixed-models function/package
    The nlme library is very nice for mixed-effects models with nested
    effects, but it is not very general-purpose.  Even Bates/Pinheiro have said
    several times in posts to R-help/S-news that nlme was designed for nested
    models and using other models can be hard.
      Bates: "highly unintuitive" (crossed effects model)
      Bates: "algorithms for lme are tuned for nested random effects"
    For example, in nlme,
      The syntax for crossed random effects is quite intimidating
      Try removing the variance component for Rep in: random=~1|Rep/WholePlot.
      Try changing a nested effect from random to fixed (or vice-versa).
      Try to extract lsmeans for fixed-effects in a model.
      Try to do a multiple-comparison of fixed-effects estimates.
      Try using AR1xAR1 error structure.  The nlme library appears to have 
        tools for this, but again is syntactically difficult.  I can find no
        examples.
    Most of these tasks would ideally be straightforward in a general-purpose
    mixed-models function (as they are in SAS, Genstat, etc.)

    The ASREML software is available in S-Plus (and soon R, I'm told) via
    the proprietary 'samm' library.  Whereas lme seems excellent for basic
    nested-effects models and difficult for other models, samm excels at
    crossed-effects models, but doesn't have the plethora of useful 
    print, plot, extractor, and summary methods that are found in nlme.

13. The fantasy list.  Go ahead and tell me, "In your dreams!"
   
    Deprecate 'update'.  Cute, but makes session transcripts hard to read.
    Remove implicit intercepts in models.  Require y~1+x.  Force thinking
      about intercepts.
    Lattice colors could be more saturated for printing and projecting.
    Rename 'prompt' to something closer to its purpose, like makeSkeletonHelp.


The humble opinion of one devoted user,

Kevin Wright

Thomas Lumley

2004-Jan-16 19:11 UTC

[Rd] Another wishlist for R

On Fri, 16 Jan 2004, Kevin Wright wrote:
>
> 10. More uniformity in quoting arguments.
>     Uniformity outweighs cleverness/exceptions ("The Art of Unix
>     Programming").
>     Functions accepting non-quoted arguments
>       is.function(obj)
>       args(predict)
>       rm(a)
>       help(help)
>       find(replace) or find("replace")
>     Functions requiring quoted arguments
>       get("help")
>       exists("predict.lme")
>
>     Some people have claimed "the designers of S knew what they were
>     doing"  because you can do clever things like this:
>         i="help"
>         exists(i)
>     But we could just as easily be doing other clever things
>     and have more uniform quoting rules.  S is probably too mature for this
>     to really be considered.
>
This is actually happening -- gradually these functions are losing the
ability to take both quoted and unquoted arguments. However, it's not a
matter of making them all the same.

is.function() and args() really shouldn't take quoted arguments -- the
argument is an object, not the name of the object.  This is in contrast
to rm(), help(), and find(), where the argument really is the name of the
object.  One motivation for persuading people to type ?lm instead of
help(lm) is that it might then be possible to force help() to take quoted
arguments.

There's a similar issue with expressions as arguments. A few functions
such as with(), quote(), substitute(), capture.output() correctly take
unquoted expressions as arguments.  Most functions will use only the value
of an unquoted expression.  A number of modelling functions, however, use
unquoted expressions without evaluating them, and shouldn't.  It's
probably too late to stop this for functions like glm(), but at least new
modelling functions could avoid it.  There are notes about this issue at
http://developer.r-project.org/nonstandard-eval.pdf

	-thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley@u.washington.edu	University of Washington, Seattle

Douglas Bates

2004-Jan-16 20:18 UTC

[Rd] Another wishlist for R

Kevin Wright <kwright@eskimo.com> writes:
> 12. Wanted: General-purpose mixed-models function/package
>     The nlme library is very nice for mixed-effects models with nested
>     effects, but it is not very general-purpose.  Even Bates/Pinheiro have
>     said several times in posts to R-help/S-news that nlme was designed for
>     nested models and using other models can be hard.
>       Bates: "highly unintuitive" (crossed effects model)
>       Bates: "algorithms for lme are tuned for nested random effects"
>     For example, in nlme,
>       The syntax for crossed random effects is quite intimidating
>       Try removing the variance component for Rep in:
>         random=~1|Rep/WholePlot.
>       Try changing a nested effect from random to fixed (or vice-versa).
>       Try to extract lsmeans for fixed-effects in a model.
>       Try to do a multiple-comparison of fixed-effects estimates.
>       Try using AR1xAR1 error structure.  The nlme library appears to have
>         tools for this, but again is syntactically difficult.  I can find
>         no examples.
>     Most of these tasks would ideally be straightforward in a
>     general-purpose mixed-models function (as they are in SAS, Genstat, etc.)

The crossed random effects problem is also in the process of being fixed.
Our recent work on computational methods for mixed-effects models
        http://www.stat.wisc.edu/~bates/reports/MixedComp.pdf
shows how to structure the calculations but it takes a long time to
get the code designed, implemented, debugged, debugged again, debugged
again, ..., documented, documented some more, documented some more,
debugged again, ...

I am hopeful that I will be able to come up with a single, unified
data structure, based on sparse matrices, that can be used for nested,
crossed, and partially crossed random effects.

At present I am doing a major redesign of the Matrix package to change
to S4 classes and methods and to incorporate sparse matrices.  Once
that is more-or-less stable (I expect a preliminary release by the end
of January) I will work on the implementation of the lme structures.

>     The ASREML software is available in S-Plus (and soon R, I'm told) via
>     the proprietary 'samm' library.  Whereas lme seems excellent for basic
>     nested-effects models and difficult for other models, samm excels at
>     crossed-effects models, but doesn't have the plethora of useful
>     print, plot, extractor, and summary methods that are found in nlme.

It is interesting that ASREML will be available for R.

Warnes, Gregory R

2004-Jan-21 00:04 UTC

[Rd] Another wishlist for R

> 1. Add "head" and "tail" to R base.
>    Patrick Burns has these: http://www.burns-stat.com/pages/public.html#genutil
>    Very handy functions for checking data manipulation.

How about we just add these to a package available on CRAN?  How about we
just make a package of the Burns Statistics functions?  There are already
.Rd files...

> 4. Increased integration of text and graphics output (for PDF, in
>    particular).
>    Sweave is fantastic for quality reporting, but can be a lot of work
>    when a quick analysis is all that is needed.
>    Often I would like to do something like print a box plot and include an
>    anova table, for example:
>      pdf("file")
>      boxplot(y~x)
>      frame()
>      sink.to.pdf()
>      frame()
>      anova(lm(y~x))
>      sink()
>      dev.off()
>    I know of no such (simple) tools.  Ben Bolker has an idea here:
>      http://maths.newcastle.edu.au/~rking/R/help/02b/4179.html

In gregmisc I have a function textplot which does much of what you want, and
I've just added (in CRAN incoming now) a new function sinkplot() which
accomplishes exactly what you've asked for.  Here is the example:

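   library(gregmisc)   # provides textplot() and sinkplot(), as noted above

   set.seed(12456)
   x <- factor(sample( LETTERS[1:5], 50, replace=TRUE))
   y <- rnorm(50, mean=as.numeric(x), sd=1)

   par(mfrow=c(1,2))                   # two panels side by side
   boxplot(y~x, col="darkgreen")       # left panel: the plot

   sinkplot()                          # start capturing printed output
   anova(lm(y~x))
   sinkplot("plot",col="darkgreen")    # stop capturing; draw the text in the right panel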

The only thing that isn't quite right is that the default R font is
proportionally spaced, so the matrix columns don't line up right.  I'll
need to change the default font.   [Are any of the Hershey fonts fixed width?]


-Greg

