First, a big thanks to all of the developers and users that have worked to
make R such useful software. It is only because I find the software so useful
that I have the following opinions.
A recent post to R-devel listed the 'Top 10 Features' for one person. I found
it to be quite an interesting read. Over the past couple of years I have
assembled my own lists.
Retrospective. Some of my favorite things about R (vs. S-Plus):
1. Integration with emacs
2. Nice color handling
3. Wealth of packages, easy package updates
4. HTML help
5. More answers on R-help than S-news
6. Active developer community
7. Package creation tools
8. Functions: setwd, with, apropos
Prospective. Periodically David Smith at Insightful asks users, "If you had
$100, how would you allocate that money to development?" Without listing
dollar amounts, these are my personal choices for R.
1. Add "head" and "tail" to R base.
Patrick Burns has these: http://www.burns-stat.com/pages/public.html#genutil
Very handy functions for checking data manipulation.
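As a rough sketch of what such functions might look like for data frames (the
names first_rows and last_rows and the default n = 6 are illustrative
assumptions of mine, not the code from the page linked above):

```r
# Hypothetical helpers; names and the default n = 6 are illustrative only.
first_rows <- function(x, n = 6) {
  # Keep at most the first n rows of a data frame
  x[seq_len(min(n, nrow(x))), , drop = FALSE]
}
last_rows <- function(x, n = 6) {
  # Keep at most the last n rows of a data frame
  k <- nrow(x)
  idx <- seq_len(k)
  x[idx[idx > k - n], , drop = FALSE]
}

dd <- data.frame(id = 1:20, sq = (1:20)^2)
first_rows(dd, 3)   # rows 1 to 3
last_rows(dd, 3)    # rows 18 to 20
```

A quick look at both ends of a data frame after a merge or sort is usually
enough to catch gross manipulation errors.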
2. Strive for self-contained examples in all .Rd files (as far as possible).
Generally quite good, but there's always room for improvement.
For R base, if I create examples, to whom should I send them (R-devel?) and
how (as a request for change?)?
Here's one example (by P. Dalgaard) for function 'replace'
# Replace NA's in a data frame with -1
dd <- data.frame(a=c(1,2,NA,4),b=c(NA,2,3,4))
dd[] <- lapply(dd,function(x) replace(x, is.na(x), -1))
3. Encourage (more) standards for function names.
A prominent link on CRAN to the coding conventions would be good.
Here is a draft of coding conventions:
http://www.maths.lth.se/help/R/RCC/
Partly as a result of the community development of R, the names of
functions lack consistency. Consider the following examples:
row.names, rownames
browseURL, contrib.url, fixup.package.URLs
package.contents, packageStatus
mahalanobis, TukeyHSD
getMethod, getS3method
The sooner that conventions are encouraged, the more consistent future
function names will be.
4. Increased integration of text and graphics output (for PDF, in particular).
Sweave is fantastic for quality reporting, but can be a lot of work
when a quick analysis is all that is needed.
Often I would like to do something like print a box plot and include an
anova table, for example:
pdf("file")
boxplot(y~x)
frame()
sink.to.pdf()
frame()
anova(lm(y~x))
sink()
dev.off()
I know of no such (simple) tools. Ben Bolker has an idea here:
http://maths.newcastle.edu.au/~rking/R/help/02b/4179.html
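One possible workaround, offered only as a sketch and not as the approach from
the post linked above, is to capture the printed table as text and draw it on
a blank page of the same PDF. The data, the temporary file name, and the font
size below are made up for illustration:

```r
# Sketch: put a plot and captured text output into one PDF file.
set.seed(1)
x <- gl(3, 10, labels = c("a", "b", "c"))   # made-up factor
y <- rnorm(30) + as.numeric(x)              # made-up response

outfile <- tempfile(fileext = ".pdf")       # illustrative output path
pdf(outfile)
boxplot(y ~ x)                              # page 1: the graphic
out <- capture.output(print(anova(lm(y ~ x))))
plot.new()                                  # page 2: blank page for the text
text(0, 1, paste(out, collapse = "\n"),
     adj = c(0, 1), family = "mono", cex = 0.8)
dev.off()
```

The monospaced font keeps the columns of the table aligned. Long output is not
wrapped or paginated, so this only suits short tables.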
5. Drop unused factor levels by default. (At least as a settable option.)
This issue has been debated before--I'm just adding my vote and
justification.
The proportion of time I want data to include unused factor levels is close
to zero. The amount of time I spend cleaning data to get rid of unused
factor levels is quite substantial.
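A small made-up example of the behavior in question and of the usual cleanup
step:

```r
f <- factor(c("low", "mid", "high", "low", "mid"))
kept <- f[f != "high"]     # subsetting keeps all of the original levels
levels(kept)               # "high" is still listed
table(kept)                # and it shows up with a count of 0

cleaned <- factor(kept)    # re-creating the factor drops unused levels
levels(cleaned)
```

With many factors in a data frame, this cleanup has to be repeated for each
column after every subsetting step.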
6. Expanded font control for graphics devices.
This is already being considered, so again I'm just adding my vote. See:
http://www.stat.auckland.ac.nz/~paul/R/fonts.html
7. Clean up namespace implementation
The introduction of namespaces has (for me) been a nuisance without
any benefits that I am aware of. I speak as a user, not a package
maintainer.
I would like to see (1) more education about namespace benefits, (2) more
discussions about what is the appropriate role for namespaces and (3)
improvements to the documentation, which is now often less correct (if not
broken) due to namespaces. For example, help(is.function) doesn't say how
functions hidden behind namespaces will be treated. Most help files
completely ignore issues with namespaces.
Some people will say, "Of course namespaces are working exactly as
expected!" But that is only true if you expect functions to be
hidden, and quite a few versions of R trained users to expect otherwise.
The quiet introduction of namespaces has broken my modus operandi for:
args(predict.lme)
is.function(predict.lme)
predict.lme(object)
exists("predict.lme")
Namespaces may be neat/right from a language-design perspective, but have
made it more frustrating for me to actually use the software.
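For what it is worth, here is a sketch of how the hidden method can still be
reached, assuming the nlme package is installed. Both getS3method and the
':::' operator are existing R tools. The has_nlme guard is only there so the
snippet does nothing when nlme is missing:

```r
# Sketch, assuming the nlme package is installed.
has_nlme <- requireNamespace("nlme", quietly = TRUE)
if (has_nlme) {
  print(exists("predict.lme"))          # FALSE if the method is not exported
  f <- getS3method("predict", "lme")    # look up the registered S3 method
  print(is.function(f))
  print(args(f))
  g <- nlme:::predict.lme               # reach into the namespace directly
  print(identical(f, g))
}
```

Neither spelling is as direct as typing predict.lme, which is the point of
the complaint.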
8. More consistency in the use of na.action and na.rm.
Compare: mean(..., na.rm=...) with lme(..., na.action=...)
Maybe na.action could be added to 'mean' and other functions.
There are issues of compatability with S-Plus here...
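A minimal sketch of what an na.action-style interface to 'mean' could look
like. The wrapper name mean_na is hypothetical and not part of R:

```r
# Hypothetical wrapper; 'mean_na' is not an existing R function.
mean_na <- function(x, na.action = na.omit, ...) {
  # Apply the chosen missing-value handler, then compute the mean
  mean(na.action(x), ...)
}

x <- c(1, NA, 3)
mean(x)                  # NA: missing values propagate by default
mean(x, na.rm = TRUE)    # 2
mean_na(x)               # 2, via na.omit
# mean_na(x, na.action = na.fail) stops with an error instead
```

The existing handlers na.omit and na.fail accept plain vectors, so the same
argument name could in principle serve both modeling and summary functions.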
9. Add 'substitute' to getAnywhere
Actually, the code for getAnywhere already contains 'substitute', so it
looks like the author intended for the function to work without a quoted
argument. That would be wonderful. Why, then, does
getAnywhere("predict.lme") work but getAnywhere(predict.lme) does not?
(Yet another namespace issue)
I'm not the first person to ask this question. Obviously I'm
a member of the "blind" population that can't read help files:
http://maths.newcastle.edu.au/~rking/R/help/03b/0760.html
Another possibility is that the help file could be clearer for us blind
folk that interpret "x: a character string or name" to mean that
x might not be a character string. (See the help page)
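A sketch of the behavior I have in mind, written as a wrapper rather than a
change to getAnywhere itself. The name get_anywhere is hypothetical:

```r
# Hypothetical wrapper; 'get_anywhere' is an illustrative name.
get_anywhere <- function(x) {
  expr <- substitute(x)                 # capture the argument unevaluated
  # A bare name is converted to a string; anything else is evaluated as usual
  nm <- if (is.name(expr)) as.character(expr) else x
  getAnywhere(nm)                       # always pass a character string on
}

res1 <- get_anywhere(median)            # bare name
res2 <- get_anywhere("median")          # quoted name
identical(res1$name, res2$name)
```

The cost of this design is that a variable holding a name, as in
i <- "median"; get_anywhere(i), is looked up as "i" rather than "median".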
10. More uniformity in quoting arguments.
Uniformity outweighs cleverness/exceptions ("The Art of Unix
Programming").
Functions accepting non-quoted arguments
is.function(obj)
args(predict)
rm(a)
help(help)
find(replace) or find("replace")
Functions requiring quoted arguments
get("help")
exists("predict.lme")
Some people have claimed "the designers of S knew what they were
doing" because you can do clever things like this:
i="help"
exists(i)
But we could just as easily be doing other clever things
and have more uniform quoting rules. S is probably too mature for this to
really be considered.
11. Have 'aggregate' add logical/default names to its value.
I'm basically echoing this thread:
http://maths.newcastle.edu.au/~rking/R/help/03b/7517.html
Using aggregate(x,by,FUN), I would find it very useful if the factor names in
the "by" list carried through to the final aggregate data.frame. Also,
when 'x' is a vector (and maybe for other data structures), it would be
nice to have the original names included in the result.
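A made-up example of the current behavior, along with the workaround of
naming the "by" list and passing 'x' as a one-column data frame:

```r
df <- data.frame(yield = c(1, 2, 3, 4), trt = c("a", "a", "b", "b"))

r1 <- aggregate(df$yield, by = list(df$trt), FUN = mean)
names(r1)    # "Group.1" "x": both original names are lost

r2 <- aggregate(df$yield, by = list(trt = df$trt), FUN = mean)
names(r2)    # "trt" "x": naming the list element keeps the factor name

r3 <- aggregate(df["yield"], by = df["trt"], FUN = mean)
names(r3)    # "trt" "yield": data-frame inputs keep both names
```

The third form gives what I would like to get by default.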
12. Wanted: General-purpose mixed-models function/package
The nlme library is very nice for mixed-effects models with nested
effects, but it is not very general-purpose. Even Bates/Pinheiro have said
several times in posts to R-help/S-news that nlme was designed for nested
models and using other models can be hard.
Bates: "highly unintuitive" (crossed effects model)
Bates: "algorithms for lme are tuned for nested random effects"
For example, in nlme,
The syntax for crossed random effects is quite intimidating.
Try removing the variance component for Rep in: random=~1|Rep/WholePlot.
Try changing a nested effect from random to fixed (or vice-versa).
Try to extract lsmeans for fixed-effects in a model.
Try to do a multiple-comparison of fixed-effects estimates.
Try using AR1xAR1 error structure. The nlme library appears to have
tools for this, but again is syntactically difficult. I can find no
examples.
Most of these tasks would ideally be straightforward in a general-purpose
mixed-models function (as they are in SAS, Genstat, etc.)
The ASREML software is available in S-Plus (and soon R, I'm told) via
the proprietary 'samm' library. Whereas lme seems excellent for basic
nested-effects models and difficult for other models, samm excels at
crossed-effects models, but doesn't have the plethora of useful
print, plot, extractor, and summary methods that are found in nlme.
13. The fantasy list. Go ahead and tell me, "In your dreams!"
Deprecate 'update'. Cute, but makes session transcripts hard to read.
Remove implicit intercepts in models. Require y~1+x. Force thinking about
intercepts.
Lattice colors could be more saturated for printing and projecting.
Rename 'prompt' to something closer to its purpose like makeSkeletonHelp.
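To illustrate the implicit-intercept item above with a tiny made-up vector:
the first two formulas produce the same design matrix, which is why the
intercept is easy to overlook.

```r
x <- c(1, 2, 3, 4)
m_implicit <- model.matrix(~ x)        # intercept added silently
m_explicit <- model.matrix(~ 1 + x)    # intercept written out, same columns
m_none     <- model.matrix(~ 0 + x)    # intercept removed on purpose

colnames(m_implicit)
colnames(m_explicit)
colnames(m_none)
```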
The humble opinion of one devoted user,
Kevin Wright