Jim Lindsey writes:
> My problem with .RData almost a year ago is very similar to
> that of attach in functions.
Somehow my programmer's instinct tells me never to use attach in
functions (other than in extremis, of course) because it is such
a blatant side-effect operation. In my view the search path is a
globally visible hierarchy and operators that affect it are best
used only at the interactive level.
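A minimal sketch of the side effect in question, assuming a
made-up data frame `dat` and a made-up function `mean_u`: the
call returns the right number, but it also leaves a new entry on
the search path that the caller never asked for.

```r
dat <- data.frame(u = c(1, 2, 3))   # hypothetical data

mean_u <- function(d) {
  attach(d)      # adds an entry named "d" to the global search path
  mean(u)        # no detach(): the entry outlives the call
}

before <- search()
result <- mean_u(dat)
leftover <- setdiff(search(), before)   # what the call left behind

detach("d", character.only = TRUE)      # cleaning up by hand
```

Any later expression that mentions `u` would have found the
attached copy until the detach() call removed it.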
In a similar vein I am strongly in favour of actually banning the
superassignment operator, "<<-". Such operations should be so
rare and unusual that people should be forced to use assign();
having a special syntactic operator for the job only encourages
terrible programming practice. I once saw a suite of functions
written by someone who should have known better where all the
assignment operators had been systematically replaced by "<<-".
The explanation was "that was the only way I could make sure that
the variables I created inside the function were not discarded"!
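To make the contrast concrete, here is a small sketch with
hypothetical names (`total`, `add_bad`, `add_good`). The first
function changes a variable outside itself with "<<-"; the second
returns its result and leaves the assignment to the caller, which
is usually all that is needed to stop a value being "discarded".

```r
total <- 0

# Side-effect version: "<<-" reaches outside the function.
add_bad <- function(x) {
  total <<- total + x
  invisible(NULL)
}

# Side-effect-free version: compute and return the new value.
add_good <- function(total, x) {
  total + x
}

add_bad(5)                     # silently changes `total`
total_after_bad <- total

total <- add_good(total, 5)    # the assignment is visible at the call site
```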
There are some cases when you do want to create something that is
globally visible, but for the duration of the current expression
only. This is what the S structure known as "frame 1" is for. I
have yet to discover how to access this in R (or if it is
possible at all) but it can be a useful device, even if the
practice of using it to store temporary global variables is not
free of the criticism that it, too, can be a dangerous
side-effect. Most powerful things are at the same time
dangerous.
[Looking at the way the "S virus" is mutating, it is possible to
see how before long attach and detach may not be necessary. The
inclusion of a data= argument in the fitting and Trellis graphics
functions, for example, usually obviates the need. In time it may
be possible for most functions to have an argument that specifies
a temporary addition to the top of the search path as the
preferential source of variables.
(I make no apologies for the S virus, by the way. I regard it as
an evolving system and a vehicle for new technology. "If I want
SAS I know where to find it" with apologies to Dennis Ritchie.)]
> If I remember correctly (I have not seen a .RData for a long
> time) suppose that a .RData is loaded and contains a variable
> called y that you have forgotten about (more probably you
> don't even notice that the .RData was loaded).
Good working principle/practice #2: Keep your working directories
clean and free of temporaries no longer needed.
Good working principle/practice #3: Use the file system
intelligently. Use different working directories for different
jobs and do not do everything in one big working "R" directory.
> You create a dataframe (say read.table) containing a column
> labelled y, planning to analyze the data. It is invisible
> because attached behind the y from .RData and you unknowingly
> produce a completely erroneous statistical analysis for your
> client or for publication.
Exactly. This is why, in my view, not .RData, but attach()
should be on the way out. Using functions with a data= argument
to specify a preferential source for variables before all others
completely overcomes this trap.
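A small sketch of the trap and the remedy, assuming invented data:
a stale `y` sits in the workspace, and a data frame `dat` holds
the real `y`. Because lm() is given data = dat, the variables in
the formula are looked up in `dat` before anywhere else.

```r
# A stale `y` left in the workspace (say, restored from an old .RData).
y <- c(100, 200, 300, 400, 500)

# The data actually being analysed; names and values are hypothetical.
dat <- data.frame(x = 1:5, y = c(2, 4, 6, 8, 10))

# With data=, `y` and `x` are taken from `dat`, not from the workspace.
fit <- lm(y ~ x, data = dat)
slope <- unname(coef(fit)["x"])
```

The fitted slope reflects the `y` column of `dat`; the workspace
`y` plays no part in the fit.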
> (With luck, the two y's have incompatible lengths and a
> warning about vectors not being multiples of each other's
> lengths will be produced - when I had the problem, this
> warning was not yet available.)
I agree that safe data analysis should not be a matter of luck!
> .RData is banned on our site.
This is simply saying you are against permanent storage of
objects. I can see some reason for that if disc space is scarce,
but not much otherwise. In time you will be forced to re-invent
it, like most lessons of history. (Just a speculative
prediction, Jim, I could be wrong....)
> I do not think that functions should produce side effects
> (except the few well-known ones). Attach should be local to a
> function and the column names should not be hidden by objects
> outside its scope.
Who would disagree? But I interpret this as an argument against
attach in most cases, and against attach inside functions
entirely. Some things are for interactive use only, others
primarily for functions.
> For me, this question is now rather academic because I copy
> the columns of the dataframe in the function instead of
> attaching (and .RData never appears).
For me it is rather academic, too, but for quite a different
reason. I never use attach in functions. Where security and
integrity are primary I ensure that the evaluation frame for the
expression is fixed and well specified. This means either using
functions that allow a data= argument where possible, making sure
that all variables needed are in that data frame (or list), or
using eval() inside functions to achieve a similar effect.
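One way the eval() approach might look, as a sketch only:
`eval_in` is a hypothetical helper, not a function from R or
S-PLUS. It evaluates an expression with the data frame as the
first place to look and the caller's frame as the fallback.

```r
# Hypothetical helper: evaluate `expr` with `data` searched first,
# then the frame of whoever called eval_in().
eval_in <- function(expr, data) {
  eval(substitute(expr), envir = data, enclos = parent.frame())
}

y <- c(100, 200, 300)                        # stale workspace variable
dat <- data.frame(y = c(1, 2, 3), w = c(2, 2, 2))
k <- 10                                      # not in `dat`; found in the caller's frame

res <- eval_in(sum(y * w) * k, dat)          # uses dat$y, dat$w and the caller's k
```

Here `y` and `w` come from `dat`, so the workspace `y` is never
consulted, while `k` is still found because the caller's frame is
the enclosure.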
(I am puzzled why so many people seem to regard eval as in some
way arcane or obscure, by the way. They use it subliminally
literally all the time.)
It would be useful if R had some of the S-PLUS utility functions
such as find() to locate where a visible object is currently held
on the search path. exists() is there, so find() is not much of
an extension. Also, why do functions like objects() and attach()
have a different argument sequence from their cousins in S? I am
also puzzled why frames 0 and 1 are apparently not there, and why
an object (apparently) cannot be attached at the top of the
search path (as it can in S-PLUS) but must go at what would be
called `where=2' at best. Of course I could be wrong about these
matters as I am more of an S person than an R person, I do admit!
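As a rough illustration of how small that extension is, here is a
sketch built only on search() and exists(). The name `where_is` is
made up for the example, and it is not claimed to match the
S-PLUS find() in its arguments or its return value.

```r
# Hypothetical helper: list the search-path entries that hold `name`.
where_is <- function(name) {
  path <- search()
  held <- vapply(
    seq_along(path),
    # inherits = FALSE checks that one position only, not its parents
    function(pos) exists(name, where = pos, inherits = FALSE),
    logical(1)
  )
  path[held]
}

hits <- where_is("mean")                 # entries holding an object called "mean"
none <- where_is("no_such_object_xyz")   # empty when nothing matches
```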
--
Bill Venables, Head, Dept of Statistics, Tel.: +61 8 8303 5418
University of Adelaide, Fax.: +61 8 8303 3696
South AUSTRALIA. 5005. Email: Bill.Venables at adelaide.edu.au
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html