Hi,

I just encountered a problem in R that may easily be fixed: if one uses attach for a data.frame e.g. 10000 times and forgets detach, then R gets incredibly slow (less than 10% of the original speed).

My system:

platform powerpc-apple-darwin6.0
arch     powerpc
os       darwin6.0
system   powerpc, darwin6.0
status
major    1
minor    6.1
year     2002
month    11
day      01
language R

Kind regards,
Andreas Eckner

Andreas Eckner <andreas.eckner at soundinvest.net> writes:

> Hi,
>
> I just encountered a problem in R that may easily be fixed: If one uses
> attach for a data.frame e.g. 10000 times and forgets detach, then R gets
> incredibly slow (less than 10% of the original speed).

R also gets incredibly slow if you create 10000 copies of your data set, which is effectively the same thing! The fix is: Don't do that...

--
Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen
Blegdamsvej 3, 2200 Cph. N, Denmark
Ph: (+45) 35327918, FAX: (+45) 35327907
(p.dalgaard at biostat.ku.dk)

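To make the effect under discussion concrete, here is a minimal sketch of how repeated attach() calls pile up on the search path. The data frame `dat` and the loop count are made up for illustration; the original report used about 10000 calls.

```r
# Hypothetical toy data frame, used only for illustration.
dat <- data.frame(x = 1:5, y = letters[1:5])

n_before <- length(search())

# Attach the same data frame several times without detaching.
for (i in 1:3) attach(dat)

n_after <- length(search())
n_after - n_before      # one extra search-path entry per attach() call

# Clean up: detach every entry named "dat".
while ("dat" %in% search()) detach("dat", character.only = TRUE)
```

Every name lookup that is not satisfied in the workspace has to walk this list, which is one plausible reason for the slowdown. Current versions of R also print a message about masked objects on the second and later calls, unless warn.conflicts = FALSE is passed to attach().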
Peter is right, but there is a point here: it would be nice if attach(my.dframe) would do nothing - or at least warn - if my.dframe was already in the search list. Like library(). Attaching twice is almost bound to be an error?

Of course in many circumstances it might be better to use with() than attach() - if you haven't come across with(), it works like attach() with an inbuilt detach() when the parentheses close.

Aside 1: it would also be nice if ?attach pointed to ?with. Is this kind of suggestion best sent to r-help or r-devel?

Aside 2: is with() efficient, or does it create a copy of the dataset?

SF

-----Original Message-----
From: Peter Dalgaard BSA [mailto:p.dalgaard at biostat.ku.dk]
Sent: 04 August 2003 00:52
To: Andreas Eckner
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Problem with data.frames

> R also gets incredibly slow if you create 10000 copies of your data set,
> which is effectively the same thing! The fix is: Don't do that...

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

Simon Fear
Senior Statistician
Syne qua non Ltd
Tel: +44 (0) 1379 644449
Fax: +44 (0) 1379 644445
email: Simon.Fear at synequanon.com
web: http://www.synequanon.com

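The comparison between with() and an attach()/detach() pair can be sketched as follows. The data frame and its column names are invented for the example.

```r
# Hypothetical data; column names are assumptions for illustration.
dat <- data.frame(height = c(150, 160, 170), weight = c(50, 60, 70))

# attach()/detach() pair: columns are visible by name in between.
attach(dat)
m1 <- mean(weight / height)
detach(dat)

# with(): the same expression, with nothing left on the search path.
m2 <- with(dat, mean(weight / height))

identical(m1, m2)
"dat" %in% search()
```

The practical difference is that forgetting the detach() in the first form leaves a copy of `dat` on the search path, while the second form has nothing to forget.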
What I meant by "like library()" is that this function ensures that a library (or should I say package?) does not get loaded twice / duplicated in the search path. Of course attach() is not the same as library(). Nor is it the same as with() - I just thought it might be useful if the help pages pointed to each other, because I have programmed for years using attach() and detach() and only very recently discovered with() (which is nearly always what I want, since I work mostly with scripts).

> So how can you tell if the data frame is on the search list

"my.dframe" %in% search() ?

Does anyone have an example of a case where attaching the same data frame twice would be useful?

Thanks Prof R. for (another) reminder to read the code - I'm always forgetting to look there. RTFC, Simon!

SF

-----Original Message-----
From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk]
Sent: 04 August 2003 12:06
To: Simon Fear
Cc: Peter Dalgaard BSA; Andreas Eckner; r-help at stat.math.ethz.ch
Subject: RE: [R] Problem with data.frames

On Mon, 4 Aug 2003, Simon Fear wrote:

> Peter is right, but there is a point here: it would be nice if
> attach(my.dframe) would do nothing - or at least warn - if my.dframe was
> already in the search list. Like library(). Attaching twice is almost

It is not like library(). attach attaches a copy of the data frame, and it can be altered subsequently. So how can you tell if the data frame is on the search list? I suspect it is quite common to create an object called `tmp' and put it on the search list for a while. library() assumes that you have not reinstalled a loaded package (and if you have you would get inconsistent results).

> bound to be an error?

Not at all.

> Of course in many circumstances it might be better to use with() than
> attach() - if you haven't come across with(), it works like attach()
> with an inbuilt detach() when the parentheses close.
>
> Aside 1: it would also be nice if ?attach pointed to ?with. Is this kind
> of suggestion best sent to r-help or r-devel?

I don't see they do the same job, especially not in an interactive session (which is when attach() is most usefully used in this way).

> Aside 2: is with() efficient, or does it create a copy of the dataset?

It does not create a copy, as you can see from the code (it uses a three-argument eval).

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595

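Two of the technical points in this exchange can be checked directly: that attach() puts a copy on the search path, so a name check cannot tell whether the attached version is still current, and that with() amounts to a three-argument eval(). The following is a small sketch with an invented data frame `dat`; the eval() line is an approximation of what with() does for a data frame, not a quotation of its source.

```r
# Hypothetical data frame, for illustration only.
dat <- data.frame(x = 1:3)
attach(dat)

# Change the original after attaching; the attached copy is not updated.
dat$x <- dat$x * 10

x_attached <- get("x", pos = "dat")   # value in the attached copy
x_current  <- dat$x                   # value in the modified original

# A name check only says that *something* called "dat" is attached.
is_attached <- "dat" %in% search()

detach("dat", character.only = TRUE)

# Three-argument eval: expression, data, enclosing environment.
y <- eval(quote(x + 1), dat, parent.frame())
```

Here `x_attached` still holds the old values while `x_current` holds the new ones, even though `"dat" %in% search()` is TRUE in both states.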
Dear Simon,
A solution that skirts the issue is to introduce a new function -- something
like
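Attach <- function (what, pos=2, name=deparse(substitute(what))) {
    # Detach a previously attached object of the same name, if there is one;
    # match() returns NA when the name is not on the search path.
    old <- match(name, search())
    if (!is.na(old)) detach(pos = old)
    attach(get(name, envir=.GlobalEnv), pos=pos, name=name)
}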
Then the previously attached version of the object, which may differ from the
current one, is detached before the current version is attached. I think that
this is the behaviour that would usually be desired, though there might be
occasions where you want to preserve the previous version.
Regards,
John
----------------------------------------------------
John Fox
Department of Sociology
McMaster University
http://socserv.mcmaster.ca/jfox
----------------------------------------------------
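A short usage sketch of the wrapper idea above follows. It is a variant, not the function exactly as posted: it guards against the name not being on the search path yet, and it attaches `what` directly instead of looking the name up in .GlobalEnv, so that it also works when called from inside another function. The data frame `dat` is invented for the example.

```r
# Variant of the Attach() idea; the NA guard and the use of `what`
# are assumptions made for this sketch.
Attach <- function(what, pos = 2, name = deparse(substitute(what))) {
    old <- match(name, search())          # NA if nothing of that name is attached
    if (!is.na(old)) detach(pos = old)    # drop the stale copy first
    attach(what, pos = pos, name = name)
}

dat <- data.frame(x = 1:3)
Attach(dat)
Attach(dat)
Attach(dat)
n_dat <- sum(search() == "dat")   # number of entries called "dat"

dat$x <- 4:6                      # modify the original ...
Attach(dat)                       # ... and re-attach to refresh the copy
x_now <- get("x", pos = "dat")

detach("dat", character.only = TRUE)
```

However many times Attach() is called, at most one entry with that name remains on the search path, and it reflects the object as it was at the last call.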
> ------------Original Message-------------
> From: "Simon Fear" <Simon.Fear at synequanon.com>
> To: "Prof Brian Ripley" <ripley at stats.ox.ac.uk>
> Cc: Peter Dalgaard BSA <p.dalgaard at biostat.ku.dk>, r-help at stat.math.ethz.ch, Andreas Eckner <andreas.eckner at soundinvest.net>
> Date: Mon, Aug-4-2003 8:04 AM
> Subject: RE: [R] Problem with data.frames

Peter, Brian, and list,

Sorry if I was not clear (and obviously I wasn't), I do fully accept that different data frames *can* be given the same names. Then you'd have to access the first one created with an explicit get() to some specific point in the search list - or explicitly placed there using pos= - if indeed you could remember which one you wanted.

I think my point is that duplicating names strikes me as horrendous programming style - why on earth would I ever need to have two deliberately different objects coexisting with the same name? Or the same object, with the same name, more than once?

Maybe I am missing the point in that I don't really use R interactively, I always source() or copy and paste to the prompt, so I probably think of debugging structured programs more than some users.

If I can't convince everyone that this type of name duplication is best treated as an error, surely a warning would be good? At least an optional argument ...

In Peter D's example, the second invocation of f() should presumably create a "new" d? That would suggest TWO new arguments, replace= and warn=. Currently both false. And Brian R. likes to have several tmp's. I just can't see why. (Well, actually, I can see that it saves the bother of detaching(), and in an interactive session with nobody looking over your shoulder, who cares?) But if I need more than one temporary dataframe I call them tmp1, tmp2, etc. I just don't like the idea of a load of old tmp's filling up my precious memory space and then not remembering which one was which. Surely it's a bug waiting to pounce (not sure about that metaphor, sorry).

SF

PS Before anyone tells me: name duplication is intentional and very handy in R - when writing and nesting functions. But then it is different. Lexical scope ensures that the most recently defined instance is used when the name is used (unless you do something psychopathic, such as use an explicit assign outside the current frame). But these local instances disappear when the function closes. This is not the case for variables deliberately placed in the search path, which are meant to be "global", and in frame 1 (I think. I'm getting out of my depth here - spent too long using Splus).

In Peter D's example you have to be very careful that you don't assign "x" in .GlobalEnv else it will mask the one you probably wanted, which is d$x or get("x",pos=2) depending how many times you've called f().

I'm beginning to go off the idea of attach()ing dataframes altogether, the more I think about it.
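
The masking hazard described in the PS can be reproduced with a few lines. This is a sketch with an invented data frame `d`, run at top level, and it is not Peter D's original example, which does not appear in this thread.

```r
# Hypothetical data frame standing in for the attached data.
d <- data.frame(x = c(1, 2, 3))
attach(d)

x_seen_before <- x                   # found in the attached copy of d

x <- 99                              # creates a new x in the workspace;
                                     # neither d nor the attached copy changes

x_seen_after <- x                    # the workspace x now masks the attached one
x_attached   <- get("x", pos = 2)    # explicit lookup at search position 2
x_in_frame   <- d$x                  # explicit lookup in the data frame

rm(x)
detach(d)
```

After the assignment, a bare `x` silently refers to the workspace value, and only the explicit forms `d$x` or `get("x", pos = 2)` reach the data.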

You believed right - I do indeed lack the vision of others to foresee any practical need to attach and detach within a recursive function, except perhaps for my own amusement. Should I do the recursive call in between the attach and detach, I wonder? Wouldn't it all be fun, watching that search path grow and grow, before the function eventually hit the first detach! And just imagine the hilarity you could have, attaching and detaching at different positions! I could have a thousand identical data sets, all called Eric!!!

It's one of the many situations in which I would very much like to get a warning or error message, pointing out to me that I had absolutely no idea what I was doing.

Surely that's what warnings are for? For those of us who wonder why our code doesn't do what we think it should, until a long time after the deadline?

SF

-----Original Message-----
From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk]
Sent: 04 August 2003 14:34
To: Simon Fear
Cc: r-help at stat.math.ethz.ch
Subject: RE: [R] Problem with data.frames
Think about what happens if you call a function recursively, e.g. by
Recall(), and that function includes an attach/detach pair.
I believe it is your vision that is too limited, not other people's.
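
The recursive case mentioned here can be sketched as below. The function `depth_sum`, the attached name "tmp" and the data frame are all invented for the illustration; the point is only that a function containing an attach/detach pair, when called recursively, legitimately has several entries of the same name on the search path at once.

```r
# Hypothetical recursive function with an attach/detach pair.
depth_sum <- function(dat, n) {
    attach(dat, name = "tmp", warn.conflicts = FALSE)
    # on.exit() runs the matching detach when this call returns.
    on.exit(detach("tmp", character.only = TRUE))

    level_count <- sum(search() == "tmp")   # entries named "tmp" at this depth
    if (n <= 1) return(level_count)

    deeper <- Recall(dat, n - 1)            # recursive call between attach and detach
    max(level_count, deeper)
}

dat <- data.frame(v = 1:3)
before  <- sum(search() == "tmp")
deepest <- depth_sum(dat, 4)
after   <- sum(search() == "tmp")
```

At the innermost call there are as many "tmp" entries as there are active calls, and each call removes exactly one on the way out. If attach() refused to attach a name that is already present, a function written this way would fail on its first recursive call.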