Hi,

I just encountered a problem in R that may be easy to fix: if one calls attach() on a data.frame many times (e.g. 10000 times) and forgets detach(), then R gets incredibly slow (less than 10% of the original speed).

My system:

platform powerpc-apple-darwin6.0
arch     powerpc
os       darwin6.0
system   powerpc, darwin6.0
status
major    1
minor    6.1
year     2002
month    11
day      01
language R

Kind regards,
Andreas Eckner
Andreas Eckner <andreas.eckner at soundinvest.net> writes:

> Hi,
>
> I just encountered a problem in R that may easily be fixed: If one uses
> attach for a data.frame e.g. 10000 times and forgets detach, then R gets
> incredibly slow (less than 10% of the original speed).

R also gets incredibly slow if you create 10000 copies of your data set, which is effectively the same thing! The fix is: Don't do that...

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
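A minimal sketch of the effect being discussed, assuming a small hypothetical data frame d and only five repetitions instead of 10000. It does not measure speed; it only shows that every attach() without a matching detach() adds one more entry to the search path, which is presumably what name lookup then has to walk through.

```r
# Hypothetical data frame used only for this illustration
d <- data.frame(x = 1:3, y = c(2.5, 3.5, 4.5))

n_before <- length(search())

# Attach the same data frame several times without detaching.
# Each call adds a new entry named "d" to the search path.
for (i in 1:5) attach(d, warn.conflicts = FALSE)

n_after <- length(search())
n_added <- n_after - n_before   # number of extra search-path entries

# Clean up: detach every entry named "d"
while ("d" %in% search()) detach("d", character.only = TRUE)
n_clean <- length(search())
```

After the clean-up loop the search path is back to its original length.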
Peter is right, but there is a point here: it would be nice if attach(my.dframe) would do nothing - or at least warn - if my.dframe was already in the search list. Like library(). Attaching twice is almost bound to be an error?

Of course in many circumstances it might be better to use with() than attach() - if you haven't come across with(), it works like attach() with an inbuilt detach() when the parentheses close.

Aside 1: it would also be nice if ?attach pointed to ?with. Is this kind of suggestion best sent to r-help or r-devel?

Aside 2: is with() efficient, or does it create a copy of the dataset?

SF

-----Original Message-----
From: Peter Dalgaard BSA [mailto:p.dalgaard at biostat.ku.dk]
Sent: 04 August 2003 00:52
To: Andreas Eckner
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Problem with data.frames

> R also gets incredibly slow if you create 10000 copies of your data set,
> which is effectively the same thing! The fix is: Don't do that...

Simon Fear
Senior Statistician
Syne qua non Ltd
Tel: +44 (0) 1379 644449
Fax: +44 (0) 1379 644445
email: Simon.Fear at synequanon.com
web: http://www.synequanon.com
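To make the comparison between attach()/detach() and with() concrete, here is a small sketch. The data frame d and its columns are hypothetical, and the example assumes no other objects named x or y in the workspace.

```r
d <- data.frame(x = 1:3, y = c(2.5, 3.5, 4.5))   # hypothetical data

# attach()/detach() pair: d stays on the search path until detach() runs
attach(d)
m1 <- mean(x * y)
detach(d)

# with(): the expression is evaluated with d's columns visible,
# and nothing is added to the search path
m2 <- with(d, mean(x * y))

same_result <- identical(m1, m2)
still_attached <- "d" %in% search()
```

Both forms give the same value; the difference is that with() leaves no entry behind if the detach() is forgotten.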
What I meant by "like library()" is that this function ensures that a library (or should I say package?) does not get loaded twice / duplicated in the search path. Of course attach() is not the same as library(). Nor is it the same as with() - I just thought it might be useful if the help pages pointed to each other. Because I have programmed for years using attach() and detach() and only very recently discovered with() (which is nearly always what I want since I work mostly with scripts).

> So how can you tell if the data frame is on the search list

"my.dframe" %in% search() ?

Does anyone have an example of a case where attaching the same data frame twice would be useful?

Thanks Prof R. for (another) reminder to read the code - I'm always forgetting to look there. RTFC, Simon!

SF

-----Original Message-----
From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk]
Sent: 04 August 2003 12:06
To: Simon Fear
Cc: Peter Dalgaard BSA; Andreas Eckner; r-help at stat.math.ethz.ch
Subject: RE: [R] Problem with data.frames

On Mon, 4 Aug 2003, Simon Fear wrote:

> Peter is right, but there is a point here: it would be nice if
> attach(my.dframe) would do nothing - or at least warn - if my.dframe was
> already in the search list. Like library(). Attaching twice is almost

It is not like library(). attach attaches a copy of the data frame, and it can be altered subsequently. So how can you tell if the data frame is on the search list. I suspect it is quite common to create an object called `tmp' and put it on the search list for a while. library() assumes that you have not reinstalled a loaded package (and if you have you would get inconsistent results).

> bound to be an error?

Not at all.

> Of course in many circumstances it might be better to use with() than
> attach() - if you haven't come across with(), it works like attach()
> with an inbuilt detach() when the parentheses close.
>
> Aside 1: it would also be nice if ?attach pointed to ?with. Is this kind
> of suggestion best sent to r-help or r-devel?

I don't see they do the same job, especially not in an interactive session (which is when attach() is most usefully used in this way).

> Aside 2: is with() efficient, or does it create a copy of the dataset?

It does not create a copy, as you can see from the code (it uses a three-argument eval).

-- 
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
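Two of the points above can be sketched in a few lines: that attach() puts a copy on the search path which does not follow later changes to the data frame, and what a three-argument eval looks like. The data frame d is hypothetical, and the eval() call is only an approximation of what with() does internally, not a quotation of its source.

```r
d <- data.frame(x = 1:3)             # hypothetical data

attach(d)
d$x <- d$x * 10                      # modify the original after attaching
attached_x <- get("x", pos = "d")    # value seen through the search path
detach(d)

# attached_x still holds the values from the time of attach(),
# while d$x holds the modified values.

# Roughly the idea behind with() for a data frame: evaluate an expression
# using the data as the environment and the caller's frame as enclosure.
total <- eval(quote(sum(x)), d, parent.frame())
```

This is why a check such as "my.dframe" %in% search() can only say that something with that name is attached, not that it matches the current object.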
Dear Simon,

A solution that skirts the issue is to introduce a new function -- something like

    Attach <- function(what, pos = 2, name = deparse(substitute(what))) {
        # Detach a previously attached copy, if there is one
        # (guard added so that the first call does not fail)
        if (name %in% search()) detach(pos = match(name, search()))
        attach(get(name, envir = .GlobalEnv), pos = pos, name = name)
    }

Then any previously attached version of the object is detached before the current one is attached. I think that this is the behaviour that would usually be desired, though there might be occasions where you want to preserve the previous version.

Regards,
John

----------------------------------------------------
John Fox
Department of Sociology
McMaster University
http://socserv.mcmaster.ca/jfox
----------------------------------------------------

> ------------Original Message-------------
> From: "Simon Fear" <Simon.Fear at synequanon.com>
> To: "Prof Brian Ripley" <ripley at stats.ox.ac.uk>
> Cc: Peter Dalgaard BSA <p.dalgaard at biostat.ku.dk>, r-help at stat.math.ethz.ch, Andreas Eckner <andreas.eckner at soundinvest.net>
> Date: Mon, Aug-4-2003 8:04 AM
> Subject: RE: [R] Problem with data.frames
>
> > So how can you tell if the data frame is on the search list
>
> "my.dframe" %in% search() ?
>
> Does anyone have an example of a case where attaching the same data
> frame twice would be useful?
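For the occasions where silently replacing the earlier copy is not wanted, a warning-based variant is one possibility. The sketch below is hypothetical: attach_checked() and its replace argument are names invented for this illustration and are not part of R. It checks only the name on the search path, so it cannot tell whether the attached copy still matches the current object.

```r
# Hypothetical helper (not part of R): attach a data frame, but warn
# if an entry with the same name is already on the search path.
attach_checked <- function(what, name = deparse(substitute(what)),
                           replace = FALSE) {
    if (name %in% search()) {
        if (!replace) {
            warning("'", name, "' is already on the search path; not attaching again")
            return(invisible(FALSE))
        }
        # replace = TRUE: drop the old copy before attaching the new one
        detach(pos = match(name, search()))
    }
    attach(what, name = name, warn.conflicts = FALSE)
    invisible(TRUE)
}

d <- data.frame(x = 1:3)                          # hypothetical data
first  <- attach_checked(d)                       # attaches
second <- suppressWarnings(attach_checked(d))     # already there: skipped
n_copies <- sum(search() == "d")
detach("d", character.only = TRUE)
```

Calling attach_checked(d, replace = TRUE) instead would behave like the re-attaching approach: the old entry is removed and the current object takes its place.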
Peter, Brian, and list,

Sorry if I was not clear (and obviously I wasn't). I do fully accept that different data frames *can* be given the same name. Then you'd have to access the first one created with an explicit get() at some specific position in the search list - or the one explicitly placed there using pos= - if indeed you could remember which one you wanted.

I think my point is that duplicating names strikes me as horrendous programming style - why on earth would I ever need to have two deliberately different objects coexisting with the same name? Or the same object, with the same name, more than once? Maybe I am missing the point in that I don't really use R interactively; I always source() or copy and paste to the prompt, so I probably think of debugging structured programs more than some users do.

If I can't convince everyone that this type of name duplication is best treated as an error, surely a warning would be good? At least an optional argument ...

In Peter D's example, the second invocation of f() should presumably create a "new" d? That would suggest TWO new arguments, replace= and warn=. The current behaviour corresponds to both being FALSE.

And Brian R. likes to have several tmp's. I just can't see why. (Well, actually, I can see that it saves the bother of detach()ing, and in an interactive session with nobody looking over your shoulder, who cares?) But if I need more than one temporary data frame I call them tmp1, tmp2, etc. I just don't like the idea of a load of old tmp's filling up my precious memory space and then not remembering which one was which. Surely it's a bug waiting to pounce (not sure about that metaphor, sorry).

SF

PS Before anyone tells me: name duplication is intentional and very handy in R - when writing and nesting functions. But then it is different. Lexical scope ensures that the most recently defined instance is used when the name is used (unless you do something psychopathic, such as use an explicit assign outside the current frame).
But these local instances disappear when the function closes. This is not the case for variables deliberately placed in the search path, which are meant to be "global", and in frame 1 (I think. I'm getting out of my depth here - spent too long using Splus). In Peter D's example you have to be very careful that you don't assign "x" in .GlobalEnv, else it will mask the one you probably wanted, which is d$x or get("x", pos=2) depending on how many times you've called f().

I'm beginning to go off the idea of attach()ing data frames altogether, the more I think about it.
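The masking problem described above can be sketched as follows, assuming a hypothetical data frame d with a single column x and a workspace that has no x to begin with.

```r
d <- data.frame(x = 1:3)             # hypothetical data
attach(d, warn.conflicts = FALSE)

from_attached <- x                   # found in the attached copy of d

x <- "global"                        # an x in the workspace now masks it
seen <- x                            # picks up the workspace x, not d's column

# The attached column is still reachable by naming the search-path entry
attached_x <- get("x", pos = "d")

rm(x)
detach(d)
```

Naming the entry ("d") rather than a numeric position avoids having to know how many times something has been attached in front of it.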
You believed right - I do indeed lack the vision of others to foresee any practical need to attach and detach within a recursive function, except perhaps for my own amusement. Should I do the recursive call in between the attach and detach, I wonder? Wouldn't it all be fun, watching that search path grow and grow, before the function eventually hit the first detach! And just imagine the hilarity you could have, attaching and detaching at different positions! I could have a thousand identical data sets, all called Eric!!!

It's one of the many situations in which I would very much like to get a warning or error message, pointing out to me that I had absolutely no idea what I was doing. Surely that's what warnings are for? For those of us who wonder why our code doesn't do what we think it should, until a long time after the deadline?

SF

-----Original Message-----
From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk]
Sent: 04 August 2003 14:34
To: Simon Fear
Cc: r-help at stat.math.ethz.ch
Subject: RE: [R] Problem with data.frames

Think about what happens if you call a function recursively, e.g. by Recall(), and that function includes an attach/detach pair. I believe it is your vision that is too limited, not other people's.

-- 
Brian D. Ripley, ripley at stats.ox.ac.uk
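The recursive case can be sketched like this. The function f, the depths vector and the use of on.exit() are assumptions made for the illustration; the thread itself only mentions an attach/detach pair inside a function called via Recall().

```r
# Records the search-path length at each level of the recursion
depths <- integer(0)

# Hypothetical recursive function containing an attach/detach pair
f <- function(d, n) {
    attach(d, warn.conflicts = FALSE)
    on.exit(detach("d", character.only = TRUE))   # paired detach, runs on exit
    depths <<- c(depths, length(search()))
    if (n > 1) Recall(d, n - 1)                   # recurse between attach and detach
    invisible(NULL)
}

base <- length(search())
f(data.frame(x = 1:3), 3)

extra <- depths - base           # extra entries seen at each recursion level
final <- length(search())        # search-path length after f() returns
```

Each level of the recursion legitimately adds another entry with the same name, and each one is removed again as the calls unwind, so a rule that refused duplicate names outright would break this pattern.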