gschultz at scriptpro.com
2010-May-24 14:29 UTC
[R] Data frames, passing by value, and performance
I understand that everything passed to an R function is passed "by value". This would seem to include data frames, which my current application uses heavily, both for storing program inputs and for holding intermediate and final results. In trying to get greater performance out of my R code, I am wondering if there is any clean way to access data frames without having them copied all the time. Or is my only option to make them global, and write to them using <<- ?

I have considered using matrices, but I like the self-documenting aspect of data frame column names. Input/output to disk is not the issue here, as that does not take long in my case. It's just the internal parameter passing that I'm concerned about.

(I've checked R-FAQ, R-lang and searched the R-help archives, but didn't see any specific mentions of this.)

Thanks.

Grant Schultz
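The sketch below shows the value semantics the question describes. It uses a small hypothetical data frame `inputs` and a hypothetical helper `add_total()`. The function changes only its own copy of the argument. The usual alternative to writing back with `<<-` is to return the modified data frame and reassign it in the caller.

```r
# Hypothetical inputs; 'inputs' and 'add_total' are illustrative names only.
inputs <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))

# Adds a column to its argument and returns the result.
# The change is made to the function's own copy of 'd'.
add_total <- function(d) {
  d$total <- d$a + d$b
  d
}

result <- add_total(inputs)
names(inputs)  # still only "a" and "b": the caller's object is untouched
names(result)  # "a", "b" and "total"

# Instead of writing to a global with <<-, reassign the returned value.
inputs <- add_total(inputs)
```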
Gabor Grothendieck
2010-May-24 14:34 UTC
If you don't modify the data frame in your function, it won't physically make a new copy.
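You can check the claim above on a given R installation with base R's `tracemem()`, which reports each time a traced object is duplicated. The sketch below uses two hypothetical helpers: `col_mean()` only reads its argument, and `zero_first()` modifies it. It assumes R was built with memory profiling, which `capabilities("profmem")` reports. The exact number of copies reported may vary between R versions.

```r
dat <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6))

# Hypothetical helper that only reads its argument.
col_mean <- function(d) mean(d$x)

# Hypothetical helper that writes to its argument, forcing a copy.
zero_first <- function(d) {
  d$x[1] <- 0
  d
}

if (capabilities("profmem")) {
  invisible(tracemem(dat))  # report whenever 'dat' is duplicated
  read_log  <- capture.output(invisible(col_mean(dat)))
  write_log <- capture.output(invisible(zero_first(dat)))
  untracemem(dat)
  # Reading is expected to report no copy; modifying reports at least one.
  cat("copies while reading:  ", length(read_log), "\n")
  cat("copies while modifying:", length(write_log), "\n")
}
```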
biostatmatt
2010-May-24 14:54 UTC
[R] Data frames, passing by value, and performance (Matt Shotwell)
R is pretty smart about duplicating only when necessary. That is, arguments passed to a function are copy-on-write. Also, I think (someone more knowledgeable please correct me if I'm wrong) it may be better to use the data frame, which is just a list internally, because if you only modify one column, only that column is duplicated, not the entire data frame. If you were to use a matrix, the entire matrix would require duplication.

-Matt
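This sketch illustrates the column-versus-matrix point using hypothetical objects `dat` and `mat`. It assumes memory profiling is available (`capabilities("profmem")`). The result for the untouched column depends on the R version in use: recent versions can share unmodified columns between the old and new data frame. The matrix is a single vector, so it is copied as a whole.

```r
dat <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6))
mat <- as.matrix(dat)

typeof(dat)  # "list": a data frame stores one vector per column
typeof(mat)  # "double": a matrix is one vector with a dim attribute

dat2 <- dat  # both names refer to the same object until one is modified
mat2 <- mat

if (capabilities("profmem")) {
  invisible(tracemem(dat$y))  # watch the column that will NOT be modified
  invisible(tracemem(mat))    # watch the whole matrix
  col_log <- capture.output(dat2$x[1] <- 0)     # change one data frame column
  mat_log <- capture.output(mat2[1, "x"] <- 0)  # change one matrix cell
  untracemem(dat$y)
  untracemem(mat)
  # In recent R versions col_log is typically empty (column y stays shared);
  # mat_log reports a copy of the entire matrix.
  cat("copies of untouched column y:", length(col_log), "\n")
  cat("copies of the matrix:", length(mat_log), "\n")
} else {
  dat2$x[1] <- 0
  mat2[1, "x"] <- 0
}
```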