Göran,

At 11:04 07/12/01 +0100, Göran Broström wrote:
>On Wed, 5 Dec 2001, Göran Broström wrote:
>
>[...]
>
>> My real problem is how to create a data frame in a sequentially growing
>> manner, when I know the final size (no of cases). I want to avoid to
>> call 'rbind' many times, and instead create an 'empty' data frame in
>> one call, and then fill it. Are there better ways of doing this?
>
>Got no answer to this one, so I provide one myself:
>
>The answer is: Yes, definitely. I did this, with pure R code, and
>created a new data frame with around 58000 records. It took 7 hours to
>run. I then did it with compiled code (Fortran), and that made a slight
>difference: It took 4.8 seconds(!).
>
>Göran

I seem to remember that R is not very efficient at creating/manipulating
large data frames. Did you consider doing it with a matrix with 58000 rows?
In that case, of course, all your columns *must* be of the same mode.

Emmanuel Paradis
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
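The matrix suggestion above might look roughly like the following. This is only a sketch: the column names and the values written into each row are placeholders, not taken from the thread, and it assumes every column can be stored as a double.

```r
# Sketch only: column names and row contents are illustrative placeholders.
n <- 58000

# Allocate the whole matrix once; all columns share one mode (double).
m <- matrix(NA_real_, nrow = n, ncol = 3,
            dimnames = list(NULL, c("id", "start", "stop")))

# Fill it row by row, as one would inside the real computation loop.
for (i in seq_len(n)) {
  m[i, ] <- c(i, i - 1, i)
}

# Convert to a data frame once, after the loop has finished.
df <- as.data.frame(m)
```

The point of the design is that the expensive data frame machinery is touched only once, at the end, rather than on every iteration.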
Are you sure that the time difference is *only* in creating the data frame,
rather than other computations in the loop?

Andy

> -----Original Message-----
> From: Göran Broström [mailto:gb at stat.umu.se]
> Sent: Friday, December 07, 2001 7:25 AM
> To: Prof Brian Ripley
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] rbind and data.frame
>
> On Fri, 7 Dec 2001, Prof Brian Ripley wrote:
>
> > On Fri, 7 Dec 2001, Göran Broström wrote:
> >
> > > On Wed, 5 Dec 2001, Göran Broström wrote:
> > >
> > > [...]
> > >
> > > > My real problem is how to create a data frame in a sequentially growing
> > > > manner, when I know the final size (no of cases). I want to avoid to
> > > > call 'rbind' many times, and instead create an 'empty' data frame in
> > > > one call, and then fill it. Are there better ways of doing this?
> > >
> > > Got no answer to this one, so I provide one myself:
> >
> > The usual answer is to create a data frame of the desired size and
> > populate it via indexing. That's in some books I know!
>
> I know that book too (thanks!). I did what you suggest, and that took 7
> hours to run. Definitely.
>
> Göran
>
> > > The answer is: Yes, definitely. I did this, with pure R code, and
> > > created a new data frame with around 58000 records. It took 7 hours to
> > > run. I then did it with compiled code (Fortran), and that made a slight
> > > difference: It took 4.8 seconds(!).
> > >
> > > Göran
>
> --
>  Göran Broström                      tel: +46 90 786 5223
>  professor                           fax: +46 90 786 6614
>  Department of Statistics            http://www.stat.umu.se/egna/gb/
>  Umeå University
>  SE-90187 Umeå, Sweden               e-mail: gb at stat.umu.se
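For reference, "create a data frame of the desired size and populate it via indexing" could be sketched as below. The column names, the size, and the values are made up for illustration; the thread does not show the actual code that took 7 hours, so this is not a reconstruction of it.

```r
# Sketch only: names, size and values are hypothetical.
n <- 1000

# Create the data frame at its final size in one call.
# stringsAsFactors = FALSE keeps 'label' as character rather than factor.
df <- data.frame(id = integer(n), value = numeric(n), label = character(n),
                 stringsAsFactors = FALSE)

# Populate it via indexing, one case at a time.
for (i in seq_len(n)) {
  df$id[i]    <- i
  df$value[i] <- sqrt(i)
  df$label[i] <- paste0("case", i)
}
```

Each `df$col[i] <- ...` goes through the data frame assignment method, so the per-iteration cost is higher than for a plain vector; whether that alone explains a run time of hours is exactly the question raised above.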
On 7 xxx -1, Emmanuel Paradis wrote:

[...]
> I seem to remember that R is not very efficient at creating/manipulating
> large data frames. Did you consider doing it with a matrix with 58000 rows?
> In that case, of course, all your columns *must* be of the same mode.

Yes, I tried reading from a data.frame, doing some calculations, and
writing to rows of a matrix. It is definitely faster than writing to a
data frame, but _much_ slower than compiled code. We also have to convert
the matrix to a data frame of a given type. It is not quite trivial,
because variable types and names have to be 'read' from the input data
frame, but I think I know how to do that.

I think the _real_ problem is that I have to do this in a loop, row by row,
because the input rows produce a variable number of output rows.

Göran
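One possible way around the row-by-row loop, offered only as a sketch: if the number of output rows per input row can be computed up front (an assumption; the thread does not say whether it can), the input rows can be replicated with a single indexing call, which also carries over the variable names and types. The input data frame and the `pieces` column below are hypothetical.

```r
# Sketch only: 'input' and its 'pieces' column (number of output rows
# produced by each input row) are hypothetical.
input <- data.frame(id = 1:4,
                    pieces = c(2L, 1L, 3L, 2L),
                    group = c("a", "b", "a", "b"),
                    stringsAsFactors = FALSE)

# Input row indices, each repeated once per output row it produces.
idx <- rep(seq_len(nrow(input)), times = input$pieces)

# A single indexing call creates all output rows; column names and
# types are inherited from the input data frame.
output <- input[idx, ]
rownames(output) <- NULL

# Values that differ between output rows are filled in as whole vectors.
# sequence() gives 1..k for each input row with k pieces.
output$piece <- sequence(input$pieces)
```

If the per-row computation cannot be vectorized, the same idea still applies to the allocation step: build `output` at full size this way, then fill the computed columns from plain vectors.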
Here are some timings from a 700 MHz laptop running Windows 2000:

> x.1 <- data.frame(a=integer(85000), b=double(85000), c=character(85000))
> str(x.1)
`data.frame':   85000 obs. of  3 variables:
 $ a: int  0 0 0 0 0 0 0 0 0 0 ...
 $ b: num  0 0 0 0 0 0 0 0 0 0 ...
 $ c: Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1 ...
#
# loading up a variable with a vector takes very little time
#
> system.time(x.1$a <- 1:85000)
[1] 0.03 0.00 0.03   NA   NA
> str(x.1)
`data.frame':   85000 obs. of  3 variables:
 $ a: int  1 2 3 4 5 6 7 8 9 10 ...
 $ b: num  0 0 0 0 0 0 0 0 0 0 ...
 $ c: Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1 ...
#
# a 'for' loop by itself is only 0.3 seconds
#
> system.time(for (i in 1:85000) invisible(1))
[1] 0.30 0.00 0.31   NA   NA
#
# it takes me 5 seconds to initialize 85,000 elements of one variable, so I
# would assume the total depends on how many variables there are and of what
# type.  For 'factors', I would assume you would declare those as 'character'
# and then convert to 'factor' at the end.
# so it seems fast; is there something I am missing?
#
> system.time(for (i in 1:85000) x.1$a[i] <- i)
[1] 5.12 0.04 5.22   NA   NA
>

"Liaw, Andy" <andy_liaw at merck.com>@stat.math.ethz.ch on 12/07/2001 10:32:31
Sent by:  owner-r-help at stat.math.ethz.ch
To:   r-help at stat.math.ethz.ch
cc:
Subject:  RE: [R] rbind and data.frame

Are you sure that the time difference is *only* in creating the data frame,
rather than other computations in the loop?

Andy
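The timings above could be extended with a comparison against a plain vector that is filled in the loop and assigned to the data frame once at the end. This is a sketch under the same sizes as above; no timings are quoted here because they depend on the machine and the R version.

```r
# Sketch only: same size as the timings above; results are machine dependent.
n <- 85000
x <- data.frame(a = integer(n), b = double(n))

# Element-wise assignment into a data frame column, as in the last timing.
t_df <- system.time(for (i in seq_len(n)) x$a[i] <- i)

# The same loop on a plain vector, assigned to the data frame once.
a <- integer(n)
t_vec <- system.time({
  for (i in seq_len(n)) a[i] <- i
  x$a <- a
})

cat("data frame loop, elapsed seconds:", t_df[["elapsed"]], "\n")
cat("plain vector loop, elapsed seconds:", t_vec[["elapsed"]], "\n")
```

Both variants leave the same values in `x$a`; only the place where the loop does its writing differs.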