thr3ads.net - R help - [R] rbind and data.frame [Dec 2001]

Emmanuel Paradis

2001-Dec-07 14:08 UTC

[R] rbind and data.frame

Göran,

At 11:04 07/12/01 +0100, Göran Broström wrote:
>On Wed, 5 Dec 2001, Göran Broström wrote:
>
>[...]
>
>> My real problem is how to create a data frame in a sequentially growing
>> manner, when I know the final size (no of cases). I want to avoid to
>> call 'rbind' many times, and instead create an 'empty' data frame in
>> one call, and then fill it. Are there better ways of doing this?
>
>Got no answer to this one, so I provide one myself:
>
>The answer is: Yes, definitely. I did this, with pure  R  code, and
>created a new data frame with around 58000 records. It took 7 hours to
>run. I then did it with compiled code (Fortran), and that made a slight
>difference:  It took 4.8 seconds(!).
>
>Göran

I seem to remember that R is not very efficient at creating/manipulating
large data frames. Did you consider doing it with a matrix with 58000 rows?
In that case, of course, all your columns *must* be of the same mode.
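
A minimal sketch of the matrix approach suggested above might look like the following. It is not from the original message: the column names and the values written into each row are hypothetical, and only the row count (58000) comes from the thread.

```r
# Hypothetical column names and values, for illustration only.
n <- 58000
out <- matrix(NA_real_, nrow = n, ncol = 3,
              dimnames = list(NULL, c("id", "start", "stop")))

# Fill the preallocated matrix row by row.
# Every cell must be of the same mode (numeric here).
for (i in seq_len(n)) {
  out[i, ] <- c(i, i - 1, i + 0.5)
}

# Convert to a data frame once, at the end, if one is needed.
res <- as.data.frame(out)
```

Columns of another mode (character, factor) would have to be kept in separate vectors and attached after the conversion.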

Emmanuel Paradis
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Liaw, Andy

2001-Dec-07 15:32 UTC

[R] rbind and data.frame

Are you sure that the time difference is *only* in creating the data frame,
rather than other computations in the loop?

Andy
> -----Original Message-----
> From: Göran Broström [mailto:gb at stat.umu.se]
> Sent: Friday, December 07, 2001 7:25 AM
> To: Prof Brian Ripley
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] rbind and data.frame
>
> On Fri, 7 Dec 2001, Prof Brian Ripley wrote:
>
> > On Fri, 7 Dec 2001, Göran Broström wrote:
> >
> > > On Wed, 5 Dec 2001, Göran Broström wrote:
> > >
> > > [...]
> > >
> > > > My real problem is how to create a data frame in a sequentially growing
> > > > manner, when I know the final size (no of cases). I want to avoid to
> > > > call 'rbind' many times, and instead create an 'empty' data frame in
> > > > one call, and then fill it. Are there better ways of doing this?
> > >
> > > Got no answer to this one, so I provide one myself:
> >
> > The usual answer is to create a data frame of the desired size and
> > populate it via indexing.  That's in some books I know!
>
> I know that book too (thanks!). I did what you suggest, and that took 7
> hours to run. Definitely.
>
> Göran
>
> > > The answer is: Yes, definitely. I did this, with pure  R  code, and
> > > created a new data frame with around 58000 records. It took 7 hours to
> > > run. I then did it with compiled code (Fortran), and that made a slight
> > > difference:  It took 4.8 seconds(!).
> > >
> > > Göran
>
> --
>  Göran Broström                      tel: +46 90 786 5223
>  professor                           fax: +46 90 786 6614
>  Department of Statistics            http://www.stat.umu.se/egna/gb/
>  Umeå University
>  SE-90187 Umeå, Sweden               e-mail: gb at stat.umu.se

Göran Broström

2001-Dec-07 15:47 UTC

[R] rbind and data.frame

On 7 Dec 2001, Emmanuel Paradis wrote:

[...]
> I seem to remember that R is not very efficient at creating/manipulating
> large data frames. Did you consider doing it with a matrix with 58000 rows?
> In that case, of course, all your columns *must* be of the same mode.

Yes, I tried reading from a data.frame, doing some calculations,
and writing to rows of a matrix. It is definitely faster than writing to
a data frame, but _much_ slower than compiled code. We also have to 
convert the matrix to a data frame of a given type. It is not quite 
trivial, because variable types and names have  to be 'read' from the 
input data frame, but I think I know how to do that.

I think the _real_ problem is that I have to do this in a loop, row by 
row, because the input rows produce a variable number of output rows.
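
One way to handle a variable number of output rows per input row without calling 'rbind' inside the loop is sketched below. It is not from the original message: the input data frame, its 'n.out' column and the output columns are all hypothetical, and the real per-row calculation is replaced by a placeholder.

```r
# Hypothetical input: each row carries the number of output rows it produces.
input <- data.frame(id = 1:4, n.out = c(2, 0, 3, 1))

# Preallocate a list with one slot per input row.
pieces <- vector("list", nrow(input))
for (i in seq_len(nrow(input))) {
  k <- input$n.out[i]
  # Placeholder calculation: a small data frame with k rows (possibly zero).
  pieces[[i]] <- data.frame(id = rep(input$id[i], k), seq = seq_len(k))
}

# A single rbind over all pieces instead of one call per input row.
result <- do.call(rbind, pieces)

# If the per-row counts are known up front, the loop can be dropped
# by expanding a row index with rep().
idx <- rep(seq_len(nrow(input)), times = input$n.out)
result2 <- data.frame(id = input$id[idx], seq = sequence(input$n.out))
```

Both variants copy the accumulated data only once; whether the second applies depends on whether the output count can be computed before the row-level work.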

Göran


james.holtman@convergys.com

2001-Dec-07 19:14 UTC

[R] rbind and data.frame

Here are some timings from a 700 MHz laptop running Windows 2000:

> x.1 <- data.frame(a=integer(85000), b=double(85000), c=character(85000))
> str(x.1)
`data.frame':   85000 obs. of  3 variables:
 $ a: int  0 0 0 0 0 0 0 0 0 0 ...
 $ b: num  0 0 0 0 0 0 0 0 0 0 ...
 $ c: Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1 ...
#
# loading up a variable with a vector takes very little time
#
> system.time(x.1$a <- 1:85000)
[1] 0.03 0.00 0.03   NA   NA
> str(x.1)
`data.frame':   85000 obs. of  3 variables:
 $ a: int  1 2 3 4 5 6 7 8 9 10 ...
 $ b: num  0 0 0 0 0 0 0 0 0 0 ...
 $ c: Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1 ...
#
# a 'for' loop by itself is only 0.3 seconds
#
> system.time(for (i in 1:85000) invisible(1))
[1] 0.30 0.00 0.31   NA   NA
#
# it takes me 5 seconds to initialize 85,000 of a variable, so I would assume
# it would depend on how many and what type.  If 'factors', I would assume
# you would declare those as 'character' and then convert to 'factor' at
# the end.
# so it seems fast; is there something I am missing?
#
> system.time(for (i in 1:85000) x.1$a[i] <- i)
[1] 5.12 0.04 5.22   NA   NA
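
The comparison above can be extended with a variant that fills a plain vector in the loop and assigns it to the column once at the end. This is a sketch, not part of the original timings: the row count is reduced to 20000 (an assumption, to keep the run short), and the measured times will differ by machine and R version.

```r
# Assumed size, smaller than the 85000 rows used above.
n <- 20000
x.1 <- data.frame(a = integer(n), b = double(n))

# Element-wise assignment into the data frame column, as in the timing above.
t.df <- system.time(for (i in seq_len(n)) x.1$a[i] <- i)

# The same loop on a plain vector, assigned to the column once at the end.
a <- integer(n)
t.vec <- system.time({
  for (i in seq_len(n)) a[i] <- i
  x.1$a <- a
})

cat("data frame column:", t.df[["elapsed"]], "s\n")
cat("plain vector:     ", t.vec[["elapsed"]], "s\n")
```

The vector version would typically be expected to run faster, since it avoids dispatching the data frame assignment method on every iteration.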
