thr3ads.net - R help - [R] List of lists? Data frames? (Or other data structures?) [May 2003]

R A F

2003-May-01 00:51 UTC

[R] List of lists? Data frames? (Or other data structures?)

Hi, I'm faced with the following problem and would appreciate some
advice.

I could have a data frame x that looks like this:
         aa          bb
a        1           "A"
b        2           "B"

The advantage of this is that I could access all the individual
components easily.  Also I could access all the rows and columns
easily.
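
A minimal sketch of the data frame approach, assuming a current version of R. The `stringsAsFactors = FALSE` argument keeps `bb` as character, because older versions of R turned it into a factor by default. The names `row_a`, `col_bb` and `elem` are only for illustration.

```r
# Build the data frame x from the example above: row names "a" and "b",
# a numeric column aa and a character column bb.
x <- data.frame(aa = c(1, 2), bb = c("A", "B"),
                row.names = c("a", "b"), stringsAsFactors = FALSE)

row_a  <- x["a", ]      # a whole row (still a one-row data frame)
col_bb <- x$bb          # a whole column (a character vector)
elem   <- x["b", "aa"]  # a single component

print(row_a)
print(col_bb)
print(elem)
```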

Alternatively, I could have a list of lists that looks like this:

xprime <- list()
xprime$a <- list()
xprime$b <- list()

xprime$a$aa <- 1
xprime$a$bb <- "A"

xprime$b$aa <- 2
xprime$b$bb <- "B"

etc.

If speed is important, would a list of lists be faster than a data
frame? (I know, for example, that scan is supposed to be faster than
read.table, but I don't know if that is related to issues with data
frames.)

My problem with a list of lists, though, is that if I want to access
all the bb subcomponents, a naive method like this one failed:

y <- c( "a", "b" )
xprime[[ y ]]$bb    # does not work

So to get all the bb subcomponents I seem to need to loop, which may
slow things down (presumably).  But maybe people here know of a way.
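
The naive call fails because `[[` with a character vector of length two indexes recursively. `xprime[[c("a", "b")]]` asks for the element named `"b"` inside `xprime$a`, not for both `a` and `b`. Here is a minimal sketch of one loop-free way to collect all the `bb` components, assuming a current version of R (`all_bb` and `all_aa` are illustrative names). `sapply()` still iterates internally, so it is mainly more concise than an explicit loop and not necessarily faster.

```r
# Build the list of lists from the example above in one call.
xprime <- list(a = list(aa = 1, bb = "A"),
               b = list(aa = 2, bb = "B"))

y <- c("a", "b")

# xprime[[y]] would index recursively, i.e. xprime[["a"]][["b"]],
# so it cannot return the bb component of both a and b.

# Instead, select the sublists named in y with single brackets and
# pull "bb" out of each one, using "[[" as an ordinary function.
all_bb <- sapply(xprime[y], "[[", "bb")
print(all_bb)

# The same idea with an anonymous function, for the component aa.
all_aa <- sapply(xprime[y], function(el) el$aa)
print(all_aa)
```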

Finally, what would be the "best" way, given the constraint of quick
access to all rows, columns and individual components?

I'd appreciate your thoughts and comments.  Thanks very much.

Roger Peng

2003-May-01 01:25 UTC

If you're talking about rows and columns, it seems like the appropriate
data structure for you is the data frame.  I think your list of lists
representation might get unwieldy after a while.  I can't really think of
why a data frame would be any slower than a list of lists -- I've never
experienced such behavior.

read.table() may be a little slower than scan() because read.table() reads
in an entire file and then converts each of the columns into an
appropriate data class.  So there is some post-processing going on.  It
doesn't have anything to do with data frames vs. lists.
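
To make the post-processing concrete: according to the `read.table()` documentation, when `colClasses` is not given every column is first read as character and then converted with `type.convert()`. Below is a minimal sketch of those two steps done by hand on a small made-up table. It assumes a current version of R, because `scan(text = ...)` and `type.convert(as.is = TRUE)` may not be available in the R 1.x releases discussed here.

```r
# A small made-up table held in memory instead of a file.
lines <- c("1 A 3.5",
           "2 B 4.5")

# Step 1: read every field as character, one list element per column.
raw <- scan(text = lines, what = list("", "", ""), quiet = TRUE)

# Step 2: guess a class for each column, which is the extra work
# read.table() does when colClasses is missing. as.is = TRUE keeps
# strings as character instead of converting them to factors.
converted <- lapply(raw, type.convert, as.is = TRUE)

str(converted)
```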

-roger
_______________________________
UCLA Department of Statistics
http://www.stat.ucla.edu/~rpeng

R A F

2003-May-01 11:49 UTC

Thanks for your comments.  I'm not too familiar with these differences,
but here's a simple experiment.  In a data file with 139,000 rows and
5 columns (double, string, double, double, double):

> system.time( aaa <- read.table( "file" ) )
20.67 0.41 21.10 0.00 0.00
> system.time( aaa <- scan( "file", list( 0, "", 0, 0, 0 ) ) )
6.07 0.01 6.09 0.00 0.00

It seems like scan is much faster -- and as the data file grows,
read.table seems to choke.  (I actually tried this with a data file
with over 2 million rows.)

I'm using a Sun SPARC, Solaris 2.8 and R 1.5.1.  Sorry I can't be
more specific about the hardware and software configuration; I'm not
too knowledgeable about this sort of thing.

By the way, it's not possible to create a matrix of mixed types, is
it?  (I don't know how anyway.)
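
A small sketch of the matrix question, assuming a current version of R. An ordinary matrix holds a single atomic type, so combining numbers and strings with `cbind()` coerces everything to character. A data frame keeps one type per column instead. The names `m` and `d` are illustrative.

```r
# Combine a numeric column and a character column into a matrix.
m <- cbind(aa = c(1, 2), bb = c("A", "B"))

# Every entry has been coerced to character, including the numbers.
print(mode(m))
print(m[, "aa"])

# A data frame keeps a separate type for each column.
d <- data.frame(aa = c(1, 2), bb = c("A", "B"), stringsAsFactors = FALSE)
print(sapply(d, class))
```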

Any ideas as to the speed differences?  Thanks again.
>From: Prof Brian Ripley <ripley at stats.ox.ac.uk>
>To: Roger Peng <rpeng at stat.ucla.edu>
>CC: r-help at stat.math.ethz.ch, R A F <raf1729 at hotmail.com>
>Subject: Re: [R] List of lists?  Data frames? (Or other data structures?)
>Date: Thu, 1 May 2003 08:42:55 +0100 (BST)
>
>On Wed, 30 Apr 2003, Roger Peng wrote:
>
> > If you're talking about rows and columns, it seems like the appropriate
> > data structure for you is the data frame.  I think your list of lists
> > representation might get unwieldy after a while.  I can't really think of
> > why a data frame would be any slower than a list of lists -- I've never
> > experienced such behavior.
> >
> > read.table() may be a little slower than scan() because read.table() reads
> > in an entire file and then converts each of the columns into an
> > appropriate data class.  So there is some post-processing going on.  It
> > doesn't have anything to do with data frames vs. lists.
>
>Only if you don't specify colClasses: if you do (and you would need the
>information to use scan()) there should be no performance penalty. (Note
>that matrices can be scan()-ed into a vector and the dimensions added, and
>that will be faster.)
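
A minimal sketch of both hints, assuming a current version of R. It uses two tiny made-up files written to temporary locations. The five-column layout mirrors the file discussed in this thread, but the data are invented. `colClasses` tells `read.table()` the column types so no conversion step is needed. An all-numeric table can be read with `scan()` into one vector, and its dimensions are then set with `dim<-`.

```r
# A tiny made-up five-column file (double, string, double x 3).
f <- tempfile(fileext = ".txt")
writeLines(c("1.5 A 2 3 4",
             "2.5 B 5 6 7"), f)

# Hint 1: give read.table() the column classes up front, so that no
# type guessing is needed after the fields are read.
cls <- c("numeric", "character", "numeric", "numeric", "numeric")
df <- read.table(f, colClasses = cls)
str(df)

# Hint 2: an all-numeric (made-up) table can be scan()-ed into one
# vector and the dimensions added afterwards.
g <- tempfile(fileext = ".txt")
writeLines(c("1 2 3",
             "4 5 6"), g)
v <- scan(g, what = 0, quiet = TRUE)  # values in file order: 1 2 3 4 5 6
dim(v) <- c(3, 2)                     # each file row becomes a column here
mat <- t(v)                           # transpose so file rows are matrix rows
print(mat)

unlink(c(f, g))
```

The transpose is only needed because the file is stored row by row while R fills matrices column by column.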

R A F

2003-May-01 12:20 UTC

Ah, thanks!

(It's not that I didn't read it -- I didn't understand it, so I
thought it would be easier to ask again.  Thanks very much!)
>From: Peter Dalgaard BSA <p.dalgaard at biostat.ku.dk>
>To: "R A F" <raf1729 at hotmail.com>
>CC: ripley at stats.ox.ac.uk, rpeng at stat.ucla.edu, r-help at stat.math.ethz.ch
>Subject: Re: [R] List of lists? Data frames? (Or other data structures?)
>Date: 01 May 2003 14:19:32 +0200
>
>You're not taking Brian's hint!:

R A F

2003-May-01 12:57 UTC

For what it's worth, I followed the suggestion of using colClasses:

cls <- c( "numeric", "character", "numeric",
"numeric", "numeric" )
system.time( bbb <- read.table( "file", colClasses = cls ) )

Here are the results from three runs:
8.21 0.06 8.28 0.00 0.00
8.94 0.10 9.10 0.00 0.00
8.55 0.06 8.69 0.00 0.00

I also ran
system.time( aaa <- scan( "file", list( 0, "", 0, 0, 0 ) ) )
three times:

6.46 0.04 6.59 0.00 0.00
5.27 0.04 5.33 0.00 0.00
5.14 0.05 5.19 0.00 0.00

By the way, I did the experiment in the order bbb, aaa, bbb, aaa,
bbb, aaa.

So it appears that read.table is still a little slower -- but it could
be just me doing something wrong.
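
A sketch for rerunning this comparison on a synthetic file, assuming a current version of R. The file size, seed and contents are made up, and timings depend on machine and R version, so no numbers are shown. The last step checks that both readers return the same data, because `scan()` gives a list of columns that still has to be turned into a data frame.

```r
# Made-up synthetic data in the same layout: double, string, double x 3.
set.seed(1)
n <- 10000
f <- tempfile(fileext = ".txt")
dat <- data.frame(x1 = runif(n), s = sample(LETTERS, n, replace = TRUE),
                  x3 = runif(n), x4 = runif(n), x5 = runif(n),
                  stringsAsFactors = FALSE)
write.table(dat, f, row.names = FALSE, col.names = FALSE, quote = FALSE)

cls <- c("numeric", "character", "numeric", "numeric", "numeric")

# Time each reader three times, in the order bbb, aaa, bbb, aaa, ...
for (i in 1:3) {
  print(system.time(bbb <- read.table(f, colClasses = cls)))
  print(system.time(aaa <- scan(f, what = list(0, "", 0, 0, 0),
                                quiet = TRUE)))
}

# scan() returns an unnamed list of columns; name it like bbb and
# convert it to a data frame so the two results can be compared.
names(aaa) <- names(bbb)
aaa_df <- as.data.frame(aaa, stringsAsFactors = FALSE)
print(all.equal(aaa_df, bbb))

unlink(f)
```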

Thanks.
