mfrumin wrote:
> Hey all; I'm a beginner++ user of R, trying to use it to do some processing
> of data sets of over 1M rows, and running into a snafu. Imagine that my
> input is a huge table of transactions, each linked to a specific user id. As
> I run through the transactions, I need to update a separate table for the
> users, but I am finding that the traditional ways of doing a table lookup
> are way too slow to support this kind of operation.
>
> i.e.:
>
> for (i in 1:1000000) {
>   userid <- transactions$userid[i]
>   amt <- transactions$amounts[i]
>   users[users$id == userid, 'amt'] <- users[users$id == userid, 'amt'] + amt
> }
>
> I assume this is a linear scan through the users table (which has tens of
> thousands of rows), when really what I need is constant-time lookup, or at
> worst O(log(# users)).
>
> Is there any way to manage a list of IDs (be they numeric, string, etc.)
> and have them efficiently mapped to some other table index?
>
> I see the CRAN package for SQLite hashes, but that seems to be going a bit
> too far.
>
Sometimes you need a bit of lateral thinking. I suspect that you could
do it like this:
tbl <- with(transactions, tapply(amounts, userid, sum))
users$amt <- users$amt + tbl[as.character(users$id)]
One catch is that there could be users with no transactions, in which
case you may need to replace userid by factor(userid, levels = users$id)
and then treat the resulting NAs in tbl as zero.
None of this is tested, of course.
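
For what it's worth, here is the same idea written out as a complete but
still untested sketch. It assumes the column names from your post
(transactions$userid, transactions$amounts, users$id, users$amt) and rolls
in the factor() trick plus the NA handling mentioned above:

## One pass over the transactions: sum the amounts per user.  Using
## factor(..., levels = users$id) forces one entry per user, in the same
## order as the users table; users with no transactions come out as NA.
per.user <- with(transactions,
                 tapply(amounts, factor(userid, levels = users$id), sum))

## Users with no transactions should add nothing.
per.user[is.na(per.user)] <- 0

## Vectorised update of the users table, no per-row lookup needed.
## as.vector() drops the array attributes tapply attaches to its result.
users$amt <- users$amt + as.vector(per.user)

And if you really do need a row index per transaction (the constant-time
mapping you asked about), match(transactions$userid, users$id) returns the
matching row of the users table for every transaction in a single
vectorised call, with no explicit loop at all.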