thr3ads.net - R devel - [Rd] O(?) access time for rownames/colnames/named dimensions [Aug 2007]

If this information is useful, please help other people find it:
Share via:

mfrumin

2007-Aug-03 11:06 UTC

[Rd] O(?) access time for rownames/colnames/named dimensions

Dear princely R developers to whom I owe so much of my daily productivity
level,

I am wondering how indexing by row and column names are implemented. That
is, when I have a matrix with named rows (or columns) and I then index into
that matrix using the string names (rather than integers), how does R scan
through the list of names in order to figure out which row/column to use?
My current suspicion is that it's a linear search, but I've been wrong
many
times before.

If it is linear, may I ask if the possibility of implementing the
name-to-integer-index lookup in a more efficient way as ever been discussed?
For example, why not hash the string indexes to quickly map them to integer
indexes?

Of course adding this sort of functionality would make it somewhat more
expensive to add rows/columns anywhere but the end of the matrix, but in
many cases I suspect it would be worth it. Or perhaps there would be a way
to enable this selectively on each named dimension?

In my use case, I am building an array with named dimensions once, and then
doing lots of reading/updates into that matrix without changing its shape.
This is proving to be rather costly at the moment and I've taken to using a
hashed environment to do lookups where it really counts, but this is rather
inconvenient.

I'm sure this has been thought about before, but I didn't see anything
in
this mailing list.

Thanks,
Michael
--
View this message in context:
http://www.nabble.com/O%28-%29-access-time-for-rownames-colnames-named-dimensions-tf4211918.html#a11981293
Sent from the R devel mailing list archive at Nabble.com.

Prof Brian Ripley

2007-Aug-03 20:14 UTC

head link

[Rd] O(?) access time for rownames/colnames/named dimensions

On Fri, 3 Aug 2007, mfrumin wrote:
>
> Dear princely R developers to whom I owe so much of my daily productivity
> level,
>
> I am wondering how indexing by row and column names are implemented.  That
> is, when I have a matrix with named rows (or columns) and I then index into
> that matrix using the string names (rather than integers), how does R scan
> through the list of names in order to figure out which row/column to use?
> My current suspicion is that it's a linear search, but I've been
wrong many
> times before.
>
> If it is linear, may I ask if the possibility of implementing the
> name-to-integer-index lookup in a more efficient way as ever been
discussed?
> For example, why not hash the string indexes to quickly map them to integer
> indexes?
We do, for large enough indices.  From ONEWS

     o	Indexing a vector by a character vector was slow if both the
 	vector and index were long (say 10,000).  Now hashing is used
 	and the time should be linear in the longer of the lengths
 	(but more memory is used).

See src/main/subscript.c (the code is also used for array dimensions).

The hash table is not kept, since repeated lookups are not common (R is a 
vectorized language).
> Of course adding this sort of functionality would make it somewhat more
> expensive to add rows/columns anywhere but the end of the matrix, but in
> many cases I suspect it would be worth it.  Or perhaps there would be a way
> to enable this selectively on each named dimension?
>
> In my use case, I am building an array with named dimensions once, and then
> doing lots of reading/updates into that matrix without changing its shape.
> This is proving to be rather costly at the moment and I've taken to
using a
> hashed environment to do lookups where it really counts, but this is rather
> inconvenient.
>
> I'm sure this has been thought about before, but I didn't see
anything in
> this mailing list.
>
> Thanks,
> Michael
>
-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Seemingly Similar Threads

Search for more seemingly similar threads

R devel - Aug 2007 - O(?) access time for rownames/colnames/named dimensions

[Rd] O(?) access time for rownames/colnames/named dimensions

[Rd] O(?) access time for rownames/colnames/named dimensions

Seemingly Similar Threads