mfrumin
2007-Aug-03 11:06 UTC
[Rd] O(?) access time for rownames/colnames/named dimensions
Dear princely R developers to whom I owe so much of my daily productivity level, I am wondering how indexing by row and column names are implemented. That is, when I have a matrix with named rows (or columns) and I then index into that matrix using the string names (rather than integers), how does R scan through the list of names in order to figure out which row/column to use? My current suspicion is that it's a linear search, but I've been wrong many times before. If it is linear, may I ask if the possibility of implementing the name-to-integer-index lookup in a more efficient way as ever been discussed? For example, why not hash the string indexes to quickly map them to integer indexes? Of course adding this sort of functionality would make it somewhat more expensive to add rows/columns anywhere but the end of the matrix, but in many cases I suspect it would be worth it. Or perhaps there would be a way to enable this selectively on each named dimension? In my use case, I am building an array with named dimensions once, and then doing lots of reading/updates into that matrix without changing its shape. This is proving to be rather costly at the moment and I've taken to using a hashed environment to do lookups where it really counts, but this is rather inconvenient. I'm sure this has been thought about before, but I didn't see anything in this mailing list. Thanks, Michael -- View this message in context: http://www.nabble.com/O%28-%29-access-time-for-rownames-colnames-named-dimensions-tf4211918.html#a11981293 Sent from the R devel mailing list archive at Nabble.com.
Prof Brian Ripley
2007-Aug-03 20:14 UTC
[Rd] O(?) access time for rownames/colnames/named dimensions
On Fri, 3 Aug 2007, mfrumin wrote:> > Dear princely R developers to whom I owe so much of my daily productivity > level, > > I am wondering how indexing by row and column names are implemented. That > is, when I have a matrix with named rows (or columns) and I then index into > that matrix using the string names (rather than integers), how does R scan > through the list of names in order to figure out which row/column to use? > My current suspicion is that it's a linear search, but I've been wrong many > times before. > > If it is linear, may I ask if the possibility of implementing the > name-to-integer-index lookup in a more efficient way as ever been discussed? > For example, why not hash the string indexes to quickly map them to integer > indexes?We do, for large enough indices. From ONEWS o Indexing a vector by a character vector was slow if both the vector and index were long (say 10,000). Now hashing is used and the time should be linear in the longer of the lengths (but more memory is used). See src/main/subscript.c (the code is also used for array dimensions). The hash table is not kept, since repeated lookups are not common (R is a vectorized language).> Of course adding this sort of functionality would make it somewhat more > expensive to add rows/columns anywhere but the end of the matrix, but in > many cases I suspect it would be worth it. Or perhaps there would be a way > to enable this selectively on each named dimension? > > In my use case, I am building an array with named dimensions once, and then > doing lots of reading/updates into that matrix without changing its shape. > This is proving to be rather costly at the moment and I've taken to using a > hashed environment to do lookups where it really counts, but this is rather > inconvenient. > > I'm sure this has been thought about before, but I didn't see anything in > this mailing list. > > Thanks, > Michael >-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595