On Wed, 13 Jan 2010, Benjamin Tyner wrote:
> The MKsetup() in unique.c throws an error if the vector to be hashed is
> longer than (2^32)/8:
>
> if(n < 0 || n > 536870912) /* protect against overflow to -ve */
> error(_("length %d is too large for hashing"), n);
>
> I occasionally work with vectors longer than this on 64-bit builds. Would it
> be too much to ask that R can take advantage of all 64 bits for hashing when
> compiled as such?
'All 64 bits' of what? All systems we use have 64 bit integer types,
but there are good reasons not to use them where not needed, and 'int'
is not 64-bit on any R platform. I don't see the connection to 64-bit
pointers, which is what is most often meant by a '64-bit build'.
Efficiency would be a major consideration with such long vectors.
What type(s) are you contemplating, and are they full of duplicates?
If the latter, we could simply allow K=29. Otherwise likely a new
approach would be needed.
I think the way forward is for you to do some experiments and submit
proposed code changes with supporting evidence. (It seems only you are
interested.)
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595