On Wed, 17 Feb 2010, S. Few wrote:
> Currently using R 2.9.2
> Win XP
>
>
> Goal: scale up computing capacity for large datasets (1-5 million records)
>
> I realize under 32 bit versions of R the memory.limit is maxed at 4GB.
>
> Q:
> 1. What are the limits under 64 bit versions of R? Are those limits OS
> dependent?
I'm not sure of the exact figure, but it is known, and it is large enough not to
be a practical constraint at the moment.
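For what it's worth, on Windows you can query or raise the per-session cap from
within R; roughly (the 16000 below is just an illustrative figure):

  memory.limit()              # current limit, in Mb (Windows-only)
  memory.limit(size = 16000)  # request ~16Gb, if the OS can supply it
  memory.size(max = TRUE)     # most memory obtained from Windows so far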
> 2. Are there limits to the size of individual objects??
Individual vectors are still limited to 2^31 - 1 entries, so a matrix can have at
most about 2 billion elements, a data frame at most about 2 billion rows and 2
billion columns, and so on. This is likely to be the binding constraint in the
near future, but note that 2^31 integers is an 8Gb vector and 2^31 doubles is 16Gb.
There will also be some limit on the number of objects. I don't know if we
even know what it is, but it will be large.
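As a rough check on the vector sizes above:

  .Machine$integer.max       # 2^31 - 1, the maximum length of a vector
  2^31 * 4 / 2^30            # 8 Gb for a full-length integer vector
  2^31 * 8 / 2^30            # 16 Gb for a full-length double vector
  object.size(numeric(1e6))  # about 8 Mb; memory use scales linearly from here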
> 3. Are there limits or problems in using functions such as lm(),
> glm(), rpart(), doBy package, MASS, etc?
I don't think so. The differences should not be visible at the interpreted
level, and packages whose compiled code is not 64-bit clean would already have
broken. Obviously, algorithms whose running time grows faster than linearly in
the sample size will get very painful.
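If you want to gauge that on something like your own data before committing, a
quick (entirely made-up) timing sketch:

  n <- 1e6
  d <- data.frame(y = rnorm(n), x1 = rnorm(n), x2 = rnorm(n))
  system.time(lm(y ~ x1 + x2, data = d))             # time at 1 million rows
  system.time(lm(y ~ x1 + x2, data = d[1:(n/2), ]))  # roughly half, if the fit scales linearly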
-thomas
Thomas Lumley Assoc. Professor, Biostatistics
tlumley at u.washington.edu University of Washington, Seattle