Furthermore, I am not even able to take a sample of my large vector
(which does exist somehow and is in memory):
> sampleOfBigVector <- c(range(myBigVector),sample(myBigVector, 1000))
Error: cannot allocate vector of size 718.0 Mb
I guess I don't know what else I can do now, except find some cluster
with a lot of memory to run this code on (presumably I'd be able to
allocate those vectors then)?
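
One possibility that might avoid the allocation altogether, sketched here under a strong assumption: if the observations only take a modest number of distinct values (as in the rep(1:940, ...) example, where every value is an integer from 1 to 940), the two-sample KS statistic can be computed from per-value counts instead of the expanded vectors. The helper name ks_stat_from_counts and the constant k are hypothetical, introduced only for illustration. This computes the statistic D only, not a p-value.

```r
# Two-sample KS statistic D = max |F_x(v) - F_y(v)|, computed from counts.
# Assumption: both samples take values in the integers 1..k, and
# counts_x[v] / counts_y[v] hold how often value v occurs in each sample.
ks_stat_from_counts <- function(counts_x, counts_y) {
  stopifnot(length(counts_x) == length(counts_y))
  cdf_x <- cumsum(counts_x) / sum(counts_x)  # empirical CDF of sample x
  cdf_y <- cumsum(counts_y) / sum(counts_y)  # empirical CDF of sample y
  max(abs(cdf_x - cdf_y))
}

k <- 940
# Counts equivalent to rep(1:940, 100000) and rep(1:940, 800),
# without ever building the 94-million-element vector.
counts_x <- rep(100000, k)
counts_y <- rep(800, k)
D <- ks_stat_from_counts(counts_x, counts_y)
```

For a vector that is already in memory and really does hold small positive integers, tabulate(myBigVector, nbins = k) should give the counts. Whether that fits in memory here is untested.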
Jonathan
On Tue, Mar 9, 2010 at 4:11 PM, Jonathan <jonsleepy at gmail.com> wrote:
> Hi R-help,
> I am interested in comparing two vectors of data
> observations to see if they come from the same distribution (and have
> settled on the Kolmogorov-Smirnov test to do this).
>
> I'd prefer to use all my data points, but computationally speaking,
> this is proving to be troublesome due to the size of my vectors (the
> larger of the two is about 90 million observations). I suppose I
> could take a smaller sample of points from that large vector to use as
> input in my ks-test, but I want to see if I can avoid doing that, in
> favor of including all of the data.
>
> Code:
>> result <- ks.test(rep(1:940,100000),rep(1:940,800))
> Error: cannot allocate vector of size 358.6 Mb
>
> Any ideas?
>
> OS: Windows 7 64-bit, R ver. 2.10.1, Memory: 4 gb
>
> Best,
> Jonathan