Let me fix a couple of typos in that email:
Hi All:
Let's say I have two data frames (Condition1 and Condition2) with roughly
12,000 and 16,000 rows respectively, and one column each. The entries
are dates.
For each possible pair of dates (that is, each of the 12,000 dates in
Condition1 paired with each of the 16,000 dates in Condition2), I'd like
to calculate the difference in days between the two dates in the pair.
The result would be a 12,000-by-16,000 matrix, which I'll call M. The
only reason for building M is to draw a histogram of all the values it
contains.
Example:
Condition1 <- data.frame(dates = rep(c("2001-02-10", "1998-03-14"), 6000))
Condition2 <- data.frame(dates = rep(c("2003-07-06", "2007-03-11"), 8000))
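For reference, each entry of M is one Condition1 date minus one Condition2 date, in days. The sketch below computes the top-left 2-by-2 corner of M with outer(). It makes two assumptions. First, the date strings are converted with as.Date(), because the example stores them as text, and R cannot subtract text or factors. Second, the sign convention is Condition1 minus Condition2. If only the size of the gap matters, wrap the result in abs().

```r
# Example data, with the strings converted to Date so they can be subtracted
Condition1 <- data.frame(dates = as.Date(rep(c("2001-02-10", "1998-03-14"), 6000)))
Condition2 <- data.frame(dates = as.Date(rep(c("2003-07-06", "2007-03-11"), 8000)))

# Top-left corner of M: M[i, j] = Condition1$dates[i] - Condition2$dates[j].
# as.numeric() turns each Date into days since 1970-01-01, so the
# differences come out as plain numbers of days.
M_corner <- outer(as.numeric(Condition1$dates[1:2]),
                  as.numeric(Condition2$dates[1:2]), "-")
print(M_corner)
```

Building M this way in one call, outer(x, y, "-") on the full columns, runs into the same memory limit as the expanded matrices, so this is only useful for inspecting a small piece.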
My first instinct was to vectorize the operation: expand each vector
into a matrix of repeated copies, then subtract the two resulting
matrices to get M. Both expansions fail with the following error:
> expandedCondition1 <- matrix(rep(Condition1[[1]], nrow(Condition2)), byrow=TRUE, ncol=nrow(Condition1))
Error: cannot allocate vector of size 732.4 Mb
> expandedCondition2 <- matrix(rep(Condition2[[1]], nrow(Condition1)), byrow=FALSE, nrow=nrow(Condition2))
Error: cannot allocate vector of size 732.4 Mb
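The size in the error is consistent with the full matrix simply being too big. 12,000 x 16,000 is 192 million entries. At 4 bytes each (R's integer size, which covers e.g. factor codes) that comes to about 732.4 MiB. At 8 bytes each (R stores Date values as doubles) it is twice that. The two expanded matrices plus the result would each need this much. A quick check of the arithmetic:

```r
# Memory needed for one 12,000 x 16,000 matrix in R
n_cells <- 12000 * 16000          # 192,000,000 pairwise differences
mib_int <- n_cells * 4 / 2^20     # as 4-byte integers, in MiB
mib_dbl <- n_cells * 8 / 2^20     # as 8-byte doubles, in MiB
print(round(c(integer = mib_int, double = mib_dbl), 1))
```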
Since these matrices are evidently too large to allocate, is there a
better way to get the histogram (e.g. with hist()) without actually
building M?
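One possible approach is sketched below as a hypothetical helper, pairwise_diff_hist(), which is not a base R function. It fixes the histogram breaks up front, then builds M a block of rows at a time with outer(). Each block's hist(..., plot = FALSE)$counts is added to a running total, so only one block is in memory at any time. The sketch assumes the dates have been converted with as.Date(). The breaks must span every possible difference, from min(x) - max(y) to max(x) - min(y); otherwise hist() stops with an error. The 100-day bin width and the chunk size are arbitrary choices.

```r
# Histogram counts of all pairwise differences x[i] - y[j] (in days),
# accumulated block by block so the full matrix M is never allocated.
# pairwise_diff_hist() is a hypothetical helper written for this sketch.
pairwise_diff_hist <- function(x, y, breaks, chunk_size = 250) {
  x <- as.numeric(x)  # Date -> days since 1970-01-01
  y <- as.numeric(y)
  counts <- numeric(length(breaks) - 1)
  for (start in seq(1, length(x), by = chunk_size)) {
    rows <- start:min(start + chunk_size - 1, length(x))
    block <- outer(x[rows], y, "-")  # only these rows of M exist at once
    counts <- counts + hist(block, breaks = breaks, plot = FALSE)$counts
  }
  counts
}

# Example data from the question, converted to Date
# (assumption: the real columns hold ISO-format date strings)
Condition1 <- data.frame(dates = as.Date(rep(c("2001-02-10", "1998-03-14"), 6000)))
Condition2 <- data.frame(dates = as.Date(rep(c("2003-07-06", "2007-03-11"), 8000)))

# Breaks covering the full range of differences, in 100-day bins
x <- as.numeric(Condition1$dates)
y <- as.numeric(Condition2$dates)
lo <- floor((min(x) - max(y)) / 100) * 100
hi <- ceiling((max(x) - min(y)) / 100) * 100
breaks <- seq(lo, hi, by = 100)

counts <- pairwise_diff_hist(Condition1$dates, Condition2$dates, breaks)
```

The accumulated counts can then be drawn, for example with barplot(). When the real data repeat dates as heavily as the example does, a cheaper alternative is to tabulate each column with table() first. The distinct dates can then be differenced, and each difference weighted by the product of the two counts.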
I'd greatly appreciate any ideas!
Best,
Jonathan
On Mon, Feb 15, 2010 at 8:19 PM, Jonathan <jonsleepy at gmail.com> wrote:
> Hi All:
>
> Let's say I have two dataframes (Condition1 and Condition2); each
> being on the order of 12,000 and 16,000 rows; 1 column. The entries
> contain dates.
>
> I'd like to calculate, for each possible pair of dates (that is:
> Condition1[1:10,000] and Condition2[1:10,000], the number of days
> difference between the dates in the pair. The result should be a
> matrix 12,000 by 16,000. Really, what I need is a histogram of all
> the values in this matrix.
>
> Ex):
> Condition1 <- data.frame('dates' = rep(c('2001-02-10','1998-03-14'),6000))
> Condition2 <- data.frame('dates' = rep(c('2003-07-06','2007-03-11'),8000))
>
> First, my instinct is to try and vectorize the operation. I tried
> this by expanding each vector into a matrix of repeated vectors (I'd
> then just subtract the two). I got the following error:
>
>> expandedCondition1 <- matrix(rep(Condition1[[1]], nrow(Condition2)), byrow=TRUE, ncol=nrow(Condition1))
> Error: cannot allocate vector of size 732.4 Mb
>> expandedCondition2 <- matrix(rep(Condition2[[1]], nrow(Condition1)), byrow=FALSE, nrow=nrow(Condition2))
> Error: cannot allocate vector of size 732.4 Mb
>
> Since it seems these matrices are too large, I'm wondering whether
> there's a better way to call a hist command without actually building
> the said matrix..
>
> I'd greatly appreciate any ideas!
>
> Best,
> Jonathan
>