leif olson
2009-Jun-26 19:40 UTC
[R] 50993 point distance matrix, too big to as.matrix, looking for another way to calculate point-level summary
Hello,

I'm working on a bird abundance dataset of 50993 point counts. I've succeeded in calculating a distance matrix for the entire set, but I don't have sufficient memory to convert it to a full matrix, as below:

    abun.dist <- dist(abun.mat[1:50993, 1:235])
    test <- rowMeans(as.matrix(abun.dist))
    Error in matrix(0, size, size) : too many elements specified

I've been able to run an hclust() clustering procedure, because hclust() calls Fortran code, but I'd like to generate a Calinski index for each of the clusters to assess their validity. Unfortunately, all the validation routines I have found are native R code and usually call as.matrix(), resulting in the same error I receive above.

What I'd like to figure out is how to go through one point at a time and calculate the values I need. But I've been unable to come up with code that picks out the correct positions in the dist vector. Can anyone suggest some code that might do this?

Thanks...

...leif

-- 
First they ignore you, then they laugh at you, then they fight you, then you win - Mohandas Gandhi
Romain Francois
2009-Jun-27 07:32 UTC
[R] 50993 point distance matrix, too big to as.matrix, looking for another way to calculate point-level summary
Hi,

If you are only interested in row means, you can work with the distance matrix at the C level. You might like to adapt this post:
http://tolstoy.newcastle.edu.au/R/e6/devel/09/04/1378.html

Romain

On 06/26/2009 09:40 PM, leif olson wrote:
> Hello, Im working on a 50933 point count bird abundance dataset. I've
> succeeded in calculating a distance matrix for this entire set, but I don't
> have sufficient memory to convert this to a matrix, as below...
> abun.dist<- dist(abun.mat[1:50993,1:235)
> test<- rowMeans(as.matrix(abun.dist))
> Error in matrix(0, size, size) : too many elements specified
>
> ive been able to run a hclust() clustering procedure, due to the fact that
> hclust() makes a call to fortran code, but id like to be able to generate a
> calinski index for each of the clusters to assess the validity.
> Unfortunately, all the validation routines I have found are all native R
> code, and usually call as.matrix, resulting in the same error i receive
> above.
> What I'd like to figure out is how to just go through, one point at a time,
> and calculate the values i need. But I've been unable to come up with code
> to call the correct positions in the dist vector, can anyone suggest some
> code that might do this? Thanks...
>
> ...leif

-- 
Romain Francois
Independent R Consultant
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
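
For readers who would rather stay in R than drop to C, the one-point-at-a-time idea from the original question can be sketched as below. This is not the code from the linked post. It relies only on the storage layout documented in ?dist: the lower triangle is stored column by column, so for i < j the distance between points i and j sits at position n*(i-1) - i*(i-1)/2 + j - i. The function names dist_index, dist_row and dist_row_means are made up for illustration.

```r
# Position of the distance between points i and j inside a "dist" vector.
# Layout as documented in ?dist: lower triangle, stored column by column.
dist_index <- function(i, j, n) {
  lo <- pmin(i, j)
  hi <- pmax(i, j)
  n * (lo - 1) - lo * (lo - 1) / 2 + hi - lo
}

# All distances from point i to the other n - 1 points (hypothetical helper).
dist_row <- function(d, i) {
  n <- attr(d, "Size")
  others <- seq_len(n)[-i]
  d[dist_index(i, others, n)]
}

# Row means of the full symmetric matrix without ever building it.
# The zero diagonal is counted, so the result should agree with
# rowMeans(as.matrix(d)) apart from the names.
dist_row_means <- function(d) {
  n <- attr(d, "Size")
  sums <- numeric(n)
  start <- 1
  for (i in seq_len(n - 1)) {
    len <- n - i                          # entries stored for column i
    block <- d[start:(start + len - 1)]   # distances from i to i+1, ..., n
    sums[i] <- sums[i] + sum(block)       # contribution to row i
    idx <- (i + 1):n
    sums[idx] <- sums[idx] + block        # symmetric contribution to the rest
    start <- start + len
  }
  sums / n
}
```

With the object from the original post, the call would be something like test <- dist_row_means(abun.dist). The loop only ever holds one column block plus a length-n accumulator in addition to the dist object itself, which is the point of the exercise; whether it is fast enough for 50993 points is untested here.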