Hello,

I am using R to look at whole-genome gene expression data. This means about 27,000 genes, each with a vector of numbers reflecting expression in different tissues and at different times.

I need to do an all-against-all co-expression calculation (basically, just calculate Pearson's r for every gene-gene pair). I tried to store the result in a 27000x27000 matrix, but R does not seem to like allocating such a large beast. Any recommendations?

I would also like to manipulate the all-against-all interaction data afterwards, treating it as a graph with edges weighted by r^2, eliminating some edges, maybe even finding network motifs like cliques, etc., so having it as a manipulable object in R has some advantages.

Alternatively, I could perhaps write it out to a file (I don't even know how to do that) and then write a script in another language to manipulate it.

All advice welcome.

A.

Antonio Garcia-Martinez
UC-Berkeley Physics/Joint Genome Institute
http://cryptologia.com
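For a sense of scale, a dense 27000x27000 matrix of doubles needs 27000^2 * 8 bytes, roughly 5.8 GB. The following is only a rough sketch of one way to avoid holding that matrix, assuming the goal is a thresholded graph: compute the correlations one block of rows at a time and keep only the pairs whose r^2 passes a cutoff, as an edge list. The toy matrix `expr` (genes in rows), the cutoff `threshold` and the `block_size` are all made-up values for illustration, not anything from the original post.

```r
# Toy data standing in for the real expression matrix (genes in rows).
set.seed(42)
expr <- matrix(rnorm(300 * 20), nrow = 300)

# A dense double matrix needs 8 bytes per entry.
full_gb <- 27000^2 * 8 / 1e9   # about 5.8 GB for the 27000 x 27000 case

# Standardise each row (centre, then scale to unit length) so that the
# Pearson correlation of two genes is just the dot product of their rows.
z <- expr - rowMeans(expr)
z <- z / sqrt(rowSums(z^2))

threshold  <- 0.5   # hypothetical cutoff on r^2
block_size <- 100   # hypothetical block size; tune to available memory
starts <- seq(1, nrow(z), by = block_size)

edges <- do.call(rbind, lapply(starts, function(s) {
  rows <- s:min(s + block_size - 1, nrow(z))
  # r^2 between the genes in this block and all genes
  r2 <- tcrossprod(z[rows, , drop = FALSE], z)^2
  hit <- which(r2 >= threshold, arr.ind = TRUE)
  from <- rows[hit[, "row"]]
  to <- hit[, "col"]
  keep <- from < to   # keep each pair once and drop self-pairs
  data.frame(from = from[keep], to = to[keep], r2 = r2[hit][keep])
}))
```

The resulting `edges` data frame has one row per retained gene pair, which is usually far smaller than the full matrix and is a convenient starting point for graph manipulation or for writing to a file with `write.table()`.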
On Sun, 11 Apr 2004 19:15:08 -0700 (PDT), you wrote:

>Hello,
>
>I am using R to look at whole-genome gene expression data. This means
>about 27,000 genes, each with a vector of numbers reflecting expression at
>different tissues and times.

How long is that vector? Presumably shorter than 27000.

>I need to do an all against all co-expression
>calculation (basically, just calculate Pearson's r for every gene-gene
>pair). I try to store the result of such a thing in a 27000x27000 matrix,
>but r seems not to like allocating such a large beast. Any
>recommendations?

If you have fewer than 27000 cases, then the correlation matrix is not full rank, and could be summarized in much less space. For example, if you have 100 cases, then a 100x100 matrix will give the correlation structure, and a 26900x100 matrix would give the weights for the rest of the genes. (It's late, so I might be wrong about this, but I don't think so.)

To calculate those matrices, just pick the first 100 genes to use for the correlation matrix (assuming you get a full rank matrix that way), then regress each of the others onto those.

Duncan Murdoch
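The suggestion above can be sketched in R roughly as follows. This is a small illustration on simulated data, not a tested recipe for the real data set: `expr`, `n_genes` and `n_cases` are made-up names and sizes. One assumption worth flagging: because Pearson's r centres each gene, the correlation matrix of genes measured on n cases has rank at most n - 1, so the sketch uses the first n - 1 genes as the reference set rather than n.

```r
# Simulated expression matrix: genes in rows, cases in columns.
set.seed(1)
n_genes <- 200
n_cases <- 11
expr <- matrix(rnorm(n_genes * n_cases), nrow = n_genes)

# Standardise each row so that cor(t(expr)) equals z %*% t(z).
z <- expr - rowMeans(expr)
z <- z / sqrt(rowSums(z^2))

# Reference genes: centring removes one degree of freedom,
# so at most n_cases - 1 genes can give a full-rank block.
k <- n_cases - 1
z1 <- z[1:k, , drop = FALSE]
z2 <- z[-(1:k), , drop = FALSE]

# Small correlation matrix among the reference genes (k x k).
r11 <- tcrossprod(z1)

# Regress every other gene onto the reference genes.
# w holds the weights, one row per remaining gene ((n_genes - k) x k).
w <- t(solve(r11, tcrossprod(z1, z2)))

# Any block of the full correlation matrix can be rebuilt on demand.
r21 <- w %*% r11             # remaining genes vs reference genes
r22 <- w %*% r11 %*% t(w)    # remaining genes vs each other
```

Only `r11` and `w` need to be stored. In practice one would rebuild just the rows of `r22` needed at a given moment, since forming all of it brings back the original memory problem.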