thr3ads.net - R help - [R] Very large matrices for very large genome [Apr 2004]

Antonio Garcia

2004-Apr-12 02:15 UTC

[R] Very large matrices for very large genome

Hello,

I am using R to look at whole-genome gene expression data. This means
about 27,000 genes, each with a vector of numbers reflecting expression at
different tissues and times. I need to do an all-against-all co-expression
calculation (basically, just calculate Pearson's r for every gene-gene
pair). I tried to store the result in a 27000x27000 matrix, but R does not
seem to like allocating such a large beast. Any recommendations?

I would also like to manipulate the all-against-all interaction data
afterwards, treating it as a graph with edges weighted by r^2, eliminating
some edges, maybe even finding network motifs like cliques, etc., so
having it as a manipulable object in R has some advantages.
Alternatively, I could write it out to a file (I don't even know how to do
that) and then write a script in another language to manipulate it.

All advice welcome.

A.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Antonio Garcia-Martinez
UC-Berkeley Physics/Joint Genome Institute
http://cryptologia.com

        .=$=.   .=$=.           .=$=.   .=$=.
@       @ | | | @ | | | @       @ | | | @ | | |
| @   @ | | | @   @ | | | @   @ | | | @   @ | |
| | @ | | | @       @ | | | @ | | | @       @ |
~'   `~$~'           `~$~'   `~$~'           `

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Duncan Murdoch

2004-Apr-12 02:47 UTC

[R] Very large matrices for very large genome

On Sun, 11 Apr 2004 19:15:08 -0700 (PDT), you wrote:
>Hello,
>
>I am using R to look at whole-genome gene expression data. This means
>about 27,000 genes, each with a vector of numbers reflecting expression at
>different tissues and times.
How long is that vector?  Presumably shorter than 27000.
>I need to do an all against all co-expression
>calculation (basically, just calculate Pearson's r for every gene-gene
>pair). I try to store the result of such a thing in a 27000x27000 matrix,
>but r seems not to like allocating such a large beast. Any
>recommendations?
If you have fewer than 27000 cases, then the correlation matrix is not
full rank, and could be summarized in much less space.  For example,
if you have 100 cases, then a 100x100 matrix will give the correlation
structure, and a 26900x100 matrix would give the weights for the rest
of the genes.
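
A minimal sketch of why the full matrix is redundant, using toy sizes (40 genes, 6 cases) and made-up random data; the variable names are illustrative only. It uses a closely related factorisation rather than the regression described in this message: each gene is centred and scaled to unit length, and the correlation matrix is then the cross-product of that genes-by-cases matrix with itself. The "100 cases" in the storage arithmetic is the hypothetical figure from the text above, not a number from the original data.

```r
# Toy data (assumed): 40 genes in rows, 6 cases in columns.
set.seed(1)
n_genes <- 40
n_cases <- 6
expr <- matrix(rnorm(n_genes * n_cases), nrow = n_genes)

# Centre each gene (row) and scale it to unit length.
centred <- expr - rowMeans(expr)
z <- centred / sqrt(rowSums(centred^2))

# The gene-by-gene correlation matrix equals z %*% t(z),
# so the small matrix z carries the whole correlation structure.
full <- cor(t(expr))
stopifnot(isTRUE(all.equal(full, tcrossprod(z), check.attributes = FALSE)))

# Centring removes one dimension, so the rank is at most n_cases - 1.
print(qr(full)$rank)

# Storage as doubles (8 bytes each) for the sizes discussed in the thread.
print(27000^2 * 8 / 2^30)      # full 27000 x 27000 matrix: about 5.43 GiB
print(27000 * 100 * 8 / 2^20)  # 27000 x 100 factor: about 20.6 MiB
```

Any single correlation can then be recovered on demand as sum(z[i, ] * z[j, ]) without ever forming the full matrix.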

(It's late, so I might be wrong about this, but I don't think so.)

To calculate those matrices, just pick the first 100 genes to use for
the correlation matrix (assuming you get a full rank matrix that way),
then regress each of the others onto those.
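
A rough sketch of that recipe on toy random data (30 genes, 6 cases), under some assumptions that are not in the message: genes are centred and scaled first, the regression has no intercept, and the number of reference genes is taken as the number of cases minus one, because centring removes one dimension and a reference set as large as the number of cases would give a singular correlation matrix. All names here are hypothetical.

```r
# Toy data (assumed): 30 genes in rows, 6 cases in columns.
set.seed(2)
n_genes <- 30
n_cases <- 6
expr <- matrix(rnorm(n_genes * n_cases), nrow = n_genes)

# Centre each gene and scale it to unit length.
centred <- expr - rowMeans(expr)
z <- centred / sqrt(rowSums(centred^2))

# Reference genes: the first k, assumed to give a full-rank matrix.
k <- n_cases - 1
ref  <- z[seq_len(k), , drop = FALSE]
rest <- z[-seq_len(k), , drop = FALSE]

# k x k correlation matrix of the reference genes.
c_ref <- tcrossprod(ref)

# Least-squares weights expressing every other gene in terms of the
# reference genes: rest = w %*% ref, one row of w per remaining gene.
w <- t(solve(c_ref, tcrossprod(ref, rest)))

# The two small matrices reproduce every block of the full correlation matrix.
full <- cor(t(expr))
rest_idx <- (k + 1):n_genes
stopifnot(isTRUE(all.equal(w %*% c_ref, full[rest_idx, seq_len(k)],
                           check.attributes = FALSE)))
stopifnot(isTRUE(all.equal(w %*% c_ref %*% t(w), full[rest_idx, rest_idx],
                           check.attributes = FALSE)))
```

With real expression data the first k genes may not be linearly independent, in which case a different reference set (or the factor z itself) would be needed.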

Duncan Murdoch
