bin999 at lycos.com writes:
> Can someone help me calculate correlations for grouped values?
>
> Heres what my first few line of data look like:
>
> > head(cmexpr)
> LLID GMID CEXPR MEXPR
> 1 1005 10831 2.057462 -0.08486
> 2 1005 10831 2.057515 -0.08486
> 3 1005 10831 2.057462 0.01209
> 4 1005 10831 2.057515 0.01209
> 5 1005 10836 2.050980 0.17237
> 6 1005 10836 2.018576 0.17237
>
> LLID is gene id, GMID is cell line id, the EXPR columns are gene expression
is two different microarray experiments.
>
> I'd like to get correlations for each gene id (1005, 1006, etc)
>
> Heres what I've tried so far:
>
> sapply(by(cmexpr[3:4],cmexpr$LLID, function(x)
cor(cmexpr$CEXPR,cmexpr$MEXPR, use = "pairwise.complete.obs")),
function(x) x,simplify=T)
>
> I think this is close to what I need, but its giving me the same corr for
each gene, which is not right:
>
> 1005 10083 10146 10158 10174 10206
> -0.04751543 -0.04751543 -0.04751543 -0.04751543 -0.04751543 -0.04751543
> 10211 10212 10219 10363 10484 10492
> -0.04751543 -0.04751543 -0.04751543 -0.04751543 -0.04751543 -0.04751543
>
> Sorry if this sounds like a newbie question; it seems like it ought to be
pretty straightforward, but I've wrestled with both the R documenation and
mailing list archive and can't seem to get it right.
You're passing a function to by() whose result does not depend
on the argument x ...
This might be clearer:
c(by(cmexpr, cmexpr$LLID, with, cor(CEXPR, MEXPR, use="pair")))
or
sapply(split(cmexpr, cmexpr$LLID), with, cor(CEXPR, MEXPR,
use="pair"))
--
O__ ---- Peter Dalgaard ??ster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907