thr3ads.net - R help - [R] calculating p-values by row for data frames [Oct 2009]


Christoph Heuck

2009-Oct-15 15:50 UTC

[R] calculating p-values by row for data frames

Hello R-users,
I am looking for an elegant way to calculate p-values for each row of
a data frame.
My situation is as follows:
I have gene expression results from a microarray with 64 samples
looking at 25626 genes. The results are in a data frame with 25626
rows (genes) and 64 columns (samples).
I want to create a volcano plot of the difference of means vs. -log10 of
the p-values, comparing normal samples to abnormal samples. The results
for both types of samples are all in my data frame.
Now, I have found a way to calculate the p-values using a "for (i in
1:25626)" loop (see below):

m.normal  <- as.matrix(df.normal)   # normal samples only (genes in rows)
m.samples <- as.matrix(df.samples)  # abnormal samples only (same gene order)

# difference of means: a numeric vector with one value per gene
DM <- rowMeans(m.normal) - rowMeans(m.samples)

PV <- numeric(nrow(m.normal))
for (i in seq_len(nrow(m.normal))) {
  VL <- t.test(m.normal[i, ], m.samples[i, ])
  PV[i] <- -log10(VL$p.value)   # VL$p.value is the element VL[3] refers to
}

# title and x.lab are assumed to be defined elsewhere
plot(DM, PV, main = title, xlab = x.lab, ylab = "-log10 p-value", pch = 20)
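For reference, each point of this volcano plot is, for gene g, the pair (difference of means, -log10 p-value). With t.test()'s defaults (var.equal = FALSE, two-sided), the p-value comes from the Welch statistic below, where x and y are the n1 normal and n2 abnormal values for that gene and s^2 denotes the sample variances:

```latex
\begin{gathered}
\Delta_g = \bar{x}_g - \bar{y}_g, \qquad
t_g = \frac{\bar{x}_g - \bar{y}_g}{\sqrt{s_{x,g}^2/n_1 + s_{y,g}^2/n_2}}, \\[4pt]
\nu_g = \frac{\left(s_{x,g}^2/n_1 + s_{y,g}^2/n_2\right)^2}
             {\dfrac{\left(s_{x,g}^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_{y,g}^2/n_2\right)^2}{n_2 - 1}}, \\[4pt]
p_g = 2\,\Pr\left(T_{\nu_g} \ge |t_g|\right), \qquad
\text{point}_g = \left(\Delta_g,\; -\log_{10} p_g\right).
\end{gathered}
```

Every quantity here is a row mean, a row variance, or an elementwise combination of them, which is why the whole computation can in principle be done row-wise like rowMeans.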

It takes around 3-5 minutes to generate the volcano plot this way. I
will be running arrays that look at 2.2 million sites, so this
approach will take far too long.
I was wondering if there is a more elegant way to calculate the
p-values for an array/data frame/matrix in a row-by-row fashion,
similar to "rowMeans".
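One way to get a rowMeans-style answer is to compute the Welch t statistic for all rows at once from row means and row variances, then call pt() once on the whole vector. This is a minimal sketch, not a packaged function: row_welch_p, m.normal and m.abnormal are hypothetical names, and it assumes two numeric matrices with genes in rows and samples in columns. It reproduces the default two-sided Welch p-value of t.test() but skips t.test()'s input checks (e.g. for missing values or constant rows).

```r
# Hypothetical helper: two-sided Welch t-test p-values for every row at once.
# a, b: numeric matrices with the same rows (genes); columns are samples.
row_welch_p <- function(a, b) {
  a <- as.matrix(a)
  b <- as.matrix(b)
  n1 <- ncol(a)
  n2 <- ncol(b)
  m1 <- rowMeans(a)
  m2 <- rowMeans(b)
  # a - m1 subtracts each row's mean (the vector recycles down the columns)
  v1 <- rowSums((a - m1)^2) / (n1 - 1)   # unbiased row variances
  v2 <- rowSums((b - m2)^2) / (n2 - 1)
  se2_1 <- v1 / n1
  se2_2 <- v2 / n2
  t_stat <- (m1 - m2) / sqrt(se2_1 + se2_2)
  # Welch-Satterthwaite degrees of freedom, one per row
  df <- (se2_1 + se2_2)^2 / (se2_1^2 / (n1 - 1) + se2_2^2 / (n2 - 1))
  2 * pt(-abs(t_stat), df)
}

# Usage on simulated data (assumption: 200 genes, 5 vs 7 samples)
set.seed(1)
m.normal   <- matrix(rnorm(200 * 5), nrow = 200)
m.abnormal <- matrix(rnorm(200 * 7, mean = 0.5), nrow = 200)
p <- row_welch_p(m.normal, m.abnormal)
```

The cost is a handful of vectorized passes over the matrix instead of one t.test() call per gene, which is where the loop spends its time.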

I thought that writing a function to get the p-value and then using
apply(x, 1, function) would be best.

I have a function that gives me the p-value:

p.value <- function(x, y) {
  # t.test(x, y)[3] picks the same element by position
  t.test(x, y)$p.value
}

and I can get a result if I test it on only one row (below is a
10-by-6 excerpt of my original data):

> RRR
                     X259863    X267862     X267906    X300875    X300877     X300878
MSPI0406S00000183 -3.2257205 -3.2248899  2.85590082 -2.6293602 -3.5054348 -2.62817269
MSPI0406S00000238 -2.6661903 -3.1135020  2.17073881 -3.2357307 -2.3309775 -1.76078452
MSPI0406S00000239 -1.7636439 -0.6702877  0.19471126 -0.7397132 -1.4332662 -0.24822470
MSPI0406S00000300  0.6471381 -0.2638928 -0.61876054 -0.9180127  0.2539848 -0.63122203
MSPI0406S00000301  0.9207208  0.2164267 -0.33238846 -1.1450717 -0.2935584 -1.01659802
MSPI0406S00000321 -0.4073272 -0.2852402 -0.08085746 -0.4109428 -0.2185432 -0.39736137
MSPI0406S00000352 -0.7074175 -0.6987548 -1.22004647 -0.8570551 -0.5083861 -0.09267928
MSPI0406S00000353 -0.2745682  0.3012990 -0.64787221 -0.5654195  0.4265007 -0.65963404
MSPI0406S00000354 -1.1858394 -1.4388609 -0.07329722 -2.0010785 -1.3245696 -1.43216984
MSPI0406S00000360 -1.4599809 -1.4929059  0.63453235 -1.1476760 -1.5849922 -1.03187399
> zz=p.value(RRR[1,1:3],RRR[1,4:6])
> zz
[1] 0.485727

but I cannot do this row by row using apply:
> xxx=apply(RRR,1,p.value(RRR[,1:3],RRR[,4:6]))
Error in match.fun(FUN) :
  'p.value(RRR[, 1:3], RRR[, 4:6])' is not a function, character or symbol
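The error arises because apply()'s third argument must be a function to call on each row. Writing p.value(RRR[,1:3], RRR[,4:6]) there is a call, so R evaluates it once on the whole data frame and apply() receives its result, which is not a function, and match.fun() rejects it. Besides wrapping the call in an anonymous function of one row, a loop over row indices with vapply() also works. A minimal sketch on a small made-up matrix (toy is a hypothetical name; columns 1:3 and 4:6 are the two groups, as in RRR):

```r
# Hypothetical toy data: 4 genes (rows), 6 samples (columns)
set.seed(42)
toy <- matrix(rnorm(24), nrow = 4)

p.value <- function(x, y) t.test(x, y)$p.value

# vapply over row indices; FUN.VALUE = numeric(1) checks that
# each call returns exactly one number
pv <- vapply(seq_len(nrow(toy)),
             function(i) p.value(toy[i, 1:3], toy[i, 4:6]),
             numeric(1))
pv
```

In both forms the thing handed over is a function; the per-row call only happens inside apply() or vapply().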

Does anyone have any suggestions?
Thanks in advance

Christoph Heuck
Albert Einstein College of Medicine

Meyners, Michael, LAUSANNE, AppliedMathematics

2009-Oct-16 06:04 UTC

[R] calculating p-values by row for data frames

> but I cannot do this row by row using apply
> 
> > xxx=apply(RRR,1,p.value(RRR[,1:3],RRR[,4:6]))
xxx <- apply(RRR, 1, function(x) p.value(x[1:3],x[4:6]))
works for me. Check the examples in ?apply.
HTH, Michael
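Michael's call can be carried through to the volcano plot. A minimal sketch, assuming simulated data in place of the real array (the 200-gene matrix below is made up) and the same column split as in the thread. Note that apply() converts a data frame to a matrix first and still calls t.test() once per row, so it tidies the code rather than removing the per-row cost of the for loop.

```r
# Hypothetical simulated data standing in for RRR:
# 200 genes, columns 1:3 one group and columns 4:6 the other
set.seed(7)
RRR <- matrix(rnorm(200 * 6), nrow = 200,
              dimnames = list(paste0("gene", 1:200), paste0("s", 1:6)))

p.value <- function(x, y) t.test(x, y)$p.value

# Michael's apply() call: FUN is an anonymous function of one row;
# the result is a vector named by the row names
xxx <- apply(RRR, 1, function(x) p.value(x[1:3], x[4:6]))

# Volcano plot coordinates
DM <- rowMeans(RRR[, 1:3]) - rowMeans(RRR[, 4:6])
plot(DM, -log10(xxx), xlab = "difference of means",
     ylab = "-log10 p-value", pch = 20)
```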