thr3ads.net - R help - [R] Correlation matrix for pearson correlation (r,p,BH(FDR)) [Jun 2015]

Sarah Bazzocco

2015-Jun-18 08:19 UTC

[R] Correlation matrix for pearson correlation (r,p,BH(FDR))

This post was called "help" before, I changed the Subject.
Thanks for the comments.
Here is the example (I have the two tables saved as .csv files and can open them in R):

Sheet one - genes (expression of 10 genes, not binary, measured in 10 cell lines)
> genes
     Genes  Cell.line1 Cell.line2  Cell.line3  Cell.line4  Cell.line5
1   KCNAB3 12.02005181 11.1400910 15.60381163 13.44151596 25.37161030
2    KCNB1  0.02457449  1.3028535  0.81538294  0.59318327  0.15332321
3    KCNB2  0.44791862  0.1060137  0.09864136  0.00000000  0.00000000
4     KERA  0.06090217  0.0000000  0.03352993  0.03634781  0.04190912
5   KGFLP1  0.02450101  0.0000000  0.00000000  0.00000000  0.00000000
6   KGFLP2  0.00000000  0.0000000  0.00000000  0.00000000  0.00000000
7    KHDC1  0.00000000  0.0000000  0.00000000  0.00000000  0.00000000
8   KHDC1L  2.31894450  2.8252262  5.29099724  7.44183228  1.94629741
9   KHDC3L  0.00000000  0.0000000  0.00000000  0.00000000  0.00000000
10 KHDRBS1  0.00000000  0.0000000  0.00000000  0.00000000  0.00000000
   Cell.line6 Cell.line7  Cell.line8  Cell.line9 Cell.line10
1  8.12373424 7.67506261 24.43776341 18.33244818    9.224225
2  4.18181234 1.65268403  5.98346320  1.51423807    0.000000
3  0.05857207 0.05945414  0.20733924  0.05830982    0.000000
4  0.00000000 0.00000000  0.07752608  0.01585643   16.664245
5  0.02563099 0.03902548  0.00000000  0.00000000    0.000000
6  0.00000000 0.00000000  0.00000000  0.00000000    0.000000
7  0.00000000 0.00000000  0.00000000  0.00000000    0.000000
8  8.56022436 7.50838343  7.17964645  3.28602729    0.000000
9  0.00000000 0.00000000  0.00000000  0.00000000    3.598534
10 0.00000000 0.03081180  0.00000000  0.00000000    2.600173

Sheet two - features (2 features, growth rate and drug sensitivity, for 10 cell lines)
> features
         Cell.line Cell.line1 Cell.line2 Cell.line3 Cell.line4 Cell.line5
1      Growth rate         NA         NA         NA      51.41         NA
2 Drug sensitivity       5.03       6.57          8       1.26          3
  Cell.line6 Cell.line7 Cell.line8 Cell.line9 Cell.line10
1      41.33      26.76      24.19         NA          NA
2       1.40       1.88       1.33       5.05        9.12

What I found:
corr.test {psych}
corr.test(x, y = NULL, use = "pairwise", method = "pearson", adjust = "BH", alpha = .01)
--> I adjusted the original command to what I need (BH instead of holm, and
alpha = .01 instead of 0.05).

I would be very happy if someone could show me how to use this command, in
particular how to pass the two sheets I have (genes and features) as x and y.
I would take it from there.

Thanks a lot in advance.

Sarah






----- Original Message -----
From: "Rainer Schuermann" <Rainer.Schuermann at gmx.net>
To: "Sarah Bazzocco" <sarah.bazzocco at vhir.org>
Sent: Thursday, 18 June, 2015 8:14:56 AM
Subject: Re: [R] help



Hi Sarah,

Not an answer to your question but a piece of well-intended advice:

1. Don't post HTML but plain text. Not only will people tell you this,
sometimes in a not very friendly manner - HTML actually makes posts
illegible on this mailing list. Code, and R _is_ code, is always plain text.

2. Don't pose an abstract problem - this looks too much like "Can you
please do my work for me". Show us what you have tried already, and people
will happily jump in and provide their thoughts and advice.

3. Always make sure that you have a reproducible example in your mail, and a set
of data of the same type and structure you are using - ideally using dput().

See further advice here:

PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

and here:

http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

For your problem, R has an immense wealth of ideas and solutions.

Rgds,
Rainer
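
As an illustration of point 3 above, here is a minimal sketch of sharing data with dput(). The data frame `toy` and its values are hypothetical and only stand in for a real data set; the round trip through capture.output() and parse() simulates a reader pasting the printed text into their own session.

```r
# Hypothetical toy data frame standing in for a real data set.
toy <- data.frame(
  Genes = c("KCNAB3", "KCNB1"),
  Cell.line1 = c(12.02, 0.02),
  Cell.line2 = c(11.14, 1.30),
  stringsAsFactors = FALSE
)

# dput() prints R code that rebuilds the object;
# that printed text is what goes into the plain-text mail.
dput(toy)

# Simulate a reader pasting the text: capture the output and evaluate it.
pasted <- paste(capture.output(dput(toy)), collapse = "\n")
rebuilt <- eval(parse(text = pasted))

# Compare the rebuilt object with the original.
print(all.equal(toy, rebuilt))
```

Unlike a printed table, the dput() text keeps column types (factor, character, numeric) and missing values, so readers do not have to guess them.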

On Wed June 17 2015 16:57:24 Sarah Bazzocco wrote:
> Hello,
>
> I am an R beginner and I need some help. The question is very simple: I need
> to compute Pearson correlations (r, p-value and FDR with BH) between an expression
> array (with several thousand genes for, let's say, 20 cell lines) and some
> features of those cell lines.
>
> My problem is the organization of the Excel sheets and how to
> import the data into R and run the script. I thought the easiest and most
> organized for me would be two Excel sheets:
>
> 1- Only expression data (genes in rows and cell lines in columns)
>
> 2- Only the features (features in rows, e.g. a) growth rate, b)
> sensitivity to some drugs, and cell lines in columns).
>
> --> That would create both sheets with 20 columns.
>
> Now I would like to get a correlation for gene 1: its expression across all
> cell lines with the growth rate.
>
> The same for gene 2... and so forth. I should obtain as many r, p and BH (FDR)
> values as there are genes.
>
> The same I would need to do for the sensitivity... and so on.
>
> Do you think this is doable? I am not at all a bioinformatics expert, so all
> help is very welcome.
>
> Thank you very much!
>
> Kind regards,
>
> Sarah

-- 


Sarah Bazzocco, PhD student 
Group of Molecular Oncology, 
CIBBIM-Nanomedicine, 
Vall d'Hebron Hospital Research Institute, 
Passeig Vall d'Hebron 119-129, 
Barcelona 08035, Spain. 
Tel: +34-93-489-4056 

Fax: +34-93-489-3893 
Email: sarah.bazzocco at vhir.org 




Rainer Schuermann

2015-Jun-18 17:09 UTC

The way the sample data is provided is not useful. I have re-built your data,
please find the dput() version below (and pls check whether I got it right...).

This is not my area of competence at all, but from what I see on the help page,
the expected parameters are, among others:

x	A matrix or dataframe
y	A second matrix or dataframe __with the same number of rows as x__

I hope that somebody with a better understanding of your intention is able to
pick up from here, with the sample data in useful format.

Rgds,
Rainer


dput( genes )
structure(list(Genes = structure(1:10, .Label = c("KCNAB3", "KCNB1",
"KCNB2", "KERA", "KGFLP1", "KGFLP2", "KHDC1", "KHDC1L", "KHDC3L",
"KHDRBS1"), class = "factor"), Cell.line1 = c(12.02005181, 0.02457449,
0.44791862, 0.06090217, 0.02450101, 0, 0, 2.3189445, 0, 0), Cell.line2 = c(11.140091,
1.3028535, 0.1060137, 0, 0, 0, 0, 2.8252262, 0, 0), Cell.line3 = c(15.60381163,
0.81538294, 0.09864136, 0.03352993, 0, 0, 0, 5.29099724, 0, 0
), Cell.line4 = c(13.44151596, 0.59318327, 0, 0.03634781, 0,
0, 0, 7.44183228, 0, 0), Cell.line5 = c(25.3716103, 0.15332321,
0, 0.04190912, 0, 0, 0, 1.94629741, 0, 0), Cell.line6 = c(8.12373424,
4.18181234, 0.05857207, 0, 0.02563099, 0, 0, 8.56022436, 0, 0
), Cell.line7 = c(7.67506261, 1.65268403, 0.05945414, 0, 0.03902548,
0, 0, 7.50838343, 0, 0.0308118), Cell.line8 = c(24.43776341,
5.9834632, 0.20733924, 0.07752608, 0, 0, 0, 7.17964645, 0, 0),
    Cell.line9 = c(18.33244818, 1.51423807, 0.05830982, 0.01585643,
    0, 0, 0, 3.28602729, 0, 0), Cell.line10 = c(9.224225, 0,
    0, 16.664245, 0, 0, 0, 0, 3.598534, 2.600173)), .Names = c("Genes",
"Cell.line1", "Cell.line2", "Cell.line3", "Cell.line4", "Cell.line5",
"Cell.line6", "Cell.line7", "Cell.line8", "Cell.line9", "Cell.line10"
), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9",
"10"), class = "data.frame")

dput( features )
structure(list(Cell.line = c("Growth rate", "Drug sensitivity"
), Cell.line1 = c(NA, "41.33"), Cell.line2 = c(NA, "26.76"),
    Cell.line3 = c(NA, "24.19"), Cell.line4 = c("51.41", NA),
    Cell.line5 = c(NA_character_, NA_character_), Cell.line6 = c("5.03",
    "1.40"), Cell.line7 = c("6.57", "1.88"), Cell.line8 = c("8",
    "1.33"), Cell.line9 = c("1.26", "5.05"), Cell.line10 = c("3",
    "9.12")), .Names = c("Cell.line", "Cell.line1", "Cell.line2",
"Cell.line3", "Cell.line4", "Cell.line5", "Cell.line6", "Cell.line7",
"Cell.line8", "Cell.line9", "Cell.line10"), row.names = c(NA,
-2L), class = "data.frame")
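
The requirement that x and y have the same number of rows can be met by putting cell lines in rows for both tables. The following is only a sketch under assumptions: it uses three non-constant genes and the two feature rows copied from the printed tables in the first post, the object names `expr`, `feat`, `x` and `y` are made up for illustration, and the corr.test() call is guarded because the psych package may not be installed.

```r
cell_lines <- paste0("Cell.line", 1:10)

# Three genes copied from the printed genes table (constant genes left out).
expr <- rbind(
  KCNAB3 = c(12.02005181, 11.1400910, 15.60381163, 13.44151596, 25.37161030,
             8.12373424, 7.67506261, 24.43776341, 18.33244818, 9.224225),
  KCNB1  = c(0.02457449, 1.3028535, 0.81538294, 0.59318327, 0.15332321,
             4.18181234, 1.65268403, 5.98346320, 1.51423807, 0),
  KHDC1L = c(2.31894450, 2.8252262, 5.29099724, 7.44183228, 1.94629741,
             8.56022436, 7.50838343, 7.17964645, 3.28602729, 0)
)
colnames(expr) <- cell_lines

# Both feature rows copied from the printed features table.
feat <- rbind(
  "Growth rate"      = c(NA, NA, NA, 51.41, NA, 41.33, 26.76, 24.19, NA, NA),
  "Drug sensitivity" = c(5.03, 6.57, 8, 1.26, 3, 1.40, 1.88, 1.33, 5.05, 9.12)
)
colnames(feat) <- cell_lines

# Mimic the layout of the imported sheets: a label column plus one column per cell line.
genes    <- data.frame(Genes = rownames(expr), expr, row.names = NULL)
features <- data.frame(Cell.line = rownames(feat), feat, row.names = NULL)

# Drop the label column, then transpose so that cell lines are rows
# (observations) and genes or features are columns (variables).
x <- t(as.matrix(genes[, -1]))
colnames(x) <- as.character(genes$Genes)
y <- t(as.matrix(features[, -1]))
colnames(y) <- as.character(features$Cell.line)
storage.mode(y) <- "double"  # coerces to numbers if the features were read as text

# x and y now have one row per cell line, in the same order.
stopifnot(identical(rownames(x), rownames(y)))

if (requireNamespace("psych", quietly = TRUE)) {
  ct <- psych::corr.test(x, y, use = "pairwise", method = "pearson",
                         adjust = "BH", alpha = .01)
  print(ct$r)  # Pearson r, genes in rows and features in columns
  print(ct$p)  # p-values; see ?corr.test for how the adjustment is reported
}
```

Dropping the label column before transposing matters: transposing a data frame that still contains text labels gives a character matrix, which correlation functions cannot use.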



Peter Langfelder

2015-Jun-18 18:52 UTC

You have multiple options. I will advertise my own solution - install
the package WGCNA, installation instructions at

http://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/#cranInstall

then you can use the function
cp = corAndPvalue(t(genes), t(features)).

You need to transpose both because the function expects variables in
columns and samples in rows.

This will give you a list whose components include 'cor' (matrix of
the correlation values) and 'p' (matrix of the Student p-values). To
get a matrix of the corresponding FDR, use

fdr = apply(cp$p, 2, p.adjust, method = "fdr")

Hope this helps,

Peter
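
For readers without WGCNA, the same three quantities (r, p-value, BH-adjusted p-value) can be sketched in base R with cor.test() and p.adjust(). This is not the WGCNA implementation, only an illustration under assumptions: it uses three non-constant genes and the drug sensitivity row copied from the printed tables in the first post, and the names `expr`, `drug`, `per_gene` and `res` are made up. Genes with constant expression are left out because their correlation is undefined.

```r
# Three genes copied from the printed genes table, one row per gene.
expr <- rbind(
  KCNAB3 = c(12.02005181, 11.1400910, 15.60381163, 13.44151596, 25.37161030,
             8.12373424, 7.67506261, 24.43776341, 18.33244818, 9.224225),
  KCNB1  = c(0.02457449, 1.3028535, 0.81538294, 0.59318327, 0.15332321,
             4.18181234, 1.65268403, 5.98346320, 1.51423807, 0),
  KHDC1L = c(2.31894450, 2.8252262, 5.29099724, 7.44183228, 1.94629741,
             8.56022436, 7.50838343, 7.17964645, 3.28602729, 0)
)
colnames(expr) <- paste0("Cell.line", 1:10)

# Drug sensitivity row from the printed features table, same cell-line order.
drug <- c(5.03, 6.57, 8, 1.26, 3, 1.40, 1.88, 1.33, 5.05, 9.12)

# One Pearson test per gene: keep r and the two-sided p-value.
per_gene <- apply(expr, 1, function(g) {
  ct <- cor.test(g, drug, method = "pearson")
  c(r = unname(ct$estimate), p = ct$p.value)
})

# apply() returns one column per gene, so transpose to get one row per gene.
res <- as.data.frame(t(per_gene))

# Benjamini-Hochberg adjustment across all genes for this one feature.
res$BH <- p.adjust(res$p, method = "BH")

print(res)
```

In p.adjust(), method = "fdr" is an alias for "BH", so the apply() call above over the columns of the p-value matrix applies the same adjustment separately for each feature.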


