Sarah Bazzocco
2015-Jun-18 08:19 UTC
[R] Correlation matrix for pearson correlation (r,p,BH(FDR))
This post was previously titled "help"; I have changed the subject. Thanks for the comments.

Here is the example (I have the two tables saved as .csv files and can open them in R).

Sheet one - Genes (expression of 10 genes, not binary, measured in 10 cell lines):

    > genes
         Genes  Cell.line1 Cell.line2  Cell.line3  Cell.line4  Cell.line5
    1   KCNAB3 12.02005181 11.1400910 15.60381163 13.44151596 25.37161030
    2    KCNB1  0.02457449  1.3028535  0.81538294  0.59318327  0.15332321
    3    KCNB2  0.44791862  0.1060137  0.09864136  0.00000000  0.00000000
    4     KERA  0.06090217  0.0000000  0.03352993  0.03634781  0.04190912
    5   KGFLP1  0.02450101  0.0000000  0.00000000  0.00000000  0.00000000
    6   KGFLP2  0.00000000  0.0000000  0.00000000  0.00000000  0.00000000
    7    KHDC1  0.00000000  0.0000000  0.00000000  0.00000000  0.00000000
    8   KHDC1L  2.31894450  2.8252262  5.29099724  7.44183228  1.94629741
    9   KHDC3L  0.00000000  0.0000000  0.00000000  0.00000000  0.00000000
    10 KHDRBS1  0.00000000  0.0000000  0.00000000  0.00000000  0.00000000
       Cell.line6 Cell.line7  Cell.line8  Cell.line9 Cell.line10
    1  8.12373424 7.67506261 24.43776341 18.33244818    9.224225
    2  4.18181234 1.65268403  5.98346320  1.51423807    0.000000
    3  0.05857207 0.05945414  0.20733924  0.05830982    0.000000
    4  0.00000000 0.00000000  0.07752608  0.01585643   16.664245
    5  0.02563099 0.03902548  0.00000000  0.00000000    0.000000
    6  0.00000000 0.00000000  0.00000000  0.00000000    0.000000
    7  0.00000000 0.00000000  0.00000000  0.00000000    0.000000
    8  8.56022436 7.50838343  7.17964645  3.28602729    0.000000
    9  0.00000000 0.00000000  0.00000000  0.00000000    3.598534
    10 0.00000000 0.03081180  0.00000000  0.00000000    2.600173

Sheet two - Features (2 features, growth rate and drug sensitivity, for 10 cell lines):

    > features
             Cell.line Cell.line1 Cell.line2 Cell.line3 Cell.line4 Cell.line5
    1      Growth rate         NA         NA         NA      51.41         NA
    2 Drug sensitivity       5.03       6.57          8       1.26          3
      Cell.line6 Cell.line7 Cell.line8 Cell.line9 Cell.line10
    1      41.33      26.76      24.19         NA          NA
    2       1.40       1.88       1.33       5.05        9.12

What I found: corr.test {psych}

    corr.test(x, y = NULL, use = "pairwise", method = "pearson", adjust = "BH", alpha = .01)

--> I adjusted the original command to what I need (BH instead of holm, and alpha = .01 instead of .05).

I would be very happy if someone could show me how to use this command, in particular how to refer to the two sheets I have (Genes and Features) as x and y. I would take it from there.

Thanks a lot in advance.

Sarah

----- Original Message -----
From: "Rainer Schuermann" <Rainer.Schuermann at gmx.net>
To: "Sarah Bazzocco" <sarah.bazzocco at vhir.org>
Sent: Thursday, 18 June, 2015 8:14:56 AM
Subject: Re: [R] help

Hi Sarah,

Not an answer to your question but a piece of well-intended advice:

1. Don't post HTML but plain text. Not only will people tell you this, sometimes in a not very friendly manner; using HTML actually does make posts illegible in this mailing list. Code, and R _is_ code, is always plain text.

2. Don't pose an abstract problem - this looks too much like "Can you please do my work for me". Show us what you have tried already, and people will happily jump in and provide their thoughts and advice.

3. Always make sure that you have a reproducible example in your mail, and a set of data of the same type and structure you are using - ideally using dput().

See further advice here:

    PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
    and provide commented, minimal, self-contained, reproducible code.

and here:

    http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

For your problem, R has an immense wealth of ideas and solutions.

Rgds,
Rainer

On Wed June 17 2015 16:57:24 Sarah Bazzocco wrote:
> Hello,
>
> I am an R beginner and I need some help. The question is very simple: I need to compute Pearson correlations (r, p-value and FDR with BH) between an expression array (several thousand genes for, let's say, 20 cell lines) and some features of those cell lines.
>
> My problem is the organization of the Excel sheets and how to get the data into R and run the script. I thought the easiest and most organized way for me would be two Excel sheets:
>
> 1- Only expression data (genes in rows, cell lines in columns)
> 2- Only the features (features in rows, e.g. a) growth rate, b) sensitivity to some drugs; cell lines in columns)
>
> --> That would create two sheets with 20 columns each.
>
> Now I would like to get the correlation for gene 1: its expression across all lines with the growth rate. The same for gene 2, and so forth. I should obtain as many r, p and BH (FDR) values as there are genes. I would need to do the same for the sensitivity, and so on.
>
> Do you think this is doable? I am not at all a bioinformatics expert, so all help is very welcome.
>
> Thank you very much!
>
> Kind regards,
>
> Sarah

--
Sarah Bazzocco, PhD student
Group of Molecular Oncology,
CIBBIM-Nanomedicine,
Vall d'Hebron Hospital Research Institute,
Passeig Vall d'Hebron 119-129,
Barcelona 08035, Spain.
Tel: +34-93-489-4056
Fax: +34-93-489-3893
Email: sarah.bazzocco at vhir.org
Rainer Schuermann
2015-Jun-18 17:09 UTC
[R] Correlation matrix for pearson correlation (r,p,BH(FDR))
The way the sample data is provided is not useful. I have re-built your data; please find the dput() version below (and please check whether I got it right...).

This is not my area of competence at all, but from what I see on the help page, the expected parameters are, among others:

    x    A matrix or dataframe
    y    A second matrix or dataframe __with the same number of rows as x__

I hope that somebody with a better understanding of your intention is able to pick up from here, with the sample data in a useful format.

Rgds,
Rainer

    dput( genes )
    structure(list(Genes = structure(1:10, .Label = c("KCNAB3", "KCNB1",
    "KCNB2", "KERA", "KGFLP1", "KGFLP2", "KHDC1", "KHDC1L", "KHDC3L",
    "KHDRBS1"), class = "factor"),
    Cell.line1 = c(12.02005181, 0.02457449, 0.44791862, 0.06090217,
    0.02450101, 0, 0, 2.3189445, 0, 0),
    Cell.line2 = c(11.140091, 1.3028535, 0.1060137, 0, 0, 0, 0,
    2.8252262, 0, 0),
    Cell.line3 = c(15.60381163, 0.81538294, 0.09864136, 0.03352993,
    0, 0, 0, 5.29099724, 0, 0),
    Cell.line4 = c(13.44151596, 0.59318327, 0, 0.03634781, 0, 0, 0,
    7.44183228, 0, 0),
    Cell.line5 = c(25.3716103, 0.15332321, 0, 0.04190912, 0, 0, 0,
    1.94629741, 0, 0),
    Cell.line6 = c(8.12373424, 4.18181234, 0.05857207, 0, 0.02563099,
    0, 0, 8.56022436, 0, 0),
    Cell.line7 = c(7.67506261, 1.65268403, 0.05945414, 0, 0.03902548,
    0, 0, 7.50838343, 0, 0.0308118),
    Cell.line8 = c(24.43776341, 5.9834632, 0.20733924, 0.07752608,
    0, 0, 0, 7.17964645, 0, 0),
    Cell.line9 = c(18.33244818, 1.51423807, 0.05830982, 0.01585643,
    0, 0, 0, 3.28602729, 0, 0),
    Cell.line10 = c(9.224225, 0, 0, 16.664245, 0, 0, 0, 0, 3.598534,
    2.600173)),
    .Names = c("Genes", "Cell.line1", "Cell.line2", "Cell.line3",
    "Cell.line4", "Cell.line5", "Cell.line6", "Cell.line7", "Cell.line8",
    "Cell.line9", "Cell.line10"),
    row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"),
    class = "data.frame")

    dput( features )
    structure(list(Cell.line = c("Growth rate", "Drug sensitivity"),
    Cell.line1 = c(NA, "41.33"), Cell.line2 = c(NA, "26.76"),
    Cell.line3 = c(NA, "24.19"), Cell.line4 = c("51.41", NA),
    Cell.line5 = c(NA_character_, NA_character_),
    Cell.line6 = c("5.03", "1.40"), Cell.line7 = c("6.57", "1.88"),
    Cell.line8 = c("8", "1.33"), Cell.line9 = c("1.26", "5.05"),
    Cell.line10 = c("3", "9.12")),
    .Names = c("Cell.line", "Cell.line1", "Cell.line2", "Cell.line3",
    "Cell.line4", "Cell.line5", "Cell.line6", "Cell.line7", "Cell.line8",
    "Cell.line9", "Cell.line10"),
    row.names = c(NA, -2L), class = "data.frame")
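The help-page requirement quoted in this message (y must have the same number of rows as x) suggests that both sheets need to be turned so that cell lines are in rows. The following is only a minimal sketch of that reshaping in base R, not a definitive recipe. It assumes the layout shown in the thread (first column holds the gene or feature name, remaining columns are cell lines), uses three of the ten genes to stay short, and takes the feature values as printed in the original post. The helper name `to_numeric_matrix` is hypothetical.

```r
# Three of the ten example genes, values as printed in the original post.
expr <- rbind(
  KCNAB3 = c(12.02005181, 11.1400910, 15.60381163, 13.44151596, 25.37161030,
             8.12373424, 7.67506261, 24.43776341, 18.33244818, 9.224225),
  KCNB1  = c(0.02457449, 1.3028535, 0.81538294, 0.59318327, 0.15332321,
             4.18181234, 1.65268403, 5.98346320, 1.51423807, 0),
  KHDC1L = c(2.31894450, 2.8252262, 5.29099724, 7.44183228, 1.94629741,
             8.56022436, 7.50838343, 7.17964645, 3.28602729, 0)
)
colnames(expr) <- paste0("Cell.line", 1:10)
genes <- data.frame(Genes = rownames(expr), expr, row.names = NULL)

# Features as character columns (mixed text and NA), as read from a sheet.
feat <- rbind(
  c(NA, NA, NA, "51.41", NA, "41.33", "26.76", "24.19", NA, NA),
  c("5.03", "6.57", "8", "1.26", "3", "1.40", "1.88", "1.33", "5.05", "9.12")
)
colnames(feat) <- paste0("Cell.line", 1:10)
features <- data.frame(Cell.line = c("Growth rate", "Drug sensitivity"),
                       feat, stringsAsFactors = FALSE)

# Hypothetical helper: drop the label column, put cell lines in rows,
# and force the values to numeric.
to_numeric_matrix <- function(df) {
  m <- t(as.matrix(df[, -1]))            # cell lines become rows
  storage.mode(m) <- "double"            # character -> numeric, NA stays NA
  colnames(m) <- as.character(df[[1]])   # gene / feature names as columns
  m
}

x <- to_numeric_matrix(genes)      # 10 cell lines x 3 genes
y <- to_numeric_matrix(features)   # 10 cell lines x 2 features

# Both matrices must describe the same cell lines in the same order.
stopifnot(identical(rownames(x), rownames(y)))
```

With x and y in this shape, every column of x can be paired with every column of y, which is the arrangement the quoted help-page text describes.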
Peter Langfelder
2015-Jun-18 18:52 UTC
[R] Correlation matrix for pearson correlation (r,p,BH(FDR))
You have multiple options. I will advertise my own solution: install the package WGCNA (installation instructions at http://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/#cranInstall), then you can use the function

    cp = corAndPvalue(t(genes), t(features))

You need to transpose both because the function expects variables in columns and samples in rows. This will give you a list whose components include 'cor' (matrix of the correlation values) and 'p' (matrix of the Student p-values). To get a matrix of the corresponding FDR, use

    fdr = apply(cp$p, 2, p.adjust, method = "fdr")

Hope this helps,

Peter
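For readers without WGCNA installed, the same three quantities (r, Student p-value, BH-adjusted p-value) can be sketched in base R. This is an illustration under stated assumptions, not WGCNA's implementation: it assumes the label columns have already been dropped and the values converted to numeric matrices with cell lines in rows, it uses three of the ten example genes, and the feature values are taken as printed in the original post. Genes that are constant across the available cell lines would give NA correlations (with a warning) and are left out here.

```r
# Samples (cell lines) in rows, variables in columns.
x <- cbind(
  KCNAB3 = c(12.02005181, 11.1400910, 15.60381163, 13.44151596, 25.37161030,
             8.12373424, 7.67506261, 24.43776341, 18.33244818, 9.224225),
  KCNB1  = c(0.02457449, 1.3028535, 0.81538294, 0.59318327, 0.15332321,
             4.18181234, 1.65268403, 5.98346320, 1.51423807, 0),
  KHDC1L = c(2.31894450, 2.8252262, 5.29099724, 7.44183228, 1.94629741,
             8.56022436, 7.50838343, 7.17964645, 3.28602729, 0)
)
y <- cbind(
  Growth.rate      = c(NA, NA, NA, 51.41, NA, 41.33, 26.76, 24.19, NA, NA),
  Drug.sensitivity = c(5.03, 6.57, 8, 1.26, 3, 1.40, 1.88, 1.33, 5.05, 9.12)
)
rownames(x) <- rownames(y) <- paste0("Cell.line", 1:10)

# Pearson r for every gene/feature pair, using pairwise complete observations.
r <- cor(x, y, use = "pairwise.complete.obs")

# Number of cell lines with both values present, per gene/feature pair.
n <- crossprod(1 - is.na(x), 1 - is.na(y))

# Two-sided Student p-value for each correlation (df = n - 2).
tstat <- r * sqrt((n - 2) / (1 - r^2))
p <- 2 * pt(-abs(tstat), df = n - 2)

# Benjamini-Hochberg adjustment across genes, separately for each feature.
# In p.adjust, "fdr" is an alias for "BH".
fdr <- apply(p, 2, p.adjust, method = "BH")

# One table per feature, e.g. for drug sensitivity:
res <- data.frame(r  = r[, "Drug.sensitivity"],
                  p  = p[, "Drug.sensitivity"],
                  BH = fdr[, "Drug.sensitivity"])
```

Adjusting column by column treats each feature as its own family of tests, which matches the apply() call above. With only four cell lines carrying a growth rate in this example, those correlations rest on very few points.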