Hi,

I am interested in using the cast function in R to perform some aggregation. I did once manage to get it working, but have now forgotten how I did it. So here is my dilemma. I have a large number of probes (about 180,000), each corresponding to a gene; what I'd like to obtain is a frequency count of the probes for each gene.

The data would look something like this:

Gene    ProbeID    Expression_Level
A       1          0.34
A       2          0.21
E       3          0.11
A       4          0.21
F       5          0.56
F       6          0.87
.
.
.
(180000 data points)

In each case, the ProbeID is unique. The output I am looking for is something like this:

Gene    No.ofprobes    Mean_expression
A       3              0.25

Is there an easy way to do this using "cast" or "melt"? Ideally, I would also like to see the unique probes corresponding to each gene in the wide format.

Thanks in advance
Max

Maxy Mariasegaram | Research Fellow | Australian Prostate Cancer Research Centre | Level 1, Building 33 | Princess Alexandra Hospital | 199 Ipswich Road, Brisbane QLD 4102 Australia | t: 07 3176 3073 | f: 07 3176 7440 | e: mariaseg@qut.edu.au
Hi Max

Using plyr instead of reshape:

library(plyr)
df <- data.frame(gene  = c('A', 'A', 'E', 'A', 'F', 'F'),
                 probe = c(1, 2, 3, 4, 5, 6))
# count the rows (probes) within each gene
ddply(df, .(gene), function(df) length(df$gene))

  gene V1
1    A  3
2    E  1
3    F  2

best
iain

--- On Thu, 30/6/11, Max Mariasegaram <max.mariasegaram at qut.edu.au> wrote:

> From: Max Mariasegaram <max.mariasegaram at qut.edu.au>
> Subject: [R] aggregating data
> To: "r-help at r-project.org" <r-help at r-project.org>
> Date: Thursday, 30 June, 2011, 8:28
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Oops, last reply was only half the solution:

library(plyr)
df <- data.frame(gene  = c('A', 'A', 'E', 'A', 'F', 'F'),
                 probe = c(1, 2, 3, 4, 5, 6),
                 exp   = c(0.34, 0.21, 0.11, 0.21, 0.56, 0.81))
# per gene: number of probes (V1) and median expression (V2)
ddply(df, .(gene), function(df) c(length(df$gene), median(df$exp)))

  gene V1    V2
1    A  3 0.210
2    E  1 0.110
3    F  2 0.685

best
iain
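The question asked for the mean expression per gene rather than the median. A minimal sketch of that variant follows, written in base R so that no package is assumed. The column names gene, probe and exp follow the toy data above (with the 0.87 value from the question), and n_probes, mean_exp and res are names chosen here purely for illustration:

```r
# Toy data with the values given in the question
df <- data.frame(gene  = c("A", "A", "E", "A", "F", "F"),
                 probe = 1:6,
                 exp   = c(0.34, 0.21, 0.11, 0.21, 0.56, 0.87))

# tapply() applies a function to the values within each gene;
# both calls use the same grouping, so the results line up
n_probes <- tapply(df$probe, df$gene, length)
mean_exp <- tapply(df$exp, df$gene, mean)

# Assemble one row per gene with readable column names
res <- data.frame(gene     = names(n_probes),
                  n_probes = as.vector(n_probes),
                  mean_exp = as.vector(mean_exp))
res
```

Naming the columns up front avoids the anonymous V1/V2 headers that ddply produces when the function returns an unnamed vector.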
If you have a large data table, you might consider using 'data.table', which performs better than 'plyr':

> x <- read.table(textConnection("Gene ProbeID Expression_Level
+ A 1 0.34
+ A 2 0.21
+ E 3 0.11
+ A 4 0.21
+ F 5 0.56
+ F 6 0.87"), header = TRUE)
> closeAllConnections()
> require(data.table)
> x <- data.table(x)
> x[,
+   list(nProbes = length(ProbeID)
+      , Mean_Level = mean(Expression_Level)
+      )
+   , by = Gene
+   ]
     Gene nProbes Mean_Level
[1,]    A       3  0.2533333
[2,]    E       1  0.1100000
[3,]    F       2  0.7150000

-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Hi,

You can get it with "by":

# df is the data frame with columns gene, probe and exp from iain's reply
foo <- function(x) c(length(x$probe), mean(x$exp))
res <- by(df[c('exp', 'probe')], df['gene'], FUN = foo)
# bind the per-gene results into a matrix, one row per gene
do.call(rbind, res)

Bye,

Oscar.

-- 
Oscar Perpiñán Lamigueiro
Dpto. Ingeniería Eléctrica
EUITI-UPM

http://procomun.wordpress.com
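The question also asked for the probes belonging to each gene in wide format. A minimal base-R sketch of one way to get that is shown here; the column names follow the example in the question, while x, probes, max_n, wide and the "Probe1", "Probe2", ... headers are assumptions made for illustration. Genes with fewer probes are padded with NA:

```r
# Toy data mirroring the example in the question
x <- data.frame(Gene = c("A", "A", "E", "A", "F", "F"),
                ProbeID = 1:6,
                Expression_Level = c(0.34, 0.21, 0.11, 0.21, 0.56, 0.87))

# One vector of probe IDs per gene
probes <- split(x$ProbeID, x$Gene)

# Pad each vector with NA up to the size of the largest group,
# then stack the padded vectors as rows (one row per gene)
max_n <- max(lengths(probes))
wide <- do.call(rbind,
                lapply(probes, function(p) c(p, rep(NA, max_n - length(p)))))
colnames(wide) <- paste0("Probe", seq_len(max_n))
wide
```

With many probes per gene the matrix becomes very wide, so it may be more practical to keep the list returned by split() and look up a gene by name, for example probes[["A"]].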