Hi,

I am interested in using the cast function in R to perform some aggregation. I did once manage to get it working, but have now forgotten how I did this. So here is my dilemma. I have several thousand probes (about 180,000) corresponding to genes; what I'd like to obtain is a frequency count of the probes for each gene.

The data would look something like this:

Gene  ProbeID  Expression_Level
A     1        0.34
A     2        0.21
E     3        0.11
A     4        0.21
F     5        0.56
F     6        0.87
.
.
.
(180000 data points)

In each case, the probeID is unique. The output I am looking for is something like this:

Gene  No.ofprobes  Mean_expression
A     3            0.25

Is there an easy way to do this using "cast" or "melt"? Ideally, I would also like to see the unique probes corresponding to each gene in the wide format.

Thanks in advance
Max

Maxy Mariasegaram | Research Fellow | Australian Prostate Cancer Research Centre | Level 1, Building 33 | Princess Alexandra Hospital | 199 Ipswich Road, Brisbane QLD 4102 Australia | t: 07 3176 3073 | f: 07 3176 7440 | e: mariaseg@qut.edu.au
Hi Max
Using plyr instead of reshape:
library(plyr)
df <- data.frame(gene = c('A', 'A', 'E', 'A', 'F', 'F'),
                 probe = c(1, 2, 3, 4, 5, 6))
# count the rows (probes) within each gene
ddply(df, .(gene), function(df) length(df$gene))
  gene V1
1    A  3
2    E  1
3    F  2
best
iain
--- On Thu, 30/6/11, Max Mariasegaram <max.mariasegaram at qut.edu.au>
wrote:
> From: Max Mariasegaram <max.mariasegaram at qut.edu.au>
> Subject: [R] aggregating data
> To: "r-help at r-project.org" <r-help at r-project.org>
> Date: Thursday, 30 June, 2011, 8:28
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
Oops, my last reply was only half the solution:
library(plyr)
df <- data.frame(gene = c('A', 'A', 'E', 'A', 'F', 'F'),
                 probe = c(1, 2, 3, 4, 5, 6),
                 exp = c(0.34, 0.21, 0.11, 0.21, 0.56, 0.81))
# V1 = number of probes, V2 = median expression per gene
ddply(df, .(gene), function(df) c(length(df$gene), median(df$exp)))
  gene V1    V2
1    A  3 0.210
2    E  1 0.110
3    F  2 0.685
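
Note that the call above reports the median, whereas the question asked for the mean. As a rough sketch only, the same per-gene count with a mean can also be done in base R with tapply, without any extra packages. The data frame and its column names (gene, probe, exp) and the result names (n_probes, mean_expr) are assumptions made up for this illustration; the expression values are the six rows from the original question.

```r
# Example data: the six rows shown in the original question
# (column names gene/probe/exp are assumed for illustration)
df <- data.frame(
  gene  = c("A", "A", "E", "A", "F", "F"),
  probe = 1:6,
  exp   = c(0.34, 0.21, 0.11, 0.21, 0.56, 0.87)
)

# Number of probes and mean expression within each gene
n_probes  <- tapply(df$probe, df$gene, length)
mean_expr <- tapply(df$exp, df$gene, mean)

# Collect both summaries into one data frame, one row per gene
res <- data.frame(
  gene      = names(n_probes),
  n_probes  = as.vector(n_probes),
  mean_expr = as.vector(mean_expr)
)
print(res)
```

Both tapply calls group by the same column, so the two result vectors come back in the same gene order and can be combined directly.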
best
iain
If you have a large data table, you might consider using 'data.table', which performs better than 'plyr':

> x <- read.table(textConnection("Gene ProbeID Expression_Level
+ A 1 0.34
+ A 2 0.21
+ E 3 0.11
+ A 4 0.21
+ F 5 0.56
+ F 6 0.87"), header = TRUE)
> closeAllConnections()
> require(data.table)
> x <- data.table(x)
> x[,
+     list(nProbes = length(ProbeID)
+          , Mean_Level = mean(Expression_Level)
+          )
+     , by = Gene
+     ]
     Gene nProbes Mean_Level
[1,]    A       3  0.2533333
[2,]    E       1  0.1100000
[3,]    F       2  0.7150000

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Hi,
You can get it with "by":
foo <- function(x)c(length(x$probe), mean(x$exp))
res <- by(df[c('exp', 'probe')], df['gene'], FUN=foo)
do.call(rbind, res)
Bye,
Oscar.
--
Oscar Perpiñán Lamigueiro
Dpto. Ingeniería Eléctrica
EUITI-UPM
http://procomun.wordpress.com
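
For the second part of the question (seeing the probes that belong to each gene, one row per gene), one possible base-R sketch is to split the probe IDs by gene and pad the shorter groups with NA. The column names (gene, probe) and the probe_1, probe_2, ... labels are assumptions chosen for this illustration, and the approach has only been thought through for the small example, not for the full 180,000 rows.

```r
# Example data: genes and probe IDs from the original question
# (column names gene/probe are assumed for illustration)
df <- data.frame(
  gene  = c("A", "A", "E", "A", "F", "F"),
  probe = 1:6
)

# List with one vector of probe IDs per gene
probes_by_gene <- split(df$probe, df$gene)

# Pad every gene's probes with NA up to the largest group size,
# then arrange as a matrix with one row per gene
max_n <- max(lengths(probes_by_gene))
wide <- t(sapply(probes_by_gene,
                 function(p) c(p, rep(NA, max_n - length(p)))))
colnames(wide) <- paste0("probe_", seq_len(max_n))
print(wide)
```

The intermediate list probes_by_gene may already be enough if a rectangular layout is not strictly needed, since genes can have very different numbers of probes.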