Dear R users,

I have some trouble with the aggregate function. Here are my data:

> daf
      S_id AF_Class count... R_gc_percent S_length
5  8264497        1       30         0.48    35678
6  8264497        3        7         0.48    35678
8  8264554        1       31         0.51    38894
9  8264554        2       11         0.51    38894
10 8264554        3        1         0.51    38894

For a given S_id, I would like to select the line corresponding to the
maximum count. To do this, I used:

> aggregate(daf, list(daf$S_id), max)
  Group.1    S_id AF_Class count... R_gc_percent S_length
1 8264497 8264497        3       30         0.48    35678
2 8264554 8264554        3       31         0.51    38894

which is OK for the count. But I realized that max is also applied to
AF_Class (it should be 1 and 1 instead of 3 and 3), so it seems that
aggregate is not the appropriate function for what I want to do. Is there
any other function I could use instead?

Best wishes,

Stéphane.
--
=========================================================
Stephane CRUVEILLER Ph. D.
Genoscope - Centre National de Sequencage
Atelier de Genomique Comparative
2, Rue Gaston Cremieux CP 5706
91057 Evry Cedex - France
Phone: +33 (0)1 60 87 84 58
Fax: +33 (0)1 60 87 25 14
EMails: scruveil at genoscope.cns.fr, scruvell at infobiogen.fr
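A minimal, self-contained sketch that reproduces the problem and shows one possible base-R workaround. It rebuilds the posted data and assumes the third column is literally named `count...`, as printed. `aggregate()` applies `max` to every column independently, which is why AF_Class comes out as 3. The workaround (an alternative to the replies in this thread, not taken from them) aggregates only the count column and merges the per-group maxima back to recover whole rows.

```r
# Rebuild the posted data; assumes the third column is literally named
# `count...`, as printed in the question.
daf <- data.frame(
  S_id         = c(8264497, 8264497, 8264554, 8264554, 8264554),
  AF_Class     = c(1, 3, 1, 2, 3),
  count...     = c(30, 7, 31, 11, 1),
  R_gc_percent = c(0.48, 0.48, 0.51, 0.51, 0.51),
  S_length     = c(35678, 35678, 38894, 38894, 38894),
  row.names    = c("5", "6", "8", "9", "10")
)

# aggregate() calls max() on each column separately, so the maxima of
# AF_Class and count... can come from different rows.
agg <- aggregate(daf, list(daf$S_id), max)
print(agg)

# One base-R alternative: aggregate only the count column, then merge the
# per-group maxima back onto the data to recover the complete rows.
# merge() joins on the shared columns S_id and count...; if two rows tie
# for the maximum, both are kept.
max_count <- aggregate(count... ~ S_id, data = daf, FUN = max)
best <- merge(daf, max_count)
print(best)
```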
Try 'by':

> x
      S_id AF_Class count... R_gc_percent S_length
5  8264497        1       30         0.48    35678
6  8264497        3        7         0.48    35678
8  8264554        1       31         0.51    38894
9  8264554        2       11         0.51    38894
10 8264554        3        1         0.51    38894
> do.call('rbind', by(x, x$S_id, function(y) y[which.max(y$count...),]))
           S_id AF_Class count... R_gc_percent S_length
8264497 8264497        1       30         0.48    35678
8264554 8264554        1       31         0.51    38894

On 3/29/06, Stephane CRUVEILLER <scruveil@genoscope.cns.fr> wrote:
> for a given S_id, I would like to select the line corresponding to the
> max count.
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390 (Cell)
+1 513 247 0281 (Home)

What is the problem you are trying to solve?
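A self-contained sketch of the by() approach, selecting on the count column since the question asks for the row with the maximum count. It rebuilds the posted data and assumes the column is literally named `count...`. Note that `which.max()` returns the first maximum, so ties keep only the earliest row. A vectorised variant using `order()` and `duplicated()` is included for comparison; it is an alternative, not part of the original reply.

```r
# Rebuild the posted data (assumes the column is literally `count...`).
daf <- data.frame(
  S_id         = c(8264497, 8264497, 8264554, 8264554, 8264554),
  AF_Class     = c(1, 3, 1, 2, 3),
  count...     = c(30, 7, 31, 11, 1),
  R_gc_percent = c(0.48, 0.48, 0.51, 0.51, 0.51),
  S_length     = c(35678, 35678, 38894, 38894, 38894),
  row.names    = c("5", "6", "8", "9", "10")
)

# by() splits the rows by S_id and applies the function to each piece;
# which.max() picks the first row with the largest count..., and
# do.call(rbind, ...) stacks the one-row results into a data frame.
top <- do.call(rbind, by(daf, daf$S_id, function(y) y[which.max(y$count...), ]))
print(top)

# Vectorised alternative without a per-group function: sort by S_id and
# decreasing count..., then keep the first row seen for each S_id.
ord <- daf[order(daf$S_id, -daf$count...), ]
top2 <- ord[!duplicated(ord$S_id), ]
print(top2)
```

Both versions return one row per S_id with AF_Class 1 and counts 30 and 31 for this data.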
Nice trick, thx...

Stéphane.

On Wed, 2006-03-29 at 11:17 -0500, jim holtman wrote:
> try 'by':