Matthew Pettis
2008-Sep-25 19:00 UTC
[R] Equivalent of 'first.var' or 'last.var' from SAS in R?
Hi,

I want to sort a data frame by multiple columns and then take the first record in each unique level of the "by" group I used to sort the data frame. Does someone have an example of how to do this?

Thanks,
Matt

-- 
It is from the wellspring of our despair and the places that we are broken that we come to repair the world. -- Murray Waas
Peter Dalgaard
2008-Sep-25 19:26 UTC
[R] Equivalent of 'first.var' or 'last.var' from SAS in R?
Matthew Pettis wrote:
> Hi,
>
> I want to sort a data frame by multiple columns and then take the
> first record in each unique level of the "by" group I used to sort the
> data frame. Does someone have an example of how to do this?
>
> Thanks,
> Matt

Something like this:

> aggregate(airquality, airquality["Month"], head, 1)
  Month Ozone Solar.R Wind Temp Month Day
1     5    41     190  7.4   67     5   1
2     6    NA     286  8.6   78     6   1
3     7   135     269  4.1   84     7   1
4     8    39      83  6.9   81     8   1
5     9    96     167  6.9   91     9   1

where you probably want to lose the first column. Or:

> aq <- airquality
> unsplit(lapply(split(aq, aq$Month), head, 1), 5:9)
    Ozone Solar.R Wind Temp Month Day
1      41     190  7.4   67     5   1
32     NA     286  8.6   78     6   1
62    135     269  4.1   84     7   1
93     39      83  6.9   81     8   1
124    96     167  6.9   91     9   1

This also works, but the "tail" variant is harder:

> unsplit(lapply(split(aq, aq$Month), "[", 1, ), 5:9)

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
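For readers who want something closer to SAS's first./last. flags, including the "tail" case, here is a minimal base R sketch. It is not taken from the posts above: it assumes the data can be sorted first with order() and uses duplicated() to mark group boundaries. The names first_month, last_month, first_rows and last_rows are made up for illustration.

```r
# Sort by the "by" column (Month), then by Day within each Month.
aq <- airquality[order(airquality$Month, airquality$Day), ]

# Analogue of first.Month: TRUE on the first row of each Month group.
first_month <- !duplicated(aq$Month)

# Analogue of last.Month: TRUE on the last row of each Month group.
last_month <- !duplicated(aq$Month, fromLast = TRUE)

first_rows <- aq[first_month, ]  # one row per Month, the earliest Day
last_rows  <- aq[last_month, ]   # one row per Month, the latest Day
```

Because the flags are plain logical vectors, they can also be stored as columns (for example aq$first <- first_month) and used in later filtering, which is roughly how first.var and last.var are used in a SAS data step.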
hadley wickham
2008-Sep-25 20:05 UTC
[R] Equivalent of 'first.var' or 'last.var' from SAS in R?
On Thu, Sep 25, 2008 at 2:00 PM, Matthew Pettis <matthew.pettis at gmail.com> wrote:
> Hi,
>
> I want to sort a data frame by multiple columns and then take the
> first record in each unique level of the "by" group I used to sort the
> data frame. Does someone have an example of how to do this?
>
> Thanks,
> Matt

In the (very soon to be released) plyr package, you can do:

library(plyr)
ddply(airquality, .(Month), head, 1)
ddply(airquality, .(Month), tail, 1)

Hadley

-- 
http://had.co.nz/
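The original question asked about sorting by multiple columns. As a rough sketch of that case without any add-on package, the following uses the built-in mtcars data, which is an assumption chosen only for illustration and is not from the thread. The grouping columns (cyl, gear), the sort column (mpg) and the variable names are likewise hypothetical.

```r
# Sort by the two "by" columns, then by mpg within each (cyl, gear) group.
cars <- mtcars[order(mtcars$cyl, mtcars$gear, mtcars$mpg), ]

# The key columns that define a group.
keys <- cars[c("cyl", "gear")]

# First record in each (cyl, gear) group: the lowest mpg after sorting.
first_by <- cars[!duplicated(keys), ]

# Last record in each (cyl, gear) group: the highest mpg after sorting.
last_by <- cars[!duplicated(keys, fromLast = TRUE), ]
```

duplicated() on a data frame compares whole rows, so passing only the key columns is what restricts the comparison to the "by" group.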