Julien Barnier
2007-Apr-13 11:07 UTC
[R] Two basic data manipulation questions (counting and aggregating)
Dear R users,

I have two basic data manipulation questions that I can't resolve.

My data is a data frame which looks like the following:

   id    type
   10002 "7"
   10061 "1"
   10061 "1"
   10061 "4"
   10065 "7"
   10114 "1"
   10114 "1"
   10114 "4"
   10136 "7"
   10136 "2"
   10136 "2"

First, I would like to create a "counter" variable which counts the rank of each row within each "id" level, i.e. something like:

   id    type counter
   10002 "7"  1
   10061 "1"  1
   10061 "1"  2
   10061 "4"  3
   10065 "7"  1
   10114 "1"  1
   10114 "1"  2
   10114 "4"  3
   10136 "7"  1
   10136 "2"  2
   10136 "2"  3

Is there a straightforward way to do that without using several "for" loops?

The second thing I would like to do is to aggregate the first data frame by concatenating the 'type' values for each 'id', i.e. I'd like to obtain something like:

   id    value
   10002 7
   10061 114
   10065 7
   10114 114
   10136 722

I have tried the "aggregate" function, but it doesn't work because the "paste" function doesn't return a scalar value. Using tapply seems to work, but is not straightforward, and I wanted to know if there is a simple way to do this.

Thanks in advance for any help.

-- 
Julien
Julien Barnier
2007-Apr-13 14:17 UTC
[R] Two basic data manipulation questions (counting and aggregating)
Hi,

Sorry for replying to myself, but I think I have found a solution to my first question.

> First, I would like to create a "counter" variable which will count
> the rank of each row inside each "id" level, ie something like :
>
> id    type counter
> 10002 "7"  1
> 10061 "1"  1
> 10061 "1"  2
> 10061 "4"  3
> 10065 "7"  1
> 10114 "1"  1
> 10114 "1"  2
> 10114 "4"  3
> 10136 "7"  1
> 10136 "2"  2
> 10136 "2"  3

It seems to work if I use:

df$counter <- unlist(tapply(df$id, df$id, order))

But there may be a better solution...

-- 
Julien
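As a minimal sketch of the counter idea above, the block below rebuilds the example data and compares the tapply/order approach with an alternative based on ave() and seq_along(). The variable names counter_tapply and counter_ave are made up for illustration. The data frame is typed in by hand from the post, assuming ids are numeric and types are character strings.

```r
# Example data copied from the original post (assumed column types)
df <- data.frame(
  id   = c(10002, 10061, 10061, 10061, 10065, 10114, 10114, 10114,
           10136, 10136, 10136),
  type = c("7", "1", "1", "4", "7", "1", "1", "4", "7", "2", "2"),
  stringsAsFactors = FALSE
)

# Approach from the post: within one id all values are equal, so order()
# returns 1..n. unlist() concatenates the groups in sorted-id order, so
# this relies on df already being sorted by id.
counter_tapply <- unlist(tapply(df$id, df$id, order), use.names = FALSE)

# Alternative sketch: ave() puts each group's result back at the original
# row positions, so the rows do not need to be sorted or in blocks.
counter_ave <- ave(seq_along(df$id), df$id, FUN = seq_along)

df$counter <- counter_ave
```

On this data both vectors are the same because the rows are already sorted by id; the ave() form is the one that should still hold if the rows are shuffled.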
Greg Snow
2007-Apr-13 14:38 UTC
[R] Two basic data manipulation questions (counting and aggregating)
For the first question, if every record with the same id is together in a block, then

> mydf$counter <- unlist( tapply( mydf$type, mydf$id, rank, ties.method='first') )

should give you your counter column.

If the ids are not in blocks, then try:

> mydf$counter <- ave(mydf$type, factor(mydf$id), FUN=function(x) rank(x, ties.method='first'))

For the second question, look at the collapse argument to paste. Using paste with collapse='' should give you the scalar that you can use with aggregate.

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111
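The collapse suggestion for the second question can be sketched as follows. This is only an illustration: the data frame is retyped from the original post (assuming numeric ids and character types), and the names agg and r are hypothetical. The last line is a side check showing that rank() follows the values of type rather than row position, which may matter for a group such as 10136.

```r
# Example data copied from the original post (assumed column types)
mydf <- data.frame(
  id   = c(10002, 10061, 10061, 10061, 10065, 10114, 10114, 10114,
           10136, 10136, 10136),
  type = c("7", "1", "1", "4", "7", "1", "1", "4", "7", "2", "2"),
  stringsAsFactors = FALSE
)

# collapse = "" makes paste() join each group's values into a single string,
# which is the one-value-per-group result that aggregate() expects.
agg <- aggregate(mydf$type, by = list(id = mydf$id), FUN = paste, collapse = "")
names(agg)[2] <- "value"  # aggregate() names the result column "x" by default

# Side check: rank() sorts by value, so "7", "2", "2" is ranked 3, 1, 2,
# not 1, 2, 3 as a pure row counter would be.
r <- rank(c("7", "2", "2"), ties.method = "first")
```

The extra collapse argument is simply passed through aggregate() to paste(). If a counter by row position is wanted rather than a rank by value, a seq_along-based function could be used in place of rank.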