Hi everybody,

I have a large dataframe similar to this one:

    knames <- c('ab', 'aa', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad', 'ae', 'af')
    kdate <- as.Date(c('20111001', '20111102', '20101001', '20100315',
                       '20101201', '20110105', '20101001', '20110504',
                       '20110603', '20110201'),
                     format="%Y%m%d")
    kdata <- data.frame(knames, kdate)

I would like to add a new variable to the dataframe counting the occurrences of each value in knames in their order of appearance (according to the date indicated in kdate). The solution should be a variable with the values 2,2,1,1,1,2,1,2,1,1. I could do it with a loop, but there must be a more elegant way to do this.

Thanks!

Best,
Kai
Hello Kai,

This looks like a fun question. Here is my solution; I'd be curious to see solutions by other people here. It can also be tweaked in various ways, and easily put into a function (actually, if you do it, please put it back online :) ). The only thing that might require some work is the rearranging of the columns.

Cheers,
Tal

    ######################
    # Loading the functions
    ######################
    # Making sure we can source code from github
    source("http://www.r-statistics.com/wp-content/uploads/2012/01/source_https.r.txt")
    # This is based on code first discussed here:
    ## http://www.r-statistics.com/2012/01/printing-nested-tables-in-r-bridging-between-the-reshape-and-tables-packages/

    # Reading in the function for using merge that preserves order
    source_https("https://raw.github.com/talgalili/R-code-snippets/master/merge.data.frame.r")

    ##################
    # Make Data
    knames <- c('ab', 'aa', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad', 'ae', 'af')
    kdate <- as.Date(c('20111001', '20111102', '20101001', '20100315',
                       '20101201', '20110105', '20101001', '20110504',
                       '20110603', '20110201'),
                     format="%Y%m%d")
    kdata <- data.frame(knames, kdate)
    kdata$kdate <- as.character(kdata$kdate)

    ##################
    # Calculate counts
    tmp <- data.frame(table(kdata$kdate))
    colnames(tmp)[1] <- "kdate"
    tmp[,1] <- as.character(tmp[,1])

    # Based on this:
    # http://www.r-statistics.com/2012/01/merging-two-data-frame-objects-while-preserving-the-rows-order/
    merge.data.frame(kdata, tmp, keep_order = "x")

    ### Solution:
            kdate knames Freq
    9  2011-10-01     ab    1
    10 2011-11-02     aa    1
    2  2010-10-01     ac    2
    1  2010-03-15     ad    1
    4  2010-12-01     ab    1
    5  2011-01-05     ac    1
    3  2010-10-01     aa    2
    7  2011-05-04     ad    1
    8  2011-06-03     ae    1
    6  2011-02-01     af    1
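For readers who cannot source the remote helper files, the per-date count shown above can probably be reproduced in base R alone. The following is only a sketch, not Tal's code: the name `date_counts` is made up for illustration, and a lookup into the table by date string stands in for the order-preserving merge. Note that this column counts how many rows share each date, so it yields 1,1,2,... rather than the 2,2,1,... sequence requested in the original question.

```r
# Same example data as in the original question
knames <- c('ab', 'aa', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad', 'ae', 'af')
kdate <- as.Date(c('20111001', '20111102', '20101001', '20100315',
                   '20101201', '20110105', '20101001', '20110504',
                   '20110603', '20110201'),
                 format = "%Y%m%d")
kdata <- data.frame(knames, kdate)

# Count rows per date; the table is named by the date strings
# (date_counts is an illustrative name, not from the thread)
date_counts <- table(kdata$kdate)

# Look each row's date up in the table, which keeps the original row order
kdata$Freq <- as.integer(date_counts[as.character(kdata$kdate)])

kdata
```

Because the lookup is done row by row, no reordering or column rearranging is needed afterwards.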
On Sat, Feb 11, 2012 at 07:17:54PM +0100, Kai Mx wrote:

> I would like to add a new variable to the dataframe counting the
> occurrences of different values in knames in their order of appearance
> (according to the date as in indicated in kdate). The solution should be a
> variable with the values 2,2,1,1,1,2,1,2,1,1. I could do it with a loop,
> but there must be a more elegant way to this.

Hi.

Is the first 2 in the new variable due to the fact that the name is "ab" and "ab" at row 5 has an older date? If so, then try the following:

    ind <- order(kdata$kdate)
    f <- function(x) seq.int(along.with=x)
    kdata$x <- ave(1:nrow(kdata), kdata$knames[ind], FUN=f)[order(ind)]

       knames      kdate x
    1      ab 2011-10-01 2
    2      aa 2011-11-02 2
    3      ac 2010-10-01 1
    4      ad 2010-03-15 1
    5      ab 2010-12-01 1
    6      ac 2011-01-05 2
    7      aa 2010-10-01 1
    8      ad 2011-05-04 2
    9      ae 2011-06-03 1
    10     af 2011-02-01 1

kdata$knames[ind] orders the names by increasing date. ave(...)[order(ind)] reorders the output of ave() to the original order.

Hope this helps.

Petr Savicky.
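The one-liner above packs three steps together. As a sketch only, here is the same idea with each step given its own intermediate variable; the names `sorted_names` and `running` are illustrative and do not come from the thread.

```r
# Same example data as in the original question
knames <- c('ab', 'aa', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad', 'ae', 'af')
kdate <- as.Date(c('20111001', '20111102', '20101001', '20100315',
                   '20101201', '20110105', '20101001', '20110504',
                   '20110603', '20110201'),
                 format = "%Y%m%d")
kdata <- data.frame(knames, kdate)

# Step 1: row positions that would sort the data by increasing date
ind <- order(kdata$kdate)

# Step 2: names in date order, then a running count 1, 2, ... within each name
sorted_names <- kdata$knames[ind]
running <- ave(seq_along(sorted_names), sorted_names, FUN = seq_along)

# Step 3: order(ind) is the inverse permutation of ind, so this puts the
# counts back in the original row order
kdata$x <- running[order(ind)]

kdata$x
```

An equivalent way to write step 3 is to create the column first and then assign `kdata$x[ind] <- running`, which some readers may find easier to follow than the inverse permutation.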
On Feb 11, 2012, at 1:17 PM, Kai Mx wrote:

> Hi everybody,
> I have a large dataframe similar to this one:
> knames <-c('ab', 'aa', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad','ae', 'af')
> kdate <- as.Date( c('20111001', '20111102', '20101001', '20100315',
> '20101201', '20110105', '20101001', '20110504', '20110603',
> '20110201'),
> format="%Y%m%d")
> kdata <- data.frame (knames, kdate)

    > ave(unclass(kdate), knames, FUN=order )
     [1] 2 2 1 1 1 2 1 2 1 1

That was actually not using the dataframe values, but you could also do this:

    > kdata$ord <- with(kdata, ave(unclass(kdate), knames, FUN=order ))
    > kdata
       knames      kdate ord
    1      ab 2011-10-01   2
    2      aa 2011-11-02   2
    3      ac 2010-10-01   1
    4      ad 2010-03-15   1
    5      ab 2010-12-01   1
    6      ac 2011-01-05   2
    7      aa 2010-10-01   1
    8      ad 2011-05-04   2
    9      ae 2011-06-03   1
    10     af 2011-02-01   1

David Winsemius, MD
West Hartford, CT
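One caveat that may matter on the real, larger data: order() returns the permutation that sorts a vector, which coincides with the within-group position only when a name occurs at most twice, as it does in the example. For names with three or more dates, rank() is probably what is wanted. The sketch below assumes that reading of the question; the three-row group `d`/`g` is hypothetical data added purely to show the difference, and ties.method = "first" is one possible choice for handling duplicate dates within a name.

```r
# Same example data as in the original question
knames <- c('ab', 'aa', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad', 'ae', 'af')
kdate <- as.Date(c('20111001', '20111102', '20101001', '20100315',
                   '20101201', '20110105', '20101001', '20110504',
                   '20110603', '20110201'),
                 format = "%Y%m%d")
kdata <- data.frame(knames, kdate)

# Position of each date within its name group, using rank() instead of order()
kdata$ord <- with(kdata,
                  ave(as.numeric(kdate), knames,
                      FUN = function(x) rank(x, ties.method = "first")))

# Hypothetical group with three dates (not from the thread): latest, earliest, middle
d <- as.Date(c('2012-03-01', '2012-01-01', '2012-02-01'))
g <- c('x', 'x', 'x')

rank_demo  <- ave(as.numeric(d), g, FUN = rank)   # position of each date
order_demo <- ave(as.numeric(d), g, FUN = order)  # sorting permutation instead

rank_demo
order_demo
```

On the ten-row example both functions give the same column, so the difference only shows up once a name has more than two rows.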