thr3ads.net - R help - [R] Counting occurences of variables in a dataframe [Feb 2012]


Kai Mx

2012-Feb-11 18:17 UTC

[R] Counting occurences of variables in a dataframe

Hi everybody,
I have a large dataframe similar to this one:
knames <- c('ab', 'aa', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad', 'ae', 'af')
kdate <- as.Date(c('20111001', '20111102', '20101001', '20100315',
                   '20101201', '20110105', '20101001', '20110504',
                   '20110603', '20110201'),
                 format = "%Y%m%d")
kdata <- data.frame(knames, kdate)
I would like to add a new variable to the dataframe counting the
occurrences of different values in knames in their order of appearance
(according to the date as indicated in kdate). The solution should be a
variable with the values 2,2,1,1,1,2,1,2,1,1. I could do it with a loop,
but there must be a more elegant way to do this.

Thanks!

Best,

Kai


Tal Galili

2012-Feb-11 18:45 UTC


Hello Kai

This looks like a fun question.

Here is my solution; I'd be curious to see solutions by other people here.
It can also be tweaked in various ways and easily put into a function
(actually, if you do that, please put it back online :) ).
The only thing that might require some work is rearranging the
columns.

Cheers,
Tal



######################
# Loading the functions
######################
# Making sure we can source code from github
source("http://www.r-statistics.com/wp-content/uploads/2012/01/source_https.r.txt")
# This is based on code first discussed here:
# http://www.r-statistics.com/2012/01/printing-nested-tables-in-r-bridging-between-the-reshape-and-tables-packages/

# Reading in the function for using merge that preserves order
source_https("https://raw.github.com/talgalili/R-code-snippets/master/merge.data.frame.r")




##################
# Make Data
knames <- c('ab', 'aa', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad', 'ae', 'af')
kdate <- as.Date(c('20111001', '20111102', '20101001', '20100315',
                   '20101201', '20110105', '20101001', '20110504',
                   '20110603', '20110201'),
                 format = "%Y%m%d")
kdata <- data.frame(knames, kdate)
kdata$kdate <- as.character(kdata$kdate)

##################
# Calculate counts
tmp <- data.frame(table(kdata$kdate))
colnames(tmp)[1] <- "kdate"
tmp[,1] <- as.character(tmp[,1])

# Based on this:
# http://www.r-statistics.com/2012/01/merging-two-data-frame-objects-while-preserving-the-rows-order/
merge.data.frame(kdata ,tmp ,keep_order = "x")

### Solution:
         kdate knames Freq
9  2011-10-01     ab    1
10 2011-11-02     aa    1
2  2010-10-01     ac    2
1  2010-03-15     ad    1
4  2010-12-01     ab    1
5  2011-01-05     ac    1
3  2010-10-01     aa    2
7  2011-05-04     ad    1
8  2011-06-03     ae    1
6  2011-02-01     af    1
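
The Freq column above tallies rows per kdate value. As a minimal sketch, assuming only base R is available, the same column can be produced in the original row order with ave(), so no merge is needed. The column name name_total is made up for illustration and is not from the thread.

```r
# Example data from the question, with dates kept as character as in the post
knames <- c('ab', 'aa', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad', 'ae', 'af')
kdate <- as.character(as.Date(c('20111001', '20111102', '20101001', '20100315',
                                '20101201', '20110105', '20101001', '20110504',
                                '20110603', '20110201'),
                              format = "%Y%m%d"))
kdata <- data.frame(knames, kdate)

# Rows per kdate value, aligned with the original row order (no merge needed);
# ave() recycles the single length() value across each group
kdata$Freq <- ave(seq_len(nrow(kdata)), kdata$kdate, FUN = length)

# The same call grouped by knames gives the total number of rows per name
# (name_total is a hypothetical column name)
kdata$name_total <- ave(seq_len(nrow(kdata)), kdata$knames, FUN = length)

print(kdata)
```

Neither column is the running count by date that the question asks for; they are group totals repeated on every row of the group.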











Petr Savicky

2012-Feb-11 18:59 UTC

Hi.

Is the first 2 in the new variable due to the fact that
the name is "ab" and "ab" at row 5 has older date? If so,
then try the following

  ind <- order(kdata$kdate)
  f <- function(x) seq.int(along.with=x)
  kdata$x <- ave(1:nrow(kdata), kdata$knames[ind], FUN=f)[order(ind)]

     knames      kdate x
  1      ab 2011-10-01 2
  2      aa 2011-11-02 2
  3      ac 2010-10-01 1
  4      ad 2010-03-15 1
  5      ab 2010-12-01 1
  6      ac 2011-01-05 2
  7      aa 2010-10-01 1
  8      ad 2011-05-04 2
  9      ae 2011-06-03 1
  10     af 2011-02-01 1

kdata$knames[ind] orders the names by increasing date.
ave(...)[order(ind)] reorders the output of ave() to the original order.
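
To make the two reordering steps concrete, here is a self-contained sketch of the same approach, split into intermediate variables. The names sorted_names and running are introduced only for illustration; seq_along is used in place of the helper f and should behave the same way.

```r
# Rebuild the example data from the original question
knames <- c('ab', 'aa', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad', 'ae', 'af')
kdate <- as.Date(c('20111001', '20111102', '20101001', '20100315',
                   '20101201', '20110105', '20101001', '20110504',
                   '20110603', '20110201'),
                 format = "%Y%m%d")
kdata <- data.frame(knames, kdate)

ind <- order(kdata$kdate)           # row indices sorted by increasing date
sorted_names <- kdata$knames[ind]   # names arranged in date order

# Running count within each name, computed on the date-sorted names
running <- ave(seq_along(sorted_names), sorted_names, FUN = seq_along)

# order(ind) is the inverse permutation of ind, so this maps the counts
# back onto the original rows
kdata$x <- running[order(ind)]

print(kdata$x)
```

On this data the result agrees with the values requested in the question (2,2,1,1,1,2,1,2,1,1).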

Hope this helps.

Petr Savicky.

David Winsemius

2012-Feb-11 21:05 UTC

 >  ave(unclass(kdate), knames, FUN=order )
  [1] 2 2 1 1 1 2 1 2 1 1


That did not actually use the data frame's columns, but you could also do
this:

 > kdata$ord <- with(kdata, ave(unclass(kdate), knames, FUN=order ))
 > kdata
    knames      kdate ord
1      ab 2011-10-01   2
2      aa 2011-11-02   2
3      ac 2010-10-01   1
4      ad 2010-03-15   1
5      ab 2010-12-01   1
6      ac 2011-01-05   2
7      aa 2010-10-01   1
8      ad 2011-05-04   2
9      ae 2011-06-03   1
10     af 2011-02-01   1
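
One caveat, offered as a hedged note rather than a correction of the post: order() returns the indices that would sort a vector, while rank() returns each element's position in the sorted result. The two coincide for groups of one or two rows, which is all this data contains, but can differ once a name occurs three or more times. The sketch below uses a made-up three-date vector d to show the difference, then applies rank() with ties.method = "first" to the thread's data.

```r
# Hypothetical dates for a single name (not from the thread's data)
d <- as.numeric(as.Date(c("2011-03-01", "2011-01-01", "2011-02-01")))
o <- order(d)   # indices that would sort d
r <- rank(d)    # position of each element in the sorted order
print(o)
print(r)

# The thread's data, using rank() so that larger groups are also handled
knames <- c('ab', 'aa', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad', 'ae', 'af')
kdate <- as.Date(c('20111001', '20111102', '20101001', '20100315',
                   '20101201', '20110105', '20101001', '20110504',
                   '20110603', '20110201'),
                 format = "%Y%m%d")
kdata <- data.frame(knames, kdate)

# ties.method = "first" numbers equal dates within a name by row order
kdata$ord <- with(kdata, ave(as.numeric(kdate), knames,
                             FUN = function(v) rank(v, ties.method = "first")))
print(kdata$ord)
```

On the example data this gives the same column as FUN=order.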
David Winsemius, MD
West Hartford, CT
