thr3ads.net - R help - [R] How to group by then count? [Jan 2015]

Monnand

2015-Jan-04 09:02 UTC

[R] How to group by then count?

Hi all,

I thought this was a very naive problem but I have not found any solution
which is idiomatic to R.

The problem is like this:

Assume we have a vector of strings:
 x = c("1", "1", "2", "1", "5", "2")

We want to count the number of appearances of each string, i.e. in vector x,
string "1" appears 3 times, "2" appears twice and "5" appears once. Then I
want to know which string is the majority. In this case, it is "1".

For imperative languages like C, C++, Java and Python, I would use a hash
table to count the strings, where keys are the strings and values are the
number of appearances. For functional languages like Clojure, there are
higher-order functions like group-by.

However, for R, I can hardly find a good solution to this simple problem. I
found a hash package, which implements a hash table. However, installing a
package simply for a hash table is really annoying for me. I did find
aggregate and other functions which operate on data frames. But in my
case, it is a simple vector. Converting it to a data frame may not be
desirable. (Or is it?)

Could anyone suggest an idiomatic way of doing this in R? I would
appreciate your help!

-Monnand

Christian Brandstätter

2015-Jan-04 10:17 UTC

Dear Monnand,

one possible way would be to use as.factor() and in the summary you would get
counts for every level.

Like this:

x = c("1", "1", "2", "1", "5", "2")

summary(as.factor(x))
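
A minimal sketch of how the suggestion above could be carried through to the "majority" string, using only base R. The variable names counts and majority are illustrative, not from the original post. One caveat worth knowing: summary() on a factor has a maxsum argument (default 100), so with more than 100 distinct strings the least frequent levels are lumped into "(Other)".

```r
x <- c("1", "1", "2", "1", "5", "2")

# summary() on a factor returns a named integer vector: one count per level
counts <- summary(as.factor(x))

# which.max() gives the position of the first maximum; look up its name
majority <- names(counts)[which.max(counts)]
majority   # "1"
```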

Cheers, Christian

Berend Hasselman

2015-Jan-04 10:22 UTC

Have a look at table:

?table
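
For readers who want more than the help page, here is a short sketch of table() applied to the vector from the question. The names counts, ranked, top, y, tab and tied are assumptions made for illustration. Note that which.max() would report only the first maximum, so comparing against max() is used here to keep ties.

```r
x <- c("1", "1", "2", "1", "5", "2")

counts <- table(x)                        # one count per distinct string
ranked <- sort(counts, decreasing = TRUE) # most frequent first

# All strings that share the highest count (handles ties)
top <- names(counts)[counts == max(counts)]
top   # "1"

# A tie example: "a" and "b" both appear twice
y    <- c("a", "a", "b", "b", "c")
tab  <- table(y)
tied <- names(tab)[tab == max(tab)]
tied  # "a" "b"
```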

Berend

MacQueen, Don

2015-Jan-04 22:03 UTC

This seems to me to be a case where thinking in terms of computer
programming concepts is getting in the way a bit. Approach it as a data
analysis task; the S language (upon which R is based) is designed in part
for data analysis so there is a function that does most of the job for you.

(I changed your vector of strings to make the result more easily
interpreted)

> x = c("1", "1", "2", "1", "5", "2", '3', '5', '5', '2', '2')
> tmp <- table(x)      ## counts the number of appearances of each element
> tmp[tmp==max(tmp)]   ## finds which one occurs most often
2 
4 

Meaning that the element '2' appears 4 times.  The table() function should
be fast even with long vectors. Here's an example with a vector of length
1 million:

foo <- table( sample(letters, 1e6, replace=TRUE) )
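
To check the speed claim on your own machine, the example can be wrapped in system.time(). This is only a sketch: the seed and the names big and elapsed are assumptions added for illustration, and the timing will vary by machine, so no expected timing is shown.

```r
set.seed(1)                                  # arbitrary seed, for repeatability
big <- sample(letters, 1e6, replace = TRUE)  # one million single-letter strings

elapsed <- system.time(foo <- table(big))    # time only the counting step
elapsed

length(foo)   # number of distinct strings found
sum(foo)      # total count; equals length(big)
```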


One of the seminal books on the S language is John M Chambers' Programming
with Data -- and I would emphasize the "with Data" part of that title.

-- 

Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062






Monnand

2015-Jan-06 21:29 UTC

Thank you, all! Your replies are very useful, especially Don's explanation!

One complaint I have is: the function name (table) is really not very
informative.

