Hi all,

I thought this was a very naive problem, but I have not found a solution that is idiomatic to R.

The problem is this. Assume we have a vector of strings:

    x = c("1", "1", "2", "1", "5", "2")

We want to count the number of appearances of each string; i.e., in vector x, the string "1" appears 3 times, "2" appears twice, and "5" appears once. Then I want to know which string is the majority. In this case, it is "1".

In imperative languages like C, C++, Java, and Python, I would use a hash table to count the strings, where the keys are the strings and the values are the numbers of appearances. In functional languages like Clojure, there are higher-order functions like group-by.

In R, however, I can hardly find a good solution to this simple problem. I found a hash package, which implements a hash table. However, installing a package simply for a hash table is really annoying to me. I did find aggregate and other functions that operate on data frames. But in my case, it is a simple vector. Converting it to a data frame may not be desirable. (Or is it?)

Could anyone suggest an idiomatic way of doing this in R? I would appreciate your help!

-Monnand
Dear Monnand,

one possible way would be to use as.factor(); the summary then gives you the count for every level. Like this:

    x = c("1", "1", "2", "1", "5", "2")
    summary(as.factor(x))

Cheers,
Christian

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
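To make the suggestion above concrete, here is a minimal sketch in base R. The variable names `counts` and `majority` are only illustrative and do not come from the thread. One caveat worth knowing: summary() on a factor has a `maxsum` argument (default 100), so with more distinct strings than that the rarer levels are lumped together as "(Other)".

```r
# Counting via a factor summary; base R only.
x <- c("1", "1", "2", "1", "5", "2")

# Named integer vector with one count per factor level.
counts <- summary(as.factor(x))

# which.max() returns the position of the first maximum;
# names() turns that position back into the string itself.
majority <- names(which.max(counts))

print(counts)
print(majority)
```

If the vector may hold more than 100 distinct strings, passing a larger `maxsum` (or using table() instead) avoids the lumping.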
> On 04-01-2015, at 10:02, Monnand <monnand at gmail.com> wrote:
>
> Could anyone suggest me an idiomatic way of doing such job in R?
> [...]

Have a look at table:

    ?table

Berend
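As a rough illustration of what table() gives for the vector in the question, the sketch below counts the strings, sorts the counts, and also converts them to a data frame, since the original post asked whether that conversion is worthwhile. The names `counts`, `sorted`, `majority`, and `df` are assumptions made for this example only.

```r
# Counting with table(); base R only.
x <- c("1", "1", "2", "1", "5", "2")

counts <- table(x)                         # one count per distinct string
sorted <- sort(counts, decreasing = TRUE)  # most frequent first
majority <- names(sorted)[1]               # string with the highest count

# If a data frame is wanted after all, the table converts directly.
# The columns are named after the input vector ("x") and "Freq".
df <- as.data.frame(counts, stringsAsFactors = FALSE)

print(sorted)
print(df)
```

So the data frame is available when needed, but the table on its own already answers the question.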
This seems to me to be a case where thinking in terms of computer programming concepts is getting in the way a bit. Approach it as a data analysis task; the S language (upon which R is based) is designed in part for data analysis, so there is a function that does most of the job for you.

(I changed your vector of strings to make the result more easily interpreted.)

    > x = c("1", "1", "2", "1", "5", "2",'3','5','5','2','2')
    > tmp <- table(x)       ## counts the number of appearances of each element
    > tmp[tmp==max(tmp)]    ## finds which one occurs most often
    2
    4

Meaning that the element '2' appears 4 times. The table() function should be fast even with long vectors. Here's an example with a vector of length 1 million:

    foo <- table( sample(letters, 1e6, replace=TRUE) )

One of the seminal books on the S language is John M Chambers' Programming with Data -- and I would emphasize the "with Data" part of that title.

--
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062

On 1/4/15, 1:02 AM, "Monnand" <monnand at gmail.com> wrote:

>Could anyone suggest me an idiomatic way of doing such job in R?
>[...]
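One property of the `tmp[tmp==max(tmp)]` idiom above is that it keeps every element tied for the highest count, whereas which.max() reports only the first. A small sketch of the difference follows; the vector `y` and the names `all_modes` and `first_mode` are made up for this illustration.

```r
# Ties: "a" and "b" both appear twice.
y <- c("a", "b", "a", "b", "c")
tab <- table(y)

# Comparing against max() keeps every tied element.
all_modes <- names(tab)[tab == max(tab)]

# which.max() stops at the first maximum, so only one element is reported.
first_mode <- names(which.max(tab))

print(all_modes)
print(first_mode)
```

Which behavior is preferable depends on whether a tie should count as having a single "majority" string at all.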
Thank you, all! Your replies are very useful, especially Don's explanation!

One complaint I have: the function name (table) is really not very informative.

On Sun Jan 04 2015 at 5:03:47 PM MacQueen, Don <macqueen1 at llnl.gov> wrote:

> This seems to me to be a case where thinking in terms of computer
> programming concepts is getting in the way a bit.
> [...]