Bastien Ferland-Raymond
2010-Sep-17 17:02 UTC
[R] grouping dataframe entries using a categorical variable
Dear R Users,

I have a problem that I think you might be able to help with. I have a dataframe that I'm trying to "filter" according to different groups I specified. It's a little hard to explain, so here is an example.

My dataframe:

   ESS DHP
1  EPB  22
2  SAB  10
3  SAB  20
4  BOJ  14
5  ERS  28
11 SAB  10
12 SAB  22
13 BOJ  26
20 SAB  10
21 SAB  22
22 BOJ  32
29 SAB  14
30 SAB  22
38 SAB  14
47 SAB  18

I'm trying to filter it by selecting a subgroup of ESS, for example:

softwood <- c("EPB","SAB")

so that I obtain this NEW dataframe:

   ESS DHP
1  EPB  22
2  SAB  10
3  SAB  20
11 SAB  10
12 SAB  22
20 SAB  10
21 SAB  22
29 SAB  14
30 SAB  22
38 SAB  14
47 SAB  18

(My real groups are actually bigger, and so is my dataframe, but you get the idea.)

I have looked at subset() and aggregate(), but they don't work, and a loop would be totally inefficient. I'm sure there is a function in R that does something like this, but I couldn't find the proper "keyword" to search for it.

Thanks for your help,

Bastien
Ista Zahn
2010-Sep-17 17:31 UTC
Hi Bastien,

You can use match(), or the convenience function %in%, like this (assuming your data.frame is named "dat"):

subset(dat, ESS %in% c("EPB","SAB"))

dat[dat$ESS %in% c("EPB","SAB"), ]

best,
Ista

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
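A minimal, self-contained sketch of Ista's suggestion, assuming the data frame is retyped by hand from the values Bastien posted and named `dat` (the name Ista assumes). The object names `keep`, `new1` and `new2` are illustrative only, and the real data and column types may differ. It shows that the `subset()` form and the bracket-indexing form select the same 11 rows when the group is held in a vector such as `softwood`.

```r
# Rebuild the example data from the original post (values retyped by hand).
dat <- data.frame(
  ESS = c("EPB", "SAB", "SAB", "BOJ", "ERS", "SAB", "SAB", "BOJ",
          "SAB", "SAB", "BOJ", "SAB", "SAB", "SAB", "SAB"),
  DHP = c(22, 10, 20, 14, 28, 10, 22, 26, 10, 22, 32, 14, 22, 14, 18)
)
# Keep the row names shown in the post (they become character row names).
rownames(dat) <- c(1, 2, 3, 4, 5, 11, 12, 13, 20, 21, 22, 29, 30, 38, 47)

# The group of species codes to keep, as in Bastien's example.
softwood <- c("EPB", "SAB")

# %in% is TRUE for every row whose ESS code appears in softwood;
# base R defines it as match(x, table, nomatch = 0) > 0.
keep <- dat$ESS %in% softwood

# Two equivalent ways to keep only those rows.
new1 <- subset(dat, ESS %in% softwood)
new2 <- dat[keep, ]

print(new1)
```

Because `%in%` takes the whole group vector at once, a bigger group only means a longer `softwood` vector; no loop over categories is needed.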
Phil Spector
2010-Sep-17 17:33 UTC
Bastien -

In what way did subset(yourdataframe,ESS %in% softwood) not work?

- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spector at stat.berkeley.edu