Bastien Ferland-Raymond
2010-Sep-17 17:02 UTC
[R] grouping dataframe entries using a categorical variable
Dear R Users,
I have a problem that I think you might be able to help with. I have a dataframe
that I'm trying to "filter" according to groups I have
specified. It's a little hard to explain, so here is an example:
My dataframe:
   ESS DHP
1  EPB  22
2  SAB  10
3  SAB  20
4  BOJ  14
5  ERS  28
11 SAB  10
12 SAB  22
13 BOJ  26
20 SAB  10
21 SAB  22
22 BOJ  32
29 SAB  14
30 SAB  22
38 SAB  14
47 SAB  18
I'm trying to filter it by selecting a subgroup of ESS, for example:
softwood <- c("EPB", "SAB")
So I can obtain:
NEW dataframe:
   ESS DHP
1  EPB  22
2  SAB  10
3  SAB  20
11 SAB  10
12 SAB  22
20 SAB  10
21 SAB  22
29 SAB  14
30 SAB  22
38 SAB  14
47 SAB  18
(My real groups are actually bigger, and so is my dataframe, but you get the
idea.)
I have looked at subset and aggregate, but they don't work, and a loop would
be totally inefficient. I'm sure there is a function in R that does
something like this, but I couldn't find the proper "keyword" to
search for it.
Thanks for your help,
Bastien
Ista Zahn
2010-Sep-17 17:31 UTC
[R] grouping dataframe entries using a categorical variable
Hi Bastien,
You can use match(), or the convenience function %in%, like this
(assuming your data.frame is named "dat"):
subset(dat, ESS %in% c("EPB","SAB"))
dat[dat$ESS %in% c("EPB","SAB"), ]
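For anyone who wants to run the two forms above, here is a minimal self-contained sketch. It rebuilds the example data from the original post under the assumed name "dat"; the helper names keep, keep_match, by_subset and by_index are only illustrative.

```r
# Rebuild the example data from the original post, keeping the posted row names.
dat <- data.frame(
  ESS = c("EPB", "SAB", "SAB", "BOJ", "ERS", "SAB", "SAB", "BOJ",
          "SAB", "SAB", "BOJ", "SAB", "SAB", "SAB", "SAB"),
  DHP = c(22, 10, 20, 14, 28, 10, 22, 26, 10, 22, 32, 14, 22, 14, 18),
  row.names = as.character(c(1, 2, 3, 4, 5, 11, 12, 13, 20, 21, 22, 29, 30, 38, 47))
)
softwood <- c("EPB", "SAB")

# %in% gives one TRUE/FALSE per row: TRUE where ESS equals any element of softwood.
keep <- dat$ESS %in% softwood

# The same test written with match(): a row is kept when a position is found.
keep_match <- !is.na(match(dat$ESS, softwood))

# Two equivalent ways to keep only the softwood rows.
by_subset <- subset(dat, ESS %in% softwood)
by_index  <- dat[keep, ]

print(by_index)
```

Both results keep the 11 EPB and SAB rows and preserve the original row names. If ESS is stored as a factor, the unused levels (BOJ, ERS) remain attached to the result; droplevels() removes them if that matters.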
best,
Ista
--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
Phil Spector
2010-Sep-17 17:33 UTC
[R] grouping dataframe entries using a categorical variable
Bastien -
In what way did
subset(yourdataframe,ESS %in% softwood)
not work?
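One possibility, offered only as a guess since the thread does not say what was actually tried, is that the comparison was written with == instead of %in%. In that case R recycles the group vector along the column and silently returns too few rows. A small sketch with a shortened, made-up copy of the data (the names "dat", "wrong" and "right" are illustrative):

```r
# Shortened illustrative data: first rows of the posted example plus one more SAB.
dat <- data.frame(
  ESS = c("EPB", "SAB", "SAB", "BOJ", "ERS", "SAB"),
  DHP = c(22, 10, 20, 14, 28, 10),
  stringsAsFactors = FALSE
)
softwood <- c("EPB", "SAB")

# == compares element by element and recycles softwood as EPB, SAB, EPB, SAB, ...
# so row 3 (SAB compared against the recycled EPB) is dropped by mistake.
wrong <- subset(dat, ESS == softwood)

# %in% asks, for each row, whether ESS is any member of softwood.
right <- subset(dat, ESS %in% softwood)

nrow(wrong)  # 3
nrow(right)  # 4
```

Because the column length here is a multiple of the group length, the == version gives no warning at all, which makes the mistake easy to miss.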
- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spector at stat.berkeley.edu