Satish Vadlamani
2016-Jul-14 19:43 UTC
[R] How to group by and get distinct rows of of grouped rows based on certain criteria
Hello All: I would like to get your help on the following problem. I have the following data and the first row is the header. Spaces are not important. I want to find out distinct combinations of ATP Group and Business Event (these are the field names that you can see in the data below) that have the Category EQ (Category is the third field) and those that do not have the category EQ. In the example below, the combinations 02/A and 02/B have EQ and the combination ZM/A does not. If I have a larger file, how to get to this answer? What did I try (with dplyr)? # I know that the below is not correct and not giving desired results file1_1 <- file1 %>% group_by(ATP.Group,Business.Event) %>% filter(Category != "EQ") %>% distinct(ATP.Group,Business.Event) # for some reason, I have to convert to data.frame to print the data correctly file1_1 <- as.data.frame(file1_1) file1_1 *Data shown below* |ATP Group|Business Event|Category| |02 |A |AC | |02 |A |AD | |02 |A |EQ | |ZM |A |AU | |ZM |A |AV | |ZM |A |AW | |02 |B |AC | |02 |B |AY | |02 |B |EQ | -- Satish Vadlamani [[alternative HTML version deleted]]
Sarah Goslee
2016-Jul-14 20:50 UTC
[R] How to group by and get distinct rows of of grouped rows based on certain criteria
I took a wild guess as to what your data looked like (please use dput() to provide data, and please do not post in HTML), and took your request literally. Here's one way to approach the problem: mydat <- structure(list(ATP.Group = c("02", "02", "02", "ZM", "ZM", "ZM", "02", "02", "02"), Business.Event = c("A", "A", "A", "A", "A", "A", "B", "B", "B"), Category = c("AC", "AD", "EQ", "AU", "AV", "AW", "AC", "AY", "EQ")), .Names = c("ATP.Group", "Business.Event", "Category"), class = "data.frame", row.names = c(NA, -9L)) hasEQ <- subset(mydat, Category == "EQ") hasEQ <- unique(do.call("paste", c(hasEQ[, c("ATP.Group", "Business.Event")], sep="/"))) notEQ <- subset(mydat, Category != "EQ") notEQ <- unique(do.call("paste", c(notEQ[, c("ATP.Group", "Business.Event")], sep="/"))) notEQ <- notEQ[!(notEQ %in% hasEQ)]> hasEQ[1] "02/A" "02/B"> notEQ[1] "ZM/A" Sarah On Thu, Jul 14, 2016 at 3:43 PM, Satish Vadlamani <satish.vadlamani at gmail.com> wrote:> Hello All: > I would like to get your help on the following problem. > > I have the following data and the first row is the header. Spaces are not > important. > I want to find out distinct combinations of ATP Group and Business Event > (these are the field names that you can see in the data below) that have > the Category EQ (Category is the third field) and those that do not have > the category EQ. In the example below, the combinations 02/A and 02/B have > EQ and the combination ZM/A does not. > > If I have a larger file, how to get to this answer? > > What did I try (with dplyr)? > > # I know that the below is not correct and not giving desired results > file1_1 <- file1 %>% group_by(ATP.Group,Business.Event) %>% > filter(Category != "EQ") %>% distinct(ATP.Group,Business.Event) > # for some reason, I have to convert to data.frame to print the data > correctly > file1_1 <- as.data.frame(file1_1) > file1_1 > > > *Data shown below* > |ATP Group|Business Event|Category| > |02 |A |AC | > |02 |A |AD | > |02 |A |EQ | > |ZM |A |AU | > |ZM |A |AV | > |ZM |A |AW | > |02 |B |AC | > |02 |B |AY | > |02 |B |EQ | > > --
William Dunlap
2016-Jul-14 20:53 UTC
[R] How to group by and get distinct rows of of grouped rows based on certain criteria
> txt <- "|ATP Group|Business Event|Category||02 |A |AC | |02 |A |AD | |02 |A |EQ | |ZM |A |AU | |ZM |A |AV | |ZM |A |AW | |02 |B |AC | |02 |B |AY | |02 |B |EQ | "> d <- read.table(sep="|", text=txt, header=TRUE, strip.white=TRUE,check.names=FALSE)[,2:4]> str(d)'data.frame': 9 obs. of 3 variables: $ ATP Group : Factor w/ 2 levels "02","ZM": 1 1 1 2 2 2 1 1 1 $ Business Event: Factor w/ 2 levels "A","B": 1 1 1 1 1 1 2 2 2 $ Category : Factor w/ 7 levels "AC","AD","AU",..: 1 2 7 3 4 5 1 6 7> unique(d[d[,"Category"]!="EQ", c("ATP Group", "Business Event")])ATP Group Business Event 1 02 A 4 ZM A 7 02 B> unique(d[d[,"Category"]=="EQ", c("ATP Group", "Business Event")])ATP Group Business Event 3 02 A 9 02 B Some folks prefer to use subset() instead of "[". The previous expression is equivalent to:> unique( subset(d, Category=="EQ", c("ATP Group", "Business Event")))ATP Group Business Event 3 02 A 9 02 B Bill Dunlap TIBCO Software wdunlap tibco.com On Thu, Jul 14, 2016 at 12:43 PM, Satish Vadlamani < satish.vadlamani at gmail.com> wrote:> Hello All: > I would like to get your help on the following problem. > > I have the following data and the first row is the header. Spaces are not > important. > I want to find out distinct combinations of ATP Group and Business Event > (these are the field names that you can see in the data below) that have > the Category EQ (Category is the third field) and those that do not have > the category EQ. In the example below, the combinations 02/A and 02/B have > EQ and the combination ZM/A does not. > > If I have a larger file, how to get to this answer? > > What did I try (with dplyr)? > > # I know that the below is not correct and not giving desired results > file1_1 <- file1 %>% group_by(ATP.Group,Business.Event) %>% > filter(Category != "EQ") %>% distinct(ATP.Group,Business.Event) > # for some reason, I have to convert to data.frame to print the data > correctly > file1_1 <- as.data.frame(file1_1) > file1_1 > > > *Data shown below* > |ATP Group|Business Event|Category| > |02 |A |AC | > |02 |A |AD | > |02 |A |EQ | > |ZM |A |AU | > |ZM |A |AV | > |ZM |A |AW | > |02 |B |AC | > |02 |B |AY | > |02 |B |EQ | > > -- > > Satish Vadlamani > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]
Satish Vadlamani
2016-Jul-15 20:43 UTC
[R] How to group by and get distinct rows of of grouped rows based on certain criteria
Thank you Bill and Sarah for your help. I was able to do the same with dplyr with the following code. But I could not post this since at that time my message was not posted yet.>>file1 <- select(file1, ATP.Group,Business.Event,Category) file1_1 <- file1 %>% group_by(ATP.Group,Business.Event) %>% filter(Category == "EQ") %>% distinct(ATP.Group,Business.Event) file1_1 <- as.data.frame(file1_1) file1_1 file1_2 <- file1 %>% group_by(ATP.Group,Business.Event) %>% distinct(ATP.Group,Business.Event) file1_2 <- as.data.frame(file1_2) file1_2 setdiff(select(file1_2,ATP.Group,Business.Event), select(file1_1,ATP.Group,Business.Event))>>On Thu, Jul 14, 2016 at 1:53 PM, William Dunlap <wdunlap at tibco.com> wrote:> > txt <- "|ATP Group|Business Event|Category| > |02 |A |AC | > |02 |A |AD | > |02 |A |EQ | > |ZM |A |AU | > |ZM |A |AV | > |ZM |A |AW | > |02 |B |AC | > |02 |B |AY | > |02 |B |EQ | > " > > d <- read.table(sep="|", text=txt, header=TRUE, strip.white=TRUE, > check.names=FALSE)[,2:4] > > str(d) > 'data.frame': 9 obs. of 3 variables: > $ ATP Group : Factor w/ 2 levels "02","ZM": 1 1 1 2 2 2 1 1 1 > $ Business Event: Factor w/ 2 levels "A","B": 1 1 1 1 1 1 2 2 2 > $ Category : Factor w/ 7 levels "AC","AD","AU",..: 1 2 7 3 4 5 1 6 7 > > unique(d[d[,"Category"]!="EQ", c("ATP Group", "Business Event")]) > ATP Group Business Event > 1 02 A > 4 ZM A > 7 02 B > > unique(d[d[,"Category"]=="EQ", c("ATP Group", "Business Event")]) > ATP Group Business Event > 3 02 A > 9 02 B > > Some folks prefer to use subset() instead of "[". The previous expression > is equivalent to: > > > unique( subset(d, Category=="EQ", c("ATP Group", "Business Event"))) > ATP Group Business Event > 3 02 A > 9 02 B > > > Bill Dunlap > TIBCO Software > wdunlap tibco.com > > On Thu, Jul 14, 2016 at 12:43 PM, Satish Vadlamani < > satish.vadlamani at gmail.com> wrote: > >> Hello All: >> I would like to get your help on the following problem. >> >> I have the following data and the first row is the header. Spaces are not >> important. >> I want to find out distinct combinations of ATP Group and Business Event >> (these are the field names that you can see in the data below) that have >> the Category EQ (Category is the third field) and those that do not have >> the category EQ. In the example below, the combinations 02/A and 02/B have >> EQ and the combination ZM/A does not. >> >> If I have a larger file, how to get to this answer? >> >> What did I try (with dplyr)? >> >> # I know that the below is not correct and not giving desired results >> file1_1 <- file1 %>% group_by(ATP.Group,Business.Event) %>% >> filter(Category != "EQ") %>% distinct(ATP.Group,Business.Event) >> # for some reason, I have to convert to data.frame to print the data >> correctly >> file1_1 <- as.data.frame(file1_1) >> file1_1 >> >> >> *Data shown below* >> |ATP Group|Business Event|Category| >> |02 |A |AC | >> |02 |A |AD | >> |02 |A |EQ | >> |ZM |A |AU | >> |ZM |A |AV | >> |ZM |A |AW | >> |02 |B |AC | >> |02 |B |AY | >> |02 |B |EQ | >> >> -- >> >> Satish Vadlamani >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > >-- Satish Vadlamani [[alternative HTML version deleted]]