thr3ads.net - R help - [R] How to group by and get distinct rows of of grouped rows based on certain criteria [Jul 2016]

If this information is useful, please help other people find it:
Share via:

Satish Vadlamani

2016-Jul-14 19:43 UTC

[R] How to group by and get distinct rows of of grouped rows based on certain criteria

Hello All:
I would like to get your help on the following problem.

I have the following data and the first row is the header. Spaces are not
important.
I want to find out distinct combinations of ATP Group and Business Event
(these are the field names that you can see in the data below) that have
the Category EQ (Category is the third field) and those that do not have
the category EQ. In the example below, the combinations 02/A and 02/B have
EQ and the combination ZM/A does not.

If I have a larger file, how to get to this answer?

What did I try (with dplyr)?

# I know that the below is not correct and not giving desired results
file1_1 <- file1  %>% group_by(ATP.Group,Business.Event) %>%
filter(Category != "EQ") %>% distinct(ATP.Group,Business.Event)
# for some reason, I have to convert to data.frame to print the data
correctly
file1_1 <- as.data.frame(file1_1)
file1_1


*Data shown below*
|ATP Group|Business Event|Category|
|02       |A             |AC      |
|02       |A             |AD      |
|02       |A             |EQ      |
|ZM       |A             |AU      |
|ZM       |A             |AV      |
|ZM       |A             |AW      |
|02       |B             |AC      |
|02       |B             |AY      |
|02       |B             |EQ      |

-- 

Satish Vadlamani

	[[alternative HTML version deleted]]

Sarah Goslee

2016-Jul-14 20:50 UTC

head link

[R] How to group by and get distinct rows of of grouped rows based on certain criteria

I took a wild guess as to what your data looked like (please use
dput() to provide data, and please do not post in HTML), and took your
request literally.

Here's one way to approach the problem:


mydat <- structure(list(ATP.Group = c("02", "02",
"02", "ZM", "ZM", "ZM",
"02", "02", "02"), Business.Event =
c("A", "A", "A", "A", "A",
"A", "B", "B", "B"), Category =
c("AC", "AD", "EQ", "AU",
"AV",
"AW", "AC", "AY", "EQ")), .Names =
c("ATP.Group", "Business.Event",
"Category"), class = "data.frame", row.names = c(NA, -9L))


hasEQ <- subset(mydat, Category == "EQ")
hasEQ <- unique(do.call("paste", c(hasEQ[, c("ATP.Group",
"Business.Event")], sep="/")))

notEQ <- subset(mydat, Category != "EQ")
notEQ <- unique(do.call("paste", c(notEQ[, c("ATP.Group",
"Business.Event")], sep="/")))
notEQ <- notEQ[!(notEQ %in% hasEQ)]
> hasEQ
[1] "02/A" "02/B"> notEQ[1] "ZM/A"

Sarah

On Thu, Jul 14, 2016 at 3:43 PM, Satish Vadlamani
<satish.vadlamani at gmail.com> wrote:> Hello All:
> I would like to get your help on the following problem.
>
> I have the following data and the first row is the header. Spaces are not
> important.
> I want to find out distinct combinations of ATP Group and Business Event
> (these are the field names that you can see in the data below) that have
> the Category EQ (Category is the third field) and those that do not have
> the category EQ. In the example below, the combinations 02/A and 02/B have
> EQ and the combination ZM/A does not.
>
> If I have a larger file, how to get to this answer?
>
> What did I try (with dplyr)?
>
> # I know that the below is not correct and not giving desired results
> file1_1 <- file1  %>% group_by(ATP.Group,Business.Event) %>%
> filter(Category != "EQ") %>%
distinct(ATP.Group,Business.Event)
> # for some reason, I have to convert to data.frame to print the data
> correctly
> file1_1 <- as.data.frame(file1_1)
> file1_1
>
>
> *Data shown below*
> |ATP Group|Business Event|Category|
> |02       |A             |AC      |
> |02       |A             |AD      |
> |02       |A             |EQ      |
> |ZM       |A             |AU      |
> |ZM       |A             |AV      |
> |ZM       |A             |AW      |
> |02       |B             |AC      |
> |02       |B             |AY      |
> |02       |B             |EQ      |
>
> --

William Dunlap

2016-Jul-14 20:53 UTC

head link

[R] How to group by and get distinct rows of of grouped rows based on certain criteria

> txt <- "|ATP Group|Business Event|Category||02       |A             |AC      |
|02       |A             |AD      |
|02       |A             |EQ      |
|ZM       |A             |AU      |
|ZM       |A             |AV      |
|ZM       |A             |AW      |
|02       |B             |AC      |
|02       |B             |AY      |
|02       |B             |EQ      |
"> d <- read.table(sep="|", text=txt, header=TRUE,
strip.white=TRUE,
check.names=FALSE)[,2:4]> str(d)'data.frame':   9 obs. of  3 variables:
 $ ATP Group     : Factor w/ 2 levels "02","ZM": 1 1 1 2 2 2
1 1 1
 $ Business Event: Factor w/ 2 levels "A","B": 1 1 1 1 1 1 2
2 2
 $ Category      : Factor w/ 7 levels
"AC","AD","AU",..: 1 2 7 3 4 5 1 6
7> unique(d[d[,"Category"]!="EQ", c("ATP Group",
"Business Event")])  ATP Group Business Event
1        02              A
4        ZM              A
7        02              B> unique(d[d[,"Category"]=="EQ", c("ATP Group",
"Business Event")])  ATP Group Business Event
3        02              A
9        02              B

Some folks prefer to use subset() instead of "[".  The previous
expression
is equivalent to:
> unique( subset(d, Category=="EQ", c("ATP Group",
"Business Event")))  ATP Group Business Event
3        02              A
9        02              B


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, Jul 14, 2016 at 12:43 PM, Satish Vadlamani <
satish.vadlamani at gmail.com> wrote:
> Hello All:
> I would like to get your help on the following problem.
>
> I have the following data and the first row is the header. Spaces are not
> important.
> I want to find out distinct combinations of ATP Group and Business Event
> (these are the field names that you can see in the data below) that have
> the Category EQ (Category is the third field) and those that do not have
> the category EQ. In the example below, the combinations 02/A and 02/B have
> EQ and the combination ZM/A does not.
>
> If I have a larger file, how to get to this answer?
>
> What did I try (with dplyr)?
>
> # I know that the below is not correct and not giving desired results
> file1_1 <- file1  %>% group_by(ATP.Group,Business.Event) %>%
> filter(Category != "EQ") %>%
distinct(ATP.Group,Business.Event)
> # for some reason, I have to convert to data.frame to print the data
> correctly
> file1_1 <- as.data.frame(file1_1)
> file1_1
>
>
> *Data shown below*
> |ATP Group|Business Event|Category|
> |02       |A             |AC      |
> |02       |A             |AD      |
> |02       |A             |EQ      |
> |ZM       |A             |AU      |
> |ZM       |A             |AV      |
> |ZM       |A             |AW      |
> |02       |B             |AC      |
> |02       |B             |AY      |
> |02       |B             |EQ      |
>
> --
>
> Satish Vadlamani
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
	[[alternative HTML version deleted]]

Satish Vadlamani

2016-Jul-15 20:43 UTC

head link

[R] How to group by and get distinct rows of of grouped rows based on certain criteria

Thank you Bill and Sarah for your help. I was able to do the same with
dplyr with the following code. But I could not post this since at that time
my message was not posted yet.
>>file1 <- select(file1, ATP.Group,Business.Event,Category)

file1_1 <- file1  %>% group_by(ATP.Group,Business.Event) %>%
filter(Category == "EQ") %>% distinct(ATP.Group,Business.Event)
file1_1 <- as.data.frame(file1_1)
file1_1

file1_2 <- file1  %>% group_by(ATP.Group,Business.Event) %>%
distinct(ATP.Group,Business.Event)
file1_2 <- as.data.frame(file1_2)
file1_2

setdiff(select(file1_2,ATP.Group,Business.Event),
select(file1_1,ATP.Group,Business.Event))>>
On Thu, Jul 14, 2016 at 1:53 PM, William Dunlap <wdunlap at tibco.com>
wrote:
> > txt <- "|ATP Group|Business Event|Category|
> |02       |A             |AC      |
> |02       |A             |AD      |
> |02       |A             |EQ      |
> |ZM       |A             |AU      |
> |ZM       |A             |AV      |
> |ZM       |A             |AW      |
> |02       |B             |AC      |
> |02       |B             |AY      |
> |02       |B             |EQ      |
> "
> > d <- read.table(sep="|", text=txt, header=TRUE,
strip.white=TRUE,
> check.names=FALSE)[,2:4]
> > str(d)
> 'data.frame':   9 obs. of  3 variables:
>  $ ATP Group     : Factor w/ 2 levels "02","ZM": 1 1 1
2 2 2 1 1 1
>  $ Business Event: Factor w/ 2 levels "A","B": 1 1 1 1
1 1 2 2 2
>  $ Category      : Factor w/ 7 levels
"AC","AD","AU",..: 1 2 7 3 4 5 1 6 7
> > unique(d[d[,"Category"]!="EQ", c("ATP
Group", "Business Event")])
>   ATP Group Business Event
> 1        02              A
> 4        ZM              A
> 7        02              B
> > unique(d[d[,"Category"]=="EQ", c("ATP
Group", "Business Event")])
>   ATP Group Business Event
> 3        02              A
> 9        02              B
>
> Some folks prefer to use subset() instead of "[".  The previous
expression
> is equivalent to:
>
> > unique( subset(d, Category=="EQ", c("ATP Group",
"Business Event")))
>   ATP Group Business Event
> 3        02              A
> 9        02              B
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Thu, Jul 14, 2016 at 12:43 PM, Satish Vadlamani <
> satish.vadlamani at gmail.com> wrote:
>
>> Hello All:
>> I would like to get your help on the following problem.
>>
>> I have the following data and the first row is the header. Spaces are
not
>> important.
>> I want to find out distinct combinations of ATP Group and Business
Event
>> (these are the field names that you can see in the data below) that
have
>> the Category EQ (Category is the third field) and those that do not
have
>> the category EQ. In the example below, the combinations 02/A and 02/B
have
>> EQ and the combination ZM/A does not.
>>
>> If I have a larger file, how to get to this answer?
>>
>> What did I try (with dplyr)?
>>
>> # I know that the below is not correct and not giving desired results
>> file1_1 <- file1  %>% group_by(ATP.Group,Business.Event) %>%
>> filter(Category != "EQ") %>%
distinct(ATP.Group,Business.Event)
>> # for some reason, I have to convert to data.frame to print the data
>> correctly
>> file1_1 <- as.data.frame(file1_1)
>> file1_1
>>
>>
>> *Data shown below*
>> |ATP Group|Business Event|Category|
>> |02       |A             |AC      |
>> |02       |A             |AD      |
>> |02       |A             |EQ      |
>> |ZM       |A             |AU      |
>> |ZM       |A             |AV      |
>> |ZM       |A             |AW      |
>> |02       |B             |AC      |
>> |02       |B             |AY      |
>> |02       |B             |EQ      |
>>
>> --
>>
>> Satish Vadlamani
>>
>>         [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>

-- 

Satish Vadlamani

	[[alternative HTML version deleted]]

R help - Jul 2016 - How to group by and get distinct rows of of grouped rows based on certain criteria

[R] How to group by and get distinct rows of of grouped rows based on certain criteria

[R] How to group by and get distinct rows of of grouped rows based on certain criteria

[R] How to group by and get distinct rows of of grouped rows based on certain criteria

[R] How to group by and get distinct rows of of grouped rows based on certain criteria