thr3ads.net - R help - [R] "not all duplicated" question [Jul 2013]

Vesco Miloushev

2013-Jul-13 20:12 UTC

[R] "not all duplicated" question

Hi,

I want to select elements which have duplicates but are not all duplicated.

Here is what I mean. Suppose I have a two column matrix with columns
"Country" and "Pet"


Country, Pet
------------------
France, Dog
France, Cat
France, Dog
Canada, Cat
Canada, Cat
Japan, Dog
Japan, Cat
Italy, Cat

I want to extract all the entries that are duplicated in column
"Country" but not ALL duplicated in column "Pet".

In this case I want

Country, Pet
------------------
France, Dog
France, Cat
France, Dog
Japan, Dog
Japan, Cat

Notice that I keep France, because not all of its entries are duplicated.
If there were no entry "France, Cat", then all of the entries with "France"
would be eliminated.

Thanks for your help.
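
One way to read the request above is: keep every row whose country appears more than once and whose pets are not all the same. The sketch below tabulates both counts per country under that assumed reading. The names `pets`, `n_rows`, `n_pets`, and `wanted` are made up for illustration and do not appear in the thread.

```r
# Example data from the post, built directly as a data frame.
pets <- data.frame(
  Country = c("France", "France", "France", "Canada", "Canada",
              "Japan", "Japan", "Italy"),
  Pet     = c("Dog", "Cat", "Dog", "Cat", "Cat", "Dog", "Cat", "Cat"),
  stringsAsFactors = FALSE
)

# Number of rows and number of distinct pets for each country.
n_rows <- tapply(pets$Pet, pets$Country, length)
n_pets <- tapply(pets$Pet, pets$Country, function(p) length(unique(p)))

# Assumed reading of the request: a country qualifies when it appears
# more than once and its pets are not all the same.
qualifies <- n_rows > 1 & n_pets > 1
keep_countries <- names(qualifies)[qualifies]

# Rows belonging to the qualifying countries, in their original order.
wanted <- pets[pets$Country %in% keep_countries, ]
print(wanted)
```

Under this reading Canada is dropped because both of its rows are "Cat", and Italy is dropped because it appears only once.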

jim holtman

2013-Jul-13 22:32 UTC


try this:
> x <- read.csv(text = "Country, Pet
+  France, Dog
+  France, Cat
+  France, Dog
+  Canada, Cat
+  Canada, Cat
+  Japan, Dog
+  Japan, Cat
+  Italy, Cat", as.is = TRUE)
> # split by Country and then see if dups in "Pet"
> xs <- split(x, x$Country)
> Dups <- do.call(rbind
+ , lapply(xs, function(.country){
+ if (all(.country$Pet[1L] == .country$Pet)) return(NULL)
+ .country  # return match
+ })
+ )
> row.names(Dups) <- NULL  # remove rownames before printing
> Dups
  Country  Pet
1  France  Dog
2  France  Cat
3  France  Dog
4   Japan  Dog
5   Japan  Cat
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
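
The split-and-recombine idea in the reply above can also be written with `Filter()`, which drops the unwanted groups before `rbind` is called. This is only a sketch of a variant, not the code from the reply. The names `pets`, `groups`, `mixed`, and `result` are hypothetical, and the data frame is built directly instead of being parsed from text.

```r
# Example data from the original question.
pets <- data.frame(
  Country = c("France", "France", "France", "Canada", "Canada",
              "Japan", "Japan", "Italy"),
  Pet     = c("Dog", "Cat", "Dog", "Cat", "Cat", "Dog", "Cat", "Cat"),
  stringsAsFactors = FALSE
)

# One data frame per country.
groups <- split(pets, pets$Country)

# Keep only the groups that contain more than one distinct pet.
# A single-row group (Italy) has one distinct pet and is dropped as well.
mixed <- Filter(function(g) length(unique(g$Pet)) > 1, groups)

# Stack the surviving groups back into one data frame.
result <- do.call(rbind, mixed)
row.names(result) <- NULL  # drop the "France.1"-style row names
print(result)
```

Note that `split()` orders the groups by country name, so the result is grouped by country and not necessarily in the original row order. If no group qualifies, `do.call(rbind, ...)` on the empty list returns `NULL`, so a caller may want to check for that case.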


arun

2013-Jul-14 01:08 UTC


Hi,
Maybe this helps:
dat1<- read.table(text="
Country, Pet
France, Dog
France, Cat
France, Dog
Canada, Cat
Canada, Cat
Japan, Dog
Japan, Cat
Italy, Cat
",sep=",",header=TRUE,stringsAsFactors=FALSE)


dat1[with(dat1,as.numeric(ave(Pet,Country,FUN=function(x)
length(unique(x)))))>1,]
#  Country  Pet
#1  France  Dog
#2  France  Cat
#3  France  Dog
#6   Japan  Dog
#7   Japan  Cat
A.K.
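
The one-liner above packs several steps together, so here is a sketch that unpacks it into intermediate values. The names `pets`, `distinct_per_row`, and `keep` are hypothetical, and the data frame is built directly instead of being read from text. The `as.numeric()` call is there because `ave()` returns a vector of the same type as its first argument, which is character here.

```r
# Example data from the original question.
pets <- data.frame(
  Country = c("France", "France", "France", "Canada", "Canada",
              "Japan", "Japan", "Italy"),
  Pet     = c("Dog", "Cat", "Dog", "Cat", "Cat", "Dog", "Cat", "Cat"),
  stringsAsFactors = FALSE
)

# For every row, the number of distinct pets found in that row's country.
# ave() fills the counts back into a character vector, hence as.numeric().
distinct_per_row <- as.numeric(
  ave(pets$Pet, pets$Country, FUN = function(p) length(unique(p)))
)

# Rows whose country has more than one distinct pet.
keep <- distinct_per_row > 1

print(distinct_per_row)
print(pets[keep, ])
```

Because the counts are computed per row, the subset keeps the original row order and row names, which is why rows 6 and 7 appear in the output quoted above.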



