thr3ads.net - R help - [R] Find backward duplicates in a data frame [Nov 2013]


Hermann Norpois

2013-Nov-15 21:26 UTC

[R] Find backward duplicates in a data frame

Hello,

I am looking for a method to eliminate duplicate rows in a backwards
manner. For instance,
I want to keep A B but not B A (see my data.frame test).
Thanks
Hermann
> test
  a u
1 A B
2 A C
3 B A
4 B F
5 C A
6 D W
> dput(test)
structure(list(a = structure(c(1L, 1L, 2L, 2L, 3L, 4L), .Label = c("A",
"B", "C", "D"), class = "factor"), u = structure(c(2L, 3L, 1L,
4L, 1L, 5L), .Label = c("A", "B", "C", "F", "W"), class = "factor")),
.Names = c("a", "u"), row.names = c(NA, -6L), class = "data.frame")


arun

2013-Nov-15 23:08 UTC


Hi, maybe:

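fun1 <- function(dat){
  # For two columns: keep a row if it is already in sorted order, or if
  # its sorted form does not occur as a row anywhere in dat
  indx <- apply(dat, 1, function(x) {
    any(x == sort(x)) | !any(as.character(interaction(dat, sep = ""))
                             %in% paste(sort(x), collapse = ""))
  })
  dat[indx, ]
}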

test1 <- rbind(test,data.frame(a="F",u="E"))
fun1(test)
fun1(test1)




A.K.





______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

MacQueen, Don

2013-Nov-15 23:43 UTC


So rows are considered duplicated if they have the same two characters,
regardless of which column they're in?

If the B A row came first is it ok to keep that row, or would you want to
keep the A B row?

This appears to work, at least for this example.

  foo <- t(apply(test,1, function(x) sort(format(x)) ))
  test[ !duplicated(foo),]

  a u
1 A B
2 A C
4 B F
6 D W

Note that the function sorts the formatted value, in case the factor
levels are such that they don't sort alphabetically.
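As a small illustration of the point about factor levels (a sketch only; the vector `f` and the names `by_levels` and `by_text` are made up for this example and are not from the thread): sorting a factor follows the order of its levels, while sorting the character values follows alphabetical order.

```r
# A factor whose levels are deliberately not in alphabetical order
f <- factor(c("B", "A"), levels = c("B", "A"))

# Sorting the factor itself follows the level order: B comes before A
by_levels <- as.character(sort(f))

# Sorting the character values follows alphabetical order: A comes before B
by_text <- sort(as.character(f))

print(by_levels)
print(by_text)
```

This is why comparing rows on their text values, rather than on the underlying factor codes, gives the same key for A B and B A regardless of how the levels happen to be ordered.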

Notice also that in the result, the second column ('u') is still a
factor, and its levels still include 'A', even though A no longer is
present in the column. Whether or not that's wanted, I couldn't say.
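If the unused levels are not wanted, one possible follow-up (a sketch, not something proposed in the thread) is base R's `droplevels()`. The data frame is rebuilt here so the example runs on its own; `res`, `levels_before` and `levels_after` are illustrative names.

```r
# Rebuild the example data from the original post
test <- data.frame(
  a = factor(c("A", "A", "B", "B", "C", "D")),
  u = factor(c("B", "C", "A", "F", "A", "W"))
)

# The approach from the message above: sort each row, then drop
# rows whose sorted form has already been seen
foo <- t(apply(test, 1, function(x) sort(format(x))))
res <- test[!duplicated(foo), ]

levels_before <- levels(res$u)   # still lists "A"

# Drop factor levels that no longer occur in any column
res <- droplevels(res)
levels_after <- levels(res$u)

print(levels_before)
print(levels_after)
```

Whether to drop the levels depends on later use: keeping them preserves the original coding, while dropping them avoids empty categories in tables and plots.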

-Don

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062

