thr3ads.net - R help - [R] Remove duplicated rows [Apr 2010]

chrisli1223

2010-Apr-23 02:05 UTC

[R] Remove duplicated rows

Hi all,

I have a dataset similar to the following

Name	Date	Value
A	1/01/2000	4
A	2/01/2000	4
A	3/01/2000	5
A	4/01/2000	4
A	5/01/2000	1
B	6/01/2000	2
B	7/01/2000	1
B	8/01/2000	1

I would like R to remove duplicates based on columns 1 and 3 only. In
addition, I would like each row to be compared only with the rows directly
above and below it. For example, for A, I would like to remove row 2 only
and keep rows 1, 3 and 4.

I have tried unique() and duplicated(), but without much success. I
have also tried dataset<-c(1,diff(dataset)!=0), but I don't know how to
apply it to this multi-column situation.

Any help would be greatly appreciated.

Thanks in advance,
Chris
-- 
View this message in context:
http://r.789695.n4.nabble.com/Remove-duplicated-rows-tp2023065p2023065.html
Sent from the R help mailing list archive at Nabble.com.


Petr PIKAL

2010-Apr-23 10:00 UTC


[R] Odp: Remove duplicated rows

Hi
r-help-bounces at r-project.org wrote on 23.04.2010 04:05:00:
> 
> Hi all,
> 
> I have a dataset similar to the following
> 
> Name   Date   Value
> A   1/01/2000   4
> A   2/01/2000   4
> A   3/01/2000   5
> A   4/01/2000   4
> A   5/01/2000   1
> B   6/01/2000   2
> B   7/01/2000   1
> B   8/01/2000   1
> 
> I would like R to remove duplicates based on column 1 and 3 only. In
> addition, I would like R to remove duplicates based on the underlying
> and overlying row only. For example, for A, I would like to remove row 2
> only and keep row 1, 3 and 4.
Hm. Strange. You want to keep lines 1, 3 and 4 for A. What about line 5?
Why do you want to keep lines 1 and 4, which both have A and 4 in those columns?

test <- read.table("clipboard", header = TRUE)
test[!duplicated(paste(test[, 1], test[, 3])), ]
  Name      Date Value
1    A 1/01/2000     4
3    A 3/01/2000     5
5    A 5/01/2000     1
6    B 6/01/2000     2
7    B 7/01/2000     1

This gives you the unique Name/Value combinations, but I am not sure it is what you want.
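
The "clipboard" connection is platform dependent, so the result above may be easier to reproduce from inline text. A minimal sketch, assuming the sample data from the original post; the names `sample_text`, `key` and `result` are only illustrative:

```r
# Sample data from the original post, supplied inline instead of via the clipboard.
sample_text <- "Name Date Value
A 1/01/2000 4
A 2/01/2000 4
A 3/01/2000 5
A 4/01/2000 4
A 5/01/2000 1
B 6/01/2000 2
B 7/01/2000 1
B 8/01/2000 1"
test <- read.table(text = sample_text, header = TRUE, stringsAsFactors = FALSE)

# Build one key per row from columns 1 and 3, then drop every row whose key
# has already appeared anywhere earlier in the data.
key <- paste(test[, 1], test[, 3])
result <- test[!duplicated(key), ]
print(result)
```

Because duplicated() looks at all earlier rows, row 4 (A, 4) is dropped here even though it does not sit next to another (A, 4) row.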

Regards
Petr


Gustaf Rydevik

2010-Apr-23 11:20 UTC


[R] Remove duplicated rows

Hi,

This code is a bit ugly, but it works. Hope it helps.
/Gustaf

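library(zoo)
test <- read.table("clipboard", header = TRUE)
# Combine Name and Value into a single key per row
test$code <- paste(test$Name, test$Value, sep = "")

# Flag a row when its key matches the row directly above or below it
drop.ndx <- rollapply(zoo(test$code), 3, function(x) x[2] %in% c(x[1], x[3]))

# The first and last rows have no complete window, so they are never dropped
drop.ndx <- c(FALSE, drop.ndx, FALSE)
test[!drop.ndx, ]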
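
If the goal is only to drop a row when it repeats the row directly above it on Name and Value, base R may be enough without zoo. The following is a minimal sketch under that assumed reading of the question, not Gustaf's method; `key`, `keep` and `adjacent_result` are illustrative names:

```r
# Sample data from the original post.
test <- data.frame(
  Name  = c("A", "A", "A", "A", "A", "B", "B", "B"),
  Date  = c("1/01/2000", "2/01/2000", "3/01/2000", "4/01/2000",
            "5/01/2000", "6/01/2000", "7/01/2000", "8/01/2000"),
  Value = c(4, 4, 5, 4, 1, 2, 1, 1),
  stringsAsFactors = FALSE
)

# One key per row, built from the two columns of interest.
key <- paste(test$Name, test$Value)

# Keep the first row, then every row whose key differs from the previous row's key.
keep <- c(TRUE, key[-1] != key[-length(key)])
adjacent_result <- test[keep, ]
print(adjacent_result)
```

This is the multi-column analogue of the c(1, diff(x) != 0) idea mentioned in the question. On the sample data it keeps row 4 (A, 4), because that row differs from row 3, and drops rows 2 and 8.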



-- 
Gustaf Rydevik, M.Sci.
tel: +46(0)703 051 451
address:Essingetorget 40,112 66 Stockholm, SE
skype:gustaf_rydevik

Gabor Grothendieck

2010-Apr-23 12:28 UTC


[R] Remove duplicated rows

Try this:

DF[!duplicated(DF[-2]),]
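
Here DF[-2] drops the second column (Date), so duplicated() compares rows on Name and Value only. A small sketch, assuming `DF` holds the sample data from the question; it also shows the `fromLast` argument of duplicated(), which keeps the last occurrence of each combination instead of the first. The names `first_kept` and `last_kept` are illustrative:

```r
# Sample data from the original post.
DF <- data.frame(
  Name  = c("A", "A", "A", "A", "A", "B", "B", "B"),
  Date  = c("1/01/2000", "2/01/2000", "3/01/2000", "4/01/2000",
            "5/01/2000", "6/01/2000", "7/01/2000", "8/01/2000"),
  Value = c(4, 4, 5, 4, 1, 2, 1, 1),
  stringsAsFactors = FALSE
)

# Keep the first row of each Name/Value combination.
first_kept <- DF[!duplicated(DF[-2]), ]

# Keep the last row of each Name/Value combination instead.
last_kept <- DF[!duplicated(DF[-2], fromLast = TRUE), ]

print(first_kept)
print(last_kept)
```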



chrisli1223

2010-Apr-27 00:11 UTC


[R] Remove duplicated rows

Thank you Petr, Gustaf and Gabor. Your help is much appreciated.

I have tried:

dataset[!duplicated(dataset[,-2]),]

and it solves my problem.

Thanks,
Chris
