thr3ads.net - R help - [R] identify and delete in table [Feb 2012]

Jonas Fransson

2012-Feb-27 11:17 UTC

[R] identify and delete in table

Dear all,

I want to delete the exact matches in a large dataset based on a smaller
dataset. In other words, I want to subtract the smaller dataset from the larger
one. The smaller dataset is part of the larger one. The datasets contain
hundreds of thousands of lines (one column), and the lines differ in
length. The data are paths extracted from web logs.
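Because each line is a single path, reading the files with readLines() may be safer than read.table(). By default, read.table() splits fields on whitespace and treats "#" as a comment character, which could silently truncate URL fragments. Below is a minimal sketch. It assumes the toy data from the question stands in for the real log extracts. The temporary files are placeholders, since the thread does not name the actual files:

```r
# Toy data from the question, written to temporary files that stand in for
# the real log extracts (the actual file names are not given in the thread).
path1 <- tempfile(fileext = ".txt")
path2 <- tempfile(fileext = ".txt")
writeLines(c("A", "B", "X", "AA", "A", "D", "XA", "C"), path1)
writeLines(c("A", "X", "A"), path2)

# readLines() returns one character element per line, taken verbatim:
# no splitting on whitespace, no quote handling, no comment stripping.
dataset1 <- readLines(path1)
dataset2 <- readLines(path2)

# Why this can matter for web paths: read.table() uses comment.char = "#"
# by default, so everything after a '#' in a line is dropped.
path3 <- tempfile(fileext = ".txt")
writeLines("/docs/index.html#top", path3)
read.table(path3, as.is = TRUE)$V1  # "/docs/index.html"
readLines(path3)                    # "/docs/index.html#top"
```

The sample URL "/docs/index.html#top" is made up for illustration only.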

In abstract terms, I want to subtract dataset2 from dataset1 to get dataset3:

dataset1: 
1 A
2 B
3 X
4 AA
5 A
6 D
7 XA
8 C

dataset2:
1 A
2 X
3 A

dataset3:
1 B
2 AA
3 D
4 XA
5 C

The final order in dataset3 is not important.
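In set terms, this is a set difference. For this example, base R's setdiff() reproduces dataset3. Note, however, that setdiff() also removes duplicates from its result. If dataset1 contained a path twice and that path were absent from dataset2, only one copy would survive. Logical indexing with %in% keeps every copy. A small sketch follows; the extra duplicate "B" is added purely for illustration and is not part of the original data:

```r
dataset1 <- c("A", "B", "X", "AA", "A", "D", "XA", "C")
dataset2 <- c("A", "X", "A")

# Set difference: unique values of dataset1 that do not occur in dataset2.
setdiff(dataset1, dataset2)            # "B" "AA" "D" "XA" "C"

# Append a duplicate "B" (illustration only) to show where the two differ.
with_dup <- c(dataset1, "B")
setdiff(with_dup, dataset2)            # "B" "AA" "D" "XA" "C"
with_dup[!(with_dup %in% dataset2)]    # "B" "AA" "D" "XA" "C" "B"
```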

Thanks,

Jonas Fransson
Ph.D.stud.

IVA / Det Informationsvidenskabelige Akademi
Royal School of Library and Information Science
Birketinget 6
DK-2300 Copenhagen S
T +45 32 58 60 66
D +45 32 34 15 10
www.iva.dk/jf

jim holtman

2012-Feb-27 14:44 UTC

[R] identify and delete in table

Is this what you want?
> ds1 <- read.table(text = "1 A
+ 2 B
+ 3 X
+ 4 AA
+ 5 A
+ 6 D
+ 7 XA
+ 8 C", as.is = TRUE)
>
> ds2 <- read.table(text = "1 A
+ 2 X
+ 3 A", as.is = TRUE)
>
> # keep only the rows of ds1 whose V2 value does not occur in ds2
> ds3 <- ds1[!(ds1$V2 %in% ds2$V2), ]
> ds3
  V1 V2
2  2  B
4  4 AA
6  6  D
7  7 XA
8  8  C
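The %in% test above removes every row whose value appears anywhere in ds2. In this example, that gives the same result as a one-for-one subtraction, because "A" occurs twice in both datasets. Suppose instead that each line of dataset2 should cancel exactly one matching line of dataset1 (a multiset difference). One possible sketch numbers repeated values before matching them. The helper name multiset_diff is hypothetical. The separator "\037" assumes the paths never contain the ASCII unit-separator character:

```r
# Hypothetical helper: remove one occurrence from x for each element of y.
multiset_diff <- function(x, y) {
  # Running count of each value within its group (the second "A" gets 2).
  nx <- ave(seq_along(x), x, FUN = seq_along)
  ny <- ave(seq_along(y), y, FUN = seq_along)
  # Pair each value with its occurrence number; "\037" is assumed not to
  # appear in the paths themselves.
  key_x <- paste(x, nx, sep = "\037")
  key_y <- paste(y, ny, sep = "\037")
  x[!(key_x %in% key_y)]
}

dataset1 <- c("A", "B", "X", "AA", "A", "D", "XA", "C")
multiset_diff(dataset1, c("A", "X", "A"))  # "B" "AA" "D" "XA" "C"
multiset_diff(dataset1, c("A", "X"))       # "B" "AA" "A" "D" "XA" "C"
```

With the data from the question, both approaches give dataset3. They diverge only when a value occurs more often in dataset1 than in dataset2.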



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
