Hello
I have 2 data frames DF1 and DF2 where DF2 is a subset of DF1:
DF1= data.frame(V1=1:6, V2= letters[1:6])
DF2= data.frame(V1=1:3, V2= letters[1:3])
How do I create a new data frame of the difference between DF1 and DF2, i.e. the
rows of DF1 that are not in DF2? For the example above the result would be:
newDF=data.frame(V1=4:6, V2= letters[4:6])
In my real data, the rows are not in order as in the example I provided.
Thanks much
Joseph
Hi Joseph,
Try this:
DF1[!DF1$V1 %in% DF2$V1, ]
subset(DF1, !V1 %in% DF2$V1)
HTH,
Jorge
On Sun, Sep 14, 2008 at 12:49 PM, joseph <jdsandjd@yahoo.com> wrote:
> Hello
> I have 2 data frames DF1 and DF2 where DF2 is a subset of DF1:
> DF1= data.frame(V1=1:6, V2= letters[1:6])
> DF2= data.frame(V1=1:3, V2= letters[1:3])
> How do I create a new data frame of the difference between DF1 and DF2
> newDF=data.frame(V1=4:6, V2= letters[4:6])
> In my real data, the rows are not in order as in the example I provided.
> Thanks much
> Joseph
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
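The original question mentions that the real rows are not in order. A small sketch, assuming V1 alone identifies a row, suggests the filter from the reply above does not depend on row order. The shuffled copies DF1s and DF2s and the seed are additions made only for this illustration.

```r
# Example data from the question
DF1 <- data.frame(V1 = 1:6, V2 = letters[1:6])
DF2 <- data.frame(V1 = 1:3, V2 = letters[1:3])

# Shuffle the rows (illustrative only; the seed is arbitrary)
set.seed(1)
DF1s <- DF1[sample(nrow(DF1)), ]
DF2s <- DF2[sample(nrow(DF2)), ]

# Keep the rows of DF1s whose V1 does not occur anywhere in DF2s$V1
newDF <- DF1s[!DF1s$V1 %in% DF2s$V1, ]

# Sort by V1 only to make the result easy to compare with the question
newDF <- newDF[order(newDF$V1), ]
newDF
```

%in% tests membership value by value, so the position of a row in either data frame does not matter.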
Hi Mark,
As you guessed, I meant a data frame of the rows in DF1 that are not in DF2.
Here is what I got:
> complement <- setdiff(DF1$V2, DF2$V2)
> DF1[, complement]
Error in `[.data.frame`(DF1, , complement) : undefined columns selected
----- Original Message ----
From: Mark Leeds <markleeds@verizon.net>
To: joseph <jdsandjd@yahoo.com>
Cc: markleeds@hitecapital.com
Sent: Sunday, September 14, 2008 10:07:48 AM
Subject: RE: [R] difference of two data frames
Hi:
If you mean a data frame of the rows in DF1 that are not in DF2, then I think
the code below will work for the letters, which, as I understand it, will also
make it work for the rows, so there is no need to consider the numbers?
complement <- setdiff(DF1$V2, DF2$V2)
DFnew <- DF1[, complement]
But, 3 things to consider:
1) I'm not sure if I understand the problem.
2) I'm also at home and I don't use R here, so I can't test it.
3) I'm also not sure about the order of the setdiff operation, so you may have
to switch the order of the two columns I used.
At least it will get you started, though, and I'm confident someone else will
answer.
Good luck.
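The "undefined columns selected" error arises because DF1[, complement] puts the letters in the column position, so R looks for columns named "d", "e" and "f". A minimal sketch of a row-wise version follows, assuming (as Mark did) that V2 alone identifies a row:

```r
DF1 <- data.frame(V1 = 1:6, V2 = letters[1:6])
DF2 <- data.frame(V1 = 1:3, V2 = letters[1:3])

# Values of V2 that occur in DF1 but not in DF2.
# as.character() guards against V2 being stored as a factor.
complement <- setdiff(as.character(DF1$V2), as.character(DF2$V2))

# Select rows (first index), not columns (second index)
DFnew <- DF1[DF1$V2 %in% complement, ]
DFnew
```

setdiff(x, y) returns the elements of x that are not in y, so the argument order Mark used is the one this question needs.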
Hi Jorge,
Both commands work.
Can you extend this to several columns? I am asking because in my real data a
row is unique only across all the columns taken together; in other words, V1
might have duplicates.
Thanks
Actually you got it: the data sets you created are a perfect example (row 1 and
row 2 in DF1 have the same V1 and differ only in V2). But here is the problem:
row 2 of DF1 is in DF1 and not in DF2, yet it does not show up in the
difference. It seems to me that both V1 and V2 should be considered when
calculating the difference.
----- Original Message ----
From: Jorge Ivan Velez <jorgeivanvelez@gmail.com>
To: joseph <jdsandjd@yahoo.com>
Sent: Sunday, September 14, 2008 11:14:11 AM
Subject: Re: [R] difference of two data frames
Hi Joseph,
I'm not sure if I understood your point, but try this:
# Data sets
DF1= data.frame(V1=c(1,1,2,3,3,4,5,5,6), V2= letters[1:9])
DF2= data.frame(V1=1:3, V2= letters[1:3])
# Difference
DF1[! DF1$V1 %in% DF2$V1,]
HTH,
Jorge
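With these data sets, filtering on V1 alone drops row 2 of DF1 (V1 = 1, V2 = "b"), because the value V1 = 1 also occurs in DF2. One possible way to compare on all columns is a merge-based anti-join, sketched below. The marker column in_DF2 is a hypothetical name introduced for this illustration, and the sketch assumes both data frames share the same column names.

```r
DF1 <- data.frame(V1 = c(1, 1, 2, 3, 3, 4, 5, 5, 6), V2 = letters[1:9])
DF2 <- data.frame(V1 = 1:3, V2 = letters[1:3])

# Filtering on V1 alone: only 4 rows survive, and row (1, "b") is lost
by_V1 <- DF1[!DF1$V1 %in% DF2$V1, ]

# Anti-join on all shared columns: flag the rows of DF2, left-join them
# onto DF1, then keep the rows of DF1 that found no partner in DF2
flagged <- cbind(DF2, in_DF2 = TRUE)
m <- merge(DF1, flagged, all.x = TRUE)   # joins on V1 and V2
by_all <- m[is.na(m$in_DF2), names(DF1)]
by_all
```

Note that merge() sorts the result by the join columns and renumbers the rows, so the original row names of DF1 are not preserved.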
It would be useful to have indexed both dataframes with a unique
identifier, such as in rownames etc.
Without that information, you could possibly try to use the same
approach as duplicated() does by "pasting together a character
representation of rows" using "|" (or any other separator).
keys1 <- apply(DF1, 1, paste, collapse="|")
keys1
[1] "1|a" "2|b" "3|c" "4|d" "5|e" "6|f"
duplicated(keys1)
[1] FALSE FALSE FALSE FALSE FALSE FALSE
keys2 <- apply(DF2, 1, paste, collapse="|")
keys2
[1] "1|a" "2|b" "3|c"
duplicated(keys2)
[1] FALSE FALSE FALSE
The duplicated() check is necessary to ensure that the generated keys are truly
unique. You might want to experiment and see if you can create a unique
key using just a few columns.
keys1 %in% keys2
[1] TRUE TRUE TRUE FALSE FALSE FALSE
w <- setdiff( keys1, keys2 )
DF1[ keys1 %in% w, ]   # select by position; the keys are not row names of DF1
  V1 V2
4  4  d
5  5  e
6  6  f
Regards, Adai
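A variant of the key approach above, sketched as a small helper. The name rows_not_in and the separator "\r" are choices made for this illustration, not something from the thread. It builds the keys with do.call(paste, ...) on the columns rather than apply(), because apply() first converts the data frame to a character matrix, which can pad numbers to a common width and so may produce keys that fail to match. The sketch assumes both data frames have the same columns in the same order and that the separator does not occur in the data.

```r
# Return the rows of x whose full-row key does not appear in y
rows_not_in <- function(x, y, sep = "\r") {
  stopifnot(identical(names(x), names(y)))
  # unname() so that a column called "sep" cannot clash with paste()'s argument
  key_x <- do.call(paste, c(unname(as.list(x)), sep = sep))
  key_y <- do.call(paste, c(unname(as.list(y)), sep = sep))
  x[!key_x %in% key_y, , drop = FALSE]
}

# Data sets with duplicated V1 values, as in the earlier message
DF1 <- data.frame(V1 = c(1, 1, 2, 3, 3, 4, 5, 5, 6), V2 = letters[1:9])
DF2 <- data.frame(V1 = 1:3, V2 = letters[1:3])
res <- rows_not_in(DF1, DF2)
res

# The simple example from the original question
res2 <- rows_not_in(data.frame(V1 = 1:6, V2 = letters[1:6]),
                    data.frame(V1 = 1:3, V2 = letters[1:3]))
res2
```

Unlike a merge, this keeps the original row order and row names of the first data frame.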