Hello,

I have 2 data frames, DF1 and DF2, where DF2 is a subset of DF1:

DF1 = data.frame(V1=1:6, V2=letters[1:6])
DF2 = data.frame(V1=1:3, V2=letters[1:3])

How do I create a new data frame of the difference between DF1 and DF2?

newDF = data.frame(V1=4:6, V2=letters[4:6])

In my real data, the rows are not in order as in the example I provided.

Thanks much,
Joseph
Hi Joseph,

Try this:

DF1[!DF1$V1 %in% DF2$V1, ]
subset(DF1, !V1 %in% DF2$V1)

HTH,
Jorge

On Sun, Sep 14, 2008 at 12:49 PM, joseph <jdsandjd@yahoo.com> wrote:
> Hello
> I have 2 data frames DF1 and DF2 where DF2 is a subset of DF1:
> DF1= data.frame(V1=1:6, V2= letters[1:6])
> DF2= data.frame(V1=1:3, V2= letters[1:3])
> How do I create a new data frame of the difference between DF1 and DF2
> newDF=data.frame(V1=4:6, V2= letters[4:6])
> In my real data, the rows are not in order as in the example I provided.
> Thanks much
> Joseph
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
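Since the original question mentions that the real rows are not in order, the one-column approach above can be checked on a shuffled copy. This is only a minimal sketch, assuming V1 alone identifies a row; the shuffled copy DF1s and the seed are illustrative additions, not part of the original posts.

```r
DF1 <- data.frame(V1 = 1:6, V2 = letters[1:6])
DF2 <- data.frame(V1 = 1:3, V2 = letters[1:3])

# Illustrative only: shuffle the rows of DF1 to mimic unordered data
set.seed(1)
DF1s <- DF1[sample(nrow(DF1)), ]

# Keep the rows of DF1s whose V1 value does not occur in DF2$V1
newDF <- DF1s[!DF1s$V1 %in% DF2$V1, ]

# Sort only to make the result easy to compare with the expected newDF
newDF <- newDF[order(newDF$V1), ]
```

%in% compares values rather than positions, so the row order of either data frame should not matter here.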
Hi Mark,

As you guessed, I meant a data frame of the rows in DF1 that are not in DF2. Here is what I got:

> complement <- setdiff(DF1$V2, DF2$V2)
> DF1[, complement]
Error in `[.data.frame`(DF1, , complement) : undefined columns selected

----- Original Message ----
From: Mark Leeds <markleeds@verizon.net>
To: joseph <jdsandjd@yahoo.com>
Cc: markleeds@hitecapital.com
Sent: Sunday, September 14, 2008 10:07:48 AM
Subject: RE: [R] difference of two data frames

Hi:

If you mean a data frame of the rows in DF1 that are not in DF2, then I think the code below will work for the letters, which, as I understand it, will also make it work for the rows, so there is no need to consider the numbers?

complement <- setdiff(DF1$V2, DF2$V2)
DFnew <- DF1[, complement]

But, 3 things to consider:

1) I'm not sure if I understand the problem.
2) I'm also at home and I don't use R here, so I can't test it.
3) I'm also not sure about the order of the setdiff operation, so you may have to switch the order of the two columns I used.

At least it will get you started, though, and I'm confident someone else will answer. Good luck.
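The "undefined columns selected" error appears to come from where the index sits: in DF1[, complement] the letters are in the column position, so R looks for columns named "d", "e" and "f". A minimal sketch of a row-wise version, assuming V2 alone identifies a row (the name DFnew follows Mark's message):

```r
DF1 <- data.frame(V1 = 1:6, V2 = letters[1:6])
DF2 <- data.frame(V1 = 1:3, V2 = letters[1:3])

# Values of V2 that occur in DF1 but not in DF2
complement <- setdiff(DF1$V2, DF2$V2)

# Select ROWS (index before the comma) whose V2 is in the complement
DFnew <- DF1[DF1$V2 %in% complement, ]
```

The order of the arguments to setdiff() matters: setdiff(x, y) gives the elements of x that are not in y.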
Hi Jorge,

Both commands work. Can you extend them to several columns? I am asking because in my real data the uniqueness of a row is determined by all the columns together; in other words, V1 might have duplicates.

Thanks

----- Original Message ----
From: Jorge Ivan Velez <jorgeivanvelez@gmail.com>
To: joseph <jdsandjd@yahoo.com>
Cc: r-help@r-project.org
Sent: Sunday, September 14, 2008 10:23:33 AM
Subject: Re: [R] difference of two data frames

Hi Joseph,

Try this:

DF1[!DF1$V1 %in% DF2$V1, ]
subset(DF1, !V1 %in% DF2$V1)

HTH,
Jorge
Actually you got it: the data sets you created are a perfect example (row 1 and row 2 in DF1 have the same V1 and differ only in V2). But here is the problem: row 2 exists in DF1 and not in DF2, yet it does not show up in the difference. It seems to me that both V1 and V2 should be considered when calculating the difference.

----- Original Message ----
From: Jorge Ivan Velez <jorgeivanvelez@gmail.com>
To: joseph <jdsandjd@yahoo.com>
Sent: Sunday, September 14, 2008 11:14:11 AM
Subject: Re: [R] difference of two data frames

Hi Joseph,

I'm not sure if I understood your point, but try this:

# Data sets
DF1 = data.frame(V1=c(1,1,2,3,3,4,5,5,6), V2=letters[1:9])
DF2 = data.frame(V1=1:3, V2=letters[1:3])

# Difference
DF1[!DF1$V1 %in% DF2$V1, ]

HTH,
Jorge
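One way to take both V1 and V2 into account is to mark the rows of DF2 and left-join them onto DF1 with merge(); rows of DF1 that pick up no mark have no full-row match in DF2. This is a sketch under the assumption that both data frames share the same column names; the marker column in_DF2 and the names DF2m, merged and diffDF are made up for illustration.

```r
DF1 <- data.frame(V1 = c(1, 1, 2, 3, 3, 4, 5, 5, 6), V2 = letters[1:9])
DF2 <- data.frame(V1 = 1:3, V2 = letters[1:3])

# Hypothetical marker column: TRUE for every row that comes from DF2
DF2m <- cbind(DF2, in_DF2 = TRUE)

# Left join on ALL shared columns; unmatched DF1 rows get NA in in_DF2
merged <- merge(DF1, DF2m, by = c("V1", "V2"), all.x = TRUE)

# Rows of DF1 with no identical row in DF2
diffDF <- merged[is.na(merged$in_DF2), c("V1", "V2")]
```

With these example data only the row (1, a) occurs in both data frames, so the row (1, b) is kept in the difference. Note that merge() may reorder the rows and does not preserve the original row names.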
It would be useful to have both data frames indexed by a unique identifier, such as row names. Without that information, you could try the same approach that duplicated() uses, "pasting together a character representation of rows", with "|" (or any other separator).

keys1 <- apply(DF1, 1, paste, collapse="|")
keys1
[1] "1|a" "2|b" "3|c" "4|d" "5|e" "6|f"
duplicated(keys1)
[1] FALSE FALSE FALSE FALSE FALSE FALSE

keys2 <- apply(DF2, 1, paste, collapse="|")
keys2
[1] "1|a" "2|b" "3|c"
duplicated(keys2)
[1] FALSE FALSE FALSE

The duplicated() check is necessary to ensure that the generated keys are truly unique. You might want to experiment and see if you can create a unique key using just a few columns.

keys1 %in% keys2
[1]  TRUE  TRUE  TRUE FALSE FALSE FALSE

# Logical index of the DF1 rows whose key does not occur in DF2
w <- !keys1 %in% keys2
DF1[w, ]
  V1 V2
4  4  d
5  5  e
6  6  f

Regards,
Adai

joseph wrote:
> Hi Jorge
> both commands work;
> can you extend it to several columns? the reason I am asking is that in my real data the uniqueness of the rows is made of all the columns; in other words V1 might have duplicates.
> Thanks
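The key-pasting idea can also be sketched column-wise with do.call(paste, ...), which skips the as.matrix() conversion that apply() performs on a data frame (that conversion may format numeric columns to a common width, so keys built with apply() from two data frames might not always line up). The helper name row_key is hypothetical, and the sketch assumes that both data frames have the same columns in the same order and that "|" never occurs in the data.

```r
DF1 <- data.frame(V1 = c(1, 1, 2, 3, 3, 4, 5, 5, 6), V2 = letters[1:9])
DF2 <- data.frame(V1 = 1:3, V2 = letters[1:3])

# Hypothetical helper: build one string key per row from all columns
row_key <- function(df) do.call(paste, c(unname(as.list(df)), sep = "|"))

keys1 <- row_key(DF1)
keys2 <- row_key(DF2)

# Keep the rows of DF1 whose full-row key does not occur in DF2
newDF <- DF1[!keys1 %in% keys2, ]
```

Because the rows are selected with a logical index, DF1 may contain duplicated rows; each one is kept or dropped according to its own key, and the original row names are preserved.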