Hi,

I am trying to remove rows from a small R data.frame that are duplicated on two or more variables at once. For those familiar with SAS, I am trying to reproduce the behaviour of a Proc Sort with Nodupkey.

Here's my example data:

test <- read.csv("test.csv", sep=",", as.is=TRUE)
> test
      date var1 var2 num1 num2
1 28/01/11    a    1  213   71
2 28/01/11    b    1  141   47
3 28/01/11    c    2  867  289
4 29/01/11    a    2  234   78
5 29/01/11    b    2  666  222
6 29/01/11    c    2  912  304
7 30/01/11    a    3  417  139
8 30/01/11    b    3  108   36
9 30/01/11    c    2  288   96

I am trying to obtain the following, where duplicates of date AND var2 are removed from the above data.frame.

      date var1 var2 num1 num2
28/01/2011    a    1  213   71
28/01/2011    c    2  867  289
29/01/2011    a    2  234   78
30/01/2011    c    2  288   96
30/01/2011    a    3  417  139

If I use !duplicated() with one variable everything works fine. However, I wish to remove duplicates of both date and var2.

test[!duplicated(test$date),]
        date var1 var2 num1 num2
1 0011-01-28    a    1  213   71
4 0011-01-29    a    2  234   78
7 0011-01-30    a    3  417  139

test2 <- test[!duplicated(test$date),!duplicated(test$var2),]
Error in `[.data.frame`(test, !duplicated(test$date), !duplicated(test$var2), :
  undefined columns selected

Why do I get this error? I also got different errors when using the unique() function.

Can anybody solve this?

Thanks in advance.

Jon

--
View this message in context: http://r.789695.n4.nabble.com/Problems-using-unique-function-and-duplicated-tp3328150p3328150.html
Sent from the R help mailing list archive at Nabble.com.
Hi Jon,

I think you made a mistake in your desired output. If it is indeed a mistake, then this should do:

test[!duplicated(test[,c("date","var2")]),]

HTH,
Ivan

PS: think about dput() when you want to share objects, in this case dput(test)

On 2/28/2011 16:51, JonC wrote:
> Hi, I am trying to simultaneously remove duplicate variables from two or more
> variables in a small R data.frame. I am trying to reproduce the SAS
> statements from a Proc Sort with Nodupkey for those familiar with SAS.
> [...]

--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calandra at uni-hamburg.de

**********
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php
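For anyone who wants to try Ivan's one-liner without test.csv, here is a minimal sketch. The data frame is typed in by hand from the values in Jon's post (the original file is not available, so the column types are an assumption), and the names keep and test2 are only illustrative.

```r
# Rebuild the example data by hand; values copied from Jon's post.
# Assumption: column types are guessed, since test.csv itself is not shown.
test <- data.frame(
  date = rep(c("28/01/11", "29/01/11", "30/01/11"), each = 3),
  var1 = rep(c("a", "b", "c"), times = 3),
  var2 = c(1, 1, 2, 2, 2, 2, 3, 3, 2),
  num1 = c(213, 141, 867, 234, 666, 912, 417, 108, 288),
  num2 = c(71, 47, 289, 78, 222, 304, 139, 36, 96),
  stringsAsFactors = FALSE
)

# duplicated() on a data frame compares whole rows of the selected columns,
# so this flags every row whose (date, var2) pair has been seen before.
keep <- !duplicated(test[, c("date", "var2")])

# Keep the first row of each (date, var2) pair, all columns.
test2 <- test[keep, ]
print(test2)
```

With this data the rows kept are 1, 3, 4, 7 and 9, in their original order.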
Jon,

you need to combine the conditions into one logical vector of the form cond1 & cond2, e.g.

!duplicated(test$date) & !duplicated(test$var2)

However, I doubt that this is what you want: it removes too many rows (rows whose individual values have already appeared, even if the combination is unique).

Have a look at the wiki, though:
http://rwiki.sciviews.org/doku.php?id=tips:data-frames:count_and_extract_unique_rows

Claudia

On 02/28/2011 04:51 PM, JonC wrote:
> Hi, I am trying to simultaneously remove duplicate variables from two or more
> variables in a small R data.frame. I am trying to reproduce the SAS
> statements from a Proc Sort with Nodupkey for those familiar with SAS.
> [...]

--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste
phone: +39 0 40 5 58-37 68
email: cbeleites at units.it
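To make Claudia's point concrete, the sketch below compares the two filters on the key columns from Jon's post. The data is typed in by hand, and the names sep and joint are just illustrative. As far as I can tell, the original error comes from writing the two conditions as separate indices: in test[i, j, ] the second index is read as a column selector, and a logical vector of length 9 does not fit a data frame with 5 columns.

```r
# Only the two key columns are needed here; values copied from Jon's post.
test <- data.frame(
  date = rep(c("28/01/11", "29/01/11", "30/01/11"), each = 3),
  var2 = c(1, 1, 2, 2, 2, 2, 3, 3, 2),
  stringsAsFactors = FALSE
)

# Each column checked on its own, then combined with &:
# a row survives only if BOTH its date and its var2 value are first occurrences.
sep <- !duplicated(test$date) & !duplicated(test$var2)

# The pair (date, var2) checked jointly:
# a row survives if the combination is a first occurrence.
joint <- !duplicated(test[, c("date", "var2")])

which(sep)    # rows kept by the separate conditions
which(joint)  # rows kept by the joint condition
```

On this data the separate conditions keep only rows 1 and 7, while the joint condition keeps rows 1, 3, 4, 7 and 9, which is the set Jon asked for.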
On 28-Feb-11 15:51:17, JonC wrote:
> Hi, I am trying to simultaneously remove duplicate variables from two
> or more variables in a small R data.frame. I am trying to reproduce
> the SAS statements from a Proc Sort with Nodupkey for those familiar
> with SAS.
> [...]

The following gives what you state you wish to obtain (though not quite in the same order of rows). Call the original dataframe 'df':

df
#       date var1 var2 num1 num2
# 1 28/01/11    a    1  213   71
# 2 28/01/11    b    1  141   47
# 3 28/01/11    c    2  867  289
# 4 29/01/11    a    2  234   78
# 5 29/01/11    b    2  666  222
# 6 29/01/11    c    2  912  304
# 7 30/01/11    a    3  417  139
# 8 30/01/11    b    3  108   36
# 9 30/01/11    c    2  288   96

ix <- which(duplicated(data.frame(df$date,df$var2)))
ix
# [1] 2 5 6 8

df[-ix,]
#       date var1 var2 num1 num2
# 1 28/01/11    a    1  213   71
# 3 28/01/11    c    2  867  289
# 4 29/01/11    a    2  234   78
# 7 30/01/11    a    3  417  139
# 9 30/01/11    c    2  288   96

Does this help?
Ted.

PS I'm posting this from a temporarily subscribed alternative address (for testing purposes) instead of my usual ted.harding at wlandres.net

--------------------------------------------------------------------
E-Mail: (Ted Harding) <efh at wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 28-Feb-11                                       Time: 16:19:59
------------------------------ XFMail ------------------------------
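Two small additions to Ted's approach, offered as a sketch and not as a definitive Nodupkey equivalent. First, sorting by the key columns before dropping duplicates seems to be what gives the row order Jon posted; this assumes Proc Sort with Nodupkey keeps the first row of each sorted key group. Second, negative indexing with which() is fragile: if there are no duplicates, ix is empty and df[-ix,] returns zero rows, so a logical index is safer. The data is typed in by hand, and the names sorted, nodup and empty_ix_rows are illustrative.

```r
# Rebuild the example data by hand; values copied from Jon's post.
df <- data.frame(
  date = rep(c("28/01/11", "29/01/11", "30/01/11"), each = 3),
  var1 = rep(c("a", "b", "c"), times = 3),
  var2 = c(1, 1, 2, 2, 2, 2, 3, 3, 2),
  num1 = c(213, 141, 867, 234, 666, 912, 417, 108, 288),
  num2 = c(71, 47, 289, 78, 222, 304, 139, 36, 96),
  stringsAsFactors = FALSE
)

# Parse the dates so they sort chronologically; %y reads two-digit years.
# (A guess: the 0011-01-28 values in Jon's output look like what a
# four-digit %Y format would produce from these strings.)
df$date <- as.Date(df$date, format = "%d/%m/%y")

# Sort by the key columns; ties keep their original relative order.
sorted <- df[order(df$date, df$var2), ]

# Drop later rows of each (date, var2) pair using a logical index.
nodup <- sorted[!duplicated(sorted[, c("date", "var2")]), ]
print(nodup)

# The pitfall with negative indexing: an empty index selects no rows at all.
empty_ix_rows <- nrow(df[-which(rep(FALSE, nrow(df))), ])
empty_ix_rows
```

On this data the rows come out as 1, 3, 4, 9, 7, the same order as in Jon's desired output, and empty_ix_rows is 0.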