thr3ads.net - R help - [R] Problems using unique function and !duplicated [Feb 2011]

JonC

2011-Feb-28 15:51 UTC

[R] Problems using unique function and !duplicated

Hi, I am trying to remove rows from a small R data.frame that are
duplicated on two or more variables simultaneously. I am trying to reproduce
the behaviour of a SAS Proc Sort with Nodupkey, for those familiar with SAS.

Here's my example data : 

> test <- read.csv("test.csv", sep=",", as.is=TRUE)
> test
      date var1 var2 num1 num2
1 28/01/11    a    1  213   71
2 28/01/11    b    1  141   47
3 28/01/11    c    2  867  289
4 29/01/11    a    2  234   78
5 29/01/11    b    2  666  222
6 29/01/11    c    2  912  304
7 30/01/11    a    3  417  139
8 30/01/11    b    3  108   36
9 30/01/11    c    2  288   96

I am trying to obtain the following, where duplicates of date AND var2 are
removed from the above data.frame.

date        var1  var2  num1  num2
28/01/2011  a     1     213   71
28/01/2011  c     2     867   289
29/01/2011  a     2     234   78
30/01/2011  c     2     288   96
30/01/2011  a     3     417   139



If I use !duplicated() with one variable, everything works fine.
However, I wish to remove duplicates of both date and var2.

 test[!duplicated(test$date),]
        date var1 var2 num1 num2
1 0011-01-28    a    1  213   71
4 0011-01-29    a    2  234   78
7 0011-01-30    a    3  417  139

test2 <- test[!duplicated(test$date),!duplicated(test$var2),]
Error in `[.data.frame`(test, !duplicated(test$date),
!duplicated(test$var2),  :   undefined columns selected

I get an error.
I also got different errors when using the unique() function.

Can anybody solve this?

Thanks in advance.

Jon



Ivan Calandra

2011-Feb-28 16:07 UTC

Hi Jon,

I think you made a mistake in your desired output.
If it is indeed a mistake, then this should do:

test[!duplicated(test[,c("date","var2")]),]

HTH,
Ivan

PS: think about dput() when you want to share objects, in this case 
dput(test)
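
A self-contained sketch of the one-liner above, with the data typed in by hand rather than read from test.csv (the data.frame() call is a reconstruction of the posted table, not the original file). It also hints at why the original call failed: in test[i, j, ] the second index selects columns, so a logical vector of length 9 was being applied to 5 columns.

```r
# Reconstruction of the posted example data (assumption: retyped from the
# printed table, since test.csv itself is not available).
test <- data.frame(
  date = rep(c("28/01/11", "29/01/11", "30/01/11"), each = 3),
  var1 = rep(c("a", "b", "c"), times = 3),
  var2 = c(1, 1, 2, 2, 2, 2, 3, 3, 2),
  num1 = c(213, 141, 867, 234, 666, 912, 417, 108, 288),
  num2 = c(71, 47, 289, 78, 222, 304, 139, 36, 96),
  stringsAsFactors = FALSE
)

# duplicated() on a data frame compares whole rows, so passing only the key
# columns flags rows whose (date, var2) pair has already been seen.
is_dup <- duplicated(test[, c("date", "var2")])

# One logical vector goes in the row position; the column position is empty.
test2 <- test[!is_dup, ]
print(test2)
```

On this data, rows 1, 3, 4, 7 and 9 are kept, in their original order.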


-- 
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calandra at uni-hamburg.de

**********
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php

Claudia Beleites

2011-Feb-28 16:10 UTC

Jon,

you need to combine the conditions into one logical vector, e.g. cond1 & cond2;
in your case: !duplicated(test$date) & !duplicated(test$var2)

However, I doubt that this is what you want: you remove too many rows (rows 
whose single values appeared already, even if the combination is unique).
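
To see the difference on the posted data, here is a small sketch comparing the two approaches. The data frame is retyped from the printed table, so treat it as an assumption about the contents of test.csv.

```r
# Reconstruction of the posted example data (assumption: retyped by hand).
test <- data.frame(
  date = rep(c("28/01/11", "29/01/11", "30/01/11"), each = 3),
  var1 = rep(c("a", "b", "c"), times = 3),
  var2 = c(1, 1, 2, 2, 2, 2, 3, 3, 2),
  num1 = c(213, 141, 867, 234, 666, 912, 417, 108, 288),
  num2 = c(71, 47, 289, 78, 222, 304, 139, 36, 96),
  stringsAsFactors = FALSE
)

# Separate conditions: a row survives only if BOTH its date and its var2
# value are first occurrences on their own.
keep_each <- !duplicated(test$date) & !duplicated(test$var2)

# Combined key: a row survives if its (date, var2) pair is a first occurrence.
keep_pair <- !duplicated(test[, c("date", "var2")])

print(which(keep_each))  # rows kept by the element-wise AND
print(which(keep_pair))  # rows kept by the combined key
```

On this data the element-wise version keeps only rows 1 and 7, while the combined key keeps rows 1, 3, 4, 7 and 9.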

Have a look at the wiki, though: 
http://rwiki.sciviews.org/doku.php?id=tips:data-frames:count_and_extract_unique_rows

Claudia



-- 
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbeleites at units.it

(Ted Harding)

2011-Feb-28 16:20 UTC

The following gives what you state you wish to obtain (though
not quite in the same order of rows). Call the original dataframe 'df':

  df
  #       date var1 var2 num1 num2
  # 1 28/01/11    a    1  213   71
  # 2 28/01/11    b    1  141   47
  # 3 28/01/11    c    2  867  289
  # 4 29/01/11    a    2  234   78
  # 5 29/01/11    b    2  666  222
  # 6 29/01/11    c    2  912  304
  # 7 30/01/11    a    3  417  139
  # 8 30/01/11    b    3  108   36
  # 9 30/01/11    c    2  288   96

  ix <- which(duplicated(data.frame(df$date, df$var2)))
  ix
  # [1] 2 5 6 8

  df[-ix,]
  #       date var1 var2 num1 num2
  # 1 28/01/11    a    1  213   71
  # 3 28/01/11    c    2  867  289
  # 4 29/01/11    a    2  234   78
  # 7 30/01/11    a    3  417  139
  # 9 30/01/11    c    2  288   96
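
Two hedged refinements of the approach above, as a sketch rather than a definitive recipe. First, negative indexing with which() returns zero rows when there are no duplicates at all (ix is then integer(0)), so a logical index is the safer form. Second, to reproduce the row order of the requested output, the result can be sorted by the key columns; here the dates are parsed with as.Date() and the format "%d/%m/%y", which is an assumption about how the two-digit years should be read. The data frame is retyped from the printed table.

```r
# Reconstruction of the posted example data (assumption: retyped by hand).
df <- data.frame(
  date = rep(c("28/01/11", "29/01/11", "30/01/11"), each = 3),
  var1 = rep(c("a", "b", "c"), times = 3),
  var2 = c(1, 1, 2, 2, 2, 2, 3, 3, 2),
  num1 = c(213, 141, 867, 234, 666, 912, 417, 108, 288),
  num2 = c(71, 47, 289, 78, 222, 304, 139, 36, 96),
  stringsAsFactors = FALSE
)

# Logical indexing: behaves correctly even when nothing is duplicated.
dup <- duplicated(df[, c("date", "var2")])
res <- df[!dup, ]

# Sort by date, then var2. Dates are parsed first so that the ordering is
# chronological rather than alphabetical ("%y" = two-digit year; assumed).
d <- as.Date(res$date, format = "%d/%m/%y")
res <- res[order(d, res$var2), ]
print(res)

# The pitfall with -which(): on data without duplicates, ix is empty and
# nodup[-ix, ] selects no rows instead of all rows.
nodup <- df[c(1, 3, 4), ]
ix <- which(duplicated(nodup[, c("date", "var2")]))
print(nrow(nodup[-ix, ]))                                      # 0
print(nrow(nodup[!duplicated(nodup[, c("date", "var2")]), ]))  # 3
```

After sorting, the rows come out in the order 1, 3, 4, 9, 7, which is the order of the desired output in the original question.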

Does this help?
Ted.
PS I'm posting this from a temporarily subscribed alternative
address (for testing purposes) instead of my usual
ted.harding at wlandres.net

--------------------------------------------------------------------
E-Mail: (Ted Harding) <efh at wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 28-Feb-11                                       Time: 16:19:59
------------------------------ XFMail ------------------------------
