thr3ads.net - R help - [R] difference between unique() and !duplicated() [Sep 2007]


T.Lok

2007-Sep-13 09:47 UTC

[R] difference between unique() and !duplicated()

Yesterday I spent the whole day struggling to work out how to get 
the maximum value of "y" for every unique value of "x" 
from the dataframe "test". In the R Book (Crawley, 2007) 
an example of this can be found on page 121. I tried to do 
it this way, but I failed.

In the end, I figured out how to get it working (first 
order, and afterwards use !duplicated()). My question is: 
why does it not work with the unique() function on p. 121 
(i.e. test[rev(order(x)),][unique(y),])?

As a simple example, I used the following syntax:
> x <- c("A","A","B","B","C","C","D")
> y <- c(1,2,1,1,2,3,1)
> z <- c("yes","yes","no","yes","no","no","no")
> test <- data.frame(x,y,z)
> test
   x y   z
1 A 1 yes
2 A 2 yes
3 B 1  no
4 B 1 yes
5 C 2  no
6 C 3  no
7 D 1  no
> test[rev(order(test$y, test$z)),][unique(test$x),]
   x y   z
6 C 3  no
2 A 2 yes
5 C 2  no
4 B 1 yes

# this clearly does not give a unique value for x, since 
# there are 2 C's and no D!
> test[rev(order(test$y, test$z)),][!duplicated(test$x),]
   x y   z
6 C 3  no
5 C 2  no
1 A 1 yes
3 B 1  no

# this also doesn't work
# then I thought, maybe first use the order() function, 
# then unique()
> test[rev(order(test$y, test$z)),]
   x y   z
6 C 3  no
2 A 2 yes
5 C 2  no
4 B 1 yes
1 A 1 yes
7 D 1  no
3 B 1  no
> test1 <- test[rev(order(test$y, test$z)),]
> test1[unique(test1$x),]
   x y   z
5 C 2  no
6 C 3  no
2 A 2 yes
4 B 1 yes

# still no unique values for x
> test1[!duplicated(test1$x),]
   x y   z
6 C 3  no
2 A 2 yes
4 B 1 yes
7 D 1  no

# finally I get unique values for x, for the maximum value 
# of y (and z).

But why does this not work when the order() and !duplicated() 
calls are combined in a single command? 
And why does only !duplicated() work, and not unique()?

jim holtman

2007-Sep-13 10:42 UTC

Try this:
> a <- read.table(textConnection("x y   z
+ 1 A 1 yes
+ 2 A 2 yes
+ 3 B 1  no
+ 4 B 1 yes
+ 5 C 2  no
+ 6 C 3  no
+ 7 D 1  no"), header=TRUE)
> do.call('rbind', by(a, a$x, function(.sub){
+     .sub[which.max(.sub$y),]
+ }))
  x y   z
A A 2 yes
B B 1  no
C C 3  no
D D 1  no



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

Michael Dewey

2007-Sep-13 11:00 UTC

At 10:47 13/09/2007, T.Lok wrote:
>Yesterday I spent the whole day struggling on how to get the maximum 
>value of "y" for every unique value of "x" from the dataframe 
>"test". In the R Book (Crawley, 2007) an example of this can be 
>found on page 121. I tried to do it this way, but I failed.
>
>In the end, I figured out how to get it working (first order, and 
>afterwards use !duplicated()). My question is: why does it not work 
>with the unique() function on p. 121 
>(i.e. test[rev(order(x)),][unique(y),])?

This is not a direct answer to your question but is not
tapply(y, x, max)
a simpler way to do what you want?
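
A minimal sketch of the tapply() suggestion, rebuilding the example vectors from the original post (group_max is an illustrative name, not from the thread). Note that it returns only the per-group maxima of y as a named array; the matching z values are not carried along, so it answers a slightly narrower question than the order() / !duplicated() approach, which keeps whole rows.

```r
# Example data from the original post.
x <- c("A", "A", "B", "B", "C", "C", "D")
y <- c(1, 2, 1, 1, 2, 3, 1)

# Apply max() to the values of y within each group defined by x.
group_max <- tapply(y, x, max)
print(group_max)
```

The result is indexed by group name, so group_max["C"] looks up the maximum for group C.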




Michael Dewey
http://www.aghmed.fsnet.co.uk

Duncan Murdoch

2007-Sep-13 11:11 UTC

On 13/09/2007 5:47 AM, T.Lok wrote:
> Yesterday I spent the whole day struggling on how to get 
> the maximum value of "y" for every unique value of "x" 
> from the dataframe "test". In the R Book (Crawley, 2007) 
> an example of this can be found on page 121. I tried to do 
> it this way, but I failed.
> 
> In the end, I figured out how to get it working (first 
> order, and afterwards use !duplicated()). My question is: 
> why does it not work with the unique() function on p. 121 
> (i.e. test[rev(order(x)),][unique(y),])?
> 
> As a simple example, I used the following syntax:
> 
>> x <- c("A","A","B","B","C","C","D")
>> y <- c(1,2,1,1,2,3,1)
>> z <- c("yes","yes","no","yes","no","no","no")
>> test <- data.frame(x,y,z)
>> test
> 
>    x y   z
> 1 A 1 yes
> 2 A 2 yes
> 3 B 1  no
> 4 B 1 yes
> 5 C 2  no
> 6 C 3  no
> 7 D 1  no
> 
>> test[rev(order(test$y, test$z)),][unique(test$x),]
> 
>    x y   z
> 6 C 3  no
> 2 A 2 yes
> 5 C 2  no
> 4 B 1 yes
> 
> # this clearly does not give a unique value for x, since 
> # there are 2 C's and no D!

You are trying to index by the unique values of x.  But x is a factor, 
so this doesn't do anything even close to what you wanted.
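
To illustrate the point, here is a small sketch using the example data (the names sorted and u are illustrative, not from the thread). x is created as a factor explicitly, because data.frame() has not converted strings to factors by default since R 4.0.0. A factor used as an index is treated as its integer codes, so unique(test$x) ends up selecting rows 1 to 4 of the reordered data frame by position, whatever their x values happen to be.

```r
# Example data; x is made a factor explicitly (an assumption needed to
# reproduce the 2007 behaviour on R >= 4.0.0).
test <- data.frame(
  x = factor(c("A", "A", "B", "B", "C", "C", "D")),
  y = c(1, 2, 1, 1, 2, 3, 1),
  z = c("yes", "yes", "no", "yes", "no", "no", "no")
)
sorted <- test[rev(order(test$y, test$z)), ]

u <- unique(test$x)      # factor with values A B C D
print(as.integer(u))     # the integer codes behind the factor

# Indexing by the factor uses those codes as row positions,
# so both calls below pick the same four rows.
print(sorted[u, ])
print(sorted[1:4, ])
```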
> 
>> test[rev(order(test$y, test$z)),][!duplicated(test$x),]
> 
>    x y   z
> 6 C 3  no
> 5 C 2  no
> 1 A 1 yes
> 3 B 1  no
> 
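You rearranged the rows of test but not of test$x, so the logical vector 
!duplicated(test$x) refers to the original row order and no longer lines 
up with the reordered rows.  This would work: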

test <- test[rev(order(test$y, test$z)),]
test[!duplicated(test$x),]
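
The misalignment can be made visible by comparing the two logical vectors side by side. This is only a sketch on the example data; sorted, mask_original and mask_sorted are illustrative names, not from the thread.

```r
test <- data.frame(
  x = c("A", "A", "B", "B", "C", "C", "D"),
  y = c(1, 2, 1, 1, 2, 3, 1),
  z = c("yes", "yes", "no", "yes", "no", "no", "no")
)
sorted <- test[rev(order(test$y, test$z)), ]

# Mask computed on x in its original order: it marks positions 1, 3, 5, 7.
mask_original <- !duplicated(test$x)
# Mask computed on x after reordering: it marks the first row of each group
# in the sorted data frame.
mask_sorted <- !duplicated(sorted$x)

print(rbind(mask_original, mask_sorted))

# Applying the original-order mask to the sorted rows picks arbitrary rows.
print(sorted[mask_original, ])
# Applying the mask built from the sorted rows gives one row per value of x.
print(sorted[mask_sorted, ])
```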

> # this also doesn't work
> # then I thought, maybe first use the order() function, 
> # then unique()
> 
>> test[rev(order(test$y, test$z)),]
> 
>    x y   z
> 6 C 3  no
> 2 A 2 yes
> 5 C 2  no
> 4 B 1 yes
> 1 A 1 yes
> 7 D 1  no
> 3 B 1  no
> 
>> test1 <- test[rev(order(test$y, test$z)),]
>> test1[unique(test1$x),]
> 
>    x y   z
> 5 C 2  no
> 6 C 3  no
> 2 A 2 yes
> 4 B 1 yes
> 
> # still no unique values for x
> 
>> test1[!duplicated(test1$x),]
> 
>    x y   z
> 6 C 3  no
> 2 A 2 yes
> 4 B 1 yes
> 7 D 1  no
> 
> # finally I get unique values for x, for the maximum value 
> of y (and z). But why does this not work when giving the 
> order() and !duplicated() command simultaneously?
> And why does only !duplicated() work, and not unique()?
I think both questions are answered above.

Duncan Murdoch
