Yesterday I spent the whole day struggling to get the maximum value of "y" for every unique value of "x" from the data frame "test". In the R Book (Crawley, 2007) an example of this can be found on page 121. I tried to do it this way, but I failed.

In the end, I figured out how to get it working (first order, and afterwards use !duplicated()). My question is: why does it not work with the unique() function on p. 121 (i.e. test[rev(order(x)),][unique(y),])?

As a simple example, I used the following syntax:

> x <- c("A","A","B","B","C","C","D")
> y <- c(1,2,1,1,2,3,1)
> z <- c("yes","yes","no","yes","no","no","no")
> test <- data.frame(x,y,z)
> test
  x y   z
1 A 1 yes
2 A 2 yes
3 B 1  no
4 B 1 yes
5 C 2  no
6 C 3  no
7 D 1  no

> test[rev(order(test$y, test$z)),][unique(test$x),]
  x y   z
6 C 3  no
2 A 2 yes
5 C 2  no
4 B 1 yes

# this clearly does not give a unique value for x, since there are 2 C's and no D!

> test[rev(order(test$y, test$z)),][!duplicated(test$x),]
  x y   z
6 C 3  no
5 C 2  no
1 A 1 yes
3 B 1  no

# this also doesn't work
# then I thought, maybe first use the order() function, then unique()

> test[rev(order(test$y, test$z)),]
  x y   z
6 C 3  no
2 A 2 yes
5 C 2  no
4 B 1 yes
1 A 1 yes
7 D 1  no
3 B 1  no

> test1 <- test[rev(order(test$y, test$z)),]
> test1[unique(test1$x),]
  x y   z
5 C 2  no
6 C 3  no
2 A 2 yes
4 B 1 yes

# still no unique values for x

> test1[!duplicated(test1$x),]
  x y   z
6 C 3  no
2 A 2 yes
4 B 1 yes
7 D 1  no

# finally I get unique values for x, for the maximum value of y (and z).

But why does this not work when giving the order() and !duplicated() commands simultaneously? And why does only !duplicated() work, and not unique()?
Try this:

> a <- read.table(textConnection("x y z
+ 1 A 1 yes
+ 2 A 2 yes
+ 3 B 1 no
+ 4 B 1 yes
+ 5 C 2 no
+ 6 C 3 no
+ 7 D 1 no"), header=TRUE)
> do.call('rbind', by(a, a$x, function(.sub){
+     .sub[which.max(.sub$y),]
+ }))
  x y   z
A A 2 yes
B B 1  no
C C 3  no
D D 1  no

On 9/13/07, T.Lok <T.Lok at rug.nl> wrote:
> Yesterday I spent the whole day struggling to get
> the maximum value of "y" for every unique value of "x"
> from the data frame "test". In the R Book (Crawley, 2007)
> an example of this can be found on page 121. I tried to do
> it this way, but I failed.
>
> In the end, I figured out how to get it working (first
> order, and afterwards use !duplicated()). My question is:
> why does it not work with the unique() function on p. 121
> (i.e. test[rev(order(x)),][unique(y),])?
>
> [...]
>
> But why does this not work when giving the
> order() and !duplicated() commands simultaneously?
> And why does only !duplicated() work, and not unique()?

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
At 10:47 13/09/2007, T.Lok wrote:
>Yesterday I spent the whole day struggling to get the maximum
>value of "y" for every unique value of "x" from the data frame
>"test". In the R Book (Crawley, 2007) an example of this can be
>found on page 121. I tried to do it this way, but I failed.
>
>In the end, I figured out how to get it working (first order, and
>afterwards use !duplicated()). My question is: why does it not work
>with the unique() function on p. 121
>(i.e. test[rev(order(x)),][unique(y),])?

This is not a direct answer to your question, but is not

tapply(y, x, max)

a simpler way to do what you want?

Michael Dewey
http://www.aghmed.fsnet.co.uk
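For reference, a minimal sketch of the tapply() suggestion, rebuilding the data from the original post. The name max_y is only illustrative. Note that this returns just the per-group maxima of y, so the matching z values are not carried along; if the whole row is needed, one of the row-selecting approaches in this thread is probably the better fit.

```r
# Rebuild the example data from the original post.
x <- c("A","A","B","B","C","C","D")
y <- c(1,2,1,1,2,3,1)
z <- c("yes","yes","no","yes","no","no","no")
test <- data.frame(x, y, z)

# Maximum of y within each distinct value of x.
# The result is a one-dimensional array named by the groups.
max_y <- tapply(test$y, test$x, max)
max_y
# A B C D
# 2 1 3 1
```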
On 13/09/2007 5:47 AM, T.Lok wrote:
> Yesterday I spent the whole day struggling to get
> the maximum value of "y" for every unique value of "x"
> from the data frame "test". In the R Book (Crawley, 2007)
> an example of this can be found on page 121. I tried to do
> it this way, but I failed.
>
> In the end, I figured out how to get it working (first
> order, and afterwards use !duplicated()). My question is:
> why does it not work with the unique() function on p. 121
> (i.e. test[rev(order(x)),][unique(y),])?
>
> [...]
>
>> test[rev(order(test$y, test$z)),][unique(test$x),]
>
>   x y   z
> 6 C 3  no
> 2 A 2 yes
> 5 C 2  no
> 4 B 1 yes
>
> # this clearly does not give a unique value for x, since
> there are 2 C's and no D!

You are trying to index by the unique values of x. But x is a factor, so this doesn't do anything even close to what you wanted.

>> test[rev(order(test$y, test$z)),][!duplicated(test$x),]
>
>   x y   z
> 6 C 3  no
> 5 C 2  no
> 1 A 1 yes
> 3 B 1  no

You rearranged the rows of test but not of test$x. This would work:

test <- test[rev(order(test$y, test$z)),]
test[!duplicated(test$x),]

> # this also doesn't work
> # then I thought, maybe first use the order() function,
> then unique()
>
> [...]
>
> But why does this not work when giving the
> order() and !duplicated() commands simultaneously?
> And why does only !duplicated() work, and not unique()?

I think both questions are answered above.

Duncan Murdoch
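To make the two answers above concrete, here is a small sketch using the data from the original post. It assumes x is a factor, as it was under the data.frame() defaults of the time; since R 4.0.0 the default is stringsAsFactors = FALSE, so the argument is set explicitly here. The names codes, wrong, best and best2 are illustrative only. R's indexing documentation (?Extract) states that indexing by a factor is equivalent to indexing by its integer codes, which is why unique(test1$x) picks rows by position rather than by label.

```r
# Rebuild the example data; stringsAsFactors = TRUE mimics the 2007 default.
x <- c("A","A","B","B","C","C","D")
y <- c(1,2,1,1,2,3,1)
z <- c("yes","yes","no","yes","no","no","no")
test <- data.frame(x, y, z, stringsAsFactors = TRUE)

# Sort by y, then z, in decreasing order (rows 6, 2, 5, 4, 1, 7, 3).
test1 <- test[rev(order(test$y, test$z)), ]

# unique() on a factor returns a factor; as an index it acts through
# its integer codes (C, A, B, D -> 3, 1, 2, 4), not through its labels.
codes <- as.integer(unique(test1$x))
codes
# 3 1 2 4

# Indexing by those codes selects rows by position: original rows 5, 6, 2, 4,
# the same rows the post reports for test1[unique(test1$x), ].
wrong <- test1[codes, ]

# !duplicated() gives a logical mask, so it must be computed on the
# reordered column (test1$x), not on the original test$x.
best <- test1[!duplicated(test1$x), ]

# One way to use unique() after all: look up the first position of each value.
best2 <- test1[match(unique(test1$x), test1$x), ]
```

Both best and best2 keep original rows 6, 2, 4 and 7, i.e. the first (largest y) row for each value of x in the sorted data frame.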