Hi,

I have a data frame that looks like this:

> x
  x1 x2  x3
   A  1 1.5
   B  2 0.9
   B  3 2.7
   C  7 1.8
   D  7 1.3

I want to "group" by the x1 column and, where an x$x1 value occurs more than once (e.g., "B"), return the row that has the smallest value of x2. Where an x1 value occurs only once (e.g., "A"), return the row as is. How can I do that? For example, in the above case, the output I want would be:

  x1 x2  x3
   A  1 1.5
   B  2 0.9
   C  7 1.8
   D  7 1.3

Thanks!
?subset is probably what you want:

    subset(x, x1 == 'A')

On Oct 2, 1:43 pm, Kavitha Venkatesan <kavitha.venkate... at gmail.com> wrote:
> I want to "group" by the x1 column and in the case of multiple x$x1 values
> (e.g., "B"), return rows that have the smallest values of x2. In the case
> of rows with only one value of x1 (e.g., "A"), return the row as is.
>
> ______________________________________________
> R-h... at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Try this:

> x <- read.table(textConnection("x1 x2 x3
+ A 1 1.5
+ B 2 0.9
+ B 3 2.7
+ C 7 1.8
+ D 7 1.3"), header=TRUE)
> closeAllConnections()
> # split the row indices by x1, then keep the index of the smallest x2 in each group
> do.call(rbind, lapply(split(seq(nrow(x)), x$x1), function(.row){
+     x[.row[which.min(x$x2[.row])],]
+ }))
  x1 x2  x3
A  A  1 1.5
B  B  2 0.9
C  C  7 1.8
D  D  7 1.3

On Thu, Oct 1, 2009 at 11:43 PM, Kavitha Venkatesan <kavitha.venkatesan at gmail.com> wrote:
> I want to "group" by the x1 column and in the case of multiple x$x1 values
> (e.g., "B"), return rows that have the smallest values of x2. In the case
> of rows with only one value of x1 (e.g., "A"), return the row as is.

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
As is typical with R, there are often other ways. Here is another approach that determines the values of interest with tapply and min, converts those minimums into logical "targets" with %in%, and extracts the matching rows from "x" using indexing:

    x[x$x2 %in% tapply(x$x2, x$x1, min), ]

  x1 x2  x3
1  A  1 1.5
2  B  2 0.9
4  C  7 1.8
5  D  7 1.3

Note that %in% tests each x2 against the pooled set of group minimums, so a row whose x2 happens to equal the minimum of a different group would also be kept. That does not occur in this example.

You might want to determine whether both approaches would return all rows if there were multiple instances of a minimum within a group. I think the solution above would return all of the tied rows, while the one below, which uses which.min, would return only the first. You choose based on the nature of the problem.

-- 
David

On Oct 2, 2009, at 5:24 AM, jim holtman wrote:
> try this:
>
>> do.call(rbind, lapply(split(seq(nrow(x)), x$x1), function(.row){
> +     x[.row[which.min(x$x2[.row])],]
> + }))
>   x1 x2  x3
> A  A  1 1.5
> B  B  2 0.9
> C  C  7 1.8
> D  D  7 1.3
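The %in% approach above compares against the minimums of all groups at once. A minimal sketch of a per-group comparison, assuming base R's ave() is acceptable here, is given below. The data frame `y` is hypothetical data made up only to show where the two versions differ; it is not from the original post.

```r
# Example data from the original post
x <- data.frame(x1 = c("A", "B", "B", "C", "D"),
                x2 = c(1, 2, 3, 7, 7),
                x3 = c(1.5, 0.9, 2.7, 1.8, 1.3))

# ave() returns, for every row, the minimum of x2 within that row's x1 group
group_min <- ave(x$x2, x$x1, FUN = min)

# Keep rows whose x2 equals the minimum of their own group (ties are all kept)
res <- x[x$x2 == group_min, ]
print(res)

# Hypothetical data: group "B" contains a 7, which is also the minimum of "C"
y <- data.frame(x1 = c("B", "B", "C"),
                x2 = c(2, 7, 7),
                x3 = c(0.9, 2.7, 1.8))

# Pooled comparison: the second "B" row is kept because 7 is the minimum of "C"
pooled <- y[y$x2 %in% tapply(y$x2, y$x1, min), ]

# Per-group comparison: only the true per-group minimum rows are kept
grouped <- y[y$x2 == ave(y$x2, y$x1, FUN = min), ]

print(pooled)
print(grouped)
```

On the data from the original post both versions select the same rows, so the difference only matters when a value in one group coincides with the minimum of another.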
You can use aggregate:

    aggregate(x[,c('x2','x3')], x['x1'], min)

Note that this takes the minimum of x2 and of x3 independently within each group. It matches the desired output here only because, for "B", the smallest x3 sits on the same row as the smallest x2.

On Fri, Oct 2, 2009 at 12:43 AM, Kavitha Venkatesan <kavitha.venkatesan at gmail.com> wrote:
> I want to "group" by the x1 column and in the case of multiple x$x1 values
> (e.g., "B"), return rows that have the smallest values of x2. In the case
> of rows with only one value of x1 (e.g., "A"), return the row as is.

-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
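If whole rows are needed rather than column-wise minimums, one possible variation on the aggregate idea is to aggregate x2 alone and then merge the result back onto the original data. The sketch below assumes base R only. The data frame `z` is hypothetical, chosen so that the smallest x2 and the smallest x3 fall on different rows.

```r
# Example data from the original post
x <- data.frame(x1 = c("A", "B", "B", "C", "D"),
                x2 = c(1, 2, 3, 7, 7),
                x3 = c(1.5, 0.9, 2.7, 1.8, 1.3))

# Minimum of x2 per x1 group (formula interface of aggregate)
mins <- aggregate(x2 ~ x1, data = x, FUN = min)

# merge() joins on the shared columns x1 and x2, recovering the full rows
res <- merge(mins, x)
print(res)

# Hypothetical data: smallest x2 and smallest x3 are on different rows
z <- data.frame(x1 = c("B", "B"),
                x2 = c(2, 3),
                x3 = c(2.7, 0.9))

# Column-wise minimums combine values from two different rows
mixed <- aggregate(z[, c("x2", "x3")], z["x1"], min)
print(mixed)

# Aggregate-then-merge returns an actual row of z
whole <- merge(aggregate(x2 ~ x1, data = z, FUN = min), z)
print(whole)
```

As with the equality-based approaches, the merge keeps every row that ties for the group minimum.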
On Fri, Oct 2, 2009 at 4:24 AM, jim holtman <jholtman at gmail.com> wrote:
> try this:
>
>> do.call(rbind, lapply(split(seq(nrow(x)), x$x1), function(.row){
> +     x[.row[which.min(x$x2[.row])],]
> + }))
>   x1 x2  x3
> A  A  1 1.5
> B  B  2 0.9
> C  C  7 1.8
> D  D  7 1.3

Or, using plyr and subset:

    library(plyr)
    ddply(x, "x1", subset, x2 == min(x2))

Hadley

-- 
http://had.co.nz/
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of hadley wickham
> Sent: Friday, October 02, 2009 6:07 AM
> To: jim holtman
> Cc: r-help at r-project.org; Kavitha Venkatesan
> Subject: Re: [R] split-apply question
>
> Or, using plyr and subset
>
> library(plyr)
> ddply(x, "x1", subset, x2 == min(x2))
>
> Hadley

Since we are using min() we can use sorting tricks:

    f3 <- function(x) {
        # sort by group, then by x2 within group
        x <- x[with(x, order(x1,x2)),]
        # TRUE where a value differs from the one before it
        isFirstInRun <- function(z)c(TRUE, z[-1] != z[-length(z)])
        x[isFirstInRun(x$x1),]
    }

This has the advantage that it keeps the original row names intact. It is quick even when there are lots of unique values in x1.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
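To see how the sorting approach behaves on the example and when a group has tied minimums, here is a small check. It repeats f3 from the message above so that it runs on its own. The data frame `w` is hypothetical data with a tie in group "B", not something from the thread.

```r
# f3 as posted above: sort by x1 then x2, keep the first row of each x1 run
f3 <- function(x) {
  x <- x[with(x, order(x1, x2)), ]
  isFirstInRun <- function(z) c(TRUE, z[-1] != z[-length(z)])
  x[isFirstInRun(x$x1), ]
}

# Example data from the original post
x <- data.frame(x1 = c("A", "B", "B", "C", "D"),
                x2 = c(1, 2, 3, 7, 7),
                x3 = c(1.5, 0.9, 2.7, 1.8, 1.3))

res <- f3(x)
print(res)  # original row names are preserved

# Hypothetical data: both "B" rows share the minimum x2, and "A" comes last
w <- data.frame(x1 = c("B", "B", "A"),
                x2 = c(2, 2, 5),
                x3 = c(0.9, 2.7, 1.0))

# order() leaves ties in their original order, so the first tied row wins;
# the result is ordered by x1, not by original position
tied <- f3(w)
print(tied)
```

Like the which.min version, this returns exactly one row per group, whereas the equality-based versions (ddply with subset, or %in%) return every tied row.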