thr3ads.net - R help - [R] split-apply question [Oct 2009]

Kavitha Venkatesan

2009-Oct-02 03:43 UTC

[R] split-apply question

Hi,

I have a data frame that looks like this:
>x
x1  x2  x3
A   1    1.5
B   2    0.9
B   3    2.7
C   7    1.8
D   7    1.3

I want to "group" by the x1 column and, where an x1 value occurs more than
once (e.g., "B"), return the rows that have the smallest value of x2. Where
an x1 value occurs only once (e.g., "A"), return the row as is. How can I do
that? For example, in the above case, the output I want would be:

x1  x2  x3
A   1    1.5
B   2    0.9
C   7    1.8
D   7    1.3


Thanks!


andrew

2009-Oct-02 04:53 UTC

?subset is probably what you want:

subset(x, x1 == 'A')


jim holtman

2009-Oct-02 09:24 UTC

try this:

> x <- read.table(textConnection("x1  x2  x3
+ A   1    1.5
+ B   2    0.9
+ B   3    2.7
+ C   7    1.8
+ D   7    1.3"), header=TRUE)
> closeAllConnections()
> do.call(rbind, lapply(split(seq(nrow(x)), x$x1), function(.row){
+     x[.row[which.min(x$x2[.row])],]
+ }))
  x1 x2  x3
A  A  1 1.5
B  B  2 0.9
C  C  7 1.8
D  D  7 1.3
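
The split / which.min approach can also be read as three separate steps. The following is only a sketch of that reading, using the sample data from the question; the names idx, keep and result are made up for illustration.

```r
# Sample data from the original question
x <- data.frame(x1 = c("A", "B", "B", "C", "D"),
                x2 = c(1, 2, 3, 7, 7),
                x3 = c(1.5, 0.9, 2.7, 1.8, 1.3))

# Step 1: group the row indices by x1 (idx$B holds rows 2 and 3)
idx <- split(seq_len(nrow(x)), x$x1)

# Step 2: within each group, keep the index of the first minimum of x2
keep <- vapply(idx, function(rows) rows[which.min(x$x2[rows])], integer(1))

# Step 3: extract the chosen rows in a single indexing operation
result <- x[unname(keep), ]
result
```

Because which.min() returns only the first minimum, this keeps exactly one row per group even when several rows tie.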

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

David Winsemius

2009-Oct-02 12:34 UTC

As is typical with R there are often other ways. Here is another  
approach that determines the rows of interest with tapply and min,  
converts those minimums into logical "targets" with %in%, and extracts
them from "x" using indexing:

x[x$x2 %in% tapply(x$x2, x$x1, min), ]

########
   x1 x2  x3
1  A  1 1.5
2  B  2 0.9
4  C  7 1.8
5  D  7 1.3

You might want to check how each approach behaves when a group has more
than one row at its minimum. I think the above solution would return all
of the tied rows, while Jim Holtman's which.min() solution would return
only the first. Choose based on the nature of the problem.
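
One further point to check with the %in% approach: the test compares each x2 against the minima of all groups, not only its own group. The sketch below uses a made-up data frame y (not the poster's data) in which group B has a tie and also a row whose x2 equals group C's minimum. The ave() variant is offered as one possible within-group alternative, not as the method proposed in this thread.

```r
# Hypothetical data: B has a tie at x2 = 2, and a row with x2 = 7,
# which is C's minimum but not B's
y <- data.frame(x1 = c("A", "B", "B", "B", "C"),
                x2 = c(1, 2, 2, 7, 7),
                x3 = c(1.5, 0.9, 2.7, 0.4, 1.8))

# %in% against the pooled group minima: keeps both tied B rows,
# but also keeps row 4 because 7 is the minimum of another group
by_in <- y[y$x2 %in% tapply(y$x2, y$x1, min), ]

# ave() gives every row its own group's minimum, so the comparison
# stays within the group; tied rows are still all returned
by_ave <- y[y$x2 == ave(y$x2, y$x1, FUN = min), ]

rownames(by_in)
rownames(by_ave)
```

On the data in the original question the two give the same rows, since no non-minimum x2 happens to equal another group's minimum.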

-- 
David



Henrique Dallazuanna

2009-Oct-02 12:49 UTC

You can use aggregate:

aggregate(x[,c('x2','x3')], x['x1'], min)
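
Note that aggregate() applies min to each column separately, so the result is not guaranteed to be an actual row of the data. A small sketch with a made-up data frame z (the x3 values of the two B rows are swapped relative to the question) shows the difference:

```r
# Hypothetical data: in group B, the smallest x3 is not on the row
# with the smallest x2
z <- data.frame(x1 = c("A", "B", "B"),
                x2 = c(1, 2, 3),
                x3 = c(1.5, 2.7, 0.9))

# Column-wise minimum per group
agg <- aggregate(z[, c("x2", "x3")], z["x1"], min)
agg

# For B this reports x2 = 2 together with x3 = 0.9,
# a combination that appears in no row of z
any(z$x2 == 2 & z$x3 == 0.9)
```

On the data in the question the output happens to match the wanted rows, because the B row with the smallest x2 also has the smallest x3.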


-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

hadley wickham

2009-Oct-02 13:07 UTC

Or, using plyr and subset

library(plyr)
ddply(x, "x1", subset, x2 == min(x2))
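
If plyr is not available, roughly the same per-group filter can be written in base R. This is only a sketch of an equivalent, not part of the plyr answer; the tie in group B is added here for illustration, and the names pieces and res are made up.

```r
# Question data with a hypothetical tie in group B (both rows have x2 = 2)
x <- data.frame(x1 = c("A", "B", "B", "C", "D"),
                x2 = c(1, 2, 2, 7, 7),
                x3 = c(1.5, 0.9, 2.7, 1.8, 1.3))

# Split into one data frame per x1 value and keep the rows at each minimum
pieces <- lapply(split(x, x$x1), function(d) d[d$x2 == min(d$x2), ])

# Stack the filtered pieces back into a single data frame
res <- do.call(rbind, pieces)
res
```

Like the x2 == min(x2) condition in the ddply call, this keeps every row tied at the group minimum, so both B rows are returned here.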

Hadley

-- 
http://had.co.nz/

William Dunlap

2009-Oct-02 17:51 UTC

Since we are using min() we can use sorting tricks

f3 <- function(x) {
   x <- x[with(x, order(x1,x2)),]
   isFirstInRun <- function(z)c(TRUE, z[-1] != z[-length(z)])
   x[isFirstInRun(x$x1),]
}

This has the advantage that it keeps the original row names intact.
It is quick even when there are lots of unique values in x1.
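
A short usage sketch of f3, with the question's rows deliberately shuffled (an assumption made here so that the preserved row names are visible):

```r
f3 <- function(x) {
  # Sort by group, then by x2, so each group's minimum comes first
  x <- x[with(x, order(x1, x2)), ]
  # TRUE wherever a value differs from the one before it
  isFirstInRun <- function(z) c(TRUE, z[-1] != z[-length(z)])
  x[isFirstInRun(x$x1), ]
}

# Same rows as in the question, in a different order
x <- data.frame(x1 = c("B", "A", "D", "B", "C"),
                x2 = c(3, 1, 7, 2, 7),
                x3 = c(2.7, 1.5, 1.3, 0.9, 1.8))

out <- f3(x)
out
rownames(out)
```

The result is ordered by x1 and its row names refer back to positions in the input, which makes it easy to trace each selected row. Only the first row of each group is kept, so ties are not all returned.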

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
