thr3ads.net - R help - [R] efficiently picking one row from a data frame per unique key [Apr 2010]

James Kebinger

2010-Apr-13 01:33 UTC

[R] efficiently picking one row from a data frame per unique key

Hello all, I'm trying to transform data frames by grouping the rows by the
values in a particular column, ordered by another column, then picking the
first row in each group.

I'd like to convert a data frame like this:

x  y  z
1 10 20
1 11 19
2 12 18
4 13 17

into one with three rows, like this, where I've discarded one row:

  x  y  z
1 1 11 19
2 2 12 18
4 4 13 17

I have a solution using aggregate, but it gets very slow with any real volume
of data: the performance seems mostly IO-bound, and it never finishes on a
data set of about 6 MB.

Here's how I'm currently trying to do this:

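# Example data
d = data.frame(x=c(1,1,2,4), y=c(10,11,12,13), z=c(20,19,18,17))
# Sort by y, largest first
d.ordered = d[order(-d$y),]
# Keep the first value of every column within each x group
aggregate(d.ordered, by=list(key=d.ordered$x), FUN=function(x){x[1]})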
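
For reference, here is a self-contained run of the call above on the example data (a sketch; the name `res` is introduced here for illustration). It does return the desired rows, but with an extra leading `key` column for the grouping variable. `aggregate` applies `FUN` to each column separately within each group, so the number of R-level function calls grows with groups times columns, which may account for part of the slowdown on larger data.

```r
# Example data from the question
d <- data.frame(x = c(1, 1, 2, 4), y = c(10, 11, 12, 13), z = c(20, 19, 18, 17))

# Sort by y, largest first, so the first row of each x group has the largest y
d.ordered <- d[order(-d$y), ]

# aggregate() splits each column by key and keeps the first value per group
res <- aggregate(d.ordered, by = list(key = d.ordered$x),
                 FUN = function(v) v[1])

# Drop the added 'key' column to get back the original x, y, z layout
res <- res[, c("x", "y", "z")]
print(res)
```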

I've tried to use split and unsplit, but unsplit complained about duplicate
row names when reassembling the sub-frames.

Thanks for your suggestions,

-james

Phil Spector

2010-Apr-13 01:49 UTC

James -
     If I understand you correctly:

getone = function(df)df[order(df$x,df$y),][1,]

describes what you want from each data frame corresponding
to a unique value of x.   Then, supposing that your data frame
is called df:

sdf = split(df,df$x)

will create a list of data frames for the unique values
of x, and

do.call(rbind,lapply(sdf,getone))

will return a data frame with one row for each unique value
of x.
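
A runnable sketch of this approach on the example data from the question follows. One assumption is flagged in the code: `getone` above sorts by increasing `y`, so for `x = 1` it would keep the row with `y = 10`, whereas the desired output in the question keeps `y = 11` (the question sorts by `-d$y`). The hypothetical `getone_desc` below therefore sorts by decreasing `y`; since `x` is constant within each piece of the split, it is left out of the sort.

```r
# Example data from the question
d <- data.frame(x = c(1, 1, 2, 4), y = c(10, 11, 12, 13), z = c(20, 19, 18, 17))

# Hypothetical variant of getone: keep the row with the largest y in one group
# (descending sort, to match the desired output in the question)
getone_desc <- function(df) df[order(-df$y), ][1, ]

# One data frame per unique value of x
sdf <- split(d, d$x)

# Apply getone_desc to each piece and stack the one-row results
res <- do.call(rbind, lapply(sdf, getone_desc))
print(res)
```

Each group still costs one R function call plus its share of the final `rbind`, so on data with many unique keys this is likely to be slower than a single vectorized pass.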

 					- Phil Spector
 					 Statistical Computing Facility
 					 Department of Statistics
 					 UC Berkeley
 					 spector at stat.berkeley.edu


Peter Alspach

2010-Apr-13 01:52 UTC

Hello James

You might try duplicated(), or more to the point !duplicated()

orderedData[!duplicated(orderedData$x),]
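
A self-contained sketch of this on the question's data, assuming `orderedData` plays the role of `d.ordered` in the original post. It needs one sort and one vectorized `duplicated()` pass, with no per-group function calls, so it typically scales far better than `aggregate` or `split`/`lapply`.

```r
# Example data from the question
d <- data.frame(x = c(1, 1, 2, 4), y = c(10, 11, 12, 13), z = c(20, 19, 18, 17))

# Sort so that, within each x, the row with the largest y comes first
d.ordered <- d[order(d$x, -d$y), ]

# duplicated() flags every occurrence of an x value after its first,
# so !duplicated() keeps exactly the first (largest-y) row per x
res <- d.ordered[!duplicated(d.ordered$x), ]

# Optional: renumber rows 1..n instead of keeping the original row names
rownames(res) <- NULL
print(res)
```

Adding `d$x` as the first sort key only changes the order of the output rows; with `d[order(-d$y), ]` alone, `!duplicated()` keeps the same rows, listed in decreasing `y`.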

HTH ....

Peter Alspach