thr3ads.net - R help - [R] Select top three values from data frame [Aug 2009]

Noah Silverman

2009-Aug-26 08:36 UTC

[R] Select top three values from data frame

Hi,

I'm trying to find an easy way to do this.

I want to select the top three values of a specific column in a subset 
of rows in a data.frame.  I'll demonstrate.

A    B    C
x    2    1
x    4    1
x    3    2
y    1    5
y    2    6
y    3    8


I want the top 3 values of B from the data.frame where A == "x" and C < 2.

I could extract all the rows where C<2, then sort by B, then take the 
first 3.  But that seems like the wrong way around, and it also will get 
messy with real data of over 100 columns.

Any suggestions?

Petr PIKAL

2009-Aug-26 09:38 UTC

[R] Re: Select top three values from data frame

Hi

r-help-bounces at r-project.org wrote on 26.08.2009 10:36:22:
> Hi,
> 
> I'm trying to find an easy way to do this.
> 
> I want to select the top three values of a specific column in a subset 
> of rows in a data.frame.  I'll demonstrate.
> 
> A    B    C
> x    2    1
> x    4    1
> x    3    2
> y    1    5
> y    2    6
> y    3    8
> 
> 
> I want the top 3 values of B from the data.frame where A=X and C <2
> 
> I could extract all the rows where C<2, then sort by B, then take the 
> first 3.  But that seems like the wrong way around, and it also will get 
> messy with real data of over 100 columns.

One way is to use subset, order and head:

head(subset(your.data[order(your.data$B, decreasing=TRUE),],
            subset = C<2 & A=="x"), 3)

Regards
Petr
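
For anyone who wants to try the subset/order/head approach above, here is a minimal sketch that rebuilds the six-row example from the original post. The data frame name `your.data` follows the reply; the intermediate names `sorted` and `top3` are only illustrative.

```r
# Rebuild the example data from the original post.
your.data <- data.frame(
  A = c("x", "x", "x", "y", "y", "y"),
  B = c(2, 4, 3, 1, 2, 3),
  C = c(1, 1, 2, 5, 6, 8),
  stringsAsFactors = FALSE
)

# Sort by B (largest first), keep the rows that meet both conditions,
# then take at most the first three of them.
sorted <- your.data[order(your.data$B, decreasing = TRUE), ]
top3 <- head(subset(sorted, C < 2 & A == "x"), 3)
print(top3)
```

Only two rows of the example satisfy both conditions, so two rows come back (B = 4, then B = 2). head() returns fewer than three rows without complaint when fewer are available.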

> 
> Any suggestions?
> 

Mohamed Lajnef

2009-Aug-26 09:41 UTC

[R] Select top three values from data frame

Noah Silverman wrote:
> Hi,
>
> I'm trying to find an easy way to do this.
>
> I want to select the top three values of a specific column in a subset 
> of rows in a data.frame.  I'll demonstrate.
>
Hi,
did you try this?

data[data$A=='x' & data$C<2,]$B # data = your data frame

> A    B    C
> x    2    1
> x    4    1
> x    3    2
> y    1    5
> y    2    6
> y    3    8
>
>
> I want the top 3 values of B from the data.frame where A=X and C <2
>
> I could extract all the rows where C<2, then sort by B, then take the 
> first 3.  But that seems like the wrong way around, and it also will 
> get messy with real data of over 100 columns.
>
> Any suggestions?
regards

ML

-- 
Mohamed Lajnef
INSERM Unité 955.
40 rue de Mesly. 94000 Créteil.
Email: Mohamed.lajnef at inserm.fr
Tel.: 01 49 81 31 31 (ext. 18470)
Sec.: 01 49 81 32 90
Fax: 01 49 81 30 99
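
A note on the subsetting expression in the reply above: it returns every value of B in the matching rows, not only the largest three. One possible way to finish the job, sketched here on the example data from the original post (the names `matches` and `top3` are only illustrative), is to sort the result in decreasing order and take the head:

```r
# Example data from the original post; "data" is the name used in the reply.
data <- data.frame(
  A = c("x", "x", "x", "y", "y", "y"),
  B = c(2, 4, 3, 1, 2, 3),
  C = c(1, 1, 2, 5, 6, 8),
  stringsAsFactors = FALSE
)

# Values of B in the rows where A is "x" and C is below 2.
matches <- data$B[data$A == "x" & data$C < 2]

# Largest first, then at most three values.
top3 <- head(sort(matches, decreasing = TRUE), 3)
print(top3)
```

On this data the result has only two elements (4 and 2), because only two rows meet both conditions.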

Ottorino-Luca Pantani

2009-Aug-26 09:46 UTC

[R] Select top three values from data frame

df.mydata[df.mydata$A == "x" & df.mydata$C < 2, ]
Will this do the job?

8rino

Noah Silverman wrote:
> Hi,
>
> I'm trying to find an easy way to do this.
>
> I want to select the top three values of a specific column in a subset 
> of rows in a data.frame.  I'll demonstrate.
>
> A    B    C
> x    2    1
> x    4    1
> x    3    2
> y    1    5
> y    2    6
> y    3    8
>
>
> I want the top 3 values of B from the data.frame where A=X and C <2
>
> I could extract all the rows where C<2, then sort by B, then take the 
> first 3.  But that seems like the wrong way around, and it also will 
> get messy with real data of over 100 columns.
>
> Any suggestions?
>

Don MacQueen

2009-Aug-26 15:23 UTC

[R] Select top three values from data frame

Do you want just the values (i.e., a vector), or do you also want the 
corresponding rows of the data frame?

What if there is a tie, or do you know in advance that within any 
particular subset the values of B are unique?

What if the subset that meets the constraints has fewer than 3 unique 
values? (which I think is the case in your example)


    tail(  unique( sort( df$B[  df$A=='x' & df$C < 2 ] ) ) ,3 )

Should do it (but I haven't tested).
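
As a quick check of the one-liner above, here is a small sketch run against the example data from the original post. The second part uses a made-up vector `b` (not from the thread) to show what unique() changes when there are ties.

```r
# Example data from the original post.
df <- data.frame(
  A = c("x", "x", "x", "y", "y", "y"),
  B = c(2, 4, 3, 1, 2, 3),
  C = c(1, 1, 2, 5, 6, 8),
  stringsAsFactors = FALSE
)

# sort() is ascending, so tail() picks the largest values; they come
# back in increasing order.
res <- tail(unique(sort(df$B[df$A == "x" & df$C < 2])), 3)
print(res)

# Hypothetical vector with a tie at the top, to show the effect of unique().
b <- c(5, 5, 4, 3, 1)
with_unique    <- tail(unique(sort(b)), 3)  # three distinct values
without_unique <- tail(sort(b), 3)          # the tie is kept
print(with_unique)
print(without_unique)
```

On the example data only two values qualify, so `res` has length two (2 and 4). For `b`, the version with unique() gives 3 4 5, while the version without it gives 4 5 5.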

Why does it get messy with over 100 columns?
I'll pretend for the moment that you have exactly 100 columns:
    1)  you will be doing this many times, each time with a different 
set of 3 columns?
    2)  you want the three highest values in each of 98 columns based 
on constraints on the other two?
    3)  you want the three highest values of B based on constraints on 
all of the other 99 columns?

Depending on what changes when more columns are involved, you might 
be able to loop over columns with syntax like,

    # print() is needed: results are not auto-printed inside a for loop
    for (nm in c('B','D','E') )
        print( tail( unique( sort( df[[nm]][ df$A=='x' & df$C < 2 ] ) ), 3 ) )
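
If the per-column results are needed later rather than just displayed, one alternative to the loop is to collect them in a named list with lapply(). This is only a sketch: columns D and E do not appear in the original post, so their values here are invented for illustration, as are the names `keep`, `cols` and `top3`.

```r
# Example data from the original post, plus two hypothetical columns D and E.
df <- data.frame(
  A = c("x", "x", "x", "y", "y", "y"),
  B = c(2, 4, 3, 1, 2, 3),
  C = c(1, 1, 2, 5, 6, 8),
  D = c(10, 20, 30, 40, 50, 60),  # invented values
  E = c(7, 7, 9, 1, 2, 3),        # invented values
  stringsAsFactors = FALSE
)

# Compute the row filter once and reuse it for every column.
keep <- df$A == "x" & df$C < 2
cols <- c("B", "D", "E")

# One list element per column, named after the column.
top3 <- lapply(setNames(cols, cols), function(nm) {
  tail(unique(sort(df[[nm]][keep])), 3)
})
print(top3)
```

Computing `keep` once avoids repeating the condition for each column, and `top3$B`, `top3$D` and so on can then be used like any other vector.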

-Don

At 1:36 AM -0700 8/26/09, Noah Silverman wrote:
>Hi,
>
>I'm trying to find an easy way to do this.
>
>I want to select the top three values of a specific column in a 
>subset of rows in a data.frame.  I'll demonstrate.
>
>A    B    C
>x    2    1
>x    4    1
>x    3    2
>y    1    5
>y    2    6
>y    3    8
>
>
>I want the top 3 values of B from the data.frame where A=X and C <2
>
>I could extract all the rows where C<2, then sort by B, then take 
>the first 3.  But that seems like the wrong way around, and it also 
>will get messy with real data of over 100 columns.
>
>Any suggestions?
>

-- 
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062
