thr3ads.net - R help - AW: [R] Rank and extract data from a series [Sep 2003]

"Unternährer Thomas, uth"

2003-Sep-23 12:23 UTC

AW: [R] Rank and extract data from a series

Hi,


>I would like to rank a time-series of data, extract the top ten data items
>from this series, determine the corresponding row numbers for each value in
>the sample, and take a mean of these *row numbers* (not the data).
>
>I would like to do this in R, rather than pre-process the data on the UNIX
>command line if possible, as I need to calculate other statistics for the
>series.
>
>I understand that I can use 'sort' to order the data, but I am not aware of
>a function in R that would allow me to extract a given number of these data
>and then determine their positions within the original time series.
>
>e.g.
>
>Time series:
>
>1.0 (row 1)
>4.5 (row 2)
>2.3 (row 3)
>1.0 (row 4)
>7.3 (row 5)
>
>Sort would give me:
>
>1.0
>1.0
>2.3
>4.5
>7.3
>
>I would then like to extract the top two data items:
>
>4.5
>7.3
>
>and determine their positions within the original (unsorted) time series:
>
>4.5 = row 2
>7.3 = row 5
>
>then take a mean:
>
>2 and 5 = 3.5
>
>Thanks in advance.
>
>James Brown

X <- c(1, 4.5, 2.3, 1, 7.3)
X1 <- sort(X, decreasing=TRUE)[1:2]
X2 <- match(X1, X)
mean(X2)
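
For readers following along, here is the same recipe with the intermediate values written out. It is only a sketch: the variable `n` is introduced here for illustration (the question asks for the top ten; the example uses two).

```r
# Worked through step by step on the example series from the question.
X <- c(1, 4.5, 2.3, 1, 7.3)
n <- 2                                   # number of top items (10 in the question)

X1 <- sort(X, decreasing = TRUE)[1:n]    # the n largest values: 7.3 4.5
X2 <- match(X1, X)                       # first position of each value in X: 5 2
mean(X2)                                 # mean of the row numbers: 3.5
```

Note that match() reports only the first position at which each value occurs in X.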



Hope this helps

Thomas


___________________________________________

James Brown

Cambridge Coastal Research Unit (CCRU)
Department of Geography
University of Cambridge
Downing Place
Cambridge
CB2 3EN, UK

Telephone: +44 (0)1223 339776
Mobile: 07929 817546
Fax: +44 (0)1223 355674

E-mail: jdb33 at cam.ac.uk
E-mail: james_510 at hotmail.com

http://www.geog.cam.ac.uk/ccru/CCRU.html
___________________________________________






On Wed, 10 Sep 2003, Jerome Asselin wrote:
> On September 10, 2003 04:03 pm, Kevin S. Van Horn wrote:
> >
> > Your method looks like a naive reimplementation of integration, and
> > won't work so well for distributions that have the great majority of
> > the probability mass concentrated in a small fraction of the sample
> > space.  I was hoping for something that would retain the
> > adaptability of integrate().
>
> Yesterday I suggested using approxfun(). Did you consider my
> suggestion? Below is an example.
>
> N <- 500
> x <- rexp(N)
> y <- rank(x)/(N+1)
> empCDF <- approxfun(x,y)
> xvals <- seq(0,4,.01)
> plot(xvals,empCDF(xvals),type="l",
>      xlab="Quantile",ylab="Cumulative Distribution Function")
> lines(xvals,pexp(xvals),lty=2)
> legend(2,.4,c("Empirical CDF","Exact CDF"),lty=1:2)
>
>
> It's possible to tune some of the parameters of approxfun() to better
> match your personal preferences. Have a look at help(approxfun) for
> details.
>
> HTH,
> Jerome Asselin
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list 
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
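
As one illustration of the approxfun() tuning mentioned in the quoted message: by default the interpolant returns NA outside the range of the sample, and the `yleft` and `yright` arguments can pin the tails instead. This is a sketch only; the seed and the name `empCDF2` are assumptions made here, not part of the original example.

```r
set.seed(1)                      # arbitrary seed, only so the sketch is reproducible
N <- 500
x <- rexp(N)
y <- rank(x) / (N + 1)

# Default behaviour (rule = 1): NA outside [min(x), max(x)]
empCDF <- approxfun(x, y)
empCDF(0)                        # NA, since 0 lies below the smallest sample value

# Pin the tails so the interpolant behaves like a CDF everywhere
empCDF2 <- approxfun(x, y, yleft = 0, yright = 1)
empCDF2(c(0, 100))               # 0 1
```

Whether 0 and 1 are the right tail values depends on the application; help(approxfun) also documents `rule = 2`, which extends the nearest endpoint value instead.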

Tony Plate

2003-Sep-23 17:44 UTC


AW: [R] Rank and extract data from a series

Using Thomas Unternährer's handy example, one could also do:

 > X <- c(1, 4.5, 2.3, 1, 7.3)
 > mean(order(X, decreasing=TRUE)[1:2])
[1] 3.5
 >

I think this will give the same results as Thomas Unternährer's suggested
code whenever the top values are distinct (match() returns only the first
position of a repeated value), but it is perhaps more concise and direct
(provided that you don't actually need the values of the top items).

(of course you have to change the 1:2 to 1:10 for your needs).
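
A minimal sketch of the order() approach with the number of items as a parameter. The function name `mean_top_rows` is hypothetical, chosen here for illustration. The second half shows a case where the order() and match() approaches disagree, namely when a value is repeated among the top items.

```r
# Hypothetical helper: mean position of the n largest values.
mean_top_rows <- function(x, n = 10) {
  idx <- head(order(x, decreasing = TRUE), n)  # head() copes with length(x) < n
  mean(idx)
}

X <- c(1, 4.5, 2.3, 1, 7.3)
mean_top_rows(X, n = 2)                     # 3.5

# A case where the two approaches differ: the largest value occurs twice.
Y <- c(1, 7.3, 2.3, 7.3, 1)
order(Y, decreasing = TRUE)[1:2]            # 2 4: two distinct rows
match(sort(Y, decreasing = TRUE)[1:2], Y)   # 2 2: match() repeats the first hit
```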

Note that this question gets tricky if there are ties such that there is no 
unique set of row numbers that identify N "top" items.

For example, consider the following data:

 > X <- c(1,3,2,3,4)

Taking "top two", should the answer be 3.5 (avg of row numbers 2 and 5),
4.5 (avg of row numbers 4 and 5), or 3.666667 (avg of row numbers 2, 4 and 5)?

 > mean(order(X, decreasing=TRUE)[1:2])
[1] 3.5
 > order(X, decreasing=TRUE)[1:2]
[1] 5 2
 > # Andy Liaw's suggestion:
 > mean(which(X %in% sort(X, decreasing=TRUE)[1:2]))
[1] 3.666667
 > which(X %in% sort(X, decreasing=TRUE)[1:2])
[1] 2 4 5
 > # Thomas Unternährer's suggestion:
 > mean(match(sort(X, decreasing=TRUE)[1:2], X))
[1] 3.5
 > match(sort(X, decreasing=TRUE)[1:2], X)
[1] 5 2
 >
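
The tie-handling choices in the transcript above could be made explicit with a small wrapper. This is only a sketch: `top_rows` and its `ties` argument are hypothetical names, and it assumes the data contain no NA values.

```r
# Hypothetical helper: row numbers of the top n values under two tie policies.
top_rows <- function(x, n, ties = c("first", "all")) {
  ties <- match.arg(ties)
  ord <- order(x, decreasing = TRUE)
  k <- min(n, length(x))
  if (ties == "first") {
    return(ord[seq_len(k)])        # exactly k rows; the earliest row wins a tie
  }
  cutoff <- x[ord[k]]              # value of the k-th largest item
  which(x >= cutoff)               # every row tied with or above the cutoff
}

X <- c(1, 3, 2, 3, 4)
top_rows(X, 2, ties = "first")        # 5 2
mean(top_rows(X, 2, ties = "all"))    # 3.666667 (rows 2, 4 and 5)
```

Which policy is appropriate depends on what the mean row number is meant to summarise; the "all" policy can return more than n rows.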

hope this helps,

Tony Plate

Tony Plate   tplate at acm.org
