thr3ads.net - R help - [R] any other fast method for median calculation [Apr 2009]

Zheng, Xin (NIH) [C]

2009-Apr-14 04:29 UTC

[R] any other fast method for median calculation

Hi there,

I have a data frame with more than 200k columns. How can I compute the median
of each column quickly? mapply is the fastest function I know of for this, but
it is still not fast enough.

It seems that the function "median" in R computes the median using "sort"
and "mean". I am wondering whether there is another function with a better
algorithm.

Any hint?

Thanks,

Xin Zheng

S Ellison

2009-Apr-14 09:17 UTC

Sorting with an appropriate algorithm is O(n log n), so it's very hard to
get the 'exact' median any faster. However, if you can cope with a less
precise median, you could use a binary search between max(x) and min(x)
with a loose tolerance or comparatively few iterations. In native R, though,
that isn't going to be fast; interpreter overhead will likely more than
wipe out any reduction in the number of comparisons.
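
A minimal sketch of the binary search idea above, for illustration only. The function name approx_median and the arguments tol and max_iter are made up for this example, and it assumes x is numeric with no NA values. For an even number of values it converges to the lower of the two middle values rather than their average, so it is an approximation, not a replacement for median().

```r
# Hypothetical helper: bisect the interval [min(x), max(x)] until it is
# narrower than 'tol' or 'max_iter' passes over the data have been made.
approx_median <- function(x, tol = 1e-6, max_iter = 50L) {
  lo <- min(x)
  hi <- max(x)
  half <- length(x) / 2
  for (i in seq_len(max_iter)) {
    if (hi - lo <= tol) break
    mid <- (lo + hi) / 2
    # If at least half of the values are <= mid, the median is not above mid
    if (sum(x <= mid) >= half) hi <- mid else lo <- mid
  }
  (lo + hi) / 2
}

set.seed(1)
x <- rnorm(1001)
approx_median(x)   # compare with median(x)
```

Each iteration is a full pass over x, so this mainly shows the trade-off: fewer iterations give a cheaper but coarser answer.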

In any case, it looks like you are not constrained by the median
algorithm, but by the number of calls. You might do a lot better with
apply, though:

> apply(df,2,median)

On my system 200k columns were processed in negligible time by apply
and I'm still waiting for mapply.
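
A small, self-contained way to try the apply call above. The data frame here is invented for the example (5 rows, 1000 columns of random numbers), and vapply is included only as another base R option for comparison, not something suggested in the thread. Timings will differ by machine and R version.

```r
set.seed(1)
# Hypothetical test data: 5 rows, 1000 numeric columns
df <- as.data.frame(matrix(rnorm(5 * 1000), nrow = 5))

# Column medians via apply (converts the data frame to a matrix first)
med_apply <- apply(df, 2, median)

# Same result by iterating over the columns of the data frame directly
med_vapply <- vapply(df, median, numeric(1))

# Rough timing comparison
system.time(apply(df, 2, median))
system.time(vapply(df, median, numeric(1)))
```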

S


roger koenker

2009-Apr-14 12:28 UTC

There is a slightly faster algorithm in my quantreg package, see kuantile(),
but this is only significant when sample sizes are very large. In your case
you really need a wrapper that keeps the loop over columns within some
lower level language.
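
One possible base R sketch of keeping the per-column work inside compiled code, without writing C. This is an illustration, not the wrapper described above: the name col_medians is made up, and it assumes a numeric matrix with no NA values. It sorts every column with a single call to order() and then reads off the middle row(s).

```r
# Hypothetical helper: column medians of a numeric matrix with no
# R-level loop over the columns.
col_medians <- function(m) {
  n <- nrow(m)
  # Order all cells by column index, then by value, in one vectorised call;
  # reshaping gives a matrix whose columns are each sorted.
  s <- matrix(m[order(col(m), m)], nrow = n)
  lo <- floor((n + 1) / 2)
  hi <- ceiling((n + 1) / 2)
  # For odd n, lo == hi; for even n, this averages the two middle rows
  (s[lo, ] + s[hi, ]) / 2
}

set.seed(1)
m <- matrix(rnorm(10 * 2000), nrow = 10)   # 10 rows, 2000 columns
res <- col_medians(m)
```

Whether this beats apply(m, 2, median) depends on the shape of the data; it sorts the whole matrix at once, so it trades memory for fewer function calls.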

url:    www.econ.uiuc.edu/~roger            Roger Koenker
email    rkoenker at uiuc.edu            Department of Economics
vox:     217-333-4558                University of Illinois
fax:       217-244-6678                Champaign, IL 61820


