thr3ads.net - R help - [R] quantreg speed [Nov 2014]


Yunqi Zhang

2014-Nov-15 20:12 UTC

[R] quantreg speed

Hi all,

I'm using rq() from the quantreg package to perform quantile regression on
a large data set. Each record has 4 fields and there are about 18 million
records in total. Has anyone tried rq() on a data set of this size, and how
long should I expect it to take? Or is it simply too large, so that I
should subsample the data? I would like to have an idea before I start the
run and wait forever.

In addition, I would appreciate it if anyone could give me a rough idea of
how long rq() takes to run for a given data set size.

Yunqi

William Dunlap

2014-Nov-16 01:19 UTC

[R] quantreg speed

You can time it yourself on increasingly large subsets of your data.  E.g.,
> library(quantreg)
> dat <- data.frame(x1=rnorm(1e6), x2=rnorm(1e6),
+                   x3=sample(c("A","B","C"), size=1e6, replace=TRUE))
> dat$y <- with(dat, x1 + 2*(x3=="B")*x2 + rnorm(1e6))
> # time rq() on the first n rows for n = 4^3, ..., 4^10
> t <- vapply(n <- 4^(3:10), FUN=function(n) {
+     d <- dat[seq_len(n), ]
+     print(system.time(rq(data=d, y ~ x1 + x2*x3, tau=0.9)))
+ }, FUN.VALUE=numeric(5))
   user  system elapsed
      0       0       0
   user  system elapsed
      0       0       0
   user  system elapsed
   0.02    0.00    0.01
   user  system elapsed
   0.01    0.00    0.02
   user  system elapsed
   0.10    0.00    0.11
   user  system elapsed
   1.09    0.00    1.10
   user  system elapsed
  13.05    0.02   13.07
   user  system elapsed
 273.30    0.11  273.74
> t
           [,1] [,2] [,3] [,4] [,5] [,6]  [,7]   [,8]
user.self     0    0 0.02 0.01 0.10 1.09 13.05 273.30
sys.self      0    0 0.00 0.00 0.00 0.00  0.02   0.11
elapsed       0    0 0.01 0.02 0.11 1.10 13.07 273.74
user.child   NA   NA   NA   NA   NA   NA    NA     NA
sys.child    NA   NA   NA   NA   NA   NA    NA     NA

Do some regressions on t["elapsed",] as a function of n and predict up to
n=10^7.  E.g.,
> summary(lm(t["elapsed",] ~ poly(n,4)))
Call:
lm(formula = t["elapsed", ] ~ poly(n, 4))

Residuals:
         1          2          3          4          5          6          7          8
-2.375e-03 -2.970e-03  4.484e-03  1.674e-03 -8.723e-04  6.096e-05 -9.199e-07  2.715e-09

Coefficients:
             Estimate Std. Error  t value Pr(>|t|)
(Intercept) 3.601e+01  1.261e-03 28564.33 9.46e-14 ***
poly(n, 4)1 2.493e+02  3.565e-03 69917.04 6.45e-15 ***
poly(n, 4)2 5.093e+01  3.565e-03 14284.61 7.57e-13 ***
poly(n, 4)3 1.158e+00  3.565e-03   324.83 6.43e-08 ***
poly(n, 4)4 4.392e-02  3.565e-03    12.32  0.00115 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.003565 on 3 degrees of freedom
Multiple R-squared:      1,     Adjusted R-squared:      1
F-statistic: 1.273e+09 on 4 and 3 DF,  p-value: 3.575e-14
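
To turn the timings into an actual prediction, one alternative to the
degree-4 polynomial is a power-law fit on the log-log scale. This is a
different technique from the lm() call above, sketched here only as an
illustration. The elapsed times are copied from the output above; the
0.05 second cutoff for dropping unmeasurably short runs and the variable
names are assumptions, and any extrapolation this far beyond the measured
range is a rough guide at best.

```r
# Elapsed seconds reported above for n = 4^(3:10).
timings <- data.frame(
  n       = 4^(3:10),
  elapsed = c(0, 0, 0.01, 0.02, 0.11, 1.10, 13.07, 273.74)
)

# Drop runs too short to measure reliably (the cutoff is an assumption);
# this also avoids taking log(0).
timings <- timings[timings$elapsed > 0.05, ]

# Fit log(elapsed) = a + b*log(n), i.e. elapsed is roughly exp(a) * n^b.
fit <- lm(log(elapsed) ~ log(n), data = timings)
growth_exponent <- unname(coef(fit)[2])

# Extrapolate to the sizes mentioned in the thread, converted to hours.
new_n <- c(1e7, 1.8e7)
pred_hours <- exp(predict(fit, newdata = data.frame(n = new_n))) / 3600

print(growth_exponent)
print(data.frame(n = new_n, predicted_hours = pred_hours))
```

The fitted exponent b says how the run time scales: doubling n multiplies
the predicted time by about 2^b.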


It does not look good for n=10^7.
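
One thing that may be worth timing before giving up on the full data is
the method argument of rq(). The timings above use the default method,
"br"; the quantreg documentation suggests the Frisch-Newton interior point
method, method="fn", for larger problems and "pfn" for very large ones.
The sketch below compares "br" and "fn" on one simulated subset. It
assumes the quantreg package is installed, the subset size of 20000 is an
arbitrary choice, and no speedup is claimed here: the point is only to
show how to measure it on your own data.

```r
library(quantreg)

set.seed(1)
n <- 20000  # hypothetical subset size, small enough to finish quickly

# Simulated data with the same structure as in the timing example above.
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n),
                  x3 = sample(c("A", "B", "C"), size = n, replace = TRUE))
dat$y <- with(dat, x1 + 2 * (x3 == "B") * x2 + rnorm(n))

# Time the default method and the Frisch-Newton method on the same subset.
time_br <- system.time(
  fit_br <- rq(y ~ x1 + x2 * x3, tau = 0.9, data = dat, method = "br"))
time_fn <- system.time(
  fit_fn <- rq(y ~ x1 + x2 * x3, tau = 0.9, data = dat, method = "fn"))

# Side-by-side timings, and a check that both methods give similar fits.
print(rbind(br = time_br[1:3], fn = time_fn[1:3]))
max_coef_diff <- max(abs(coef(fit_br) - coef(fit_fn)))
print(max_coef_diff)
```

Repeating this over increasing n, as in the timing loop above, would show
whether the growth rate differs between the two methods.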



Bill Dunlap
TIBCO Software
wdunlap tibco.com