Hi all, I recently noticed that `qnbinom()` can take a long time to calculate a result if the `size` argument is very small. For example qnbinom(0.5, mu = 3, size = 1e-10) takes ~30 seconds on my computer. I used gdb to step through the qnbinom.c implementation and noticed that in line 106 (https://github.com/wch/r-source/blob/f8d4d7d48051860cc695b99db9be9cf439aee743/src/nmath/qnbinom.c#L106) `y` becomes a very large negative number. Later in the function `y` is (as far as I can see) only used as input for `pnbinom()` which is why I would assume that it should be a non-negative integer. I was wondering if this behavior could be considered a bug and should be reported on the bugzilla? I read the instructions at https://www.r-project.org/bugs.html and wasn't quite sure, so I decided to ask here first :) Best, Constantin PS: I tested the code with R 4.0.0 on macOS and the latest unstable version using docker (https://github.com/wch/r-debug). The session info is> sessionInfo()R Under development (unstable) (2020-08-06 r78973) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 20.04 LTS Matrix products: default BLAS: /usr/local/RD/lib/R/lib/libRblas.so LAPACK: /usr/local/RD/lib/R/lib/libRlapack.so locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] compiler_4.1.0
?? I can reproduce this on
R Under development (unstable) (2020-07-24 r78910)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Pop!_OS 18.04 LTS
? In my opinion this is worth reporting, but discussing it here first
was a good idea.? Many more people read this list than watch the bug
tracker, so it will get more attention here; once the excitement has
died down here (which might be almost immediately!), if no-one has
already volunteered to post it to the bug tracker, request an account
(as specified at https://www.r-project.org/bugs.html )
? Thanks!
?? Ben Bolker
For what it's worth it doesn't seem to be a threshold effect:
approximately
log10(time[seconds]) ~ -8 - log10(-size)
over the range from 1e-6 to 1e-9
ff <- function(x) {
?? system.time(qnbinom(0.5, mu=3, size=10^x))[["elapsed"]]
}
svec <- seq(-5,-9,by=-0.2)
res <- lapply(svec, function(x) {
??? cat(x,"\n")
??? replicate(10,ff(x))
??? })
dd <- data.frame(size=rep(svec,each=10),
???????????????? time=unlist(res))
boxplot(log10(time)~size, dd)
summary(lm(log10(time)~size, data=dd, subset=time>0))
On 8/7/20 2:01 PM, Constantin Ahlmann-Eltze via R-devel wrote:
> Hi all,
>
> I recently noticed that `qnbinom()` can take a long time to calculate
> a result if the `size` argument is very small.
> For example
> qnbinom(0.5, mu = 3, size = 1e-10)
> takes ~30 seconds on my computer.
>
> I used gdb to step through the qnbinom.c implementation and noticed
> that in line 106
>
(https://github.com/wch/r-source/blob/f8d4d7d48051860cc695b99db9be9cf439aee743/src/nmath/qnbinom.c#L106)
> `y` becomes a very large negative number. Later in the function `y` is
> (as far as I can see) only used as input for `pnbinom()` which is why
> I would assume that it should be a non-negative integer.
>
> I was wondering if this behavior could be considered a bug and should
> be reported on the bugzilla? I read the instructions at
> https://www.r-project.org/bugs.html and wasn't quite sure, so I
> decided to ask here first :)
>
> Best,
> Constantin
>
>
>
>
> PS: I tested the code with R 4.0.0 on macOS and the latest unstable
> version using docker (https://github.com/wch/r-debug). The session
> info is
>> sessionInfo()
> R Under development (unstable) (2020-08-06 r78973)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 20.04 LTS
>
> Matrix products: default
> BLAS: /usr/local/RD/lib/R/lib/libRblas.so
> LAPACK: /usr/local/RD/lib/R/lib/libRlapack.so
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> loaded via a namespace (and not attached):
> [1] compiler_4.1.0
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
Thanks Ben for verifying the issue. It is always reassuring to hear
when others can reproduce the problem.
I wrote a small patch that fixes the issue
(https://github.com/r-devel/r-svn/pull/11):
diff --git a/src/nmath/qnbinom.c b/src/nmath/qnbinom.c
index b313ce56b2..d2e8d98759 100644
--- a/src/nmath/qnbinom.c
+++ b/src/nmath/qnbinom.c
@@ -104,6 +104,7 @@ double qnbinom(double p, double size, double prob,
int lower_tail, int log_p)
/* y := approx.value (Cornish-Fisher expansion) : */
z = qnorm(p, 0., 1., /*lower_tail*/TRUE, /*log_p*/FALSE);
y = R_forceint(mu + sigma * (z + gamma * (z*z - 1) / 6));
+ y = fmax2(0.0, y);
z = pnbinom(y, size, prob, /*lower_tail*/TRUE, /*log_p*/FALSE);
I used the https://github.com/r-devel/r-svn repo and its continuous
integration tools to check that it doesn't break any existing tests:
https://github.com/r-devel/r-svn/actions/runs/201327042
I have also requested a Bugzilla-account, but haven't heard anything back
yet.
Best,
Constantin
Am Fr., 7. Aug. 2020 um 21:41 Uhr schrieb Ben Bolker <bbolker at
gmail.com>:>
> I can reproduce this on
>
> R Under development (unstable) (2020-07-24 r78910)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Pop!_OS 18.04 LTS
>
> In my opinion this is worth reporting, but discussing it here first
> was a good idea. Many more people read this list than watch the bug
> tracker, so it will get more attention here; once the excitement has
> died down here (which might be almost immediately!), if no-one has
> already volunteered to post it to the bug tracker, request an account
> (as specified at https://www.r-project.org/bugs.html )
>
> Thanks!
>
> Ben Bolker
>
>
> For what it's worth it doesn't seem to be a threshold effect:
approximately
>
> log10(time[seconds]) ~ -8 - log10(-size)
>
> over the range from 1e-6 to 1e-9
>
>
> ff <- function(x) {
> system.time(qnbinom(0.5, mu=3, size=10^x))[["elapsed"]]
> }
> svec <- seq(-5,-9,by=-0.2)
> res <- lapply(svec, function(x) {
> cat(x,"\n")
> replicate(10,ff(x))
> })
>
> dd <- data.frame(size=rep(svec,each=10),
> time=unlist(res))
> boxplot(log10(time)~size, dd)
> summary(lm(log10(time)~size, data=dd, subset=time>0))
>
>
>
>
> On 8/7/20 2:01 PM, Constantin Ahlmann-Eltze via R-devel wrote:
>
> > Hi all,
> >
> > I recently noticed that `qnbinom()` can take a long time to calculate
> > a result if the `size` argument is very small.
> > For example
> > qnbinom(0.5, mu = 3, size = 1e-10)
> > takes ~30 seconds on my computer.
> >
> > I used gdb to step through the qnbinom.c implementation and noticed
> > that in line 106
> >
(https://github.com/wch/r-source/blob/f8d4d7d48051860cc695b99db9be9cf439aee743/src/nmath/qnbinom.c#L106)
> > `y` becomes a very large negative number. Later in the function `y` is
> > (as far as I can see) only used as input for `pnbinom()` which is why
> > I would assume that it should be a non-negative integer.
> >
> > I was wondering if this behavior could be considered a bug and should
> > be reported on the bugzilla? I read the instructions at
> > https://www.r-project.org/bugs.html and wasn't quite sure, so I
> > decided to ask here first :)
> >
> > Best,
> > Constantin
> >
> >
> >
> >
> > PS: I tested the code with R 4.0.0 on macOS and the latest unstable
> > version using docker (https://github.com/wch/r-debug). The session
> > info is
> >> sessionInfo()
> > R Under development (unstable) (2020-08-06 r78973)
> > Platform: x86_64-pc-linux-gnu (64-bit)
> > Running under: Ubuntu 20.04 LTS
> >
> > Matrix products: default
> > BLAS: /usr/local/RD/lib/R/lib/libRblas.so
> > LAPACK: /usr/local/RD/lib/R/lib/libRlapack.so
> >
> > locale:
> > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> > [9] LC_ADDRESS=C LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] stats graphics grDevices utils datasets methods base
> >
> > loaded via a namespace (and not attached):
> > [1] compiler_4.1.0
> >
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel