thr3ads.net - R help - [R] Speed up studentized confidence intervals ? [Dec 2021]

If this information is useful, please help other people find it:
Share via:

Fox, John

2021-Dec-30 04:25 UTC

[R] Speed up studentized confidence intervals ?

Dear varin sacha,

You didn't correctly adapt the code to the median. The outer call to mean()
in the last line shouldn't be replaced with median() -- it computes the
proportion of intervals that include the population median.

As well, you can't rely on the asymptotics of the bootstrap for a nonlinear
statistic like the median with an n as small as 5, as your example, properly
implemented (and with the code slightly cleaned up), illustrates:
> library(boot)
> set.seed(123)
> s <- rgamma(n=100000, shape=2, rate=5)
> (m <- median(s))
[1] 0.3364465> N <- 1000
> n <- 5
> set.seed(321)
> out <- replicate(N, {+   dat <- data.frame(sample(s, size=n))
+   med <- function(d, i) {
+     median(d[i, ])
+   }
+   boot.out <- boot(data = dat, statistic = med, R = 10000)
+   boot.ci(boot.out, type = "bca")$bca[, 4:5]
+ })> #coverage probability
> mean(out[1, ] < m & m < out[2, ])[1] 0.758


You do get the expected coverage, however, for a larger sample, here with n =
100:
> N <- 1000
> n <- 100
> set.seed(321)
> out <- replicate(N, {+   dat <- data.frame(sample(s, size=n))
+   med <- function(d, i) {
+     median(d[i, ])
+   }
+   boot.out <- boot(data = dat, statistic = med, R = 10000)
+   boot.ci(boot.out, type = "bca")$bca[, 4:5]
+ })> #coverage probability
> mean(out[1, ] < m & m < out[2, ])[1] 0.952

I hope this helps,
 John

-- 
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
Web: http://socserv.mcmaster.ca/jfox/
 
 


?On 2021-12-29, 2:09 PM, "R-help on behalf of varin sacha via R-help"
<r-help-bounces at r-project.org on behalf of r-help at r-project.org>
wrote:

    Dear David,
    Dear Rui,

    Many thanks for your response. It perfectly works for the mean. Now I have a
problem with my R code for the median. Because I always get 1 (100%) coverage
probability that is more than very strange. Indeed, considering that an interval
whose lower limit is the smallest value in the sample and whose upper limit is
the largest value has 1/32 + 1/32 = 1/16 probability of non-coverage, implying
that the confidence of such an interval is 15/16 rather than 1 (100%), I suspect
that the confidence interval I use for the median is not correctly defined for
n=5 observations, and likely contains all observations in the sample ? What is
wrong with my R code ?

    ########################################
    library(boot)

    s=rgamma(n=100000,shape=2,rate=5)
    median(s)

    N <- 100
    out <- replicate(N, {
    a<- sample(s,size=5)
    median(a) 

    dat<-data.frame(a)
    med<-function(d,i) {
    temp<-d[i,]
    median(temp)
    }

      boot.out <- boot(data = dat, statistic = med, R = 10000)
      boot.ci(boot.out, type = "bca")$bca[, 4:5]
    })

    #coverage probability
    median(out[1, ] < median(s) & median(s) < out[2, ])
    ########################################




    Le jeudi 23 d?cembre 2021, 14:10:36 UTC+1, Rui Barradas <ruipbarradas at
sapo.pt> a ?crit :





    Hello,

    The code is running very slowly because you are recreating the function 
    in the replicate() loop and because you are creating a data.frame also 
    in the loop.

    And because in the bootstrap statistic function med() you are computing 
    the variance of yet another loop. This is probably statistically wrong 
    but like David says, without a problem description it's hard to say.

    Also, why compute variances if they are never used?

    Here is complete code executing in much less than 2:00 hours. Note that 
    it passes the vector a directly to med(), not a df with just one column.


    library(boot)

    set.seed(2021)
    s <- sample(178:798, 100000, replace = TRUE)
    mean(s)

    med <- function(d, i) {
      temp <- d[i]
      f <- mean(temp)
      g <- var(temp)
      c(Mean = f, Var = g)
    }

    N <- 1000
    out <- replicate(N, {
      a <- sample(s, size = 5)
      boot.out <- boot(data = a, statistic = med, R = 10000)
      boot.ci(boot.out, type = "stud")$stud[, 4:5]
    })
    mean(out[1, ] < mean(s) & mean(s) < out[2, ])
    #[1] 0.952



    Hope this helps,

    Rui Barradas

    ?s 11:45 de 19/12/21, varin sacha via R-help escreveu:
    > Dear R-experts,
    > 
    > Here below my R code working but really really slowly ! I need 2 hours
with my computer to finally get an answer ! Is there a way to improve my R code
to speed it up ? At least to win 1 hour ;=)
    > 
    > Many thanks
    > 
    > ########################################################
    > library(boot)
    > 
    > s<- sample(178:798, 100000, replace=TRUE)
    > mean(s)
    > 
    > N <- 1000
    > out <- replicate(N, {
    > a<- sample(s,size=5)
    > mean(a)
    > dat<-data.frame(a)
    > 
    > med<-function(d,i) {
    > temp<-d[i,]
    > f<-mean(temp)
    > g<-var(replicate(50,mean(sample(temp,replace=T))))
    > return(c(f,g))
    > 
    > }
    > 
    >    boot.out <- boot(data = dat, statistic = med, R = 10000)
    >    boot.ci(boot.out, type = "stud")$stud[, 4:5]
    > })
    > mean(out[1,] < mean(s) & mean(s) < out[2,])
    > ########################################################
    > 
    > ______________________________________________
    > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
    > https://stat.ethz.ch/mailman/listinfo/r-help
    > PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
    > and provide commented, minimal, self-contained, reproducible code.
    > 

    ______________________________________________
    R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
    https://stat.ethz.ch/mailman/listinfo/r-help
    PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
    and provide commented, minimal, self-contained, reproducible code.

varin sacha

2022-Jan-03 19:49 UTC

head link

[R] Speed up studentized confidence intervals ?

Dear John, Dear Rui,

I really thank you a lot for your R code.

Best,
SV




Le jeudi 30 d?cembre 2021, 05:25:11 UTC+1, Fox, John <jfox at mcmaster.ca>
a ?crit :





Dear varin sacha,

You didn't correctly adapt the code to the median. The outer call to mean()
in the last line shouldn't be replaced with median() -- it computes the
proportion of intervals that include the population median.

As well, you can't rely on the asymptotics of the bootstrap for a nonlinear
statistic like the median with an n as small as 5, as your example, properly
implemented (and with the code slightly cleaned up), illustrates:
> library(boot)
> set.seed(123)
> s <- rgamma(n=100000, shape=2, rate=5)
> (m <- median(s))
[1] 0.3364465> N <- 1000
> n <- 5
> set.seed(321)
> out <- replicate(N, {+? dat <- data.frame(sample(s, size=n))
+? med <- function(d, i) {
+? ? median(d[i, ])
+? }
+? boot.out <- boot(data = dat, statistic = med, R = 10000)
+? boot.ci(boot.out, type = "bca")$bca[, 4:5]
+ })> #coverage probability
> mean(out[1, ] < m & m < out[2, ])[1] 0.758


You do get the expected coverage, however, for a larger sample, here with n =
100:
> N <- 1000
> n <- 100
> set.seed(321)
> out <- replicate(N, {+? dat <- data.frame(sample(s, size=n))
+? med <- function(d, i) {
+? ? median(d[i, ])
+? }
+? boot.out <- boot(data = dat, statistic = med, R = 10000)
+? boot.ci(boot.out, type = "bca")$bca[, 4:5]
+ })> #coverage probability
> mean(out[1, ] < m & m < out[2, ])[1] 0.952

I hope this helps,
John

-- 
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
Web: http://socserv.mcmaster.ca/jfox/




?On 2021-12-29, 2:09 PM, "R-help on behalf of varin sacha via R-help"
<r-help-bounces at r-project.org on behalf of r-help at r-project.org>
wrote:

? ? Dear David,
? ? Dear Rui,

? ? Many thanks for your response. It perfectly works for the mean. Now I have a
problem with my R code for the median. Because I always get 1 (100%) coverage
probability that is more than very strange. Indeed, considering that an interval
whose lower limit is the smallest value in the sample and whose upper limit is
the largest value has 1/32 + 1/32 = 1/16 probability of non-coverage, implying
that the confidence of such an interval is 15/16 rather than 1 (100%), I suspect
that the confidence interval I use for the median is not correctly defined for
n=5 observations, and likely contains all observations in the sample ? What is
wrong with my R code ?

? ? ########################################
? ? library(boot)

? ? s=rgamma(n=100000,shape=2,rate=5)
? ? median(s)

? ? N <- 100
? ? out <- replicate(N, {
? ? a<- sample(s,size=5)
? ? median(a) 

? ? dat<-data.frame(a)
? ? med<-function(d,i) {
? ? temp<-d[i,]
? ? median(temp)
? ? }

? ? ? boot.out <- boot(data = dat, statistic = med, R = 10000)
? ? ? boot.ci(boot.out, type = "bca")$bca[, 4:5]
? ? })

? ? #coverage probability
? ? median(out[1, ] < median(s) & median(s) < out[2, ])
? ? ########################################




? ? Le jeudi 23 d?cembre 2021, 14:10:36 UTC+1, Rui Barradas <ruipbarradas at
sapo.pt> a ?crit :





? ? Hello,

? ? The code is running very slowly because you are recreating the function 
? ? in the replicate() loop and because you are creating a data.frame also 
? ? in the loop.

? ? And because in the bootstrap statistic function med() you are computing 
? ? the variance of yet another loop. This is probably statistically wrong 
? ? but like David says, without a problem description it's hard to say.

? ? Also, why compute variances if they are never used?

? ? Here is complete code executing in much less than 2:00 hours. Note that 
? ? it passes the vector a directly to med(), not a df with just one column.


? ? library(boot)

? ? set.seed(2021)
? ? s <- sample(178:798, 100000, replace = TRUE)
? ? mean(s)

? ? med <- function(d, i) {
? ? ? temp <- d[i]
? ? ? f <- mean(temp)
? ? ? g <- var(temp)
? ? ? c(Mean = f, Var = g)
? ? }

? ? N <- 1000
? ? out <- replicate(N, {
? ? ? a <- sample(s, size = 5)
? ? ? boot.out <- boot(data = a, statistic = med, R = 10000)
? ? ? boot.ci(boot.out, type = "stud")$stud[, 4:5]
? ? })
? ? mean(out[1, ] < mean(s) & mean(s) < out[2, ])
? ? #[1] 0.952



? ? Hope this helps,

? ? Rui Barradas

? ? ?s 11:45 de 19/12/21, varin sacha via R-help escreveu:
? ? > Dear R-experts,
? ? > 
? ? > Here below my R code working but really really slowly ! I need 2 hours
with my computer to finally get an answer ! Is there a way to improve my R code
to speed it up ? At least to win 1 hour ;=)
? ? > 
? ? > Many thanks
? ? > 
? ? > ########################################################
? ? > library(boot)
? ? > 
? ? > s<- sample(178:798, 100000, replace=TRUE)
? ? > mean(s)
? ? > 
? ? > N <- 1000
? ? > out <- replicate(N, {
? ? > a<- sample(s,size=5)
? ? > mean(a)
? ? > dat<-data.frame(a)
? ? > 
? ? > med<-function(d,i) {
? ? > temp<-d[i,]
? ? > f<-mean(temp)
? ? > g<-var(replicate(50,mean(sample(temp,replace=T))))
? ? > return(c(f,g))
? ? > 
? ? > }
? ? > 
? ? >? ? boot.out <- boot(data = dat, statistic = med, R = 10000)
? ? >? ? boot.ci(boot.out, type = "stud")$stud[, 4:5]
? ? > })
? ? > mean(out[1,] < mean(s) & mean(s) < out[2,])
? ? > ########################################################
? ? > 
? ? > ______________________________________________
? ? > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
? ? > https://stat.ethz.ch/mailman/listinfo/r-help
? ? > PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
? ? > and provide commented, minimal, self-contained, reproducible code.
? ? > 

? ? ______________________________________________
? ? R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
? ? https://stat.ethz.ch/mailman/listinfo/r-help
? ? PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
? ? and provide commented, minimal, self-contained, reproducible code.

R help - Dec 2021 - Speed up studentized confidence intervals ?

[R] Speed up studentized confidence intervals ?

[R] Speed up studentized confidence intervals ?