thr3ads.net - R help - [R] How to generate SE for the proportion value using a randomization process in R? [Jan 2021]

If this information is useful, please help other people find it:
Share via:

Rui Barradas

2021-Jan-28 20:21 UTC

[R] How to generate SE for the proportion value using a randomization process in R?

Hello,

I don't know why you would need to see the indices but rewrite the 
function bootprop as

bootprop_ind <- function(data, index){
   d <- data[index, ]
   #sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]],
na.rm = TRUE)
   index
}


and call in the same way. It will now return a matrix of indices with R 
= 1000 rows and 19 columns.

Hope this helps,

Rui Barradas


?s 19:29 de 28/01/21, Marna Wagley escreveu:> Hi Rui,
> I am sorry for asking you several questions.
> 
> In the given example, randomizations (reshuffle) were done 1000 times, 
> and its 1000 proportion values (results) are stored and it can be seen 
> using b$t; but I was wondering how the table was randomized (which rows 
> have been missed/or repeated in each randomizing procedure?).
> 
> Is there any way we can see the randomized table and its associated 
> results? Here in this example, I randomized (or bootstrapped) the table 
> into three times (R=3) so I would like to store these three tables and 
> look at them later to know which rows were repeated/missed. Is there any 
> possibility?
> The example data and the code is given below.
> 
> Thank you for your help.
> 
> ####
> library(boot)
> dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L, 16L,
> 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label =
c("id1",
> "id10", "id11", "id12", "id13",
"id14", "id15", "id16", "id17",
> "id18", "id19", "Id2", "id3",
"id4", "id5", "id6", "id7",
"id8",
> "id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L,
0L, 0L,
> 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 = c(1L,
> 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
> 1L, 1L)), .Names = c("Sample", "Time1",
"Time2"), class = "data.frame",
> row.names = c(NA,
> -19L))
> daT<-data.frame(dat %>%
>  ? mutate(Time1.but.not.in.Time2 = case_when(
>  ? ? ? ? ? ? Time1 %in% "1" & Time2 %in% "0" ?~
"1"),
> Time2.but.not.in.Time1 = case_when(
>  ? ? ? ? ? ? Time1 %in% "0" & Time2 %in% "1" ?~
"1"),
>  ?BothTimes = case_when(
>  ? ? ? ? ? ? Time1 %in% "1" & Time2 %in% "1" ?~
"1")))
> cols.num <-
c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
> "BothTimes")
> daT[cols.num] <- sapply(daT[cols.num],as.numeric)
> summary(daT)
> 
> bootprop <- function(data, index){
>  ? ?d <- data[index, ]
>  ? ?sum(d[["BothTimes"]], na.rm =
TRUE)/sum(d[["Time1"]], na.rm = TRUE)
> }
> 
> R <- 3
> set.seed(2020)
> b <- boot(daT, bootprop, R)
> b
> b$t0? ? ?# original
> b$t
> sd(b$t)? # bootstrapped estimate of the SE of the sample prop.
> hist(b$t, freq = FALSE)
> 
> str(b)
> b$data
> b$seed
> b$sim
> b$strata
> ################
> 
> 
> On Sat, Jan 23, 2021 at 12:36 AM Marna Wagley <marna.wagley at gmail.com
> <mailto:marna.wagley at gmail.com>> wrote:
> 
>     Yes Rui, I can see we don't need to divide by square root of sample
>     size. The example is great to understand it.
>     Thank you.
>     Marna
> 
> 
>     On Sat, Jan 23, 2021 at 12:28 AM Rui Barradas <ruipbarradas at
sapo.pt
>     <mailto:ruipbarradas at sapo.pt>> wrote:
> 
>         Hello,
> 
>         Inline.
> 
>         ?s 07:47 de 23/01/21, Marna Wagley escreveu:
>          > Dear Rui,
>          > I was wondering whether we have to square root of SD to find
>         SE, right?
> 
>         No, we don't. var already divides by n, don't divide again.
>         This is the code, that can be seen by running the function name
>         at a
>         command line.
> 
> 
>         sd
>         #function (x, na.rm = FALSE)
>         #sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x),
>         #? ? na.rm = na.rm))
>         #<bytecode: 0x55f3ce900848>
>         #<environment: namespace:stats>
> 
> 
> 
>          >
>          > bootprop <- function(data, index){
>          >? ? ?d <- data[index, ]
>          >? ? ?sum(d[["BothTimes"]], na.rm =
TRUE)/sum(d[["Time1"]],
>         na.rm = TRUE)
>          > }
>          >
>          > R <- 1e3
>          > set.seed(2020)
>          > b <- boot(daT, bootprop, R)
>          > b
>          > b$t0? ? ?# original
>          > sd(b$t)? # bootstrapped estimate of the SE of the sample
prop.
>          > sd(b$t)/sqrt(1000)
>          > pandit*(1-pandit)
>          >
>          > hist(b$t, freq = FALSE)
> 
> 
>         Try plotting the normal densities for both cases, the red line is
>         clearly wrong.
> 
> 
>         f <- function(x, xbar, s){
>          ? ?dnorm(x, mean = xbar, sd = s)
>         }
> 
>         hist(b$t, freq = FALSE)
>         curve(f(x, xbar = b$t0, s = sd(b$t)), from = 0, to = 1, col >   
"blue",
>         add = TRUE)
>         curve(f(x, xbar = b$t0, s = sd(b$t)/sqrt(R)), from = 0, to = 1,
>         col >         "red", add = TRUE)
> 
> 
>         Hope this helps,
> 
>         Rui Barradas
> 
>          >
>          >
>          >
>          >
>          > On Fri, Jan 22, 2021 at 3:07 PM Rui Barradas
>         <ruipbarradas at sapo.pt <mailto:ruipbarradas at sapo.pt>
>          > <mailto:ruipbarradas at sapo.pt <mailto:ruipbarradas at
sapo.pt>>>
>         wrote:
>          >
>          >? ? ?Hello,
>          >
>          >? ? ?Something like this, using base package boot?
>          >
>          >
>          >? ? ?library(boot)
>          >
>          >? ? ?bootprop <- function(data, index){
>          >? ? ? ? ?d <- data[index, ]
>          >? ? ? ? ?sum(d[["BothTimes"]], na.rm =
TRUE)/sum(d[["Time1"]],
>         na.rm = TRUE)
>          >? ? ?}
>          >
>          >? ? ?R <- 1e3
>          >? ? ?set.seed(2020)
>          >? ? ?b <- boot(daT, bootprop, R)
>          >? ? ?b
>          >? ? ?b$t0? ? ?# original
>          >? ? ?sd(b$t)? # bootstrapped estimate of the SE of the sample
>         prop.
>          >? ? ?hist(b$t, freq = FALSE)
>          >
>          >
>          >? ? ?Hope this helps,
>          >
>          >? ? ?Rui Barradas
>          >
>          >? ? ??s 21:57 de 22/01/21, Marna Wagley escreveu:
>          >? ? ? > Hi All,
>          >? ? ? > I was trying to estimate standard error (SE) for
the
>         proportion
>          >? ? ?value using
>          >? ? ? > some kind of randomization process (bootstrapping
or
>         jackknifing)
>          >? ? ?in R, but
>          >? ? ? > I could not figure it out.
>          >? ? ? >
>          >? ? ? > Is there any way to generate SE for the proportion?
>          >? ? ? >
>          >? ? ? > The example of the data and the code I am using is
>         attached for your
>          >? ? ? > reference. I would like to generate the value of
>         proportion with
>          >? ? ?a SE using
>          >? ? ? > a 1000 times randomization.
>          >? ? ? >
>          >? ? ? > dat<-structure(list(Sample = structure(c(1L,
12L, 13L,
>         14L, 15L, 16L,
>          >? ? ? > 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
>         11L), .Label
>          >? ? ?= c("id1",
>          >? ? ? > "id10", "id11",
"id12", "id13", "id14", "id15",
>         "id16", "id17",
>          >? ? ? > "id18", "id19",
"Id2", "id3", "id4", "id5",
"id6",
>         "id7", "id8",
>          >? ? ? > "id9"), class = "factor"),
Time1 = c(0L, 1L, 1L, 1L,
>         0L, 0L,
>          >? ? ? > 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L,
0L),
>         Time2 = c(1L,
>          >? ? ? > 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L,
>         0L, 1L, 0L,
>          >? ? ? > 1L, 1L)), .Names = c("Sample",
"Time1", "Time2"), class >          >? ?
?"data.frame",
>          >? ? ? > row.names = c(NA,
>          >? ? ? > -19L))
>          >? ? ? > daT<-data.frame(dat %>%
>          >? ? ? >? ? mutate(Time1.but.not.in.Time2 = case_when(
>          >? ? ? >? ? ? ? ? ? ? Time1 %in% "1" & Time2
%in% "0"? ~ "1"),
>          >? ? ? > Time2.but.not.in.Time1 = case_when(
>          >? ? ? >? ? ? ? ? ? ? Time1 %in% "0" & Time2
%in% "1"? ~ "1"),
>          >? ? ? >? ?BothTimes = case_when(
>          >? ? ? >? ? ? ? ? ? ? Time1 %in% "1" & Time2
%in% "1"? ~ "1")))
>          >? ? ? >? ?daT
>          >? ? ? >? ?summary(daT)
>          >? ? ? >
>          >? ? ? > cols.num <-
>        
c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
>          >? ? ? > "BothTimes")
>          >? ? ? > daT[cols.num] <-
sapply(daT[cols.num],as.numeric)
>          >? ? ? > summary(daT)
>          >? ? ? > ProportionValue<-sum(daT$BothTimes,
>         na.rm=T)/sum(daT$Time1, na.rm=T)
>          >? ? ? > ProportionValue
>          >? ? ? > standard error??
>          >? ? ? >
>          >? ? ? >? ? ? ?[[alternative HTML version deleted]]
>          >? ? ? >
>          >? ? ? > ______________________________________________
>          >? ? ? > R-help at r-project.org <mailto:R-help at
r-project.org>
>         <mailto:R-help at r-project.org <mailto:R-help at
r-project.org>>
>         mailing list
>          >? ? ?-- To UNSUBSCRIBE and more, see
>          >? ? ? > https://stat.ethz.ch/mailman/listinfo/r-help
>          >? ? ? > PLEASE do read the posting guide
>          > http://www.R-project.org/posting-guide.html
>          >? ? ? > and provide commented, minimal, self-contained,
>         reproducible code.
>          >? ? ? >
>          >
>

Marna Wagley

2021-Jan-28 20:29 UTC

head link

[R] How to generate SE for the proportion value using a randomization process in R?

Thank you Rui,
This is great. How about the following?

SimilatedData<-boot.array(b, indices=T)

seems it is giving the rows ID which are used in the calculation, isn't it?




On Thu, Jan 28, 2021 at 12:21 PM Rui Barradas <ruipbarradas at sapo.pt>
wrote:
> Hello,
>
> I don't know why you would need to see the indices but rewrite the
> function bootprop as
>
> bootprop_ind <- function(data, index){
>    d <- data[index, ]
>    #sum(d[["BothTimes"]], na.rm =
TRUE)/sum(d[["Time1"]], na.rm = TRUE)
>    index
> }
>
>
> and call in the same way. It will now return a matrix of indices with R
> = 1000 rows and 19 columns.
>
> Hope this helps,
>
> Rui Barradas
>
>
> ?s 19:29 de 28/01/21, Marna Wagley escreveu:
> > Hi Rui,
> > I am sorry for asking you several questions.
> >
> > In the given example, randomizations (reshuffle) were done 1000 times,
> > and its 1000 proportion values (results) are stored and it can be seen
> > using b$t; but I was wondering how the table was randomized (which
rows
> > have been missed/or repeated in each randomizing procedure?).
> >
> > Is there any way we can see the randomized table and its associated
> > results? Here in this example, I randomized (or bootstrapped) the
table
> > into three times (R=3) so I would like to store these three tables and
> > look at them later to know which rows were repeated/missed. Is there
any
> > possibility?
> > The example data and the code is given below.
> >
> > Thank you for your help.
> >
> > ####
> > library(boot)
> > dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L,
16L,
> > 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label >
c("id1",
> > "id10", "id11", "id12",
"id13", "id14", "id15", "id16",
"id17",
> > "id18", "id19", "Id2", "id3",
"id4", "id5", "id6", "id7",
"id8",
> > "id9"), class = "factor"), Time1 = c(0L, 1L, 1L,
1L, 0L, 0L,
> > 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 = c(1L,
> > 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
> > 1L, 1L)), .Names = c("Sample", "Time1",
"Time2"), class = "data.frame",
> > row.names = c(NA,
> > -19L))
> > daT<-data.frame(dat %>%
> >    mutate(Time1.but.not.in.Time2 = case_when(
> >              Time1 %in% "1" & Time2 %in% "0" 
~ "1"),
> > Time2.but.not.in.Time1 = case_when(
> >              Time1 %in% "0" & Time2 %in% "1" 
~ "1"),
> >   BothTimes = case_when(
> >              Time1 %in% "1" & Time2 %in% "1" 
~ "1")))
> > cols.num <-
c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
> > "BothTimes")
> > daT[cols.num] <- sapply(daT[cols.num],as.numeric)
> > summary(daT)
> >
> > bootprop <- function(data, index){
> >     d <- data[index, ]
> >     sum(d[["BothTimes"]], na.rm =
TRUE)/sum(d[["Time1"]], na.rm = TRUE)
> > }
> >
> > R <- 3
> > set.seed(2020)
> > b <- boot(daT, bootprop, R)
> > b
> > b$t0     # original
> > b$t
> > sd(b$t)  # bootstrapped estimate of the SE of the sample prop.
> > hist(b$t, freq = FALSE)
> >
> > str(b)
> > b$data
> > b$seed
> > b$sim
> > b$strata
> > ################
> >
> >
> > On Sat, Jan 23, 2021 at 12:36 AM Marna Wagley <marna.wagley at
gmail.com
> > <mailto:marna.wagley at gmail.com>> wrote:
> >
> >     Yes Rui, I can see we don't need to divide by square root of
sample
> >     size. The example is great to understand it.
> >     Thank you.
> >     Marna
> >
> >
> >     On Sat, Jan 23, 2021 at 12:28 AM Rui Barradas <ruipbarradas at
sapo.pt
> >     <mailto:ruipbarradas at sapo.pt>> wrote:
> >
> >         Hello,
> >
> >         Inline.
> >
> >         ?s 07:47 de 23/01/21, Marna Wagley escreveu:
> >          > Dear Rui,
> >          > I was wondering whether we have to square root of SD to
find
> >         SE, right?
> >
> >         No, we don't. var already divides by n, don't divide
again.
> >         This is the code, that can be seen by running the function
name
> >         at a
> >         command line.
> >
> >
> >         sd
> >         #function (x, na.rm = FALSE)
> >         #sqrt(var(if (is.vector(x) || is.factor(x)) x else
as.double(x),
> >         #    na.rm = na.rm))
> >         #<bytecode: 0x55f3ce900848>
> >         #<environment: namespace:stats>
> >
> >
> >
> >          >
> >          > bootprop <- function(data, index){
> >          >     d <- data[index, ]
> >          >     sum(d[["BothTimes"]], na.rm =
TRUE)/sum(d[["Time1"]],
> >         na.rm = TRUE)
> >          > }
> >          >
> >          > R <- 1e3
> >          > set.seed(2020)
> >          > b <- boot(daT, bootprop, R)
> >          > b
> >          > b$t0     # original
> >          > sd(b$t)  # bootstrapped estimate of the SE of the sample
prop.
> >          > sd(b$t)/sqrt(1000)
> >          > pandit*(1-pandit)
> >          >
> >          > hist(b$t, freq = FALSE)
> >
> >
> >         Try plotting the normal densities for both cases, the red line
is
> >         clearly wrong.
> >
> >
> >         f <- function(x, xbar, s){
> >             dnorm(x, mean = xbar, sd = s)
> >         }
> >
> >         hist(b$t, freq = FALSE)
> >         curve(f(x, xbar = b$t0, s = sd(b$t)), from = 0, to = 1, col
> >         "blue",
> >         add = TRUE)
> >         curve(f(x, xbar = b$t0, s = sd(b$t)/sqrt(R)), from = 0, to =
1,
> >         col > >         "red", add = TRUE)
> >
> >
> >         Hope this helps,
> >
> >         Rui Barradas
> >
> >          >
> >          >
> >          >
> >          >
> >          > On Fri, Jan 22, 2021 at 3:07 PM Rui Barradas
> >         <ruipbarradas at sapo.pt <mailto:ruipbarradas at
sapo.pt>
> >          > <mailto:ruipbarradas at sapo.pt
<mailto:ruipbarradas at sapo.pt>>>
> >         wrote:
> >          >
> >          >     Hello,
> >          >
> >          >     Something like this, using base package boot?
> >          >
> >          >
> >          >     library(boot)
> >          >
> >          >     bootprop <- function(data, index){
> >          >         d <- data[index, ]
> >          >         sum(d[["BothTimes"]], na.rm =
TRUE)/sum(d[["Time1"]],
> >         na.rm = TRUE)
> >          >     }
> >          >
> >          >     R <- 1e3
> >          >     set.seed(2020)
> >          >     b <- boot(daT, bootprop, R)
> >          >     b
> >          >     b$t0     # original
> >          >     sd(b$t)  # bootstrapped estimate of the SE of the
sample
> >         prop.
> >          >     hist(b$t, freq = FALSE)
> >          >
> >          >
> >          >     Hope this helps,
> >          >
> >          >     Rui Barradas
> >          >
> >          >     ?s 21:57 de 22/01/21, Marna Wagley escreveu:
> >          >      > Hi All,
> >          >      > I was trying to estimate standard error (SE)
for the
> >         proportion
> >          >     value using
> >          >      > some kind of randomization process
(bootstrapping or
> >         jackknifing)
> >          >     in R, but
> >          >      > I could not figure it out.
> >          >      >
> >          >      > Is there any way to generate SE for the
proportion?
> >          >      >
> >          >      > The example of the data and the code I am
using is
> >         attached for your
> >          >      > reference. I would like to generate the value
of
> >         proportion with
> >          >     a SE using
> >          >      > a 1000 times randomization.
> >          >      >
> >          >      > dat<-structure(list(Sample =
structure(c(1L, 12L, 13L,
> >         14L, 15L, 16L,
> >          >      > 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
10L,
> >         11L), .Label
> >          >     = c("id1",
> >          >      > "id10", "id11",
"id12", "id13", "id14", "id15",
> >         "id16", "id17",
> >          >      > "id18", "id19",
"Id2", "id3", "id4", "id5",
"id6",
> >         "id7", "id8",
> >          >      > "id9"), class = "factor"),
Time1 = c(0L, 1L, 1L, 1L,
> >         0L, 0L,
> >          >      > 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L,
1L, 0L),
> >         Time2 = c(1L,
> >          >      > 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L,
0L, 1L,
> >         0L, 1L, 0L,
> >          >      > 1L, 1L)), .Names = c("Sample",
"Time1", "Time2"),
> class > >          >     "data.frame",
> >          >      > row.names = c(NA,
> >          >      > -19L))
> >          >      > daT<-data.frame(dat %>%
> >          >      >    mutate(Time1.but.not.in.Time2 = case_when(
> >          >      >              Time1 %in% "1" &
Time2 %in% "0"  ~ "1"),
> >          >      > Time2.but.not.in.Time1 = case_when(
> >          >      >              Time1 %in% "0" &
Time2 %in% "1"  ~ "1"),
> >          >      >   BothTimes = case_when(
> >          >      >              Time1 %in% "1" &
Time2 %in% "1"  ~ "1")))
> >          >      >   daT
> >          >      >   summary(daT)
> >          >      >
> >          >      > cols.num <-
> >        
c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
> >          >      > "BothTimes")
> >          >      > daT[cols.num] <-
sapply(daT[cols.num],as.numeric)
> >          >      > summary(daT)
> >          >      > ProportionValue<-sum(daT$BothTimes,
> >         na.rm=T)/sum(daT$Time1, na.rm=T)
> >          >      > ProportionValue
> >          >      > standard error??
> >          >      >
> >          >      >       [[alternative HTML version deleted]]
> >          >      >
> >          >      > ______________________________________________
> >          >      > R-help at r-project.org <mailto:R-help at
r-project.org>
> >         <mailto:R-help at r-project.org <mailto:R-help at
r-project.org>>
> >         mailing list
> >          >     -- To UNSUBSCRIBE and more, see
> >          >      > https://stat.ethz.ch/mailman/listinfo/r-help
> >          >      > PLEASE do read the posting guide
> >          > http://www.R-project.org/posting-guide.html
> >          >      > and provide commented, minimal,
self-contained,
> >         reproducible code.
> >          >      >
> >          >
> >
>
	[[alternative HTML version deleted]]

R help - Jan 2021 - How to generate SE for the proportion value using a randomization process in R?

[R] How to generate SE for the proportion value using a randomization process in R?

[R] How to generate SE for the proportion value using a randomization process in R?