Marna Wagley
2021-Jan-28 19:29 UTC
[R] How to generate SE for the proportion value using a randomization process in R?
Hi Rui,
I am sorry for asking you several questions.
In the given example, randomizations (reshuffle) were done 1000 times, and
its 1000 proportion values (results) are stored and it can be seen using
b$t; but I was wondering how the table was randomized (which rows have been
missed/or repeated in each randomizing procedure?).
Is there any way we can see the randomized table and its associated
results? Here in this example, I randomized (or bootstrapped) the table
into three times (R=3) so I would like to store these three tables and look
at them later to know which rows were repeated/missed. Is there any
possibility?
The example data and the code is given below.
Thank you for your help.
####
library(boot)
dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L, 16L,
17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label =
c("id1",
"id10", "id11", "id12", "id13",
"id14", "id15", "id16", "id17",
"id18", "id19", "Id2", "id3",
"id4", "id5", "id6", "id7",
"id8",
"id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 = c(1L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
1L, 1L)), .Names = c("Sample", "Time1", "Time2"),
class = "data.frame",
row.names = c(NA,
-19L))
daT<-data.frame(dat %>%
mutate(Time1.but.not.in.Time2 = case_when(
Time1 %in% "1" & Time2 %in% "0" ~
"1"),
Time2.but.not.in.Time1 = case_when(
Time1 %in% "0" & Time2 %in% "1" ~
"1"),
BothTimes = case_when(
Time1 %in% "1" & Time2 %in% "1" ~
"1")))
cols.num <-
c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
"BothTimes")
daT[cols.num] <- sapply(daT[cols.num],as.numeric)
summary(daT)
bootprop <- function(data, index){
d <- data[index, ]
sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]],
na.rm = TRUE)
}
R <- 3
set.seed(2020)
b <- boot(daT, bootprop, R)
b
b$t0 # original
b$t
sd(b$t) # bootstrapped estimate of the SE of the sample prop.
hist(b$t, freq = FALSE)
str(b)
b$data
b$seed
b$sim
b$strata
################
On Sat, Jan 23, 2021 at 12:36 AM Marna Wagley <marna.wagley at gmail.com>
wrote:
> Yes Rui, I can see we don't need to divide by square root of sample
size.
> The example is great to understand it.
> Thank you.
> Marna
>
>
> On Sat, Jan 23, 2021 at 12:28 AM Rui Barradas <ruipbarradas at
sapo.pt>
> wrote:
>
>> Hello,
>>
>> Inline.
>>
>> ?s 07:47 de 23/01/21, Marna Wagley escreveu:
>> > Dear Rui,
>> > I was wondering whether we have to square root of SD to find SE,
right?
>>
>> No, we don't. var already divides by n, don't divide again.
>> This is the code, that can be seen by running the function name at a
>> command line.
>>
>>
>> sd
>> #function (x, na.rm = FALSE)
>> #sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x),
>> # na.rm = na.rm))
>> #<bytecode: 0x55f3ce900848>
>> #<environment: namespace:stats>
>>
>>
>>
>> >
>> > bootprop <- function(data, index){
>> > d <- data[index, ]
>> > sum(d[["BothTimes"]], na.rm =
TRUE)/sum(d[["Time1"]], na.rm = TRUE)
>> > }
>> >
>> > R <- 1e3
>> > set.seed(2020)
>> > b <- boot(daT, bootprop, R)
>> > b
>> > b$t0 # original
>> > sd(b$t) # bootstrapped estimate of the SE of the sample prop.
>> > sd(b$t)/sqrt(1000)
>> > pandit*(1-pandit)
>> >
>> > hist(b$t, freq = FALSE)
>>
>>
>> Try plotting the normal densities for both cases, the red line is
>> clearly wrong.
>>
>>
>> f <- function(x, xbar, s){
>> dnorm(x, mean = xbar, sd = s)
>> }
>>
>> hist(b$t, freq = FALSE)
>> curve(f(x, xbar = b$t0, s = sd(b$t)), from = 0, to = 1, col =
"blue",
>> add = TRUE)
>> curve(f(x, xbar = b$t0, s = sd(b$t)/sqrt(R)), from = 0, to = 1, col
>> "red", add = TRUE)
>>
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>> >
>> >
>> >
>> >
>> > On Fri, Jan 22, 2021 at 3:07 PM Rui Barradas <ruipbarradas at
sapo.pt
>> > <mailto:ruipbarradas at sapo.pt>> wrote:
>> >
>> > Hello,
>> >
>> > Something like this, using base package boot?
>> >
>> >
>> > library(boot)
>> >
>> > bootprop <- function(data, index){
>> > d <- data[index, ]
>> > sum(d[["BothTimes"]], na.rm =
TRUE)/sum(d[["Time1"]], na.rm >> TRUE)
>> > }
>> >
>> > R <- 1e3
>> > set.seed(2020)
>> > b <- boot(daT, bootprop, R)
>> > b
>> > b$t0 # original
>> > sd(b$t) # bootstrapped estimate of the SE of the sample prop.
>> > hist(b$t, freq = FALSE)
>> >
>> >
>> > Hope this helps,
>> >
>> > Rui Barradas
>> >
>> > ?s 21:57 de 22/01/21, Marna Wagley escreveu:
>> > > Hi All,
>> > > I was trying to estimate standard error (SE) for the
proportion
>> > value using
>> > > some kind of randomization process (bootstrapping or
jackknifing)
>> > in R, but
>> > > I could not figure it out.
>> > >
>> > > Is there any way to generate SE for the proportion?
>> > >
>> > > The example of the data and the code I am using is
attached for
>> your
>> > > reference. I would like to generate the value of
proportion with
>> > a SE using
>> > > a 1000 times randomization.
>> > >
>> > > dat<-structure(list(Sample = structure(c(1L, 12L,
13L, 14L, 15L,
>> 16L,
>> > > 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
11L), .Label
>> > = c("id1",
>> > > "id10", "id11", "id12",
"id13", "id14", "id15", "id16",
"id17",
>> > > "id18", "id19", "Id2",
"id3", "id4", "id5", "id6",
"id7", "id8",
>> > > "id9"), class = "factor"), Time1 =
c(0L, 1L, 1L, 1L, 0L, 0L,
>> > > 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L),
Time2 >> c(1L,
>> > > 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L,
1L, 0L,
>> > > 1L, 1L)), .Names = c("Sample",
"Time1", "Time2"), class >> >
"data.frame",
>> > > row.names = c(NA,
>> > > -19L))
>> > > daT<-data.frame(dat %>%
>> > > mutate(Time1.but.not.in.Time2 = case_when(
>> > > Time1 %in% "1" & Time2 %in%
"0" ~ "1"),
>> > > Time2.but.not.in.Time1 = case_when(
>> > > Time1 %in% "0" & Time2 %in%
"1" ~ "1"),
>> > > BothTimes = case_when(
>> > > Time1 %in% "1" & Time2 %in%
"1" ~ "1")))
>> > > daT
>> > > summary(daT)
>> > >
>> > > cols.num <-
c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
>> > > "BothTimes")
>> > > daT[cols.num] <- sapply(daT[cols.num],as.numeric)
>> > > summary(daT)
>> > > ProportionValue<-sum(daT$BothTimes,
na.rm=T)/sum(daT$Time1,
>> na.rm=T)
>> > > ProportionValue
>> > > standard error??
>> > >
>> > > [[alternative HTML version deleted]]
>> > >
>> > > ______________________________________________
>> > > R-help at r-project.org <mailto:R-help at
r-project.org> mailing list
>> > -- To UNSUBSCRIBE and more, see
>> > > https://stat.ethz.ch/mailman/listinfo/r-help
>> > > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > > and provide commented, minimal, self-contained,
reproducible
>> code.
>> > >
>> >
>>
>
[[alternative HTML version deleted]]
Rui Barradas
2021-Jan-28 20:21 UTC
[R] How to generate SE for the proportion value using a randomization process in R?
Hello,
I don't know why you would need to see the indices but rewrite the
function bootprop as
bootprop_ind <- function(data, index){
d <- data[index, ]
#sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]],
na.rm = TRUE)
index
}
and call in the same way. It will now return a matrix of indices with R
= 1000 rows and 19 columns.
Hope this helps,
Rui Barradas
?s 19:29 de 28/01/21, Marna Wagley escreveu:> Hi Rui,
> I am sorry for asking you several questions.
>
> In the given example, randomizations (reshuffle) were done 1000 times,
> and its 1000 proportion values (results) are stored and it can be seen
> using b$t; but I was wondering how the table was randomized (which rows
> have been missed/or repeated in each randomizing procedure?).
>
> Is there any way we can see the randomized table and its associated
> results? Here in this example, I randomized (or bootstrapped) the table
> into three times (R=3) so I would like to store these three tables and
> look at them later to know which rows were repeated/missed. Is there any
> possibility?
> The example data and the code is given below.
>
> Thank you for your help.
>
> ####
> library(boot)
> dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L, 16L,
> 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label =
c("id1",
> "id10", "id11", "id12", "id13",
"id14", "id15", "id16", "id17",
> "id18", "id19", "Id2", "id3",
"id4", "id5", "id6", "id7",
"id8",
> "id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L,
0L, 0L,
> 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 = c(1L,
> 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
> 1L, 1L)), .Names = c("Sample", "Time1",
"Time2"), class = "data.frame",
> row.names = c(NA,
> -19L))
> daT<-data.frame(dat %>%
> ? mutate(Time1.but.not.in.Time2 = case_when(
> ? ? ? ? ? ? Time1 %in% "1" & Time2 %in% "0" ?~
"1"),
> Time2.but.not.in.Time1 = case_when(
> ? ? ? ? ? ? Time1 %in% "0" & Time2 %in% "1" ?~
"1"),
> ?BothTimes = case_when(
> ? ? ? ? ? ? Time1 %in% "1" & Time2 %in% "1" ?~
"1")))
> cols.num <-
c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
> "BothTimes")
> daT[cols.num] <- sapply(daT[cols.num],as.numeric)
> summary(daT)
>
> bootprop <- function(data, index){
> ? ?d <- data[index, ]
> ? ?sum(d[["BothTimes"]], na.rm =
TRUE)/sum(d[["Time1"]], na.rm = TRUE)
> }
>
> R <- 3
> set.seed(2020)
> b <- boot(daT, bootprop, R)
> b
> b$t0? ? ?# original
> b$t
> sd(b$t)? # bootstrapped estimate of the SE of the sample prop.
> hist(b$t, freq = FALSE)
>
> str(b)
> b$data
> b$seed
> b$sim
> b$strata
> ################
>
>
> On Sat, Jan 23, 2021 at 12:36 AM Marna Wagley <marna.wagley at gmail.com
> <mailto:marna.wagley at gmail.com>> wrote:
>
> Yes Rui, I can see we don't need to divide by square root of sample
> size. The example is great to understand it.
> Thank you.
> Marna
>
>
> On Sat, Jan 23, 2021 at 12:28 AM Rui Barradas <ruipbarradas at
sapo.pt
> <mailto:ruipbarradas at sapo.pt>> wrote:
>
> Hello,
>
> Inline.
>
> ?s 07:47 de 23/01/21, Marna Wagley escreveu:
> > Dear Rui,
> > I was wondering whether we have to square root of SD to find
> SE, right?
>
> No, we don't. var already divides by n, don't divide again.
> This is the code, that can be seen by running the function name
> at a
> command line.
>
>
> sd
> #function (x, na.rm = FALSE)
> #sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x),
> #? ? na.rm = na.rm))
> #<bytecode: 0x55f3ce900848>
> #<environment: namespace:stats>
>
>
>
> >
> > bootprop <- function(data, index){
> >? ? ?d <- data[index, ]
> >? ? ?sum(d[["BothTimes"]], na.rm =
TRUE)/sum(d[["Time1"]],
> na.rm = TRUE)
> > }
> >
> > R <- 1e3
> > set.seed(2020)
> > b <- boot(daT, bootprop, R)
> > b
> > b$t0? ? ?# original
> > sd(b$t)? # bootstrapped estimate of the SE of the sample
prop.
> > sd(b$t)/sqrt(1000)
> > pandit*(1-pandit)
> >
> > hist(b$t, freq = FALSE)
>
>
> Try plotting the normal densities for both cases, the red line is
> clearly wrong.
>
>
> f <- function(x, xbar, s){
> ? ?dnorm(x, mean = xbar, sd = s)
> }
>
> hist(b$t, freq = FALSE)
> curve(f(x, xbar = b$t0, s = sd(b$t)), from = 0, to = 1, col >
"blue",
> add = TRUE)
> curve(f(x, xbar = b$t0, s = sd(b$t)/sqrt(R)), from = 0, to = 1,
> col > "red", add = TRUE)
>
>
> Hope this helps,
>
> Rui Barradas
>
> >
> >
> >
> >
> > On Fri, Jan 22, 2021 at 3:07 PM Rui Barradas
> <ruipbarradas at sapo.pt <mailto:ruipbarradas at sapo.pt>
> > <mailto:ruipbarradas at sapo.pt <mailto:ruipbarradas at
sapo.pt>>>
> wrote:
> >
> >? ? ?Hello,
> >
> >? ? ?Something like this, using base package boot?
> >
> >
> >? ? ?library(boot)
> >
> >? ? ?bootprop <- function(data, index){
> >? ? ? ? ?d <- data[index, ]
> >? ? ? ? ?sum(d[["BothTimes"]], na.rm =
TRUE)/sum(d[["Time1"]],
> na.rm = TRUE)
> >? ? ?}
> >
> >? ? ?R <- 1e3
> >? ? ?set.seed(2020)
> >? ? ?b <- boot(daT, bootprop, R)
> >? ? ?b
> >? ? ?b$t0? ? ?# original
> >? ? ?sd(b$t)? # bootstrapped estimate of the SE of the sample
> prop.
> >? ? ?hist(b$t, freq = FALSE)
> >
> >
> >? ? ?Hope this helps,
> >
> >? ? ?Rui Barradas
> >
> >? ? ??s 21:57 de 22/01/21, Marna Wagley escreveu:
> >? ? ? > Hi All,
> >? ? ? > I was trying to estimate standard error (SE) for
the
> proportion
> >? ? ?value using
> >? ? ? > some kind of randomization process (bootstrapping
or
> jackknifing)
> >? ? ?in R, but
> >? ? ? > I could not figure it out.
> >? ? ? >
> >? ? ? > Is there any way to generate SE for the proportion?
> >? ? ? >
> >? ? ? > The example of the data and the code I am using is
> attached for your
> >? ? ? > reference. I would like to generate the value of
> proportion with
> >? ? ?a SE using
> >? ? ? > a 1000 times randomization.
> >? ? ? >
> >? ? ? > dat<-structure(list(Sample = structure(c(1L,
12L, 13L,
> 14L, 15L, 16L,
> >? ? ? > 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
> 11L), .Label
> >? ? ?= c("id1",
> >? ? ? > "id10", "id11",
"id12", "id13", "id14", "id15",
> "id16", "id17",
> >? ? ? > "id18", "id19",
"Id2", "id3", "id4", "id5",
"id6",
> "id7", "id8",
> >? ? ? > "id9"), class = "factor"),
Time1 = c(0L, 1L, 1L, 1L,
> 0L, 0L,
> >? ? ? > 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L,
0L),
> Time2 = c(1L,
> >? ? ? > 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L,
> 0L, 1L, 0L,
> >? ? ? > 1L, 1L)), .Names = c("Sample",
"Time1", "Time2"), class > >? ?
?"data.frame",
> >? ? ? > row.names = c(NA,
> >? ? ? > -19L))
> >? ? ? > daT<-data.frame(dat %>%
> >? ? ? >? ? mutate(Time1.but.not.in.Time2 = case_when(
> >? ? ? >? ? ? ? ? ? ? Time1 %in% "1" & Time2
%in% "0"? ~ "1"),
> >? ? ? > Time2.but.not.in.Time1 = case_when(
> >? ? ? >? ? ? ? ? ? ? Time1 %in% "0" & Time2
%in% "1"? ~ "1"),
> >? ? ? >? ?BothTimes = case_when(
> >? ? ? >? ? ? ? ? ? ? Time1 %in% "1" & Time2
%in% "1"? ~ "1")))
> >? ? ? >? ?daT
> >? ? ? >? ?summary(daT)
> >? ? ? >
> >? ? ? > cols.num <-
>
c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
> >? ? ? > "BothTimes")
> >? ? ? > daT[cols.num] <-
sapply(daT[cols.num],as.numeric)
> >? ? ? > summary(daT)
> >? ? ? > ProportionValue<-sum(daT$BothTimes,
> na.rm=T)/sum(daT$Time1, na.rm=T)
> >? ? ? > ProportionValue
> >? ? ? > standard error??
> >? ? ? >
> >? ? ? >? ? ? ?[[alternative HTML version deleted]]
> >? ? ? >
> >? ? ? > ______________________________________________
> >? ? ? > R-help at r-project.org <mailto:R-help at
r-project.org>
> <mailto:R-help at r-project.org <mailto:R-help at
r-project.org>>
> mailing list
> >? ? ?-- To UNSUBSCRIBE and more, see
> >? ? ? > https://stat.ethz.ch/mailman/listinfo/r-help
> >? ? ? > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> >? ? ? > and provide commented, minimal, self-contained,
> reproducible code.
> >? ? ? >
> >
>