thr3ads.net - R help - [R] How to generate SE for the proportion value using a randomization process in R? [Jan 2021]

If this information is useful, please help other people find it:
Share via:

Rui Barradas

2021-Jan-23 08:28 UTC

[R] How to generate SE for the proportion value using a randomization process in R?

Hello,

Inline.

?s 07:47 de 23/01/21, Marna Wagley escreveu:> Dear Rui,
> I was wondering whether we have to square root of SD to find SE, right?
No, we don't. var already divides by n, don't divide again.
This is the code, that can be seen by running the function name at a 
command line.


sd
#function (x, na.rm = FALSE)
#sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x),
#    na.rm = na.rm))
#<bytecode: 0x55f3ce900848>
#<environment: namespace:stats>


> 
> bootprop <- function(data, index){
>  ? ?d <- data[index, ]
>  ? ?sum(d[["BothTimes"]], na.rm =
TRUE)/sum(d[["Time1"]], na.rm = TRUE)
> }
> 
> R <- 1e3
> set.seed(2020)
> b <- boot(daT, bootprop, R)
> b
> b$t0? ? ?# original
> sd(b$t)? # bootstrapped estimate of the SE of the sample prop.
> sd(b$t)/sqrt(1000)
> pandit*(1-pandit)
> 
> hist(b$t, freq = FALSE)

Try plotting the normal densities for both cases, the red line is 
clearly wrong.


f <- function(x, xbar, s){
   dnorm(x, mean = xbar, sd = s)
}

hist(b$t, freq = FALSE)
curve(f(x, xbar = b$t0, s = sd(b$t)), from = 0, to = 1, col = "blue", 
add = TRUE)
curve(f(x, xbar = b$t0, s = sd(b$t)/sqrt(R)), from = 0, to = 1, col = 
"red", add = TRUE)


Hope this helps,

Rui Barradas
> 
> 
> 
> 
> On Fri, Jan 22, 2021 at 3:07 PM Rui Barradas <ruipbarradas at sapo.pt 
> <mailto:ruipbarradas at sapo.pt>> wrote:
> 
>     Hello,
> 
>     Something like this, using base package boot?
> 
> 
>     library(boot)
> 
>     bootprop <- function(data, index){
>      ? ?d <- data[index, ]
>      ? ?sum(d[["BothTimes"]], na.rm =
TRUE)/sum(d[["Time1"]], na.rm = TRUE)
>     }
> 
>     R <- 1e3
>     set.seed(2020)
>     b <- boot(daT, bootprop, R)
>     b
>     b$t0? ? ?# original
>     sd(b$t)? # bootstrapped estimate of the SE of the sample prop.
>     hist(b$t, freq = FALSE)
> 
> 
>     Hope this helps,
> 
>     Rui Barradas
> 
>     ?s 21:57 de 22/01/21, Marna Wagley escreveu:
>      > Hi All,
>      > I was trying to estimate standard error (SE) for the proportion
>     value using
>      > some kind of randomization process (bootstrapping or jackknifing)
>     in R, but
>      > I could not figure it out.
>      >
>      > Is there any way to generate SE for the proportion?
>      >
>      > The example of the data and the code I am using is attached for
your
>      > reference. I would like to generate the value of proportion with
>     a SE using
>      > a 1000 times randomization.
>      >
>      > dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L,
15L, 16L,
>      > 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label
>     = c("id1",
>      > "id10", "id11", "id12",
"id13", "id14", "id15", "id16",
"id17",
>      > "id18", "id19", "Id2",
"id3", "id4", "id5", "id6",
"id7", "id8",
>      > "id9"), class = "factor"), Time1 = c(0L, 1L,
1L, 1L, 0L, 0L,
>      > 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 =
c(1L,
>      > 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
>      > 1L, 1L)), .Names = c("Sample", "Time1",
"Time2"), class >     "data.frame",
>      > row.names = c(NA,
>      > -19L))
>      > daT<-data.frame(dat %>%
>      >? ? mutate(Time1.but.not.in.Time2 = case_when(
>      >? ? ? ? ? ? ? Time1 %in% "1" & Time2 %in%
"0"? ~ "1"),
>      > Time2.but.not.in.Time1 = case_when(
>      >? ? ? ? ? ? ? Time1 %in% "0" & Time2 %in%
"1"? ~ "1"),
>      >? ?BothTimes = case_when(
>      >? ? ? ? ? ? ? Time1 %in% "1" & Time2 %in%
"1"? ~ "1")))
>      >? ?daT
>      >? ?summary(daT)
>      >
>      > cols.num <-
c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
>      > "BothTimes")
>      > daT[cols.num] <- sapply(daT[cols.num],as.numeric)
>      > summary(daT)
>      > ProportionValue<-sum(daT$BothTimes, na.rm=T)/sum(daT$Time1,
na.rm=T)
>      > ProportionValue
>      > standard error??
>      >
>      >? ? ? ?[[alternative HTML version deleted]]
>      >
>      > ______________________________________________
>      > R-help at r-project.org <mailto:R-help at r-project.org>
mailing list
>     -- To UNSUBSCRIBE and more, see
>      > https://stat.ethz.ch/mailman/listinfo/r-help
>      > PLEASE do read the posting guide
>     http://www.R-project.org/posting-guide.html
>      > and provide commented, minimal, self-contained, reproducible
code.
>      >
>

Marna Wagley

2021-Jan-23 08:36 UTC

head link

[R] How to generate SE for the proportion value using a randomization process in R?

Yes Rui, I can see we don't need to divide by square root of sample size.
The example is great to understand it.
Thank you.
Marna


On Sat, Jan 23, 2021 at 12:28 AM Rui Barradas <ruipbarradas at sapo.pt>
wrote:
> Hello,
>
> Inline.
>
> ?s 07:47 de 23/01/21, Marna Wagley escreveu:
> > Dear Rui,
> > I was wondering whether we have to square root of SD to find SE,
right?
>
> No, we don't. var already divides by n, don't divide again.
> This is the code, that can be seen by running the function name at a
> command line.
>
>
> sd
> #function (x, na.rm = FALSE)
> #sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x),
> #    na.rm = na.rm))
> #<bytecode: 0x55f3ce900848>
> #<environment: namespace:stats>
>
>
>
> >
> > bootprop <- function(data, index){
> >     d <- data[index, ]
> >     sum(d[["BothTimes"]], na.rm =
TRUE)/sum(d[["Time1"]], na.rm = TRUE)
> > }
> >
> > R <- 1e3
> > set.seed(2020)
> > b <- boot(daT, bootprop, R)
> > b
> > b$t0     # original
> > sd(b$t)  # bootstrapped estimate of the SE of the sample prop.
> > sd(b$t)/sqrt(1000)
> > pandit*(1-pandit)
> >
> > hist(b$t, freq = FALSE)
>
>
> Try plotting the normal densities for both cases, the red line is
> clearly wrong.
>
>
> f <- function(x, xbar, s){
>    dnorm(x, mean = xbar, sd = s)
> }
>
> hist(b$t, freq = FALSE)
> curve(f(x, xbar = b$t0, s = sd(b$t)), from = 0, to = 1, col =
"blue",
> add = TRUE)
> curve(f(x, xbar = b$t0, s = sd(b$t)/sqrt(R)), from = 0, to = 1, col >
"red", add = TRUE)
>
>
> Hope this helps,
>
> Rui Barradas
>
> >
> >
> >
> >
> > On Fri, Jan 22, 2021 at 3:07 PM Rui Barradas <ruipbarradas at
sapo.pt
> > <mailto:ruipbarradas at sapo.pt>> wrote:
> >
> >     Hello,
> >
> >     Something like this, using base package boot?
> >
> >
> >     library(boot)
> >
> >     bootprop <- function(data, index){
> >         d <- data[index, ]
> >         sum(d[["BothTimes"]], na.rm =
TRUE)/sum(d[["Time1"]], na.rm > TRUE)
> >     }
> >
> >     R <- 1e3
> >     set.seed(2020)
> >     b <- boot(daT, bootprop, R)
> >     b
> >     b$t0     # original
> >     sd(b$t)  # bootstrapped estimate of the SE of the sample prop.
> >     hist(b$t, freq = FALSE)
> >
> >
> >     Hope this helps,
> >
> >     Rui Barradas
> >
> >     ?s 21:57 de 22/01/21, Marna Wagley escreveu:
> >      > Hi All,
> >      > I was trying to estimate standard error (SE) for the
proportion
> >     value using
> >      > some kind of randomization process (bootstrapping or
jackknifing)
> >     in R, but
> >      > I could not figure it out.
> >      >
> >      > Is there any way to generate SE for the proportion?
> >      >
> >      > The example of the data and the code I am using is attached
for
> your
> >      > reference. I would like to generate the value of proportion
with
> >     a SE using
> >      > a 1000 times randomization.
> >      >
> >      > dat<-structure(list(Sample = structure(c(1L, 12L, 13L,
14L, 15L,
> 16L,
> >      > 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L),
.Label
> >     = c("id1",
> >      > "id10", "id11", "id12",
"id13", "id14", "id15", "id16",
"id17",
> >      > "id18", "id19", "Id2",
"id3", "id4", "id5", "id6",
"id7", "id8",
> >      > "id9"), class = "factor"), Time1 = c(0L,
1L, 1L, 1L, 0L, 0L,
> >      > 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 =
c(1L,
> >      > 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L,
0L,
> >      > 1L, 1L)), .Names = c("Sample", "Time1",
"Time2"), class > >     "data.frame",
> >      > row.names = c(NA,
> >      > -19L))
> >      > daT<-data.frame(dat %>%
> >      >    mutate(Time1.but.not.in.Time2 = case_when(
> >      >              Time1 %in% "1" & Time2 %in%
"0"  ~ "1"),
> >      > Time2.but.not.in.Time1 = case_when(
> >      >              Time1 %in% "0" & Time2 %in%
"1"  ~ "1"),
> >      >   BothTimes = case_when(
> >      >              Time1 %in% "1" & Time2 %in%
"1"  ~ "1")))
> >      >   daT
> >      >   summary(daT)
> >      >
> >      > cols.num <-
c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
> >      > "BothTimes")
> >      > daT[cols.num] <- sapply(daT[cols.num],as.numeric)
> >      > summary(daT)
> >      > ProportionValue<-sum(daT$BothTimes,
na.rm=T)/sum(daT$Time1,
> na.rm=T)
> >      > ProportionValue
> >      > standard error??
> >      >
> >      >       [[alternative HTML version deleted]]
> >      >
> >      > ______________________________________________
> >      > R-help at r-project.org <mailto:R-help at
r-project.org> mailing list
> >     -- To UNSUBSCRIBE and more, see
> >      > https://stat.ethz.ch/mailman/listinfo/r-help
> >      > PLEASE do read the posting guide
> >     http://www.R-project.org/posting-guide.html
> >      > and provide commented, minimal, self-contained, reproducible
code.
> >      >
> >
>
	[[alternative HTML version deleted]]

R help - Jan 2021 - How to generate SE for the proportion value using a randomization process in R?

[R] How to generate SE for the proportion value using a randomization process in R?

[R] How to generate SE for the proportion value using a randomization process in R?