Marna Wagley
2021-Jan-23 08:36 UTC
[R] How to generate SE for the proportion value using a randomization process in R?
Yes Rui, I can see we don't need to divide by square root of sample size. The example is great to understand it. Thank you. Marna On Sat, Jan 23, 2021 at 12:28 AM Rui Barradas <ruipbarradas at sapo.pt> wrote:> Hello, > > Inline. > > ?s 07:47 de 23/01/21, Marna Wagley escreveu: > > Dear Rui, > > I was wondering whether we have to square root of SD to find SE, right? > > No, we don't. var already divides by n, don't divide again. > This is the code, that can be seen by running the function name at a > command line. > > > sd > #function (x, na.rm = FALSE) > #sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x), > # na.rm = na.rm)) > #<bytecode: 0x55f3ce900848> > #<environment: namespace:stats> > > > > > > > bootprop <- function(data, index){ > > d <- data[index, ] > > sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE) > > } > > > > R <- 1e3 > > set.seed(2020) > > b <- boot(daT, bootprop, R) > > b > > b$t0 # original > > sd(b$t) # bootstrapped estimate of the SE of the sample prop. > > sd(b$t)/sqrt(1000) > > pandit*(1-pandit) > > > > hist(b$t, freq = FALSE) > > > Try plotting the normal densities for both cases, the red line is > clearly wrong. > > > f <- function(x, xbar, s){ > dnorm(x, mean = xbar, sd = s) > } > > hist(b$t, freq = FALSE) > curve(f(x, xbar = b$t0, s = sd(b$t)), from = 0, to = 1, col = "blue", > add = TRUE) > curve(f(x, xbar = b$t0, s = sd(b$t)/sqrt(R)), from = 0, to = 1, col > "red", add = TRUE) > > > Hope this helps, > > Rui Barradas > > > > > > > > > > > On Fri, Jan 22, 2021 at 3:07 PM Rui Barradas <ruipbarradas at sapo.pt > > <mailto:ruipbarradas at sapo.pt>> wrote: > > > > Hello, > > > > Something like this, using base package boot? > > > > > > library(boot) > > > > bootprop <- function(data, index){ > > d <- data[index, ] > > sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm > TRUE) > > } > > > > R <- 1e3 > > set.seed(2020) > > b <- boot(daT, bootprop, R) > > b > > b$t0 # original > > sd(b$t) # bootstrapped estimate of the SE of the sample prop. > > hist(b$t, freq = FALSE) > > > > > > Hope this helps, > > > > Rui Barradas > > > > ?s 21:57 de 22/01/21, Marna Wagley escreveu: > > > Hi All, > > > I was trying to estimate standard error (SE) for the proportion > > value using > > > some kind of randomization process (bootstrapping or jackknifing) > > in R, but > > > I could not figure it out. > > > > > > Is there any way to generate SE for the proportion? > > > > > > The example of the data and the code I am using is attached for > your > > > reference. I would like to generate the value of proportion with > > a SE using > > > a 1000 times randomization. > > > > > > dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L, > 16L, > > > 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label > > = c("id1", > > > "id10", "id11", "id12", "id13", "id14", "id15", "id16", "id17", > > > "id18", "id19", "Id2", "id3", "id4", "id5", "id6", "id7", "id8", > > > "id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L, 0L, 0L, > > > 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 = c(1L, > > > 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, > > > 1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class > > "data.frame", > > > row.names = c(NA, > > > -19L)) > > > daT<-data.frame(dat %>% > > > mutate(Time1.but.not.in.Time2 = case_when( > > > Time1 %in% "1" & Time2 %in% "0" ~ "1"), > > > Time2.but.not.in.Time1 = case_when( > > > Time1 %in% "0" & Time2 %in% "1" ~ "1"), > > > BothTimes = case_when( > > > Time1 %in% "1" & Time2 %in% "1" ~ "1"))) > > > daT > > > summary(daT) > > > > > > cols.num <- c("Time1.but.not.in.Time2","Time2.but.not.in.Time1", > > > "BothTimes") > > > daT[cols.num] <- sapply(daT[cols.num],as.numeric) > > > summary(daT) > > > ProportionValue<-sum(daT$BothTimes, na.rm=T)/sum(daT$Time1, > na.rm=T) > > > ProportionValue > > > standard error?? > > > > > > [[alternative HTML version deleted]] > > > > > > ______________________________________________ > > > R-help at r-project.org <mailto:R-help at r-project.org> mailing list > > -- To UNSUBSCRIBE and more, see > > > https://stat.ethz.ch/mailman/listinfo/r-help > > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html > > > and provide commented, minimal, self-contained, reproducible code. > > > > > >[[alternative HTML version deleted]]
Marna Wagley
2021-Jan-28 19:29 UTC
[R] How to generate SE for the proportion value using a randomization process in R?
Hi Rui, I am sorry for asking you several questions. In the given example, randomizations (reshuffle) were done 1000 times, and its 1000 proportion values (results) are stored and it can be seen using b$t; but I was wondering how the table was randomized (which rows have been missed/or repeated in each randomizing procedure?). Is there any way we can see the randomized table and its associated results? Here in this example, I randomized (or bootstrapped) the table into three times (R=3) so I would like to store these three tables and look at them later to know which rows were repeated/missed. Is there any possibility? The example data and the code is given below. Thank you for your help. #### library(boot) dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label = c("id1", "id10", "id11", "id12", "id13", "id14", "id15", "id16", "id17", "id18", "id19", "Id2", "id3", "id4", "id5", "id6", "id7", "id8", "id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 = c(1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class = "data.frame", row.names = c(NA, -19L)) daT<-data.frame(dat %>% mutate(Time1.but.not.in.Time2 = case_when( Time1 %in% "1" & Time2 %in% "0" ~ "1"), Time2.but.not.in.Time1 = case_when( Time1 %in% "0" & Time2 %in% "1" ~ "1"), BothTimes = case_when( Time1 %in% "1" & Time2 %in% "1" ~ "1"))) cols.num <- c("Time1.but.not.in.Time2","Time2.but.not.in.Time1", "BothTimes") daT[cols.num] <- sapply(daT[cols.num],as.numeric) summary(daT) bootprop <- function(data, index){ d <- data[index, ] sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE) } R <- 3 set.seed(2020) b <- boot(daT, bootprop, R) b b$t0 # original b$t sd(b$t) # bootstrapped estimate of the SE of the sample prop. hist(b$t, freq = FALSE) str(b) b$data b$seed b$sim b$strata ################ On Sat, Jan 23, 2021 at 12:36 AM Marna Wagley <marna.wagley at gmail.com> wrote:> Yes Rui, I can see we don't need to divide by square root of sample size. > The example is great to understand it. > Thank you. > Marna > > > On Sat, Jan 23, 2021 at 12:28 AM Rui Barradas <ruipbarradas at sapo.pt> > wrote: > >> Hello, >> >> Inline. >> >> ?s 07:47 de 23/01/21, Marna Wagley escreveu: >> > Dear Rui, >> > I was wondering whether we have to square root of SD to find SE, right? >> >> No, we don't. var already divides by n, don't divide again. >> This is the code, that can be seen by running the function name at a >> command line. >> >> >> sd >> #function (x, na.rm = FALSE) >> #sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x), >> # na.rm = na.rm)) >> #<bytecode: 0x55f3ce900848> >> #<environment: namespace:stats> >> >> >> >> > >> > bootprop <- function(data, index){ >> > d <- data[index, ] >> > sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE) >> > } >> > >> > R <- 1e3 >> > set.seed(2020) >> > b <- boot(daT, bootprop, R) >> > b >> > b$t0 # original >> > sd(b$t) # bootstrapped estimate of the SE of the sample prop. >> > sd(b$t)/sqrt(1000) >> > pandit*(1-pandit) >> > >> > hist(b$t, freq = FALSE) >> >> >> Try plotting the normal densities for both cases, the red line is >> clearly wrong. >> >> >> f <- function(x, xbar, s){ >> dnorm(x, mean = xbar, sd = s) >> } >> >> hist(b$t, freq = FALSE) >> curve(f(x, xbar = b$t0, s = sd(b$t)), from = 0, to = 1, col = "blue", >> add = TRUE) >> curve(f(x, xbar = b$t0, s = sd(b$t)/sqrt(R)), from = 0, to = 1, col >> "red", add = TRUE) >> >> >> Hope this helps, >> >> Rui Barradas >> >> > >> > >> > >> > >> > On Fri, Jan 22, 2021 at 3:07 PM Rui Barradas <ruipbarradas at sapo.pt >> > <mailto:ruipbarradas at sapo.pt>> wrote: >> > >> > Hello, >> > >> > Something like this, using base package boot? >> > >> > >> > library(boot) >> > >> > bootprop <- function(data, index){ >> > d <- data[index, ] >> > sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm >> TRUE) >> > } >> > >> > R <- 1e3 >> > set.seed(2020) >> > b <- boot(daT, bootprop, R) >> > b >> > b$t0 # original >> > sd(b$t) # bootstrapped estimate of the SE of the sample prop. >> > hist(b$t, freq = FALSE) >> > >> > >> > Hope this helps, >> > >> > Rui Barradas >> > >> > ?s 21:57 de 22/01/21, Marna Wagley escreveu: >> > > Hi All, >> > > I was trying to estimate standard error (SE) for the proportion >> > value using >> > > some kind of randomization process (bootstrapping or jackknifing) >> > in R, but >> > > I could not figure it out. >> > > >> > > Is there any way to generate SE for the proportion? >> > > >> > > The example of the data and the code I am using is attached for >> your >> > > reference. I would like to generate the value of proportion with >> > a SE using >> > > a 1000 times randomization. >> > > >> > > dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L, >> 16L, >> > > 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label >> > = c("id1", >> > > "id10", "id11", "id12", "id13", "id14", "id15", "id16", "id17", >> > > "id18", "id19", "Id2", "id3", "id4", "id5", "id6", "id7", "id8", >> > > "id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L, 0L, 0L, >> > > 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 >> c(1L, >> > > 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, >> > > 1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class >> > "data.frame", >> > > row.names = c(NA, >> > > -19L)) >> > > daT<-data.frame(dat %>% >> > > mutate(Time1.but.not.in.Time2 = case_when( >> > > Time1 %in% "1" & Time2 %in% "0" ~ "1"), >> > > Time2.but.not.in.Time1 = case_when( >> > > Time1 %in% "0" & Time2 %in% "1" ~ "1"), >> > > BothTimes = case_when( >> > > Time1 %in% "1" & Time2 %in% "1" ~ "1"))) >> > > daT >> > > summary(daT) >> > > >> > > cols.num <- c("Time1.but.not.in.Time2","Time2.but.not.in.Time1", >> > > "BothTimes") >> > > daT[cols.num] <- sapply(daT[cols.num],as.numeric) >> > > summary(daT) >> > > ProportionValue<-sum(daT$BothTimes, na.rm=T)/sum(daT$Time1, >> na.rm=T) >> > > ProportionValue >> > > standard error?? >> > > >> > > [[alternative HTML version deleted]] >> > > >> > > ______________________________________________ >> > > R-help at r-project.org <mailto:R-help at r-project.org> mailing list >> > -- To UNSUBSCRIBE and more, see >> > > https://stat.ethz.ch/mailman/listinfo/r-help >> > > PLEASE do read the posting guide >> > http://www.R-project.org/posting-guide.html >> > > and provide commented, minimal, self-contained, reproducible >> code. >> > > >> > >> >[[alternative HTML version deleted]]