Dear R-help,

I am trying to write a function to simulate datasets of size n which contain two time-to-event outcome variables with associated 'Event'/'Censored' indicator variables (flag1 and flag2 respectively). One of these indicator variables needs to be dependent on the other, so I am creating the first and trying to use this to create the second using an if/else statement.

My data structure needs to follow this algorithm (for each row of the data):
  If flag1=1 then flag2 should be 1 with probability 0.95 and zero otherwise
  Else if flag1=0 then flag2 should be 1 with probability 0.5 and zero otherwise

I can set up this example quite simply using if else statements, but this is incredibly inefficient when running thousands of datasets:

  data<-as.data.frame(rbinom(10,1,0.5))
  colnames(data)<-'flag1'
  for (i in 1:n) {
    if (data$flag1[i]==1) {data$flag2[i]<-rbinom(1,1,0.95)} else
    {data$flag2[i]<-rbinom(1,1,0.5)}
  }

I think to speed up the simulations I would be better changing to vectorisation and using something like:

  ifelse(data$flag1==1,rbinom(1,1,0.95),rbinom(1,1,0.5))

but the rbinom statements here generate one value and repeat this draw for every element of flag2 that matches the 'if' statement on flag1.

Is there a way to assign flag2 to a new Bernoulli draw for each subject in the data frame with flag1=1?

I hope my question is clear, and thank you in advance for your help.

Thanks,
Natalie
PhD student, Reading University

P.S. I am using R 2.12.1 on Windows 7.

--
View this message in context: http://r.789695.n4.nabble.com/Trying-to-speed-up-an-if-else-statement-in-simulations-tp4633725.html
Sent from the R help mailing list archive at Nabble.com.
William Dunlap
2012-Jun-18 21:24 UTC
[R] Trying to speed up an if/else statement in simulations
inline below

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-bounces at r-project.org On Behalf Of nqf
> Sent: Monday, June 18, 2012 10:30 AM
> To: r-help at r-project.org
> Subject: [R] Trying to speed up an if/else statement in simulations
>
> I can set up this example quite simply using if else statements, but this is
> incredibly inefficient when running thousands of datasets:
> data<-as.data.frame(rbinom(10,1,0.5))
> colnames(data)<-'flag1'
> for (i in 1:n) {
>   if (data$flag1[i]==1) {data$flag2[i]<-rbinom(1,1,0.95)} else
>   {data$flag2[i]<-rbinom(1,1,0.5)}
> }

Do you mean

  n <- 10
  data <- data.frame(flag1 = rbinom(n, 1, 0.5))
  for(i in seq_len(n)) {
    if (data$flag1[i]==1) {data$flag2[i]<-rbinom(1,1,0.95)} else
    {data$flag2[i]<-rbinom(1,1,0.5)}
  }

?

> I think to speed up the simulations I would be better changing to
> vectorisation and using something like:
> ifelse(data$flag1==1,rbinom(1,1,0.95),rbinom(1,1,0.5))
> but the rbinom statements here generate one value and repeat this draw for
> every element of flag2 that matches the 'if' statement on flag1.

Assuming that n is nrow(data), use

  data$flag2 <- ifelse(data$flag1==1, rbinom(n, 1, 0.95), rbinom(n, 1, 0.5))

so you get n independent draws from each distribution instead of 1.

I prefer to factor out the common arguments as in

  data$flag2 <- rbinom(n, 1, ifelse(data$flag1==1, 0.95, 0.5))
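As a rough check of the factored-out form, the sketch below draws flag2 with a per-row probability vector and then looks at the empirical proportion of ones within each level of flag1. The seed, the sample size of 1e5, and the names `p` and `cond_means` are illustrative assumptions, not part of the thread.

```r
# Minimal sketch: a single rbinom() call with a per-row probability vector.
set.seed(1)                       # arbitrary seed, for reproducibility only
n <- 1e5                          # illustrative size, larger than the post's 10
data <- data.frame(flag1 = rbinom(n, 1, 0.5))

# Per-row success probability: 0.95 where flag1 is 1, otherwise 0.5
p <- ifelse(data$flag1 == 1, 0.95, 0.5)
data$flag2 <- rbinom(n, 1, p)

# Empirical P(flag2 = 1 | flag1); these should land near 0.5 and 0.95
cond_means <- tapply(data$flag2, data$flag1, mean)
print(cond_means)
```

Because the probabilities are paired with the rows element by element, each subject gets its own draw, which is the behaviour the original ifelse() call was missing.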
David Winsemius
2012-Jun-18 21:32 UTC
[R] Trying to speed up an if/else statement in simulations
On Jun 18, 2012, at 1:29 PM, nqf wrote:

> Is there a way to assign flag2 to a new bernoulli draw for each subject in
> the data frame with flag1=1?

If the parameters for the Bernoulli draws stay the same, as they appear to do, then all you need to do is read the help page for `rbinom` and use the appropriate call to create vectors that are as long as data$flag1.

--
David Winsemius, MD
West Hartford, CT
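The point about the help page can be illustrated with a small sketch: the first argument of `rbinom` sets how many draws come back, and `prob` is recycled along the result. The seed and the names `det`, `draws_if_1` and `draws_if_0` are assumptions made for this example.

```r
# rbinom(n, size, prob) returns n draws; size and prob are recycled to length n.
length(rbinom(1, 1, 0.95))   # a single draw, which ifelse() would then recycle
length(rbinom(5, 1, 0.95))   # five separate draws

# With prob fixed at 0 or 1 the draws are deterministic, so the element-wise
# pairing of prob with the output is visible.
det <- rbinom(6, 1, c(0, 1))
print(det)                   # 0 1 0 1 0 1

# Applied to the question: build vectors as long as data$flag1, then select.
set.seed(2012)               # arbitrary seed
data <- data.frame(flag1 = rbinom(10, 1, 0.5))
n <- nrow(data)
draws_if_1 <- rbinom(n, 1, 0.95)   # candidate draws for rows with flag1 == 1
draws_if_0 <- rbinom(n, 1, 0.5)    # candidate draws for rows with flag1 == 0
data$flag2 <- ifelse(data$flag1 == 1, draws_if_1, draws_if_0)
print(data)
```

This draws 2n values and discards half of them, which is wasteful but harmless at these sizes.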
Peter Alspach
2012-Jun-18 21:39 UTC
[R] Trying to speed up an if/else statement in simulations
Tena koe Natalie

Does this do what you want?

  set.seed(123)
  natalie <- matrix(NA, nrow=10^5, ncol=2, dimnames=list(NULL, paste0('flag', 1:2)))
  natalie[,'flag1'] <- rbinom(nrow(natalie), 1, 0.5)
  natalie[natalie[,'flag1']==1, 'flag2'] <- rbinom(sum(natalie[,'flag1']), 1, 0.95)
  natalie[natalie[,'flag1']==0, 'flag2'] <- rbinom(nrow(natalie)-sum(natalie[,'flag1']), 1, 0.5)
  tapply(natalie[,'flag2'], natalie[,'flag1'], table)

HTH ....

Peter Alspach
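Since the original question is about running thousands of datasets, the matrix approach could be wrapped in a function and repeated. This is only a sketch: `sim_flags` and its argument names are hypothetical, and the choice of 1000 datasets of 100 rows is arbitrary. It builds the matrix with `cbind` rather than `paste0`, which as far as I know only arrived in R 2.15.0.

```r
# Hypothetical helper (not from the thread): one simulated dataset as a matrix.
sim_flags <- function(n, p1 = 0.5, p_if_1 = 0.95, p_if_0 = 0.5) {
  flag1 <- rbinom(n, 1, p1)
  is_one <- flag1 == 1                      # logical index of rows with flag1 == 1
  flag2 <- integer(n)
  flag2[is_one]  <- rbinom(sum(is_one), 1, p_if_1)
  flag2[!is_one] <- rbinom(sum(!is_one), 1, p_if_0)
  cbind(flag1 = flag1, flag2 = flag2)       # n x 2 integer matrix
}

set.seed(123)                               # arbitrary seed
sims <- replicate(1000, sim_flags(100), simplify = FALSE)  # list of matrices
print(length(sims))
print(dim(sims[[1]]))
```

Keeping each dataset as a plain matrix avoids data frame overhead inside the simulation loop; converting with as.data.frame() at the end is still possible if needed.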
R. Michael Weylandt
2012-Jun-18 21:46 UTC
[R] Trying to speed up an if/else statement in simulations
I might try something like:

  data[data$flag1 == 1, "flag2"] <- runif(sum(data$flag1 == 1)) < 0.95

and similarly for the other case.

Hope this helps,
Michael
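Spelling out "the other case" for the runif() idea might look like the sketch below. It assumes flag2 should be stored as 0/1 rather than TRUE/FALSE (hence as.integer) and pre-creates the column; the seed, the sample size and the name `is_one` are illustrative.

```r
set.seed(42)                                # arbitrary seed
n <- 1000                                   # illustrative sample size
data <- data.frame(flag1 = rbinom(n, 1, 0.5))
data$flag2 <- NA_integer_                   # placeholder column, filled below

is_one <- data$flag1 == 1
# A uniform draw below p is a success with probability p.
data$flag2[is_one]  <- as.integer(runif(sum(is_one))  < 0.95)
data$flag2[!is_one] <- as.integer(runif(sum(!is_one)) < 0.5)

print(table(flag1 = data$flag1, flag2 = data$flag2))
```

Comparing a uniform draw against the threshold is equivalent to a Bernoulli draw with that probability, so this should give the same distribution as the rbinom() versions.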
Rui Barradas
2012-Jun-19 00:08 UTC
[R] Trying to speed up an if/else statement in simulations
Hello,

Try creating an index vector. In this case, I've called it 'one'. Then, 'n.one' is the number of ones in flag1. (Since data is an R function, I've renamed your example dat).

  n <- 10 # number of observations

  # First make up some data
  set.seed(123)
  dat <- data.frame(flag1=rbinom(n, 1, 0.5))

  dat$flag2 <- integer(n) # creates flag2, all zeros.
  one <- dat$flag1 == 1 # logical index vector
  n.one <- sum(one) # how many
  dat$flag2[one] <- rbinom(n.one, 1, 0.95) # whenever TRUE, set
  dat$flag2[!one] <- rbinom(n - n.one, 1, 0.50) # negation of 'one'
  dat

Simple, no? And faster than looping over the rows.

Hope this helps,

Rui Barradas
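To get a feel for the speed difference, one could time the loop against the index-vector version with system.time(). The function names `loop_version` and `index_version` and the size of 1e4 are assumptions for this sketch; timings depend on the machine and R version, so no expected numbers are given. The loop here works on a plain vector, which is already cheaper than the original loop that assigns into a data frame column on every iteration.

```r
set.seed(123)                               # arbitrary seed
n <- 1e4                                    # illustrative size
flag1 <- rbinom(n, 1, 0.5)

# Element-by-element version, as in the original question
loop_version <- function(flag1) {
  flag2 <- integer(length(flag1))
  for (i in seq_along(flag1)) {
    flag2[i] <- if (flag1[i] == 1) rbinom(1, 1, 0.95) else rbinom(1, 1, 0.5)
  }
  flag2
}

# Index-vector version: two rbinom() calls in total
index_version <- function(flag1) {
  one <- flag1 == 1
  flag2 <- integer(length(flag1))
  flag2[one]  <- rbinom(sum(one), 1, 0.95)
  flag2[!one] <- rbinom(sum(!one), 1, 0.5)
  flag2
}

t_loop  <- system.time(f_loop  <- loop_version(flag1))
t_index <- system.time(f_index <- index_version(flag1))
print(c(loop = t_loop[["elapsed"]], index = t_index[["elapsed"]]))
```

The two versions consume random numbers in a different order, so with the same seed they produce different individual draws, though the draws follow the same distribution.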
Perfect, I now have a choice of workable solutions. Thanks a lot for your help, much appreciated.

Natalie