Dear R-help,
I am trying to write a function to simulate datasets of size n which contain
two time-to-event outcome variables with associated 'Event'/'Censored'
indicator variables (flag1 and flag2 respectively). One of these indicator
variables needs to be dependent on the other, so I am creating the first and
trying to use this to create the second using an if/else statement.
My data structure needs to follow this algorithm (for each row of the data):
If flag1=1 then flag2 should be 1 with probability 0.95 and zero otherwise
Else if flag1=0 then flag2 should be 1 with probability 0.5 and zero
otherwise
I can set up this example quite simply using if else statements, but this is
incredibly inefficient when running thousands of datasets:
data<-as.data.frame(rbinom(10,1,0.5))
colnames(data)<-'flag1'
for (i in 1:n) {
if (data$flag1[i]==1) {data$flag2[i]<-rbinom(1,1,0.95)} else
{data$flag2[i]<-rbinom(1,1,0.5)}
}
I think to speed up the simulations I would be better changing to
vectorisation and using something like:
ifelse(data$flag1==1,rbinom(1,1,0.95),rbinom(1,1,0.5))
but the rbinom statements here generate one value and repeat this draw for
every element of flag2 that matches the 'if' statement on flag1.
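The recycling behaviour described here can be reproduced in a few lines. This is only a minimal sketch: the small `flag1` vector and the seed are made up for illustration and are not from the post. `ifelse()` receives a length-one `yes` and a length-one `no`, and recycles each of them across all matching positions.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
flag1 <- c(1, 0, 1, 1, 0, 0, 1, 0)   # made-up indicator values

# Each rbinom(1, ...) call is evaluated once and returns a single value;
# ifelse() then recycles that value to every position in its branch.
flag2 <- ifelse(flag1 == 1, rbinom(1, 1, 0.95), rbinom(1, 1, 0.5))

# So each flag1 group contains exactly one distinct flag2 value.
length(unique(flag2[flag1 == 1]))   # 1
length(unique(flag2[flag1 == 0]))   # 1
```

Passing vectors of length `length(flag1)` as the `yes` and `no` arguments avoids the recycling.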
Is there a way to assign flag2 a new Bernoulli draw for each subject in
the data frame with flag1=1?
I hope my question is clear, and thank you in advance for your help.
Thanks,
Natalie
PhD student, Reading University
P.S. I am using R 2.12.1 on Windows 7.
--
View this message in context:
http://r.789695.n4.nabble.com/Trying-to-speed-up-an-if-else-statement-in-simulations-tp4633725.html
Sent from the R help mailing list archive at Nabble.com.
William Dunlap
2012-Jun-18 21:24 UTC
[R] Trying to speed up an if/else statement in simulations
inline below

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> for (i in 1:n) {
> if (data$flag1[i]==1) {data$flag2[i]<-rbinom(1,1,0.95)} else
> {data$flag2[i]<-rbinom(1,1,0.5)}
> }

Do you mean
n <- 10
data <- data.frame(flag1 = rbinom(n, 1, 0.5))
for(i in seq_len(n)) {
if (data$flag1[i]==1) {data$flag2[i]<-rbinom(1,1,0.95)} else
{data$flag2[i]<-rbinom(1,1,0.5)}
}
?

> I think to speed up the simulations I would be better changing to
> vectorisation and using something like:
> ifelse(data$flag1==1,rbinom(1,1,0.95),rbinom(1,1,0.5))
> but the rbinom statements here generate one value and repeat this draw for
> every element of flag2 that matches the 'if' statement on flag1.

Assuming that n is nrow(data), use
data$flag2 <- ifelse(data$flag1==1, rbinom(n, 1, 0.95), rbinom(n, 1, 0.5))
so you get n independent draws from each distribution instead of 1.
I prefer to factor out the common arguments as in
data$flag2 <- rbinom(n, 1, ifelse(data$flag1==1, 0.95, 0.5))
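The factored-out form can be wrapped in a small function and checked empirically. The following is a sketch only: `sim_flags` and its argument names are hypothetical, chosen for illustration, and the seed and sample size are arbitrary.

```r
# Hypothetical helper (not from the thread): simulate n rows of flag1/flag2.
sim_flags <- function(n, p_flag1 = 0.5, p_if_1 = 0.95, p_if_0 = 0.5) {
  flag1 <- rbinom(n, 1, p_flag1)
  # prob is a vector here: one success probability per row,
  # chosen according to that row's flag1 value
  flag2 <- rbinom(n, 1, ifelse(flag1 == 1, p_if_1, p_if_0))
  data.frame(flag1 = flag1, flag2 = flag2)
}

set.seed(42)          # arbitrary seed
d <- sim_flags(1e5)   # large n so the proportions are stable

# Empirical P(flag2 = 1 | flag1); should be close to 0.5 and 0.95
cond_means <- tapply(d$flag2, d$flag1, mean)
cond_means
```

Calling `sim_flags()` once per simulated dataset keeps the per-row logic inside a single vectorised `rbinom()` call.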
David Winsemius
2012-Jun-18 21:32 UTC
[R] Trying to speed up an if/else statement in simulations
On Jun 18, 2012, at 1:29 PM, nqf wrote:

> Is there a way to assign flag2 to a new bernoulli draw for each
> subject in the data frame with flag1=1?

If the parameters for the Bernoulli draws stay the same, as they appear to, then all you need to do is read the help page for `rbinom` and use the appropriate call to create vectors that are as long as data$flag1.

--
David Winsemius, MD
West Hartford, CT
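As a brief illustration of what the `rbinom` help page describes: the first argument is the number of draws, and `prob` may itself be a vector with one probability per draw. The values below are made up for illustration; probabilities of exactly 0 and 1 are used only because they make the result deterministic.

```r
# One draw per element of prob; with prob 0 or 1 the outcome is fixed.
x <- rbinom(4, 1, c(0, 1, 0, 1))
x            # 0 1 0 1

# Ten independent Bernoulli(0.95) draws in a single call
y <- rbinom(10, 1, 0.95)
length(y)    # 10
```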
Peter Alspach
2012-Jun-18 21:39 UTC
[R] Trying to speed up an if/else statement in simulations
Tena koe Natalie
Does this do what you want?
set.seed(123)
natalie <- matrix(NA, nrow=10^5, ncol=2, dimnames=list(NULL,
paste0('flag', 1:2)))
natalie[,'flag1'] <- rbinom(nrow(natalie), 1, 0.5)
natalie[natalie[,'flag1']==1, 'flag2'] <-
rbinom(sum(natalie[,'flag1']), 1, 0.95)
natalie[natalie[,'flag1']==0, 'flag2'] <-
rbinom(nrow(natalie)-sum(natalie[,'flag1']), 1, 0.5)
tapply(natalie[,'flag2'], natalie[,'flag1'], table)
HTH ....
Peter Alspach
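Since the original question mentions running thousands of datasets, the same idea can be stretched to draw all of them at once, one column per dataset. This is a sketch under assumptions: the sizes `n` and `n_sim` and the seed are illustrative, and the single vectorised `rbinom()` call is a swapped-in variant rather than the subsetting approach shown above.

```r
n     <- 100    # rows per dataset (illustrative)
n_sim <- 1000   # number of simulated datasets (illustrative)

set.seed(123)
# One column per simulated dataset
flag1 <- matrix(rbinom(n * n_sim, 1, 0.5), nrow = n, ncol = n_sim)

# ifelse() keeps the matrix shape; rbinom() reads it column by column,
# and matrix() fills column by column, so positions line up with flag1.
flag2 <- matrix(rbinom(n * n_sim, 1, ifelse(flag1 == 1, 0.95, 0.5)),
                nrow = n, ncol = n_sim)

dim(flag2)                 # 100 1000
mean(flag2[flag1 == 1])    # should be close to 0.95
mean(flag2[flag1 == 0])    # should be close to 0.5
```

Column j of `flag1` and `flag2` then forms the j-th simulated dataset.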
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
R. Michael Weylandt
2012-Jun-18 21:46 UTC
[R] Trying to speed up an if/else statement in simulations
I might try something like:

data[data$flag1 == 1, "flag2"] <- runif(sum(data$flag1 == 1)) < 0.95

and similarly for the other case.

Hope this helps,
Michael
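Written out for both cases, the `runif()` comparison might look like the sketch below. The sample size, seed, and the `is_one` helper variable are assumptions made for illustration; `as.integer()` is added so that flag2 is stored as 0/1 rather than FALSE/TRUE.

```r
set.seed(2012)   # arbitrary seed
n <- 1e4         # illustrative sample size
data <- data.frame(flag1 = rbinom(n, 1, 0.5))

is_one <- data$flag1 == 1       # logical index, computed once
data$flag2 <- NA_integer_       # placeholder column, filled below

# runif(k) < p is TRUE with probability p; as.integer() maps TRUE/FALSE to 1/0
data[is_one, "flag2"]  <- as.integer(runif(sum(is_one))  < 0.95)
data[!is_one, "flag2"] <- as.integer(runif(sum(!is_one)) < 0.5)

table(flag1 = data$flag1, flag2 = data$flag2)
```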
Rui Barradas
2012-Jun-19 00:08 UTC
[R] Trying to speed up an if/else statement in simulations
Hello,

Try creating an index vector. In this case, I've called it 'one'. Then, 'n.one' is the number of ones in flag1. (Since data is an R function, I've renamed your example dat).

n <- 10 # number of observations
# First make up some data
set.seed(123)
dat <- data.frame(flag1=rbinom(n, 1, 0.5))
dat$flag2 <- integer(n) # creates flag2, all zeros.
one <- dat$flag1 == 1 # logical index vector
n.one <- sum(one) # how many
dat$flag2[one] <- rbinom(n.one, 1, 0.95) # whenever TRUE, set
dat$flag2[!one] <- rbinom(n - n.one, 1, 0.50) # negation of 'one'
dat

Simple, no? And always faster.

Hope this helps,

Rui Barradas
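The speed difference can be checked on one's own machine with `system.time()`. The sketch below is an assumption-laden comparison, not a benchmark from the thread: the loop is a simplified version of the one in the question (it fills a plain vector rather than a data frame column), `n` and the seed are arbitrary, and the timings will vary by machine.

```r
n <- 1e4         # illustrative sample size
set.seed(123)    # arbitrary seed
dat <- data.frame(flag1 = rbinom(n, 1, 0.5))

# Row-by-row version: one rbinom() call per observation
loop_time <- system.time({
  flag2_loop <- integer(n)
  for (i in seq_len(n)) {
    flag2_loop[i] <- if (dat$flag1[i] == 1) rbinom(1, 1, 0.95) else rbinom(1, 1, 0.5)
  }
})

# Index-vector version: two rbinom() calls in total
index_time <- system.time({
  one <- dat$flag1 == 1
  flag2_index <- integer(n)
  flag2_index[one]  <- rbinom(sum(one), 1, 0.95)
  flag2_index[!one] <- rbinom(n - sum(one), 1, 0.50)
})

loop_time["elapsed"]
index_time["elapsed"]
```

The two versions consume random numbers in a different order, so the resulting vectors are not identical even with the same seed; only their distributions agree.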
Perfect, I now have a choice of workable solutions.
Thanks a lot for your help, much appreciated.
Natalie