Dear Group,

I am trying to simulate a dataset of 200 individuals with random assignment of Sex (1, 0) and a Weight drawn from a lognormal distribution specific to Sex. I am intrigued by the behavior of the rlnorm function when it is used to impute a value of Weight from the specified distribution. Here is the code:

    ID <- 1:200
    Sex <- sample(c(0, 1), 200, replace = TRUE, prob = c(0.4, 0.6))
    fulldata <- data.frame(ID, Sex)
    fulldata$Wt <- ifelse(fulldata$Sex == 1,
                          rlnorm(100, meanlog = log(85.1), sdlog = sqrt(0.0329)),
                          rlnorm(100, meanlog = log(73), sdlog = sqrt(0.0442)))

    mean(fulldata$Wt[fulldata$Sex == 0])  # to check the mean is close to 73
    mean(fulldata$Wt[fulldata$Sex == 1])  # to check the mean is close to 85

I see that the number of simulated values has an effect on the mean calculated after imputation. That is, rlnorm(100, meanlog = log(73), sdlog = sqrt(0.0442)) gives a much better match than rlnorm(1, meanlog = log(73), sdlog = sqrt(0.0442)) in the ifelse statement above.

My understanding is that ifelse imputes only one value wherever the condition is met. I would appreciate your insights on why increasing the number of simulated values gives a better match.

Regards,
Ayyappa
Dear Ayyappa,

ifelse works on a vector. See the example below.

    ifelse(
      sample(c(TRUE, FALSE), size = length(letters), replace = TRUE),
      letters,
      LETTERS
    )

However, note that it will recycle short vectors when they are not of equal length.

    ifelse(
      sample(c(TRUE, FALSE), size = 2 * length(letters), replace = TRUE),
      letters,
      LETTERS
    )

In your code the length of the condition vector is 200, the length of the two other vectors is 100.

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey

2016-06-14 17:02 GMT+02:00 Ayyappa Chaturvedula <ayyappach at gmail.com>:
> My understanding is that ifelse will be imputing only one value where the
> condition is met as specified.
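The recycling described in the reply can be made visible with fixed inputs instead of random ones. The sketch below is only an illustration: the names test, yes, no, sex and wt_single are made up here, and the alternating sex vector stands in for the randomly sampled Sex column. It suggests why rlnorm(1, ...) matched poorly: the single draw is reused for every row where the condition holds, so the group mean is just that one random value.

```r
# Small deterministic example of recycling inside ifelse()
test <- rep(c(TRUE, FALSE), times = 3)   # condition of length 6
yes  <- c(10, 20, 30)                    # length 3: recycled to 10 20 30 10 20 30
no   <- -1                               # length 1: recycled to every position

recycled <- ifelse(test, yes, no)
# TRUE positions 1, 3, 5 pick yes[1], yes[3], yes[2] from the recycled vector
print(recycled)

# With a single draw per branch, every row of a group receives the same value
set.seed(1)  # arbitrary seed, only for reproducibility
sex <- rep(c(1, 0), times = 100)         # stand-in for the sampled Sex column
wt_single <- ifelse(sex == 1,
                    rlnorm(1, meanlog = log(85.1), sdlog = sqrt(0.0329)),
                    rlnorm(1, meanlog = log(73),   sdlog = sqrt(0.0442)))
n_distinct <- length(unique(wt_single[sex == 1]))
print(n_distinct)                        # number of distinct weights in group 1
```

With 100 draws per branch the same mechanism applies, only less visibly: the 100 values are recycled to length 200, so rows i and i + 100 can receive the same weight when both fall in the same group.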
Please keep r-help in cc.

Yes. Have a look at this example:

    ifelse(
      sample(c(TRUE, FALSE), size = 0.5 * length(letters), replace = TRUE),
      letters,
      LETTERS
    )

ir. Thierry Onkelinx

2016-06-14 17:31 GMT+02:00 Ayyappa Chaturvedula <ayyappach at gmail.com>:
> Thank you very much for your kind support. The length of my condition
> vector is ~80 because I want only Sex==1 and else will be the other. I
> understand now how ifelse works. If the simulated vector is longer than
> the condition vector, then it takes the first few elements to match the
> length of the condition vector and discards the rest?
>
> Regards,
> Ayyappa
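A fixed condition makes the truncation in the example above easier to read than a random one. This is a minimal sketch with made-up names (test, picked, sex, cond). It also shows a point that the quoted message seems to mix up: the condition fulldata$Sex == 1 has one element per row, not one element per row where Sex is 1.

```r
# The result of ifelse() always has the length of the condition
test   <- c(TRUE, FALSE, TRUE, FALSE, TRUE)   # length 5
picked <- ifelse(test, letters, LETTERS)      # both candidates have length 26
print(picked)                                  # elements 6 to 26 are never used

# A comparison returns one logical value per element of its input
sex  <- c(1, 0, 0, 1, 0)
cond <- sex == 1
print(length(cond))   # number of rows
print(sum(cond))      # number of rows with sex equal to 1
```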
I am sorry, I missed that. I think the version below is more appropriate, since it does not simulate unnecessary values. Thank you for your help.

    fulldata$Wt <- ifelse(fulldata$Sex == 1,
                          rlnorm(length(fulldata$Sex[fulldata$Sex == 1]), meanlog = log(85.1), sdlog = sqrt(0.0329)),
                          rlnorm(length(fulldata$Sex[fulldata$Sex == 0]), meanlog = log(73), sdlog = sqrt(0.0442)))

On Tue, Jun 14, 2016 at 10:42 AM, Thierry Onkelinx <thierry.onkelinx at inbo.be> wrote:
> Please keep r-help in cc.
>
> Yes. Have a look at this example
>
> ifelse(
>   sample(c(TRUE, FALSE), size = 0.5 * length(letters), replace = TRUE),
>   letters,
>   LETTERS
> )
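One caveat on the last version, offered as a side note and not as something stated in the thread: both rlnorm() vectors are still shorter than the 200-element condition, so ifelse() recycles them and some draws may be used for more than one row. Two possible ways around this are sketched below. The names n, is_one and wt2 are assumptions made for illustration. The sketch relies on rlnorm() recycling meanlog and sdlog to the length of the result, as its help page describes.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

n <- 200
fulldata <- data.frame(
  ID  = seq_len(n),
  Sex = sample(c(0, 1), n, replace = TRUE, prob = c(0.4, 0.6))
)
is_one <- fulldata$Sex == 1   # logical vector with one element per row

# Option 1: one draw per row, with per-row parameters chosen by ifelse()
fulldata$Wt <- rlnorm(
  n,
  meanlog = ifelse(is_one, log(85.1), log(73)),
  sdlog   = ifelse(is_one, sqrt(0.0329), sqrt(0.0442))
)

# Option 2: fill each group by logical indexing, one draw per row in the group
wt2 <- numeric(n)
wt2[is_one]  <- rlnorm(sum(is_one),  meanlog = log(85.1), sdlog = sqrt(0.0329))
wt2[!is_one] <- rlnorm(sum(!is_one), meanlog = log(73),   sdlog = sqrt(0.0442))

# exp(meanlog) is the median of a lognormal distribution; its mean is
# exp(meanlog + sdlog^2 / 2), so the group means sit slightly above 73 and 85.1
theoretical_mean_one <- exp(log(85.1) + 0.0329 / 2)
print(tapply(fulldata$Wt, fulldata$Sex, median))
print(tapply(fulldata$Wt, fulldata$Sex, mean))
```

Option 1 keeps everything in a single call, while Option 2 stays closer to the structure of the original code. In both cases each row gets its own independent draw.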