Dear Group,
I am trying to simulate a dataset with 200 individuals with random
assignment of Sex (1,0) and Weight from lognormal distribution specific to
Sex. I am intrigued by the behavior of the rlnorm function when imputing a value
of Weight from the specified distribution. Here is the code:
ID<-1:200
Sex<-sample(c(0,1),200,replace=TRUE,prob=c(0.4,0.6))
fulldata<-data.frame(ID,Sex)
fulldata$Wt<-ifelse(fulldata$Sex==1,
                    rlnorm(100, meanlog = log(85.1), sdlog = sqrt(0.0329)),
                    rlnorm(100, meanlog = log(73), sdlog = sqrt(0.0442)))
mean(fulldata$Wt[fulldata$Sex==0]) # to check the mean is close to 73
mean(fulldata$Wt[fulldata$Sex==1]) # to check the mean is close to 85
I see that the number of simulated values has an effect on the mean
calculated after imputation. That is, rlnorm(100, meanlog = log(73),
sdlog = sqrt(0.0442)) gives a much better match than
rlnorm(1, meanlog = log(73), sdlog = sqrt(0.0442)) in the ifelse statement
in the code above.
My understanding is that ifelse imputes only one value wherever the
condition is met. I would appreciate your insights on why increasing the
number of simulated values gives a better match.
Regards,
Ayyappa
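A side note on the two mean() checks in the post: for a lognormal distribution, exp(meanlog) is the median, not the mean. The mean is exp(meanlog + sdlog^2 / 2), so with the parameters above the expected means are roughly 74.6 and 86.5 rather than 73 and 85.1. The sketch below only illustrates that relationship; the variable names and the seed are my own and not part of the original code.

```r
# Parameters as given in the post
meanlog0 <- log(73);   sdlog0 <- sqrt(0.0442)   # Sex == 0
meanlog1 <- log(85.1); sdlog1 <- sqrt(0.0329)   # Sex == 1

# For a lognormal: median = exp(meanlog), mean = exp(meanlog + sdlog^2 / 2)
median0 <- exp(meanlog0)
mean0   <- exp(meanlog0 + sdlog0^2 / 2)
median1 <- exp(meanlog1)
mean1   <- exp(meanlog1 + sdlog1^2 / 2)

# Empirical check on a large sample (the seed is arbitrary)
set.seed(1)
x0 <- rlnorm(1e5, meanlog = meanlog0, sdlog = sdlog0)
c(theoretical_mean = mean0, sample_mean = mean(x0),
  theoretical_median = median0, sample_median = median(x0))
```

If the intention is for the group means to be 73 and 85.1, one option would be to compare against median() instead, or to shift meanlog by -sdlog^2 / 2.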
Dear Ayyappa,

ifelse works on a vector. See the example below.

ifelse(
  sample(c(TRUE, FALSE), size = length(letters), replace = TRUE),
  letters,
  LETTERS
)

However, note that it will recycle short vectors when they are not of
equal length.

ifelse(
  sample(c(TRUE, FALSE), size = 2 * length(letters), replace = TRUE),
  letters,
  LETTERS
)

In your code the length of the condition vector is 200, the length of the
two other vectors is 100.

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

2016-06-14 17:02 GMT+02:00 Ayyappa Chaturvedula <ayyappach at gmail.com>:

> fulldata$Wt<-ifelse(fulldata$Sex==1,rlnorm(100, meanlog = log(85.1), sdlog
> = sqrt(0.0329)),
> rlnorm(100, meanlog = log(73), sdlog = sqrt(0.0442)))
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
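To make the recycling concrete for the case discussed here (a condition of length 200 with yes/no vectors of length 100 or 1), here is a small sketch. It assumes the same parameters as the original post; the seed and the helper names yes, no, wt and wt1 are only illustrative.

```r
set.seed(42)   # arbitrary seed, for reproducibility only
Sex <- sample(c(0, 1), 200, replace = TRUE, prob = c(0.4, 0.6))

yes <- rlnorm(100, meanlog = log(85.1), sdlog = sqrt(0.0329))
no  <- rlnorm(100, meanlog = log(73),   sdlog = sqrt(0.0442))

wt <- ifelse(Sex == 1, yes, no)

# ifelse() behaves as if both vectors were first recycled to length 200,
# then element i is picked for row i
same_as_recycled <- identical(
  wt,
  ifelse(Sex == 1, rep(yes, length.out = 200), rep(no, length.out = 200))
)

# Consequence: rows i and i + 100 share a draw whenever they have the same Sex
same_sex <- Sex[1:100] == Sex[101:200]
shared_draws <- all(wt[1:100][same_sex] == wt[101:200][same_sex])

# With a single draw, every Sex == 0 row receives the same weight
wt1 <- ifelse(Sex == 1, yes, rlnorm(1, meanlog = log(73), sdlog = sqrt(0.0442)))
n_distinct_sex0 <- length(unique(wt1[Sex == 0]))
```

This would explain the observation in the question: with rlnorm(1, ...) the group mean is just that one random value, so it can land far from the target, while with 100 draws the mean is an average over many (partly repeated) values.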
Please keep r-help in cc.

Yes. Have a look at this example

ifelse(
  sample(c(TRUE, FALSE), size = 0.5 * length(letters), replace = TRUE),
  letters,
  LETTERS
)

ir. Thierry Onkelinx

2016-06-14 17:31 GMT+02:00 Ayyappa Chaturvedula <ayyappach at gmail.com>:

> Thank you very much for your kind support. The length of my condition
> vector is ~80 because I want only Sex==1 and else will be the other. I
> understand now how ifelse works. If the simulated vector is longer than
> the condition vector, then it takes the first few elements to match the
> length of the condition vector and discards the rest?
>
> Regards,
> Ayyappa
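A deterministic version of the example above may make the "discards the rest" behavior easier to see. The vectors here are made up for illustration. Note that values are picked by position, so even among the first few elements only those at positions where the condition is TRUE (or FALSE) are used.

```r
test <- c(TRUE, FALSE, TRUE, FALSE)       # length 4
yes  <- c("a", "b", "c", "d", "e", "f")   # length 6
no   <- c("A", "B", "C", "D", "E", "F")   # length 6

res <- ifelse(test, yes, no)
res
# "a" "B" "c" "D"
# Elements 5 and 6 of both vectors are never used, and neither are
# yes[2], yes[4], no[1] and no[3].
```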
I am sorry, I missed that. I think I have now made it more appropriate,
without generating unnecessary simulated values. Thank you for your help.
fulldata$Wt<-ifelse(fulldata$Sex==1,
                    rlnorm(length(fulldata$Sex[fulldata$Sex==1]),
                           meanlog = log(85.1), sdlog = sqrt(0.0329)),
                    rlnorm(length(fulldata$Sex[fulldata$Sex==0]),
                           meanlog = log(73), sdlog = sqrt(0.0442)))
On Tue, Jun 14, 2016 at 10:42 AM, Thierry Onkelinx <thierry.onkelinx at inbo.be> wrote:
> Please keep r-help in cc.
>
> Yes. Have a look at this example
>
> ifelse(
> sample(c(TRUE, FALSE), size = 0.5 * length(letters), replace = TRUE),
> letters,
> LETTERS
> )
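One caveat on the last ifelse call in this thread: the two rlnorm vectors are still shorter than the 200-row condition, so they are still recycled and matched by position, which means some draws can be reused and others never used. A possible way around this, offered only as a sketch, is to avoid ifelse for the draws altogether. Both options below rely on standard behavior (rlnorm accepts vector meanlog and sdlog; logical subset assignment); the seed and the names is1 and Wt2 are my own.

```r
set.seed(123)   # arbitrary seed
n <- 200
ID <- 1:n
Sex <- sample(c(0, 1), n, replace = TRUE, prob = c(0.4, 0.6))
fulldata <- data.frame(ID, Sex)
is1 <- fulldata$Sex == 1

# Option A: one draw per row, with per-row parameters
# (ifelse is used only to pick the parameters, which are scalars per group)
meanlog <- ifelse(is1, log(85.1), log(73))
sdlog   <- sqrt(ifelse(is1, 0.0329, 0.0442))
fulldata$Wt <- rlnorm(n, meanlog = meanlog, sdlog = sdlog)

# Option B: draw exactly as many values as each group needs and
# assign them by logical subsetting
fulldata$Wt2 <- NA_real_
fulldata$Wt2[is1]  <- rlnorm(sum(is1),  meanlog = log(85.1), sdlog = sqrt(0.0329))
fulldata$Wt2[!is1] <- rlnorm(sum(!is1), meanlog = log(73),   sdlog = sqrt(0.0442))

# Group medians should be near 73 and 85.1 (exp(meanlog) is the median)
tapply(fulldata$Wt, fulldata$Sex, median)
```

Either option gives every individual an independent draw from the distribution for their Sex, with no values reused or discarded.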