Tom Cohen <tom.cohen78@yahoo.se> wrote:

Thanks, Prof. Ripley, for your suggestion. I should have known that for right-skewed data one should generate the samples from a lognormal.

My problem is that x and y are two instruments that were thought to measure the same thing, but the confidence interval for the difference between the two instruments is wide. It may be true that the two measure differently, but the wide interval may also be due to the small number of observations. So the idea is that if I increase the sample size, by generating samples based on the means and standard deviations of x and y, I may get better precision for the difference between the two instruments.

I am using 'urlnorm', which allows sampling from a truncated distribution, since I want the samples to take values from 0 to max(x) and from 0 to max(y), respectively. I am unsure how to specify the means and standard deviations in 'urlnorm'. Based on the x- and y-values I have the standard deviations sd_x=0.3372137 and sd_y=0.5120841 and the means mean_x=0.3126667 and mean_y=0.4223137, which are not on the log scale as required by urlnorm. To convert sd_x, sd_y, mean_x and mean_y to the log scale I did

    sd_logx   = sqrt(log(1.3372137)) = 0.54
    sd_logy   = sqrt(log(1.5120841)) = 0.64
    mean_logx = -(0.54^2)/2
    mean_logy = -(0.64^2)/2

Can anyone tell me whether these are correctly calculated? Are these the values to be specified in urlnorm? Do the lower and upper bounds have to be on the log scale as well, or on which scale?

    set.seed(7)
    for (i in 1:len) {
      s1[[i]] <- cbind.data.frame(
        x = urlnorm(n*i, meanlog=mean_logx, sdlog=sd_logx, lb=0, ub=max(x)),
        y = urlnorm(n*i, meanlog=mean_logy, sdlog=sd_logy, lb=0, ub=max(y)))
    }

Thanks again for any suggestions.

Prof Brian Ripley <ripley@stats.ox.ac.uk> wrote:

On Thu, 27 Mar 2008, Tom Cohen wrote:

> Dear list,
>
> I have a dataset containing values obtained from two different
> instruments (x and y). I want to generate 5 samples from a normal
> distribution for each instrument based on their means and standard
> deviations. The problem is that values from both instruments are
> non-negative, so if using rnorm I would get some negative values. Is
> there any option to set the lower bound of the normal distribution to
> 0, or can I simulate the samples in a different way to avoid the
> negative values?

Well, that would not be a normal distribution. If you want a _truncated_ normal distribution it is very easy by inversion. E.g.

    trunc_rnorm <- function(n, mean = 0, sd = 1, lb = 0)
    {
        lb <- pnorm(lb, mean, sd)
        qnorm(runif(n, lb, 1), mean, sd)
    }

but I suggest you may rather want samples from a lognormal.

> > dat
>     id     x         y
> 75 101 0.134 0.1911315
> 79 102 0.170 0.1610306
> 76 103 0.134 0.1911315
> 84 104 0.170 0.1610306
> 74 105 0.134 0.1911315
> 80 106 0.170 0.1610306
> 77 107 0.134 0.1911315
> 81 108 0.170 0.1610306
> 82 109 0.170 0.1610306
> 78 111 0.170 0.1610306
> 83 112 0.170 0.1610306
> 85 113 0.097 0.2777778
> 2  201 1.032 1.5510434
> 1  202 0.803 1.0631001
> 5  203 1.032 1.5510434
>
> mu<-apply(dat[,-1],2,mean)
> sigma<-apply(dat[,-1],2,sd)
> len<-5
> n<-20
> s1<-vector("list",len)
> set.seed(7)
> for(i in 1:len){
>   s1[[i]]<-cbind.data.frame(x=rnorm(n*i,mean=mu[1],sd=sigma[1]),
>                             y=rnorm(n*i,mean=mu[2],sd=sigma[2]))
> }
>
> Thanks for any help,
> Tom

--
Brian D. Ripley, ripley@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
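The inversion idea in trunc_rnorm carries over to a lognormal restricted to an interval. The following is only a minimal sketch in base R, not the method used by 'urlnorm'; the helper name trunc_rlnorm and the example parameter values are made up for illustration. Because plnorm and qlnorm work with quantiles on the original (not log) scale, the bounds lb and ub in this sketch are given on the original scale, while meanlog and sdlog are on the log scale.

```r
# Hypothetical helper (not part of base R or any package):
# sample from a lognormal truncated to [lb, ub] by inversion.
trunc_rlnorm <- function(n, meanlog = 0, sdlog = 1, lb = 0, ub = Inf) {
  stopifnot(lb >= 0, ub > lb)
  # Probabilities of the bounds under the untruncated lognormal;
  # lb and ub are on the original scale.
  p_lo <- plnorm(lb, meanlog, sdlog)
  p_hi <- plnorm(ub, meanlog, sdlog)
  # Draw uniforms between those probabilities and map them back
  # through the quantile function.
  qlnorm(runif(n, p_lo, p_hi), meanlog, sdlog)
}

set.seed(7)
# Illustrative parameter values only, not estimates from the data above.
s <- trunc_rlnorm(1000, meanlog = -1.5, sdlog = 0.8, lb = 0, ub = 1.032)
range(s)
```

All draws fall between lb and ub by construction, so no rejection step is needed.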
ONKELINX, Thierry
2008-Apr-01 12:49 UTC
[R] set the lower bound of normal distribution to 0 ?
Dear Tom,

In my opinion you should first transform your data to the log scale and then calculate the mean and standard deviation of the log-transformed data, because mean(log(x)) is not equal to log(mean(x)).

HTH,

Thierry

----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

-----Original message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On behalf of Tom Cohen
Sent: Tuesday 1 April 2008 14:17
To: r-help at stat.math.ethz.ch
Subject: [R] set the lower bound of normal distribution to 0 ?
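The advice to work on the log scale can be sketched in R as follows. This is an illustration only: the vector x is typed in from the x column of the dat listing in the original post, and all other variable names are made up here. The second half shows the standard moment-matching identities for a lognormal (mean exp(mu + sigma^2/2), variance (exp(sigma^2) - 1) * exp(2*mu + sigma^2)) as an alternative when only the raw mean and standard deviation are available. The two approaches generally give different parameter values.

```r
# x column of 'dat' as listed in the original post
x <- c(0.134, 0.170, 0.134, 0.170, 0.134, 0.170, 0.134, 0.170,
       0.170, 0.170, 0.170, 0.097, 1.032, 0.803, 1.032)

# Approach 1: log-transform first, then summarise.
meanlog_x <- mean(log(x))
sdlog_x   <- sd(log(x))

# mean(log(x)) differs from log(mean(x)), so taking the log of the
# raw mean is not a substitute for approach 1.
c(mean_of_logs = meanlog_x, log_of_mean = log(mean(x)))

# Approach 2 (assumption: the data are lognormal and only the raw
# mean m and standard deviation s are known): match moments.
m <- mean(x)
s <- sd(x)
sdlog_mm   <- sqrt(log(1 + s^2 / m^2))
meanlog_mm <- log(m) - sdlog_mm^2 / 2

# Check: these parameters reproduce the raw mean and variance.
c(mean = exp(meanlog_mm + sdlog_mm^2 / 2),
  var  = (exp(sdlog_mm^2) - 1) * exp(2 * meanlog_mm + sdlog_mm^2))
```

Either pair (meanlog_x, sdlog_x) or (meanlog_mm, sdlog_mm) is on the log scale and could be supplied as meanlog and sdlog to a lognormal sampler such as rlnorm.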