Tom Cohen
2008-Mar-27 12:27 UTC
[R] options in 'rnorm' to set the lower bound of normal distribution to 0 ?
Dear list, I have a dataset containing values obtained from two different instruments (x and y). I want to generate 5 samples from normal distribution for each instrument based on their means and standard deviations. The problem is values from both instruments are non-negative, so if using rnorm I would get some negative values. Is there any options to determine the lower bound of normal distribution to be 0 or can I simulate the samples in different ways to avoid the negative values? > dat id x y 75 101 0.134 0.1911315 79 102 0.170 0.1610306 76 103 0.134 0.1911315 84 104 0.170 0.1610306 74 105 0.134 0.1911315 80 106 0.170 0.1610306 77 107 0.134 0.1911315 81 108 0.170 0.1610306 82 109 0.170 0.1610306 78 111 0.170 0.1610306 83 112 0.170 0.1610306 85 113 0.097 0.2777778 2 201 1.032 1.5510434 1 202 0.803 1.0631001 5 203 1.032 1.5510434 mu<-apply(dat[,-1],2,mean) sigma<-apply(dat[,-1],2,sd) len<-5 n<-20 s1<-vector("list",len) set.seed(7) for(i in 1:len){ s1[[i]]<-cbind.data.frame(x=rnorm(n*i,mean=mu[1],sd=sigma[1]), y=rnorm(n*i,mean=mu[2],sd=sigma[2])) } Thanks for any help, Tom --------------------------------- Sök efter kärleken! [[alternative HTML version deleted]]
Prof Brian Ripley
2008-Mar-27 13:02 UTC
[R] options in 'rnorm' to set the lower bound of normal distribution to 0 ?
On Thu, 27 Mar 2008, Tom Cohen wrote:> > Dear list,> I have a dataset containing values obtained from two different > instruments (x and y). I want to generate 5 samples from normal > distribution for each instrument based on their means and standard > deviations. The problem is values from both instruments are > non-negative, so if using rnorm I would get some negative values. Is > there any options to determine the lower bound of normal distribution to > be 0 or can I simulate the samples in different ways to avoid the > negative values?Well, that would not be a normal distribution. If you want a _truncated_ normal distribution it is very easy by inversion. E.g. trunc_rnorm <- function(n, mean = 0, sd = 1, lb = 0) { lb <- pnorm(lb, mean, sd) qnorm(runif(n, lb, 1), mean, sd) } but I suggest you may rather want samples from a lognormal.> > > > dat > id x y > 75 101 0.134 0.1911315 > 79 102 0.170 0.1610306 > 76 103 0.134 0.1911315 > 84 104 0.170 0.1610306 > 74 105 0.134 0.1911315 > 80 106 0.170 0.1610306 > 77 107 0.134 0.1911315 > 81 108 0.170 0.1610306 > 82 109 0.170 0.1610306 > 78 111 0.170 0.1610306 > 83 112 0.170 0.1610306 > 85 113 0.097 0.2777778 > 2 201 1.032 1.5510434 > 1 202 0.803 1.0631001 > 5 203 1.032 1.5510434 > > mu<-apply(dat[,-1],2,mean) > sigma<-apply(dat[,-1],2,sd) > len<-5 > n<-20 > s1<-vector("list",len) > set.seed(7) > for(i in 1:len){ > s1[[i]]<-cbind.data.frame(x=rnorm(n*i,mean=mu[1],sd=sigma[1]), > y=rnorm(n*i,mean=mu[2],sd=sigma[2])) > } > > Thanks for any help, > Tom > > > --------------------------------- > S?? efter k??leken! > > [[alternative HTML version deleted]] > >-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
(Ted Harding)
2008-Mar-27 14:00 UTC
[R] options in 'rnorm' to set the lower bound of normal dist
Brian Ripley's suggestions of truncated normal and log-normal are of course resources for ensuring that you get positive simulated results. However, I think youre real problem (having looked at the numbers you quote) is that you should not be thinking of using a normal distribution, or anything similar, at all. Your variables dat$x and dat$y have: mean(dat$x) ## [1] 0.3126667 sd(dat$x) ## [1] 0.3372137 mean(dat$y) ## [1] 0.4223137 sd(dat$y) ## [1] 0.5120841 so in both cases the SD is about the same as the mean, and getting negative values from simulated normal distributions with these means and SDs is inevitable. But now look at the two series of 15 numbers in dat$x and dat$y: The first 12 are of the order of 0.15 and 0.20 respectively, while the final 3 in each case are of the order of 1.0 and 1.5 respectively. And this is where your large SD is coming from. Neither series is from a simple normal distribution. mean(dat$x[1:12]) ## [1] 0.1519167 sd(dat$x[1:12]) ## [1] 0.02447432 and it is impossible for the last 3 ( 1.032 0.803 1.032) to have come from a normal distribution giving the first 12. mean(dat$y[1:12]) ## [1] 0.1807932 sd(dat$y[1:12]) ## [1] 0.03380083 with a similar conclusion; and likewise for the last three 1.551043 1.063100 1.551043 Note that, for the first 12 in each case, the SD less that 1/5 of the mean: mean(dat$x[1:12])/sd(dat$x[1:12]) ## [1] 6.207186 mean(dat$y[1:12])/sd(dat$y[1:12]) ## [1] 5.348779 so, where the first 12 of x and y are concerned, if you sampled from normal distributions with the same means and SDs you would get a negative number with probability less than pnorm(-5) ## [1] 2.866516e-07 What you in fact have here is that the numbers are in two groups, each with a small SD relative to its mean: dat$x dat$y Mean SD Mean SD ------------------------+------------------- 1:12 0.152 0.024 | 0.181 0.034 | 13:15 0.956 0.132 | 1.39 0.282 Note that for dat$x Mean/SD approx = 6 for each sub-series, and for data$y Mean/SD approx = 5 for each subseries, so you could be looking at results which display a nearly constant coefficient of variation. Now, this is indeed a property of the log-normal distribution (as well as of others), so that could indeed be worth considering. However, you still have to account for the apparent split noted above into distinct groups. So you are really facing a modelling question: why did the numbers come out as they did, and what is a good way to represent that mechanism as a distribution? With best wishes, Ted. On 27-Mar-08 12:27:55, Tom Cohen wrote:> > Dear list, > I have a dataset containing values obtained from two different > instruments (x and y). > I want to generate 5 samples from normal distribution for each > instrument based on > their means and standard deviations. The problem is values from both > instruments are > non-negative, so if using rnorm I would get some negative values. Is > there any options > to determine the lower bound of normal distribution to be 0 or can I > simulate the > samples in different ways to avoid the negative values? > > > > dat > id x y > 75 101 0.134 0.1911315 > 79 102 0.170 0.1610306 > 76 103 0.134 0.1911315 > 84 104 0.170 0.1610306 > 74 105 0.134 0.1911315 > 80 106 0.170 0.1610306 > 77 107 0.134 0.1911315 > 81 108 0.170 0.1610306 > 82 109 0.170 0.1610306 > 78 111 0.170 0.1610306 > 83 112 0.170 0.1610306 > 85 113 0.097 0.2777778 > 2 201 1.032 1.5510434 > 1 202 0.803 1.0631001 > 5 203 1.032 1.5510434 > > mu<-apply(dat[,-1],2,mean) > sigma<-apply(dat[,-1],2,sd) > len<-5 > n<-20 > s1<-vector("list",len) > set.seed(7) > for(i in 1:len){ > s1[[i]]<-cbind.data.frame(x=rnorm(n*i,mean=mu[1],sd=sigma[1]), > y=rnorm(n*i,mean=mu[2],sd=sigma[2])) > } > > Thanks for any help, > Tom > > > --------------------------------- > S?k efter k?rleken! > > [[alternative HTML version deleted]] >-------------------------------------------------------------------- E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk> Fax-to-email: +44 (0)870 094 0861 Date: 27-Mar-08 Time: 14:00:44 ------------------------------ XFMail ------------------------------