Tom Cohen
2008-Mar-27 12:27 UTC
[R] options in 'rnorm' to set the lower bound of normal distribution to 0 ?
Dear list,
I have a dataset containing values obtained from two different instruments (x
and y).
I want to generate 5 samples from normal distribution for each instrument based
on
their means and standard deviations. The problem is values from both instruments
are
non-negative, so if using rnorm I would get some negative values. Is there any
options
to determine the lower bound of normal distribution to be 0 or can I simulate
the
samples in different ways to avoid the negative values?
> dat
id x y
75 101 0.134 0.1911315
79 102 0.170 0.1610306
76 103 0.134 0.1911315
84 104 0.170 0.1610306
74 105 0.134 0.1911315
80 106 0.170 0.1610306
77 107 0.134 0.1911315
81 108 0.170 0.1610306
82 109 0.170 0.1610306
78 111 0.170 0.1610306
83 112 0.170 0.1610306
85 113 0.097 0.2777778
2 201 1.032 1.5510434
1 202 0.803 1.0631001
5 203 1.032 1.5510434
mu<-apply(dat[,-1],2,mean)
sigma<-apply(dat[,-1],2,sd)
len<-5
n<-20
s1<-vector("list",len)
set.seed(7)
for(i in 1:len){
s1[[i]]<-cbind.data.frame(x=rnorm(n*i,mean=mu[1],sd=sigma[1]),
y=rnorm(n*i,mean=mu[2],sd=sigma[2]))
}
Thanks for any help,
Tom
---------------------------------
Sök efter kärleken!
[[alternative HTML version deleted]]
Prof Brian Ripley
2008-Mar-27 13:02 UTC
[R] options in 'rnorm' to set the lower bound of normal distribution to 0 ?
On Thu, 27 Mar 2008, Tom Cohen wrote:> > Dear list,> I have a dataset containing values obtained from two different > instruments (x and y). I want to generate 5 samples from normal > distribution for each instrument based on their means and standard > deviations. The problem is values from both instruments are > non-negative, so if using rnorm I would get some negative values. Is > there any options to determine the lower bound of normal distribution to > be 0 or can I simulate the samples in different ways to avoid the > negative values?Well, that would not be a normal distribution. If you want a _truncated_ normal distribution it is very easy by inversion. E.g. trunc_rnorm <- function(n, mean = 0, sd = 1, lb = 0) { lb <- pnorm(lb, mean, sd) qnorm(runif(n, lb, 1), mean, sd) } but I suggest you may rather want samples from a lognormal.> > > > dat > id x y > 75 101 0.134 0.1911315 > 79 102 0.170 0.1610306 > 76 103 0.134 0.1911315 > 84 104 0.170 0.1610306 > 74 105 0.134 0.1911315 > 80 106 0.170 0.1610306 > 77 107 0.134 0.1911315 > 81 108 0.170 0.1610306 > 82 109 0.170 0.1610306 > 78 111 0.170 0.1610306 > 83 112 0.170 0.1610306 > 85 113 0.097 0.2777778 > 2 201 1.032 1.5510434 > 1 202 0.803 1.0631001 > 5 203 1.032 1.5510434 > > mu<-apply(dat[,-1],2,mean) > sigma<-apply(dat[,-1],2,sd) > len<-5 > n<-20 > s1<-vector("list",len) > set.seed(7) > for(i in 1:len){ > s1[[i]]<-cbind.data.frame(x=rnorm(n*i,mean=mu[1],sd=sigma[1]), > y=rnorm(n*i,mean=mu[2],sd=sigma[2])) > } > > Thanks for any help, > Tom > > > --------------------------------- > S?? efter k??leken! > > [[alternative HTML version deleted]] > >-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
(Ted Harding)
2008-Mar-27 14:00 UTC
[R] options in 'rnorm' to set the lower bound of normal dist
Brian Ripley's suggestions of truncated normal and
log-normal are of course resources for ensuring that
you get positive simulated results.
However, I think youre real problem (having looked at
the numbers you quote) is that you should not be thinking
of using a normal distribution, or anything similar, at all.
Your variables dat$x and dat$y have:
mean(dat$x)
## [1] 0.3126667
sd(dat$x)
## [1] 0.3372137
mean(dat$y)
## [1] 0.4223137
sd(dat$y)
## [1] 0.5120841
so in both cases the SD is about the same as the mean, and
getting negative values from simulated normal distributions
with these means and SDs is inevitable.
But now look at the two series of 15 numbers in dat$x and dat$y:
The first 12 are of the order of 0.15 and 0.20 respectively,
while the final 3 in each case are of the order of 1.0 and 1.5
respectively. And this is where your large SD is coming from.
Neither series is from a simple normal distribution.
mean(dat$x[1:12])
## [1] 0.1519167
sd(dat$x[1:12])
## [1] 0.02447432
and it is impossible for the last 3 ( 1.032 0.803 1.032)
to have come from a normal distribution giving the first 12.
mean(dat$y[1:12])
## [1] 0.1807932
sd(dat$y[1:12])
## [1] 0.03380083
with a similar conclusion; and likewise for the last three
1.551043 1.063100 1.551043
Note that, for the first 12 in each case, the SD less that
1/5 of the mean:
mean(dat$x[1:12])/sd(dat$x[1:12])
## [1] 6.207186
mean(dat$y[1:12])/sd(dat$y[1:12])
## [1] 5.348779
so, where the first 12 of x and y are concerned, if you sampled
from normal distributions with the same means and SDs you would
get a negative number with probability less than
pnorm(-5)
## [1] 2.866516e-07
What you in fact have here is that the numbers are in two groups,
each with a small SD relative to its mean:
dat$x dat$y
Mean SD Mean SD
------------------------+-------------------
1:12 0.152 0.024 | 0.181 0.034
|
13:15 0.956 0.132 | 1.39 0.282
Note that for dat$x Mean/SD approx = 6 for each sub-series,
and for data$y Mean/SD approx = 5 for each subseries, so
you could be looking at results which display a nearly
constant coefficient of variation. Now, this is indeed a
property of the log-normal distribution (as well as of
others), so that could indeed be worth considering.
However, you still have to account for the apparent split
noted above into distinct groups.
So you are really facing a modelling question: why did
the numbers come out as they did, and what is a good way
to represent that mechanism as a distribution?
With best wishes,
Ted.
On 27-Mar-08 12:27:55, Tom Cohen wrote:>
> Dear list,
> I have a dataset containing values obtained from two different
> instruments (x and y).
> I want to generate 5 samples from normal distribution for each
> instrument based on
> their means and standard deviations. The problem is values from both
> instruments are
> non-negative, so if using rnorm I would get some negative values. Is
> there any options
> to determine the lower bound of normal distribution to be 0 or can I
> simulate the
> samples in different ways to avoid the negative values?
>
>
> > dat
> id x y
> 75 101 0.134 0.1911315
> 79 102 0.170 0.1610306
> 76 103 0.134 0.1911315
> 84 104 0.170 0.1610306
> 74 105 0.134 0.1911315
> 80 106 0.170 0.1610306
> 77 107 0.134 0.1911315
> 81 108 0.170 0.1610306
> 82 109 0.170 0.1610306
> 78 111 0.170 0.1610306
> 83 112 0.170 0.1610306
> 85 113 0.097 0.2777778
> 2 201 1.032 1.5510434
> 1 202 0.803 1.0631001
> 5 203 1.032 1.5510434
>
> mu<-apply(dat[,-1],2,mean)
> sigma<-apply(dat[,-1],2,sd)
> len<-5
> n<-20
> s1<-vector("list",len)
> set.seed(7)
> for(i in 1:len){
> s1[[i]]<-cbind.data.frame(x=rnorm(n*i,mean=mu[1],sd=sigma[1]),
> y=rnorm(n*i,mean=mu[2],sd=sigma[2]))
> }
>
> Thanks for any help,
> Tom
>
>
> ---------------------------------
> S?k efter k?rleken!
>
> [[alternative HTML version deleted]]
>
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 27-Mar-08 Time: 14:00:44
------------------------------ XFMail ------------------------------