thr3ads.net - R help - [R] contour for plotting confidence interval on scatter plot of bivariate normal distribution [Mar 2012]

If this information is useful, please help other people find it:
Share via:

drflxms

2012-Mar-03 14:54 UTC

[R] contour for plotting confidence interval on scatter plot of bivariate normal distribution

Dear all,

I created a bivariate normal distribution:

set.seed(138813)
n<-100
x<-rnorm(n); y<-rnorm(n)

and plotted a scatterplot of it:

plot(x,y)

Now I'd like to add the 2D-standard deviation.

I found a thread regarding plotting arbitrary confidence boundaries from
Pascal H?nggi
http://www.mail-archive.com/r-help at r-project.org/msg24013.html
which cites the even older thread
http://tolstoy.newcastle.edu.au/R/help/03b/5384.html

As I am unfortunately only a very poor R programmer, the code of Pascal
H?nggi is a myth to me and I am not sure whether I was able to translate
the recommendation of Brain Ripley in the later thread (which provides
no code) into the the correct R code. Brain wrote:

You need a 2D density estimate (e.g. kde2d in MASS) then compute the
density values at the points and draw the contour of the density which
includes 95% of the points (at a level computed from the sorted values
via quantile()). [95% confidence interval was desired in thread instead
of standard deviation...]

So I tried this...

den<-kde2d(x, y, n=n) #as I chose n to be the same as during creating
the distributions x and y (see above), a z-value is assigned to every
combination of x and y.

# create a sorted vector of z-values (instead of the matrix stored
inside the den object
den.z <-sort(den$z)

# set desired confidence border to draw and store it in variable
confidence.border <- quantile(den.z, probs=0.6827, na.rm = TRUE)

# draw a line representing confidence.border on the existing scatterplot
par(new=TRUE)
contour(den, levels=confidence.border, col = "red", add = TRUE)

Unfortunately I doubt very much this is correct :( In fact I am sure
this is wrong, because the border for probs=0.05 is drawn outside the
values.... So please help and check.
Pascal H?nggis code seems to work, but I don't understand the magic he
does with

pp <- array()
for (i in 1:1000){
        z.x <- max(which(den$x < x[i]))
        z.y <- max(which(den$y < y[i]))
        pp[i] <- den$z[z.x, z.y]
}

before doing the very same as I did above:

confidencebound <- quantile(pp, 0.05, na.rm = TRUE)

plot(x, y)
contour(den, levels = confidencebound, col = "red", add = TRUE)


My problems:

1.) setting probs=0.6827 is somehow a dirty trick which I can only use
by simply knowing that this is the percentage of values inside +-1sd
when a distribution is normal. Is there a way doing this with "native"
sd function?
sd(den.z) is not correct, as den.z is in contrast to x and y not normal
any more. So ecdf(den.z)(sd(den.z)) results in a percentile of 0.5644 in
this example instead of the desired 0.6827.

2.) I would like to have code that works with any desired confidence.
Unfortunately setting probs to the desired confidence would probably be
wrong (?!) as it relates to den.z instead of x and y, which are the
underlying distributions I am interested in. To put it short I want the
confidence of x/y and not of den.z.


I am really completely stuck. Please help me out of this! Felix

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Nachrichtenteil als Anhang
URL:
<https://stat.ethz.ch/pipermail/r-help/attachments/20120303/ba420fff/attachment.pl>

drflxms

2012-Mar-03 16:01 UTC

head link

[R] contour for plotting confidence interval on scatter plot of bivariate normal distribution

OK, the following seems to work
still do not understand exactly why...

library(MASS)

# parameters:
n<-100

# generate samples:
set.seed(138813)
#seed <- .Random.seed
x<-rnorm(n); y<-rnorm(n)

# estimate non-parameteric density surface via kernel smoothing
den<-kde2d(x, y, n=n)
# store z values of density estimate in an extra variable
den.z <-den$z

# this is the critical block, which I still do not comprehend in detail
z <- array()
for (i in 1:n){
        z.x <- max(which(den$x < x[i]))
        z.y <- max(which(den$y < y[i]))
        z[i] <- den$z[z.x, z.y]
}

# store class/level borders of confidence interval in variables
confidence.border <- quantile(z, probs=1-0.6827, na.rm = TRUE) # +-1sd

plot(x,y)
par(new=TRUE)
contour(den, levels=confidence.border, col = "red", add = TRUE)

The results look at least plausible. Do not know if they are really true.
Still hoping for some replies ;), Felix


Am 03.03.12 15:54, schrieb drflxms:> Dear all,
> 
> I created a bivariate normal distribution:
> 
> set.seed(138813)
> n<-100
> x<-rnorm(n); y<-rnorm(n)
> 
> and plotted a scatterplot of it:
> 
> plot(x,y)
> 
> Now I'd like to add the 2D-standard deviation.
> 
> I found a thread regarding plotting arbitrary confidence boundaries from
> Pascal H?nggi
> http://www.mail-archive.com/r-help at r-project.org/msg24013.html
> which cites the even older thread
> http://tolstoy.newcastle.edu.au/R/help/03b/5384.html
> 
> As I am unfortunately only a very poor R programmer, the code of Pascal
> H?nggi is a myth to me and I am not sure whether I was able to translate
> the recommendation of Brain Ripley in the later thread (which provides
> no code) into the the correct R code. Brain wrote:
> 
> You need a 2D density estimate (e.g. kde2d in MASS) then compute the
> density values at the points and draw the contour of the density which
> includes 95% of the points (at a level computed from the sorted values
> via quantile()). [95% confidence interval was desired in thread instead
> of standard deviation...]
> 
> So I tried this...
> 
> den<-kde2d(x, y, n=n) #as I chose n to be the same as during creating
> the distributions x and y (see above), a z-value is assigned to every
> combination of x and y.
> 
> # create a sorted vector of z-values (instead of the matrix stored
> inside the den object
> den.z <-sort(den$z)
> 
> # set desired confidence border to draw and store it in variable
> confidence.border <- quantile(den.z, probs=0.6827, na.rm = TRUE)
> 
> # draw a line representing confidence.border on the existing scatterplot
> par(new=TRUE)
> contour(den, levels=confidence.border, col = "red", add = TRUE)
> 
> Unfortunately I doubt very much this is correct :( In fact I am sure
> this is wrong, because the border for probs=0.05 is drawn outside the
> values.... So please help and check.
> Pascal H?nggis code seems to work, but I don't understand the magic he
> does with
> 
> pp <- array()
> for (i in 1:1000){
>         z.x <- max(which(den$x < x[i]))
>         z.y <- max(which(den$y < y[i]))
>         pp[i] <- den$z[z.x, z.y]
> }
> 
> before doing the very same as I did above:
> 
> confidencebound <- quantile(pp, 0.05, na.rm = TRUE)
> 
> plot(x, y)
> contour(den, levels = confidencebound, col = "red", add = TRUE)
> 
> 
> My problems:
> 
> 1.) setting probs=0.6827 is somehow a dirty trick which I can only use
> by simply knowing that this is the percentage of values inside +-1sd
> when a distribution is normal. Is there a way doing this with
"native"
> sd function?
> sd(den.z) is not correct, as den.z is in contrast to x and y not normal
> any more. So ecdf(den.z)(sd(den.z)) results in a percentile of 0.5644 in
> this example instead of the desired 0.6827.
> 
> 2.) I would like to have code that works with any desired confidence.
> Unfortunately setting probs to the desired confidence would probably be
> wrong (?!) as it relates to den.z instead of x and y, which are the
> underlying distributions I am interested in. To put it short I want the
> confidence of x/y and not of den.z.
> 
> 
> I am really completely stuck. Please help me out of this! Felix
>

Greg Snow

2012-Mar-03 16:28 UTC

head link

[R] contour for plotting confidence interval on scatter plot of bivariate normal distribution

Look at the ellipse package (and the ellipse function in the package)
for a simple way of showing a confidence region for bivariate data on
a plot (a 68% confidence interval is about 1 SD if you just want to
show 1 SD).

On Sat, Mar 3, 2012 at 7:54 AM, drflxms <drflxms at googlemail.com>
wrote:> Dear all,
>
> I created a bivariate normal distribution:
>
> set.seed(138813)
> n<-100
> x<-rnorm(n); y<-rnorm(n)
>
> and plotted a scatterplot of it:
>
> plot(x,y)
>
> Now I'd like to add the 2D-standard deviation.
>
> I found a thread regarding plotting arbitrary confidence boundaries from
> Pascal H?nggi
> http://www.mail-archive.com/r-help at r-project.org/msg24013.html
> which cites the even older thread
> http://tolstoy.newcastle.edu.au/R/help/03b/5384.html
>
> As I am unfortunately only a very poor R programmer, the code of Pascal
> H?nggi is a myth to me and I am not sure whether I was able to translate
> the recommendation of Brain Ripley in the later thread (which provides
> no code) into the the correct R code. Brain wrote:
>
> You need a 2D density estimate (e.g. kde2d in MASS) then compute the
> density values at the points and draw the contour of the density which
> includes 95% of the points (at a level computed from the sorted values
> via quantile()). [95% confidence interval was desired in thread instead
> of standard deviation...]
>
> So I tried this...
>
> den<-kde2d(x, y, n=n) #as I chose n to be the same as during creating
> the distributions x and y (see above), a z-value is assigned to every
> combination of x and y.
>
> # create a sorted vector of z-values (instead of the matrix stored
> inside the den object
> den.z <-sort(den$z)
>
> # set desired confidence border to draw and store it in variable
> confidence.border <- quantile(den.z, probs=0.6827, na.rm = TRUE)
>
> # draw a line representing confidence.border on the existing scatterplot
> par(new=TRUE)
> contour(den, levels=confidence.border, col = "red", add = TRUE)
>
> Unfortunately I doubt very much this is correct :( In fact I am sure
> this is wrong, because the border for probs=0.05 is drawn outside the
> values.... So please help and check.
> Pascal H?nggis code seems to work, but I don't understand the magic he
> does with
>
> pp <- array()
> for (i in 1:1000){
> ? ? ? ?z.x <- max(which(den$x < x[i]))
> ? ? ? ?z.y <- max(which(den$y < y[i]))
> ? ? ? ?pp[i] <- den$z[z.x, z.y]
> }
>
> before doing the very same as I did above:
>
> confidencebound <- quantile(pp, 0.05, na.rm = TRUE)
>
> plot(x, y)
> contour(den, levels = confidencebound, col = "red", add = TRUE)
>
>
> My problems:
>
> 1.) setting probs=0.6827 is somehow a dirty trick which I can only use
> by simply knowing that this is the percentage of values inside +-1sd
> when a distribution is normal. Is there a way doing this with
"native"
> sd function?
> sd(den.z) is not correct, as den.z is in contrast to x and y not normal
> any more. So ecdf(den.z)(sd(den.z)) results in a percentile of 0.5644 in
> this example instead of the desired 0.6827.
>
> 2.) I would like to have code that works with any desired confidence.
> Unfortunately setting probs to the desired confidence would probably be
> wrong (?!) as it relates to den.z instead of x and y, which are the
> underlying distributions I am interested in. To put it short I want the
> confidence of x/y and not of den.z.
>
>
> I am really completely stuck. Please help me out of this! Felix
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
Gregory (Greg) L. Snow Ph.D.
538280 at gmail.com

Apparently Analagous Threads

Search for more maybe matching threads

R help - Mar 2012 - contour for plotting confidence interval on scatter plot of bivariate normal distribution

[R] contour for plotting confidence interval on scatter plot of bivariate normal distribution

[R] contour for plotting confidence interval on scatter plot of bivariate normal distribution

[R] contour for plotting confidence interval on scatter plot of bivariate normal distribution

Apparently Analagous Threads