drflxms
2012-Mar-03 14:54 UTC
[R] contour for plotting confidence interval on scatter plot of bivariate normal distribution
Dear all, I created a bivariate normal distribution: set.seed(138813) n<-100 x<-rnorm(n); y<-rnorm(n) and plotted a scatterplot of it: plot(x,y) Now I'd like to add the 2D-standard deviation. I found a thread regarding plotting arbitrary confidence boundaries from Pascal H?nggi http://www.mail-archive.com/r-help at r-project.org/msg24013.html which cites the even older thread http://tolstoy.newcastle.edu.au/R/help/03b/5384.html As I am unfortunately only a very poor R programmer, the code of Pascal H?nggi is a myth to me and I am not sure whether I was able to translate the recommendation of Brain Ripley in the later thread (which provides no code) into the the correct R code. Brain wrote: You need a 2D density estimate (e.g. kde2d in MASS) then compute the density values at the points and draw the contour of the density which includes 95% of the points (at a level computed from the sorted values via quantile()). [95% confidence interval was desired in thread instead of standard deviation...] So I tried this... den<-kde2d(x, y, n=n) #as I chose n to be the same as during creating the distributions x and y (see above), a z-value is assigned to every combination of x and y. # create a sorted vector of z-values (instead of the matrix stored inside the den object den.z <-sort(den$z) # set desired confidence border to draw and store it in variable confidence.border <- quantile(den.z, probs=0.6827, na.rm = TRUE) # draw a line representing confidence.border on the existing scatterplot par(new=TRUE) contour(den, levels=confidence.border, col = "red", add = TRUE) Unfortunately I doubt very much this is correct :( In fact I am sure this is wrong, because the border for probs=0.05 is drawn outside the values.... So please help and check. Pascal H?nggis code seems to work, but I don't understand the magic he does with pp <- array() for (i in 1:1000){ z.x <- max(which(den$x < x[i])) z.y <- max(which(den$y < y[i])) pp[i] <- den$z[z.x, z.y] } before doing the very same as I did above: confidencebound <- quantile(pp, 0.05, na.rm = TRUE) plot(x, y) contour(den, levels = confidencebound, col = "red", add = TRUE) My problems: 1.) setting probs=0.6827 is somehow a dirty trick which I can only use by simply knowing that this is the percentage of values inside +-1sd when a distribution is normal. Is there a way doing this with "native" sd function? sd(den.z) is not correct, as den.z is in contrast to x and y not normal any more. So ecdf(den.z)(sd(den.z)) results in a percentile of 0.5644 in this example instead of the desired 0.6827. 2.) I would like to have code that works with any desired confidence. Unfortunately setting probs to the desired confidence would probably be wrong (?!) as it relates to den.z instead of x and y, which are the underlying distributions I am interested in. To put it short I want the confidence of x/y and not of den.z. I am really completely stuck. Please help me out of this! Felix -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: Nachrichtenteil als Anhang URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20120303/ba420fff/attachment.pl>
drflxms
2012-Mar-03 16:01 UTC
[R] contour for plotting confidence interval on scatter plot of bivariate normal distribution
OK, the following seems to work still do not understand exactly why... library(MASS) # parameters: n<-100 # generate samples: set.seed(138813) #seed <- .Random.seed x<-rnorm(n); y<-rnorm(n) # estimate non-parameteric density surface via kernel smoothing den<-kde2d(x, y, n=n) # store z values of density estimate in an extra variable den.z <-den$z # this is the critical block, which I still do not comprehend in detail z <- array() for (i in 1:n){ z.x <- max(which(den$x < x[i])) z.y <- max(which(den$y < y[i])) z[i] <- den$z[z.x, z.y] } # store class/level borders of confidence interval in variables confidence.border <- quantile(z, probs=1-0.6827, na.rm = TRUE) # +-1sd plot(x,y) par(new=TRUE) contour(den, levels=confidence.border, col = "red", add = TRUE) The results look at least plausible. Do not know if they are really true. Still hoping for some replies ;), Felix Am 03.03.12 15:54, schrieb drflxms:> Dear all, > > I created a bivariate normal distribution: > > set.seed(138813) > n<-100 > x<-rnorm(n); y<-rnorm(n) > > and plotted a scatterplot of it: > > plot(x,y) > > Now I'd like to add the 2D-standard deviation. > > I found a thread regarding plotting arbitrary confidence boundaries from > Pascal H?nggi > http://www.mail-archive.com/r-help at r-project.org/msg24013.html > which cites the even older thread > http://tolstoy.newcastle.edu.au/R/help/03b/5384.html > > As I am unfortunately only a very poor R programmer, the code of Pascal > H?nggi is a myth to me and I am not sure whether I was able to translate > the recommendation of Brain Ripley in the later thread (which provides > no code) into the the correct R code. Brain wrote: > > You need a 2D density estimate (e.g. kde2d in MASS) then compute the > density values at the points and draw the contour of the density which > includes 95% of the points (at a level computed from the sorted values > via quantile()). [95% confidence interval was desired in thread instead > of standard deviation...] > > So I tried this... > > den<-kde2d(x, y, n=n) #as I chose n to be the same as during creating > the distributions x and y (see above), a z-value is assigned to every > combination of x and y. > > # create a sorted vector of z-values (instead of the matrix stored > inside the den object > den.z <-sort(den$z) > > # set desired confidence border to draw and store it in variable > confidence.border <- quantile(den.z, probs=0.6827, na.rm = TRUE) > > # draw a line representing confidence.border on the existing scatterplot > par(new=TRUE) > contour(den, levels=confidence.border, col = "red", add = TRUE) > > Unfortunately I doubt very much this is correct :( In fact I am sure > this is wrong, because the border for probs=0.05 is drawn outside the > values.... So please help and check. > Pascal H?nggis code seems to work, but I don't understand the magic he > does with > > pp <- array() > for (i in 1:1000){ > z.x <- max(which(den$x < x[i])) > z.y <- max(which(den$y < y[i])) > pp[i] <- den$z[z.x, z.y] > } > > before doing the very same as I did above: > > confidencebound <- quantile(pp, 0.05, na.rm = TRUE) > > plot(x, y) > contour(den, levels = confidencebound, col = "red", add = TRUE) > > > My problems: > > 1.) setting probs=0.6827 is somehow a dirty trick which I can only use > by simply knowing that this is the percentage of values inside +-1sd > when a distribution is normal. Is there a way doing this with "native" > sd function? > sd(den.z) is not correct, as den.z is in contrast to x and y not normal > any more. So ecdf(den.z)(sd(den.z)) results in a percentile of 0.5644 in > this example instead of the desired 0.6827. > > 2.) I would like to have code that works with any desired confidence. > Unfortunately setting probs to the desired confidence would probably be > wrong (?!) as it relates to den.z instead of x and y, which are the > underlying distributions I am interested in. To put it short I want the > confidence of x/y and not of den.z. > > > I am really completely stuck. Please help me out of this! Felix >
Greg Snow
2012-Mar-03 16:28 UTC
[R] contour for plotting confidence interval on scatter plot of bivariate normal distribution
Look at the ellipse package (and the ellipse function in the package) for a simple way of showing a confidence region for bivariate data on a plot (a 68% confidence interval is about 1 SD if you just want to show 1 SD). On Sat, Mar 3, 2012 at 7:54 AM, drflxms <drflxms at googlemail.com> wrote:> Dear all, > > I created a bivariate normal distribution: > > set.seed(138813) > n<-100 > x<-rnorm(n); y<-rnorm(n) > > and plotted a scatterplot of it: > > plot(x,y) > > Now I'd like to add the 2D-standard deviation. > > I found a thread regarding plotting arbitrary confidence boundaries from > Pascal H?nggi > http://www.mail-archive.com/r-help at r-project.org/msg24013.html > which cites the even older thread > http://tolstoy.newcastle.edu.au/R/help/03b/5384.html > > As I am unfortunately only a very poor R programmer, the code of Pascal > H?nggi is a myth to me and I am not sure whether I was able to translate > the recommendation of Brain Ripley in the later thread (which provides > no code) into the the correct R code. Brain wrote: > > You need a 2D density estimate (e.g. kde2d in MASS) then compute the > density values at the points and draw the contour of the density which > includes 95% of the points (at a level computed from the sorted values > via quantile()). [95% confidence interval was desired in thread instead > of standard deviation...] > > So I tried this... > > den<-kde2d(x, y, n=n) #as I chose n to be the same as during creating > the distributions x and y (see above), a z-value is assigned to every > combination of x and y. > > # create a sorted vector of z-values (instead of the matrix stored > inside the den object > den.z <-sort(den$z) > > # set desired confidence border to draw and store it in variable > confidence.border <- quantile(den.z, probs=0.6827, na.rm = TRUE) > > # draw a line representing confidence.border on the existing scatterplot > par(new=TRUE) > contour(den, levels=confidence.border, col = "red", add = TRUE) > > Unfortunately I doubt very much this is correct :( In fact I am sure > this is wrong, because the border for probs=0.05 is drawn outside the > values.... So please help and check. > Pascal H?nggis code seems to work, but I don't understand the magic he > does with > > pp <- array() > for (i in 1:1000){ > ? ? ? ?z.x <- max(which(den$x < x[i])) > ? ? ? ?z.y <- max(which(den$y < y[i])) > ? ? ? ?pp[i] <- den$z[z.x, z.y] > } > > before doing the very same as I did above: > > confidencebound <- quantile(pp, 0.05, na.rm = TRUE) > > plot(x, y) > contour(den, levels = confidencebound, col = "red", add = TRUE) > > > My problems: > > 1.) setting probs=0.6827 is somehow a dirty trick which I can only use > by simply knowing that this is the percentage of values inside +-1sd > when a distribution is normal. Is there a way doing this with "native" > sd function? > sd(den.z) is not correct, as den.z is in contrast to x and y not normal > any more. So ecdf(den.z)(sd(den.z)) results in a percentile of 0.5644 in > this example instead of the desired 0.6827. > > 2.) I would like to have code that works with any desired confidence. > Unfortunately setting probs to the desired confidence would probably be > wrong (?!) as it relates to den.z instead of x and y, which are the > underlying distributions I am interested in. To put it short I want the > confidence of x/y and not of den.z. > > > I am really completely stuck. Please help me out of this! Felix > > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Gregory (Greg) L. Snow Ph.D. 538280 at gmail.com