Nathan Miller

2013-Jan-28 17:53 UTC

### Adding 95% contours around scatterplot points with ggplot2

Hi all,

I have been looking for a way to add a contour around groups of points in a scatterplot to represent where the data are most densely concentrated. I'm imagining something like a 95% confidence region drawn around the data.

So far I have found some code for drawing polygons (convex hulls) around the data. These look nice, but in some cases the polygons are strongly influenced by outlying points. Does anyone have a thought on how to draw a contour that is more like a 95% confidence region?

I have provided a working example below to illustrate the drawing of the polygons. As I said, I would rather have three "ovals"/95% contours drawn around the points by "level", to capture the different density distributions without the visualization being heavily influenced by outliers.

I have looked into the code Hadley provided here https://groups.google.com/forum/?fromgroups=#!topic/ggplot2/85q4SQ9q3V8 using the mvtnorm package and the dmvnorm function, but haven't been able to get it to work for my data example. The calculated densities are always zero (at this step of Hadley's code: `dgrid$dens <- dmvnorm(as.matrix(dgrid), ex_mu, ex_sigma)`).

I appreciate any assistance.

Thanks,
Nate

```r
library(ggplot2)
library(plyr)   # for ddply()

set.seed(1)     # make the rnorm() draws reproducible
x <- c(seq(0.15, 0.4, length.out = 30), seq(0.2, 0.6, length.out = 30),
       seq(0.4, 0.6, length.out = 30))
y <- c(0.55, x[1:29] + 0.2 * rnorm(29, 0.4, 0.3),
       x[31:60] * rnorm(30, 0.3, 0.1), x[61:90] * rnorm(30, 0.4, 0.25))
data <- data.frame(level = c(rep(1, 30), rep(2, 30), rep(3, 30)), x = x, y = y)

# Convex hull of each group: chull() returns the row indices of the hull vertices
find_hull <- function(data) data[chull(data$x, data$y), ]
hulls <- ddply(data, .(level), find_hull)

fig1 <- ggplot(data = data, aes(x, y, colour = factor(level), fill = factor(level))) +
  geom_point()
fig1 <- fig1 + geom_polygon(data = hulls, alpha = .2)
fig1
```
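One common way to get the "ovals" described above is a normal-theory data ellipse: if a group is roughly bivariate normal with mean mu and covariance S, about 95% of its points fall inside the region (x - mu)' S^-1 (x - mu) <= qchisq(0.95, df = 2), which is about 5.99. The sketch below computes that boundary from each group's sample mean and covariance using base R only; the helper `ellipse_points()` is hypothetical (not part of ggplot2), and the plot is drawn only if ggplot2 is installed. Note that the sample covariance is itself pulled around by outliers, so this reduces their influence on the outline compared with a convex hull but does not remove it.

```r
# Hypothetical helper: points on the boundary of the normal-theory ellipse
# expected to contain `level` of the probability mass, given centre mu and covariance S.
ellipse_points <- function(mu, S, level = 0.95, n = 100) {
  r <- sqrt(qchisq(level, df = 2))          # Mahalanobis radius in 2 dimensions
  theta <- seq(0, 2 * pi, length.out = n)
  circle <- rbind(cos(theta), sin(theta))   # unit circle as a 2 x n matrix
  L <- t(chol(S))                           # lower-triangular factor: L %*% t(L) equals S
  pts <- t(mu + r * (L %*% circle))         # map circle onto ellipse; mu recycles down each column
  data.frame(x = pts[, 1], y = pts[, 2])
}

set.seed(1)  # the original example draws from rnorm() without a seed
x <- c(seq(0.15, 0.4, length.out = 30), seq(0.2, 0.6, length.out = 30),
       seq(0.4, 0.6, length.out = 30))
y <- c(0.55, x[1:29] + 0.2 * rnorm(29, 0.4, 0.3),
       x[31:60] * rnorm(30, 0.3, 0.1), x[61:90] * rnorm(30, 0.4, 0.25))
data <- data.frame(level = rep(1:3, each = 30), x = x, y = y)

# One ellipse per level, from that level's sample mean and covariance
ellipses <- do.call(rbind, lapply(split(data, data$level), function(d) {
  xy <- cbind(d$x, d$y)
  e <- ellipse_points(colMeans(xy), cov(xy))
  e$level <- d$level[1]
  e
}))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data, aes(x, y, colour = factor(level))) +
    geom_point() +
    geom_path(data = ellipses)              # closed outline: theta runs from 0 to 2*pi
  print(p)
}
```

To make the outline more resistant to outliers, one option is to replace `colMeans()`/`cov()` with the `center` and `cov` components returned by `MASS::cov.rob()`. Newer ggplot2 releases also provide `stat_ellipse()`, whose `level` argument defaults to 0.95 and whose `type = "norm"` option draws this normal-theory version directly.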

Ista Zahn

2013-Jan-28 18:50 UTC

### [R] Adding 95% contours around scatterplot points with ggplot2

Hi Nathan,

This only fits some of your criteria, but have you looked at ?stat_density2d?

Best,
Ista

Nathan Miller

2013-Jan-28 19:43 UTC

### Re: Adding 95% contours around scatterplot points with ggplot2

Thanks Ista,

I have played a bit with stat_density2d as well. It doesn't completely capture what I am looking for, and it ends up being quite busy at the same time. I'm looking for a way to help people looking at the figure see the broad patterns of where in the x/y space the data from the different groups are distributed. The point of the 95% CI-type idea is that I don't end up arbitrarily drawing circles around each set of points. I appreciate your direction, though.

Nate
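It may help to separate two things a "95% region" can mean. Assuming each group is bivariate normal, with n points, sample mean x-bar and sample covariance S, the region expected to hold about 95% of the individual observations (the first line below, an approximation that ignores uncertainty in x-bar and S) is much larger than the 95% confidence region for the group mean, which follows from Hotelling's T^2 (second line). The first describes where the data fall; the second shrinks roughly like 1/n, so for the 30 points per group in the example it is several times smaller.

$$
\text{data ellipse:}\quad (x-\bar{x})^\top S^{-1}(x-\bar{x}) \le \chi^2_{2,\,0.95}
$$

$$
\text{confidence ellipse for } \mu:\quad (\mu-\bar{x})^\top S^{-1}(\mu-\bar{x}) \le \frac{2(n-1)}{n(n-2)}\,F_{2,\,n-2;\,0.95}
$$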

Ista Zahn

2013-Jan-28 20:37 UTC

### [R] Adding 95% contours around scatterplot points with ggplot2

Hi Nate,

You can make it less busy using the bins argument. This is not documented, except in the examples to stat_contour, but try

```r
ggplot(data = data, aes(x, y, colour = factor(level), fill = level)) +
  geom_point() +
  stat_density2d(bins = 2)
```

HTH,
Ista

Nathan Miller

2013-Jan-28 20:56 UTC

### Re: Adding 95% contours around scatterplot points with ggplot2

Hi Ista,

Thanks. That does look pretty nice, and I hadn't realized that was possible. Do you know how to extract information about those curves? I'd like to be able to report in a figure legend what portion of the data they encompass, or really any other feature of them. I'll look into stat_density2d and see if I can determine how they are set.

Thanks for your help,
Nate
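One way to see exactly which curves stat_density2d drew is to build the plot and inspect the computed layer data: `ggplot_build()` returns the data each layer actually plots, and for the density layer that includes a `level` column holding the density height of each contour. The sketch below assumes ggplot2 is installed and regenerates the example data with a fixed seed; the assumption that the density layer is the second element of `built$data` follows from the order in which the layers are added. Keep in mind that these levels are density heights, not the fraction of points enclosed.

```r
library(ggplot2)

set.seed(1)  # reproducible version of the example data
x <- c(seq(0.15, 0.4, length.out = 30), seq(0.2, 0.6, length.out = 30),
       seq(0.4, 0.6, length.out = 30))
y <- c(0.55, x[1:29] + 0.2 * rnorm(29, 0.4, 0.3),
       x[31:60] * rnorm(30, 0.3, 0.1), x[61:90] * rnorm(30, 0.4, 0.25))
data <- data.frame(level = rep(1:3, each = 30), x = x, y = y)

p <- ggplot(data, aes(x, y, colour = factor(level))) +
  geom_point() +                 # layer 1
  stat_density2d(bins = 2)       # layer 2: density contours

# ggplot_build() runs the stats; each element of $data is one layer's computed data
built <- ggplot_build(p)
contours <- built$data[[2]]

# Density height of each drawn contour, per group colour
print(unique(contours[, c("colour", "level")]))
```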

Ista Zahn

2013-Jan-28 20:59 UTC

### [R] Adding 95% contours around scatterplot points with ggplot2

Hi Nate,

I infer from the stat_density2d documentation that the calculation is carried out by the kde2d function in the MASS package. Refer to ?kde2d for details.

Best,
Ista
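Building on the kde2d() pointer, one common way to turn a density estimate into a "95% contour" whose coverage can be reported is to pick, for each group, the density height that about 95% of that group's own points exceed, and draw the contour at that height (a sample-based highest-density region). The sketch below does this with `MASS::kde2d()` and `grDevices::contourLines()`. The helper `hdr_contour()` is hypothetical; it widens the grid so the contour is not clipped, approximates the density at each point by the lower-left corner of its grid cell rather than interpolating, and uses kde2d()'s default bandwidth, so the enclosed fraction is approximate.

```r
library(MASS)   # kde2d()

# Hypothetical helper: contour of a 2-D kernel density estimate drawn at the
# height exceeded by roughly `prob` of the group's own points.
hdr_contour <- function(d, prob = 0.95, n = 100) {
  pad <- function(v) range(v) + c(-0.5, 0.5) * diff(range(v))   # widen grid beyond the data
  k <- kde2d(d$x, d$y, n = n, lims = c(pad(d$x), pad(d$y)))     # default bandwidth
  # Approximate density at each observed point via its grid cell's lower-left corner
  ix <- findInterval(d$x, k$x, all.inside = TRUE)
  iy <- findInterval(d$y, k$y, all.inside = TRUE)
  dens <- k$z[cbind(ix, iy)]
  thresh <- quantile(dens, 1 - prob, names = FALSE)             # contour height
  iso <- contourLines(k$x, k$y, k$z, levels = thresh)
  do.call(rbind, lapply(seq_along(iso), function(i) {
    data.frame(x = iso[[i]]$x, y = iso[[i]]$y, piece = i,
               density_level = thresh, frac_inside = mean(dens >= thresh))
  }))
}

set.seed(1)  # reproducible version of the example data
x <- c(seq(0.15, 0.4, length.out = 30), seq(0.2, 0.6, length.out = 30),
       seq(0.4, 0.6, length.out = 30))
y <- c(0.55, x[1:29] + 0.2 * rnorm(29, 0.4, 0.3),
       x[31:60] * rnorm(30, 0.3, 0.1), x[61:90] * rnorm(30, 0.4, 0.25))
data <- data.frame(level = rep(1:3, each = 30), x = x, y = y)

contours <- do.call(rbind, lapply(split(data, data$level), function(d) {
  out <- hdr_contour(d)
  out$level <- d$level[1]
  out
}))

# Fraction of each group's points on or inside its contour (grid approximation)
print(unique(contours[, c("level", "frac_inside")]))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data, aes(x, y, colour = factor(level))) +
    geom_point() +
    geom_path(data = contours, aes(group = interaction(level, piece)))
  print(p)
}
```

With 30 points per group the 5% quantile falls between the 2nd and 3rd lowest densities, so each contour encloses 28 of 30 points (about 93%); that reported fraction, rather than the nominal 95%, is what would belong in a figure legend.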