Emmanuel Levy
2010-Nov-20 01:44 UTC
[R] How to do a probability density based filtering in 2D?
Hello, This sounds like a problem to which many solutions should exist, but I did not manage to find one. Basically, given a list of datapoints, I'd like to keep those within the X% percentile highest density. That would be equivalent to retain only points within a given line of a contour plot. Thanks to anybody who could let me know which function I could use! Best, Emmanuel
David Winsemius
2010-Nov-20 02:25 UTC
[R] How to do a probability density based filtering in 2D?
On Nov 19, 2010, at 8:44 PM, Emmanuel Levy wrote:> Hello, > > This sounds like a problem to which many solutions should exist, but I > did not manage to find one. > > Basically, given a list of datapoints, I'd like to keep those within > the X% percentile highest density. > That would be equivalent to retain only points within a given line of > a contour plot. > > Thanks to anybody who could let me know which function I could use!What's wrong with quantile? -- David Winsemius, MD West Hartford, CT
Emmanuel Levy
2010-Nov-20 03:32 UTC
[R] How to do a probability density based filtering in 2D?
Hello Roger, Thanks for the suggestions. I finally managed to do it using the output of kde2d - The code is pasted below. Actually this made me realize that the outcome of kde2d can be quite influenced by outliers if a boundary box is not given (try running the code without the boundary box, e.g.lims, and you will see). library(MASS) x1 = rnorm(10000) x2 = rnorm(10000) nBins=200 z1 = kde2d(x1,x2,n=nBins, lims=c(-4,4,-4,4)) x1.cut = cut(x1, seq(z1$x[1], z1$x[nBins],len=(nBins+1) )) x2.cut = cut(x2, seq(z1$y[1], z1$y[nBins],len=(nBins+1) )) xy.cuts = data.frame(x1.cut,x2.cut, ord=1:(length(x1.cut)) ) density = data.frame( X=rep(factor(levels(x1.cut)),rep(nBins,nBins) ), Y=rep(factor(levels(x2.cut)),nBins) , Z= as.vector(z1$z)) xy.density = merge( xy.cuts, density, by=c(1,2), sort=FALSE, all.x=TRUE) xy.density = xy.density[order(x=xy.density$ord),] xy.density$Z[is.na(xy.density$Z)]=0 Z.lim = quantile(xy.density$Z, prob=c(0.1)) to.plot = xy.density$Z > Z.lim[1] par(mfrow=c(1,2)) plot(x1,x2, xlim=c(-3,3) ,ylim=c(-3,3)) plot(x1[to.plot], x2[to.plot], xlim=c(-3,3),ylim=c(-3,3)) On 19 November 2010 21:52, Roger Koenker <rkoenker at illinois.edu> wrote:> >> Also I wouldn't be surprised if there is a trick with quantile that >> escapes my mind. > > I would be surprised... ?this is all very dependent on how you do the > density estimation, but you might look at contourLines and then try > to use in.convex.hull() from tripack, but beware that things get more > complicated if the density is multi-modal. > > You might also look at one of the packages that do "data depth" > sorts of things. > > Roger Koenker > rkoenker at illinois.edu > > > > > >