Ana Marija
2020-Oct-08 20:52 UTC
[R] 2 D density plot interpretation and manipulating the data
Hello,

I have a data frame like this:

> head(SNP)
               mean      var     sd
FQC.10090295 0.0327 0.002678 0.0517
FQC.10119363 0.0220 0.000978 0.0313
FQC.10132112 0.0275 0.002088 0.0457
FQC.10201128 0.0169 0.000289 0.0170
FQC.10208432 0.0443 0.004081 0.0639
FQC.10218466 0.0116 0.000131 0.0115
...

and I am creating a plot like this:

s <- ggplot(SNP, mapping = aes(x = mean, y = var))
s <- s + geom_density_2d() + geom_point() + my.theme + ggtitle("SNPs")
s

I am getting the attached plot.

My questions are:

1. How do I interpret inclusion versus exclusion within the ellipses/contours?

2. How do I extract from my data frame the points which are outside of the ellipses?

Thanks
Ana

Attachment: snps.pdf <https://stat.ethz.ch/pipermail/r-help/attachments/20201008/d35a5c66/attachment.pdf>
Ana Marija
2020-Oct-09 01:35 UTC
[R] 2 D density plot interpretation and manipulating the data
My understanding is that this represents a bivariate normal approximation of the data, which uses the kernel density function to test for inclusion within a level set. (Please correct me.)

In order to exclude the outliers to these ellipses/contours, is it advisable to do something like this:

SNP$density <- get_density(SNP$mean, SNP$var)
> summary(SNP$density)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      0     383     696     738    1170    1789

where get_density() is the function from here:
https://slowkow.com/notes/ggplot2-color-by-density/

and then do something like this:

a <- SNP[SNP$density > 400, ]

and plot it again:

p <- ggplot(a, mapping = aes(x = mean, y = var))
p <- p + geom_density_2d() + geom_point() + my.theme + ggtitle("SNPS_red")
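The density-based filtering described above can be sketched roughly as follows. This is a minimal illustration, not the exact code from the linked page: get_density() here is a helper written along the same lines (a grid-based kernel density estimate from MASS::kde2d, looked up at each point), SNP_demo is simulated stand-in data rather than the real SNP table, and the 10% quantile cutoff is an assumption chosen for illustration in place of the fixed value 400.

```r
# Sketch only: assumes the MASS package is installed.
library(MASS)

# Hypothetical helper: estimated 2D kernel density at each (x, y) point.
get_density <- function(x, y, n = 100) {
  dens <- MASS::kde2d(x, y, n = n)   # KDE evaluated on an n x n grid
  ix <- findInterval(x, dens$x)      # grid index along x for each point
  iy <- findInterval(y, dens$y)      # grid index along y for each point
  dens$z[cbind(ix, iy)]              # density at the matching grid cell
}

# Simulated stand-in for the SNP data frame (columns: mean, var).
set.seed(1)
SNP_demo <- data.frame(mean = rgamma(500, shape = 2, rate = 80))
SNP_demo$var <- SNP_demo$mean^2 * rlnorm(500, sdlog = 0.3)

SNP_demo$density <- get_density(SNP_demo$mean, SNP_demo$var)

# Density values depend on the scale of the data, so a quantile-based
# cutoff (assumed here: lowest 10%) is easier to reuse than a fixed number.
cutoff  <- quantile(SNP_demo$density, 0.10)
inside  <- SNP_demo[SNP_demo$density >= cutoff, ]
outside <- SNP_demo[SNP_demo$density <  cutoff, ]
```

Here `outside` holds the points in the lowest-density regions, which roughly correspond to points lying beyond the outermost contours. The match is not exact, because geom_density_2d() chooses its own contour levels.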
Abby Spurdle
2020-Oct-09 07:12 UTC
[R] 2 D density plot interpretation and manipulating the data
> My understanding is that this represents bivariate normal
> approximation of the data which uses the kernel density function to
> test for inclusion within a level set. (please correct me)

You can fit a bivariate normal distribution by computing five parameters. Two means, two standard deviations (or two variances) and one correlation (or covariance) coefficient.
The bivariate normal *has* elliptical contours.

A kernel density estimate is usually regarded as an estimate of an unknown density function.
Often they use a normal (or Gaussian) kernel, but I wouldn't describe them as normal approximations.
In general, bivariate kernel density estimates do *not* have elliptical contours.
But in saying that, if the data is close to normality, then contours will be close to elliptical.

Kernel density estimates do not test for inclusion, as such.
(But technically, there are some exceptions to that).

I'm not sure what you're trying to achieve here.
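As a rough illustration of the bivariate normal fit described above, the five parameters can be computed with base R and used to flag points outside an elliptical contour. This is a sketch under stated assumptions: the data are simulated, the names xy, mu, S and d2 are made up for the example, and the 95% level is an arbitrary choice. It only makes sense if the data are reasonably close to bivariate normal.

```r
# Simulated, roughly bivariate normal data (stand-in, not the SNP data).
set.seed(1)
n <- 500
x <- rnorm(n)
y <- 0.6 * x + rnorm(n, sd = 0.8)
xy <- cbind(x = x, y = y)

# The five parameters: two means, two variances and one covariance.
mu <- colMeans(xy)
S  <- cov(xy)

# Squared Mahalanobis distance of each point from the fitted centre.
# Points with equal d2 lie on the same ellipse of the fitted density.
d2 <- mahalanobis(xy, center = mu, cov = S)

# Under bivariate normality d2 is approximately chi-squared with 2 df,
# so this cutoff gives an ellipse containing about 95% of the mass.
cutoff  <- qchisq(0.95, df = 2)
outside <- xy[d2 > cutoff, , drop = FALSE]
inside  <- xy[d2 <= cutoff, , drop = FALSE]
```

Unlike contours from a kernel density estimate, this ellipse is fully determined by mu and S, so inclusion has a simple closed-form test.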