Bert Gunter
2020-Oct-09 16:31 UTC
[R] 2 D density plot interpretation and manipulating the data
I recommend that you consult with a local statistical expert. Much of what you say (outliers?!?) seems to make little sense, and your statistical knowledge seems minimal. Perhaps more to the point, none of your questions can be properly answered without subject matter context, which this list is not designed to provide. That's why I believe you need local expertise.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Fri, Oct 9, 2020 at 8:25 AM Ana Marija <sokovic.anamarija at gmail.com> wrote:
> Hi Abby,
>
> thank you for getting back to me and for this useful information.
>
> I'm trying to detect the outliers in my distribution based on mean and
> variance. Can I see that from the plot I provided? Would outliers be
> outside of ellipses? If so how do I extract those from my data frame,
> based on which parameter?
>
> So I am trying to connect outliers based on what the plot is showing:
> s <- ggplot(SNP, mapping = aes(x = mean, y = var))
> s <- s + geom_density_2d() + geom_point() + my.theme + ggtitle("SNPs")
>
> versus what is in the data:
>
> > head(SNP)
>                mean      var     sd
> FQC.10090295 0.0327 0.002678 0.0517
> FQC.10119363 0.0220 0.000978 0.0313
> FQC.10132112 0.0275 0.002088 0.0457
> FQC.10201128 0.0169 0.000289 0.0170
> FQC.10208432 0.0443 0.004081 0.0639
> FQC.10218466 0.0116 0.000131 0.0115
> ...
>
> the distribution is not normal, it is right-skewed.
>
> Cheers,
> Ana
>
> On Fri, Oct 9, 2020 at 2:13 AM Abby Spurdle <spurdle.a at gmail.com> wrote:
> >
> > > My understanding is that this represents bivariate normal
> > > approximation of the data which uses the kernel density function to
> > > test for inclusion within a level set. (please correct me)
> >
> > You can fit a bivariate normal distribution by computing five parameters.
> > Two means, two standard deviations (or two variances) and one
> > correlation (or covariance) coefficient.
> > The bivariate normal *has* elliptical contours.
> >
> > A kernel density estimate is usually regarded as an estimate of an
> > unknown density function.
> > Often they use a normal (or Gaussian) kernel, but I wouldn't describe
> > them as normal approximations.
> > In general, bivariate kernel density estimates do *not* have
> > elliptical contours.
> > But in saying that, if the data is close to normality, then contours
> > will be close to elliptical.
> >
> > Kernel density estimates do not test for inclusion, as such.
> > (But technically, there are some exceptions to that).
> >
> > I'm not sure what you're trying to achieve here.
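
The five-parameter bivariate normal fit that Abby describes in the quoted text can be sketched in base R. This is only an illustration on simulated data: the names `xy`, `mu`, `sigma`, and `inside` are made up here, and the 95% cutoff is an arbitrary choice. Ana reports that her data are right-skewed, so a normal fit like this may describe them poorly.

```r
# Simulated, roughly normal data (hypothetical; not the poster's SNP data)
set.seed(42)
x <- rnorm(500)
y <- 0.6 * x + rnorm(500, sd = 0.8)
xy <- cbind(x, y)

# The five parameters: two means, two variances, one covariance
mu <- colMeans(xy)
sigma <- cov(xy)
rho <- sigma[1, 2] / sqrt(sigma[1, 1] * sigma[2, 2])  # same information as a correlation

# Squared Mahalanobis distance of each point from the fitted centre.
# Points with equal distance lie on the same ellipse of the fitted normal.
d2 <- mahalanobis(xy, center = mu, cov = sigma)

# Under bivariate normality, d2 is approximately chi-squared with 2 df,
# so this flags points inside the ellipse expected to hold about 95% of the data.
inside <- d2 <= qchisq(0.95, df = 2)
mean(inside)
```

The ellipses here come from the fitted normal distribution, which is a different object from the kernel density contours drawn by `geom_density_2d()`.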
Ana Marija
2020-Oct-09 16:47 UTC
[R] 2 D density plot interpretation and manipulating the data
Hi Bert,

Another confrontational response from you...

You might have noticed that I use the word "outlier" carefully in this post and only in relation to the plotted ellipses. I do not know the underlying algorithm of geom_density_2d() and therefore I am having an issue of how to interpret the plot. I was hoping someone here knows that and can help me.

Ana
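
On the question of what `geom_density_2d()` computes: the ggplot2 documentation states that it performs a 2D kernel density estimate with `MASS::kde2d()` and draws contours of the result. A minimal sketch of those two steps outside ggplot2 follows. The data are simulated stand-ins, the grid size and number of levels are arbitrary choices, and the way ggplot2 itself picks contour levels is not reproduced here.

```r
library(MASS)  # provides kde2d(); MASS is a recommended package bundled with R

# Simulated right-skewed data (hypothetical; not the poster's SNP data)
set.seed(1)
x <- rexp(300, rate = 40)
y <- x^2 * exp(rnorm(300, sd = 0.3))

# Gaussian kernel density estimate evaluated on a 100 x 100 grid
fit <- kde2d(x, y, n = 100)

# Contour lines of the estimated density surface.
# Each line joins grid locations with the same estimated density height.
lines_list <- contourLines(fit$x, fit$y, fit$z, nlevels = 8)
levels_drawn <- sort(unique(vapply(lines_list, function(l) l$level, numeric(1))))
levels_drawn
```

Read this way, each contour marks a constant estimated density, so a point outside the outermost line sits in a low-density region. Whether that makes it an outlier is a separate judgment that the plot does not settle.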
Abby Spurdle
2020-Oct-09 23:08 UTC
[R] 2 D density plot interpretation and manipulating the data
You could assign a density value to each point.
Maybe you've done that already...?

Then trim the lowest n (number of) data points,
or trim the lowest p (proportion of) data points.

e.g.
Remove the data points with the 20 lowest density values.
Or remove the data points with the lowest 5% of density values.

I'll let you decide whether that is a good idea or a bad idea.
And if it's a good idea, then how much to trim.
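
The trimming idea above can be sketched in base R as follows. This is one possible reading, not a recommendation: the data frame `snp` is simulated, `point_density()` is a hypothetical helper written for this sketch, and the product Gaussian kernel with `bw.nrd()` bandwidths is an assumed choice of density estimate.

```r
# Simulated stand-in for the SNP data frame (hypothetical values)
set.seed(1)
snp <- data.frame(mean = rexp(200, rate = 40))
snp$var <- snp$mean^2 * exp(rnorm(200, sd = 0.3))

# Product Gaussian kernel density estimate, evaluated at each observed point
point_density <- function(x, y, hx = bw.nrd(x), hy = bw.nrd(y)) {
  kx <- dnorm(outer(x, x, "-") / hx) / hx  # n x n kernel values in x
  ky <- dnorm(outer(y, y, "-") / hy) / hy  # n x n kernel values in y
  rowMeans(kx * ky)                        # average over all data points
}

snp$dens <- point_density(snp$mean, snp$var)

# Trim by count: drop the 20 points with the lowest density values
trimmed_n <- snp[-order(snp$dens)[1:20], ]

# Trim by proportion: drop the points in the lowest 5% of density values
cutoff <- quantile(snp$dens, probs = 0.05)
trimmed_p <- snp[snp$dens > cutoff, ]
```

The removed rows are `snp[order(snp$dens)[1:20], ]`, which can be inspected before deciding whether trimming is justified and how much to trim.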