Dear Community,
I want to identify outliers in my data, but I don't know how to use the
identify() command on the plots I obtain.
I've gone through the help files and adapted the mahalanobis() example for
my purpose:
NormalMultivarianteComparefunc <- function(x) {
    Sx <- cov(x)
    D2 <- mahalanobis(x, colMeans(x), Sx)
    plot(density(D2, bw = .5),
         main = "Squared Mahalanobis distances, n=nrow(x), p=ncol(x)")
    rug(D2)
    qqplot(qchisq(ppoints(nrow(x)), df = ncol(x)), D2,
           main = expression("Q-Q plot of Mahalanobis" * ~D^2 *
                             " vs. quantiles of" * ~ chi[ncol(x)]^2))
    abline(0, 1, col = 'gray')
}
Then I run NormalMultivarianteComparefunc(y), where y is a data frame with
the data. For a reproducible example, let's say:
y <- replicate(5, rnorm(100))
NormalMultivarianteComparefunc(y)
## What should I write now to identify points on the plot?
## I tried:
identify(y)
## warning: no point within 0.25 inches
I know I can use aq.plot, but I would be very grateful if you could help me
with identify().
By the way, in the function, how can I make the titles show the values of
the variables instead of the literal text "ncol(x)" or "nrow(x)"?
Thanks in advance, user at host.com
--
View this message in context:
http://r.789695.n4.nabble.com/outlier-identify-in-qqplot-tp4076587p4076587.html
Sent from the R help mailing list archive at Nabble.com.
Try this
qqInteractive <- function(..., IDENTIFY = TRUE) {
    qqplot(...) -> X
    if (IDENTIFY) return(identify(X))
    invisible(X)
}
The trick is that identify() wants the coordinates of the points as drawn
in the scatter plot. Those are not the inputs to qqplot() but a
transformation of them (qqplot() sorts both samples), and qqplot() returns
the plotted coordinates invisibly.
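A minimal sketch of that transformation, assuming the simulated y from the
original post (the seed and the name row_of_point are illustrative, not from
the thread). Because qqplot() sorts D2 before plotting, the i-th plotted
point is row order(D2)[i] of y, so passing order(D2) as labels makes
identify() print original row numbers:

```r
set.seed(1)                                # arbitrary seed, for reproducibility only
y  <- replicate(5, rnorm(100))             # simulated data, as in the original post
D2 <- mahalanobis(y, colMeans(y), cov(y))  # squared distances, one per row of y

# qqplot() sorts both arguments before plotting and returns them invisibly
qq <- qqplot(qchisq(ppoints(nrow(y)), df = ncol(y)), D2)
abline(0, 1, col = "gray")

# row_of_point[i] is the row of y that became the i-th plotted point
row_of_point <- order(D2)

if (interactive()) {
  # identify() returns indices into the plotted (sorted) points
  picked <- identify(qq$x, qq$y, labels = row_of_point)
  row_of_point[picked]                     # original row numbers of clicked points
}
```

Points are selected with a left click; selection typically ends with a
right click or Esc, depending on the graphics device.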
Michael
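On the side question about the titles: the quoted string is never
evaluated, and expression() keeps ncol(x) as a literal symbol. One possible
approach, sketched here with illustrative names (n, p, title_density and
title_qq are not from the thread), is sprintf() for the plain title and
bquote() with .() for the plotmath title:

```r
x  <- replicate(5, rnorm(100))   # simulated data, as in the original post
n  <- nrow(x)
p  <- ncol(x)
D2 <- mahalanobis(x, colMeans(x), cov(x))

# Plain-text title: sprintf() substitutes the numbers into the string
title_density <- sprintf("Squared Mahalanobis distances, n=%d, p=%d", n, p)

# Plotmath title: bquote() evaluates whatever is wrapped in .()
title_qq <- bquote("Q-Q plot of Mahalanobis" ~ D^2 ~
                   "vs. quantiles of" ~ chi[.(p)]^2)

plot(density(D2, bw = 0.5), main = title_density)
rug(D2)
qqplot(qchisq(ppoints(n), df = p), D2, main = title_qq)
abline(0, 1, col = "gray")
```

Inside the original function the same two titles could be built from
nrow(x) and ncol(x) before the plotting calls.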
On Wed, Nov 16, 2011 at 9:52 AM, agent dunham <crosspide at hotmail.com> wrote:
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
Do you want qqnorm() instead of qqplot()? [Reproducibility also involves
posting the code you used that led to the error / warning.]
This seems to work for me:
# Load the posted data first; source() returns list(value, visible),
# so [[1]] keeps the value and drops the "visible" element:
# mydata <- source("http://r.789695.n4.nabble.com/file/n4623493/mydata.txt")[[1]]
lmmodel <- lm(log(vdep) ~ v1 + sqrt(v2) + v3 + v5 + v6 + v7 + v8 + v9 + v10,
              data = mydata)
qqnorm(residuals(lmmodel))
# Or, if interactive:
qqnormInt <- function(..., IDENTIFY = TRUE) {
    qqnorm(...) -> X
    if (IDENTIFY) return(identify(X))
    invisible(X)
}
qqnormInt(residuals(lmmodel))
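One difference from the qqplot() case may be worth spelling out: qqnorm()
returns its coordinates in the original order of the data, so the indices
that identify() returns are observation numbers directly. A small sketch
with a simulated stand-in for mydata (the data frame d, its coefficients and
the seed are made up for illustration; only the log(vdep) ~ v1 shape echoes
the thread):

```r
set.seed(42)                                  # arbitrary seed
d <- data.frame(v1 = rnorm(50))               # hypothetical predictor
d$vdep <- exp(1 + 0.5 * d$v1 + rnorm(50, sd = 0.2))
fit <- lm(log(vdep) ~ v1, data = d)
res <- residuals(fit)

qq <- qqnorm(res)                             # list(x, y), returned invisibly
qqline(res)

# qq$y is res in its original order; qq$x holds the matching normal quantiles
if (interactive()) {
  picked <- identify(qq, labels = names(res)) # indices are row numbers of d
  d[picked, ]                                 # inspect the clicked observations
}
```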
Michael
On Thu, May 10, 2012 at 9:20 AM, agent dunham <crosspide at hotmail.com> wrote:
> Find the data attached,
>
> http://r.789695.n4.nabble.com/file/n4623493/mydata.txt
>
> The model would be:
> lmmodel <- lm(log(vdep) ~ v1 + sqrt(v2) + v3 + v5 + v6 + v7 + v8 + v9 + v10,
>               data = mydata)
>
> Thanks again,
>
>
> user at host.com
>