greatest.possible.newbie
2012-May-11 12:28 UTC
[R] identify() doesn't return "true" numbers
Dear R community,

I am using the identify() function to identify outliers in my dataset. This is the code I am using:

####################################################################
# Function to allow identifying points in the QQ plot (by mouse-clicking)
qqInteractive <- function(..., IDENTIFY = TRUE)
{
  X <- qqplot(...)
  abline(a = 0, b = 1)
  if (IDENTIFY) return(identify(X))
  invisible(X)
}

qqplot.mv.interactive <- function(data, xlim = NULL, ylim = NULL)
{
  x <- as.matrix(data)              # n x p numeric matrix
  center <- colMeans(x)             # centroid
  n <- nrow(x); p <- ncol(x); cov <- cov(x)
  d <- mahalanobis(x, center, cov)  # squared Mahalanobis distances
  # ppoints(n) generates n probability points between 0 and 1;
  # qchisq() returns the chi-squared quantiles for those probabilities
  # with df degrees of freedom
  qqInteractive(qchisq(ppoints(n), df = p), d,
                main = "QQ Plot Assessing Multivariate Normality",
                ylab = "Mahalanobis D2", xlim = xlim, ylim = ylim)
  #abline(a=0,b=1)
}

y <- c((1:100) + rnorm(100, sd = 100))
x <- c(1:100)
windows(); qqInteractive(x, y)
####################################################################

When I click the points in the graph, identify() only returns the position of each point in the order in which the points lie along the X-axis. Let's say I mark the point in the upper right corner: identify() will return 100. But what I want is the index in the original dataset y. Let's say the point was at y[87]. Otherwise I won't be able to remove this point from my original dataset.

I hope you understand my problem. I appreciate any help.

Regards,
Daniel Hoop
Daniel,

There are a few ways to deal with this.

You could sort your data by y before you apply these functions. Then the point labelled 100 will be the 100th row in the sorted data frame.

  df <- data.frame(x = 1:100, y = (1:100) + rnorm(100, sd = 100))
  df2 <- df[order(df$y), ]
  windows()
  qqInteractive(df2$x, df2$y)

You could modify the code to keep track of the original row numbers. (No example given.)

You could use your code pretty much as you have it, then convert the numbers you see on the plot back to the original row in the data frame. This would be made a bit easier if you let your function keep the identified points:

  qqInteractive.v2 <- function(..., IDENTIFY = TRUE) {
    X <- qqplot(...)
    abline(a = 0, b = 1)
    if (IDENTIFY) identify(X)
  }

  y <- 1:100 + rnorm(100, sd = 100)
  x <- 1:100
  id.pts <- qqInteractive.v2(x, y)
  # original positions in y of the points whose rank was clicked
  seq(y)[is.element(rank(y), id.pts)]

Jean
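For the option of keeping track of the original row numbers, one possible sketch follows. It is not code from this thread: the names sorted_to_original and qqInteractive.orig are hypothetical, and it assumes x and y have the same length and contain no missing values, so that the i-th point drawn by qqplot() is the i-th smallest y. Under that assumption, order(y) gives the original position of each plotted point, and it can be passed to identify() as labels so the plot shows original indices directly.

```r
# Map positions in the sorted data (what identify() returns for a qqplot)
# back to positions in the original vector y.
# Assumption: equal-length x and y, no NAs (hypothetical helper, not from the thread).
sorted_to_original <- function(y, sorted_idx) {
  order(y)[sorted_idx]
}

# Hypothetical variant of qqInteractive that labels the clicked points with
# their original indices and returns those indices.
qqInteractive.orig <- function(x, y, ...) {
  stopifnot(length(x) == length(y))
  X <- qqplot(x, y, ...)
  abline(a = 0, b = 1)
  ord <- order(y)                      # original index of each sorted point
  picked <- identify(X, labels = ord)  # positions in the sorted data
  ord[picked]                          # converted to original indices
}

set.seed(1)
y <- 1:100 + rnorm(100, sd = 100)
x <- 1:100

# Non-interactive check: the upper-right point (sorted position 100)
# is the largest value of y.
sorted_to_original(y, 100) == which.max(y)

if (interactive()) {
  outliers <- qqInteractive.orig(x, y)
  y.clean <- if (length(outliers)) y[-outliers] else y
}
```

Unlike the rank()-based conversion, order() always returns integer positions, so this mapping should also behave sensibly when y contains ties.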