I'm not certain what you are asking. PLEASE do read the posting
guide! "http://www.R-project.org/posting-guide.html". If you formulate
your question in terms of a simple example showing where you got stuck,
as suggested in the posting guide, it might help others understand your
question and inspire suggestions.
TANSTAAFL = There ain't no such thing as a free lunch (Heinlein, The
Moon Is a Harsh Mistress)
spencer graves
Weiwei Shi wrote:
> Dear listers:
> I have an idea for outlier detection, and I need to implement it in R
> first. I hope I can get some input from all the gurus here.
>
> I chose a distance-based approach:
> step 1:
> Calculate the distance between every pair of rows of a data frame. To
> account for the different scales of the variables, I chose Mahalanobis
> distance, using the variance as the scaling factor.
>
> step 2:
> Let k be the number of points in one "cluster". k is chosen by
> answering the following question: how many neighbors does a point need
> in order not to be an outlier?
>
> For each point, get the smallest (k-1) distances from step 1. Among
> those (k-1) distances, take the maximum for that point.
>
> step 3:
> Get the distribution of those maxima over all the points. The
> multivariate problem thus becomes a univariate one, and the outliers
> among those maxima identify the outlying points.
>
> My questions are:
> 1. I don't know whether using Mahalanobis distance is appropriate, since
> most clustering algorithms implemented in R (like pam or clara) use
> Euclidean or Manhattan distance.
> 2. Is there a way to get the Mahalanobis distance matrix for every pair
> of rows of a data frame or matrix?
> 3. My approach does allow a point to belong to more than one
> k-cluster. Is there a similar algorithm in R, or a published one?
>
> Thanks for any suggestions,
>
> weiwei
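
If I read steps 1 to 3 and question 2 correctly, something like the
following might serve as a starting point. It is only a sketch under
stated assumptions, not a tested method. The function names
pairwise_mahalanobis, kth_neighbor_distance and flag_outliers are made
up for this example. mahalanobis() in the stats package returns squared
distances from a single center, so the sketch instead whitens the data
with the Cholesky factor of the sample covariance and calls dist() on
the result. The boxplot rule in flag_outliers is my own placeholder for
"the outlier in those max's"; your post does not say which univariate
rule you intend.

```r
# Step 1 / question 2: Mahalanobis distance between every pair of rows.
pairwise_mahalanobis <- function(x) {
  x <- as.matrix(x)
  S <- cov(x)
  # chol(S) gives an upper-triangular R with t(R) %*% R == S. The rows of
  # x %*% solve(R) have identity sample covariance, so Euclidean distances
  # between them equal Mahalanobis distances between the rows of x.
  z <- x %*% solve(chol(S))
  dist(z)
}

# Step 2: for each point, the largest of its (k-1) smallest distances to
# the other points, i.e. the distance to its (k-1)-th nearest neighbor.
kth_neighbor_distance <- function(d, k) {
  stopifnot(k >= 2)
  m <- as.matrix(d)
  diag(m) <- Inf  # exclude each point's zero distance to itself
  apply(m, 1, function(row) sort(row)[k - 1])
}

# Step 3: flag unusually large scores. ASSUMPTION: the usual boxplot rule
# (above Q3 + 1.5 * IQR); any other univariate rule could be swapped in.
flag_outliers <- function(score) {
  unname(score > quantile(score, 0.75) + 1.5 * IQR(score))
}

# Made-up data: 100 ordinary points plus one planted outlier in row 101.
set.seed(1)
x <- rbind(matrix(rnorm(200), ncol = 2), c(8, -8))

d <- pairwise_mahalanobis(x)
score <- kth_neighbor_distance(d, k = 5)
which(flag_outliers(score))  # row numbers of the flagged points
```

Note that the sample covariance is itself affected by outliers, so a
robust covariance estimate may be worth considering in place of cov().
Also, as.matrix(d) holds all n^2 distances, which may be too much for a
large data frame.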
--
Spencer Graves, PhD
Senior Development Engineer
PDF Solutions, Inc.
333 West San Carlos Street Suite 700
San Jose, CA 95110, USA
spencer.graves at pdf.com
www.pdf.com <http://www.pdf.com>
Tel: 408-938-4420
Fax: 408-280-7915