thr3ads.net - R help - [R] Why CLARA clustering method does not give the same classes as when I do clustering manually? [Feb 2016]


ABABAEI, Behnam

2016-Feb-19 11:30 UTC

[R] Why CLARA clustering method does not give the same classes as when I do clustering manually?

Hi,


I am using CLARA (in the 'cluster' package). This method is supposed to
assign each observation to the closest medoid. But when I calculate the
distances between the medoids and the observations myself and assign each
observation manually, the results are slightly different (the class
occurrence probabilities differ by 1-2 percent). Does anyone know how clara
calculates dissimilarities and why I get different clustering results?
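To make the question concrete, here is a minimal sketch of the "manual" procedure it describes, written in Python purely as a language-neutral illustration. It is not the cluster package's code, and the helper names `euclidean` and `assign_to_medoids` are hypothetical: compute the Euclidean distance from each observation to each medoid and take the closest.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_to_medoids(points, medoids, dist=euclidean):
    """Return, for each point, the index of its closest medoid.

    Ties go to the lower medoid index; this is just one possible
    convention (an assumption, not necessarily what clara() does).
    """
    labels = []
    for p in points:
        d = [dist(p, m) for m in medoids]
        labels.append(d.index(min(d)))
    return labels


points = [(0.0, 0.0), (1.0, 0.5), (9.0, 9.0), (10.0, 8.5), (5.0, 5.0)]
medoids = [(1.0, 0.5), (9.0, 9.0)]
print(assign_to_medoids(points, medoids))  # → [0, 0, 1, 1, 1]
```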


Behnam.


Sarah Goslee

2016-Feb-19 19:46 UTC


clara() is a version of pam() adapted for large datasets.

pam() uses the entire dataset, and should give results identical to
your manual procedure, or nearly so. clara() works on subsets of the
data, so it may give a slightly different result each time you run it.

The default sampling parameters of clara() (samples = 5 subsets of
sampsize = min(n, 40 + 2k) observations) are very small, so on a large
dataset you can get substantially different results from run to run
unless you increase them.
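The sampling idea can be sketched as follows. This is a simplified Python illustration, not the cluster package's implementation: the hypothetical `clara_sketch` draws `samples` random subsets of size `sampsize`, finds the best medoids of each subset by exhaustive search (swapped in for PAM's build/swap phases only to keep the code short), scores each candidate medoid set on the full data, keeps the cheapest one, and then assigns every observation to its closest medoid. The default `min(n, 40 + 2k)` for `sampsize` is taken from R's clara() documentation.

```python
import math
import random
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def total_cost(points, medoids):
    """Sum of distances from each point to its closest medoid."""
    return sum(min(euclidean(p, m) for m in medoids) for p in points)


def best_medoids_of_sample(sample, k):
    """Optimal k medoids of a small sample, found by trying every subset.

    Stand-in for PAM's build/swap phases; only feasible for tiny samples.
    """
    return min(combinations(sample, k), key=lambda meds: total_cost(sample, meds))


def clara_sketch(points, k, samples=5, sampsize=None, seed=None):
    """CLARA-style clustering: cluster several random subsets, score each
    candidate medoid set on ALL observations, keep the cheapest one, then
    assign every observation to its closest medoid."""
    rng = random.Random(seed)
    if sampsize is None:
        # Default formula borrowed from the R documentation of clara().
        sampsize = min(len(points), 40 + 2 * k)
    best_meds, best_cost = None, math.inf
    for _ in range(samples):
        sample = rng.sample(points, sampsize)
        meds = best_medoids_of_sample(sample, k)
        cost = total_cost(points, meds)  # evaluated on the full data set
        if cost < best_cost:
            best_meds, best_cost = list(meds), cost
    labels = [min(range(k), key=lambda j: euclidean(p, best_meds[j])) for p in points]
    return best_meds, labels, best_cost


# Two well-separated groups on a line; only 6 of the 12 points are sampled each time.
pts = [(float(i), 0.0) for i in range(6)] + [(20.0 + i, 0.0) for i in range(6)]
for seed in (1, 2, 3):
    meds, labels, cost = clara_sketch(pts, 2, samples=2, sampsize=6, seed=seed)
    print(seed, meds, round(cost, 2))
```

Different seeds can return different medoids, because a subset may contain too few points from one group; larger `samples` and `sampsize` make this less likely.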

Sarah


David L Carlson

2016-Feb-21 16:55 UTC


I do not think this is quite true. When the medoids are not specified, pam()/clara()
look for a good initial set (the build phase) and then find a local minimum of the
objective function (the swap phase). Both pam()/clara() and kmeans() can find local
minima that are not the global minimum. If the build phase involves any random
element, two runs could produce different results. If not, the original order of the
data determines the final result, but that result is not necessarily the best one
possible (assuming the order of the data is irrelevant to the analysis, i.e. we are
not looking at observations taken along a line in time or space). That is why
kmeans() has an argument (nstart) to run the algorithm from multiple random starts
and pick the best result.
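A small Python sketch of the local-search point, under stated assumptions: the hypothetical `swap_phase` implements a PAM-style swap (apply the single medoid/non-medoid exchange that lowers the cost most, repeat until no exchange helps), random initial medoids replace PAM's deterministic build phase, and `multi_start` keeps the cheapest of several runs, which is the idea behind kmeans()'s nstart argument. It is an illustration, not the code used by pam(), clara() or kmeans().

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cost(points, medoids):
    """Objective: sum of distances from each point to its nearest medoid.

    `medoids` holds indices into `points`.
    """
    return sum(min(euclidean(p, points[m]) for m in medoids) for p in points)


def swap_phase(points, medoids):
    """PAM-style swap phase: apply the best single medoid/non-medoid
    exchange, repeat, and stop at a local minimum of the objective."""
    medoids = list(medoids)
    current = cost(points, medoids)
    while True:
        best_trial, best_cost = None, current
        for i in range(len(medoids)):
            for h in range(len(points)):
                if h in medoids:
                    continue
                trial = medoids[:i] + [h] + medoids[i + 1:]
                c = cost(points, trial)
                if c < best_cost - 1e-12:  # accept strict improvements only
                    best_trial, best_cost = trial, c
        if best_trial is None:
            return medoids, current
        medoids, current = best_trial, best_cost


def multi_start(points, k, nstart=10, seed=None):
    """Run the swap phase from `nstart` random initial medoid sets and keep
    the cheapest local minimum found."""
    rng = random.Random(seed)
    best = None
    for _ in range(nstart):
        init = rng.sample(range(len(points)), k)
        meds, c = swap_phase(points, init)
        if best is None or c < best[1]:
            best = (sorted(meds), c)
    return best


pts = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
print(multi_start(pts, 2, nstart=5, seed=0))
```

On data this well separated every start reaches the same optimum; on real data the individual runs can stop at different local minima, which is why keeping the best of several starts helps.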

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hennig, Christian

2016-Feb-21 23:17 UTC


By default, clara() uses the Euclidean distance (metric = "euclidean").
Why you get different results can only be determined if you provide a
reproducible code example of both what you did in clara and what you did
"manually".
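If the "manual" assignment used a different dissimilarity than clara() (an assumption; the original post does not show the manual code), some observations can change class. A minimal Python sketch comparing nearest-medoid assignments under Euclidean and Manhattan distances, and reporting the share of observations that disagree; all names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan (L1) distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest(points, medoids, dist):
    """Index of the closest medoid for each point under the metric `dist`."""
    return [min(range(len(medoids)), key=lambda j: dist(p, medoids[j])) for p in points]


def disagreement(labels_a, labels_b):
    """Fraction of observations that the two labelings put in different classes."""
    return sum(a != b for a, b in zip(labels_a, labels_b)) / len(labels_a)


medoids = [(3.0, 3.0), (5.0, 0.0)]
points = [(0.0, 0.0), (3.0, 2.0), (5.0, 1.0), (6.0, 0.0)]
l2 = nearest(points, medoids, euclidean)
l1 = nearest(points, medoids, manhattan)
print(l2, l1, disagreement(l2, l1))
```

Here (0, 0) switches class: it is closer to (3, 3) in Euclidean distance (about 4.24 vs 5) but closer to (5, 0) in Manhattan distance (5 vs 6).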

Best wishes,
Christian

*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
c.hennig at ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche

