thr3ads.net - R help - [R] statistical significance test for cluster agreement [Mar 2004]

Alexander Sirotkin [at Yahoo]

2004-Mar-23 23:27 UTC

[R] statistical significance test for cluster agreement

I was wondering whether there is a statistical
significance test for cluster agreement.

I know that I can use the classAgreement() function to get
the Rand index, which gives me some indication of whether
the clusters agree or not, but it would be interesting
to have a formal test.

Thanks.
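
For readers unfamiliar with the statistic being discussed: the Rand index is the fraction of object pairs on which two partitions agree (both put the pair together, or both put it apart). The sketch below is a plain-Python illustration of that definition, not the code behind classAgreement(); the function name rand_index and the example labels are made up for illustration.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which two partitions agree.

    A pair counts as an agreement when both partitions put the two
    objects in the same group, or both put them in different groups.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two objects")
    agree = 0
    for i, j in pairs:
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a == same_b:
            agree += 1
    return agree / len(pairs)


# Hypothetical example: a known classification and a clustering result.
# Label names do not need to match, only the grouping matters.
known = [1, 1, 2, 2, 3, 3]
clustered = ["x", "x", "y", "y", "y", "z"]
print(rand_index(known, clustered))
```

The value is 1 only when the two partitions are identical up to relabelling, which is why the thread goes on to ask what a value below 1 should be compared against.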

Duncan Murdoch

2004-Mar-24 02:30 UTC

[R] statistical significance test for cluster agreement

On Tue, 23 Mar 2004 15:27:14 -0800 (PST), you wrote:
>I was wondering whether there is a statistical
>significance test for cluster agreement.
>
>I know that I can use the classAgreement() function to get
>the Rand index, which gives me some indication of whether
>the clusters agree or not, but it would be interesting
>to have a formal test.
Why not simulate data from your hypothesized null distribution,
cluster it, and see how your dataset's index value compares to the
simulated ones?

Duncan Murdoch
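
A rough sketch of the simulation idea follows, with one substitution that should be stated plainly: instead of simulating new data from a null distribution and re-clustering it, as suggested above, this version uses a simpler null in which one set of labels is randomly shuffled (cluster sizes held fixed) and the Rand index is recomputed each time. The names rand_index and permutation_p_value, the number of simulations, and the example labels are all assumptions for illustration.

```python
import random
from itertools import combinations


def rand_index(a, b):
    """Fraction of object pairs on which partitions a and b agree."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def permutation_p_value(a, b, n_sim=999, seed=0):
    """Monte Carlo p-value for the observed Rand index.

    Null (an assumption of this sketch): the labels in b are unrelated
    to a, so any shuffling of b is as likely as the observed ordering.
    """
    rng = random.Random(seed)
    observed = rand_index(a, b)
    shuffled = list(b)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        if rand_index(a, shuffled) >= observed:
            hits += 1
    # Count the observed arrangement itself so the p-value is never zero.
    return observed, (hits + 1) / (n_sim + 1)


# Hypothetical example: 20 objects, one of them placed differently.
known = [0] * 10 + [1] * 10
clustered = [0] * 9 + [1] * 11
obs, p = permutation_p_value(known, clustered)
print("Rand index:", obs, "p-value:", p)
```

A small p-value here only says that the agreement is larger than random relabelling would typically give; whether that null is the one of interest is exactly the question raised in the replies that follow.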

Liaw, Andy

2004-Mar-24 02:34 UTC

[R] statistical significance test for cluster agreement

But what would such a test do that the Rand index does not?  Would you
interpret the p-value from such a test, if one exists, as having the meaning
that a real test of hypothesis has?  AFAIK you basically need to have the
hypotheses pinned down before you see any data for the inference to be
valid.  Is that possible with clustering?

Just my $0.02...
Andy
> From: Alexander Sirotkin [at Yahoo]
> 
> I was wondering whether there is a statistical
> significance test for cluster agreement.
> 
> I know that I can use the classAgreement() function to get
> the Rand index, which gives me some indication of whether
> the clusters agree or not, but it would be interesting
> to have a formal test.
> 
> Thanks.

Liaw, Andy

2004-Mar-24 11:29 UTC

[R] statistical significance test for cluster agreement

[Apology to the list for the off-topic rant...]

As it turned out, I also have a problem with LOF/GOF/etc. tests:  I'd bet
that most of the time when such a test is carried out, it is _not_ the only
test being done, but the p-values in the downstream analysis are almost never
adjusted for this.  How valid would those p-values be?

IMHO, it's bad enough that users of statistical methods do things like this,
but it's quite something else that statisticians do just the same, or even
promote such tests.  It's not a crime to do analysis like that, but to treat
the p-values as if they actually are meaningful probably ought to be
outlawed.

OK, I better run for cover now...

Andy
> From: Alexander Sirotkin [at Yahoo] [mailto:alex_s_42 at yahoo.com] 
> 
> Like you said, this kind of test will not give me
> anything that the Rand index does not, except for a p-value.
> 
> The null hypothesis, in my case, is that the clustering
> results do not match a different clustering that
> someone else did on the same data.
> 
> And I do believe that this hypothesis is valid.
> Basically, it's not that different from the chi-squared
> goodness-of-fit test, which checks whether or not my
> data come from a particular distribution. With the
> exception that I don't know how to do a chi-squared test
> in this case :)

Liaw, Andy

2004-Mar-25 03:00 UTC

[R] statistical significance test for cluster agreement

> From: Alexander Sirotkin [at Yahoo] [mailto:alex_s_42 at yahoo.com] 
> 
> Christian,
> 
> I think I understand your point, but I do not
> completely agree with you. I also did not describe
> my problem clearly enough.
> 
> > If you see two clusterings on the same
> > data, they are identical if they are 100%
> > identical, and if not, then not.
> 
> What you are actually saying is that all values of the
> Rand index for cluster agreement other than 1
> indicate that the clusters do not agree. I believe
> that many people would disagree with this statement.
> 
> Let me explain my problem in a little more detail.
> 
> I have a classified data set. These classes were
> obtained using non-statistical methods. What I'm trying
> to do is run some clustering algorithm and compare
> its results to this known classification.
> 
> I think that this is not very different from
> calculating a mean and comparing it to some known value.
AFAICS they are most definitely not the same.  The hypotheses in statistical
tests are about the `true', unknown population mean, not the sample mean
observed in the data.  What exactly would be the hypotheses you intend to
test?  If you are testing whether the clustering algorithm produces
something that disagrees with the non-statistical classification, then one
disagreement would have settled it, no?  Before you think about what
statistic to use, do try to figure out how you would write the null and
alternative hypotheses mathematically.

Andy

> I think that it should be theoretically possible to
> use the Rand index as a test statistic.
> 
> Or maybe I'm missing something...
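
One possible way to write down hypotheses for which the Rand index could serve as a test statistic is sketched below. It is an assumption of this note, not something stated by any poster: the null is "no agreement beyond chance" (labels exchangeable, cluster sizes fixed), which is not the same as the "clusterings do not match" null described earlier in the thread. The symbols U, V, a, d, B and \pi are introduced here for illustration.

```latex
% U, V: two partitions of the same n objects
%       (e.g. the known classes and the clustering result).
% a: number of object pairs placed together in both U and V.
% d: number of object pairs placed apart in both U and V.
R(U, V) = \frac{a + d}{\binom{n}{2}}

% Assumed null: relabelling the objects in V does not change the joint
% distribution, i.e. V carries no information about U.
H_0 :\; (U, V_\pi) \overset{d}{=} (U, V)
        \quad \text{for every permutation } \pi \text{ of the } n \text{ objects}

% One-sided alternative: agreement is larger than under H_0.
H_1 :\; R(U, V) \text{ is stochastically larger than } R(U, V_\pi)

% Monte Carlo p-value from B random permutations \pi_1, \dots, \pi_B.
p = \frac{1 + \#\{\, b : R(U, V_{\pi_b}) \ge R(U, V) \,\}}{B + 1}
```

Under this formulation, rejecting H_0 says only that the two partitions are associated, not that they are identical, which is consistent with the point that a single disagreement already settles the question of exact identity.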
