thr3ads.net - R help - [R] Using statistical test to distinguish two groups [May 2010]

Ralf B

2010-May-05 17:21 UTC

[R] Using statistical test to distinguish two groups

Hi R friends,

I am posting this question even though I know that it is closer to
general statistics than to R. Please let me know if you are aware of
a list for general statistical questions.

I am looking for a simple method to distinguish two groups of data in
a long vector of numbers:

list <- c(1,2,3,2,3,2,3,4,3,2,3,4,3,2,400,340,3,2,4,5,6,4,3,6,4,5,3)

I would like to 'learn' that 400 and 340 differ from the other numbers by
using a simple approach. The outcome of processing 'list' should therefore be:

listA <- c(1,2,3,2,3,2,3,4,3,2,3,4,3,2,3,2,4,5,6,4,3,6,4,5,3)
listB <- c(400,340)

I am thinking of a non-parametric test since I have no knowledge of the
underlying distribution. The numbers are time differences between two
actions recorded from the same person over time. Because the data
was obtained from the same person I would naturally tend to use the
Wilcoxon Signed-Rank test. Any thoughts on that?

Are there any R packages that would process such a vector and use
non-parametric methods to split or divide groups based on their
values? Could clustering be the answer, given that I already know that
I always have two groups with a significant difference between the
two?

Thanks a lot,
Ralf

Erik Iverson

2010-May-05 17:32 UTC

One of many possible approaches is called k-means clustering.

my.data <- c(1,2,3,2,3,2,3,4,3,2,3,4,3,2,400,340,3,2,4,5,6,4,3,6,4,5,3)
split(my.data, kmeans(my.data, 2)$cluster)

$`1`
[1] 400 340

$`2`
 [1] 1 2 3 2 3 2 3 4 3 2 3 4 3 2 3 2 4 5 6 4 3 6 4 5 3
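
Note that kmeans() starts from randomly chosen centers, so which group ends up labelled 1 and which 2 can differ between runs. A minimal sketch, assuming you want a reproducible split and want to pick out the large values by cluster center rather than by label; the seed, nstart = 10, and the names km and big are illustrative choices and not part of the reply above:

```r
my.data <- c(1,2,3,2,3,2,3,4,3,2,3,4,3,2,400,340,3,2,4,5,6,4,3,6,4,5,3)

set.seed(1)                                      # arbitrary seed, only for reproducibility
km <- kmeans(my.data, centers = 2, nstart = 10)  # several random starts, best one kept

## find the cluster with the larger center instead of relying on label 1 or 2
big <- which.max(km$centers[, 1])

listA <- my.data[km$cluster != big]              # the small values
listB <- my.data[km$cluster == big]              # the large values
```

Selecting by center keeps listA and listB stable even when the cluster numbering changes.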

Achim Zeileis

2010-May-05 17:35 UTC

It seems that you want to cluster the data. There are, of course, loads of 
clustering algorithms around, see e.g.,
   http://CRAN.R-project.org/view=Cluster

In this simple example a standard hierarchical clustering approach shows 
you what you're after.

## data
list <- c(1,2,3,2,3,2,3,4,3,2,3,4,3,2,400,340,3,2,4,5,6,4,3,6,4,5,3)

## cluster using Ward method for Euclidean distances
## (the method formerly called "ward" is named "ward.D" since R 3.1.0)
hc <- hclust(dist(list, method = "euclidean"), method = "ward.D")
plot(hc)
hc

## cut into two clusters
split(list, cutree(hc, k = 2))
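
To get the two groups as separate vectors, as asked for in the original question, the cluster memberships can be matched against the group means. A minimal sketch, assuming R >= 3.1.0 (where the Ward variant used above is called "ward.D"); the names dat, groups, means and big are illustrative only:

```r
dat <- c(1,2,3,2,3,2,3,4,3,2,3,4,3,2,400,340,3,2,4,5,6,4,3,6,4,5,3)

hc <- hclust(dist(dat), method = "ward.D")   # dist() uses Euclidean distance by default
groups <- cutree(hc, k = 2)                  # cluster label (1 or 2) for each value

## the group with the larger mean holds the unusually large values
means <- tapply(dat, groups, mean)
big <- as.integer(names(means)[which.max(means)])

listA <- dat[groups != big]
listB <- dat[groups == big]
```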

hth,
Z