If I have a vector, x, such that

x <- c(1,2,3,4,5,8,9,10,11,12,15,16,17,18,19,22,23,24,33,34,35)

if I plot that vector

plot(x)

it is visibly obvious that the data "groups" or "clusters" into distinct
groupings. The data trends along a more-or-less linear path, and then an
abrupt jump. For a trivial case, such as I have given, you can pick out the
groups or categories visually, and manually derive the upper and lower
bounds for each group. My question is, is there a function in R that can do
the same thing for more complex and subtle groupings in univariate data, and
provide a statistical basis for the result?

Allen
On Mon, 21 Feb 2005, Allen Hathaway wrote:

> If I have a vector, x, such that
>
> x <- c(1,2,3,4,5,8,9,10,11,12,15,16,17,18,19,22,23,24,33,34,35)
>
> if I plot that vector
>
> plot(x)
>
> it is visibly obvious that the data "groups" or "clusters" into distinct
> groupings. The data trends along a more-or-less linear path, and then an
> abrupt jump. For a trivial case, such as I have given, you can pick out the
> groups or categories visually, and manually derive the upper and lower
> bounds for each group. My question is, is there a function in R that can do
> the same thing for more complex and subtle groupings in univariate data, and
> provide a statistical basis for the result?

Maybe breakpoints() in package strucchange can be of help. It looks for
breaks in linear regression relationships over a certain ordering of the
variables. For the data above:

## setup index variable
idx <- seq(along = x)

## find breaks in linear trend model
library(strucchange)
bp <- breakpoints(x ~ idx)

## visualize fitted model
plot(x)
lines(fitted(bp))

See help(breakpoints) for further information and references for the
underlying theory.

hth,
Z
Allen Hathaway <hathaway <at> sover.net> writes:

:
: If I have a vector, x, such that
:
: x <- c(1,2,3,4,5,8,9,10,11,12,15,16,17,18,19,22,23,24,33,34,35)
:
: if I plot that vector
:
: plot(x)
:
: it is visibly obvious that the data "groups" or "clusters" into distinct
: groupings. The data trends along a more-or-less linear path, and then an
: abrupt jump. For a trivial case, such as I have given, you can pick out the
: groups or categories visually, and manually derive the upper and lower
: bounds for each group. My question is, is there a function in R that can do
: the same thing for more complex and subtle groupings in univariate data, and
: provide a statistical basis for the result?

If the actual data is exactly linear and increasing, as with this example,
then the breakpoints are at points of positive acceleration, thus

which(diff(x, diff = 2)>0) + 2

gives the indices of the breakpoints.
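
As a rough sketch of how those indices could be turned into the lower and upper bounds the original post asks about: the names starts, ends and bounds below are made up for illustration, and the approach only holds under the same assumption as above (increasing data that is exactly linear within each group).

```r
x <- c(1,2,3,4,5,8,9,10,11,12,15,16,17,18,19,22,23,24,33,34,35)

## index of the first element of each group: position 1, plus every
## point where the second difference is positive (the "acceleration")
starts <- c(1, which(diff(x, differences = 2) > 0) + 2)

## each group ends just before the next one starts; the last group
## ends at the last observation
ends <- c(starts[-1] - 1, length(x))

## lower and upper bound of each group (hypothetical summary table)
bounds <- data.frame(group = seq_along(starts),
                     lower = x[starts],
                     upper = x[ends])
bounds
## lower: 1 8 15 22 33; upper: 5 12 19 24 35
```

With noisy data the second differences change sign almost everywhere, so this would not carry over without some smoothing or a threshold.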
x <- c(1,2,3,4,5,8,9,10,11,12,15,16,17,18,19,22,23,24,33,34,35)
require(cluster)
pam(x,5)
Medoids:
     [,1]
[1,]    3
[2,]   10
[3,]   17
[4,]   23
[5,]   34
Clustering vector:
 [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 5 5 5
Objective function:
   build     swap
1.285714 1.047619
Available components:
[1] "medoids" "clustering" "objective" "isolation" "clusinfo" "silinfo" "diss" "call" "data"
Does this help?
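
pam() needs the number of clusters up front. One possible way to put a number on that choice, sketched here as an assumption rather than a recommendation, is to compare the average silhouette width that pam() returns in its silinfo component over a range of k. The range 2:8 and the names ks, avg_width and best_k are arbitrary choices for illustration.

```r
library(cluster)

x <- c(1,2,3,4,5,8,9,10,11,12,15,16,17,18,19,22,23,24,33,34,35)

## candidate numbers of clusters (arbitrary range for this sketch)
ks <- 2:8

## average silhouette width for each candidate k
avg_width <- sapply(ks, function(k) pam(x, k)$silinfo$avg.width)

## k with the largest average silhouette width
best_k <- ks[which.max(avg_width)]

## inspect the widths and the resulting grouping
data.frame(k = ks, avg_width = avg_width)
split(x, pam(x, best_k)$clustering)
```

The silhouette width is a descriptive criterion, not a formal test, so it only goes part of the way towards a "statistical basis".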
> -----Original Message-----
> From: Allen Hathaway [mailto:hathaway at sover.net]
> Sent: Tuesday, 22 February 2005 8:48 AM
> To: r-help list
> Subject: [R] Categories or clusters for univariate data
>
>
> If I have a vector, x, such that
>
> x <- c(1,2,3,4,5,8,9,10,11,12,15,16,17,18,19,22,23,24,33,34,35)
>
> if I plot that vector
>
> plot(x)
>
> it is visibly obvious that the data "groups" or "clusters" into distinct
> groupings. The data trends along a more-or-less linear path, and then an
> abrupt jump. For a trivial case, such as I have given, you can pick out the
> groups or categories visually, and manually derive the upper and lower
> bounds for each group. My question is, is there a function in R that can do
> the same thing for more complex and subtle groupings in univariate data, and
> provide a statistical basis for the result?
>
> Allen
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
>