If I have a vector, x, such that

x <- c(1,2,3,4,5,8,9,10,11,12,15,16,17,18,19,22,23,24,33,34,35)

if I plot that vector

plot(x)

it is visibly obvious that the data "groups" or "clusters" into distinct
groupings. The data trends along a more-or-less linear path, and then an
abrupt jump. For a trivial case, such as I have given, you can pick out the
groups or categories visually, and manually derive the upper and lower
bounds for each group. My question is, is there a function in R that can do
the same thing for more complex and subtle groupings in univariate data, and
provide a statistical basis for the result?

Allen
On Mon, 21 Feb 2005, Allen Hathaway wrote:

> If I have a vector, x, such that
>
> x <- c(1,2,3,4,5,8,9,10,11,12,15,16,17,18,19,22,23,24,33,34,35)
>
> if I plot that vector
>
> plot(x)
>
> it is visibly obvious that the data "groups" or "clusters" into distinct
> groupings. The data trends along a more-or-less linear path, and then an
> abrupt jump. For a trivial case, such as I have given, you can pick out the
> groups or categories visually, and manually derive the upper and lower
> bounds for each group. My question is, is there a function in R that can do
> the same thing for more complex and subtle groupings in univariate data, and
> provide a statistical basis for the result?

Maybe breakpoints() in package strucchange can be of help. It looks for
breaks in linear regression relationships over a certain ordering of the
variables. For the data above:

  ## setup index variable
  idx <- seq(along = x)
  ## find breaks in linear trend model
  library(strucchange)
  bp <- breakpoints(x ~ idx)
  ## visualize fitted model
  plot(x)
  lines(fitted(bp))

See help(breakpoints) for further information and references for the
underlying theory.

hth,
Z
Allen Hathaway <hathaway <at> sover.net> writes:

:
: If I have a vector, x, such that
:
: x <- c(1,2,3,4,5,8,9,10,11,12,15,16,17,18,19,22,23,24,33,34,35)
:
: if I plot that vector
:
: plot(x)
:
: it is visibly obvious that the data "groups" or "clusters" into distinct
: groupings. The data trends along a more-or-less linear path, and then an
: abrupt jump. For a trivial case, such as I have given, you can pick out the
: groups or categories visually, and manually derive the upper and lower
: bounds for each group. My question is, is there a function in R that can do
: the same thing for more complex and subtle groupings in univariate data, and
: provide a statistical basis for the result?

If the actual data is exactly linear and increasing, as with this
example, then the breakpoints are at points of positive acceleration,
thus

  which(diff(x, diff = 2)>0) + 2

gives the indices of the breakpoints.
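To get from those indices to the upper and lower bounds asked about in the original post, something along the following lines might do. This is only a sketch in base R: the names starts, grp and bounds are made up for illustration, and it carries the same assumption as the one-liner above (exactly linear, increasing data within each group), so it is not meant for noisy data.

```r
x <- c(1,2,3,4,5,8,9,10,11,12,15,16,17,18,19,22,23,24,33,34,35)

# Indices at which a new group starts: a positive second difference marks
# a jump (assumes an exactly linear, increasing trend within each group).
starts <- which(diff(x, differences = 2) > 0) + 2

# Label each observation with its group number (1, 2, ...).
grp <- cumsum(seq_along(x) %in% starts) + 1

# Lower and upper bound of each group, one row per group.
bounds <- t(sapply(split(x, grp), range))
colnames(bounds) <- c("lower", "upper")
bounds
```

Here differences = 2 is just the full spelling of the diff = 2 argument used above.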
x <- c(1,2,3,4,5,8,9,10,11,12,15,16,17,18,19,22,23,24,33,34,35)
require(cluster)
pam(x,5)

Medoids:
     [,1]
[1,]    3
[2,]   10
[3,]   17
[4,]   23
[5,]   34
Clustering vector:
 [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 5 5 5
Objective function:
   build     swap
1.285714 1.047619

Available components:
[1] "medoids"    "clustering" "objective"  "isolation"  "clusinfo"
[6] "silinfo"    "diss"       "call"       "data"

Does this help?

> -----Original Message-----
> From: Allen Hathaway [mailto:hathaway at sover.net]
> Sent: Tuesday, 22 February 2005 8:48 AM
> To: r-help list
> Subject: [R] Categories or clusters for univariate data
>
>
> If I have a vector, x, such that
>
> x <- c(1,2,3,4,5,8,9,10,11,12,15,16,17,18,19,22,23,24,33,34,35)
>
> if I plot that vector
>
> plot(x)
>
> it is visibly obvious that the data "groups" or "clusters" into distinct
> groupings. The data trends along a more-or-less linear path, and then an
> abrupt jump. For a trivial case, such as I have given, you can pick out the
> groups or categories visually, and manually derive the upper and lower
> bounds for each group. My question is, is there a function in R that can do
> the same thing for more complex and subtle groupings in univariate data, and
> provide a statistical basis for the result?
>
> Allen
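The pam() call above needs the number of clusters up front. One possible way to put some statistical footing under that choice is to compare the average silhouette width, which pam() stores in its silinfo component, over a range of candidate k. The sketch below assumes the candidate range 2:8 (an arbitrary choice) and uses the illustrative names ks, avg_width, fit and bounds; it then refits with k = 5, as in the reply, and reads off the bounds of each cluster.

```r
library(cluster)  # provides pam()

x <- c(1,2,3,4,5,8,9,10,11,12,15,16,17,18,19,22,23,24,33,34,35)

# Average silhouette width for each candidate number of clusters.
# The range 2:8 is an assumption made for this illustration.
ks <- 2:8
avg_width <- sapply(ks, function(k) pam(x, k)$silinfo$avg.width)
names(avg_width) <- ks
avg_width

# Refit with k = 5, as in the reply, and derive the bounds of each cluster.
fit <- pam(x, 5)
bounds <- t(sapply(split(x, fit$clustering), range))
colnames(bounds) <- c("lower", "upper")
bounds
```

A larger average silhouette width suggests better separated clusters, so the k with the largest value is a reasonable first candidate, though it is a heuristic rather than a formal test.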