Sébastien Bihorel
2011-Dec-06 08:29 UTC
[R] How to automate the detection of break points for use in cut
Dear R-users,

I would like to know whether there is a function (in base R or in an extension package) that automatically detects the break points in a vector x for later use in the cut function. The idea is to determine the boundaries of the n intervals (n >= 1) that delimit clusters of data points which could be considered "reasonably" close, given a numeric vector x with unknown content and an unknown multimodal distribution.

For instance, given the vector x defined by

set.seed(1234); x <- sort(c(rnorm(20, -1, 0.1), rnorm(10, 5, 0.1), rnorm(10, 100, 0.1)))

this function would return a vector of 4 points: min(x), one value between -1 and 5, one value between 5 and 100, and max(x).

Thank you in advance for your suggestions.

Sebastien
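A minimal sketch of the simplest reading of this request, assuming that "reasonably close" can be expressed as a maximum allowed gap between neighbouring sorted values: sort x, find the gaps wider than a threshold, and put a break halfway across each one. The helper name find_breaks() and its min_gap argument are hypothetical (not part of base R or any package), and the default threshold of 20 times the median gap is an arbitrary choice rather than an established rule; choosing that threshold is exactly where the problem stops being simple. The example passes an explicit min_gap = 1.

```r
# find_breaks() is a hypothetical helper written for this example; it is not
# part of base R or of any package.
# A break is placed halfway across every gap between consecutive sorted values
# that is wider than 'min_gap'.
find_breaks <- function(x, min_gap = NULL) {
  xs   <- sort(x[is.finite(x)])
  gaps <- diff(xs)                        # gaps[i] separates xs[i] and xs[i + 1]
  if (is.null(min_gap)) {
    min_gap <- 20 * median(gaps)          # arbitrary default threshold (assumption)
  }
  big   <- which(gaps > min_gap)
  inner <- (xs[big] + xs[big + 1]) / 2    # midpoint of each wide gap
  c(xs[1], inner, xs[length(xs)])         # min(x), inner breaks, max(x)
}

set.seed(1234)
x <- sort(c(rnorm(20, -1, 0.1), rnorm(10, 5, 0.1), rnorm(10, 100, 0.1)))

br <- find_breaks(x, min_gap = 1)
print(br)
# include.lowest = TRUE keeps min(x), which is equal to the first break
print(table(cut(x, breaks = br, include.lowest = TRUE)))
```

If no gap exceeds min_gap, the function returns just c(min(x), max(x)), i.e. the n = 1 case.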
Sébastien Bihorel
2011-Dec-06 08:34 UTC
Obviously, cut would do the job if one knows the number of intervals in advance, which I assume I won't. I guess what I'm looking for is a function that figures out the number of intervals and their boundaries.

Sebastien
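A short illustration of why the number of intervals alone is not quite enough, using the example vector from the first message: when breaks is a single number, cut() divides the range of x into that many equal-width intervals, placing boundaries without regard to where the data cluster. Even with the right n = 3, the clusters near -1 and 5 land in the same interval and the middle interval stays empty; the positions of the breaks are what is really needed.

```r
set.seed(1234)
x <- sort(c(rnorm(20, -1, 0.1), rnorm(10, 5, 0.1), rnorm(10, 100, 0.1)))

# A single number for 'breaks' asks cut() for that many equal-width intervals
# spanning range(x); it does not look for gaps between clusters.
equal_width <- cut(x, breaks = 3)
print(table(equal_width))
```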
Sébastien Bihorel
2011-Dec-06 15:15 UTC
I forgot to post my reply to Duncan's response on the list.

On Tue, Dec 6, 2011 at 7:56 AM, Sébastien Bihorel <pomchip@free.fr> wrote:
> Thanks for the link, Duncan.
>
> Given the number of methods and links listed in the Cluster task view, things are looking a bit more complex than I thought... I'll have to read more about clustering before I can start testing.
>
> Sebastien
>
> On Tue, Dec 6, 2011 at 7:28 AM, Duncan Murdoch <murdoch.duncan@gmail.com> wrote:
>> That's not really a simple problem, but there are functions that do clustering and fit mixture models to data, which might be close enough. See the Cluster task view at http://cran.r-project.org/web/views/Cluster.html.
>>
>> Duncan Murdoch
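Along the lines of the clustering suggestion, a sketch that uses base R's hclust() and cutree() plus silhouette() from the recommended cluster package (assumed to be installed, as it is in standard R distributions). In one dimension, single-linkage clustering splits the sorted data at its widest gaps; the number of groups is chosen here as the k in 2..k_max with the largest average silhouette width. That is only one common heuristic among many (the mixture-model packages in the Cluster task view are another route), k_max = 8 is an arbitrary assumption, and this rule never returns a single group, so the n = 1 case would need a separate check.

```r
library(cluster)  # recommended package shipped with R; provides silhouette()

set.seed(1234)
x <- sort(c(rnorm(20, -1, 0.1), rnorm(10, 5, 0.1), rnorm(10, 100, 0.1)))

d  <- dist(x)
hc <- hclust(d, method = "single")  # in 1-D, single linkage splits at the widest gaps

# Average silhouette width for k = 2..k_max groups (k_max is an arbitrary choice)
k_max <- 8
avg_width <- sapply(2:k_max, function(k) {
  sil <- silhouette(cutree(hc, k = k), d)
  mean(sil[, "sil_width"])
})
k_best <- (2:k_max)[which.max(avg_width)]
groups <- cutree(hc, k = k_best)

# x is sorted and each single-linkage group is a contiguous run of it, so each
# inner break goes halfway between the last point of one group and the first
# point of the next
idx    <- which(diff(groups) != 0)
breaks <- c(min(x), (x[idx] + x[idx + 1]) / 2, max(x))

print(k_best)
print(breaks)
print(table(cut(x, breaks = breaks, include.lowest = TRUE)))
```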