Hi people,

Does anybody know of a density-based clustering method implemented in R?

Thanks,
Fernando Prass
Kjetil Brinchmann Halvorsen
2004-Oct-21 12:34 UTC
[R] Cluster Analysis: Density-Based Method
Fernando Prass wrote:

> Hi people,
>
> Does anybody know of a density-based clustering method implemented in R?

Have you looked at CRAN package mclust?

> Thanks,
>
> Fernando Prass

--
Kjetil Halvorsen.

Peace is the most effective weapon of mass construction.
               -- Mahdi Elmandjra
I'm no expert in this, but mclust is `density-based' because it estimates the density with a mixture of Gaussians. If this is not what you want, you should clarify what you mean by `density-based'. Do you mean an algorithm based on a kernel estimator of the density?

Andy

> From: Fernando Prass
>
> Yes, but mclust doesn't have a density-based algorithm. Mclust
> has the algorithm BIC, which is a model-based method...
>
> Fernando Prass
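To illustrate the kernel-estimator reading of `density-based' that Andy asks about, here is a minimal base-R sketch for one-dimensional data: estimate the density with `density()` and cut the data at the local minima of the estimate, so that each cluster corresponds to one mode. The function name `kde_mode_clusters` and the rule "cut at local minima" are assumptions made for illustration, not something proposed in the thread; with more than one dimension a mode-seeking procedure would be needed instead.

```r
# Hypothetical helper (not from the thread): cluster 1-D data by the modes
# of a Gaussian kernel density estimate.
# Assumption: cluster boundaries are placed at local minima of the estimate.
kde_mode_clusters <- function(x, bw = "nrd0", n = 512) {
  d <- density(x, bw = bw, n = n)   # KDE evaluated on an n-point grid
  s <- sign(diff(d$y))              # -1 falling, +1 rising, 0 flat
  nz <- which(s != 0)               # skip flat steps (e.g. zero-density gaps)
  # a local minimum is a falling step followed by a rising step
  mins <- nz[-1][diff(s[nz]) == 2]
  cuts <- d$x[mins]                 # grid positions of the minima, increasing
  findInterval(x, cuts) + 1         # cluster labels 1, 2, ...
}

# Usage: two well-separated groups of 50 points each
x <- c(seq(-1, 1, length.out = 50), seq(5, 7, length.out = 50))
table(kde_mode_clusters(x))
```

Skipping flat steps before looking for sign changes keeps a minimum from being missed when two neighbouring grid values happen to be equal or when the estimate is exactly zero over a gap.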
> From: Martin Maechler
>
> >>>>> "AndyL" == Liaw, Andy <andy_liaw at merck.com>
> >>>>>     on Thu, 21 Oct 2004 09:18:54 -0400 writes:
>
>     AndyL> [...] Do you mean an algorithm based on a kernel
>     AndyL> estimator of the density?
>
> Yes, a kernel or other "nonparametric" density estimator is what
> is usually meant in these contexts.
> [ Of course, many "nonparametric" estimators can be seen to live
>   in finite-dimensional spaces, so the difference from an explicit
>   "flexible" / "high-dimensional" method isn't that big. ]
>
> Martin

Yes. However, after reading ftp://ftp.stat.rice.edu/pub/scottdw/TECH/ipra.ps (David Scott's `From Kernels to Mixtures', published in Technometrics in 2000, in what I believe was the Tukey memorial issue), I thought the line between kernel densities and mixture models is rather gray...

Best,
Andy
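To make the kernel/mixture connection concrete: with a Gaussian kernel φ and bandwidth h, a kernel density estimate from observations x_1, ..., x_n is itself an equal-weight Gaussian mixture with one component per observation, whereas a finite mixture of the kind mclust fits uses G components whose weights, means and variances are estimated (one-dimensional case shown):

```latex
\hat f_h(x) = \frac{1}{nh}\sum_{i=1}^{n}\phi\!\left(\frac{x - x_i}{h}\right)
            = \sum_{i=1}^{n}\frac{1}{n}\,\mathcal{N}\!\left(x \mid x_i,\, h^2\right),
\qquad
f(x) = \sum_{k=1}^{G}\pi_k\,\mathcal{N}\!\left(x \mid \mu_k,\, \sigma_k^2\right),
\quad \sum_{k=1}^{G}\pi_k = 1 .
```

Setting G = n, \mu_k = x_k, \sigma_k^2 = h^2 and \pi_k = 1/n in the mixture recovers the kernel estimate exactly, which is one way to read the "rather gray" line between the two.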
Andy,

I may be wrong, as I'm no expert either, but density estimation is different from a density model. Mclust is a model-based method because it uses statistical models to cluster the data (more information in ftp://ftp.u.washington.edu/public/mclust/tr415R.pdf). I need a package that implements algorithms like OPTICS, DBSCAN or DENCLUE...

Fernando Prass
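For reference, DBSCAN as described in the KDD-96 paper (Ester et al.) sorts points into core, border and noise points using two parameters, eps and MinPts. The following is a minimal base-R sketch of only that classification step, assuming Euclidean distances and an n x p data matrix; the function name `point_types` is hypothetical, and the sketch does not grow clusters from the core points.

```r
# Hypothetical helper (not from the thread): DBSCAN point types.
# core:   eps-neighbourhood (point itself included) has >= MinPts points
# border: not core, but within eps of at least one core point
# noise:  everything else
point_types <- function(data, eps, MinPts = 5) {
  d <- as.matrix(dist(data))                 # n x n Euclidean distances
  core <- rowSums(d <= eps) >= MinPts
  border <- !core & rowSums(d[, core, drop = FALSE] <= eps) > 0
  unname(ifelse(core, "core", ifelse(border, "border", "noise")))
}

# Usage: a tight group of five points, one point at its edge, one far away
pts <- rbind(c(0, 0), c(0.1, 0), c(0, 0.1), c(0.1, 0.1), c(0.05, 0.05),
             c(0.28, 0.05), c(5, 5))
point_types(pts, eps = 0.2, MinPts = 5)
```

A full DBSCAN then connects core points that lie within eps of each other and attaches border points to a neighbouring cluster; building the whole distance matrix, as here, costs O(n^2) memory.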
Dear Christian,

Do you have the DBSCAN algorithm in R? Can you send it to me? I need it for my dissertation (MS degree).

Thanks,
Fernando

--- Christian Hennig <fm3a004 at math.uni-hamburg.de> wrote:

> Dear Fernando,
>
> below you will find a DBSCAN function I wrote for my own purposes.
> It comes with no warranty and without proper documentation, but I
> followed the notation of the original KDD-96 DBSCAN paper.
> For large data sets, it may be slow.
>
> Best,
> Christian
>
> On Thu, 21 Oct 2004, Fernando Prass wrote:
>
> > No, kmeans is a partition method. I need a density-based method,
> > like the DBSCAN or DENCLUE algorithm...
> >
> > Fernando Prass

    # Squared Euclidean distances from point x to every row of data
    distvector <- function(x, data) {
      ddata <- t(data) - x
      apply(ddata^2, 2, sum)
    }

    # data may be an n x p data matrix or an n x n distance matrix
    # eps is the DBSCAN distance cutoff parameter
    # MinPts is the minimum number of points (the point itself included)
    #   in the eps-neighbourhood of a core point
    # scale: should the data be scaled?
    # distances: has to be TRUE if data is a distance matrix
    # showplot: should the computation process be visualized?
    # countmode: dbscan gives messages when processing point no. (countmode)
    dbscan <- function(data, eps, MinPts = 5, scale = FALSE, distances = FALSE,
                       showplot = FALSE,
                       countmode = c(1, 2, 3, 5, 10, 100, 1000, 5000, 10000, 50000)) {
      data <- as.matrix(data)
      n <- nrow(data)
      if (scale) data <- scale(data)
      unregpoints <- rep(0, n)
      e2 <- eps^2
      cv <- rep(0, n)      # cluster vector: 0 = unvisited, -1 = noise, >0 = cluster
      cn <- 0              # current cluster number
      for (i in 1:n) {
        if (i %in% countmode) cat("Processing point ", i, " of ", n, ".\n")
        unclass <- cv < 1  # not yet in a cluster (unvisited or noise)
        if (cv[i] == 0) {
          # restrict to unclassified points in both branches, so that
          # unregpoints does not double-count already classified neighbours
          if (distances) seeds <- unclass & (data[i, ] <= eps)
          else {
            seeds <- rep(FALSE, n)
            # drop = FALSE keeps a one-row subset as a matrix
            seeds[unclass] <- distvector(data[i, ], data[unclass, , drop = FALSE]) <= e2
          }
          if (sum(seeds) + unregpoints[i] < MinPts) cv[i] <- (-1)
          else {
            cn <- cn + 1
            cv[i] <- cn
            seeds[i] <- unclass[i] <- FALSE
            unregpoints[seeds] <- unregpoints[seeds] + 1
            while (sum(seeds) > 0) {
              if (showplot) plot(data, col = 1 + cv)
              unclass[seeds] <- FALSE
              cv[seeds] <- cn
              ap <- (1:n)[seeds]
              seeds <- rep(FALSE, n)
              for (j in ap) {
                jseeds <- rep(FALSE, n)
                if (distances) jseeds[unclass] <- data[j, unclass] <= eps
                else {
                  jseeds[unclass] <- distvector(data[j, ], data[unclass, , drop = FALSE]) <= e2
                }
                unregpoints[jseeds] <- unregpoints[jseeds] + 1
                if (sum(jseeds) + unregpoints[j] >= MinPts) {
                  seeds[jseeds] <- cv[jseeds] == 0
                  cv[jseeds & cv < 0] <- cn
                }
              } # for j
            } # while sum seeds > 0
          } # else (sum seeds + ... >= MinPts)
        } # if cv == 0
      } # for i
      if (sum(cv == (-1)) > 0) {
        noisenumber <- cn + 1
        cv[cv == (-1)] <- noisenumber
      }
      else
        noisenumber <- FALSE
      out <- list(classification = cv, noisenumber = noisenumber,
                  eps = eps, MinPts = MinPts, unregpoints = unregpoints)
      out
    } # dbscan
    # classification: classification vector
    # noisenumber: number in the classification vector indicating noise points
    # unregpoints: ignore...
Christian Hennig
Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
hennig at math.uni-hamburg.de, http://www.math.uni-hamburg.de/home/hennig/