Martin Maechler
2010-May-07 10:28 UTC
[R] Cluster procedure using geographical neighborhood
Dear Dario Sacco,

>>>>> "DS" == Dario Sacco <dario.sacco at unito.it>
>>>>>     on Thu, 06 May 2010 17:45:30 +0200 writes:

    DS> Dear Dr. Maechler,
    DS> I am an agronomist and a researcher at the University of Turin. I am
    DS> also teaching "Applied statistics", so I have some knowledge of
    DS> statistics, but not of numerical computation.
    DS> I found your email on the CRAN website.

    DS> I am currently working on the segmentation of a GIS database. My
    DS> problem is that I have a set of points over a region and I need to
    DS> define sub-regions characterised by small internal variability.

    DS> The application seems to call for a hierarchical cluster analysis,
    DS> but the agglomeration procedure should consider only pairs of
    DS> clusters or of points that are neighbours.

    DS> This can be performed by deleting the dissimilarities in the
    DS> dissimilarity matrix (for example calculated with the dist()
    DS> procedure in R) that refer to pairs of points that are not
    DS> neighbours.

Deleting is not ok; you should make them "large" in some way.

I think you should just define your dissimilarities by *both* the
"variability" (your current dist()) *and* the geographical distance,
maybe giving much more weight to the geographical distance, something
like

    D_{i,j} := d_{i,j} + w * d~(X_i, X_j)

where d_{i,j} are your dist() or daisy() dissimilarities, 'w' is a
weight factor, and d~(u,v) is e.g. the geodesic distance between u
and v.

I'm CC'ing this to the R-help mailing list, as I think you could get
more advice from there.

Martin Maechler, ETH Zurich

    DS> However, if I do that, the procedure hclust() does not work anymore.
    DS> Moreover, even if it did work, after the first agglomeration any
    DS> further agglomeration should take into account only pairs of points
    DS> or clusters that are geographical neighbours.
    DS> My idea is to create a procedure able to read the list of pairs of
    DS> points that are neighbours and, after each agglomeration, indicate
    DS> to the procedure which pairs are neighbours, but I am not able to
    DS> understand the source code that I downloaded from the CRAN website.

    DS> So, my questions are:
    DS> could you help me in solving the problem?
    DS> Or, alternatively, could you send me the agglomeration procedure
    DS> applied by R in hcluster() as a programme written in R commands or
    DS> as code for Visual Basic? These two programming languages are the
    DS> only two that I am able to understand.

    DS> Thank you in advance for any suggestion or help you can give me.

    DS> Best regards,
    DS> Dario Sacco

    DS> --
    DS> Dr. Dario Sacco
    DS> Dept. of Agronomy, Forestry and Land Management
    DS> University of Turin
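
For illustration, the two suggestions in the reply (a combined
dissimilarity D_{i,j}, and making non-neighbour dissimilarities "large"
rather than deleting them) might be sketched in R roughly as follows.
This is only a sketch under assumptions that are not in the thread: the
data are simulated, planar Euclidean distance stands in for the geodesic
distance d~, and the weight 'w', the neighbour threshold 0.3, the
penalty factor, the number of groups, and all object names are
hypothetical values chosen for the example.

```r
## Sketch only: simulated data; all names and tuning values are assumptions.
set.seed(1)
n   <- 30
xy  <- cbind(x = runif(n), y = runif(n))     # point coordinates (planar)
val <- cbind(v1 = rnorm(n), v2 = rnorm(n))   # attributes measured at the points

d_attr <- dist(scale(val))  # d_{i,j}: dissimilarity of the attributes
d_geo  <- dist(xy)          # d~(X_i, X_j): Euclidean stand-in for geodesic distance

## (1) Combined dissimilarity D_{i,j} = d_{i,j} + w * d~(X_i, X_j)
w  <- 5                     # hypothetical weight; larger favours compact regions
D  <- d_attr + w * d_geo    # arithmetic on "dist" objects keeps the "dist" class
hc <- hclust(D, method = "average")
regions <- cutree(hc, k = 4)   # k = 4 groups, chosen arbitrarily here

## (2) Keep all pairs, but make non-neighbour dissimilarities "large"
nb <- as.matrix(d_geo) <= 0.3  # hypothetical neighbour rule: closer than 0.3
M  <- as.matrix(d_attr)
M[!nb] <- 10 * max(M)          # penalise non-neighbour pairs instead of deleting
hc2 <- hclust(as.dist(M), method = "single")
```

The weighted sum in (1) only discourages merging points that are far
apart; it does not strictly forbid it, so 'w' would need tuning against
the scale of the attribute dissimilarities. In (2), single linkage is
used because its merge height is the smallest pairwise dissimilarity
between two clusters, so merges below the penalty height happen only
through at least one neighbouring pair.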