Corrado
2009-Dec-01 09:35 UTC
[R] Distance between sets of points in transformed environmental space
Dear friends,

I have several sets of points in a transformed environmental space. Each set of points can be represented as a cloud in the environmental space. This space is spanned by n coordinates, corresponding to the first n PCs of the 36 PCs of some environmental variables (12 monthly minimum temperatures, 12 monthly maximum temperatures, 12 monthly precipitation values).

I would like to calculate a "distance" or dissimilarity between each pair of sets of points. Let's label two of those sets X and Y, where x is in X and y is in Y. We are interested in defining a distance between X and Y.

I have thought of using the following:

1) The Euclidean distance between the centroids of X and Y. Simple and effective, but it does not give much real information on the actual degree of overlap.

2) The median of the distances between all pairs of points (x,y). Same problem as (1), partially resolved.

3) The proportion of points of X U Y which fall outside the intersection of the convex or concave hulls (defined with a smoothing parameter) of X and Y, i.e. C(X) intersect C(Y). Very complicated, and does not necessarily lead to

What do you think? Are there any other approaches worth considering?

Kind Regards
--
Corrado Topi
Global Climate Change & Biodiversity Indicators
Area 18, Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct529 at york.ac.uk
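Options (1) and (2) above can be sketched in a few lines. This is only an illustration in plain Python (standard library, Python 3.8+), not code from the thread; the function names and the toy data are hypothetical, and the points are assumed to be already expressed in PC coordinates.

```python
import math
from statistics import median


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    dim = len(points[0])
    return [sum(p[i] for p in points) / n for i in range(dim)]


def centroid_distance(X, Y):
    """Option (1): Euclidean distance between the centroids of X and Y."""
    return math.dist(centroid(X), centroid(Y))


def median_pairwise_distance(X, Y):
    """Option (2): median of the Euclidean distances over all pairs (x, y)."""
    return median(math.dist(x, y) for x in X for y in Y)


# Toy data (made up): two unit squares, shifted by 3 along the first axis.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
Y = [(3.0, 0.0), (4.0, 0.0), (3.0, 1.0), (4.0, 1.0)]

d_centroid = centroid_distance(X, Y)
d_median = median_pairwise_distance(X, Y)
```

Both measures are symmetric in X and Y. The pairwise version costs |X| * |Y| distance evaluations, which may matter for large clouds.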
Charlotte Maia
2009-Dec-01 10:19 UTC
[R] Distance between sets of points in transformed environmental space
Well, here's another naive post from me (hopefully better than the last one).

Firstly, I'm not sure computing Euclidean distance is that simple. I would assume temperatures and precipitation would need to be standardised in some way.

I think the notion of how far away something is, and how distinct location-wise something is, are quite different, so maybe two measures?

For distance per se, I think your first idea is the best. Plus simple is always good...

For distinctness, given the two sets, for each point you could just find the closest other point to it. If the closest point is a member of the same set, we will call that a + point; if the closest point is a member of the other set, we will call it a - point. In principle the measure of distinctness would be the number of + points, however there might need to be some scaling to take into account the number of points in each set.

There are also a lot of fancy things out there, so someone will probably come up with a much fancier (and possibly better) idea than this.

Well, that's just my rant, before I go to bed.

kind regards
--
Charlotte Maia
http://sites.google.com/site/maiagx/home
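The nearest-neighbour idea above might look roughly like this. It is a sketch in plain Python, not something from the thread: the name `nn_distinctness` is hypothetical, and dividing by the total number of points is just one assumed way of handling the scaling question, which the post leaves open.

```python
import math


def nn_distinctness(X, Y):
    """Fraction of pooled points whose nearest other point is in the same set.

    A value near 1 suggests well-separated clouds, a value near 0 suggests
    heavily interleaved ones. The scaling by the total count is an assumption.
    """
    # Pool the points, remembering which set each one came from.
    labelled = [(p, "X") for p in X] + [(p, "Y") for p in Y]
    plus = 0
    for i, (p, label) in enumerate(labelled):
        # Nearest neighbour among all other points (the point itself is excluded).
        nearest = min(
            (j for j in range(len(labelled)) if j != i),
            key=lambda j: math.dist(p, labelled[j][0]),
        )
        if labelled[nearest][1] == label:
            plus += 1  # a "+" point: closest point belongs to the same set
    return plus / len(labelled)


# Toy data (made up): one clearly separated pair and one interleaved pair.
separated = nn_distinctness([(0, 0), (1, 0), (0, 1)], [(10, 10), (11, 10), (10, 11)])
interleaved = nn_distinctness([(0, 0), (2, 0)], [(1, 0), (3, 0)])
```

As the first paragraphs of the post suggest, the coordinates would probably need to be on comparable scales before any of these distances are meaningful.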
Charlotte Maia
2009-Dec-02 00:53 UTC
[R] Distance between sets of points in transformed environmental space
Hi Corrado,

I was thinking about this some more. Maybe you could use a linear discriminant, i.e. a (hyper)plane that partitions your points into two sets, such that the misclassification rate is minimised. Closeness could be regarded as the number of misclassified points. Two sets would be distant if no points are misclassified.

I am assuming there is a standard function in R to do this, no idea what it is though. Plus this is a reasonably well known technique.

Again, the size of the sets needs to be accounted for. There is also the question of whether the distance of set A from B needs to be the same as the distance of set B from A. Neither the nearest-neighbour approach nor the discriminant approach necessarily satisfies this condition.

regards
--
Charlotte Maia
http://sites.google.com/site/maiagx/home
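To make the hyperplane idea concrete, here is a deliberately simplified sketch in plain Python. It does not fit the plane that minimises the misclassification rate; instead it swaps in the nearest-centroid rule, whose boundary is the hyperplane perpendicular to the line joining the two centroids and passing through their midpoint. The function names are hypothetical. Reporting the misclassified count per set shows where the asymmetry mentioned in the post can come from.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def misclassified_counts(X, Y):
    """Count points on the wrong side of the perpendicular-bisector hyperplane.

    Returns (wrong_in_X, wrong_in_Y). This is the nearest-centroid rule, used
    here as a stand-in for a fitted linear discriminant (an assumption).
    """
    cx, cy = centroid(X), centroid(Y)
    w = [b - a for a, b in zip(cx, cy)]          # normal vector, pointing from X to Y
    if all(wi == 0 for wi in w):
        raise ValueError("centroids coincide; no separating direction")
    mid = [(a + b) / 2 for a, b in zip(cx, cy)]  # a point on the hyperplane

    def score(p):
        # Positive on the Y side of the hyperplane, negative on the X side.
        return sum(wi * (pi - mi) for wi, pi, mi in zip(w, p, mid))

    # Points lying exactly on the hyperplane (score 0) are not counted as errors.
    wrong_x = sum(1 for p in X if score(p) > 0)
    wrong_y = sum(1 for p in Y if score(p) < 0)
    return wrong_x, wrong_y


# Toy data (made up): one point of each set strays into the other's half-space.
overlap = misclassified_counts([(0, 0), (1, 0), (4, 0)], [(3, 0), (5, 0), (6, 0)])
apart = misclassified_counts([(0, 0), (1, 1)], [(8, 8), (9, 9)])
```

Dividing each count by the size of its own set would be one way to account for unequal set sizes, at the cost of giving two numbers rather than a single symmetric distance.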