Hi,

I should preface this problem with a statement that although I am sure this is a really easy function to write, I have tried and failed to get my head around writing functions in R. I can use R where functions exist to do what I want done, but have found myself completely incapable of writing them myself.

The problem is that I have a table with several rows of species and several columns of trait data for each species. What I want to do is, for each possible pair of species, extract the Euclidean distance between them based on specified trait data columns. As far as I can see, the dist() function could manage this to some extent for 2 dimensions (traits) for each species, but I need a more generalised function that can handle n dimensions. Ideally this function would allow me to choose which columns (traits) to use to calculate the Euclidean distance, rather than having to reformat the dataset every time.

In the hope of clarifying this with a simplified example, I want to take a dataset like this:

Species    x     y     z     n
spA      2.9  34.2  0.54  15.7
spB      5.5  46.5  0.45  19.4
spC      1.4  48.6  0.84  24.8
spD      8.3  56.1  0.48  21.3

Then extract the Euclidean distances using the general equation

d = sqrt[(x2-x1)^2 + (y2-y1)^2 + ... + (n2-n1)^2]

for particular data columns. So in this example I might want the distances using the traits x, z and n, thereby specifying the equation to be

d = sqrt[(x2-x1)^2 + (z2-z1)^2 + (n2-n1)^2]

and return a distance matrix as follows (calculated distances represented by . for the purposes of this example):

Species  spA  spB  spC
spB        .
spC        .    .
spD        .    .    .

I hope this makes sense. I only presume that this would be a quick and easy function to write on the basis that the underlying process is basically simple maths repeated for each pair of species. Again, I have no experience in writing custom functions (no matter how simple) and just can't seem to get into my head how to go about it.

I look forward to your response and hope someone gets bored enough to quickly write out the code to implement this function. Thank you in advance.

Best wishes,

Kev

--
View this message in context: http://r.789695.n4.nabble.com/Euclidean-distance-function-tp4641177.html
Sent from the R help mailing list archive at Nabble.com.

Hello,

You don't need to write a function. Try the following.

    # Four species (rows) by four columns of random values
    nms <- paste0("species", 1:4)
    mat <- matrix(rnorm(16), ncol = 4, dimnames = list(nms, nms))

    ?dist                                  # help page for dist()
    dist(mat)                              # lower triangle only
    dist(mat, diag = TRUE, upper = TRUE)   # print the full square layout

Hope this helps,

Rui Barradas

On 24-08-2012 11:56, Arbuckle wrote:
> [...]
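
As a hedged check that dist() really applies the general equation from the question across all columns, and not only two, the sketch below compares one pair computed by hand against the value dist() returns. The seed, the column names "trait1" to "trait4", and the variable names by_hand and from_dist are assumptions made for illustration; they are not from the thread.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

nms <- paste0("species", 1:4)
mat <- matrix(rnorm(16), ncol = 4,
              dimnames = list(nms, paste0("trait", 1:4)))

# Distance between species1 and species2 written out from the formula,
# summing the squared differences over all four columns.
by_hand <- sqrt(sum((mat["species2", ] - mat["species1", ])^2))

# The same pair taken from dist(). as.matrix() expands the "dist" object
# into a full square matrix that can be indexed by row and column name.
from_dist <- as.matrix(dist(mat))["species2", "species1"]

all.equal(by_hand, from_dist)
```

If the two values agree, all.equal() returns TRUE, which is what one would expect for a matrix with any number of columns.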

Kev,

The dist() function handles more than 2 dimensions. Using the example you provided ...

    # The example data, as produced by dput()
    mydat <- structure(list(
        Species = c("spA", "spB", "spC", "spD"),
        x = c(2.9, 5.5, 1.4, 8.3),
        y = c(34.2, 46.5, 48.6, 56.1),
        z = c(0.54, 0.45, 0.84, 0.48),
        n = c(15.7, 19.4, 24.8, 21.3)),
        .Names = c("Species", "x", "y", "z", "n"),
        class = "data.frame", row.names = c(NA, -4L))

    # Select only the trait columns of interest before calling dist()
    dist(mydat[, c("x", "z", "n")])

Jean

Arbuckle <k.arbuckle@liverpool.ac.uk> wrote on 08/24/2012 05:56:51 AM:
> [...]
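
Since the original question asked for a function that takes the trait columns as an argument, a minimal sketch of such a wrapper around dist() might look like the following. The name trait_dist and its arguments are hypothetical, not something proposed in the thread. The one thing it adds over the call above is that the species names are used as row labels, so the output is labelled spA to spD rather than 1 to 4.

```r
# Hypothetical helper: 'trait_dist', 'traits' and 'label' are illustrative
# names, not part of base R or of the replies above.
trait_dist <- function(data, traits, label = "Species") {
  stopifnot(all(c(traits, label) %in% names(data)))
  m <- as.matrix(data[, traits, drop = FALSE])   # numeric trait columns only
  rownames(m) <- as.character(data[[label]])     # label rows by species
  dist(m, method = "euclidean")
}

mydat <- data.frame(
  Species = c("spA", "spB", "spC", "spD"),
  x = c(2.9, 5.5, 1.4, 8.3),
  y = c(34.2, 46.5, 48.6, 56.1),
  z = c(0.54, 0.45, 0.84, 0.48),
  n = c(15.7, 19.4, 24.8, 21.3)
)

d <- trait_dist(mydat, c("x", "z", "n"))
d             # lower triangle, labelled by species
as.matrix(d)  # full symmetric matrix, convenient for indexing by name
```

For example, as.matrix(d)["spB", "spA"] picks out a single pair, and changing the second argument to c("x", "y") or any other set of trait columns recomputes the distances without reshaping the data.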

Thank you kindly for both of the replies I've received; that does indeed work perfectly. I had been looking at the description of that function and it reads as though it only deals with 2-dimensional data.

Thanks again!

Kev

--
View this message in context: http://r.789695.n4.nabble.com/Euclidean-distance-function-tp4641177p4641193.html