Hans-Joerg Bibiko
2006-Jul-05 13:55 UTC
[R] R 2.3.1 on Mac OSX: apply a function to 'dist()' ?
Dear all,

maybe someone has a hint for me. I have to calculate a distance matrix using my own function to get the distance between two rows of my matrix. The normal way to do this is to use two 'for' loops, like

for(i in 1:nrow) for(j in i:nrow) _my_function(matrix[i,],matrix[j,])

OK. This works, but unfortunately I have a very large matrix, and it takes hours even on a CPU cluster.

My question is: is there any chance to increase the speed of doing this? My first idea was to apply my function to 'dist()', like 'outer(x,y,"fun")'. But up to now there is no implementation for that(?), and I don't know whether it would increase the speed.

On the other hand, it would be possible to write a C routine for that, but I have to use several functions to calculate the distance, and some other people who want to use this algorithm aren't familiar with C.

I would be pleased if there are any hints! Many thanks in advance!

Cheers, Hans
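
For reference, the two-loop approach described above can be wrapped so that it returns a regular 'dist' object. This is only a minimal base-R sketch: my_distance is a hypothetical stand-in for the poster's own function, and custom_dist is a name made up for this example. It visits each pair of rows exactly once, but it still calls the R function once per pair, so on its own it is unlikely to be much faster than the explicit loops.

```r
# Hypothetical stand-in for the user's own row-to-row distance function
# (here: the number of columns in which two rows differ).
my_distance <- function(a, b) sum(a != b)

# Build a "dist" object from an arbitrary pairwise function `fun`.
custom_dist <- function(m, fun) {
  n <- nrow(m)
  # All index pairs i < j, in the same order in which "dist" objects
  # store the lower triangle: (1,2), (1,3), ..., (2,3), ...
  pairs <- combn(n, 2)
  d <- vapply(seq_len(ncol(pairs)),
              function(k) fun(m[pairs[1, k], ], m[pairs[2, k], ]),
              numeric(1))
  structure(d, Size = n, Labels = rownames(m),
            Diag = FALSE, Upper = FALSE, class = "dist")
}

# Small example matrix with three rows.
m <- matrix(c(1, 0, 1,
              1, 1, 1,
              0, 0, 0), nrow = 3, byrow = TRUE)
d <- custom_dist(m, my_distance)
as.matrix(d)
```

The result can be passed to anything that accepts the output of dist(), such as hclust(). Because the pairs are independent of each other, the columns of pairs could also be split into chunks and handed to separate cluster nodes.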
Hans-Joerg Bibiko
2006-Jul-05 15:37 UTC
[R] R 2.3.1 on Mac OSX: apply a function to 'dist()' ?
Dear Sarah,

many thanks for the hint. Unfortunately my distance metric is more complex. It matters not only whether two rows differ but also where (in which column) the difference occurs; furthermore, if there is a change in, e.g., column 5, I then have to look at column 144, and so on, in order to calculate a kind of weighted distance matrix. I doubt that it would be possible to decompose my metric into some of the common manipulations(?)

Nevertheless, I had a look at the package ecodist (many thanks for that; now I can solve another problem ;) ). Building on this idea of decomposition, I will prepare some subfunctions written in C that can be combined to solve a specific problem. That way other users can modify them a bit more easily (I hope).

But nevertheless I still have a speed problem!

Best regards, Hans
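
As a rough illustration of where a speed-up in plain R could come from, the sketch below removes the inner loop by comparing one row against all later rows in a single vectorized step. It assumes a much simpler metric than the one described above: a weighted count of differing columns, with one assumed weight per column in w. The cross-column rule (a change in column 5 triggering a look at column 144) is not modelled, and weighted_mismatch_dist is a name made up for this example.

```r
# Hypothetical simplified metric: weighted count of the columns in which
# two rows differ. `w` holds one assumed weight per column.
# This is NOT the poster's actual metric.
weighted_mismatch_dist <- function(m, w) {
  n <- nrow(m)
  out <- matrix(0, n, n)
  for (i in seq_len(n - 1)) {
    j <- (i + 1):n
    # Compare row i against all later rows at once: sweep() marks, per
    # column, which entries of m[j, ] differ from the entry in row i.
    differs <- sweep(m[j, , drop = FALSE], 2, m[i, ], FUN = "!=")
    # Weighted sum over columns, one value per row in j.
    out[j, i] <- drop((differs + 0) %*% w)
  }
  as.dist(out)  # as.dist() reads the lower triangle
}

m <- matrix(c(1, 0, 1,
              1, 1, 1,
              0, 0, 0), nrow = 3, byrow = TRUE)
w <- c(1, 2, 3)  # assumed per-column weights
d <- weighted_mismatch_dist(m, w)
as.matrix(d)
```

Whether the real metric can be restructured this way depends on how the column dependencies work; if it cannot, the C subfunctions mentioned above remain the more promising route.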