Rogério Rosa da Silva
2006-Jul-11 19:11 UTC
[R] script problem to obtain pairs of overlap values
Dear list,

I wrote code to estimate the overlap between two kernel density estimates. The script should estimate the overlap between every pair of columns of a data frame: with S sampled species (columns), I want to obtain S(S-1)/2 pairwise overlap values. However, the code does not work as intended (only a single overlap value is produced), and I can't find the solution.

To illustrate the calculations, I use the data frame "tdon" and the vector of bandwidths "h", which was estimated in another part of the script:

tdon <- data.frame(sp.1 = c(5, 9, NA, 5, 11),
                   sp.2 = c(4, 2, 4, NA, 11),
                   sp.3 = c(5, 4, 2, 6, 13),
                   sp.4 = c(3, 11, NA, 5, 3),
                   sp.5 = c(2, 5, 2, 9, 9))

> h
[1] 1.047 2.973 0.887 1.520 2.955

Here is the code:

nbcol <- ncol(tdon)
for (i in 1:(nbcol - 1)) {
  tdon1 <- tdon[, i]
  tdon11 <- tdon1[!is.na(tdon1)]    # drop missing values of species i
  # estimated density values for species i
  fctk1 <- function(x) {density(tdon11, bw = h[i], kernel = "gaussian")$y}
  for (j in (i + 1):nbcol) {
    tdon2 <- tdon[, j]
    tdon21 <- tdon2[!is.na(tdon2)]  # drop missing values of species j
    fctk2 <- function(x) {density(tdon21, bw = h[j], kernel = "gaussian")$y}
    diffctk <- function(x) {abs(fctk1(x) - fctk2(x))}
    intctk <- approxfun(diffctk(x), rule = 2)
    int <- integrate(diffctk, -Inf, Inf)$value
    overlap <- 1 - 0.5 * int
  }
}

The use of "approxfun" to integrate the difference between the estimated density values (my "diffctk" function) was suggested by Thomas Lumley, but I'm not sure whether I have implemented it correctly, or whether this approach is right for my problem.

I need "overlap" to be a vector of length 10 (S = 5 species here) containing all the pairwise overlap values.

Any help or advice on improving this code will be appreciated.

With kind regards,
Rogério
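A possible sketch of the pairwise loop follows. It is not from the original thread, and the integration range (data range padded by four times the largest bandwidth), the 512-point grid and the names dens.fun, lo, hi, k and pair.names are assumptions chosen for illustration. Two things seem to stop the posted loop from working. First, "overlap" is overwritten on every pass, so at most one value can survive. Second, diffctk ignores its argument x and always returns the 512 density values, so integrate() cannot evaluate it as a function of x. The sketch below evaluates every density on one shared grid (the from/to arguments of density()), turns each estimate into a function of x with approxfun(), integrates |f1 - f2| over that grid, and stores each result in a preallocated vector of length S(S-1)/2 using the overlap = 1 - 0.5 * integral formula from the post.

```r
# Data and bandwidths as given in the post
tdon <- data.frame(sp.1 = c(5, 9, NA, 5, 11),
                   sp.2 = c(4, 2, 4, NA, 11),
                   sp.3 = c(5, 4, 2, 6, 13),
                   sp.4 = c(3, 11, NA, 5, 3),
                   sp.5 = c(2, 5, 2, 9, 9))
h <- c(1.047, 2.973, 0.887, 1.520, 2.955)

nbcol <- ncol(tdon)
npairs <- nbcol * (nbcol - 1) / 2     # S(S-1)/2 pairs
overlap <- numeric(npairs)            # preallocated result vector
pair.names <- character(npairs)

# Assumed integration range: data range padded by 4 times the largest bandwidth
allx <- unlist(tdon)
lo <- min(allx, na.rm = TRUE) - 4 * max(h)
hi <- max(allx, na.rm = TRUE) + 4 * max(h)

# Hypothetical helper: density of column k on the common grid [lo, hi],
# returned as a function of x via linear interpolation
dens.fun <- function(k) {
  d <- density(tdon[, k], bw = h[k], kernel = "gaussian",
               from = lo, to = hi, n = 512, na.rm = TRUE)
  approxfun(d$x, d$y, rule = 2)
}

k <- 0
for (i in 1:(nbcol - 1)) {
  f1 <- dens.fun(i)
  for (j in (i + 1):nbcol) {
    f2 <- dens.fun(j)
    # absolute difference of the two densities, now a genuine function of x
    diffctk <- function(x) abs(f1(x) - f2(x))
    int <- integrate(diffctk, lo, hi, subdivisions = 1000)$value
    k <- k + 1
    overlap[k] <- 1 - 0.5 * int       # store instead of overwriting
    pair.names[k] <- paste(names(tdon)[i], names(tdon)[j], sep = "-")
  }
}
names(overlap) <- pair.names
print(round(overlap, 3))
```

As a quick check of the integration range, comparing a column with itself (f1 and f2 built from the same column) should give an overlap of 1.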