Good morning! I recently implemented a KD tree in JAVA for faster kernel density estimation (part of the code follows). It went well. To hook it with R, however, has proved more difficult. My question is: is it possible to implement the algorithm in R? My impression seems to indicate no as the code requires a complete class-object framework that R does not support. But is there an R package or something that may make it possible? Thanks in advance for your help. Java implementation of KD tree: public class Kdnode { private double[] center; //center of the bounding box private double diameter; //maximum distance from center to anywhere within the bounding box private int numOfPoints; //number of source data points in the bounding box private Kdnode left, right; public Kdnode(double[][] points, int split_dim, int [][] sortedIndices, double[][] bBox) { //bBox: the bounding box, 1st row the lower bound, 2nd row the upper bound numOfPoints = points.length; int d = points[0].length; center = new double[d]; for(int j=0; j<d; j++) center[j] (bBox[0][j]+bBox[1][j])/2.; diameter = get_diameter(bBox); if(numOfPoints==1) { diameter = 0.; for(int j=0; j<d; j++) center[j] = points[0][j]; left = null; right = null; } else { int middlePoint sortedIndices[split_dim][numOfPoints/2]; double splitValue = points[middlePoint][split_dim]; middlePoint sortedIndices[split_dim][numOfPoints/2-1]; double splitValue_small points[middlePoint][split_dim]; int left_size = numOfPoints/2; int right_size = numOfPoints - left_size; double[][] leftPoints = new double[left_size][d]; double[][] rightPoints = new double[right_size][d]; int[][] leftSortedIndices = new int[d][left_size]; int[][] rightSortedIndices = new int[d][right_size]; int left_counter = 0, right_counter = 0; int[] splitInfo = new int [numOfPoints]; for(int i = 0; i < numOfPoints; i++) { if(points[i][split_dim] < splitValue) { for(int j=0; j<d; j++) leftPoints[left_counter][j] = points[i][j]; splitInfo[i] = right_counter; left_counter++; } else { for(int j=0; j<d; j++) rightPoints[right_counter][j] = points[i][j]; splitInfo[i] = left_counter; right_counter++; } } // modify appropriately the indices to correspond to the new lists for(int i = 0; i < d; i++) { int left_index = 0, right_index = 0; for(int j = 0; j < numOfPoints; j++) { if(points[sortedIndices[i][j]][split_dim] < splitValue) leftSortedIndices[i][left_index++] = sortedIndices[i][j] - splitInfo[sortedIndices[i][j]]; else rightSortedIndices[i][right_index++] = sortedIndices[i][j] - splitInfo[sortedIndices[i][j]]; } } // Recursively compute the kdnodes for the points in the two splitted spaces double[][] leftBBox = new double[2][]; double[][] rightBBox = new double[2][]; for(int i=0; i<2; i++) { leftBBox[i] (double[])bBox[i].clone(); rightBBox[i] (double[])bBox[i].clone(); } leftBBox[1][split_dim] = splitValue_small; rightBBox[0][split_dim] = splitValue; int next_dim = (split_dim + 1) % (d); left = new Kdnode(leftPoints, next_dim, leftSortedIndices, leftBBox); right = new Kdnode(rightPoints, next_dim, rightSortedIndices, rightBBox); } } public double evaluate(double[] target, double delta, double bandwidth) throws Exception { double dis_2_center = Common.distance(target, center)/bandwidth; double dm = diameter/bandwidth; if(dis_2_center >= 1+dm) return 0.; if(numOfPoints==1) return Common.K(dis_2_center); /*if(dis_2_center<1) { double temp2 = dm*Common.KDeriv(dis_2_center); if(temp2<delta) return Common.K(dis_2_center)*numOfPoints; } */ return left.evaluate(target,delta, bandwidth) + right.evaluate(target,delta, bandwidth); } public double get_diameter(double[][] bBox) { double value = 0., diff; for (int i=0; i<bBox[0].length;i++) { diff = (bBox[1][i] - bBox[0][i])/2.; value += diff*diff; } return Math.sqrt(value); } } ====Jason Liao, http://www.geocities.com/jg_liao Dept. of Biostatistics, http://www2.umdnj.edu/bmtrxweb University of Medicine and Dentistry of New Jersey phone 732-235-5429, School of Public Health office phone 732-235-8611, Cancer Institute of New Jersey office moble phone 908-720-4205
Jason Liao <jg_liao <at> yahoo.com> writes: : : Good morning! I recently implemented a KD tree in JAVA for faster : kernel density estimation (part of the code follows). It went well. To : hook it with R, however, has proved more difficult. My question is: is : it possible to implement the algorithm in R? My impression seems to : indicate no as the code requires a complete class-object framework that : R does not support. But is there an R package or something that may : make it possible? Thanks in advance for your help. R comes with the S3 and S4 object systems out-of-the-box and there is an addon package oo.R available at: http://www.maths.lth.se/help/R/R.classes/ that provides a more conventional OO system. Its likely that one or more of these would satisfy your requirements.
For an overview of the OOP R package, see http://cran.r-project.org/doc/Rnews/Rnews_2001-3.pdf + seth
On Thu, 12 Aug 2004, Jason Liao wrote:> Good morning! I recently implemented a KD tree in JAVA for faster > kernel density estimation (part of the code follows). It went well. To > hook it with R, however, has proved more difficult. My question is: is > it possible to implement the algorithm in R? My impression seems to > indicate no as the code requires a complete class-object framework that > R does not support. But is there an R package or something that may > make it possible? Thanks in advance for your help.This code doesn't seem to have any requirement for a class-object framework. The methods can all be written as functions, there isn't any use of inheritance or polymorphism. The data structure can then be a list. Now, you might want to make this list an object, to improve printing or to make it easier to check that the functions don't get called with arguments that aren't really k-d trees. This is well within the facilities of even the S3 method system. AFAICS the only class/object facility that Java provides and the "methods" package doesn't is enforcement of "private" methods and data, which has absolutely no impact on the complexity of programs (it can affect how easy code *maintenance* is, because it forces you to decide what is and isn't in your API, but that's a separate issue). The old S3 class system is weaker, since it doesn't support function polymorphism based on more than one of the arguments and doesn't have reliable reflectance facilities. -thomas> > Java implementation of KD tree: > > public class Kdnode { > > private double[] center; //center of the bounding box > private double diameter; //maximum distance from center to > anywhere within the bounding box > private int numOfPoints; //number of source data points in the > bounding box > > private Kdnode left, right; > > > public Kdnode(double[][] points, int split_dim, int [][] > sortedIndices, double[][] bBox) { > //bBox: the bounding box, 1st row the lower bound, 2nd row > the upper bound > numOfPoints = points.length; > int d = points[0].length; > > center = new double[d]; > for(int j=0; j<d; j++) center[j] > (bBox[0][j]+bBox[1][j])/2.; > diameter = get_diameter(bBox); > > if(numOfPoints==1) { > diameter = 0.; > for(int j=0; j<d; j++) center[j] = points[0][j]; > left = null; > right = null; > } > else { > int middlePoint > sortedIndices[split_dim][numOfPoints/2]; > double splitValue = points[middlePoint][split_dim]; > > middlePoint > sortedIndices[split_dim][numOfPoints/2-1]; > double splitValue_small > points[middlePoint][split_dim]; > > int left_size = numOfPoints/2; > int right_size = numOfPoints - left_size; > > double[][] leftPoints = new double[left_size][d]; > double[][] rightPoints = new double[right_size][d]; > > > int[][] leftSortedIndices = new int[d][left_size]; > int[][] rightSortedIndices = new int[d][right_size]; > > int left_counter = 0, right_counter = 0; > int[] splitInfo = new int [numOfPoints]; > > for(int i = 0; i < numOfPoints; i++) { > if(points[i][split_dim] < splitValue) { > for(int j=0; j<d; j++) leftPoints[left_counter][j] = points[i][j]; > splitInfo[i] = right_counter; > left_counter++; > } > > else { > for(int j=0; j<d; j++) rightPoints[right_counter][j] = points[i][j]; > splitInfo[i] = left_counter; > right_counter++; > } > } > // modify appropriately the indices to correspond to the new lists > for(int i = 0; i < d; i++) { > int left_index = 0, right_index = 0; > for(int j = 0; j < numOfPoints; j++) { > if(points[sortedIndices[i][j]][split_dim] < splitValue) > leftSortedIndices[i][left_index++] = sortedIndices[i][j] - > splitInfo[sortedIndices[i][j]]; > else rightSortedIndices[i][right_index++] = sortedIndices[i][j] > - splitInfo[sortedIndices[i][j]]; > } > } > > // Recursively compute the kdnodes for the points in the two > splitted spaces > double[][] leftBBox = new double[2][]; > double[][] rightBBox = new double[2][]; > > for(int i=0; i<2; i++) { > leftBBox[i] > (double[])bBox[i].clone(); > rightBBox[i] > (double[])bBox[i].clone(); > } > > leftBBox[1][split_dim] = splitValue_small; > rightBBox[0][split_dim] = splitValue; > > int next_dim = (split_dim + 1) % (d); > left = new Kdnode(leftPoints, next_dim, leftSortedIndices, > leftBBox); > right = new Kdnode(rightPoints, next_dim, rightSortedIndices, > rightBBox); > } > } > > > public double evaluate(double[] target, double delta, double > bandwidth) throws Exception > { > > double dis_2_center = Common.distance(target, > center)/bandwidth; > double dm = diameter/bandwidth; > > if(dis_2_center >= 1+dm) return 0.; > if(numOfPoints==1) return Common.K(dis_2_center); > > /*if(dis_2_center<1) > { > double temp2 = dm*Common.KDeriv(dis_2_center); > if(temp2<delta) return > Common.K(dis_2_center)*numOfPoints; > } */ > > return left.evaluate(target,delta, bandwidth) + > right.evaluate(target,delta, bandwidth); > } > > > public double get_diameter(double[][] bBox) > { > double value = 0., diff; > for (int i=0; i<bBox[0].length;i++) > { > diff = (bBox[1][i] - bBox[0][i])/2.; > value += diff*diff; > } > return Math.sqrt(value); > } > } > > ====> Jason Liao, http://www.geocities.com/jg_liao > Dept. of Biostatistics, http://www2.umdnj.edu/bmtrxweb > University of Medicine and Dentistry of New Jersey > phone 732-235-5429, School of Public Health office > phone 732-235-8611, Cancer Institute of New Jersey office > moble phone 908-720-4205 > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html >Thomas Lumley Assoc. Professor, Biostatistics tlumley at u.washington.edu University of Washington, Seattle
Dear Jason, Of course you can do almost eveything in R, but I have rarely seen someone prototyping something in Java to then implement it in R. KD-trees are for performance, and interpreted R is not really fast. So interfacing some foreign code might really be the better choice here. Opportunity 1 ============Since you have already Java code, the most straightforward choice might seem interfacing the Java code via SJava or RJava. However, though there are packages RJava and SJava, it looks like there is no recommended Java interface which is maintained across all relevant platforms, and thus nothing at CRAN. Given the richness of available Java software, this is a pitty, but a reality. I am currently trying to remove remaining memory leaks from SJava 0.66 for Windows, but this is a one time effort, not a maintenance promise. Opportunity 2 ============Get some standard KD-tree C-code and modify it to your needs. Then interface it using the C- or Call-interface, to write an access function for every operation you want to perform on the KD-tree from R. Opportunity 3 ============Write your own C-code operating directly on R-data structures. In this case you do not only interface a KD-tree, you really have it in R but you can have the most important operations coded in C. But this is the technically most challenging one. You need to to know C, read "Writing R Extensions" very carefully, you need to understand the sets of macros defined in Rdefines.h and Rinternals.h, especially you need to understand all the PROTECT macros and how to avoid overloading the protect-stack, because you will need to reprotect your KD-tree very often. Hope that helps Best Jens Oehlschl??gel -- NEU: WLAN-Router f??r 0,- EUR* - auch f??r DSL-Wechsler! GMX DSL = superg??nstig & kabellos http://www.gmx.net/de/go/dsl