Hi,

I have a set of x,y data points, and each data point lies between (0,0) and (1,1). From this set I have selected all the points that lie in the lower triangle (of the plot of these points).

What I would like to do is divide the region from (0,0) to (1,1) into cells of, say, side = 0.01 and then count the number of cells that contain a point.

My first approach is to generate the coordinates of these cells and then loop over the point list to see whether each point lies in a cell or not.

However, this seems very inefficient, especially since I will have thousands of points.

Has anybody dealt with this type of problem, and are there routines to handle it?

-------------------------------------------------------------------
Rajarshi Guha <rxg218 at psu.edu> <http://jijo.cjb.net>
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
-------------------------------------------------------------------
Alone, adj.: In bad company.
-- Ambrose Bierce, "The Devil's Dictionary"
On Monday 04 April 2005 13:22, Rajarshi Guha wrote:
> What I would like to do is to divide the region (0,0) to (1,1) into
> cells of say, side = 0.01 and then count the number of cells that
> contain a point.
>
> Has anybody dealt with this type of problem and are there routines to
> handle it?

A combination of cut and table/xtabs should do it, e.g.:

    x <- runif(3000)
    y <- runif(3000)

    # 101 break points give 100 intervals of width 0.01 on each axis
    fx <- cut(x, breaks = seq(0, 1, length = 101))
    fy <- cut(y, breaks = seq(0, 1, length = 101))

    # 100 x 100 table of counts, one entry per cell
    txy <- xtabs(~ fx + fy)
    image(txy > 0)   # show which cells are occupied
    sum(txy > 0)     # number of cells containing at least one point

Deepayan
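As a minimal sketch, the cut/table idea above might be applied to the lower-triangle points from the original question roughly as follows. The seed, the filter `y <= x`, the variable names (`keep`, `ix`, `iy`, `counts`, `n_occupied`) and the use of `labels = FALSE` with `include.lowest = TRUE` are assumptions made for illustration and are not part of the reply.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
n <- 3000
x <- runif(n)
y <- runif(n)

# keep the lower triangle, i.e. points on or below the diagonal (assumed meaning)
keep <- y <= x
x <- x[keep]
y <- y[keep]

# 101 break points give 100 intervals of width 0.01 on each axis
breaks <- seq(0, 1, length.out = 101)

# integer cell index (1..100) along each axis; include.lowest = TRUE puts a
# coordinate of exactly 0 into the first interval instead of returning NA
ix <- cut(x, breaks = breaks, labels = FALSE, include.lowest = TRUE)
iy <- cut(y, breaks = breaks, labels = FALSE, include.lowest = TRUE)

# full 100 x 100 table of counts, including the empty cells
counts <- table(factor(ix, levels = 1:100), factor(iy, levels = 1:100))

# number of cells that contain at least one point
n_occupied <- sum(counts > 0)
n_occupied
```

Because the same breaks are used on both axes, every occupied cell of lower-triangle data satisfies `iy <= ix`, so at most 5050 of the 10000 cells can be occupied.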
> From: Deepayan Sarkar <deepayan at stat.wisc.edu> Mon, 4 Apr 2005 13:52:48 -0500
>
> A combination of cut and table/xtabs should do it, e.g.:
>
> x <- runif(3000)
> y <- runif(3000)
>
> fx <- cut(x, breaks = seq(0, 1, length = 101))
> fy <- cut(y, breaks = seq(0, 1, length = 101))
>
> txy <- xtabs(~ fx + fy)
> :

Another significantly faster way (but not generating row/column names) is:

    x <- runif(3000)
    y <- runif(3000)
    ints <- 100
    myfun <- function(x, y, ints) {
      # integer cell index (0 .. ints-1) along each axis
      fx <- x %/% (1/ints)
      fy <- y %/% (1/ints)
      # combine both indices into one linear cell number (1 .. ints^2)
      # and count the points per cell with hist()
      txy <- hist(fx + ints*fy + 1, breaks=0:(ints*ints), plot=FALSE)$counts
      # reshape the counts into an ints x ints matrix
      dim(txy) <- c(ints, ints)
      return(txy)
    }
    myfun(x, y, ints)

Hope this helps,
Ray Brownrigg
On Mon, 2005-04-04 at 14:22 -0400, Rajarshi Guha wrote:
> Hi,
> I have a set of x,y data points and each data point lies between (0,0)
> and (1,1). Of this set I have selected all those that lie in the lower
> triangle (of the plot of these points).
>
> What I would like to do is to divide the region (0,0) to (1,1) into
> cells of say, side = 0.01 and then count the number of cells that
> contain a point.

Thanks very much to Deepayan Sarkar, James Holtman and Ray Brownrigg for very efficient (and elegant) solutions. I've summarized them below:

Deepayan Sarkar

A combination of cut and table/xtabs should do it, e.g.:

    x <- runif(3000)
    y <- runif(3000)

    fx <- cut(x, breaks = seq(0, 1, length = 101))
    fy <- cut(y, breaks = seq(0, 1, length = 101))

    txy <- xtabs(~ fx + fy)
    image(txy > 0)
    sum(txy > 0)

---------------------------------------------------------
James Holtman

Here is a start. This creates a data frame and then divides the data up into 10 segments (you wanted 100, so extend it) and then counts the number in each cell.

> df <- data.frame(x=runif(100), y=runif(100))  # create data
> breaks <- seq(0,1,.1)  # define breaks; you would use 0.01
> # use 'cut' to partition and then 'table' to count
> table(cut(df$x, breaks=breaks, labels=FALSE), cut(df$y, breaks=breaks, labels=FALSE))

      1 2 3 4 5 6 7 8 9 10
  1   0 2 0 1 0 3 0 1 0  0
  2   0 1 0 0 0 2 1 2 0  0
  3   0 1 0 0 3 0 2 2 1  2
  4   0 0 1 2 3 3 1 2 2  0
  5   3 1 2 2 1 2 1 1 1  0
  6   2 0 2 0 0 0 0 1 0  0
  7   0 1 1 1 2 1 1 1 2  1
  8   0 3 2 1 1 2 2 2 1  1
  9   0 0 2 2 0 1 2 0 2  2
  10  0 2 1 0 0 0 0 0 0  3

-----------------------------------------------------------------
Ray Brownrigg

Another significantly faster way (but not generating row/column names) is:

    x <- runif(3000)
    y <- runif(3000)
    ints <- 100
    myfun <- function(x, y, ints) {
      fx <- x %/% (1/ints)
      fy <- y %/% (1/ints)
      txy <- hist(fx + ints*fy + 1, breaks=0:(ints*ints), plot=FALSE)$counts
      dim(txy) <- c(ints, ints)
      return(txy)
    }
    myfun(x, y, ints)

-------------------------------------------------------------------
Rajarshi Guha <rxg218 at psu.edu>
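The three summarized approaches should report the same number of occupied cells when given the same data. The following is only a rough cross-check under stated assumptions: the seed is arbitrary, the test points are deliberately placed at cell centres (a hypothetical data set, chosen so that no point sits on a cell boundary where floating-point rounding could differ between methods), and the names `n_xtabs`, `n_table`, `n_hist` and `cell` are introduced here for illustration.

```r
set.seed(42)   # arbitrary seed, an assumption made only for reproducibility
ints <- 100

# Hypothetical test data: 3000 points at randomly chosen cell centres
x <- (sample.int(ints, 3000, replace = TRUE) - 0.5) / ints
y <- (sample.int(ints, 3000, replace = TRUE) - 0.5) / ints

breaks <- seq(0, 1, length.out = ints + 1)

# 1. cut + xtabs
fx <- cut(x, breaks = breaks)
fy <- cut(y, breaks = breaks)
n_xtabs <- sum(xtabs(~ fx + fy) > 0)

# 2. cut with integer labels + table
n_table <- sum(table(cut(x, breaks = breaks, labels = FALSE),
                     cut(y, breaks = breaks, labels = FALSE)) > 0)

# 3. integer division + hist on a linear cell index (1 .. ints^2)
cell <- x %/% (1/ints) + ints * (y %/% (1/ints)) + 1
n_hist <- sum(hist(cell, breaks = 0:(ints * ints), plot = FALSE)$counts > 0)

# the three counts are expected to agree
c(xtabs = n_xtabs, table = n_table, hist = n_hist)
```

For real data with points very close to a cell boundary, the `cut`-based and `%/%`-based methods may assign a point to neighbouring cells, so small discrepancies are possible there.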
I said:
> myfun <- function(x, y, ints) {
>   fx <- x %/% (1/ints)
>   fy <- y %/% (1/ints)
>   txy <- hist(fx + ints*fy+ 1, breaks=0:(ints*ints), plot=FALSE)$counts
>   dim(fxy) <- c(ints, ints)
        ^^^
>   return(txy)
> }

Of course it should be:

    dim(txy) <- c(ints, ints)
        ^^^

Sorry about that,
Ray
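With that correction the function returns the full matrix of per-cell counts, and the number of occupied cells asked for in the original question is the number of nonzero entries. A minimal sketch of the same linear-index idea follows, with `tabulate()` swapped in for `hist()` and `floor(x * ints)` swapped in for `x %/% (1/ints)`. The function name `count_cells`, the `pmin()` clamp for coordinates equal to 1, and the small example data are assumptions made for illustration, not part of the posted solution.

```r
count_cells <- function(x, y, ints) {
  # integer cell index along each axis, 0 .. ints - 1;
  # pmin() puts a coordinate of exactly 1 into the last cell
  fx <- pmin(floor(x * ints), ints - 1)
  fy <- pmin(floor(y * ints), ints - 1)
  # one linear index per cell (1 .. ints^2), counted with tabulate()
  counts <- tabulate(fx + ints * fy + 1, nbins = ints * ints)
  # reshape to an ints x ints matrix: rows follow x, columns follow y
  dim(counts) <- c(ints, ints)
  counts
}

# small hand-made example: two points share the first cell, one point is in
# the neighbouring cell along x, and two points fall in the last cell
x <- c(0.005, 0.005, 0.015, 0.995, 1)
y <- c(0.005, 0.005, 0.005, 0.995, 1)

m <- count_cells(x, y, 100)
sum(m > 0)   # number of occupied cells
```

In this example three cells are occupied: `m[1, 1]` holds 2 points, `m[2, 1]` holds 1, and `m[100, 100]` holds 2.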
Perhaps the following, substituting your vectors of x and y for runif(10000):

> x <- trunc(100*runif(10000))
> y <- trunc(100*runif(10000))/100
> length(unique(x+y))
[1] 6390

Ben Fairbank

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Rajarshi Guha
Sent: Monday, April 04, 2005 1:23 PM
To: R
Subject: [R] a question about box counting

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
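The unique-key idea in Ben Fairbank's reply encodes each cell as a single number (x index as the integer part, y index as the fractional part) and counts the distinct values. A minimal sketch of the same idea with a purely integer key, which avoids relying on floating-point sums being distinct, might look like this. The function name `occupied_cells`, the default `n = 100L`, the `pmin()` clamp for coordinates equal to 1, and the example data are assumptions for illustration.

```r
occupied_cells <- function(x, y, n = 100L) {
  # integer cell index along each axis, 0 .. n - 1;
  # pmin() puts a coordinate of exactly 1 into the last cell
  ix <- pmin(as.integer(floor(x * n)), n - 1L)
  iy <- pmin(as.integer(floor(y * n)), n - 1L)
  # one integer key per cell; distinct cells give distinct keys
  length(unique(ix * n + iy))
}

# hand-made example: the first two points share a cell
x <- c(0.005, 0.005, 0.015, 0.995)
y <- c(0.005, 0.005, 0.005, 0.995)
occupied_cells(x, y)   # 3 distinct cells
```

This returns only the number of occupied cells, not the per-cell counts, which is all the original question asks for.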