Dear list, I'm writing a function to re-grid a data set from finer to coarser resolutions in R as follows (I use this function with sapply/apply): gridResize <- function(startVec = stop("What's your input vector"), to = stop("Missing 'to': How long do you want the fnial vector to be?")){ from <- length(startVec) shortVec<-numeric() tics <- from*to for(j in 1:to){ interval <- ((j/to)*tics - (1/to)*tics + 1):((j/to)*tics) benchmarks <- interval/to #FIRST RUN ASSUMES FINAL BENCHMARK/TO IS AN INTEGER... positions <- which(round(benchmarks) == benchmarks) indeces <- benchmarks[positions] fracs <- numeric() #SINCE MUCH OF THE TIME THIS WILL NOT BE THE CASE, THIS SCRIPT DEALS WITH THE REMAINDER... for(i in 1:length(positions)){ if(i == 1) fracs[i] <- positions[i]/length(benchmarks) else{ fracs[i] <- (positions[i] - sum(positions[1:(i-1)]))/length(benchmarks) } } #AND UPDATES STARTVEC INDECES AND FRACTION MULTIPLIERS if(max(positions) != length(benchmarks)) indeces <- c(indeces, max(indeces) + 1) if(sum(fracs) != 1) fracs <- c(fracs, 1 - sum(fracs)) fromVals <- startVec[indeces] if(any(is.na(fromVals))){ NAindex <- which(is.na(fromVals)) if(sum(Fracs[-NAindex]) >= 0.5) shortVec[j] <- sum(fromVals*fracs, na.rm=TRUE) else shortVec[j] <- NA }else{shortVec[j] <- sum(fromVals*fracs)} } return(shortVec) } for the simple test case test <- gridResize(startVec c(2,4,6,8,10,8,6,4,2), to = 7) the function works fine. For larger vectors, however, it breaks down. E.g.: test <- gridResize(startVec = rnorm(300, 9, 20), to = 200) This returns the error: Error in positions[1:(i - 1)] : only 0's may be mixed with negative subscripts and the problem seems to be in the line positions <- which(round(benchmarks) == benchmarks). In this particular example the code cracks up at j = 27. When set j = 27 and run the calculation manually I discover the following:> benchmarks[200][1] 40> benchmarks[200] == 40[1] FALSE> round(benchmarks[200]) == 40[1] TRUE Even though my benchmark calculation seems to be returning a clean integers to serve as inputs for the creation of the 'positions' variable, for whatever reason R doesn't read it that way. I would be very grateful for any advice on how I can either alter my approach entirely (I am sure there is a far more elegant way to regrid data in R) or a simple fix for this rounding error. Many thanks in advance, Aaron -- Aaron Polhamus <aaronpolhamus@gmail.com> Statistical consultant, Revolution Analytics MSc Applied Statistics, The University of Oxford, 2009 838a NW 52nd St, Seattle, WA 98107 Cell: +1 (206) 380.3948 [[alternative HTML version deleted]]
I believe you've fallen into one of the R FAQs, namely the difference between a float and an integer. There are probably much better ways to set up your 'before' and 'after' gridpoint references, but you could start out by replacing the offending line with positions <- which(round(benchmarks)-benchmarks < 1-e20) or better yet, replace "1e-20" with, say, interval/to/100 Carl <quote> From: Aaron Polhamus <aaronpolhamus_at_gmail.com> Date: Mon, 17 Jan 2011 12:02:21 -0800 Dear list, I'm writing a function to re-grid a data set from finer to coarser resolutions in R as follows (I use this function with sapply/apply): gridResize <- function(startVec = stop("What's your input vector"), to = stop("Missing 'to': How long do you want the fnial vector to be?")){ from <- length(startVec) shortVec<-numeric() tics <- from*to for(j in 1:to){ interval <- ((j/to)*tics - (1/to)*tics + 1):((j/to)*tics) benchmarks <- interval/to #FIRST RUN ASSUMES FINAL BENCHMARK/TO IS AN INTEGER... positions <- which(round(benchmarks) == benchmarks) indeces <- benchmarks[positions] fracs <- numeric() #SINCE MUCH OF THE TIME THIS WILL NOT BE THE CASE, THIS SCRIPT DEALS WITH THE REMAINDER... for(i in 1:length(positions)){ if(i == 1) fracs[i] <- positions[i]/length(benchmarks) else{ fracs[i] <- (positions[i] - sum(positions[1:(i-1)]))/length(benchmarks) } } #AND UPDATES STARTVEC INDECES AND FRACTION MULTIPLIERS if(max(positions) != length(benchmarks)) indeces <- c(indeces, max(indeces) + 1) if(sum(fracs) != 1) fracs <- c(fracs, 1 - sum(fracs)) fromVals <- startVec[indeces] if(any(is.na(fromVals))){ NAindex <- which(is.na(fromVals)) if(sum(Fracs[-NAindex]) >= 0.5) shortVec[j] <- sum(fromVals*fracs, na.rm=TRUE) else shortVec[j] <- NA }else{shortVec[j] <- sum(fromVals*fracs)} } return(shortVec) } for the simple test case test <- gridResize(startVec = c(2,4,6,8,10,8,6,4,2), to = 7) the function works fine. For larger vectors, however, it breaks down. E.g.: test <- gridResize(startVec = rnorm(300, 9, 20), to = 200) This returns the error: Error in positions[1:(i - 1)] : only 0's may be mixed with negative subscripts and the problem seems to be in the line positions <- which(round(benchmarks) == benchmarks). In this particular example the code cracks up at j = 27. When set j = 27 and run the calculation manually I discover the following: > benchmarks[200] [1] 40 > benchmarks[200] == 40 [1] FALSE > round(benchmarks[200]) == 40 [1] TRUE Even though my benchmark calculation seems to be returning a clean integers to serve as inputs for the creation of the 'positions' variable, for whatever reason R doesn't read it that way. I would be very grateful for any advice on how I can either alter my approach entirely (I am sure there is a far more elegant way to regrid data in R) or a simple fix for this rounding error. Many thanks in advance, Aaron -- Aaron Polhamus <aaronpolhamus_at_gmail.com> Statistical consultant, Revolution Analytics MSc Applied Statistics, The University of Oxford, 2009 838a NW 52nd St, Seattle, WA 98107 Cell: +1 (206) 380.3948
On Mon, Jan 17, 2011 at 12:02:21PM -0800, Aaron Polhamus wrote:> Dear list, > > I'm writing a function to re-grid a data set from finer to coarser > resolutions in R as follows (I use this function with sapply/apply): > > gridResize <- function(startVec = stop("What's your input vector"), > to = stop("Missing 'to': How long do you want the fnial vector to be?")){ > from <- length(startVec) > shortVec<-numeric() > tics <- from*to > for(j in 1:to){ > interval <- ((j/to)*tics - (1/to)*tics + 1):((j/to)*tics)The vector "interval" computed as above may contain non-integer values, because of rounding error. For example, to <- 200 from <- 300 tics <- from*to j <- c(27, 54, 107, 108, 109) (j/to)*tics - j*from # [1] 9.094947e-13 1.818989e-12 3.637979e-12 3.637979e-12 3.637979e-12 Since "tics" is a multiple of "to", the expression may be rearranged as interval <- (j*from - from + 1):(j*from) and then it only contains integers.> benchmarks <- interval/to > #FIRST RUN ASSUMES FINAL BENCHMARK/TO IS AN INTEGER... > positions <- which(round(benchmarks) == benchmarks)If the above modification of "interval" is used, then "benchmarks" is obtained as a result of a division of two integers. In this situation, this test is correct, since a single division, whose exact result is an integer, produces an integer also in double precision. Alternatively, the test for integer "benchmarks" may be done by testing divisibility of "interval" by "to", for example which(interval %% to == 0)> indeces <- benchmarks[positions] > fracs <- numeric() > #SINCE MUCH OF THE TIME THIS WILL NOT BE THE CASE, THIS SCRIPT DEALS WITH > THE REMAINDER... > for(i in 1:length(positions)){If "positions" may, in some situations, be of length 0, then it is better to use for(i in seq(along=positions)) or for(i in seq(length=length(positions))) For zero length "positions", 1:length(positions) is 1:0, which is of length 2. The expressions seq(along=positions) and seq(length=length(positions)) are integer(0) in this situation. Petr Savicky.