Dear list,
I'm writing a function to re-grid a data set from finer to coarser
resolutions in R as follows (I use this function with sapply/apply):
gridResize <- function(startVec = stop("What's your input
vector"),
to = stop("Missing 'to': How long do you want the fnial vector to
be?")){
from <- length(startVec)
shortVec<-numeric()
tics <- from*to
for(j in 1:to){
interval <- ((j/to)*tics - (1/to)*tics + 1):((j/to)*tics)
benchmarks <- interval/to
#FIRST RUN ASSUMES FINAL BENCHMARK/TO IS AN INTEGER...
positions <- which(round(benchmarks) == benchmarks)
indeces <- benchmarks[positions]
fracs <- numeric()
#SINCE MUCH OF THE TIME THIS WILL NOT BE THE CASE, THIS SCRIPT DEALS WITH
THE REMAINDER...
for(i in 1:length(positions)){
if(i == 1) fracs[i] <- positions[i]/length(benchmarks) else{
fracs[i] <- (positions[i] - sum(positions[1:(i-1)]))/length(benchmarks)
}
}
#AND UPDATES STARTVEC INDECES AND FRACTION MULTIPLIERS
if(max(positions) != length(benchmarks)) indeces <- c(indeces, max(indeces)
+ 1)
if(sum(fracs) != 1) fracs <- c(fracs, 1 - sum(fracs))
fromVals <- startVec[indeces]
if(any(is.na(fromVals))){
NAindex <- which(is.na(fromVals))
if(sum(Fracs[-NAindex]) >= 0.5) shortVec[j] <- sum(fromVals*fracs,
na.rm=TRUE) else shortVec[j] <- NA
}else{shortVec[j] <- sum(fromVals*fracs)}
}
return(shortVec)
}
for the simple test case test <- gridResize(startVec c(2,4,6,8,10,8,6,4,2),
to = 7) the function works fine. For larger vectors,
however, it breaks down. E.g.: test <- gridResize(startVec = rnorm(300, 9,
20), to = 200)
This returns the error:
Error in positions[1:(i - 1)] :
only 0's may be mixed with negative subscripts
and the problem seems to be in the line positions <- which(round(benchmarks)
== benchmarks). In this particular example the code cracks up at j = 27.
When set j = 27 and run the calculation manually I discover the following:
> benchmarks[200]
[1] 40> benchmarks[200] == 40
[1] FALSE> round(benchmarks[200]) == 40
[1] TRUE
Even though my benchmark calculation seems to be returning a clean integers
to serve as inputs for the creation of the 'positions' variable, for
whatever reason R doesn't read it that way. I would be very grateful for any
advice on how I can either alter my approach entirely (I am sure there is a
far more elegant way to regrid data in R) or a simple fix for this rounding
error.
Many thanks in advance,
Aaron
--
Aaron Polhamus <aaronpolhamus@gmail.com>
Statistical consultant, Revolution Analytics
MSc Applied Statistics, The University of Oxford, 2009
838a NW 52nd St, Seattle, WA 98107
Cell: +1 (206) 380.3948
[[alternative HTML version deleted]]
I believe you've fallen into one of the R FAQs, namely the difference
between a float and an integer.
There are probably much better ways to set up your 'before' and
'after'
gridpoint references, but you could start out by replacing the
offending line with
positions <- which(round(benchmarks)-benchmarks < 1-e20)
or better yet, replace "1e-20" with, say, interval/to/100
Carl
<quote>
From: Aaron Polhamus <aaronpolhamus_at_gmail.com>
Date: Mon, 17 Jan 2011 12:02:21 -0800
Dear list,
I'm writing a function to re-grid a data set from finer to coarser
resolutions in R as follows (I use this function with sapply/apply):
gridResize <- function(startVec = stop("What's your input
vector"), to =
stop("Missing 'to': How long do you want the fnial vector to
be?")){
from <- length(startVec)
shortVec<-numeric()
tics <- from*to
for(j in 1:to){
interval <- ((j/to)*tics - (1/to)*tics + 1):((j/to)*tics) benchmarks <-
interval/to
#FIRST RUN ASSUMES FINAL BENCHMARK/TO IS AN INTEGER... positions <-
which(round(benchmarks) == benchmarks) indeces <- benchmarks[positions]
fracs <- numeric()
#SINCE MUCH OF THE TIME THIS WILL NOT BE THE CASE, THIS SCRIPT DEALS
WITH THE REMAINDER...
for(i in 1:length(positions)){
if(i == 1) fracs[i] <- positions[i]/length(benchmarks) else{ fracs[i] <-
(positions[i] - sum(positions[1:(i-1)]))/length(benchmarks) }
}
#AND UPDATES STARTVEC INDECES AND FRACTION MULTIPLIERS
if(max(positions) != length(benchmarks)) indeces <- c(indeces,
max(indeces) + 1)
if(sum(fracs) != 1) fracs <- c(fracs, 1 - sum(fracs)) fromVals <-
startVec[indeces]
if(any(is.na(fromVals))){
NAindex <- which(is.na(fromVals))
if(sum(Fracs[-NAindex]) >= 0.5) shortVec[j] <- sum(fromVals*fracs,
na.rm=TRUE) else shortVec[j] <- NA
}else{shortVec[j] <- sum(fromVals*fracs)} }
return(shortVec)
}
for the simple test case test <- gridResize(startVec =
c(2,4,6,8,10,8,6,4,2), to = 7) the function works fine. For larger
vectors, however, it breaks down. E.g.: test <- gridResize(startVec =
rnorm(300, 9, 20), to = 200)
This returns the error:
Error in positions[1:(i - 1)] :
only 0's may be mixed with negative subscripts
and the problem seems to be in the line positions <-
which(round(benchmarks) == benchmarks). In this particular example the
code cracks up at j = 27. When set j = 27 and run the calculation
manually I discover the following:
> benchmarks[200]
[1] 40
> benchmarks[200] == 40
[1] FALSE
> round(benchmarks[200]) == 40
[1] TRUE Even though my benchmark calculation seems to be returning a
clean integers to serve as inputs for the creation of the 'positions'
variable, for whatever reason R doesn't read it that way. I would be
very grateful for any advice on how I can either alter my approach
entirely (I am sure there is a far more elegant way to regrid data in R)
or a simple fix for this rounding error.
Many thanks in advance,
Aaron
--
Aaron Polhamus <aaronpolhamus_at_gmail.com>
Statistical consultant, Revolution Analytics
MSc Applied Statistics, The University of Oxford, 2009
838a NW 52nd St, Seattle, WA 98107
Cell: +1 (206) 380.3948
On Mon, Jan 17, 2011 at 12:02:21PM -0800, Aaron Polhamus wrote:> Dear list, > > I'm writing a function to re-grid a data set from finer to coarser > resolutions in R as follows (I use this function with sapply/apply): > > gridResize <- function(startVec = stop("What's your input vector"), > to = stop("Missing 'to': How long do you want the fnial vector to be?")){ > from <- length(startVec) > shortVec<-numeric() > tics <- from*to > for(j in 1:to){ > interval <- ((j/to)*tics - (1/to)*tics + 1):((j/to)*tics)The vector "interval" computed as above may contain non-integer values, because of rounding error. For example, to <- 200 from <- 300 tics <- from*to j <- c(27, 54, 107, 108, 109) (j/to)*tics - j*from # [1] 9.094947e-13 1.818989e-12 3.637979e-12 3.637979e-12 3.637979e-12 Since "tics" is a multiple of "to", the expression may be rearranged as interval <- (j*from - from + 1):(j*from) and then it only contains integers.> benchmarks <- interval/to > #FIRST RUN ASSUMES FINAL BENCHMARK/TO IS AN INTEGER... > positions <- which(round(benchmarks) == benchmarks)If the above modification of "interval" is used, then "benchmarks" is obtained as a result of a division of two integers. In this situation, this test is correct, since a single division, whose exact result is an integer, produces an integer also in double precision. Alternatively, the test for integer "benchmarks" may be done by testing divisibility of "interval" by "to", for example which(interval %% to == 0)> indeces <- benchmarks[positions] > fracs <- numeric() > #SINCE MUCH OF THE TIME THIS WILL NOT BE THE CASE, THIS SCRIPT DEALS WITH > THE REMAINDER... > for(i in 1:length(positions)){If "positions" may, in some situations, be of length 0, then it is better to use for(i in seq(along=positions)) or for(i in seq(length=length(positions))) For zero length "positions", 1:length(positions) is 1:0, which is of length 2. The expressions seq(along=positions) and seq(length=length(positions)) are integer(0) in this situation. Petr Savicky.