Denis Chabot
2005-Feb-05 15:31 UTC
Re: [R] 2 small problems: integer division and the nature of NA
Thanks to the many R users who convinced me that the sum of NAs should be zero, and who gave me a solution for the cases where I do not want it to be zero.

Thank you also for the explanations of rounding errors in floating-point arithmetic. I did not expect them. This small error was a real problem for me, as I was trying to find a way to recode numeric values into intervals. Because I wanted to retain numeric values as a result, I tried not to use cut or cut2. Hence, to convert a range of temperatures into 0.2-degree intervals, I had written (let's first make a fake temperature variable k for testing):

    k <- seq(-5, 5, 0.1)
    k1 <- ifelse(k < 0, -0.2 * (abs(k) %/% 0.2) - 0.1,
                 0.2 * (k %/% 0.2) + 0.1)

Note that this works well to quickly recode a numeric variable that only takes integer values. But when there are decimals, it produces the problem that prompted my call for help: some values end up in a different class than you would expect.

Considering your answers, I found 3 solutions:

    # 1. Work in rounded tenths so that %/% only sees integer values
    k2 <- ifelse(k < 0, -0.2 * (abs(round(10 * k)) %/% 2) - 0.1,
                 0.2 * (round(10 * k) %/% 2) + 0.1)

    # 2. Use cut and convert the interval index back to a midpoint
    k3 <- (-0.1 + min(k)) + 0.2 * as.numeric(cut(k, seq(min(k), max(k) + 0.2, 0.2),
                                                 right = FALSE, labels = FALSE))

    # 3. Use cut2 (from the Hmisc package), then turn the factor levels
    #    back into numbers
    k4 <- cut2(k, seq(min(k), max(k) + 0.2, 0.2), levels.mean = TRUE)
    k5 <- as.numeric(levels(k4))[k4]

I could "round" to 1 decimal to be even more exact, but this is good enough. If it can be made more elegant, please let me know!

Denis

> Subject: [R] 2 small problems: integer division and the nature of NA
>
>
> Hi,
>
> I'm wondering why
>
> 48 %/% 2 gives 24
> but
> 4.8 %/% 0.2 gives 23...
> I'm not trying to round up here, but to find out how many times
> something fits into something else, and the answer should have been the
> same for both examples, no?
>
> On a different topic, I like the behavior of NAs better in R than in
> SAS (at least they are not considered the smallest value for a
> variable), but at the same time I am surprised that the sum of NAs is 0
> instead of NA.
>
> The sum of a vector having at least one NA but also valid data gives NA
> if we do not specify na.rm=T. But with na.rm=T, we are telling sum to
> give the sum of valid data, ignoring NAs that do not tell us anything
> about the value of a variable. I found out while getting the sum of
> small subsets of my data (such as when subsetting by several
> variables), sometimes a "cell" only contained NAs for my response
> variable. I would have expected the sum to be NA in such cases, as I do
> not have a single data point telling me the value of my response here.
> But R tells me the sum was zero in that cell! Was this behavior
> considered "desirable" when sum was built? If not, any hope it will be
> fixed?
>
> Sincerely,
>
> Denis Chabot
>
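
As a footnote to the thread, here is a minimal sketch in R of the two points discussed in it: recoding into 0.2-wide intervals without handing decimals to %/%, and getting NA rather than 0 when a cell holds no valid data. The helper names bin_midpoint and sum_or_na are hypothetical and do not appear in the original messages. bin_midpoint assumes left-closed intervals such as [4.8, 5.0), which is not the same boundary convention as k2 for negative values. Treat it as an illustration under those assumptions, not as the method suggested on the list.

```r
# Hypothetical helper (not from the thread): recode x into intervals of
# width 0.2 and return each interval's midpoint. Working in rounded tenths
# keeps the operands of %/% integer-valued, which avoids the
# representation error reported for 4.8 %/% 0.2.
# Assumption: intervals are left-closed, e.g. [4.8, 5.0) -> 4.9.
bin_midpoint <- function(x) {
  tenths <- round(10 * x)            # e.g. 4.8 -> 48
  (2 * (tenths %/% 2) + 1) / 10      # lower edge in tenths, plus half a width
}

# Hypothetical helper (not from the thread): a sum that returns NA when
# there is no valid data at all, instead of the 0 that
# sum(..., na.rm = TRUE) gives.
sum_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

k <- seq(-5, 5, 0.1)
head(bin_midpoint(k))

sum(c(NA, NA), na.rm = TRUE)   # 0
sum_or_na(c(NA, NA))           # NA
sum_or_na(c(1, NA, 2))         # 3
```

Note that sum_or_na also returns NA for an empty vector, because all(is.na(x)) is TRUE when x has length zero. Whether that is the right choice depends on how empty cells should be reported.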