Denis Chabot
2005-Feb-04 16:00 UTC
[R] 2 small problems: integer division and the nature of NA
Hi,

I'm wondering why

48 %/% 2 gives 24
but
4.8 %/% 0.2 gives 23...

I'm not trying to round up here, but to find out how many times something fits into something else, and the answer should have been the same for both examples, no?

On a different topic, I like the behavior of NAs better in R than in SAS (at least they are not considered the smallest value for a variable), but at the same time I am surprised that the sum of NAs is 0 instead of NA.

The sum of a vector having at least one NA but also valid data gives NA if we do not specify na.rm=T. But with na.rm=T, we are telling sum to give the sum of valid data, ignoring NAs that do not tell us anything about the value of a variable. I found out while getting the sum of small subsets of my data (such as when subsetting by several variables), sometimes a "cell" only contained NAs for my response variable. I would have expected the sum to be NA in such cases, as I do not have a single data point telling me the value of my response here. But R tells me the sum was zero in that cell! Was this behavior considered "desirable" when sum was built? If not, any hope it will be fixed?

Sincerely,

Denis Chabot
Uwe Ligges
2005-Feb-04 16:40 UTC
[R] 2 small problems: integer division and the nature of NA
Denis Chabot wrote:

> Hi,
>
> I'm wondering why
>
> 48 %/% 2 gives 24
> but
> 4.8 %/% 0.2 gives 23...
> I'm not trying to round up here, but to find out how many times
> something fits into something else, and the answer should have been the
> same for both examples, no?

No. Not from the perspective of a digital computer, which cannot represent all real numbers exactly (in fact only a very small subset, since we are using floating point arithmetic) ...

> On a different topic, I like the behavior of NAs better in R than in SAS
> (at least they are not considered the smallest value for a variable),
> but at the same time I am surprised that the sum of NAs is 0 instead of NA.

It *is* NA:

  sum(c(NA, NA)) # [1] NA
  sum(c(NA, 1)) # [1] NA

> The sum of a vector having at least one NA but also valid data gives NA
> if we do not specify na.rm=T. But with na.rm=T, we are telling sum to
> give the sum of valid data, ignoring NAs that do not tell us anything
> about the value of a variable. I found out while getting the sum of
> small subsets of my data (such as when subsetting by several variables),
> sometimes a "cell" only contained NAs for my response variable. I would
> have expected the sum to be NA in such cases, as I do not have a single
> data point telling me the value of my response here. But R tells me the
> sum was zero in that cell! Was this behavior considered "desirable" when
> sum was built? If not, any hope it will be fixed?

??? I don't get your point! If you *remove* NAs as in

  sum(c(NA, NA), na.rm=TRUE) # [1] 0
  sum(c(NA, 1), na.rm=TRUE) # [1] 1

there is not much left to sum up.... so what do you expect in the cases above?
Please read the docs on NA handling.

Uwe Ligges
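A minimal sketch of the point about removing NAs, using a made-up all-missing vector `x` (the name and data are illustrative, not from the thread). It shows that `na.rm = TRUE` leaves nothing to add, so `sum()` is effectively called on an empty vector:

```r
# Illustrative data (assumption): every observation is missing.
x <- c(NA_real_, NA_real_)

# Without na.rm the missing values propagate into the result.
total_with_na <- sum(x)                 # NA

# na.rm = TRUE drops the NAs first, so nothing is left to add up.
x_kept <- x[!is.na(x)]
n_kept <- length(x_kept)                # 0
total_na_rm <- sum(x, na.rm = TRUE)     # 0

# Same value as summing an explicitly empty numeric vector.
same_as_empty <- identical(total_na_rm, sum(numeric(0)))
```

Checking `n_kept` before trusting `total_na_rm` is one way to tell "no data" apart from "data that add up to zero".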
Peter Dalgaard
2005-Feb-04 16:42 UTC
[R] 2 small problems: integer division and the nature of NA
Denis Chabot <chabotd at globetrotter.net> writes:

> Hi,
>
> I'm wondering why
>
> 48 %/% 2 gives 24
> but
> 4.8 %/% 0.2 gives 23...
> I'm not trying to round up here, but to find out how many times
> something fits into something else, and the answer should have been
> the same for both examples, no?

Well, you can't trust floating point numbers to give you an exact result:

  > 4.8 / 0.2 - 24
  [1] -3.552714e-15

and even

  > (48/10) / (2/10) - 24
  [1] -3.552714e-15

the basic issue being that tenths are not exactly representable in binary floating point. I think very few people even expected you to use integer division on non-integers, but I note that the claim on the help page actually holds:

  > 0.2 * 4.8 %/% 0.2 + 4.8 %% 0.2 == 4.8
  [1] TRUE

> On a different topic, I like the behavior of NAs better in R than in
> SAS (at least they are not considered the smallest value for a
> variable), but at the same time I am surprised that the sum of NAs is
> 0 instead of NA.
>
> The sum of a vector having at least one NA but also valid data gives
> NA if we do not specify na.rm=T. But with na.rm=T, we are telling sum
> to give the sum of valid data, ignoring NAs that do not tell us
> anything about the value of a variable. I found out while getting the
> sum of small subsets of my data (such as when subsetting by several
> variables), sometimes a "cell" only contained NAs for my response
> variable. I would have expected the sum to be NA in such cases, as I
> do not have a single data point telling me the value of my response
> here. But R tells me the sum was zero in that cell! Was this behavior
> considered "desirable" when sum was built? If not, any hope it will be
> fixed?

Yes it was, and no there isn't. In math, the sum over an empty index set is zero, which has some nice consistency properties (the sum over a disjoint union of sets is the sum of the sums over each set, for instance).

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
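For the original goal of counting how many times one quantity fits into another, one possible workaround is to divide normally and snap quotients that are within a small tolerance of a whole number. The sketch below is only an illustration: `fits` and its `tol` argument are hypothetical names, not part of base R, and the default tolerance is an arbitrary choice.

```r
# Hypothetical helper (not base R): how many times does `step` fit into
# `total`? Quotients within `tol` of a whole number are treated as exact;
# anything else is rounded down.
fits <- function(total, step, tol = sqrt(.Machine$double.eps)) {
  q <- total / step
  nearest <- round(q)
  ifelse(abs(q - nearest) < tol, nearest, floor(q))
}

# The raw quotient falls just short of 24, as in the output above.
below_24 <- 4.8 / 0.2 < 24    # TRUE

n_tenths <- fits(4.8, 0.2)    # 24
n_whole  <- fits(48, 2)       # 24
n_part   <- fits(4.9, 0.2)    # 24 (24.5 is not near a whole number)
```

The tolerance trades one kind of error for another: a quotient that is genuinely just below a whole number will also be rounded up, so `tol` should stay well below the resolution that matters for the data.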
Spencer Graves
2005-Feb-04 16:43 UTC
[R] 2 small problems: integer division and the nature of NA
It's the difference between whole numbers and fractions: 48 and 2 are whole numbers, which floating point stores exactly; 4.8 and 0.2 are not exactly representable. Consider:

  > (4.8+.Machine$double.eps) %/% (0.2-.Machine$double.eps)
  [1] 24
  > (4.8-.Machine$double.eps) %/% (0.2+.Machine$double.eps)
  [1] 23

Does this help?

spencer graves
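As an aside on the integer/real distinction, a short sketch using only base R: a literal such as 48 is itself stored as a double, but small whole numbers are represented exactly, whereas tenths are not. The variable names are illustrative.

```r
# Numeric literals are doubles unless written with an L suffix.
type_plain  <- typeof(48)     # "double"
type_suffix <- typeof(48L)    # "integer"

# Print more digits than the default to see the stored approximations.
digits_02 <- sprintf("%.20f", 0.2)
digits_48 <- sprintf("%.20f", 4.8)

# Exact comparison of decimal fractions can fail ...
exact_equal <- (0.1 + 0.2 == 0.3)                   # FALSE

# ... while all.equal() compares within a small tolerance.
near_equal <- isTRUE(all.equal(0.1 + 0.2, 0.3))     # TRUE
```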
Huntsinger, Reid
2005-Feb-04 16:50 UTC
[R] 2 small problems: integer division and the nature of NA
It's convention in mathematics that the empty sum is 0. You can think of this as a generalization of 0*x = 0.

Reid Huntsinger
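The convention can be checked directly in base R. In this small sketch (variable names are illustrative), the empty sum is the additive identity and, analogously, the empty product is the multiplicative identity:

```r
empty <- numeric(0)

empty_sum  <- sum(empty)     # 0, the additive identity
empty_prod <- prod(empty)    # 1, the multiplicative identity

# Adding an empty sum to another total leaves that total unchanged.
y <- c(2, 3, 5)
unchanged <- (sum(y) + sum(empty) == sum(y))    # TRUE
```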
Gabor Grothendieck
2005-Feb-04 19:48 UTC
[R] 2 small problems: integer division and the nature of NA
Denis Chabot <chabotd <at> globetrotter.net> writes:

: The sum of a vector having at least one NA but also valid data gives NA
: if we do not specify na.rm=T. But with na.rm=T, we are telling sum to
: give the sum of valid data, ignoring NAs that do not tell us anything
: about the value of a variable. I found out while getting the sum of
: small subsets of my data (such as when subsetting by several
: variables), sometimes a "cell" only contained NAs for my response
: variable. I would have expected the sum to be NA in such cases, as I do
: not have a single data point telling me the value of my response here.
: But R tells me the sum was zero in that cell! Was this behavior
: considered "desirable" when sum was built? If not, any hope it will be
: fixed?

Think of it this way: if u and v are index vectors, then it's desirable that

  sum(x[u]) + sum(x[v]) == sum(x[c(u,v)])

hold for zero-length index vectors too, in which case sum(numeric()) should be zero, not NA.

If you want a short expression that gives NA for zero-length x, try this:

  sum(x) + if (length(x)) 0 else NA

or define your own function, sum0, say.
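One way the suggested `sum0` might look is sketched below. This is an assumption about what such a helper could be, not code from the thread; the `na.rm` argument and the example data frame `dat` are made up to mirror the "cell with only NAs" situation in the question.

```r
# Hypothetical helper: behaves like sum(), but returns NA when no values
# remain to be summed (after optionally dropping NAs).
sum0 <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  if (length(x)) sum(x) else NA_real_
}

# Made-up data: group "b" has only missing responses.
dat <- data.frame(
  group = c("a", "a", "b", "b"),
  resp  = c(1, NA, NA, NA)
)

# Built-in sum(): the all-NA cell is reported as 0.
by_sum  <- tapply(dat$resp, dat$group, sum,  na.rm = TRUE)

# sum0(): the all-NA cell is reported as NA instead.
by_sum0 <- tapply(dat$resp, dat$group, sum0, na.rm = TRUE)
```

The price of this choice is that `sum0` no longer satisfies the identity over index vectors when one of them is empty, which is the consistency property the built-in `sum` is designed to keep.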