(a) In R 2.12.2 rowsum can overflow if given an integer input:

  > rowsum(c(2e9L, 2e9L), c("a", "a"))
          [,1]
  a -294967296
  > 2^32 + .Last.value
     [,1]
  a 4e+09

Should it be changed to coerce its x argument to numeric
(double precision) so it always returns a numeric output?

(b) When rowsum is given an x containing both NaN and NA it
appears to use the last of the NaN/NA entries to determine
if the output is NaN or NA while the `+` function uses the
first:

  > z <- cbind( c(NA,NA), c(NA,NaN), c(NaN,NA), c(NaN,NaN))
  > rowsum(z, c("a","a"))
    [,1] [,2] [,3] [,4]
  a   NA  NaN   NA  NaN
  > z[1,,drop=FALSE] + z[2,,drop=FALSE]
       [,1] [,2] [,3] [,4]
  [1,]   NA   NA  NaN  NaN

(The name rowsum is a metabug, since it may be confused
with the entirely different rowSums, but it has been around
for a long time.)

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

On Fri, 25 Mar 2011, William Dunlap wrote:

> (a) In R 2.12.2 rowsum can overflow if given an integer input:
>   > rowsum(c(2e9L, 2e9L), c("a", "a"))
>           [,1]
>   a -294967296
>   > 2^32 + .Last.value
>      [,1]
>   a 4e+09
> Should it be changed to coerce its x argument to numeric
> (double precision) so it always returns a numeric output?

No, I don't think so. But it should return NA on overflow (as sum()
does), and I've altered pre-2.13.0 to do so.

> (b) When rowsum is given an x containing both NaN and NA it
> appears to use the last of the NaN/NA entries to determine
> if the output is NaN or NA while the `+` function uses the
> first:
>   > z <- cbind( c(NA,NA), c(NA,NaN), c(NaN,NA), c(NaN,NaN))
>   > rowsum(z, c("a","a"))
>     [,1] [,2] [,3] [,4]
>   a   NA  NaN   NA  NaN
>   > z[1,,drop=FALSE] + z[2,,drop=FALSE]
>        [,1] [,2] [,3] [,4]
>   [1,]   NA   NA  NaN  NaN

Which is not a bug: R does not claim to be consistent about this
(except for a few documented functions), and there are lots of
instances of this.

> (The name rowsum is a metabug, since it may be confused
> with the entirely different rowSums, but it has been around
> for a long time.)

A lot longer than rowSums ...

> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

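For anyone who does want a double-precision total regardless of the input type, the coercion discussed above can be done by the caller instead of inside rowsum. A minimal sketch using only base R; the names x, g, res and s are illustrative and do not come from the thread:

```r
# Integer input whose group total exceeds .Machine$integer.max (2147483647).
x <- c(2000000000L, 2000000000L)
g <- c("a", "a")

# Coerce to double before summing, so the group total is held in double precision.
res <- rowsum(as.numeric(x), g)

# For comparison: sum() on the raw integers reports overflow as NA (with a warning),
# the behaviour the reply above says rowsum should share.
s <- suppressWarnings(sum(x))
```

The trade-off is that the result is then always of type double, even when the integer sums would have fit.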
Hi,

On 03/29/2011 01:24 AM, Prof Brian Ripley wrote:
> On Fri, 25 Mar 2011, William Dunlap wrote:
[...]
>
>> (The name rowsum is a metabug, since it may be confused
>> with the entirely different rowSums, but it has been around
>> for a long time.)
>
> A lot longer than rowSums ...

Another problem with the current naming is the inconsistent use
of the row/col prefixes, IMO:

  > x <- matrix(runif(100), ncol=5)
  > rowsum(x, rep(1, 20))
        [,1]     [,2]    [,3]     [,4]     [,5]
  1 11.13374 10.50038 10.0258 11.04087 8.150401
  > colSums(x)
  [1] 11.133738 10.500381 10.025805 11.040867  8.150401

and the fact that the See Also section points to rowSums and not
colSums, which adds to the confusion...

Cheers,
H.

>
>> Bill Dunlap
>> Spotfire, TIBCO Software
>> wdunlap tibco.com
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319