Full_Name: Jens Oehlschlägel
Version: 2.10.1
OS: Windows XP
Submission from: (NULL) (156.109.18.2)

# fine as expected from help page:
# "from+by, ..., up to the sequence value less than or equal to to"
# thus 1+10=11 is not in
> seq.int(1L, 10L, by=10L)
[1] 1

# of course 1+1e7 should also not be in
# but is: wrong
> seq.int(1L, 1e7L, by=1e7L)
[1] 1e+00 1e+07

# since we use seq.int AND are within integer range, rounding should not be an issue

> version
               _
platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status
major          2
minor          10.1
year           2009
month          12
day            14
svn rev        50720
language       R
version.string R version 2.10.1 (2009-12-14)

On Mon, Dec 28, 2009 at 08:50:13PM +0100, oehl_list at gmx.de wrote:
[...]
> # fine as expected from help page:
> # "from+by, ..., up to the sequence value less than or equal to to"
> # thus 1+10=11 is not in
> > seq.int(1L, 10L, by=10L)
> [1] 1
>
> # of course 1+1e7 should also not be in
> # but is: wrong
> > seq.int(1L, 1e7L, by=1e7L)
> [1] 1e+00 1e+07
>
> # since we use seq.int AND are within integer range, rounding should not be an issue

In my opinion, this is a documented behavior. The Details section of the help
page says

  Note that the computed final value can go just beyond 'to' to
  allow for rounding error, but (as from R 2.9.0) is truncated to
  'to'.

Since "by" is 1e7, going by 1 more than 'to' is "just beyond 'to'".

What can be a bit misleading is the following difference between the type
of seq() and seq.int(), which is only partially documented.

  x <- seq.int(from=1L, to=10000000L, by=10000000L); typeof(x); x
  # [1] "double"
  # [1] 1e+00 1e+07

  x <- seq(from=1L, to=10000000L, by=10000000L); typeof(x); x
  # [1] "integer"
  # [1]        1 10000000

The Value section of the help page says:

  Currently, 'seq.int' and the default method of 'seq' return a
  result of type '"integer"' (if it is representable as that and) if
  'from' is (numerically equal to an) integer and, e.g., only 'to'
  is specified, or also if only 'length' or only 'along.with' is
  specified.  *Note:* this may change in the future and programmers
  should not rely on it.

This suggests that we should get "double" in both cases, since all three
arguments "from", "to", and "by" are specified. I do not know whether
having an "integer" result in seq() in the above case is intended or not.

Petr Savicky.
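
Since the Value section quoted above warns that the result type may change, code that needs an integer vector can coerce explicitly instead of relying on what seq() or seq.int() happen to return. The following is only a minimal sketch, assuming the sequence values are whole numbers within integer range; the variable names are illustrative and do not come from the thread.

```r
# Build a small sequence. The storage type of the result is not guaranteed
# by the help page, so the code below does not assume it.
x <- seq(from = 1L, to = 10L, by = 3L)

# Coerce explicitly when an integer vector is required downstream.
x_int <- as.integer(x)

# Check the assumptions instead of trusting the printed output:
# integer storage, no value changed by the coercion, nothing beyond 'to'.
stopifnot(is.integer(x_int), all(x_int == x), max(x_int) <= 10L)
```

The explicit checks cost little and make the code independent of which of the two types a given R version returns.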

Petr,

Aside from the fact that the argument that something bad is good because it
is documented is strongly overused, I think this does NOT behave as
documented, because

a) the behaviour cannot be explained by rounding error in double precision.
b) 1e7 is not even outside the range of integer calculation.

Up to the limit of a), or at least up to b), any expression of the type

  seq(a, b, by=b)

should return only a, but not b.

Also, something like seq.int should ONLY use and return integer, for
performance reasons, but even more so for reliability: the reported
behaviour is not just a little bit wrong. Since seq is used for looping in
R, the looping of the language is broken. This can have severe consequences,
such as accessing beyond the limits of an array. If C code is involved, this
can crash R. In the worst case, algorithms can silently produce wrong
results.

Being an admirer of R since its early days, I was shocked to see this, and
as a consequence, I suggest we do our homework and suspend -- for a year or
two -- any claims that R can be used in production the way SAS is.

Yours regretfully

Jens Oehlschlägel
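
The expectation stated above, that a sequence never contains a value beyond 'to', can be enforced defensively by computing the sequence with integer arithmetic only. The following is a minimal sketch, not the implementation used by R: seq_int_strict is a hypothetical helper (not a base R function), and it assumes positive 'by', from <= to, and that to - from fits in the integer range.

```r
# Hypothetical helper, not part of base R: returns from, from + by, ...
# using integer arithmetic only, so no rounding tolerance can admit a
# value greater than 'to'.
seq_int_strict <- function(from, to, by) {
  stopifnot(is.integer(from), is.integer(to), is.integer(by),
            length(from) == 1L, length(to) == 1L, length(by) == 1L,
            by > 0L, from <= to)
  span <- to - from              # NA (with a warning) on integer overflow
  stopifnot(!is.na(span))
  n <- span %/% by               # number of whole steps that fit below 'to'
  from + by * (0L:n)             # integer vector; last value is <= to
}

# The case from the report: only the start value should be returned.
x <- seq_int_strict(1L, 10000000L, 10000000L)

# A case where 'to' is hit exactly and therefore is included.
y <- seq_int_strict(1L, 10L, 3L)

# Guard a loop index against running past the end of a vector.
v <- c(10, 20, 30)
for (i in seq_int_strict(1L, length(v), 2L)) {
  stopifnot(i <= length(v))
}
```

Because the step count comes from integer division, by * n can never exceed to - from, so the last element cannot overshoot 'to'.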