On 1/4/17 1:26 AM, Martin Maechler wrote:
>>>>>> Mick Jordan <mick.jordan at oracle.com>
>>>>>>     on Tue, 3 Jan 2017 07:57:15 -0800 writes:
>
> > This is a message for someone familiar with the implementation.
> > Superficially the R code for seq.default and the C code for seq.int
> > appear to be semantically very similar. My question is whether, in fact,
> > it is intended that they behave identically for all inputs.
>
> Strictly speaking, "no": As usual, RT?Manual (;-)
>
> The help page says in the very first paragraph ('Description'):
>
>   'seq' is a standard generic with a default method.
>   'seq.int' is a primitive which can be much faster but has a few restrictions.
>
> > I have found two cases so far where they differ, first
> > that seq.int will coerce a character string to a real (via
> > Rf_asReal) whereas seq.default appears to coerce it to NA
> > and then throws an error:
>
> > > seq.default("2", "5")
> > Error in seq.default("2", "5") : 'from' cannot be NA, NaN or infinite
> > > seq.int("2", "5")
> > [1] 2 3 4 5
>
> this may be a bit surprising (if one does _not_ look at the code),
> indeed, notably because seq.int() is mentioned to have more
> restrictions than seq() which here calls seq.default().
> "Surprising" also when considering
>
>   > "2":"5"
>   [1] 2 3 4 5
>
> and the documentation of ':' claims 'from:to' to be the same as
> seq(from, to) apart from the case of factors.
>
> --- I am considering a small change in seq.default()
> which would make it work for this case, compatibly with ":" and seq.int().
>
> > and second, that the error messages for non-numeric arguments differ:
>
> which I find fine... if the functions were meant to be
> identical, we (the R developers) would be silly to have both,
> notably as the ".int" suffix has emerged as confusing the
> majority of useRs (who don't read help pages).
>
> Rather it has been meant as saying "internal" (including "fast") also for other
> such R functions, but the suffix of course is a potential clash
> with S3 method naming schemes _and_ the fact that 'int' is used
> as type name for integer in other languages, notably C.
>
> > > seq.default(to=quote(b), by=2)
> > Error in is.finite(to) : default method not implemented for type 'symbol'
>
> which I find a very appropriate and helpful message
>
> > > seq.int(to=quote(b), by=2)
> > Error in seq.int(to = quote(b), by = 2) :
> >   'to' cannot be NA, NaN or infinite
>
> which is true, as well, and there's no "default method" to be
> mentioned, but you are right that it would be nicer if the
> message mentioned 'symbol' as well.

Thanks for the clarifications. It was surprising that seq.int supported
more types than seq.default. I was expecting the reverse.

BTW, there are a couple of, admittedly odd, cases, exposed by brute
force testing, where seq.int will actually return "missing", which I
presume is not intended, and seq.default behaves differently, viz.:

    > seq.default(to=1,by=2)
    [1] 1
    > seq.int(to=1,by=2)
    >
    > x <- seq.int(to=1,by=2)
    > x
    Error: argument "x" is missing, with no default

Lines 792 and 799 of seq.c return the incoming argument (as opposed to a
value based on its coercion to double via asReal) and this can, as in
the above example, be "missing".

> > Please reply off list.
>
> [which I understand as that we should CC you (which of course is
> netiquette to do)]

Yes

Thanks
Mick Jordan
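The side-by-side comparisons in this message can be reproduced with a small helper. This is only a minimal sketch: `compare_seq` is a hypothetical name, not part of R, and what it reports depends on the R version in use, since the thread discusses behaviour that was being changed in R-devel.

```r
# Hypothetical helper (not part of R): call seq.default() and seq.int()
# with the same arguments and record either the value or the error message.
compare_seq <- function(...) {
  run <- function(f) {
    tryCatch(
      list(ok = TRUE, value = f(...)),
      error = function(e) list(ok = FALSE, value = conditionMessage(e))
    )
  }
  list(seq.default = run(seq.default), seq.int = run(seq.int))
}

# Character arguments, as in the first example of the thread.
str(compare_seq("2", "5"))

# A symbol as 'to', as in the error-message example.
str(compare_seq(to = quote(b), by = 2))
```

Capturing the condition message rather than letting the error propagate makes it possible to run many argument combinations in a loop, which is roughly what brute-force testing of the two functions needs.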
On 1/4/17 8:15 AM, Mick Jordan wrote:

Here is another difference that I am guessing is unintended.

    > y <- seq.int(1L, 3L, length.out=2)
    > typeof(y)
    [1] "double"
    > x <- seq.default(1L, 3L, length.out=2)
    > typeof(x)
    [1] "integer"

The if (by == R_MissingArg) branch at line 842 doesn't contain a check
for "all INTSXP" unlike the if (to == R_MissingArg) branch.

Mick
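A quick way to see whether the two functions still disagree on the storage type is to compare `typeof()` directly. This is a sketch only; the types printed depend on the R version, so no particular output is claimed here.

```r
# Same call through both functions; only the storage type is in question.
y <- seq.int(1L, 3L, length.out = 2)
x <- seq.default(1L, 3L, length.out = 2)

# Report the storage type returned by each function.
print(c(seq.int = typeof(y), seq.default = typeof(x)))

# identical() is FALSE whenever the storage types differ,
# while all.equal() compares the numeric values with a tolerance.
print(identical(x, y))
print(all.equal(x, y))
```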
>>>>> Mick Jordan <mick.jordan at oracle.com>
>>>>>     on Wed, 4 Jan 2017 08:15:03 -0800 writes:

    > Thanks for the clarifications. It was surprising that seq.int supported
    > more types than seq.default. I was expecting the reverse.

exactly, me too!

    > BTW, there are a couple of, admittedly odd, cases, exposed by brute
    > force testing, where seq.int will actually return "missing", which I
    > presume is not intended, and seq.default behaves differently, viz.:

    >> seq.default(to=1,by=2)
    > [1] 1
    >> seq.int(to=1,by=2)
    >>
    >> x <- seq.int(to=1,by=2)
    >> x
    > Error: argument "x" is missing, with no default

    > Lines 792 and 799 of seq.c return the incoming argument (as opposed to a
    > value based on its coercion to double via asReal) and this can, as in
    > the above example, be "missing".

    > Thanks
    > Mick Jordan

Thanks a lot, Mick -- you are right!

I'm fixing these (the line numbers have greatly changed in the meantime:
remember we work with "R-devel", i.e., the "trunk"), always available at

    https://svn.r-project.org/R/trunk/src/main/seq.c

Martin Maechler
ETH Zurich
>>>>> Mick Jordan <mick.jordan at oracle.com>
>>>>>     on Wed, 4 Jan 2017 12:49:41 -0800 writes:

    > On 1/4/17 8:15 AM, Mick Jordan wrote:
    > Here is another difference that I am guessing is unintended.

    >> y <- seq.int(1L, 3L, length.out=2)
    >> typeof(y)
    > [1] "double"
    >> x <- seq.default(1L, 3L, length.out=2)
    >> typeof(x)
    > [1] "integer"

    > The if (by == R_MissingArg) branch at line 842 doesn't contain a check
    > for "all INTSXP" unlike the if (to == R_MissingArg) branch.

    > Mick

I'll look at this case, too, thank you once more!
>>>>> Martin Maechler <maechler at stat.math.ethz.ch>
>>>>>     on Thu, 5 Jan 2017 12:39:29 +0100 writes:

    > Thanks a lot, Mick -- you are right!

    > I'm fixing these (the line numbers have greatly changed in
    > the mean time: Remember we work with "R-devel", i.e., the
    > "trunk" : always available at
    > https://svn.r-project.org/R/trunk/src/main/seq.c

This has happened in the mean time (and more is planned for another
error message).

And there's this --- where I'm pretty sure we want an error for
seq.default() as well :

    > seq.int(1,7, by=1:2)
    Error: 'by' must be of length 1
    > seq(1,7, by=1:2)
    [1] 1 3 3 7 5 7 7
    Warning messages:
    1: In if (n < 0L) stop("wrong sign in 'by' argument") :
      the condition has length > 1 and only the first element will be used
    2: In if (n > .Machine$integer.max) stop("'by' argument is much too small") :
      the condition has length > 1 and only the first element will be used
    3: In 0L:n : numerical expression has 2 elements: only the first used
    4: In (0L:n) * by :
      longer object length is not a multiple of shorter object length
    5: In if (by > 0) pmin(x, to) else pmax(x, to) :
      the condition has length > 1 and only the first element will be used
    >
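The kind of up-front check being asked for here can be sketched at user level. This is only an illustration under stated assumptions: `seq_strict` is a hypothetical wrapper, not the change made to seq.default() in R itself, and it merely borrows the error message that seq.int() gives above.

```r
# Hypothetical wrapper (not part of R): reject a 'by' of length other than 1
# before delegating to seq.default(), reusing the wording of seq.int()'s error.
seq_strict <- function(from, to, by) {
  if (length(by) != 1L)
    stop("'by' must be of length 1")
  seq.default(from, to, by = by)
}

# A scalar 'by' is passed straight through to seq.default().
print(seq_strict(1, 7, by = 2))

# A vector 'by' now fails immediately instead of producing a run of warnings.
res <- try(seq_strict(1, 7, by = 1:2), silent = TRUE)
print(inherits(res, "try-error"))
```

Failing early on the argument length avoids the recycled result and the five unrelated-looking warnings shown above, which only hint at the real problem.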