bill at insightful.com
2007-Jul-26 17:45 UTC
[Rd] sequence(c(2, 0, 3)) produces surprising results, would like output length to be sum(input) (PR#9811)
Full_Name: Bill Dunlap
Version: 2.5.0
OS: Linux
Submission from: (NULL) (70.98.76.47)

sequence(nvec) is documented to return the concatenation of seq(nvec[i]), for i in seq(along=nvec). This produces inconvenient (for me) results for 0 inputs.

   > sequence(c(2,0,3)) # would like 1 2 1 2 3, ignore 0
   [1] 1 2 1 0 1 2 3

Would changing sequence(nvec) to use seq_len(nvec[i]) instead of the current 1:nvec[i] break much existing code?

On the other hand, almost no one seems to use sequence() and it might make more sense to allow seq_len() and seq() to accept a vector for length.out and they would return a vector of length sum(length.out),

   c(seq_len(length.out[1]), seq_len(length.out[2]), ...)
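The stray 0 in the output comes from the difference between 1:n and seq_len(n): in R, 1:0 counts down and gives 1 0, while seq_len(0) is empty. A minimal sketch of the two behaviours in plain C follows; the function names colon_from_one and seq_len_one are hypothetical and are not part of R.

```c
#include <assert.h>
#include <stddef.h>

/* Mimics R's 1:n for an integer n (hypothetical helper, not R's code):
 * counts up from 1 when n >= 1 and DOWN from 1 when n < 1,
 * so n == 0 produces the two values 1, 0.
 * The caller must provide room in out; returns the number of values written. */
size_t colon_from_one(int n, int *out)
{
    assert(out != NULL);
    size_t k = 0;
    if (n >= 1) {
        for (int v = 1; v <= n; v++) out[k++] = v;
    } else {
        for (int v = 1; v >= n; v--) out[k++] = v;
    }
    return k;
}

/* Mimics seq_len(n) for n >= 0 (hypothetical helper):
 * writes exactly n values 1..n, and nothing at all when n == 0. */
size_t seq_len_one(int n, int *out)
{
    assert(out != NULL && n >= 0);
    size_t k = 0;
    for (int v = 1; v <= n; v++) out[k++] = v;
    return k;
}
```

Concatenating colon_from_one over the lengths 2, 0, 3 reproduces 1 2 1 0 1 2 3, while concatenating seq_len_one over the same lengths gives 1 2 1 2 3.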
Bill Dunlap
2007-Jul-26 18:37 UTC
[Rd] sequence(c(2, 0, 3)) produces surprising results, would like output length to be sum(input) (PR#9811)
On Thu, 26 Jul 2007 bill at insightful.com wrote:

> Full_Name: Bill Dunlap
> Version: 2.5.0
> OS: Linux
> Submission from: (NULL) (70.98.76.47)
>
> sequence(nvec) is documented to return
> the concatenation of seq(nvec[i]), for
> i in seq(along=nvec). This produces inconvenient
> (for me) results for 0 inputs.
>    > sequence(c(2,0,3)) # would like 1 2 1 2 3, ignore 0
>    [1] 1 2 1 0 1 2 3
> Would changing sequence(nvec) to use seq_len(nvec[i])
> instead of the current 1:nvec[i] break much existing code?
>
> On the other hand, almost no one seems to use sequence()
> and it might make more sense to allow seq_len() and seq()
> to accept a vector for length.out and they would return a
> vector of length sum(length.out),
>    c(seq_len(length.out[1]), seq_len(length.out[2]), ...)

seq_len() could be changed to do that with the following code change. It does slow down seq_len in the scalar case:

                                  old time   new time
   for(i in 1:1e6)seq_len(2)         1.251      1.516
   for(i in 1:1e6)seq_len(20)        1.690      1.990
   for(i in 1:1e6)seq_len(200)       5.480      5.860

It becomes much faster than sequence in the vectorized case.

   > unix.time(for(i in 1:1e4)sequence(20:1))
      user  system elapsed
     1.550   0.000   1.557
   > unix.time(for(i in 1:1e4)seq_len(20:1))
      user  system elapsed
     0.070   0.000   0.066
   > identical(sequence(20:1), seq_len(20:1))
   [1] TRUE

My problem cases are where the length.out vector is long and contains small integers (e.g., the output of table on a vector of mostly unique values).
Index: src/main/seq.c
===================================================================
--- src/main/seq.c (revision 42329)
+++ src/main/seq.c (working copy)
@@ -594,16 +594,31 @@
 
 SEXP attribute_hidden do_seq_len(SEXP call, SEXP op, SEXP args, SEXP rho)
 {
-    SEXP ans;
-    int i, len, *p;
+    SEXP ans, slengths;
+    int i, *p, anslen, *lens, nlens, ilen, nprotected=0 ;
 
     checkArity(op, args);
-    len = asInteger(CAR(args));
-    if(len == NA_INTEGER || len < 0)
-        errorcall(call, _("argument must be non-negative"));
-    ans = allocVector(INTSXP, len);
+    slengths = CAR(args);
+    if (TYPEOF(slengths) != INTSXP) {
+        PROTECT(slengths = coerceVector(CAR(args), INTSXP));
+        nprotected++;
+    }
+    lens = INTEGER(slengths);
+    nlens = LENGTH(slengths);
+    anslen = 0 ;
+    for(ilen=0;ilen<nlens;ilen++) {
+        int len = lens[ilen] ;
+        if(len == NA_INTEGER || len < 0)
+            errorcall(call, _("argument must be non-negative"));
+        anslen += len ;
+    }
+    ans = allocVector(INTSXP, anslen);
     p = INTEGER(ans);
-    for(i = 0; i < len; i++) p[i] = i+1;
-
+    for(ilen=0;ilen<nlens;ilen++) {
+        int len = lens[ilen] ;
+        for(i = 0; i < len; i++) *p++ = i+1;
+    }
+    if(nprotected>0)
+        UNPROTECT(nprotected);
     return ans;
 }
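The patch makes two passes over the lengths: the first validates each length and sums them to size the result, the second writes 1..len for each length through a moving pointer. A rough standalone sketch of that idea follows, assuming plain C arrays in place of R's SEXP API. The name seq_len_vec, its signature, the NULL-on-error convention, and the overflow guard are illustrative assumptions and are not part of the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical standalone analogue of the patched do_seq_len.
 * Returns a malloc'd array holding 1..lens[0], 1..lens[1], ... concatenated,
 * and stores its length in *anslen. Returns NULL on a negative length,
 * on overflow of the total, or on allocation failure.
 * The caller frees the result. */
int *seq_len_vec(const int *lens, size_t nlens, size_t *anslen)
{
    assert(lens != NULL && anslen != NULL);

    /* Pass 1: validate every length and accumulate the output size. */
    size_t total = 0;
    for (size_t i = 0; i < nlens; i++) {
        if (lens[i] < 0)
            return NULL;               /* also rejects INT_MIN, R's NA_INTEGER */
        if (total > (size_t)INT_MAX - (size_t)lens[i])
            return NULL;               /* overflow guard: an addition, not in the patch */
        total += (size_t)lens[i];
    }

    /* Allocate at least one element so an empty result is not confused with failure. */
    int *ans = malloc((total ? total : 1) * sizeof *ans);
    if (ans == NULL)
        return NULL;

    /* Pass 2: fill each run 1..lens[i]; zero lengths contribute nothing. */
    int *p = ans;
    for (size_t i = 0; i < nlens; i++)
        for (int v = 1; v <= lens[i]; v++)
            *p++ = v;

    *anslen = total;
    return ans;
}
```

Called with the lengths 2, 0, 3 this yields 1 2 1 2 3 with *anslen set to 5, which is the output requested in the original report.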