thr3ads.net - R help - [R] Semantics of sequences in R [Feb 2009]

Stavros Macrakis

2009-Feb-22 20:42 UTC

[R] Semantics of sequences in R

Inspired by the exchange between Rolf Turner and Wacek Kusnierczyk, I
thought I'd clear up for myself the exact relationship among the
various sequence concepts in R, including not only generic vectors
(lists) and atomic vectors, but also pairlists, factor sequences,
date/time sequences, and difftime sequences.

I tabulated type of sequence vs. property to see if I could make sense
of all this.  The properties I looked at were the predicates
is.{vector,list,pairlist}; whether various sequence operations (c,
rev, unique, sort, rle) can be used on objects of the various types,
and if relevant, whether they preserve the type of the input; and what
the length of class( as.XXX (1:2) ) is.

Here are the results (code to reproduce at end of email):

             numer list  plist fact  POSIXct difft
is.vector    TRUE  TRUE  FALSE FALSE FALSE   FALSE
is.list      FALSE TRUE  TRUE  FALSE FALSE   FALSE
is.pairlist  FALSE FALSE TRUE  FALSE FALSE   FALSE
c_keep?      TRUE  TRUE  FALSE FALSE TRUE    FALSE
rev_keep?    TRUE  TRUE  FALSE TRUE  TRUE    TRUE
unique_keep? TRUE  TRUE  "Err" TRUE  TRUE    FALSE
sort_keep?   TRUE  "Err" "Err" TRUE  TRUE    TRUE
rle_len      2     "Err" "Err" "Err" "Err"   "Err"

Alas, this tabulation, rather than clarifying things for me, just
confused me more -- the diverse treatment of sequences by various
operations is all rather bewildering.

Wouldn't it be easier to teach, learn, and use R if there were more
consistency in the treatment of sequences?  I understand that in
long-running projects like S/R, there is an accumulation of
contributions by a variety of authors, but perhaps the time has come
for some cleanup at least for the base library?

             -s


# generic outer: for generic vectors and non-vectorized functions
gouter <-
  function(x,y,f,...)
  matrix( mapply( f,
                  rep(x,length(y)),
                  rep(y,each = length(x)),
                  SIMPLIFY = FALSE ), # don't coerce booleans to numerics
          length(x), length(y),
          dimnames = list( names(x), names(y) ) )

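# if arg evaluation gives error, return "Err", else its value
# (inherits() is used rather than class(...) == "try-error" so that the
# test stays a single TRUE/FALSE even when the value has several classes)
if_err <-
  function(expr)
    { if (inherits(try(expr, silent = TRUE), "try-error")) "Err"
      else expr }
# {} needed so else will parse properly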

# does f(x) have the same class as x?
keep_class <-
  function(f)
    function(x)
      if_err( all(class(x) == class(f(x))))

seqtest <- function(seq)
  {
    lseq <- length(seq)
    gouter(
       list(
            is.vector = is.vector,
            is.list = is.list,
            is.pairlist = is.pairlist,
            `c_keep?` = keep_class(c),
            `rev_keep?` = keep_class(rev) ,
            `unique_keep?` = keep_class(unique),
## Beware: unique prints an error message for bad args
## even within try(...,silent=TRUE)
            `sort_keep?` = keep_class(sort),
            rle_len = function(a) if_err(length(rle(a)$length))
            ),
       list(
            numer = as.numeric(seq),
            list = as.list(seq),
            plist = as.pairlist(seq),
            fact = as.factor(seq),
            POSIXct = as.POSIXct(seq,origin = '1970-1-1'),
            difft = as.difftime(seq,units = 'secs')
            ),
       function(f,a)f(a)
       )
  }

print(seqtest(1:2))
# This starts by printing [[1]] [1]...
# because of the bug in unique mentioned above

Duncan Murdoch

2009-Feb-22 21:12 UTC

[R] Semantics of sequences in R

I think this was posted to the wrong list, so my followup is going to 
R-devel.

On 22/02/2009 3:42 PM, Stavros Macrakis wrote:
> Inspired by the exchange between Rolf Turner and Wacek Kusnierczyk, I
> thought I'd clear up for myself the exact relationship among the
> various sequence concepts in R, including not only generic vectors
> (lists) and atomic vectors, but also pairlists, factor sequences,
> date/time sequences, and difftime sequences.
> 
> I tabulated type of sequence vs. property to see if I could make sense
> of all this.  The properties I looked at were the predicates
> is.{vector,list,pairlist}; whether various sequence operations (c,
> rev, unique, sort, rle) can be used on objects of the various types,
> and if relevant, whether they preserve the type of the input; and what
> the length of class( as.XXX (1:2) ) is.
> 
> Here are the results (code to reproduce at end of email):
> 
>              numer list  plist fact  POSIXct difft
> is.vector    TRUE  TRUE  FALSE FALSE FALSE   FALSE
> is.list      FALSE TRUE  TRUE  FALSE FALSE   FALSE
> is.pairlist  FALSE FALSE TRUE  FALSE FALSE   FALSE
> c_keep?      TRUE  TRUE  FALSE FALSE TRUE    FALSE
> rev_keep?    TRUE  TRUE  FALSE TRUE  TRUE    TRUE
> unique_keep? TRUE  TRUE  "Err" TRUE  TRUE    FALSE
> sort_keep?   TRUE  "Err" "Err" TRUE  TRUE    TRUE
> rle_len      2     "Err" "Err" "Err" "Err"   "Err"
> 
> Alas, this tabulation, rather than clarifying things for me, just
> confused me more -- the diverse treatment of sequences by various
> operations is all rather bewildering.
But you are asking lots of different questions, so of course you should 
get different answers.  For example, the first three rows are behaving 
exactly as documented.  (Perhaps the functions should have been designed 
differently, but a pretty-looking matrix isn't an argument for that. 
Give some examples of how the documented behaviour is causing problems.)

I think some of the operations in the later rows are undocumented 
(generally pairlists tend not to be documented, even if in some cases 
they are supported), and it might make sense to make them more 
consistent in the undocumented cases.  But it may make more sense to 
completely hide pairlists, for instance, and then several more of the 
examples are behaving as documented.  (BTW, your description of your 
last row doesn't match what you did, as far as I can see.)
> Wouldn't it be easier to teach, learn, and use R if there were more
> consistency in the treatment of sequences?  
Which ones in particular should change?  What should they change to? 
What will break when you do that?

> I understand that in
> long-running projects like S/R, there is an accumulation of
> contributions by a variety of authors, but perhaps the time has come
> for some cleanup at least for the base library?
Generally R core members are reluctant to take on work just because 
someone else thinks it would be nice if they did.  If you want to do 
this, that's one thing, but if you are just saying that it would be nice 
if someone else did it, then it's much less likely to get done.  To get 
someone else to do it you need to convince them that it's a valuable use 
of their time, and I don't see that yet.

Duncan Murdoch

Stavros Macrakis

2009-Feb-22 21:50 UTC

[Rd] [R] Semantics of sequences in R

On Sun, Feb 22, 2009 at 4:12 PM, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
> I think this was posted to the wrong list, so my followup is going to
> R-devel.
OK.
> On 22/02/2009 3:42 PM, Stavros Macrakis wrote:
>>
>> Inspired by the exchange between Rolf Turner and Wacek Kusnierczyk, I
>> thought I'd clear up for myself the exact relationship among the
>> various sequence concepts in R, including not only generic vectors
>> (lists) and atomic vectors, but also pairlists, factor sequences,
>> date/time sequences, and difftime sequences.
>>
>> I tabulated type of sequence vs. property to see if I could make sense
>> of all this.  The properties I looked at were the predicates
>> is.{vector,list,pairlist}; whether various sequence operations (c,
>> rev, unique, sort, rle) can be used on objects of the various types,
>> and if relevant, whether they preserve the type of the input; and what
>> the length of class( as.XXX (1:2) ) is.
>>
>> Here are the results (code to reproduce at end of email):
>>
>>              numer list  plist fact  POSIXct difft
>> is.vector    TRUE  TRUE  FALSE FALSE FALSE   FALSE
>> is.list      FALSE TRUE  TRUE  FALSE FALSE   FALSE
>> is.pairlist  FALSE FALSE TRUE  FALSE FALSE   FALSE
>> c_keep?      TRUE  TRUE  FALSE FALSE TRUE    FALSE
>> rev_keep?    TRUE  TRUE  FALSE TRUE  TRUE    TRUE
>> unique_keep? TRUE  TRUE  "Err" TRUE  TRUE    FALSE
>> sort_keep?   TRUE  "Err" "Err" TRUE  TRUE    TRUE
>> rle_len      2     "Err" "Err" "Err" "Err"   "Err"
>>
>> Alas, this tabulation, rather than clarifying things for me, just
>> confused me more -- the diverse treatment of sequences by various
>> operations is all rather bewildering.
>
> But you are asking lots of different questions, so of course you should get
> different answers.  For example, the first three rows are behaving exactly
> as documented.
Yes, I wasn't questioning that.  This started out as an exploration of
Rolf's claim that "vectors can be considered to be lists", which I
think the table shows pretty clearly not to be true.  He did qualify
the claim with "At a certain level.", but I don't know what that
level is....
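
A minimal sketch of the distinction the first two rows of the table draw, using only base R predicates. The variable names are illustrative and are not taken from the original code.

```r
x <- c(a = 1, b = 2)      # atomic (numeric) vector
l <- list(a = 1, b = 2)   # generic vector (list)

x_is_vector <- is.vector(x)   # TRUE
x_is_list   <- is.list(x)     # FALSE: an atomic vector is not a list
l_is_vector <- is.vector(l)   # TRUE:  a list is a vector
l_is_list   <- is.list(l)     # TRUE

# Element extraction with [[ ]] works the same way on both,
# which may be the "certain level" at which they resemble each other.
same_element <- identical(x[["a"]], l[["a"]])   # TRUE
```

So every list is a vector in the sense of is.vector, but the converse does not hold, at least not as far as is.list is concerned.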
> (Perhaps the functions should have been designed
> differently, but a pretty-looking matrix isn't an argument for that.
> Give some examples of how the documented behaviour is causing problems.)
From my own experience, and the experience of colleagues who have
tried to learn R, I can tell you that these idiosyncrasies make
learning the system more difficult.  A "pretty-looking matrix" is a
reflection of an orthogonal design, which is generally considered to
be a good thing. Many of the missing operations are perfectly
meaningful and useful.
> ...But it may make more sense to completely hide pairlists,
I agree that the pairlist cases are the least interesting.
> (BTW, your description of your last row doesn't match what you did, as
> far as I can see.)
Yes, sorry, older draft....
>> Wouldn't it be easier to teach, learn, and use R if there were more
>> consistency in the treatment of sequences?
>
> Which ones in particular should change?  What should they change to? What
> will break when you do that?
In many cases, the orthogonal design is pretty straightforward.  And
in the cases where the operation is currently an error (e.g.
sort(list(...))), I'd hope that wouldn't break existing code. There
are certainly cases which would be hard to change without breaking
existing code....
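
As a sketch of the sort(list(...)) case: sort signals an error for a plain list, and one possible workaround orders the list by its unlisted values. The workaround assumes every element is a single number; it is an illustration, not a proposal for how sort itself should behave.

```r
l <- list(3, 1, 2)

# sort() on a plain list signals an error; capture its message instead of stopping
res <- tryCatch(sort(l), error = function(e) conditionMessage(e))

# Workaround, assuming each element is a length-one number:
# order the flattened values and index the list, which keeps it a list
sorted <- l[order(unlist(l))]
```

Since the direct call is currently an error, code that relies on it cannot already exist, which is why giving it a meaning seems unlikely to break anything.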
> Generally R core members are reluctant to take on work just because someone
> else thinks it would be nice if they did.
I understand this principle quite well, having been a contributor to
other similar projects.  I was simply starting the discussion.  After
all, if the core group disagrees that the functions should be made
more orthogonal, it is a waste of my time to submit code.
>  If you want to do this, that's one thing,
I have already suggested code changes in some (pretty trivial) cases
-- see r-help Feb 6, 2009 6:17 PM "Operations on difftime (abs, /, c)"
-- but perhaps r-help was the wrong place to send them.  I will
forward to r-devel.  And I will be happy to work on some of the
consistency issues I've mentioned here.

             -s
