Stavros Macrakis
2010-Jan-22 01:19 UTC
[Rd] Inconsistency in as.data.frame.table for stringsAsFactors
I noticed that in as.data.frame.table, the stringsAsFactors argument defaults to TRUE, whereas in the other as.data.frame methods it defaults to default.stringsAsFactors().

The documentation and implementation agree on this, so this is not a bug. However, I was wondering whether this disparity was intended or whether it might be an unintentional oversight. If it is intentional, I wonder what the rationale is.

Thanks,
-s
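For readers who want to see the behaviour being asked about, here is a minimal sketch. The table contents and the names `tab`, `df_default` and `df_chr` are made up for illustration; the sketch assumes an R version in which as.data.frame.table still defaults to stringsAsFactors = TRUE, as described above.

```r
# A small one-way table; the values are invented for this example.
tab <- table(fruit = c("apple", "pear", "apple"))

# Default call: the method's own default (stringsAsFactors = TRUE) applies.
df_default <- as.data.frame(tab)

# Explicit override in the call.
df_chr <- as.data.frame(tab, stringsAsFactors = FALSE)

# Inspect how the classifying column is stored in each case.
class(df_default$fruit)
class(df_chr$fruit)
```

Comparing the two class() results shows directly whether the classifying column was converted to a factor.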
Martin Maechler
2010-Jan-22 08:17 UTC
[Rd] Inconsistency in as.data.frame.table for stringsAsFactors
>>>>> "SM" == Stavros Macrakis <macrakis at alum.mit.edu>
>>>>>     on Thu, 21 Jan 2010 20:19:28 -0500 writes:

    SM> I noticed that in as.data.frame.table, the stringsAsFactors argument
    SM> defaults to TRUE, whereas in the other as.data.frame methods, it defaults to
    SM> default.stringsAsFactors().

    SM> The documentation and implementation agree on this, so this is not a bug.
    SM> However, I was wondering if this disparity was intended or if it might be
    SM> some sort of unintentional oversight. If it is intentional, I wonder what
    SM> the rationale is.

Some of us (including me) have strongly argued on several occasions that global options() settings should *not* have an effect on anything "computing" but just on "output", i.e. printing/graphing of R results.

As it is currently, R scripts and R functions may potentially only work correctly for one setting of options(stringsAsFactors = *), which is against all principles of functional programming.

From this (my) point of view, we should strive to eventually deprecate default.stringsAsFactors(), which basically returns getOption("stringsAsFactors"), or as a first or second step redefine it as

    default.stringsAsFactors <- function() TRUE

Martin Mächler

    SM> Thanks,
    SM> -s
S Ellison
2010-Jan-22 14:25 UTC
[Rd] Inconsistency in as.data.frame.table for stringsAsFactors
>> Some of us (including me) have strongly argued on several
>> occasions that global options() settings should *not* have an effect
>> on anything "computing" ...
> ...

Global options are less of a problem where a function allows them to be overridden by the user or programmer. If something is affected by a global option, a programmer desiring consistent behaviour then has a simple recourse: set it explicitly in the call. In other words, the programmer should be able to enforce that principle of functional programming; an observation which I would offer as a wider imperative than the language having to do so. That is, I believe that 'it should always be possible for a user to set any parameter used by a function', irrespective of the existence or otherwise of global options. (I believe in programmer choice in these things.)

In a sense, too, a global option expresses a user choice for a set of operations. There are often good reasons for this; for example, factor contrasts. The choice has drastic effects on the model computed, but there seems good reason for the convenience of allowing a user to set contrasts for a series of related analyses rather than setting them individually on each model call.

The (small number of pretty much trivial) defaults that have, over the last few years, given me a temporary headache are not globals or argument defaults; they have been hardwired defaults that couldn't be changed in the function calls. Rewriting the function is almost always possible, but it is not quite a straightforward method of overriding a default! To an extent, the same can be said of global options that affect a function and are user-settable but can't be overridden in the call itself. The main 'culprits' here tend to be older graphics calls, which often respect par() options but don't all take all the par() options in '...'.
But in general I too concur; there shouldn't be global options without pretty good reason.

> Which isn't to say I don't think that you're right - I would hate for
> R to head in the direction of PHP ....

PHP is indeed an example to stay away from; it changes the nature of data without allowing a test on the stored data to reveal the fact. By contrast, stringsAsFactors produces a _detectable_ effect on the data; we can tell what form the data has now, irrespective of system settings now or (worse) on original input.

Steve Ellison
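As an illustration of the "set it explicitly in the call" recourse, here is a minimal sketch using the factor contrasts example. The data frame `d` and the object names are invented for this sketch; options(contrasts = ...) and the contrasts argument of lm() are standard R.

```r
# Invented toy data: a response and a three-level factor.
d <- data.frame(y = c(1, 2, 3, 4, 5, 6),
                g = factor(rep(c("a", "b", "c"), each = 2)))

# Set a global contrasts option, remembering the old value.
old <- options(contrasts = c("contr.helmert", "contr.poly"))

# This fit follows the global option.
fit_global <- lm(y ~ g, data = d)

# This fit overrides the option explicitly in the call.
fit_explicit <- lm(y ~ g, data = d,
                   contrasts = list(g = "contr.treatment"))

# Restore the previous global setting.
options(old)

# The coefficient names reveal which coding each fit used.
names(coef(fit_global))
names(coef(fit_explicit))
```

The second call gives the same coding whatever the session's contrasts option happens to be, and the coding actually used can be checked afterwards from the fitted object.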
Terry Therneau
2010-Jan-25 15:36 UTC
[Rd] Inconsistency in as.data.frame.table for stringsAsFactors
Kudos to Peter for actually answering the question of why the inconsistency was there. It might be well to add a bit to the documentation.

As to the larger discussion of global defaults, let me offer two opinions:

1. They are the salvation of those of us who do not agree with certain global defaults.
   -- 'best practice' is not always a consensus
   -- defaults are often informed too much by "the data we happened to be analysing when we decided the default". The long-standing contr.helmert one, for instance; a look at the white book shows that they were working on an orthogonal manufacturing design, the one case where Helmert contrasts make sense. The survival package contains several defaults with the same type of origin.

2. People in these discussions play the "it might break something" card far too often. At Mayo, for instance, the table() command has been replaced by one which lists NA by default, for all data types. We've done this for as long as R and Splus have been used (10+ years), for all 150 people in the biostat group, and nothing has broken yet. A suggestion to allow this as a global default will immediately elicit the above argument, I guarantee it. Ditto for our experience with stringsAsFactors=FALSE; nothing's broken yet. Give a concrete example before crying wolf.

Terry T
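For readers unfamiliar with the table() behaviour mentioned in point 2, here is a minimal sketch of a wrapper that lists NA by default. The name `table_na` and the data are hypothetical, and this is not the Mayo replacement itself, whose code does not appear in this thread; it only assumes the standard useNA argument of table().

```r
# Invented data containing one missing value.
x <- c("a", "b", NA, "a")

# Stock behaviour: the NA is left out of the counts.
counts_default <- table(x)

# Hypothetical wrapper that reports NA whenever any are present.
table_na <- function(...) table(..., useNA = "ifany")
counts_with_na <- table_na(x)

# The totals differ by the number of missing values.
sum(counts_default)
sum(counts_with_na)
```

A wrapper under a different name leaves table() itself untouched; masking table() for a whole group, as described above, changes what every caller sees.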
Gabor Grothendieck
2010-Jan-25 15:49 UTC
[Rd] Inconsistency in as.data.frame.table for stringsAsFactors
On Mon, Jan 25, 2010 at 10:36 AM, Terry Therneau <therneau at mayo.edu> wrote:
> Kudos to Peter for actually answering the question of why the
> inconsistency was there. It might be well to add a bit to the
> documentation.
>
> As to the larger discussion of global defaults let me offer two
> opinions:
> 1. They are the salvation of those of us who do not agree with certain

As soon as you have to interface with other software that may require a different global default it becomes problematic.