Hi all,

I love the option to not automatically convert strings into factors,
but there are three places that the current option doesn't work where
I think it should:

options(stringsAsFactors = FALSE)

str(expand.grid(letters))
str(type.convert(letters))

df <- read.fwf(textConnection(paste(letters,collapse="\n")), 1)
str(df)

I think type.convert and read.fwf can be fixed by giving them a
stringsAsFactors argument and then using as.is = !stringsAsFactors
(like read.table).

The key lines in expand.grid would seem to be

if (!is.factor(x) && is.character(x))
    x <- factor(x, levels = unique(x))

but I'm not sure why they are being converted to factors in the first place.

Regards,

Hadley

--
http://had.co.nz/
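A minimal sketch of the type.convert part of that proposal, written as a user-level wrapper rather than a change to base R. The name type_convert2() is hypothetical. Its stringsAsFactors default is read from getOption("stringsAsFactors"), and the value is mapped onto as.is the same way read.table() does it:

```r
# Hypothetical wrapper, not part of base R: adds a stringsAsFactors
# argument to type.convert() and translates it into as.is, as read.table() does.
type_convert2 <- function(x, ...,
                          stringsAsFactors = isTRUE(getOption("stringsAsFactors"))) {
  type.convert(x, as.is = !stringsAsFactors, ...)
}

str(type_convert2(letters, stringsAsFactors = FALSE))  # stays character
str(type_convert2(letters, stringsAsFactors = TRUE))   # becomes a factor
str(type_convert2(c("1", "2", "3")))                   # numeric either way
```

as.is only decides whether leftover character data becomes a factor. Numeric-looking input is converted to numbers whatever stringsAsFactors says.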
On Mon, 17 Nov 2008, hadley wickham wrote:

> Hi all,
>
> I love the option to not automatically convert strings into factors,
> but there are three places that the current option doesn't work where
> I think it should:

Perhaps you mean 'when I would like it to'?  Things *should* work as
documented, surely?

> options(stringsAsFactors = FALSE)
>
> str(expand.grid(letters))
> str(type.convert(letters))
>
> df <- read.fwf(textConnection(paste(letters,collapse="\n")), 1)
> str(df)

I get

> str(df)
'data.frame': 26 obs. of 1 variable:
 $ V1: chr "a" "b" "c" "d" ...

so what is wrong with that?  read.fwf just calls read.table, so the
default options of read.table apply.

> I think type.convert and read.fwf can be fixed by giving them a
> stringsAsFactors argument and then using as.is = !stringsAsFactors
> (like read.table).

Seems to me that there is nothing wrong with read.fwf.  For
type.convert() we could have the default

as.is = !default.stringsAsFactors()

but I think a strong case needs to be made to change the documented
behaviour.

> The key lines in expand.grid would seem to be
>
> if (!is.factor(x) && is.character(x))
>     x <- factor(x, levels = unique(x))
>
> but I'm not sure why they are being converted to factors in the first place.

Nor am I, but it goes back to at least r2107, over 10 years ago.  I
don't see much problem with adding a 'stringsAsFactors' argument there.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
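Because read.fwf() passes its extra arguments on to read.table(), the conversion can also be controlled per call instead of through options(). The sketch below does this. It writes the letters to a temporary file in place of the textConnection() used above; that substitution is only a choice made for the example:

```r
# Write one letter per line to a temporary file
tmp <- tempfile(fileext = ".txt")
writeLines(letters, tmp)

# stringsAsFactors (or as.is = TRUE) is forwarded to read.table() via '...'
df <- read.fwf(tmp, widths = 1, stringsAsFactors = FALSE)
unlink(tmp)

str(df)
```

Passing as.is = TRUE instead of stringsAsFactors = FALSE should have the same effect here, because both are read.table() arguments.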
> From: r-devel-bounces at r-project.org
> [mailto:r-devel-bounces at r-project.org] On Behalf Of hadley wickham
> Sent: Monday, November 17, 2008 5:10 AM
> To: r-devel at r-project.org
> Subject: [Rd] stringsAsFactors = FALSE
>
> ...
> The key lines in expand.grid would seem to be
>
> if (!is.factor(x) && is.character(x))
>     x <- factor(x, levels = unique(x))
>
> but I'm not sure why they are being converted to factors in
> the first place.

I think expand.grid converts input strings to factors so they retain
the order they have in the input.  (Note that the levels argument is
unique(x), not the sort(unique(x)) that data.frame uses.)  People
generally give expand.grid sorted input and expect it to not alter
the order (the order of the levels affects tables and some plots).

> lapply(expand.grid(Grade=c("Bad","Good","Better"),Size=c("Small","Medium","Large")), levels)
$Grade
[1] "Bad"    "Good"   "Better"

$Size
[1] "Small"  "Medium" "Large"

> lapply(data.frame(Grade=c("Bad","Good","Better"),Size=c("Small","Medium","Large")), levels)
$Grade
[1] "Bad"    "Better" "Good"

$Size
[1] "Large"  "Medium" "Small"

I have nothing against adding the stringsAsFactors argument to
expand.grid.

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com
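The ordering point can be checked directly by building the two kinds of factor by hand. The sketch below is illustrative. The last part assumes a version of R in which expand.grid() accepts a stringsAsFactors argument, which current releases do. It shows that plain character columns keep their input order without any factor conversion:

```r
grade <- c("Bad", "Good", "Better")

# What expand.grid() does: levels in order of first appearance
f_input <- factor(grade, levels = unique(grade))
# What factor() and (old-default) data.frame() do: levels sorted
f_sorted <- factor(grade)

levels(f_input)   # "Bad" "Good" "Better"
levels(f_sorted)  # "Bad" "Better" "Good"

# table() lays out its cells in level order
table(f_input)
table(f_sorted)

# Assumes expand.grid() has a stringsAsFactors argument (true in current R):
# the column stays character and keeps the input order
g <- expand.grid(Grade = grade, stringsAsFactors = FALSE)
str(g$Grade)
```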