RINNER Heinrich
2010-Jan-12 10:28 UTC
[R] read.spss: option "to.data.frame" and string variables
Dear R-users,

I am using R version 2.10.1 and package foreign version 0.8-39 under windows.

When reading .sav-Files (PASW Statistics 18.0.1) containing string variables, these are automatically converted to factors when using option "to.data.frame = TRUE" (see example below).
It's clear to me why this happens (the default behaviour of a call to as.data.frame). But this is not always what one might want (or even be aware of).

So maybe one of the following improvements could be made?
* Add a description of this behaviour in ?read.spss.
* Or (even better): Add an extra argument, like: read.spss("C:\\temp\\test.sav", to.data.frame = TRUE, stringsAsFactors = FALSE).

Just a suggestion;
kind regards
Heinrich.

# EXAMPLE:
Suppose there is a simple file "test.sav", containing one variable ("x") of type STRING with 3 values (a,b,c).

> library(foreign)
> test <- read.spss("C:\\temp\\test.sav")
> test
$x
[1] "a " "b " "c "

attr(,"label.table")
attr(,"label.table")$x
NULL

attr(,"codepage")
[1] 1252
> is.factor(test$x)
[1] FALSE
> is.character(test$x)
[1] TRUE

# Ok, that's just fine. But things change when using option "to.data.frame = TRUE":
> test <- read.spss("C:\\temp\\test.sav", to.data.frame = TRUE)
> test
  x
1 a
2 b
3 c
> is.factor(test$x)
[1] TRUE
> is.character(test$x)
[1] FALSE
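The conversion described above can be reproduced without an SPSS file. The sketch below is only an illustration: the list `test` is built by hand as a stand-in for what read.spss() returns without to.data.frame = TRUE, and it assumes (as the post does) that the data-frame step goes through as.data.frame(). Passing stringsAsFactors explicitly makes the outcome independent of the session default.

```r
# Hand-built stand-in for the list returned by read.spss() (assumption:
# a single string variable "x", padded with a trailing blank as in the post).
test <- list(x = c("a ", "b ", "c "))

# Converting with stringsAsFactors = TRUE turns the strings into a factor ...
as_factor <- as.data.frame(test, stringsAsFactors = TRUE)

# ... while stringsAsFactors = FALSE keeps them as character.
as_char <- as.data.frame(test, stringsAsFactors = FALSE)

is.factor(as_factor$x)    # TRUE
is.character(as_char$x)   # TRUE
```

The same two-step approach (read as a list, then convert explicitly) is one possible workaround with the real file. Note that from R 4.0.0 onwards the default for stringsAsFactors is FALSE, so the behaviour reported here applies to older versions such as 2.10.1.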
David Winsemius
2010-Jan-12 11:08 UTC
[R] read.spss: option "to.data.frame" and string variables
It would be a significant undertaking to annotate all the places where the default behavior of strings-to-factors conversion might trip up the unwary. You are not the first by any means to complain. You might:

a) take the step that the Mayo Clinic has taken of setting the stringsAsFactors default in options() to FALSE, or
b) make your own read.spss with your desired arguments, and then put it in your .Rprofile.

-- 
David
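Suggestion b) might look roughly like the following. This is a sketch under assumptions, not a tested replacement for read.spss(): the wrapper name `read_spss_chr` is made up for illustration, and the function simply reads the file as a list and then converts it with an explicit stringsAsFactors = FALSE.

```r
# Hypothetical wrapper (the name read_spss_chr is not part of package foreign).
# It could be placed in .Rprofile so that it is available in every session.
read_spss_chr <- function(file, use.missings = TRUE, ...) {
  # Read as a plain list first; use.missings is set explicitly because its
  # default in read.spss() follows to.data.frame, which is FALSE here.
  raw <- foreign::read.spss(file, to.data.frame = FALSE,
                            use.missings = use.missings, ...)
  # Convert to a data frame without turning character vectors into factors.
  # Variables that read.spss() already returned as factors (value labels)
  # stay factors.
  as.data.frame(raw, stringsAsFactors = FALSE)
}

# Suggestion a) would instead be a single line in .Rprofile:
# options(stringsAsFactors = FALSE)
```

A call would then be read_spss_chr("C:\\temp\\test.sav"). One caveat of this sketch: attributes that read.spss() attaches to its result may not survive the as.data.frame() step, so it is worth checking whether anything you rely on is lost.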