RINNER Heinrich
2003-Aug-27 16:06 UTC
[R] read.spss (package foreign) and character columns
Dear R users!

I am using R version 1.7.1, Windows XP, package "foreign" (version 0.6-1), and SPSS 11.5.1.

There is one thing I noticed with read.spss(), and I'd like to ask whether it is considered a feature or possibly a bug: when reading character columns, the character strings seem to get padded with blanks at the end.

Simple example: in SPSS, create a file with one variable called "xchar" of type A5 (character of length 5) and 3 values ("a", "ab", "abcde"), and save it as "test.sav". In R:

> library(foreign)
> test <- read.spss("test.sav", to.data.frame=T)
> test
  XCHAR
1 a
2 ab
3 abcde
> levels(test$XCHAR)
[1] "a    " "ab   " "abcde"

Shouldn't it rather be "a" "ab" "abcde" (no blanks)?

-Heinrich.
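The padding is easy to miss because printing a factor does not show trailing blanks. A minimal sketch of how to check for it, assuming the column looks like the A5 example above; the vector `x` is typed in by hand as a stand-in for test$XCHAR rather than read from test.sav.

```r
# Typed-in stand-in for test$XCHAR (no .sav file needed): an SPSS A5
# column stores every value blank-padded to width 5
x <- factor(c("a    ", "ab   ", "abcde"))

print(x)                 # the trailing blanks are not visible when printed
print(nchar(levels(x)))  # but every level is 5 characters long
print(levels(x) == "a")  # so comparing with the unpadded string fails
```

Checking nchar() on the levels (or on as.character() of the column) is usually the quickest way to spot this kind of padding.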
Prof Brian Ripley
2003-Aug-27 16:45 UTC
[R] read.spss (package foreign) and character columns
On Wed, 27 Aug 2003, RINNER Heinrich wrote:

> Shouldn't it rather be "a" "ab" "abcde" (no blanks)?

You said it was a character string of length 5, not <=5.

It's easy to strip trailing blanks (?sub has several ways).

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
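One of the sub() approaches, sketched under the assumption that the column arrives as a factor as in the example; `x` is again a hand-typed stand-in for test$XCHAR. Working on levels(x) rather than on the values keeps the column a factor and only touches each distinct string once. In R 3.2.0 and later, trimws(x, which = "right") is an alternative to the regular expression.

```r
# Stand-in for test$XCHAR: values from an SPSS A5 column, padded to width 5
x <- factor(c("a    ", "ab   ", "abcde"))

# Strip trailing blanks from the factor levels; the pattern " +$" is
# anchored at the end, so embedded blanks (e.g. "ab cd") are kept
levels(x) <- sub(" +$", "", levels(x))

print(levels(x))
print(as.character(x))
```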
RINNER Heinrich <H.RINNER at tirol.gv.at> writes:

> Shouldn't it rather be "a" "ab" "abcde" (no blanks)?

I believe they are being saved as fixed-length strings in the SPSS file, and R is just reading what it was given.
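To make the fixed-length point concrete, here is a small illustration of what blank-padding three values to width 5 looks like. It only emulates the padding with sprintf(); it is not meant to show how SPSS or the foreign package actually store or read the data.

```r
# Illustration only: emulate a fixed-width A5 field by left-justifying
# each value and padding it with blanks to width 5
vals <- c("a", "ab", "abcde")
stored <- sprintf("%-5s", vals)
print(stored)

# Stripping the trailing blanks recovers the original values
print(identical(sub(" +$", "", stored), vals))
```

Note that this round trip is only lossless if the original values never ended in a blank themselves; a fixed-width field cannot distinguish "ab " from "ab".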
Thomas Petzoldt
2003-Aug-28 10:43 UTC
[R] read.spss (package foreign) and character columns
RINNER Heinrich wrote:

> Shouldn't it rather be "a" "ab" "abcde" (no blanks)?

I think that should be no problem, since the blanks in XCHAR can easily be removed with gsub(). If you have

> s <- c("a ", "ab ", "ab cd ")

then

> gsub(" ", "", s)
[1] "a"    "ab"   "abcd"

removes all spaces, or

> gsub(" *$", "", s)
[1] "a"     "ab"    "ab cd"

removes only the trailing spaces.

Hope this helps!

Thomas P.
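When a file has many character variables, the same trailing-blank pattern can be applied to every column at once. A minimal sketch, assuming the data frame holds plain character or factor columns as read.spss(..., to.data.frame=T) would give; the helper name trim_trailing and the data below are hypothetical, made up for illustration.

```r
# Hypothetical helper: strip trailing blanks from every character or
# factor column of a data frame (e.g. one returned by read.spss())
trim_trailing <- function(df) {
  for (nm in names(df)) {
    col <- df[[nm]]
    if (is.factor(col)) {
      # Trim the levels; levels that become identical are merged
      levels(col) <- sub(" +$", "", levels(col))
    } else if (is.character(col)) {
      col <- sub(" +$", "", col)
    }
    df[[nm]] <- col  # numeric and other columns are left unchanged
  }
  df
}

# Made-up data mimicking the A5 example plus a numeric column
test <- data.frame(XCHAR = factor(c("a    ", "ab   ", "abcde")),
                   N = 1:3)
test <- trim_trailing(test)
print(levels(test$XCHAR))
```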
RINNER Heinrich
2003-Aug-28 12:31 UTC
[R] read.spss (package foreign) and character columns
Thanks to Brian Ripley, Douglas Bates and Thomas Petzoldt for their comments.

I agree that it is not really a problem, as you can easily use sub() after read.spss() to get rid of the blanks (I had already done that). On the other hand, it might be important to _know_ that character values are padded with blanks here.

[I noticed it because I used a character variable as the common column in merge(tab1, tab2, by="XCHAR"), where tab1 came into R from an SPSS file via read.spss(), and tab2 came into R from an Excel file via ODBC using odbcConnectExcel(). The merge failed for some cases, because some values of XCHAR in tab1 had trailing blanks while the values in tab2 had none.]

I know now, so I know what to do in future cases. But as not everybody might be aware of this, I would suggest adding a short note about it to help(read.spss), so that no one will be "surprised".

Regards,
Heinrich.

> -----Original Message-----
> From: RINNER Heinrich [mailto:H.RINNER at tirol.gv.at]
> Sent: Wednesday, 27 August 2003 18:06
> To: 'r-help at stat.math.ethz.ch'
> Subject: [R] read.spss (package foreign) and character columns
>
> [...]
> Shouldn't it rather be "a" "ab" "abcde" (no blanks)?
>
> -Heinrich.
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
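The merge problem described above can be reproduced with two small tables. This is a sketch with made-up data: tab1 only mimics blank-padded keys as read.spss() delivered them here, and tab2 mimics a source without padding; no SPSS or Excel files are involved.

```r
# Made-up tables: tab1 has blank-padded keys, tab2 has unpadded keys
tab1 <- data.frame(XCHAR = c("a    ", "ab   ", "abcde"), v1 = 1:3,
                   stringsAsFactors = FALSE)
tab2 <- data.frame(XCHAR = c("a", "ab", "abcde"), v2 = c(10, 20, 30),
                   stringsAsFactors = FALSE)

# Only "abcde" matches, because "a    " != "a" and "ab   " != "ab"
bad <- merge(tab1, tab2, by = "XCHAR")
print(nrow(bad))  # 1

# Strip the trailing blanks from the key column before merging
tab1$XCHAR <- sub(" +$", "", tab1$XCHAR)
good <- merge(tab1, tab2, by = "XCHAR")
print(nrow(good))  # 3
```

Because merge() silently drops non-matching rows by default, comparing nrow() before and after, or using all = TRUE to see the unmatched keys, is a cheap way to catch this kind of mismatch.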