Jeroen Ooms
2008-Aug-01 15:17 UTC
[Rd] importing explicitly declared missing values in read.spss (foreign)
There is a problem when importing an SPSS file containing explicitly declared missing values into R using the read.spss function from the foreign package. I'm not sure whether these problems are the same in every version of SPSS; I am using the latest version, 16.0.2.

I included missingdata.sav (http://www.nabble.com/file/p18776776/missingdata.sav) and frequencies.jpg (http://www.nabble.com/file/p18776776/frequencies.jpg) as an example. The data contain 3 types of missing data: 2 are explicitly declared as missing values ('8' = NA and '9' = NAP), and the third type is the system-missing value. When this file is imported into R, only the system missings are recognized as missing values; the others are just imported as levels in the nominal case, and as (labeled) real values 8 and 9 in the continuous case. There are also no attributes in the object returned by read.spss that contain information about which values/levels are the missing values; their missingness seems to be completely ignored by the function.

Is there some way, or some other function, to import SPSS files with an option that replaces all missing values with <NA> in R? Of course this comes with the trade-off of losing the meaning of the missingness when there are multiple types of missingness, but I think this is far less harmful than treating all missing values as normal values.
[code]
> mydata <- read.spss("c:/users/jeroen/desktop/missingdata.sav", to.data.frame=T)
Warning messages:
1: In read.spss("c:/users/jeroen/desktop/missingdata.sav", to.data.frame = T) :
  c:/users/jeroen/desktop/missingdata.sav: File-indicated character representation code (1252) looks like a Windows codepage
2: In read.spss("c:/users/jeroen/desktop/missingdata.sav", to.data.frame = T) :
  c:/users/jeroen/desktop/missingdata.sav: Unrecognized record type 7, subtype 16 encountered in system file
3: In read.spss("c:/users/jeroen/desktop/missingdata.sav", to.data.frame = T) :
  c:/users/jeroen/desktop/missingdata.sav: Unrecognized record type 7, subtype 20 encountered in system file

> mydata
   SUBJECT CATEGORI CONTINUO
1        1      yes     3.11
2        2      yes     2.10
3        3      yes     5.34
4        4      yes     1.54
5        5      yes     3.89
6        6       no     2.98
7        7       no     4.53
8        8       no     1.98
9        9       no     3.68
10      10       no     2.94
11      11       NA     8.00
12      12       NA     8.00
13      13       NA     8.00
14      14       NA     8.00
15      15       NA     8.00
16      16      NAP     9.00
17      17      NAP     9.00
18      18      NAP     9.00
19      19      NAP     9.00
20      20      NAP     9.00
21      21     <NA>       NA
22      22     <NA>       NA
23      23     <NA>       NA
24      24     <NA>       NA
25      25     <NA>       NA

> is.na(mydata$CONTINUO)
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
[25]  TRUE

> is.na(mydata$CATEGORI)
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
[25]  TRUE

> summary(mydata)
    SUBJECT    CATEGORI     CONTINUO
 Min.   : 1   yes :5    Min.   :1.540
 1st Qu.: 7   no  :5    1st Qu.:3.078
 Median :13   NA  :5    Median :6.670
 Mean   :13   NAP :5    Mean   :5.854
 3rd Qu.:19   NA's:5    3rd Qu.:8.250
 Max.   :25             Max.   :9.000
                        NA's   :5.000
[/code]

--
View this message in context: http://www.nabble.com/importing-explicitly-declared-missing-values-in-read.spss-%28foreign%29-tp18776776p18776776.html
Sent from the R devel mailing list archive at Nabble.com.
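As a stopgap for the question above, the declared codes can be recoded by hand after the import. The following is only a minimal sketch, not a feature of read.spss: it rebuilds the data frame from the printed output (so no .sav file is needed) and assumes the user already knows that 8/9 and the labels "NA"/"NAP" are the declared missing codes. The names `user_missing_codes`, `user_missing_labels` and `recoded` are made up for illustration. Later releases of foreign also document a `use.missings` argument to read.spss; check ?read.spss in the installed version before relying on it.

```r
# Rebuild the data frame shown in the output above (assumption: values copied
# from the printed result, not read from the original .sav file).
mydata <- data.frame(
  SUBJECT  = 1:25,
  CATEGORI = factor(rep(c("yes", "no", "NA", "NAP", NA), each = 5),
                    levels = c("yes", "no", "NA", "NAP")),
  CONTINUO = c(3.11, 2.10, 5.34, 1.54, 3.89,
               2.98, 4.53, 1.98, 3.68, 2.94,
               rep(8, 5), rep(9, 5), rep(NA, 5))
)

# Hypothetical lists of the codes that SPSS declared as user-missing.
user_missing_codes  <- c(8, 9)
user_missing_labels <- c("NA", "NAP")

recoded <- mydata

# Continuous case: turn the declared codes into R's NA.
recoded$CONTINUO[recoded$CONTINUO %in% user_missing_codes] <- NA

# Nominal case: turn the declared labels into NA, then drop the unused levels.
recoded$CATEGORI[recoded$CATEGORI %in% user_missing_labels] <- NA
recoded$CATEGORI <- droplevels(recoded$CATEGORI)

summary(recoded)
```

After the recode, 15 of the 25 values in each variable are NA, and the mean of CONTINUO is computed from the 10 observed values only (3.209 instead of 5.854). The distinction between the two declared types and the system missings is lost, which is the trade-off mentioned in the post.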
Prof Brian Ripley
2008-Aug-04 05:39 UTC
[Rd] importing explicitly declared missing values in read.spss (foreign)
From the messages you get I do not believe this is a recent version of read.spss (message 2 no longer appears), and you haven't followed the posting guide and told us which version you are using. However, your message 3 does still appear, and that might be significant. A small amount of googling came up with

https://stat.ethz.ch/pipermail/r-help/2008-April/159342.html

and I guess this is the same issue.

A quick look at the code for read.spss() suggests that the information on user-defined missing values is being read in, and that there are yet more possible types of missingness (only some of which I understand). So what is needed is to return that info to the R user: now that we have an example, at least something should be possible.

On Fri, 1 Aug 2008, Jeroen Ooms wrote:

> There is a problem when importing an SPSS file containing explicitly
> declared missing values into R using the read.spss function from the
> foreign package. I'm not sure whether these problems are the same in
> every version of SPSS; I am using the latest version, 16.0.2.
>
> I included missingdata.sav
> (http://www.nabble.com/file/p18776776/missingdata.sav) and
> frequencies.jpg (http://www.nabble.com/file/p18776776/frequencies.jpg)
> as an example. The data contain 3 types of missing data: 2 are
> explicitly declared as missing values ('8' = NA and '9' = NAP), and the
> third type is the system-missing value. When this file is imported into
> R, only the system missings are recognized as missing values; the
> others are just imported as levels in the nominal case, and as
> (labeled) real values 8 and 9 in the continuous case. There are also no
> attributes in the object returned by read.spss that contain information
> about which values/levels are the missing values; their missingness
> seems to be completely ignored by the function.
>
> Is there some way, or some other function, to import SPSS files with an
> option that replaces all missing values with <NA> in R? Of course this
> comes with the trade-off of losing the meaning of the missingness when
> there are multiple types of missingness, but I think this is far less
> harmful than treating all missing values as normal values.

If the missingness information were returned, others are likely to disagree, especially for factors. All that is 'harmful' is that you are not told that value labels NA and NAP were to be regarded as 'missing' in SPSS. We've no idea whether it would be a more or less egregious choice to map them to R's NA, and certainly are not in a position to assert 'far less harmful' in general.

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
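One way to picture "returning that info to the R user" without deciding for them is to map the declared codes to NA while keeping the original code alongside the vector. The sketch below is purely illustrative: `mark_user_missing` and the attribute name "user_missing" are hypothetical and are not part of the foreign package or of read.spss, and the example values are taken from the output posted earlier in the thread.

```r
# Hypothetical helper (not part of foreign): set declared codes to NA but
# remember which code each element had, so the type of missingness survives.
mark_user_missing <- function(x, codes) {
  hit <- !is.na(x) & x %in% codes          # elements carrying a declared code
  reason <- rep(NA_character_, length(x))  # NA here means "not user-missing"
  reason[hit] <- as.character(x[hit])
  x[hit] <- NA
  if (is.factor(x)) x <- droplevels(x)     # remove the now-unused labels
  attr(x, "user_missing") <- reason
  x
}

# Continuous case: 8 and 9 are the declared codes, the last NA is system-missing.
continuo <- c(3.11, 2.10, 8, 8, 9, NA)
marked_num <- mark_user_missing(continuo, codes = c(8, 9))

# Nominal case: the labels "NA" and "NAP" are the declared codes.
categori <- factor(c("yes", "no", "NA", "NAP", NA),
                   levels = c("yes", "no", "NA", "NAP"))
marked_fac <- mark_user_missing(categori, codes = c("NA", "NAP"))

# The reason for each missing value can still be tabulated afterwards.
table(attr(marked_fac, "user_missing"), useNA = "ifany")
```

A user who prefers to keep NA and NAP as ordinary factor levels can simply not call the helper, which is the disagreement raised in the reply. Note that a plain attribute like this is dropped by most subsetting operations, so a parallel column in the data frame may be the more robust choice.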