Dirk Enzmann
2005-Sep-20 21:44 UTC
[R] Problem with read.spss() and as.data.frame(), or: alternative to subset()?
Trying to select a subset of cases (rows of data), I encountered several problems.

Firstly, because I did not read the help for read.spss() thoroughly enough, I treated the data read as a data frame. For example,

  dr2000 <- read.spss('myfile.sav')
  d <- subset(dr2000,RBINZ99 > 0)

gave an error message (Object "RBINZ99" not found), because dr2000 is not a data.frame but a list (shown by class(dr2000)).

  d <- subset(dr2000,dr2000$RBINZ99 > 0)

didn't help either, because now d is empty (dim = NULL).

Thus, I tried to use the option "to.data.frame=T" of read.spss():

  dr2000 <- read.spss('myfile.sav',to.data.frame=T)

However, now R "crashes" ('R for Windows GUI front-end has found an error and must be closed'; the actual error message is in German).

Finally, I tried again using read.spss() without the option 'to.data.frame=T' (as before) and tried to convert dr2000 to a data frame by using

  d <- as.data.frame(dr2000)

However, R crashes again (with the same error message).

Of course, I could use SPSS first and save only the cases with RBINZ99 > 0, but this is not always possible (all users of the data would need to have SPSS available, and we have to use different selection criteria). Is there another way to solve the problem using R? I want to select certain rows (cases) based on the values of one "variable" of dr2000 but keep all columns (variables), even though dr2000 is not a data frame.

And: R should not crash but rather give a warning.

------------------------
R version 2.1.1 Patched (2005-07-15)
Package Foreign Version 0.8-10

Operating system: Windows XP Professional (5.1 (Build 2600))
CPU: Pentium Model 2 Stepping 9
RAM: 512 MB

*************************************************
Dr. Dirk Enzmann
Institute of Criminal Sciences
Dept. of Criminology
Edmund-Siemers-Allee 1
D-20146 Hamburg
Germany

phone: +49-040-42838.7498 (office)
       +49-040-42838.4591 (Billon)
fax:   +49-040-42838.2344
email: dirk.enzmann at jura.uni-hamburg.de
www: http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html
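To make the list versus data frame distinction concrete, here is a minimal sketch on made-up data. The object dr2000_toy and its values are assumptions for illustration only; it stands in for the default return value of read.spss() and is not the poster's file. The sketch shows the intended workflow on a small object and does not reproduce the reported crash.

```r
# Made-up stand-in for the default return value of read.spss():
# a plain list of equal-length vectors, not a data frame.
dr2000_toy <- list(
  RBINZ99 = c(0, 3, 1, 0, NA),
  SEX     = factor(c("m", "f", "f", "m", "f"))
)

class(dr2000_toy)          # "list"
is.data.frame(dr2000_toy)  # FALSE

# Convert first; then subset() can look up RBINZ99 as a column.
df_toy <- as.data.frame(dr2000_toy)
d_toy  <- subset(df_toy, RBINZ99 > 0)

nrow(d_toy)   # 2: the rows with RBINZ99 > 0; subset() drops the NA row
names(d_toy)  # all columns are kept
```

On a data frame, subset() evaluates the condition inside the data, which is why the bare name RBINZ99 resolves there but not when the first argument is a plain list.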
Dirk Enzmann
2005-Sep-21 11:18 UTC
[R] Problem with read.spss() and as.data.frame(), or: alternative to subset()?
The selection problem can be solved by

  dr2000=read.spss('myfile')
  d=lapply(dr2000,subset,dr2000$RBINZ99 > 0)

However, there is still the problem that R crashes when using

  d = as.data.frame(dr2000)

or

  dr2000=read.spss('myfile',to.data.frame=T)

Any suggestions why? I checked whether all components of dr2000 have the same length and what class each component has. This is not the problem: each component has the same length (9232), and there are 66 components of class 'character', 981 of class 'factor', and 479 of class 'numeric'.
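The lapply() selection and the length/class check described in this post can be tried on a small made-up list. This is only a sketch: dr2000_toy, its components, and the helper names keep, lens, and classes are hypothetical and not taken from the poster's data.

```r
# Made-up list standing in for the result of read.spss('myfile').
dr2000_toy <- list(
  RBINZ99 = c(0, 3, 1, 0, NA),
  SEX     = factor(c("m", "f", "f", "m", "f")),
  ID      = c("a", "b", "c", "d", "e")
)

# Check that all components have the same length and tabulate their classes.
lens    <- sapply(dr2000_toy, length)
classes <- sapply(dr2000_toy, function(x) class(x)[1])
stopifnot(length(unique(lens)) == 1)
table(classes)

# Build the row selector once; treat NA as "do not select".
keep <- !is.na(dr2000_toy$RBINZ99) & dr2000_toy$RBINZ99 > 0

# Apply the same selector to every component; the result is still a list.
d_toy <- lapply(dr2000_toy, function(x) x[keep])

# The form used in the post: subset() is applied to each component in turn.
d_toy2 <- lapply(dr2000_toy, subset, dr2000_toy$RBINZ99 > 0)

sapply(d_toy, length)  # every component now has 2 elements
```

Building the logical vector once makes the handling of missing values explicit and keeps the selection criterion in one place when different criteria are needed.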