Bliese, Paul D LTC USAMH
2005-May-26 09:31 UTC
[R] read.spss in R 2.1.0 & make basic dataframe
Recent changes to read.spss() in the foreign package return a dataframe containing additional attributes. For example,

> TEMP<-read.spss(choose.files(), to.data.frame=T,use.value.labels=F)
> str(TEMP)
`data.frame': 780 obs. of 8 variables:
 $ EXPOS01: atomic 1 1 2 1 2 3 2 4 2 1 ...
  ..- attr(*, "value.labels")= Named num 5 4 3 2 1
  .. ..- attr(*, "names")= chr "Yes, experienced it with Extreme Impact" "Yes, experienced it with Moderate Impact" "Yes, experienced it with A Little Impact" "Yes, experienced it with No Impact" ...
 $ EXPOS02: atomic 1 1 1 1 1 1 1 1 1 1 ...
  ..- attr(*, "value.labels")= Named num 5 4 3 2 1
  .. ..- attr(*, "names")= chr "Yes, experienced it with Extreme Impact" "Yes, experienced it with Moderate Impact" "Yes, experienced it with A Little Impact" "Yes, experienced it with No Impact" ...

Unfortunately, these changes may be ahead of their time (certainly ahead of several functions). For instance edit balks at the changes:

> edit(TEMP)
Error in edit.data.frame(TEMP) : can only handle vector and factor elements

It used to be that the command "as.data.frame" or "data.frame" would return a fairly basic data.frame and "fix" the problem. However, this does not work (obviously because TEMP is already a data.frame). For example,

> TEMP<-as.data.frame(TEMP)
> edit(TEMP)
Error in edit.data.frame(TEMP) : can only handle vector and factor elements

It is possible to use "as.matrix", and then "data.frame" the result of "as.matrix", but this gets a bit cumbersome.

The question is: Is there a simple command to strip additional attribute characteristics from a data.frame and get a simple, easy to use, uncomplicated data.frame?

On a related note, do other users routinely use read.spss with the defaults of "to.data.frame=F" or "use.value.labels=T"? My experience is that I am always using the non-default values in which case it would be helpful to change the defaults to "to.data.frame=T" and "use.value.labels=F". It would also probably make sense to change the default for "trim.factor.names=T". Interested in others' perspective.

Appreciate all the great work Saikat DebRoy has done...just trying to improve an already useful function.

Paul
On Thu, 26 May 2005, Bliese, Paul D LTC USAMH wrote:

> On a related note, do other users routinely use read.spss with the
> defaults of "to.data.frame=F" or "use.value.labels=T"? My experience
> is that I am always using the non-default values in which case it would
> be helpful to change the defaults to "to.data.frame=T" and
> "use.value.labels=F". It would also probably make sense to change the
> default for "trim.factor.names=T". Interested in others' perspective.

Actually, most of this is me rather than Saikat.

I use use.value.labels=TRUE most of the time. The main point of to.data.frame=FALSE is that it is quite a lot faster for large files, especially if you are going to use only a few of the variables. I think Brian Ripley spoke up in favour of it for this reason last time the issue was raised.

The reason I made trim.factor.names=FALSE the default was backwards compatibility, but it probably makes sense to switch it at some point.

Incidentally, PSPP (the original source of the code) now has a version that reads long variable names from post-version 12 SPSS files. This confirms that the "unrecognised record type 7, subtype 13" message really is due to long variable names and so is harmless. It also means that anyone who wants long variable names badly enough could work out a patch.

-thomas
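The point about using only a few of the variables can be illustrated with a small sketch. It does not read a real SPSS file: `spss_list` is a hand-built stand-in for the list that read.spss() returns when to.data.frame=FALSE, and `EXPOS03`, `keep`, and `small` are names invented for the illustration.

```r
# Stand-in for the list returned by read.spss(file, to.data.frame = FALSE).
# EXPOS01 and EXPOS02 come from the str() output in the original post;
# EXPOS03 is a hypothetical extra variable added for this sketch.
spss_list <- list(
  EXPOS01 = c(1, 1, 2, 1, 2),
  EXPOS02 = c(1, 1, 1, 1, 1),
  EXPOS03 = c(3, 2, 1, 2, 3)
)

# Pick only the variables that are actually needed ...
keep <- c("EXPOS01", "EXPOS02")

# ... and build a data frame from just those list components,
# so the unused variables are never converted.
small <- as.data.frame(spss_list[keep])
str(small)
```

With a real file, the list would come from read.spss() itself; whether this is noticeably faster will depend on the file size and the number of variables dropped.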
The main problem you are experiencing is that edit() (more precisely, the method edit.data.frame()) is a bit restricted. I think contributions are welcome. Note that the coding must be done very carefully here (and is not trivial at all) in order to deal with different kinds of attributes, in particular names and factors.

Uwe Ligges

Bliese, Paul D LTC USAMH wrote:

> Unfortunately, these changes may be ahead of their time (certainly ahead
> of several functions). For instance edit balks at the changes:
>
> > edit(TEMP)
> Error in edit.data.frame(TEMP) : can only handle vector and factor
> elements
>
> The question is: Is there a simple command to strip additional
> attribute characteristics from a data.frame and get a simple, easy to
> use, uncomplicated data.frame?
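For the question quoted above, one possible workaround is to remove the "value.labels" attribute from each column by hand. The following is a minimal sketch under stated assumptions, not something proposed in the thread: `strip_labels()` is a hypothetical helper (it is not part of the foreign package), and `TEMP` is rebuilt here as a small stand-in rather than read from an SPSS file.

```r
# Stand-in for the data frame shown in the original post: numeric columns
# that each carry a "value.labels" attribute.
labs <- c("Yes, experienced it with Extreme Impact"  = 5,
          "Yes, experienced it with Moderate Impact" = 4)
TEMP <- data.frame(EXPOS01 = c(1, 1, 2, 1, 2),
                   EXPOS02 = c(1, 1, 1, 1, 1))
attr(TEMP$EXPOS01, "value.labels") <- labs
attr(TEMP$EXPOS02, "value.labels") <- labs

# Hypothetical helper: drop one named attribute from every column.
strip_labels <- function(df, which = "value.labels") {
  df[] <- lapply(df, function(col) {
    attr(col, which) <- NULL  # remove only this attribute from the column
    col
  })
  df
}

PLAIN <- strip_labels(TEMP)

# is.vector() is FALSE for a vector with attributes other than names,
# and TRUE again once the extra attribute is gone.
is.vector(TEMP$EXPOS01)
is.vector(PLAIN$EXPOS01)
```

The helper removes a single named attribute and leaves factor levels and column names alone, so the value labels are lost in `PLAIN` but remain available in `TEMP`. Whether edit() then accepts the result in a given R version is not something this sketch verifies.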