Dimitri Liakhovitski
2015-Jul-20 13:56 UTC
[R] For Hadley Wickham: Need for a small fix in haven::read_spss
Hadley,

you've added the function labelled to haven, which is great. However, when a variable has no long label in SPSS, your code treats its label as NULL rather than NA. NULL is correct, but NA would probably be better.

For example, I've read in an SPSS file:

library(haven)
spss1 <- read_spss("SPSS_Example.sav")

varnames <- names(spss1)
mylabels <- unlist(lapply(spss1, attr, "label"))

length(varnames)
[1] 64

length(mylabels)
[1] 62

The lengths differ because in this particular dataset there were 2 variables without either variable labels or data labels. When I run lapply(spss1, attr, "label") I see "NULL" under those 2 variables, which is true and valid. However, would it be possible to have NA instead of NULL? This way varnames and mylabels would have the same length and one could put them side by side (e.g., in one data frame).

Thanks a lot!

--
Dimitri Liakhovitski
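The length mismatch described in this message comes from unlist() silently dropping NULL list elements. A minimal sketch of the effect, using a small made-up data frame (the names df, a, b, c and the label values are hypothetical stand-ins for the SPSS data, which is not available here):

```r
# Hypothetical stand-in for spss1: three columns, one without a "label" attribute.
df <- data.frame(a = 1:3, b = 4:6, c = 7:9)
attr(df$a, "label") <- "Age"
attr(df$c, "label") <- "Income"

# lapply() keeps one element per column; the element for b is NULL.
labels_list <- lapply(df, attr, "label")

# unlist() drops NULL elements, so the result is shorter than names(df).
mylabels <- unlist(labels_list)

length(names(df))   # one entry per column
length(mylabels)    # one entry per labelled column only
```

Because the list returned by lapply() still has one element per column, the information about which variables lack a label is only lost at the unlist() step.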
Hadley Wickham
2015-Jul-20 14:01 UTC
[R] For Hadley Wickham: Need for a small fix in haven::read_spss
(FWIW this would've been better sent to me directly or filed on GitHub, rather than sent to R-help)

I think this is more of a problem with the way that you're accessing the info than with the design of the underlying structure. I'd do something like this:

attr_default <- function(x, which, default) {
  val <- attr(x, which)
  if (is.null(val)) default else val
}

sapply(spss1, attr_default, "label", NA_character_)

(code untested, but you get the idea)

Hadley

--
http://had.co.nz/
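Since the suggestion in this reply is marked as untested, here is a runnable sketch of the same idea on a small made-up data frame. Two details are assumptions of this sketch rather than part of the reply: it uses vapply() instead of sapply() so the result is guaranteed to be a character vector, and it passes exact = TRUE to attr(), because attr() partially matches attribute names by default and a request for "label" could otherwise pick up a "labels" attribute. The data frame and its labels are hypothetical.

```r
# Return the attribute if present, otherwise a caller-supplied default.
# exact = TRUE is an addition in this sketch to avoid partial matching
# of "label" against an attribute such as "labels".
attr_default <- function(x, which, default) {
  val <- attr(x, which, exact = TRUE)
  if (is.null(val)) default else val
}

# Hypothetical stand-in for spss1: column b has no "label" attribute.
df <- data.frame(a = 1:3, b = 4:6, c = 7:9)
attr(df$a, "label") <- "Age"
attr(df$c, "label") <- "Income"

# One label per column; missing labels become NA_character_.
mylabels <- vapply(df, attr_default, character(1),
                   which = "label", default = NA_character_)

# Names and labels now have the same length and fit side by side.
label_table <- data.frame(variable = names(df), label = mylabels,
                          row.names = NULL, stringsAsFactors = FALSE)
```

With the default filled in, names(df) and mylabels have equal length, which is what the original request was after.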
Dimitri Liakhovitski
2015-Jul-20 14:06 UTC
[R] For Hadley Wickham: Need for a small fix in haven::read_spss
Thank you, Hadley. Yes, you are right - next time I'll email you directly.

--
Dimitri Liakhovitski