Dimitri Liakhovitski
2015-Jul-20 13:56 UTC
[R] For Hadley Wickham: Need for a small fix in haven::read_spss
Hadley,
you've added the function labelled to haven, which is great. However,
when a variable in SPSS has no long label, your code returns NULL for
its "label" attribute rather than NA. NULL is correct, but NA would
probably be more convenient.
For example, I've read in an SPSS file:
library(haven)
spss1 <- read_spss("SPSS_Example.sav")
varnames <- names(spss1)
mylabels <- unlist(lapply(spss1, attr, "label"))
length(varnames)
[1] 64
length(mylabels)
[1] 62
The lengths differ because in this particular dataset there were 2
variables without either variable labels or data labels.
When I run lapply(spss1, attr, "label") I see "NULL" under those 2
variables - which is true and valid.
However, would it be possible to have an NA instead of NULL? This way
the lengths of varnames and mylabels would be the same and one could
put them side by side (e.g., in one data frame).
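A minimal sketch of why the two lengths disagree, using a toy data frame instead of the SPSS file (the column names and label texts below are made up for illustration): lapply() keeps one list element per column, NULL included, while unlist() silently drops the NULL elements.

```r
# Toy stand-in for a data frame read with haven::read_spss();
# the column names and labels are invented for illustration.
df <- data.frame(age = c(31, 45), sex = c(1, 2), id = c(101, 102))
attr(df$age, "label") <- "Age in years"
attr(df$sex, "label") <- "Sex of respondent"
# 'id' is deliberately left without a "label" attribute.

labels_list <- lapply(df, attr, "label")

length(labels_list)          # one element per column, NULL for 'id'
length(unlist(labels_list))  # unlist() drops the NULL element
```

So the labels are not lost by the reader itself; the shorter vector comes from flattening a list that contains NULL.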
Thanks a lot!
--
Dimitri Liakhovitski
Hadley Wickham
2015-Jul-20 14:01 UTC
[R] For Hadley Wickham: Need for a small fix in haven::read_spss
(FWIW this would've been better sent to me directly or filed on
github, rather than sent to R-help)
I think this is more of a problem with the way that you're accessing
the info, than the design of the underlying structure. I'd do
something like this:
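# Return attribute `which` of `x`, or `default` if it is not set
attr_default <- function(x, which, default) {
  val <- attr(x, which)
  if (is.null(val)) default else val
}

# One label per column; NA_character_ where no label exists
sapply(spss1, attr_default, "label", NA_character_)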
(code untested, but you get the idea)
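As a hedged illustration of the idea above, here is a self-contained sketch on toy data (the data frame, its labels, and the name codebook are assumptions, not part of the original file). It differs from the snippet above in two ways: it uses vapply() so that each column must yield exactly one string, and it passes exact = TRUE to attr(), because attr() partially matches names by default and "label" could otherwise pick up a "labels" attribute.

```r
# Return attribute `which` of `x`, or `default` if it is absent.
# exact = TRUE stops "label" from partially matching "labels".
attr_default <- function(x, which, default) {
  val <- attr(x, which, exact = TRUE)
  if (is.null(val)) default else val
}

# Toy data; column names and labels are invented for illustration.
df <- data.frame(age = c(31, 45), sex = c(1, 2), id = c(101, 102))
attr(df$age, "label") <- "Age in years"
attr(df$sex, "label") <- "Sex of respondent"

# vapply() enforces exactly one character string per column.
labels <- vapply(df, attr_default, character(1),
                 which = "label", default = NA_character_)

# Names and labels side by side, NA where a label is missing.
codebook <- data.frame(variable = names(df), label = labels,
                       row.names = NULL, stringsAsFactors = FALSE)
print(codebook)
```

Because every column contributes one value, the label vector always has the same length as names(df), which is what makes the side-by-side data frame possible.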
Hadley
--
http://had.co.nz/
Dimitri Liakhovitski
2015-Jul-20 14:06 UTC
[R] For Hadley Wickham: Need for a small fix in haven::read_spss
Thank you, Hadley. Yes, you are right - next time I'll email you directly.
--
Dimitri Liakhovitski