Dimitri Liakhovitski
2015-Nov-13 01:37 UTC
[R] "haven" - read_spss: How to avoid extracting value labels instead of long labels?
I have to rephrase my question again - it's clearly a small bug in
haven. Here is what it is about:
If I have a column in SPSS that has BOTH a long label and value
labels, then everything works fine - I access one with 'label' and
another with 'labels':
> attr(spss1$MYVAR, "label")
[1] "LONG LABEL"
> attr(spss1$MYVAR, "labels")
    DEFINITELY CONSIDER       PROBABLY CONSIDER   PROBABLY NOT CONSIDER
                      1                       2                       3
DEFINITELY NOT CONSIDER
                      4
However, if I have a column that has no long label and ONLY value
labels, then it's not working properly:
> attr(spss1$MYVAR, "label")
VERY/SOMEWHAT FAMILIAR    NOT AT ALL FAMILIAR
                     1                      2
> attr(spss1$MYVAR, "labels")
VERY/SOMEWHAT FAMILIAR    NOT AT ALL FAMILIAR
                     1                      2
And I actually need to be able to identify if label is empty.
Thank you for looking into it!
Dimitri
On Thu, Nov 12, 2015 at 5:55 PM, Dimitri Liakhovitski
<dimitri.liakhovitski at gmail.com> wrote:
> Looks like a little bug in 'haven':
>
> When I actually look at the attributes of one variable that has no
> long label in SPSS but has Value Labels, I am getting:
> attr(spss1$WAVE, "label")
> NULL
>
> But when I sapply my function longlabels to my data frame and ask it
> to print the long labels for each column, for the same column "WAVE" I
> am getting - instead of NULL:
> NULL
> VERY/SOMEWHAT FAMILIAR    NOT AT ALL FAMILIAR
>                      1                      2
>
> This is, of course, incorrect, because it grabs the next attribute
> (which one? And replaces NULL with it).
> Any suggestions?
> Thanks!
>
>
>
>
> On Thu, Nov 12, 2015 at 11:56 AM, Dimitri Liakhovitski
> <dimitri.liakhovitski at gmail.com> wrote:
>> Hello!
>>
>> I don't have an example file, but I think my question should be clear
>> without it.
>> I have an SPSS file. I read it in using 'haven':
>>
>> library(haven)
>> spss1 <- read_spss("SPSS_Example.sav")
>>
>> I created a function that extracts the long labels (in SPSS - "Label"):
>>
>> fix_labels <- function(x, TextIfMissing) {
>>   val <- attr(x, "label")
>>   if (is.null(val)) TextIfMissing else val
>> }
>> longlabels <- sapply(spss1, fix_labels, TextIfMissing = "NO LABEL IN SPSS")
>>
>> This function is supposed to create a vector of long labels and
>> usually it does, e.g.:
>>
>> str(longlabels)
>>  Named chr [1:64] "Serial number" ...
>>  - attr(*, "names")= chr [1:64] "Respondent_Serial" "weight" "r7_1" "r7_2" ...
>>
>> However, I just got an SPSS file with 92 columns and ran exactly the
>> same function on it. Now I am getting a list instead of a vector:
>>
>> str(longlabels)
>> List of 92
>> $ VEHRATED : chr "VEHICLE RATED"
>> $ RESPID : chr "RESPONDENT ID"
>> $ RESPID8 : chr "8 DIGIT RESPONDENT NUMBER"
>>
>> An observation about the structure of longlabels here: for those columns
>> that do NOT have a long label in SPSS but DO have Values (value
>> labels), my function grabs their value labels, so that now
>> my long label is recorded as a numeric vector with names, e.g.:
>>
>>  $ AWARE2 : Named num [1:2] 1 2
>>   ..- attr(*, "names")= chr [1:2] "VERY/SOMEWHAT FAMILIAR" "NOT AT ALL FAMILIAR"
>>
>> Question: How could I avoid the extraction of the Value Labels for the
>> columns that have no long labels?
>>
>> Thank you very much!
>> --
>> Dimitri Liakhovitski
>
>
>
> --
> Dimitri Liakhovitski
--
Dimitri Liakhovitski
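The need stated above, telling whether a long label is really present on a column, can also be checked by listing the attribute names that are stored on the object. This is only a sketch in base R: the vector `aware2` is made up to stand in for an imported column that has value labels but no long label, and no haven or .sav file is involved.

```r
# Hypothetical column: value labels only, no long label.
aware2 <- c(1, 2, 2, 1)
attr(aware2, "labels") <- c("VERY/SOMEWHAT FAMILIAR" = 1,
                            "NOT AT ALL FAMILIAR" = 2)

# Names of the attributes that are actually stored on the object.
stored <- names(attributes(aware2))
print(stored)

# %in% compares whole strings, so "label" does not match "labels".
has_long_label <- "label" %in% stored
print(has_long_label)
```

Because the comparison is done on complete attribute names, a column that only carries "labels" is reported as having no long label.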
Ista Zahn
2015-Nov-13 15:00 UTC
[R] "haven" - read_spss: How to avoid extracting value labels instead of long labels?
Why do you think this is a bug in haven? To the contrary, I don't think
this has anything to do with haven at all. The problem seems to be
that attr does partial matching by default. Check it out:

> attr(x, "labels") <- c("foo", "bar", "baz")
> attr(x, "label")
[1] "foo" "bar" "baz"

and see ?attr for details.

The answer I think is

fix_labels <- function(x, TextIfMissing) {
  val <- attr(x, "label", exact = TRUE)
  if (is.null(val)) TextIfMissing else val
}

Finally, note that the development version of rio
(https://github.com/leeper/rio) has a (non-exported) function for
cleaning up meta data from haven imports. See
https://github.com/leeper/rio/blob/master/R/utils.R#L86

Best,
Ista
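The partial matching of attr described above can be reproduced without haven or an SPSS file. The following is a small base-R sketch; the vector `x` and the attribute values are made up for illustration. It also shows why columns that carry both attributes behaved as expected: an exact match on the name is tried before any partial match.

```r
# Made-up vector carrying only a "labels" attribute.
x <- 1:3
attr(x, "labels") <- c("foo", "bar", "baz")

# Default lookup: "label" is a unique partial match for "labels".
partial <- attr(x, "label")
print(partial)

# exact = TRUE disables partial matching, so nothing is found.
exact <- attr(x, "label", exact = TRUE)
print(is.null(exact))

# Once a real "label" attribute exists, the exact match takes precedence.
attr(x, "label") <- "LONG LABEL"
both <- attr(x, "label")
print(both)
```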
Dimitri Liakhovitski
2015-Nov-13 17:42 UTC
[R] "haven" - read_spss: How to avoid extracting value labels instead of long labels?
You are absolutely right, Ista - it's not haven's fault, my bad.
Of course, it's the attr function and exact = TRUE.
Thank you so much!
Dimitri
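One detail of the thread that may be worth spelling out is why sapply returned a list for the 92-column file: sapply only simplifies to a vector when every result has the same length, and a value-labels vector of length 2 among length-1 strings prevents that. The sketch below uses a made-up data frame `df` in place of a real import, and swaps in vapply (my substitution, not something proposed in the thread) so that a non-character or multi-element result would raise an error instead of silently changing the return type.

```r
# fix_labels with exact matching, as suggested in the reply above.
fix_labels <- function(x, TextIfMissing) {
  val <- attr(x, "label", exact = TRUE)
  if (is.null(val)) TextIfMissing else val
}

# Hypothetical stand-in for an imported SPSS data set.
df <- data.frame(RESPID = c(101, 102), AWARE2 = c(1, 2))
attr(df$RESPID, "label") <- "RESPONDENT ID"
attr(df$AWARE2, "labels") <- c("VERY/SOMEWHAT FAMILIAR" = 1,
                               "NOT AT ALL FAMILIAR" = 2)

# Partial matching: results have lengths 1 and 2, so sapply returns a list.
old <- sapply(df, function(x) attr(x, "label"))
print(is.list(old))

# Exact matching plus vapply: always a named character vector.
longlabels <- vapply(df, fix_labels, character(1),
                     TextIfMissing = "NO LABEL IN SPSS")
print(longlabels)
```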