Dimitri Liakhovitski
2015-Nov-13 01:37 UTC
[R] "haven" - read_spss: How to avoid extracting value labels instead of long labels?
I have to rephrase my question again - it's clearly a small bug in
haven. Here is what it is about:

If I have a column in SPSS that has BOTH a long label and value
labels, then everything works fine - I access one with 'label' and
the other with 'labels':

attr(spss1$MYVAR, "label")
[1] "LONG LABEL"
attr(spss1$MYVAR, "labels")
    DEFINITELY CONSIDER       PROBABLY CONSIDER   PROBABLY NOT CONSIDER DEFINITELY NOT CONSIDER
                      1                       2                       3                       4

However, if I have a column that has no long label and ONLY value
labels, then it's not working properly:

> attr(spss1$MYVAR, "label")
VERY/SOMEWHAT FAMILIAR    NOT AT ALL FAMILIAR
                     1                      2
> attr(spss1$MYVAR, "labels")
VERY/SOMEWHAT FAMILIAR    NOT AT ALL FAMILIAR
                     1                      2

And I actually need to be able to identify whether the label is empty.
Thank you for looking into it!

Dimitri

On Thu, Nov 12, 2015 at 5:55 PM, Dimitri Liakhovitski
<dimitri.liakhovitski at gmail.com> wrote:
> Looks like a little bug in 'haven':
>
> When I actually look at the attributes of one variable that has no
> long label in SPSS but has Value Labels, I am getting:
> attr(spss1$WAVE, "label")
> NULL
>
> But when I sapply my function fix_labels over my data frame and ask it
> to print the long labels for each column, for the same column "WAVE" I
> am getting - instead of NULL:
> NULL
> VERY/SOMEWHAT FAMILIAR    NOT AT ALL FAMILIAR
>                      1                      2
>
> This is, of course, incorrect, because it grabs the next attribute
> (which one?) and replaces NULL with it.
> Any suggestions?
> Thanks!
>
> On Thu, Nov 12, 2015 at 11:56 AM, Dimitri Liakhovitski
> <dimitri.liakhovitski at gmail.com> wrote:
>> Hello!
>>
>> I don't have an example file, but I think my question should be clear
>> without it.
>> I have an SPSS file. I read it in using 'haven':
>>
>> library(haven)
>> spss1 <- read_spss("SPSS_Example.sav")
>>
>> I created a function that extracts the long labels (in SPSS - "Label"):
>>
>> fix_labels <- function(x, TextIfMissing) {
>>   val <- attr(x, "label")
>>   if (is.null(val)) TextIfMissing else val
>> }
>> longlabels <- sapply(spss1, fix_labels, TextIfMissing = "NO LABEL IN SPSS")
>>
>> This function is supposed to create a vector of long labels and
>> usually it does, e.g.:
>>
>> str(longlabels)
>>  Named chr [1:64] "Serial number" ...
>>  - attr(*, "names")= chr [1:64] "Respondent_Serial" "weight" "r7_1" "r7_2" ...
>>
>> However, I just got an SPSS file with 92 columns and ran exactly the
>> same function on it. Now I am getting not a vector but a list:
>>
>> str(longlabels)
>> List of 92
>>  $ VEHRATED : chr "VEHICLE RATED"
>>  $ RESPID   : chr "RESPONDENT ID"
>>  $ RESPID8  : chr "8 DIGIT RESPONDENT NUMBER"
>>
>> An observation about the structure of longlabels here: those columns
>> that do NOT have a long label in SPSS but DO have Values (value
>> labels) - for them my function grabs their value labels, so that now
>> my long label is recorded as a numeric vector with names, e.g.:
>>
>>  $ AWARE2   : Named num [1:2] 1 2
>>   ..- attr(*, "names")= chr [1:2] "VERY/SOMEWHAT FAMILIAR" "NOT AT ALL FAMILIAR"
>>
>> Question: How could I avoid the extraction of the Value Labels for the
>> columns that have no long labels?
>>
>> Thank you very much!

--
Dimitri Liakhovitski
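The switch from a character vector to a list reported above is consistent with how sapply simplifies its result: it only collapses to a vector when every per-column result has the same length. The following is a minimal sketch of that behaviour; the two input lists are hypothetical stand-ins for per-column results and are not taken from the SPSS file.

```r
# Hypothetical per-column results; names and values are made up for illustration.
all_strings <- list(a = "Serial number", b = "Weight")
mixed <- list(
  a = "VEHICLE RATED",
  b = c("VERY/SOMEWHAT FAMILIAR" = 1, "NOT AT ALL FAMILIAR" = 2)
)

# Every result has length 1, so sapply simplifies to a named character vector.
res_vec <- sapply(all_strings, identity)

# The results have different lengths, so sapply cannot simplify and returns a list.
res_list <- sapply(mixed, identity)

str(res_vec)
str(res_list)
```

So a single column whose "label" lookup returns something other than a length-one string is enough to turn the whole result into a list.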
Ista Zahn
2015-Nov-13 15:00 UTC
[R] "haven" - read_spss: How to avoid extracting value labels instead of long labels?
Why do you think this is a bug in haven? On the contrary, I don't think
this has anything to do with haven at all. The problem seems to be
that attr does partial matching by default. Check it out:

> x <- 1:3
> attr(x, "labels") <- c("foo", "bar", "baz")
> attr(x, "label")
[1] "foo" "bar" "baz"

and see ?attr for details.

The answer, I think, is

fix_labels <- function(x, TextIfMissing) {
  val <- attr(x, "label", exact = TRUE)
  if (is.null(val)) TextIfMissing else val
}

Finally, note that the development version of rio
(https://github.com/leeper/rio) has a (non-exported) function for
cleaning up metadata from haven imports. See
https://github.com/leeper/rio/blob/master/R/utils.R#L86

Best,
Ista
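The partial-matching behaviour of attr described above can be checked in isolation. This is a small sketch on a plain integer vector (the object x and its values are made up for illustration, and haven is not involved); it shows the three cases that matter for the question.

```r
# A plain vector with only a "labels" attribute; there is no "label" attribute yet.
x <- 1:3
attr(x, "labels") <- c(low = 1, high = 3)

# Default lookup: "label" partially matches "labels", so the value labels come back.
partial <- attr(x, "label")

# Exact lookup: no attribute is literally named "label", so the result is NULL.
exact <- attr(x, "label", exact = TRUE)

# Once a real "label" attribute exists, the exact match wins even without exact = TRUE.
attr(x, "label") <- "Rating"
both <- attr(x, "label")

print(partial)
print(exact)
print(both)
```

This matches the symptom in the thread: columns with both attributes behave as expected, and only columns lacking a "label" attribute pick up "labels" instead.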
Dimitri Liakhovitski
2015-Nov-13 17:42 UTC
[R] "haven" - read_spss: How to avoid extracting value labels instead of long labels?
You are absolutely right, Ista - it's not haven's fault, my bad.
Of course, it's the attr function and exact = TRUE.
Thank you so much!
Dimitri

--
Dimitri Liakhovitski
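As a follow-up sketch, the corrected function can be combined with vapply, which is a swapped-in alternative to the sapply call used in the thread: vapply raises an error instead of silently returning a list if any column yields something other than a single string. The data frame below is a hypothetical stand-in for the result of read_spss, with attributes set by hand, so haven is not needed to run it.

```r
# Hypothetical data frame mimicking labelled SPSS columns (attributes set manually).
df <- data.frame(RESPID = 1:2, AWARE2 = c(1, 2))
attr(df$RESPID, "label") <- "RESPONDENT ID"
attr(df$AWARE2, "labels") <- c("VERY/SOMEWHAT FAMILIAR" = 1, "NOT AT ALL FAMILIAR" = 2)

# The function from the thread, with exact = TRUE so "labels" is never picked up.
fix_labels <- function(x, TextIfMissing) {
  val <- attr(x, "label", exact = TRUE)
  if (is.null(val)) TextIfMissing else val
}

# vapply enforces that every column returns exactly one character string.
longlabels <- vapply(df, fix_labels, character(1),
                     TextIfMissing = "NO LABEL IN SPSS")

print(longlabels)
```

Whether the stricter vapply is preferable depends on the data; if some columns could legitimately carry a non-character "label" attribute, the error would need to be handled.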