thr3ads.net - R help - [R] WG: AW: Another problem with encoding [Jan 2008]

Matthias Wendel

2008-Jan-02 15:02 UTC

[R] WG: AW: Another problem with encoding

Hello, Peter,
I tried it out:

iconv(names(attributes(spss[,'Y6'])[[1]][14]), "UTF-8", "LATIN1", sub='byte')

yielded

[1] "<c4>rzte Chirurgie"

and c4 corresponds to Ä in most encodings. What can I do next? I wonder whether
there is a more convenient way than replacing each occurrence of <..> with the
appropriate character.
Regards,
Matthias

-----Original Message-----
From: Peter Dalgaard [mailto:p.dalgaard at biostat.ku.dk]
Sent: Tuesday, 1 January 2008 20:21
To: Matthias Wendel
Subject: Re: AW: [R] Another problem with encoding

Matthias Wendel wrote:
> Happy new year and my apologies, Peter. Here are the missing facts:
> I'm reading in an SPSS file, doing some calculations and putting the
> results in an XML file. The XML file is UTF-8 encoded, and so should be the
> results and their labels (e.g. Ärzte Chirurgie).
> Here is part of the R session:

As a matter of principle: requests for more information are not offers that I
will solve your problems personally. Stay on the list!

The characters seem to travel OK in email, so latin1 is a guess. Have you tried
the sub="byte" argument to iconv()?
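
For later readers, a minimal sketch of that diagnostic. The `label` string is a hypothetical stand-in built from raw bytes (0xC4 followed by ASCII text), on the assumption that the SPSS labels are stored in a single-byte encoding such as latin1; it is not read from the actual file. `validUTF8()` needs R 3.3.0 or later.

```r
# Hypothetical stand-in for one label from the SPSS file: the single byte
# 0xC4 followed by plain ASCII, with no encoding declared.
label <- rawToChar(c(as.raw(0xc4), charToRaw("rzte Chirurgie")))

# Look at the raw bytes, independent of how the console renders them.
bytes <- charToRaw(label)
print(bytes)

# Check whether the bytes form valid UTF-8. A 0xC4 byte followed by an
# ASCII letter is an incomplete two-byte sequence, so this is FALSE.
is_utf8 <- validUTF8(label)
print(is_utf8)

# Treat the input as UTF-8 and show each undecodable byte as <xx>
# instead of getting NA back for the whole string.
print(iconv(label, from = "UTF-8", to = "latin1", sub = "byte"))
```

If the escaped byte turns out to be c4 where an Ä is expected, the data is most likely latin1 (or the closely related Windows-1252) rather than UTF-8.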


> > Sys.getlocale()
> [1] "LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252"
> > spss[,'Y6']
>   [1]  6  3  8 11  8  9  6  8  3  5 10 15 NA  9  8  3  8 16  6  6 NA 10  5  2  7  7  6 16  7 15  7 10 12
>  [34]  8  7 12 12 16  7  6  8  8 15  6 NA  8 99  7 12  8  9 16  7 16  8  7  7  1 15 12  8  7 10  7  8  7
>  [67]  8  9  8  6  6  8  6 16 11  5 11 11  1 11  3  7  7 10 10 10  6 11 16 NA  1  3  2 10 99 10  3  3  9
> [100]  7 16 99 16  1 10  2 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 NA 10 16 16 NA  6 10  5 11
> [133] 11  1  1  1  1 16  1 16  1  1  1  1  6  6  6 16  8 16 16 16 16  5  6 10 99 11 11 10  6  6  1  1  6
> [166]  1 11 11 16  9 11 16  6  8  8 16 16  8  6 16 16 12 12 12 12 12 12 12 16  9 16 15 12 12 15 10 16 15
> [199]  4  1  2 14  4  4  2  5 NA  1  5  5  7  9  5 12 12 NA 16 12 12 12 12 12 12 12 12 12 99 NA 12 12 NA
> [232]  1 16  1  7 11  5  6  7  1 13  6  8 16  2  1  5 16 16  9  8  8  8  7 16  8  8  2  8  5  4  6 14  5
> [265] 14  8  8 14  4  4  8 14  8 14  6  2  3 14  3 16  5 15 15 15 15 15 15 15 15 15 15 15 13 13 13 13 13
> [298] 13 13 13 13 13 13 13 13 15  6 NA 12  3  9  9 NA 10 16
> attr(,"value.labels")
>                           Verwaltung Servicegesellschaft Waldfriede (SKW)
>                                   16                                   15
>            Kurzzeitpflege Waldfriede                        Sozialstation
>                                   14                                   13
>                  Krankenpflegeschule              Med. Technischer Dienst
>                                   12                                   11
>                            Pflege OP                      Funktionsdienst
>                                   10                                    9
>                   Pflege Gynäkologie                     Pflege Chirurgie
>                                    8                                    7
>                        Pflege Innere            Ärzte Anästhesie, Röntgen
>                                    6                                    5
>                    Ärzte Gynäkologie                      Ärzte Chirurgie
>                                    4                                    3
>                         Ärzte Innere         Patientenberatung/-betreuung
>                                    2                                    1
> > names(attributes(spss[,'Y6'])[[1]][14])
> [1] "Ärzte Chirurgie"
> > iconv(names(attributes(spss[,'Y6'])[[1]][14]), "UTF-8", "LATIN1")
> [1] NA
> > utf8ToInt(names(attributes(spss[,'Y6'])[[1]][14]))
> Fehler in utf8ToInt(names(attributes(spss[, "Y6"])[[1]][14])) :
>   invalid UTF-8 string
>
> Cheers,
> Matthias
>
> -----Original Message-----
> From: Peter Dalgaard [mailto:p.dalgaard at biostat.ku.dk]
> Sent: Monday, 31 December 2007 10:45
> To: Matthias Wendel
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] Another problem with encoding
>
> Matthias Wendel wrote:
>> Hi
>>     I've imported an SPSS file using read.spss. One variable has values
>> like 'Ärzte'. I thought this was UTF-8 encoded, but it is not (as the
>> results of iconv and utf8ToInt suggest). Is there any way to find out how
>> these SPSS values are encoded?
>
> You are assuming a bit much of your readers.
>
> What exactly are you doing? Is it a value, a value label, or perhaps a
> variable name? How do the results of read.spss look on the R side? How did
> you apply iconv and utf8ToInt? What is your locale?
>
> I mean, we could try and guess all those details, but you are the one with
> the hard info, and the motivation...
>

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907

Peter Dalgaard

2008-Jan-02 15:55 UTC

Matthias Wendel wrote:
> Hello, Peter,
> I tried it out: iconv(names(attributes(spss[,'Y6'])[[1]][14]),
> "UTF-8", "LATIN1", sub='byte') yielded
>
> [1] "<c4>rzte Chirurgie"
>
> and c4 corresponds to Ä in most encodings. What can I do next? I wonder
> whether there is a more convenient way than replacing each occurrence of
> <..> with the appropriate character.

Not sure what you want here. Isn't it just the reverse conversion,
iconv(...., from="latin1", to="utf8") ???

Notice that c4 is not Ä in UTF8:

> iconv("Ä", to="ascii", sub="byte")
[1] "<c3><84>"

In fact c4 on its own is not anything in UTF8 (it can only start a two-byte
sequence), hence the "invalid string" message.

> Regards,
> Matthias
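
A minimal sketch of the reverse conversion suggested above, for anyone finding this thread later. The `label` string and the small `y6` vector are hypothetical stand-ins built from raw bytes, not the poster's data, and `relabel_utf8()` is an illustrative helper name, not part of any package. The sketch assumes the labels really are latin1 and uses "UTF-8" as the target name, which iconv() should accept on all platforms.

```r
# Hypothetical stand-in for one label as read from the SPSS file:
# byte 0xC4 (Ä in latin1) followed by plain ASCII.
label <- rawToChar(c(as.raw(0xc4), charToRaw("rzte Chirurgie")))

# Reverse conversion: declare the input as latin1 and ask for UTF-8.
label_utf8 <- iconv(label, from = "latin1", to = "UTF-8")

# In UTF-8 the same character takes two bytes, c3 84.
first_two <- charToRaw(label_utf8)[1:2]
print(first_two)

# The string is now valid UTF-8, so utf8ToInt() no longer fails;
# the first code point is 196 (U+00C4).
first_cp <- utf8ToInt(label_utf8)[1]
print(first_cp)

# Illustrative helper: convert the names of a "value.labels" attribute
# (as shown in the session output) for one variable in one go.
relabel_utf8 <- function(x, from = "latin1") {
  labs <- attr(x, "value.labels")
  names(labs) <- iconv(names(labs), from = from, to = "UTF-8")
  attr(x, "value.labels") <- labs
  x
}

# Stand-in variable carrying one latin1-encoded label name.
y6 <- c(3, 3, NA)
attr(y6, "value.labels") <- setNames(3, label)
y6 <- relabel_utf8(y6)
print(validUTF8(names(attr(y6, "value.labels"))))
```

Applied to the real data, the helper would be called on each labelled column before the labels are written to the XML file; whether latin1 or Windows-1252 is the right source encoding depends on how the SPSS file was produced.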

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907
