I have a file which contains non-ASCII characters (umlauts, accented characters, etc.) both in its filename as well as its content. The only way I have been able to see these characters is inside vim, where they are displayed correctly no matter what I have LANG set to. My default LANG is en_US.utf8, but I have tried de_DE.utf8, de_DE.iso88591, and various others. In the output of a simple ls, the characters are shown as either a simple "?" or a "?" in a box depending on what I have LANG set to, and the same is true if I use cat to display the contents. I have tried this both in an xterm as well as a gnome-terminal window on a CentOS 5.3 system.

I am taking stabs in the dark here as I've never had to deal with LANG and locales before. Am I missing something obvious, or is there more to it than simply setting LANG? Since I can see the characters correctly in vim, I'm sure it's not a font issue. I just want to be able to display these characters correctly with simple Linux commands like ls and cat.

Thanks,
Alfred
Alfred von Campe wrote:
> I have a file which contains non-ASCII characters (umlauts, accented
> characters, etc.) both in its filename as well as its content. The
> only way I have been able to see these characters is inside vim,
> where they are displayed correctly no matter what I have LANG set
> to. My default LANG is en_US.utf8, but I have tried de_DE.utf8,
> de_DE.iso88591, and various others. In the output of a simple ls,
> the characters are shown as either a simple "?" or a "?" in a box
> depending on what I have LANG set to, and the same is true if I use
> cat to display the contents. I have tried this both in an xterm as
> well as a gnome-terminal window on a CentOS 5.3 system.
>
> I am taking stabs in the dark here as I've never had to deal with
> LANG and locales before.

The 'file' command displays encoding information. If you have to change the encoding, use 'recode'. Example:

$ recode latin1..utf8 file

... converts from Latin-1 (ISO-8859-1 or the like) to UTF-8.

$ recode utf8..latin1 file

... converts the other way around. Note the syntax: two dots between the encodings.

Cheers,
Niki
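For anyone without 'recode' installed, here is a rough Python sketch of what the latin1..utf8 conversion does to the bytes. It assumes the data really is Latin-1, which nothing in this thread has confirmed yet; the sample string and the helper names (looks_like_utf8, latin1_to_utf8) are made up for illustration.

```python
# Hypothetical sample: "Käse" as it would be stored in a Latin-1 file or filename.
latin1_bytes = "K\u00e4se".encode("latin-1")   # the umlaut is the single byte 0xE4


def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode Latin-1 bytes as UTF-8 (same direction as latin1..utf8)."""
    return data.decode("latin-1").encode("utf-8")


utf8_bytes = latin1_to_utf8(latin1_bytes)      # the umlaut becomes two bytes: 0xC3 0xA4

print(looks_like_utf8(latin1_bytes))   # the lone 0xE4 byte is not valid UTF-8
print(looks_like_utf8(utf8_bytes))
```

A byte sequence that fails the UTF-8 check is the kind of thing a UTF-8 terminal would show as "?", so this may help narrow down whether the file and the locale disagree.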