Hi there,

I have been using Sup for a couple of days, and I have found some problems with the encoding of messages written in ISO-8859-1(5).

The encoding on my system is en_US.UTF-8, but most of the email that I receive is in Spanish, and usually encoded in ISO-8859-1 (and sometimes in ISO-8859-15). When I start Sup, it detects my UTF-8 locale and tries to decode the messages with that encoding, which results in hardly readable messages (most of the sentences where wide characters appear are truncated). If I start it with "LANG=en_US.ISO-8859-1 sup", it detects the new encoding and the messages are more readable, but I cannot see the right characters because my terminal encoding is UTF-8 (at least the sentences are not truncated).

With Mutt (and other email clients), I still use UTF-8 as my system encoding, and I see ISO-8859-1 messages correctly. The encoding of the messages is of course included in the headers.

As far as I know (considering what I have read in the documentation and in the archives of this mailing list), Sup determines the encoding using the environment variables and tries to decode all the messages using that encoding. For people working in different languages and environments (like me, I write in English and Spanish, some people send me messages in UTF-8, others in ISO-8859-1), having an overall encoding for all the messages is not a good solution.

Would it be possible to decode each message according to its headers? Please correct me if I am wrong in any of my assumptions on how Sup encodes/decodes messages.

By the way, I am using the Git version of Sup (as of today :-).

Cheers,
Israel
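To make the request concrete, here is a minimal sketch of per-message decoding driven by the Content-Type header. It is not taken from Sup; `charset_from_content_type` and `decode_body` are hypothetical helper names, the display charset is assumed to be UTF-8, and it uses Ruby's built-in String#encode rather than any particular mail library.

```ruby
# Hypothetical helper: pull the charset parameter out of a Content-Type
# header value, falling back to US-ASCII (the MIME default) when absent.
def charset_from_content_type(header_value)
  match = header_value.to_s.match(/charset\s*=\s*"?([^";\s]+)"?/i)
  match ? match[1] : "US-ASCII"
end

# Hypothetical helper: reinterpret the raw body bytes in the declared
# charset and transcode them to the display charset (assumed UTF-8).
def decode_body(raw_bytes, header_value, target = "UTF-8")
  source = charset_from_content_type(header_value)
  raw_bytes.dup.force_encoding(source).encode(target, invalid: :replace, undef: :replace)
rescue ArgumentError
  # Unknown charset name: keep the bytes, but make the string valid.
  raw_bytes.dup.force_encoding(target).scrub("?")
end

latin1 = "Espa\xF1a".b   # "España" as ISO-8859-1 bytes
puts decode_body(latin1, 'text/plain; charset="ISO-8859-1"')
```

The point of the sketch is only that the charset travels with each message, so the locale decides the output encoding, not the input one.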
Shot (Piotr Szotkowski)
2008-Apr-22 11:39 UTC
[sup-talk] System encoding versus messages encoding
Israel Herraiz:

> As far as I know (considering what I have read in the documentation
> and in the archives of this mailing list), Sup determines the encoding
> using the environment variables and tries to decode all the messages
> using that encoding. For people working in different languages and
> environments (like me, I write in English and Spanish, some people
> send me messages in UTF-8, others in ISO-8859-1), having an
> overall encoding for all the messages is not a good solution.

I agree wholeheartedly - I tried to use Sup a couple of weeks ago, but this issue made it unusable for me (I planned to hack on this some day, but my work and uni obligations do not leave any free time lately). :(

To make matters worse, the relevant RFC actually requires emails to be encoded in the 'tightest' encoding possible - when I'm writing an email in Polish without any non-US-ASCII letters, it should be sent as US-ASCII; if I include Polish diacritical characters, it should be encoded as ISO-8859-2, but if I add some characters outside of it (say, an ellipsis) it should be sent in UTF-8.

As a result, I regularly receive emails in US-ASCII, ISO-8859-2 and UTF-8 - but also in ISO-8859-1, ISO-8859-15, as well as some Cyrillic encodings.

I remember Mutt going through some growing pains to accommodate this, but it has it all sorted out now (there is an issue of recoding the email one replies to into the encoding expected by the editor, for example).

I'll be more than happy to test any work done in this regard and I offer any knowledge that could be useful; unfortunately, I can't promise any hacking time (I just got accepted for this year's Summer of Code to hack on CiviCRM internationalisation - maybe Sup could apply to be a project in next year's SoC edition?).

-- Shot
-- I grew up in Europe, where the history comes from. -- Eddie Izzard
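The 'tightest encoding' rule described above can be sketched as follows. This is an illustration, not code from any mail client: `tightest_charset` is a hypothetical helper, and the candidate list and its order are assumptions chosen to mirror the Polish example.

```ruby
# Candidate charsets from narrowest to widest. The list and its order are
# assumptions chosen to mirror the Polish example in the message above.
CANDIDATES = %w[US-ASCII ISO-8859-2 UTF-8].freeze

# Hypothetical helper: return the first candidate charset that can
# represent every character of the (UTF-8) input text.
def tightest_charset(text, candidates = CANDIDATES)
  candidates.find do |name|
    begin
      text.encode(name)
      true
    rescue Encoding::UndefinedConversionError
      false
    end
  end
end

puts tightest_charset("Zazolc gesla jazn")                      # US-ASCII
puts tightest_charset("Za\u017C\u00F3\u0142\u0107 g\u0119\u015Bl\u0105 ja\u017A\u0144")  # ISO-8859-2
puts tightest_charset("Za\u017C\u00F3\u0142\u0107\u2026")       # UTF-8
```

A sender following this rule produces mail in several charsets from one keyboard, which is why the receiving client cannot rely on a single global encoding.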
Reformatted excerpts from Israel Herraiz's message of 2008-04-21:

> As far as I know (considering what I have read in the documentation
> and in the archives of this mailing list), Sup determines the encoding
> using the environment variables and tries to decode all the messages
> using that encoding.

This is not correct. Sup DOES determine the message's (or MIME component's) charset, and transcodes it to your terminal charset using the iconv library before display. (After decoding quoted-printable, etc.)

The problem is actually that the Ruby ncurses gem is not wide-character aware, so dumping UTF8 to the terminal doesn't actually work. Well, it actually does seem to work for some characters, but not most of them.

The solution has been known for quite some time, but it ain't pretty:

http://rubyforge.org/pipermail/sup-talk/2007-October/000297.html

The good news is that I've just made it slightly simpler, at least if you're running from git. I've published an "ncursesw" branch that contains a hacked ncurses-0.9.1 and a dirty script to install it into your ../lib/ directory. If you use that AND you run from git next, you'll see wide characters. It works!

So, just a "few" "simple" commands:

  $ git branch --track ncursesw origin/ncursesw
  $ git checkout ncursesw
  $ cd ncurses-0.9.1/
  $ ./run-this-for-sup.sh
  $ cd ..
  $ git checkout next
  $ ruby -Ilib bin/sup

... and you should see wide characters, assuming your terminal is capable.

If make dies, you probably need to install some kind of ncursesw development library. On my Debian system it's a package called libncursesw5-dev.

A gold star to anyone who makes a nice wiki page out of this.

-- 
William <wmorgan-sup at masanjin.net>
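The order of operations described above (undo the transfer encoding first, then transcode the charset) can be sketched like this. It is not Sup's code: `decode_part` is a hypothetical helper, and it uses Ruby's built-in String#encode where the message mentions the iconv library.

```ruby
# Hypothetical helper: undo the quoted-printable transfer encoding first,
# then transcode the resulting bytes from the declared charset to the
# terminal charset (assumed to be UTF-8 here).
def decode_part(encoded_body, charset, terminal_charset = "UTF-8")
  raw = encoded_body.unpack1("M")   # quoted-printable -> raw bytes
  raw.force_encoding(charset).encode(terminal_charset)
end

puts decode_part("Ma=F1ana ser=E1 otro d=EDa", "ISO-8859-1")
```

Doing the two steps in the opposite order would transcode the literal "=F1" escape text instead of the byte it stands for.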
Excerpts from William Morgan's message of Wed Apr 23 01:18:00 +0200 2008:

> The good news is that I've just made it slightly simpler, at least if
> you're running from git. I've published an "ncursesw" branch that
> contains a hacked ncurses-0.9.1 and a dirty script to install it into
> your ../lib/ directory. If you use that AND you run from git next,
> you'll see wide characters. It works!

Great. I have pulled the new changes from the git repository and applied the recipe that you give, and yes, it works! Thanks!

I have noticed just one other odd thing: when I press enter to go from the inbox-mode to the thread-mode, some characters of the inbox view "persist" on the screen. I will file a bug if you consider it necessary.

> A gold star to anyone who makes a nice wiki page out of this.

I guess it is my turn :-). I will add a page about UTF-8 with a summary of this message and the message that you link.

Thanks again for your help.

Cheers,
Israel
Excerpts from William Morgan's message of Wed Apr 23 01:18:00 +0200 2008:

> A gold star to anyone who makes a nice wiki page out of this.

I have added a summary of the content of the two messages (the parent message of this one, and the one you link with patches for the gem). It is available at:

http://sup.rubyforge.org/wiki/wiki.pl?UTF8

I have added a link to this page on the main page of the wiki.

Please note that I have tested only the method included in your previous message (using the git sources). I have not tested the other method (using the gem). I would appreciate it if someone could go to the wiki page, follow the recipe and check that it is correct (or change whatever might be wrong).

Cheers,
Israel
Reformatted excerpts from Israel Herraiz's message of 2008-04-22:

> http://sup.rubyforge.org/wiki/wiki.pl?UTF8

Thanks! Very thorough.

> Please note that I have tested only the method included in your
> previous message (using the git sources). I have not tested the other
> method (using the gem). I would appreciate it if someone could go to
> the wiki page, follow the recipe and check that it is correct (or
> change whatever might be wrong).

I gave it a little tweak.

-- 
William <wmorgan-sup at masanjin.net>
Reformatted excerpts from Israel Herraiz's message of 2008-04-22:

> I have noticed just one other odd thing: when I press enter to go from
> the inbox-mode to the thread-mode, some characters of the inbox view
> "persist" on the screen.

Yeah, I see this too. Looks like a string length issue, actually. I'll look into it---probably an easy fix.

-- 
William <wmorgan-sup at masanjin.net>
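One possible reading of "string length issue" is a mismatch between bytes and characters: in the Ruby 1.8 of that era, String#length counted bytes, so a UTF-8 line looked longer than the columns it occupied and would be padded too little to overwrite the old screen contents. That cause is an assumption, not something confirmed in the thread. A minimal sketch in current Ruby, with `pad_to_columns` as a hypothetical helper:

```ruby
# Hypothetical helper: pad a line to a fixed number of screen columns so it
# overwrites whatever was drawn there before. It counts characters rather
# than bytes; it still assumes one column per character, which does not
# hold for double-width (e.g. CJK) characters.
def pad_to_columns(text, columns)
  text + " " * [columns - text.length, 0].max
end

subject = "Reuni\u00F3n ma\u00F1ana"
puts subject.bytesize                       # 16 (bytes in UTF-8)
puts subject.length                         # 14 (characters)
puts pad_to_columns(subject, 20).length     # 20
```

Padding by `bytesize` instead would leave the line two columns short here, which would let stale characters show through at the end of the row.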
Excerpts from Israel Herraiz's message of Tue Apr 22 21:08:00 -0400 2008:

> Excerpts from William Morgan's message of Wed Apr 23 01:18:00 +0200 2008:
> > A gold star to anyone who makes a nice wiki page out of this.
>
> http://sup.rubyforge.org/wiki/wiki.pl?UTF8
>
> Please note that I have tested only the method included in your
> previous message (using the git sources). I have not tested the other
> method (using the gem). I would appreciate it if someone could go to
> the wiki page, follow the recipe and check that it is correct (or
> change whatever might be wrong).

I have just followed the Git instructions on the wiki, and some of my wide character weirdness has gone away. Thanks, both of you.
OS X likes to do its own thing.
---
 lib/sup.rb |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/lib/sup.rb b/lib/sup.rb
index c4d1dd5..afd030f 100644
--- a/lib/sup.rb
+++ b/lib/sup.rb
@@ -14,7 +14,7 @@ require 'curses'
 require 'dl/import'
 module LibC
   extend DL::Importable
-  dlload "libc.so.6"
+  dlload Config::CONFIG['arch'] =~ /darwin/ ? "libc.dylib" : "libc.so.6"
   extern "void setlocale(int, const char *)"
 end
 LibC.setlocale(6, "") # LC_ALL == 6
-- 
1.5.4.4
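The platform check in the patch can be sketched as a pair of small functions. This is an illustration, not part of the patch: `libc_name` and `lc_all_value` are hypothetical helpers, and `RbConfig::CONFIG` is used because the `Config` constant from the patch is no longer available in current Ruby. The sketch also assumes that the numeric value of LC_ALL differs between platforms (6 in glibc, 0 in the BSD-derived macOS headers), which would make the hard-coded 6 in the patch worth checking on OS X; please verify those values against the local locale.h.

```ruby
require "rbconfig"

# Hypothetical helper mirroring the patch: choose the C library file name
# from an architecture string such as RbConfig::CONFIG["arch"].
def libc_name(arch)
  arch =~ /darwin/ ? "libc.dylib" : "libc.so.6"
end

# Assumption to verify locally: LC_ALL is 6 in glibc's locale.h and 0 in
# the macOS (BSD-derived) locale.h, so the constant depends on the platform.
def lc_all_value(arch)
  arch =~ /darwin/ ? 0 : 6
end

arch = RbConfig::CONFIG["arch"]
puts "#{libc_name(arch)} LC_ALL=#{lc_all_value(arch)}"
```

Keeping the decision in plain functions of the architecture string makes both branches easy to exercise on a single machine.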