thr3ads.net - sup talk - [sup-talk] System encoding versus messages encoding [Apr 2008]

Israel Herraiz

2008-Apr-21 20:37 UTC

[sup-talk] System encoding versus messages encoding

Hi there,

I have been using Sup for a couple of days, and I have found some
problems with the encoding of messages written in ISO-8859-1(5).

The encoding in my system is en_US.UTF-8, but most of the email that I
receive is in Spanish, and usually encoded with ISO-8859-1 (and
sometimes with ISO-8859-15).

When I start Sup, it detects my UTF-8 locale and tries to decode the
messages with that encoding, which results in hardly readable messages
(most of the sentences where wide characters appear are truncated).

If I start it with "LANG=en_US.ISO-8859-1 sup", it detects the new
encoding and the messages are more readable, but I cannot see the
right characters because my terminal encoding is UTF-8 (at least
the sentences are not truncated).

With Mutt (and other email clients), I still use UTF-8 as my system
encoding, and I see ISO-8859-1 messages correctly. The encoding of the
messages is of course included in the headers.

As far as I know (considering what I have read in the documentation
and in the archives of this mailing list), Sup determines the encoding
using the environment variables and tries to decode all the messages
using that encoding. For people working in different languages and
environments (like me: I write in English and Spanish, and some people
send me messages in UTF-8, others in ISO-8859-1), having a single
encoding for all messages is not a good solution.

Would it be possible to decode each message according to its headers?

Please correct me if I am wrong in any of my assumptions on how Sup
encodes/decodes messages.

By the way, I am using the Git version of Sup (as of today :-).

Cheers,
Israel

Shot (Piotr Szotkowski)

2008-Apr-22 11:39 UTC

[sup-talk] System encoding versus messages encoding

Israel Herraiz:
> As far as I know (considering what I have read in the documentation
> and in the archives of this mailing list), Sup determines the encoding
> using the environment variables and tries to decode all the messages
> using that encoding. For people working in different languages and
> environments (like me, I write in English and Spanish, some people
> send me messages in UTF-8, some other in ISO-8859-1), having an
> overall encoding for all the messages is not a good solution.
I agree wholeheartedly - I tried to use Sup a couple of weeks ago, but
this issue made it unusable for me (I planned to hack on this some day,
but my work and uni obligations do not leave any free time lately). :(

To make matters worse, the relevant RFC actually requires emails to
be encoded in the "tightest" encoding possible - when I'm writing an
email in Polish without any non-US-ASCII letters, it should be sent
as US-ASCII; if I include Polish diacritical characters, it should be
encoded as ISO-8859-2, but if I add some characters outside of it (say,
ellipsis) it should be sent in UTF-8.
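
The "tightest encoding" rule described above can be sketched in Ruby as follows. This is only an illustration, not code from Sup or any mail client: `tightest_charset` is a hypothetical helper, and the candidate list (US-ASCII, ISO-8859-2, UTF-8) is an assumption taken from the Polish example in the message.

```ruby
# Hypothetical helper: return the first (narrowest) charset in the list
# that can represent every character of the text.
CANDIDATES = [Encoding::US_ASCII, Encoding::ISO_8859_2, Encoding::UTF_8].freeze

def tightest_charset(text, candidates = CANDIDATES)
  utf8 = text.encode(Encoding::UTF_8)
  candidates.find do |enc|
    begin
      utf8.encode(enc) # raises if a character has no mapping in enc
      true
    rescue EncodingError
      false
    end
  end
end

puts tightest_charset("Zolw").name                         # US-ASCII
puts tightest_charset("\u017B\u00F3\u0142w").name          # ISO-8859-2 (Polish diacritics)
puts tightest_charset("\u017B\u00F3\u0142w\u2026").name    # UTF-8 (ellipsis added)
```

A receiving client therefore cannot assume one charset per correspondent, let alone one per mailbox: the same sender may legitimately produce all three.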

As a result, I regularly receive emails in US-ASCII, ISO-8859-2 and
UTF-8 - but also in ISO-8859-1, ISO-8859-15, as well as some Cyrillic
encodings. I remember Mutt going through some growing pains to
accommodate this, but it has it all sorted out now (there is an issue
of recoding the email one replies to into the encoding expected by the
editor, for example).

I'll be more than happy to test any work done in this regard and I offer
any knowledge that could be useful; unfortunately, I can't promise any
hacking time (I just got accepted for this year's Summer of Code to hack
on CiviCRM internationalisation - maybe Sup could apply to be a project
in next year's SoC edition?).

-- Shot
-- 
I grew up in Europe, where the history comes from.   -- Eddie Izzard

William Morgan

2008-Apr-22 23:18 UTC

[sup-talk] System encoding versus messages encoding

Reformatted excerpts from Israel Herraiz's message of 2008-04-21:
> As far as I know (considering what I have read in the documentation
> and in the archives of this mailing list), Sup determines the encoding
> using the environment variables and tries to decode all the messages
> using that encoding.
This is not correct. Sup DOES determine the message's (or MIME
component's) charset, and transcodes it to your terminal charset using
the iconv library before display. (After decoding quoted-printable,
etc.)
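
A minimal sketch of that idea, for readers who want to see it concretely. It is not Sup's code: `decode_body` is a hypothetical helper, and it uses Ruby's built-in `String#encode` in place of the iconv library mentioned above (Iconv is no longer part of Ruby's standard library). It assumes the body has already been decoded from quoted-printable or base64 into raw bytes.

```ruby
# Hypothetical helper: read the charset parameter from a Content-Type
# header value and transcode the raw body bytes to the target charset.
def decode_body(raw_bytes, content_type, target = Encoding::UTF_8)
  # US-ASCII is the MIME default when no charset parameter is present.
  charset = content_type[/charset\s*=\s*"?([^";\s]+)"?/i, 1] || "US-ASCII"
  source = begin
    Encoding.find(charset)
  rescue ArgumentError
    Encoding::BINARY # unknown charset label: treat the body as raw bytes
  end
  # Bytes that cannot be converted become a replacement character
  # instead of raising, so a bad message never breaks the display.
  raw_bytes.dup.force_encoding(source)
           .encode(target, invalid: :replace, undef: :replace)
end

latin1 = "Espa\xF1a".b       # ISO-8859-1 bytes
utf8   = "Espa\xC3\xB1a".b   # the same word as UTF-8 bytes
puts decode_body(latin1, 'text/plain; charset="ISO-8859-1"')
puts decode_body(utf8, "text/plain; charset=utf-8")
```

Both calls yield the same UTF-8 string, which is the point: the charset comes from each message's own headers, while the locale only decides the target.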

The problem is actually that the Ruby ncurses gem is not wide-character
aware, so dumping UTF8 to the terminal doesn't actually work. Well, it
actually does seem to work for some characters, but not most of them.

The solution has been known for quite some time, but it ain't pretty:
http://rubyforge.org/pipermail/sup-talk/2007-October/000297.html

The good news is that I've just made it slightly simpler, at least if
you're running from git. I've published an "ncursesw" branch that
contains a hacked ncurses-0.9.1 and a dirty script to install it into
your ../lib/ directory. If you use that AND you run from git next,
you'll see wide characters. It works!

So, just a "few" "simple" commands:
$ git branch --track ncursesw origin/ncursesw
$ git checkout ncursesw
$ cd ncurses-0.9.1/
$ ./run-this-for-sup.sh 
$ cd ..
$ git checkout next
$ ruby -Ilib bin/sup

... and you should see wide characters, assuming your terminal is
capable. If make dies, you probably need to install some kind of
ncursesw development library. On my Debian system it's
a package called libncursesw5-dev.

A gold star to anyone who makes a nice wiki page out of this.

-- 
William <wmorgan-sup at masanjin.net>

Israel Herraiz

2008-Apr-23 00:09 UTC

[sup-talk] System encoding versus messages encoding

Excerpts from William Morgan's message of Wed Apr 23 01:18:00 +0200 2008:
> The good news is that I've just made it slightly simpler, at least if
> you're running from git. I've published an "ncursesw" branch that
> contains a hacked ncurses-0.9.1 and a dirty script to install it into
> your ../lib/ directory. If you use that AND you run from git next,
> you'll see wide characters. It works!
Great. I have obtained the new changes from the git repository, and
applied the recipe that you give, and yes, it works! Thanks!

I have noticed only one other odd thing: when I press Enter to go from
inbox-mode to thread-mode, some characters of the inbox view
"persist" on the screen. I will file a bug if you consider it
necessary.
> A gold star to anyone who makes a nice wiki page out of this.
I guess it is my turn :-). I will add a page about UTF-8 with a
summary of this message and the message that you link.

Thanks again for your help.

Cheers,
Israel

Israel Herraiz

2008-Apr-23 01:08 UTC

[sup-talk] System encoding versus messages encoding

Excerpts from William Morgan's message of Wed Apr 23 01:18:00 +0200 2008:
> A gold star to anyone who makes a nice wiki page out of this.
I have added a summary of the content of the two messages (the parent
message of this one, and the one you link with patches for the
gem). It is available at:

http://sup.rubyforge.org/wiki/wiki.pl?UTF8

I have added a link to this page in the main page of the wiki.

Please note that I have tested only the method included in your
previous message (using the git sources). I have not tested the other
method (using the gem). I would appreciate it if someone could go to the
wiki page, follow the recipe and check that it is correct (or change
whatever might be wrong).

Cheers,
Israel

William Morgan

2008-Apr-23 01:53 UTC

[sup-talk] System encoding versus messages encoding

Reformatted excerpts from Israel Herraiz's message of 2008-04-22:
> http://sup.rubyforge.org/wiki/wiki.pl?UTF8
Thanks! Very thorough.
> Please note that I have tested only the method included in your
> previous message (using the git sources). I have not tested the other
> method (using the gem). I would appreciate it if someone could go to the
> wiki page, follow the recipe and check that it is correct (or change
> whatever might be wrong).
I gave it a little tweak.

-- 
William <wmorgan-sup at masanjin.net>

William Morgan

2008-Apr-23 02:03 UTC

[sup-talk] System encoding versus messages encoding

Reformatted excerpts from Israel Herraiz's message of 2008-04-22:
> I have noticed only one other odd thing: when I press Enter to go from
> inbox-mode to thread-mode, some characters of the inbox view
> "persist" on the screen.
Yeah, I see this too. Looks like a string length issue, actually.
I'll look into it; probably an easy fix.
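
The message does not say what the string length issue was, so the following is only one plausible reading, sketched under that assumption: if a line is padded to the screen width using its byte count rather than its character count, a line containing multi-byte UTF-8 characters comes out too short and leaves stale characters from the previous view. `pad_to_width` is a hypothetical helper, not Sup's code.

```ruby
# Hypothetical helper: pad a line with spaces up to the screen width so it
# overwrites whatever was drawn there before. `measure` selects how the
# current length is counted (:bytesize or :length).
def pad_to_width(line, width, measure)
  used = line.public_send(measure)
  line + " " * [width - used, 0].max
end

subject  = "Espa\u00F1a"                       # 6 characters, 7 bytes in UTF-8
by_bytes = pad_to_width(subject, 10, :bytesize)
by_chars = pad_to_width(subject, 10, :length)

puts by_bytes.length   # 9: one column short, so an old character survives
puts by_chars.length   # 10: the whole row is overwritten
```

Counting characters is still not the full story for double-width characters (for example CJK), where the display width differs from the character count as well.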

-- 
William <wmorgan-sup at masanjin.net>

Marc Hartstein

2008-Apr-24 19:36 UTC

[sup-talk] System encoding versus messages encoding

Excerpts from Israel Herraiz's message of Tue Apr 22 21:08:00 -0400 2008:
> Excerpts from William Morgan's message of Wed Apr 23 01:18:00 +0200 2008:
> > A gold star to anyone who makes a nice wiki page out of this.
> 
> http://sup.rubyforge.org/wiki/wiki.pl?UTF8
> 
> Please note that I have tested only the method included in your
> previous message (using the git sources). I have not tested the other
> method (using the gem). I would appreciate it if someone could go to the
> wiki page, follow the recipe and check that it is correct (or change
> whatever might be wrong).
Have just followed the Git instructions on the wiki, and some of my wide
character weirdness has gone away.

Thanks, both of you.

Grant Hollingworth

2008-Apr-24 23:10 UTC

[sup-talk] [PATCH] fixed dlopen of libc for os x

OS X likes to do its own thing.

---
 lib/sup.rb |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/lib/sup.rb b/lib/sup.rb
index c4d1dd5..afd030f 100644
--- a/lib/sup.rb
+++ b/lib/sup.rb
@@ -14,7 +14,7 @@ require 'curses'
 require 'dl/import'
 module LibC
   extend DL::Importable
-  dlload "libc.so.6"
+  dlload Config::CONFIG['arch'] =~ /darwin/ ? "libc.dylib" : "libc.so.6"
   extern "void setlocale(int, const char *)"
 end
 LibC.setlocale(6, "")  # LC_ALL == 6
-- 
1.5.4.4
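
The platform test in the patch can be tried in isolation with a small sketch like the one below. `libc_name` is a hypothetical helper introduced only for illustration; it uses `RbConfig`, the current name of the `Config` constant that the 2008 patch refers to, and it does not load the library itself.

```ruby
require "rbconfig"

# Hypothetical helper mirroring the patch's check: pick the C library
# file name from the architecture string of the running Ruby.
def libc_name(arch = RbConfig::CONFIG["arch"])
  arch =~ /darwin/ ? "libc.dylib" : "libc.so.6"
end

puts libc_name("x86_64-linux")     # libc.so.6
puts libc_name("arm64-darwin22")   # libc.dylib
puts libc_name                     # depends on the machine running this
```

Passing the architecture as a parameter keeps the choice testable on any machine; note that the fallback branch still assumes a glibc-style name for every non-Darwin platform, as the patch does.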
