Hi,

I tried rsync 3.0.0pre8 on my Mac running OS X 10.5.

I was very pleased with the --iconv feature, as I have to sync several Linux machines and had real trouble with some filenames. But I found one strange thing in connection with the Mac.

First of all, the translation between ISO-8859-15 on Linux and UTF-8 on the Mac works (nearly) perfectly.

As I live in Germany, we often have filenames containing special characters (umlauts such as ä, ö, ü). All of these filenames look perfect on my Mac.

But whenever I run rsync again, all the files with one of these special characters in the name are deleted and copied again. And there are quite a lot of them.

I found the reason for this behaviour. Let me explain it using the letter ä (&auml; in HTML) as an example. On Linux machines running UTF-8, the ä is encoded as $C3A4, which is the UTF-8 encoding of the character E4. The ä thus occupies 2 bytes.

I was very astonished when I copied a Mac filename, pasted it into a text editor and looked at the file:

In the Mac filename the letter ä is encoded as $61CC88, which in UTF-8 means the letter "a" followed by $0308 (from the Combining Diacritical Marks block). So the Mac combines the letter a with the two dots above it instead of using the single character E4.

Now things are clear: the filenames are different, even though they look identical.

A question to the developers: do you see any solution to this problem? Perhaps a --iconv=utf8mac,iso885915?

Rudolf E. Reiber
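The byte values quoted in this message can be reproduced with a short Python sketch. It uses only the standard unicodedata module; the variable names are illustrative and not part of rsync.

```python
import unicodedata

# "ä" as a single precomposed code point (U+00E4), the form the Linux side uses.
composed = "\u00e4"

# The canonically decomposed form: "a" followed by U+0308 (combining diaeresis).
decomposed = unicodedata.normalize("NFD", composed)

composed_utf8 = composed.encode("utf-8").hex()      # c3a4
decomposed_utf8 = decomposed.encode("utf-8").hex()  # 61cc88
latin9 = composed.encode("iso-8859-15").hex()       # e4

print(composed_utf8, decomposed_utf8, latin9)

# The two strings render identically but do not compare equal.
print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

The last two lines show the core of the problem: a comparison of the raw names fails, while a comparison after normalizing both sides to the same form succeeds.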
On Wed, 2008-01-23 at 16:01 +0100, Rudolf E. Reiber wrote:
> I tried rsync 3.0.0pre8 on my Mac running OS X 10.5.
>
> I was very pleased with the --iconv feature, as I have to sync several
> Linux machines and had real trouble with some filenames.
> But I found one strange thing in connection with the Mac.
>
> First of all, the translation between ISO-8859-15 on Linux and
> UTF-8 on the Mac works (nearly) perfectly.
>
> As I live in Germany, we often have filenames containing special
> characters (umlauts such as ä, ö, ü).
> All of these filenames look perfect on my Mac.
>
> But whenever I run rsync again, all the files with one of these
> special characters in the name are deleted and copied again.
> And there are quite a lot of them.
>
> I found the reason for this behaviour.
> Let me explain it using the letter ä (&auml; in HTML) as an example.
> On Linux machines running UTF-8, the ä is encoded as $C3A4, which is
> the UTF-8 encoding of the character E4. The ä thus occupies 2 bytes.
>
> I was very astonished when I copied a Mac filename, pasted it into a
> text editor and looked at the file:
>
> In the Mac filename the letter ä is encoded as $61CC88, which in UTF-8
> means the letter "a" followed by $0308 (Combining Diacritical Marks).
> So the Mac combines the letter a with the two dots above it instead of
> using the single character E4.
> Now things are clear: the filenames are different, even though they
> look identical.

Yup. The Mac HFS+ filesystem automatically decomposes Unicode characters in the stored versions of filenames, which confuses a number of programs, including rsync and git. A flamewar about whether to blame the problem on HFS+ or the application has been running on the git list for a week now.

> A question to the developers: do you see any solution to this problem?
> Perhaps a --iconv=utf8mac,iso885915?

Precisely. We need an iconv encoding name for "the form of UTF-8 that the Mac likes", and none of the existing encodings in the iconv on my computer fit the bill. Another option is to store the umlaut-named files on a filesystem other than HFS+ on the Mac.

Matt
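A rough sketch of why every run re-transfers the affected files: if the names on one side are composed and the names on the other side are decomposed, a byte-for-byte comparison of the two listings finds no match. The file names are made up for illustration, and plain NFD is used as a stand-in for the HFS+ behaviour, which is an assumption (HFS+ applies its own variant of canonical decomposition).

```python
import unicodedata

def nfc(name: str) -> str:
    """Return the name in composed (NFC) form."""
    return unicodedata.normalize("NFC", name)

# Hypothetical listing on the Linux side: composed characters.
linux_names = ["M\u00e4rz.txt", "readme.txt"]

# Hypothetical listing on the Mac side after a transfer: decomposed characters.
mac_names = [unicodedata.normalize("NFD", n) for n in linux_names]

# Comparing the raw names: the umlaut file looks like it is missing on the Mac.
missing_raw = set(linux_names) - set(mac_names)

# Comparing after normalizing both sides to NFC: nothing is missing.
missing_normalized = {nfc(n) for n in linux_names} - {nfc(n) for n in mac_names}

print(sorted(missing_raw))
print(sorted(missing_normalized))
```

An encoding name such as the utf8mac mentioned in the thread would let the conversion layer do this mapping instead of the comparison code.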
I see now that *Apple's* iconv does have the necessary "utf8mac" encoding. See:

http://code.google.com/p/macfuse/issues/detail?id=139
http://libiconv.darwinports.com/

So just pass --iconv=utf8mac,iso885915 when the Mac is sending and --iconv=iso885915,utf8mac when it is receiving, and the problem should go away.

Matt

On Thu, 2008-01-24 at 07:54 +0100, Rudolf E. Reiber wrote:
> Hi Matt,
> thanks for your answer.
> Are the developers working on this problem?
> So, can I wait for the solution and in the meantime have a little more
> traffic on the line?
>
> Rudolf
>
>
> On 24.01.2008 at 05:28, Matt McCutchen wrote:
>
> >> A question to the developers: do you see any solution to this
> >> problem?
> >> Perhaps a --iconv=utf8mac,iso885915?
> >
> > Precisely. We need an iconv encoding name for "the form of UTF-8 that
> > the Mac likes", and none of the existing encodings in the iconv on my
> > computer fit the bill. Another option is to store the umlaut-named files
> > on a filesystem other than HFS+ on the Mac.
> >
> > Matt
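As a sketch of how such an invocation might be assembled from a script: the rsync manual documents the option as --iconv=LOCAL,REMOTE, so which of the two orders above applies depends on which machine runs the command. The helper name, the -av flags, the paths and the host name below are all hypothetical and chosen only for illustration.

```python
import shlex

def rsync_command(local_charset: str, remote_charset: str,
                  src: str, dest: str) -> list[str]:
    """Build an rsync argument list with an --iconv=LOCAL,REMOTE option."""
    return [
        "rsync",
        "-av",
        f"--iconv={local_charset},{remote_charset}",
        src,
        dest,
    ]

# Hypothetical example: command run on the Mac, pushing to a Linux host.
cmd = rsync_command("utf8mac", "iso885915",
                    "Documents/", "user@linuxhost:backup/")

# Print the command instead of running it, so it can be inspected first.
print(shlex.join(cmd))
```

Passing the resulting list to subprocess.run would execute it; printing it first makes it easy to check the charset order before any files are touched.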
Please keep this on the rsync list.

On Thu, 2008-01-24 at 18:32 +0100, Rudolf E. Reiber wrote:
> I am sorry, but when I apply the --iconv=utf8mac,iso885915 option,
> rsync fails completely.
> [...] to compile some patches or do some other things in order to get
> utf8mac to work? Or is this feature built into rsync 3.0.0pre8?

Support for encodings such as utf8mac is determined by the Mac's libiconv, not by rsync. You need to install a libiconv that supports the utf8mac encoding (check that utf8mac is listed when you run "iconv --list") and then build rsync against that libiconv. It looks like your best bet is the MacPorts libiconv described here:

http://trac.macports.org/projects/macports/browser/trunk/dports/textproc/libiconv/Portfile

In fact, rsync 3.0.0pre8 itself appears to be available through MacPorts:

http://trac.macports.org/projects/macports/browser/trunk/dports/net/rsync-devel/Portfile

Matt

> On 24.01.2008 at 16:11, Matt McCutchen wrote:
>
> > I see now that *Apple's* iconv does have the necessary "utf8mac"
> > encoding. See:
> >
> > http://code.google.com/p/macfuse/issues/detail?id=139
> > http://libiconv.darwinports.com/
> >
> > So just pass --iconv=utf8mac,iso885915 when the Mac is sending and
> > --iconv=iso885915,utf8mac when it is receiving, and the problem should
> > go away.
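The check suggested in this message (whether utf8mac appears in the output of "iconv --list") could be scripted roughly as follows. The function names are hypothetical, and the parsing rests on an assumption about the output format: names separated by whitespace or commas, possibly with a trailing "//", with hyphens and letter case treated as insignificant so that spellings like UTF-8-MAC and utf8mac match.

```python
import subprocess

def supports_encoding(iconv_list_output: str, name: str) -> bool:
    """Check whether an encoding name appears in `iconv --list` output."""
    wanted = name.lower().replace("-", "")
    # Treat commas like whitespace, then split into individual names.
    tokens = iconv_list_output.replace(",", " ").split()
    return any(
        tok.rstrip("/").lower().replace("-", "") == wanted
        for tok in tokens
    )

def installed_iconv_supports(name: str, iconv_path: str = "iconv") -> bool:
    """Run the given iconv binary with --list and look for the encoding."""
    result = subprocess.run(
        [iconv_path, "--list"],
        capture_output=True, text=True, check=True,
    )
    return supports_encoding(result.stdout, name)

# Made-up sample output, used here instead of calling a real iconv binary.
sample = "ANSI_X3.4-1968 ASCII\nUTF-8 UTF8\nUTF-8-MAC UTF8-MAC\n"
has_utf8mac = supports_encoding(sample, "utf8mac")
print(has_utf8mac)
```

Calling installed_iconv_supports("utf8mac") runs the iconv found on the PATH; the iconv_path parameter lets the same check point at a different installation, such as one from MacPorts. Note that this only tests the iconv binary, while rsync uses whichever libiconv it was built against.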