samba-bugs at samba.org
2020-Jun-04 16:22 UTC
[Bug 14401] New: unicode character conversion problem from MacOS to Linux despite iconv
https://bugzilla.samba.org/show_bug.cgi?id=14401
Bug ID: 14401
Summary: unicode character conversion problem from MacOS to
Linux despite iconv
Product: rsync
Version: 3.1.3
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: P5
Component: core
Assignee: wayne at opencoder.net
Reporter: quickhelp at gmail.com
QA Contact: rsync-qa at samba.org
Target Milestone: ---
// SOURCE (initiating rsync):
ProductName: Mac OS X
ProductVersion: 10.15.4
BuildVersion: 19E287
Homebrew rsync:
rsync version 3.1.3 protocol version 31
Copyright (C) 1996-2018 by Andrew Tridgell, Wayne Davison, and others.
Web site: http://rsync.samba.org/
Capabilities:
64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
append, ACLs, xattrs, iconv, symtimes, no prealloc, file-flags
// TARGET:
Debian Linux 10.4
Linux 4.19.0-9-amd64 #1 SMP Debian 4.19.118-2 (2020-04-29) x86_64 GNU/Linux
Debian rsync:
rsync version 3.1.3 protocol version 31
Copyright (C) 1996-2018 by Andrew Tridgell, Wayne Davison, and others.
Web site: http://rsync.samba.org/
Capabilities:
64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
append, ACLs, xattrs, iconv, symtimes, prealloc
rsync comes with ABSOLUTELY NO WARRANTY. This is free software, and you
are welcome to redistribute it under certain conditions. See the GNU
General Public Licence for details.
Problem:
2020/06/04 17:12:21 [12205] [sender] cannot convert filename: Users/me/Library/Mail/V7/59923E9C-ACCC-45B0-B179-4CD4EA4D87D5/Sent Messages.mbox/DEED0205-D544-48AF-BDB2-40C0E6D5380C/Data/5/3/5/1/Attachments/1535951/3/<F0><9F><9B><84> Danke! Ihre Buchung ist besta<CC><88>tigt: Dolly Waikiki.eml (Illegal byte sequence)
2020/06/04 17:12:21 [12205] IO error encountered -- skipping file deletion
ls on the filename shows:
--
You are receiving this mail because:
You are the QA Contact for the bug.
samba-bugs at samba.org
2020-Jun-04 17:38 UTC
[Bug 14401] unicode character conversion problem from MacOS to Linux despite iconv
https://bugzilla.samba.org/show_bug.cgi?id=14401
Wayne Davison <wayne at opencoder.net> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution|--- |WORKSFORME
--- Comment #1 from Wayne Davison <wayne at opencoder.net> ---
Macs use a weird utf-8-mac encoding that you need to make sure you're
specifying. If the iconv library complains about the encoding, then either the
encoding name is wrong or you have an invalid file that isn't named with the
specified encoding.
So, if you're running rsync on a mac and copying to/from a linux host, you
should be able to specify:
--iconv=utf-8-mac,utf-8
to specify the local & remote charset.
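As a rough illustration of what this conversion amounts to: utf-8-mac is, approximately, UTF-8 with names stored in decomposed form (close to Unicode NFD), while Linux tools usually expect the precomposed form (NFC). The Python sketch below approximates the utf-8-mac to utf-8 direction with `unicodedata.normalize`. It is a stand-in under that assumption, not what iconv or rsync actually execute, and Apple's decomposition is known to differ from strict NFD for some character ranges. The function name `mac_to_linux_name` is made up for this example.

```python
import unicodedata

def mac_to_linux_name(raw: bytes) -> bytes:
    """Approximate utf-8-mac -> utf-8: decode, compose to NFC, re-encode.

    Assumption: the input is valid UTF-8 in decomposed form. Invalid input
    raises UnicodeDecodeError, loosely analogous to iconv's
    "Illegal byte sequence" error.
    """
    text = raw.decode("utf-8")
    return unicodedata.normalize("NFC", text).encode("utf-8")

# "a" followed by U+0308 COMBINING DIAERESIS, as in the log line (<CC><88>)
decomposed = b"besta\xcc\x88tigt"
composed = mac_to_linux_name(decomposed)
print(composed)  # b'best\xc3\xa4tigt'
```

The two bytes `CC 88` after the `a` collapse into the single precomposed character U+00E4, encoded as `C3 A4`.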
samba-bugs at samba.org
2020-Jun-04 17:57 UTC
[Bug 14401] unicode character conversion problem from MacOS to Linux despite iconv
https://bugzilla.samba.org/show_bug.cgi?id=14401
--- Comment #2 from Tormen <quickhelp at gmail.com> ---
Hi @wayne.
I only now noticed that my comment got truncated at the "ls" part.
I had also provided the rsync command used:
'/usr/local/bin/rsync' -e 'ssh -p 53146 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null' -D --numeric-ids --links --hard-links --one-file-system --itemize-changes --times --recursive --perms --owner --group --stats --human-readable --partial --delete --iconv=utf-8-mac,utf-8 --compress --log-file '/Users/admin/.rsync-tmbackup/2020-06-04-195558.log' --exclude-from '/LINKS/etc/rsync-tmbackup.exclude-from.macado' -- '/System/Volumes/Data/' 'root@jolie:/jbackup/macado/System.Volumes.Data/2020-06-04-1955.50___FULL-BACKUP/'
As you can see, I already use(d) "--iconv=utf-8-mac,utf-8".
Since the file is an email stored by the macOS Mail app, its name should use the standard macOS encoding, nothing atypical for a Mac.
samba-bugs at samba.org
2020-Jun-04 17:58 UTC
[Bug 14401] unicode character conversion problem from MacOS to Linux despite iconv
https://bugzilla.samba.org/show_bug.cgi?id=14401
--- Comment #3 from Tormen <quickhelp at gmail.com> ---
Created attachment 16018
--> https://bugzilla.samba.org/attachment.cgi?id=16018&action=edit
File breaking the rsync --iconv=utf-8-mac,utf-8
samba-bugs at samba.org
2020-Jun-04 17:59 UTC
[Bug 14401] unicode character conversion problem from MacOS to Linux despite iconv
https://bugzilla.samba.org/show_bug.cgi?id=14401
Tormen <quickhelp at gmail.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|WORKSFORME |---
Status|RESOLVED |REOPENED
samba-bugs at samba.org
2020-Jun-04 18:00 UTC
[Bug 14401] unicode character conversion problem from MacOS to Linux despite iconv
https://bugzilla.samba.org/show_bug.cgi?id=14401
--- Comment #4 from Tormen <quickhelp at gmail.com> ---
Created attachment 16019
--> https://bugzilla.samba.org/attachment.cgi?id=16019&action=edit
File breaking the rsync --iconv=utf-8-mac,utf-8
samba-bugs at samba.org
2020-Jun-04 18:04 UTC
[Bug 14401] unicode character conversion problem from MacOS to Linux despite iconv
https://bugzilla.samba.org/show_bug.cgi?id=14401
--- Comment #5 from Tormen <quickhelp at gmail.com> ---
> If the iconv library complains about the encoding, then either the encoding
> name is wrong or you have an invalid file that isn't named with the
> specified encoding.
How can I "zoom in" here to verify whether the filename is encoded in utf-8-mac or not?
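One possible way to "zoom in", sketched in Python rather than with iconv itself: inspect the raw bytes of the name and check whether they are valid UTF-8, whether the text is already in decomposed form (used here as an approximation of utf-8-mac), and whether it contains characters above U+FFFF. The helper `diagnose_name` and its report keys are hypothetical names for this example, and the NFD test is only an approximation of what macOS's iconv accepts.

```python
import unicodedata

def diagnose_name(raw: bytes) -> dict:
    """Report basic encoding facts about a filename given as raw bytes."""
    report = {"valid_utf8": True, "is_nfd": False, "is_nfc": False, "non_bmp": []}
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not UTF-8 at all, so it cannot be utf-8-mac either.
        report["valid_utf8"] = False
        return report
    # A name is "in NFD" if normalizing it to NFD leaves it unchanged.
    report["is_nfd"] = unicodedata.normalize("NFD", text) == text
    report["is_nfc"] = unicodedata.normalize("NFC", text) == text
    # Characters above U+FFFF lie outside the Basic Multilingual Plane.
    report["non_bmp"] = [f"U+{ord(ch):04X}" for ch in text if ord(ch) > 0xFFFF]
    return report

# Last path component from the rsync log, bytes taken from the <XX> escapes
name = b"\xf0\x9f\x9b\x84 Danke! Ihre Buchung ist besta\xcc\x88tigt: Dolly Waikiki.eml"
print(diagnose_name(name))
```

For the name in the log this reports valid UTF-8 in decomposed form with one non-BMP character, U+1F6C4. On a real system the raw bytes could be obtained with `os.listdir(b".")`, which returns names as `bytes` when given a `bytes` path.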
samba-bugs at samba.org
2020-Jun-04 18:11 UTC
[Bug 14401] unicode character conversion problem from MacOS to Linux despite iconv
https://bugzilla.samba.org/show_bug.cgi?id=14401
--- Comment #6 from Tormen <quickhelp at gmail.com> ---
Created attachment 16020
--> https://bugzilla.samba.org/attachment.cgi?id=16020&action=edit
File breaking the rsync --iconv=utf-8-mac,utf-8 -- in Finder
samba-bugs at samba.org
2020-Jun-04 18:12 UTC
[Bug 14401] unicode character conversion problem from MacOS to Linux despite iconv
https://bugzilla.samba.org/show_bug.cgi?id=14401
Tormen <quickhelp at gmail.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #16018 description|File breaking the rsync --iconv=utf-8-mac,utf-8 |File breaking the rsync --iconv=utf-8-mac,utf-8 -- "ls" in "Terminal" App
samba-bugs at samba.org
2020-Jun-05 06:14 UTC
[Bug 14401] unicode character conversion problem from MacOS to Linux despite iconv
https://bugzilla.samba.org/show_bug.cgi?id=14401
--- Comment #7 from SATOH Fumiyasu <fumiyas at osstech.co.jp> ---
The `
samba-bugs at samba.org
2020-Jun-05 06:16 UTC
[Bug 14401] unicode character conversion problem from MacOS to Linux despite iconv
https://bugzilla.samba.org/show_bug.cgi?id=14401
--- Comment #8 from SATOH Fumiyasu <fumiyas at osstech.co.jp> ---
The U+1F6C4 (BAGGAGE CLAIM emoji, <F0><9F><9B><84> in UTF-8) is a Unicode character outside the Basic Multilingual Plane, so UTF-16 represents it as a surrogate pair, but the UTF-8-MAC encoding in macOS's iconv(3) does not support surrogate pairs.
Try compiling your rsync binary with my hacked GNU libiconv:
https://github.com/fumiyas/libiconv-utf8mac
... Bugzilla does not support surrogate pairs...
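To make the surrogate-pair remark concrete, here is a small Python sketch of how U+1F6C4 is encoded: four bytes in UTF-8 (the `<F0><9F><9B><84>` seen in the log) and two 16-bit surrogates in UTF-16. The helper `utf16_surrogates` is a hypothetical name for this example; it only reproduces the standard UTF-16 arithmetic and says nothing about how macOS's iconv is implemented internally.

```python
from typing import Tuple

def utf16_surrogates(code_point: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into its UTF-16 high/low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point does not need a surrogate pair")
    offset = code_point - 0x10000            # 20-bit value
    high = 0xD800 + (offset >> 10)           # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)          # bottom 10 bits
    return high, low

ch = "\U0001F6C4"
print(ch.encode("utf-8").hex(" "))       # f0 9f 9b 84
print(ch.encode("utf-16-be").hex(" "))   # d8 3d de c4
high, low = utf16_surrogates(ord(ch))
print(f"{high:04X} {low:04X}")           # D83D DEC4
```

Any converter that handles text as 16-bit units has to treat `D83D DEC4` as one character, which is where an implementation limited to the Basic Multilingual Plane would fail.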
samba-bugs at samba.org
2020-Jun-05 06:26 UTC
[Bug 14401] unicode character conversion problem from MacOS to Linux despite iconv
https://bugzilla.samba.org/show_bug.cgi?id=14401
--- Comment #9 from Wayne Davison <wayne at opencoder.net> ---
One other thing you could do when sending files to Linux is to not translate the
names. This is because Linux can create a filename with oddball character
sequences (unlike macOS), so it can store and retrieve the raw filenames just fine
for something like backup and restore. It would just cause various names to
display with "?" chars when listed on Linux, or displayed with escape sequences
when listed with `ls -b`:
\360\237\233\204\ Danke!\ Ihre\ Buchung\ ist\ best?tigt:\ Dolly\ Waikiki.eml
In any case, this sounds like an iconv issue, not an rsync issue.
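The escape sequences in that listing are simply the raw filename bytes written in octal. A minimal Python sketch that imitates this for illustration (the function `escape_like_ls_b` is hypothetical and deliberately simplified; the real `ls -b` depends on the locale and uses C-style escapes such as `\n` for some control characters):

```python
def escape_like_ls_b(raw: bytes) -> str:
    """Render filename bytes with backslash escapes, loosely like `ls -b`."""
    out = []
    for byte in raw:
        if byte == 0x20:
            out.append("\\ ")              # space becomes backslash-space
        elif byte == 0x5C:
            out.append("\\\\")             # a literal backslash is doubled
        elif 0x21 <= byte <= 0x7E:
            out.append(chr(byte))          # printable ASCII passes through
        else:
            out.append(f"\\{byte:03o}")    # everything else as 3-digit octal
    return "".join(out)

name = b"\xf0\x9f\x9b\x84 Danke!"
print(escape_like_ls_b(name))  # \360\237\233\204\ Danke!
```

The four octal groups `\360\237\233\204` are the UTF-8 bytes `F0 9F 9B 84` of the emoji, stored on the Linux side exactly as they were sent.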
samba-bugs at samba.org
2020-Jun-05 10:10 UTC
[Bug 14401] unicode character conversion problem from MacOS to Linux despite iconv
https://bugzilla.samba.org/show_bug.cgi?id=14401
--- Comment #10 from Tormen <quickhelp at gmail.com> ---
Thank you very much for your comments!
I agree that the problem I pointed out is then a limitation of iconv.
The comment about not converting was really helpful. I was not sure about leaving --iconv off, but it makes sense. I will go down this road and hope that I always have a Mac to restore to ;)
This ticket can now be closed, but I left it open, as I was not sure which status to pick in this case.
samba-bugs at samba.org
2020-Jun-06 22:15 UTC
[Bug 14401] unicode character conversion problem from MacOS to Linux despite iconv
https://bugzilla.samba.org/show_bug.cgi?id=14401
Wayne Davison <wayne at opencoder.net> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |WORKSFORME
Status|REOPENED |RESOLVED