samba-bugs@samba.org
2007-Oct-30 16:09 UTC
DO NOT REPLY [Bug 2790] Add support for converting filenames into different encodings
https://bugzilla.samba.org/show_bug.cgi?id=2790

cabo@tzi.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |cabo@tzi.org

------- Comment #9 from cabo@tzi.org 2007-10-30 11:09 CST -------
The current solution appears to be somewhat confused about what it is trying to solve.

There are three filename encodings: the one in the client fs, the transfer encoding, and the one in the server fs. The client needs to know client-fs and transfer; the server needs to know server-fs and transfer. Trying to mush any two of the three together leads to pain.

There are also three scenarios:

-- sane: common transfer encoding (UTF-8 in NFC). Server and client need to know local conventions; as in the current --iconv=., they probably can figure that out.

-- compatible: The server may not know about iconv, so the client has to do all the conversions. This is almost supported now, except that the client sends an iconv option to the server, which the server does not understand.

-- fast: if both sides have the same encoding, the whole thing should be skipped. This is also compatible (it is the way it works right now).

Because of compatibility, "sane" probably needs an option to switch it on. It may also need client-side and server-side overrides to help these two out if they can't guess or guess wrong.

Compatible also needs an option to switch it on, and parameters to control the conversion. It is by definition client-side only; the client needs to be told what the server needs (and also may need help in guessing its own encoding). (For symmetry, it is also conceivable to add a server-side compatible option as part of the ssh-options.)

Fast is the current (2.x) default and probably should stay the default for compatibility.

So I propose (names are descriptive, but not optimal yet):

--encoding-aware: switches on sane.
--client-encoding: supplies (overrides) the value for the client-side encoding for sane.
--server-encoding: supplies (overrides) the value for the server-side encoding for sane.
--transfer-encoding: overrides the transfer encoding (default: UTF-8 NFC).
--server-encoding-unaware: don't tell the server anything, but do everything on the client side.
--client-encoding-unaware: the inverse (if you want to do that).

Maybe combining --encoding-aware and --server-encoding-unaware into one --client-encoding-aware is better. Maybe combining --encoding-aware and --client-encoding-unaware into one --server-encoding-aware is better. In both cases, this is somewhat confusing, because you want to keep the sane transfer encoding unless you are in the compatible case.

The only switch that needs a single-character form is --encoding-aware, which should become part of finger memory like -a for most rsync users.

-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug, or are watching the QA contact.
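For readers following the thread, the three-encoding model in comment #9 (client fs, transfer, server fs) can be illustrated with a minimal Python sketch. This is not rsync code: the function names `to_wire`, `from_wire`, and `transfer` are hypothetical, and the choice of NFC-normalized UTF-8 as the transfer encoding simply follows the "sane" scenario described above. The "fast" scenario is modelled as an early return that leaves the bytes untouched.

```python
import unicodedata


def to_wire(name: bytes, local_encoding: str, wire_encoding: str = "utf-8") -> bytes:
    """Sender side: local filesystem bytes -> transfer encoding, normalized to NFC."""
    text = name.decode(local_encoding)
    return unicodedata.normalize("NFC", text).encode(wire_encoding)


def from_wire(name: bytes, local_encoding: str, wire_encoding: str = "utf-8") -> bytes:
    """Receiver side: transfer encoding -> local filesystem bytes."""
    return name.decode(wire_encoding).encode(local_encoding)


def transfer(name: bytes, client_encoding: str, server_encoding: str) -> bytes:
    """Hypothetical end-to-end path for one filename sent from client to server."""
    # "fast" scenario: identical encodings, so the bytes are passed through
    # unmodified (even if they are not valid in that encoding).
    if client_encoding.lower() == server_encoding.lower():
        return name
    # "sane" scenario: each side only knows its own encoding plus the wire encoding.
    return from_wire(to_wire(name, client_encoding), server_encoding)
```

Note that each helper takes only two encodings, which mirrors the point that neither side needs to know the other side's filesystem encoding.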
samba-bugs@samba.org
2007-Oct-30 22:15 UTC
DO NOT REPLY [Bug 2790] Add support for converting filenames into different encodings
https://bugzilla.samba.org/show_bug.cgi?id=2790

------- Comment #10 from matt@mattmccutchen.net 2007-10-30 17:15 CST -------
(In reply to comment #9)
> The current solution appears to be somewhat confused about what it is trying to
> solve.

Rather, you appear to be overcomplicating the problem.

> There are three filename encodings: the one in the client fs, the transfer
> encoding, the one in the server fs.
> Client needs to know client-fs and transfer, server needs to know server-fs and
> transfer.
> Trying to mush up any two of the three leads to pain.

Rsync isn't like MySQL, which tags every string value with its encoding, and I don't see why we would want to make it that way. Instead, the rsync sender and receiver each treat filenames as plain sequences of bytes, in accordance with the POSIX filesystem API on which rsync relies so heavily. --iconv merely allows you to make the sender and receiver byte sequences differ by an encoding conversion, because this is often useful.

> -- compatible: The server may not know about iconv. So the client has to do
> all the conversions. This is almost support now, except that the client sends
> an iconv option to the server that this does not understand.

This is the only thing you propose that rsync does not already support, and I think it is a natural addition to rsync. Currently, if iconv is enabled, each process converts strings from its local encoding to UTF-8 before sending them over the wire and converts strings from UTF-8 to its local encoding after reading them from the wire. Rsync should let the user specify another encoding in place of UTF-8.

Specifically, I propose two options to specify the conversion, if any, to be applied on each end: --iconv-client=CLIENT,WIRE and --iconv-server=WIRE,SERVER . (There's no reason rsync shouldn't allow the two values of WIRE to be different, although this would rarely be useful.) --iconv=CLIENT,SERVER then stands for --iconv-client=CLIENT,UTF-8 --iconv-server=UTF-8,SERVER . A "compatible" copy with a UTF-8 client and an ISO-8859-1 server could be achieved by --iconv-client=UTF-8,ISO-8859-1 .

> The only switch that needs a single-character form is --encoding-aware, which
> should get part of finger memory like -a for most rsync users.

I think --iconv=. or --encoding-aware is too special-purpose to "need" a single-character form in the main version of rsync. If you use it frequently, you can always define your own popt alias. This is what Wayne recommended for my favorite "sane" option, --chmod=ugo=rwX .
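The per-end conversion proposed in comment #10 can be sketched roughly as follows. This is an illustrative Python model only: --iconv-client and --iconv-server are options proposed in this thread, not necessarily implemented in any rsync release, and the names `EndConversion`, `expand_iconv`, `send`, and `receive` are hypothetical. An end with no conversion configured (represented here as `None`) passes filename bytes through unchanged, which models a server that knows nothing about iconv.

```python
from typing import NamedTuple, Optional, Tuple


class EndConversion(NamedTuple):
    """Encoding pair for one end of the connection (hypothetical model)."""
    local: str  # encoding used by that end's filesystem
    wire: str   # encoding used on the wire


def expand_iconv(spec: str, wire: str = "UTF-8") -> Tuple[EndConversion, EndConversion]:
    """Expand a 'CLIENT,SERVER' spec into per-end pairs with a shared wire encoding."""
    client, server = spec.split(",")
    return EndConversion(client, wire), EndConversion(server, wire)


def send(name: bytes, conv: Optional[EndConversion]) -> bytes:
    """Convert local bytes to wire bytes; an unaware end sends bytes as-is."""
    if conv is None:
        return name
    return name.decode(conv.local).encode(conv.wire)


def receive(name: bytes, conv: Optional[EndConversion]) -> bytes:
    """Convert wire bytes to local bytes; an unaware end stores bytes as-is."""
    if conv is None:
        return name
    return name.decode(conv.wire).encode(conv.local)


# "Compatible" copy from the comment: UTF-8 client, ISO-8859-1 server,
# with all conversion done on the client and none on the server.
client_only = EndConversion(local="UTF-8", wire="ISO-8859-1")
stored = receive(send("caf\u00e9".encode("utf-8"), client_only), None)
```

In this model the server never sees an encoding option at all, which is the property the "compatible" scenario asks for.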
samba-bugs@samba.org
2007-Oct-31 00:22 UTC
DO NOT REPLY [Bug 2790] Add support for converting filenames into different encodings
https://bugzilla.samba.org/show_bug.cgi?id=2790

------- Comment #11 from cabo@tzi.org 2007-10-30 19:22 CST -------
> Currently, if iconv is enabled, each
> process converts strings from its local encoding to UTF-8 before sending them
> over the wire and converts strings from UTF-8 to its local encoding after
> reading them from the wire.

I must admit I didn't get this at all from the documentation. (I assume when you say "UTF-8" you mean "UTF-8 NFC", which may be its conventional meaning in the Samba world; I don't know.)

> Specifically, I propose two options to specify the conversion, if any, to be
> applied on each end: --iconv-client=CLIENT,WIRE and --iconv-server=WIRE,SERVER

Sounds good to me. WIRE might default to "UTF-8" to make things even simpler for the most sane cases.

> I think --iconv=. or --encoding-aware is too special-purpose to "need" a
> single-character form in the main version of rsync. If you use it frequently,
> you can always define your own popt alias. This is what Wayne recommended for
> my favorite "sane" option, --chmod=ugo=rwX .

As I live in heterogeneous environments, I'm not so sure about that, but then there is always RSYNC_ICONV (which would need to be able to set the new options, too, hmm).

Thanks for the quick and sane reply!