Corentin via llvm-dev
2020-Jun-11 20:13 UTC
[llvm-dev] RFC: Adding support for the z/OS platform to LLVM and clang
Hello.

> 2) Add patches to Clang to allow EBCDIC and ASCII (ISO-8859-1) encoded
> input source files. This would be done at the file open time to allow
> the rest of Clang to operate as if the source was UTF-8 and so require
> no changes downstream. Feedback on this plan is welcome from the Clang
> community.

Would it be correct to assume that this EBCDIC -> UTF-8 mapping would be as prescribed by UTF-EBCDIC / IBM CDRA, notably for the control characters that do not map exactly? Notably, if the execution encoding is EBCDIC, is '0x06' equivalent to '0086', etc?

The question "Is Unicode sufficient to represent all characters present in the input source without using the Private Use Area?" is one that is relevant to both Clang and the C/C++ standard. (I do hope that it is the case!)

Thanks,
Corentin
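To make the control-character question concrete, here is a minimal sketch that decodes every EBCDIC byte and checks where it lands in Unicode. It assumes Python's built-in `cp037` codec as a stand-in; the code page actually used on z/OS (commonly IBM-1047) is not named in this thread and may differ in some positions. The helper names are hypothetical.

```python
# Assumption: Python's built-in "cp037" codec stands in for the EBCDIC
# code page used on z/OS; the thread does not say which one is meant.
EBCDIC_CODEC = "cp037"


def ebcdic_byte_to_code_point(value: int) -> int:
    """Decode a single EBCDIC byte and return its Unicode code point."""
    return ord(bytes([value]).decode(EBCDIC_CODEC))


def in_private_use_area(code_point: int) -> bool:
    """True if the code point lies in the BMP Private Use Area."""
    return 0xE000 <= code_point <= 0xF8FF


all_bytes = bytes(range(256))
decoded = all_bytes.decode(EBCDIC_CODEC)

# The byte asked about in the message above.
print(f"EBCDIC 0x06 -> U+{ebcdic_byte_to_code_point(0x06):04X}")

# Does any byte need the Private Use Area under this codec?
uses_pua = any(in_private_use_area(ord(ch)) for ch in decoded)
print("uses Private Use Area:", uses_pua)

# Does every byte survive decoding and re-encoding unchanged?
round_trips = decoded.encode(EBCDIC_CODEC) == all_bytes
print("all 256 bytes round-trip:", round_trips)
```

Under this particular codec, byte 0x06 decodes to the C1 control U+0086. Whether Clang's conversion would follow the same table is exactly the open question.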
Kai Peter Nacke via llvm-dev
2020-Jun-16 12:50 UTC
[llvm-dev] RFC: Adding support for the z/OS platform to LLVM and clang
> > 2) Add patches to Clang to allow EBCDIC and ASCII (ISO-8859-1) encoded
> > input source files. This would be done at the file open time to allow
> > the rest of Clang to operate as if the source was UTF-8 and so require
> > no changes downstream. Feedback on this plan is welcome from the Clang
> > community.
>
> Would it be correct to assume that this EBCDIC -> UTF-8 mapping
> would be as prescribed by UTF-EBCDIC / IBM CDRA, notably for the
> control characters that do not map exactly?
> Notably, if the execution encoding is EBCDIC, is '0x06' equivalent
> to '0086', etc?
>
> The question "Is Unicode sufficient to represent all characters
> present in the input source without using the Private Use Area?" is
> one that is relevant to both Clang and the C/C++ standard. (I do hope
> that it is the case!)

The current goal is to make only minimal changes to the frontend to enable reading of EBCDIC-encoded files. For this, we use the auto-conversion service of z/OS UNIX System Services ( https://www.ibm.com/support/knowledgecenter/SSLTBW_2.4.0/com.ibm.zos.v2r4.bpxb200/xpascii.htm ), together with file tagging and setting the CCSID for the program and for opened files. The auto-conversion service supports round-trip conversion between EBCDIC and Enhanced ASCII. With it, bootstrapping with EBCDIC source files is possible.

Of course, more complete UTF-8 support is a valid implementation alternative.

Best regards,
Kai Nacke
IT Architect
IBM Deutschland GmbH
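As a rough user-space analogy for converting at file open time, the sketch below writes a small C source file in EBCDIC and decodes it on read, so that everything downstream sees UTF-8. This is not the z/OS auto-conversion service, which works at the operating-system level from file tags and CCSIDs. The function `read_source_as_utf8` is hypothetical, and Python's `cp037` codec is an assumed stand-in for the real code page.

```python
import os
import tempfile


def read_source_as_utf8(path: str, source_encoding: str) -> bytes:
    """Read a file in `source_encoding` and return its contents as UTF-8.

    Hypothetical helper: the caller passes the encoding explicitly, where
    z/OS would take it from the file tag.
    """
    with open(path, "rb") as handle:
        raw = handle.read()
    return raw.decode(source_encoding).encode("utf-8")


source = "int main(void) { return 0; }\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "hello.c")

    # Store the file as EBCDIC (assumed code page: cp037).
    with open(path, "wb") as handle:
        handle.write(source.encode("cp037"))

    # Convert once, at open time; later stages only ever see UTF-8.
    utf8_bytes = read_source_as_utf8(path, "cp037")

print(utf8_bytes.decode("utf-8"), end="")
```

The design point is that the conversion happens in one place, so code after the read does not need to know the file was ever EBCDIC.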