Kai Peter Nacke via llvm-dev
2020-Jun-11 16:07 UTC
[llvm-dev] RFC: Adding support for the z/OS platform to LLVM and clang
Hubert Tong <hubert.reinterpretcast at gmail.com> wrote on 10.06.2020 23:51:54:

> On Wed, Jun 10, 2020 at 3:11 PM Kai Peter Nacke via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> 2) Add patches to Clang to allow EBCDIC and ASCII (ISO-8859-1) encoded
> input source files. This would be done at the file open time to allow the
> rest of Clang to operate as if the source was UTF-8 and so require no
> changes downstream. Feedback on this plan is welcome from the Clang
> community.
>
> Is there a statement that can be made with respect to accepting
> UTF-8 encoded source files in a z/OS hosted environment or is it
> implied that it works with no changes (and there are no changes that
> will break this functionality)?
>
> Also, would these changes enable the consumption of non-UTF-8
> encoded source files on Clang as hosted on other platforms?

The intention is to use the auto-conversion feature from the language environment. Currently, this platform feature does not handle conversions of multi-byte encodings, so at this time consumption of UTF-8 encoded source files is not possible.

For the same reason, this does not enable the consumption of non-UTF-8 encoded source files on other platforms.

Best regards,
Kai Nacke
IT Architect

IBM Deutschland GmbH
Chairman of the Supervisory Board: Sebastian Krause
Management: Gregor Pillen (Chairman), Agnes Heftberger, Norbert Janzen, Markus Koerner, Christian Noll, Nicole Reimer
Registered office: Ehningen / Court of registration: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940
Hubert Tong via llvm-dev
2020-Jun-11 20:53 UTC
[llvm-dev] RFC: Adding support for the z/OS platform to LLVM and clang
On Thu, Jun 11, 2020 at 12:07 PM Kai Peter Nacke <kai.nacke at de.ibm.com> wrote:

> Hubert Tong <hubert.reinterpretcast at gmail.com> wrote on 10.06.2020
> 23:51:54:
>
> > On Wed, Jun 10, 2020 at 3:11 PM Kai Peter Nacke via llvm-dev
> > <llvm-dev at lists.llvm.org> wrote:
> > 2) Add patches to Clang to allow EBCDIC and ASCII (ISO-8859-1) encoded
> > input source files. This would be done at the file open time to allow the
> > rest of Clang to operate as if the source was UTF-8 and so require no
> > changes downstream. Feedback on this plan is welcome from the Clang
> > community.
> >
> > Is there a statement that can be made with respect to accepting
> > UTF-8 encoded source files in a z/OS hosted environment or is it
> > implied that it works with no changes (and there are no changes that
> > will break this functionality)?
> >
> > Also, would these changes enable the consumption of non-UTF-8
> > encoded source files on Clang as hosted on other platforms?
>
> The intention is to use the auto-conversion feature from the
> language environment. Currently, this platform feature does not
> handle conversions of multi-byte encodings, so at this time
> consumption of UTF-8 encoded source files is not possible.

If the internal representation is still UTF-8, consuming UTF-8 should involve not converting. It is sounding like the internal representation has been changed to ISO-8859-1 in order to support characters outside those in US-ASCII. If it is indeed internally fixed to ISO-8859-1, then the question of future support for non-Latin (e.g., Greek or Cyrillic) scripts arises. It may be a better tradeoff to leave the internal representation as UTF-8 and restrict the support to the US-ASCII subset for now.

> For the same reason, this does not enable the consumption of
> non-UTF-8 encoded source files on other platforms.

Thanks Kai for clarifying. I think this direction leads to some questions around testing.

The auto-conversion feature makes use of some filesystem-specific features such as filetags that indicate the associated coded character set. In terms of the testing environment on a z/OS system under USS, will there be documentation or scripts available for establishing the necessary file properties on the local tree? It also sounds like there would be some tests that are specific to z/OS-hosted builds that test the conversion facilities.

Also, if the platform feature does not handle conversions of multi-byte encodings, I am wondering if alternative mechanisms (such as iconv) have been investigated. I suppose there is an issue over how source positions are determined; however, I do not see how an extension of the autoconversion facility would avoid the said issue.
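As an illustration of the tradeoff suggested above (UTF-8 as the internal representation, with input restricted to the US-ASCII subset), here is a minimal Python sketch. It is not Clang code and not the z/OS auto-conversion service. The function name `to_internal_utf8` is hypothetical, and `cp037` is used only because that EBCDIC codec ships with Python's standard library; z/OS installations commonly use IBM-1047, which differs from it in a few code points.

```python
def to_internal_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Convert single-byte-encoded source text to UTF-8.

    Hypothetical helper: accepts only characters in the US-ASCII subset,
    so the converted buffer has the same length as the input.
    """
    text = raw.decode(source_encoding)
    for index, ch in enumerate(text):
        if ord(ch) > 0x7F:
            raise ValueError(
                f"character {ch!r} at offset {index} is outside US-ASCII"
            )
    # Every character is US-ASCII, so each one encodes to exactly one byte.
    return text.encode("utf-8")


# Assumption: cp037 stands in for the EBCDIC code page of the source file.
ebcdic_source = "int main() { return 0; }\n".encode("cp037")
converted = to_internal_utf8(ebcdic_source, "cp037")
```

Because the accepted characters all occupy one byte on both sides, byte offsets in the converted buffer match those in the original file. That property is lost as soon as characters outside US-ASCII are admitted.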
Kai Peter Nacke via llvm-dev
2020-Jun-16 13:16 UTC
[llvm-dev] RFC: Adding support for the z/OS platform to LLVM and clang
Hubert Tong <hubert.reinterpretcast at gmail.com> wrote on 11.06.2020 22:53:14:

> The intention is to use the auto-conversion feature from the
> language environment. Currently, this platform feature does not
> handle conversions of multi-byte encodings, so at this time
> consumption of UTF-8 encoded source files is not possible.
>
> If the internal representation is still UTF-8, consuming UTF-8
> should involve not converting. It is sounding like the internal
> representation has been changed to ISO-8859-1 in order to support
> characters outside those in US-ASCII. If it is indeed internally
> fixed to ISO-8859-1, then the question of future support for
> non-Latin (e.g., Greek or Cyrillic) scripts arises. It may be a better
> tradeoff to leave the internal representation as UTF-8 and restrict
> the support to the US-ASCII subset for now.

The intention is to initially restrict the support to the US-ASCII subset. This enables compiling EBCDIC-encoded files and does not exclude further development for true UTF-8 support.

> For the same reason, this does not enable the consumption of
> non-UTF-8 encoded source files on other platforms.

Yes, because a platform-specific feature is used, it does not enable reading of non-UTF-8 encoded files on other platforms.

> Thanks Kai for clarifying. I think this direction leads to some
> questions around testing.
>
> The auto-conversion feature makes use of some filesystem-specific
> features such as filetags that indicate the associated coded
> character set. In terms of the testing environment on a z/OS system
> under USS, will there be documentation or scripts available for
> establishing the necessary file properties on the local tree? It
> also sounds like there would be some tests that are specific to
> z/OS-hosted builds that test the conversion facilities.

With a git clone under z/OS USS, the files get automatically tagged as Latin-1, requiring no further setup. We also have some tests which test the text conversion. Of course, these only run on z/OS USS due to the use of the conversion service.

> Also, if the platform feature does not handle conversions of
> multi-byte encodings, I am wondering if alternative mechanisms (such as
> iconv) have been investigated. I suppose there is an issue over how
> source positions are determined; however, I do not see how an
> extension of the autoconversion facility would avoid the said issue.

We have not yet investigated alternative mechanisms for converting file data. The first striking complexity is where to do the conversion. With the source locations identified, other conversion approaches are imaginable. Of course, converting on the fly poses some challenges, like the one you mentioned.

Best regards,
Kai Nacke
IT Architect
IBM Deutschland GmbH
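The source-position issue raised in this exchange can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not a description of how Clang or the z/OS conversion service works: the helper `convert_with_offsets` is hypothetical, and the input is assumed to be in a single-byte encoding (Latin-1 here), so the original byte offset of a character equals its index.

```python
def convert_with_offsets(raw: bytes, source_encoding: str):
    """Convert to UTF-8 and record where each output byte came from.

    Returns (utf8_bytes, offsets), where offsets[i] is the byte offset in
    the original file of the character that produced UTF-8 byte i.
    Assumption: source_encoding is a single-byte encoding.
    """
    text = raw.decode(source_encoding)
    if len(text) != len(raw):
        raise ValueError("expected a single-byte source encoding")
    out = bytearray()
    offsets = []
    for original_offset, ch in enumerate(text):
        encoded = ch.encode("utf-8")
        out += encoded
        # A non-ASCII character expands to several UTF-8 bytes; all of
        # them map back to the same original byte.
        offsets.extend([original_offset] * len(encoded))
    return bytes(out), offsets


latin1_source = 'char c = "\u00e9"; // x\n'.encode("latin-1")
utf8, offsets = convert_with_offsets(latin1_source, "latin-1")
```

In this input the accented character occupies one byte in Latin-1 and two bytes in UTF-8, so every position after it is shifted by one in the converted buffer. A converter that works on the fly would need a table like `offsets`, or an equivalent, to report positions in terms of the original file.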