thr3ads.net - llvm dev - [llvm-dev] RFC: Adding support for the z/OS platform to LLVM and clang [Jun 2020]

Kai Peter Nacke via llvm-dev

2020-Jun-11 16:07 UTC

[llvm-dev] RFC: Adding support for the z/OS platform to LLVM and clang

Hubert Tong <hubert.reinterpretcast at gmail.com> wrote on 10.06.2020 
23:51:54:
> From: Hubert Tong <hubert.reinterpretcast at gmail.com>
> To: Kai Peter Nacke <kai.nacke at de.ibm.com>
> Cc: llvm-dev <llvm-dev at lists.llvm.org>
> Date: 10.06.2020 23:52
> Subject: [EXTERNAL] Re: [llvm-dev] RFC: Adding support for the z/OS 
> platform to LLVM and clang
> 
> On Wed, Jun 10, 2020 at 3:11 PM Kai Peter Nacke via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> 2) Add patches to Clang to allow EBCDIC and ASCII (ISO-8859-1) encoded 
> input source files. This would be done at the file open time to allow the
> rest of Clang to operate as if the source was UTF-8 and so require no
> changes downstream. Feedback on this plan is welcome from the Clang 
> community.
> Is there a statement that can be made with respect to accepting 
> UTF-8 encoded source files in a z/OS hosted environment or is it 
> implied that it works with no changes (and there are no changes that
> will break this functionality)?
> 
> Also, would these changes enable the consumption of non-UTF-8 
> encoded source files on Clang as hosted on other platforms?
The intention is to use the auto-conversion feature from the
language environment. Currently, this platform feature does not
handle conversions of multi-byte encodings, so at this time
consumption of UTF-8 encoded source files is not possible.
For the same reason, this does not enable the consumption of
non-UTF-8 encoded source files on other platforms.
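As a rough illustration of the "convert at file open time" idea, here is a minimal sketch in Python rather than in Clang itself. It assumes Python's `cp037` codec as a stand-in for whatever EBCDIC code page the platform uses, and `to_utf8` is a hypothetical helper, not an LLVM or Language Environment API. It only covers single-byte code pages, which matches the limitation described above.

```python
# Sketch only: cp037 stands in for the platform's EBCDIC code page, and
# to_utf8 is a hypothetical helper, not part of Clang or the z/OS runtime.

def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a single-byte code page and re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# Simulate an EBCDIC-encoded source file as it would sit on disk.
ebcdic_source = "int main() { return 0; }".encode("cp037")

# Convert once, when the file is opened.
utf8_source = to_utf8(ebcdic_source, "cp037")

# Everything after this point sees ordinary UTF-8 (here, plain US-ASCII).
assert utf8_source == b"int main() { return 0; }"

# For US-ASCII characters the conversion keeps one byte per character.
assert len(utf8_source) == len(ebcdic_source)
```

The point of the sketch is that the rest of the pipeline never sees the original encoding; only the open step does.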

Best regards,
Kai Nacke
IT Architect

IBM Deutschland GmbH
Vorsitzender des Aufsichtsrats: Sebastian Krause
Geschäftsführung: Gregor Pillen (Vorsitzender), Agnes Heftberger, Norbert 
Janzen, Markus Koerner, Christian Noll, Nicole Reimer
Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, 
HRB 14562 / WEEE-Reg.-Nr. DE 99369940

Hubert Tong via llvm-dev

2020-Jun-11 20:53 UTC

On Thu, Jun 11, 2020 at 12:07 PM Kai Peter Nacke <kai.nacke at de.ibm.com>
wrote:
> Hubert Tong <hubert.reinterpretcast at gmail.com> wrote on 10.06.2020
> 23:51:54:
>
> > From: Hubert Tong <hubert.reinterpretcast at gmail.com>
> > To: Kai Peter Nacke <kai.nacke at de.ibm.com>
> > Cc: llvm-dev <llvm-dev at lists.llvm.org>
> > Date: 10.06.2020 23:52
> > Subject: [EXTERNAL] Re: [llvm-dev] RFC: Adding support for the z/OS
> > platform to LLVM and clang
> >
> > On Wed, Jun 10, 2020 at 3:11 PM Kai Peter Nacke via llvm-dev
> > <llvm-dev at lists.llvm.org> wrote:
> > 2) Add patches to Clang to allow EBCDIC and ASCII (ISO-8859-1) encoded
> > input source files. This would be done at the file open time to allow the
> > rest of Clang to operate as if the source was UTF-8 and so require no
> > changes downstream. Feedback on this plan is welcome from the Clang
> > community.
> > Is there a statement that can be made with respect to accepting
> > UTF-8 encoded source files in a z/OS hosted environment or is it
> > implied that it works with no changes (and there are no changes that
> > will break this functionality)?
> >
> > Also, would these changes enable the consumption of non-UTF-8
> > encoded source files on Clang as hosted on other platforms?
>
> The intention is to use the auto-conversion feature from the
> language environment. Currently, this platform feature does not
> handle conversions of multi-byte encodings, so at this time
> consumption of UTF-8 encoded source files is not possible.
>
If the internal representation is still UTF-8, consuming UTF-8 should
involve not converting. It is sounding like the internal representation has
been changed to ISO-8859-1 in order to support characters outside those in
US-ASCII. If it is indeed internally fixed to ISO-8859-1, then the question
of future support for non-Latin (e.g., Greek or Cyrillic) scripts arises.
It may be a better tradeoff to leave the internal representation as UTF-8
and restrict the support to the US-ASCII subset for now.
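To make the tradeoff concrete, the following sketch (Python, using only the standard codecs) shows why the US-ASCII subset is the safe common ground: US-ASCII text has identical bytes in ISO-8859-1 and UTF-8, Latin-1 letters outside US-ASCII do not, and Greek cannot be represented in ISO-8859-1 at all. The helper name `is_us_ascii` is made up for this example.

```python
def is_us_ascii(data: bytes) -> bool:
    """Return True if every byte is in the 7-bit US-ASCII range."""
    return all(byte < 0x80 for byte in data)


plain = "int x;"
accented = "caf\u00e9"  # contains a Latin-1 letter outside US-ASCII
greek = "\u03b1"        # Greek alpha, not representable in ISO-8859-1

# US-ASCII text has the same bytes in ISO-8859-1 and in UTF-8, so a
# UTF-8 internal representation needs no conversion for it.
assert plain.encode("latin-1") == plain.encode("utf-8")
assert is_us_ascii(plain.encode("utf-8"))

# Outside US-ASCII the two encodings diverge.
assert accented.encode("latin-1") != accented.encode("utf-8")
assert not is_us_ascii(accented.encode("latin-1"))

# An internal form fixed to ISO-8859-1 cannot hold Greek at all.
try:
    greek.encode("latin-1")
    greek_fits_latin1 = True
except UnicodeEncodeError:
    greek_fits_latin1 = False
assert not greek_fits_latin1
```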

> For the same reason, this does not enable the consumption of
> non-UTF-8 encoded source files on other platforms.
>
Thanks Kai for clarifying. I think this direction leads to some questions
around testing.

The auto-conversion feature makes use of some filesystem-specific features
such as filetags that indicate the associated coded character set. In terms
of the testing environment on a z/OS system under USS, will there be
documentation or scripts available for establishing the necessary file
properties on the local tree? It also sounds like there would be some tests
that are specific to z/OS-hosted builds that test the conversion facilities.

Also, if the platform feature does not handle conversions of multi-byte
encodings, I am wondering if alternative mechanisms (such as iconv) have
been investigated. I suppose there is an issue over how source positions
are determined; however, I do not see how an extension of the
autoconversion facility would avoid the said issue.
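The source-position concern can be illustrated with a small sketch. It assumes a single-byte source encoding (ISO-8859-1 here) and uses a hypothetical helper, `utf8_offsets`, to record where each original byte ends up after conversion to UTF-8. Any converter, whether iconv or an extended autoconversion facility, would shift offsets in the same way once characters outside US-ASCII are involved.

```python
from typing import List


def utf8_offsets(raw: bytes, source_encoding: str) -> List[int]:
    """For a single-byte source encoding, return the UTF-8 byte offset
    at which each original byte starts after conversion."""
    offsets = []
    position = 0
    for ch in raw.decode(source_encoding):
        offsets.append(position)
        position += len(ch.encode("utf-8"))
    return offsets


# Pure US-ASCII: offsets are unchanged by the conversion.
assert utf8_offsets(b"abc", "latin-1") == [0, 1, 2]

# One Latin-1 letter outside US-ASCII becomes two UTF-8 bytes, so every
# later offset in the converted buffer is shifted by one.
raw = "a\u00e9b".encode("latin-1")
assert utf8_offsets(raw, "latin-1") == [0, 1, 3]
```

A table like this is one way a diagnostic could be mapped back to a column in the original file; whether Clang would want such a mapping is a separate design question.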

Kai Peter Nacke via llvm-dev

2020-Jun-16 13:16 UTC

Hubert Tong <hubert.reinterpretcast at gmail.com> wrote on 11.06.2020 
22:53:14:
> The intention is to use the auto-conversion feature from the
> language environment. Currently, this platform feature does not
> handle conversions of multi-byte encodings, so at this time
> consumption of UTF-8 encoded source files is not possible.
> If the internal representation is still UTF-8, consuming UTF-8 
> should involve not converting. It is sounding like the internal 
> representation has been changed to ISO-8859-1 in order to support 
> characters outside those in US-ASCII. If it is indeed internally 
> fixed to ISO-8859-1, then the question of future support for
> non-Latin (e.g., Greek or Cyrillic) scripts arises. It may be a better 
> tradeoff to leave the internal representation as UTF-8 and restrict 
> the support to the US-ASCII subset for now.
The intention is to initially restrict the support to the US-ASCII subset. 
This enables compiling with EBCDIC-encoding files and does not exclude 
further development for true UTF-8 support.
> For the same reason, this does not enable the consumption of
> non-UTF-8 encoded source files on other platforms.
Yes, because a platform-specific feature is used, it does not enable 
reading of non-UTF-8 encoded files on other platforms.
> Thanks Kai for clarifying. I think this direction leads to some 
> questions around testing.
> 
> The auto-conversion feature makes use of some filesystem-specific 
> features such as filetags that indicate the associated coded 
> character set. In terms of the testing environment on a z/OS system 
> under USS, will there be documentation or scripts available for 
> establishing the necessary file properties on the local tree? It 
> also sounds like there would be some tests that are specific to
> z/OS-hosted builds that test the conversion facilities.
With a git clone under z/OS USS, the files get automatically tagged as 
Latin-1, requiring no further setup.
We also have some tests which test the text conversion. Of course, these 
only run on z/OS USS because they use the conversion service.
> Also, if the platform feature does not handle conversions of
> multi-byte encodings, I am wondering if alternative mechanisms (such as 
> iconv) have been investigated. I suppose there is an issue over how 
> source positions are determined; however, I do not see how an 
> extension of the autoconversion facility would avoid the said issue.
We have not yet investigated alternative mechanisms for converting file 
data. The first striking complexity is where to do the conversion. With 
the source locations identified, other conversion approaches are 
imaginable. Of course, converting on the fly poses some challenges, like 
the one you mentioned.

Best regards,
Kai Nacke
IT Architect

