thr3ads.net - llvm dev - [llvm-dev] [cfe-dev] [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM [Jun 2020]

If this information is useful, please help other people find it:
Share via:

Michael Spencer via llvm-dev

2020-Jun-03 21:50 UTC

[llvm-dev] [cfe-dev] [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM

On Tue, Jun 2, 2020 at 6:38 PM Richard Smith via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> On Tue, 2 Jun 2020 at 05:08, Andrzej Warzynski via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>> *TL;DR*
>>
>> We propose some non-trivial refactoring in Clang and LLVM to enable
>> further work on Flang driver.
>>
>> *SUMMARY*
>> We would like to start extracting the driver/frontend code from Clang
>> (alongside the code that the driver/frontend depends on, e.g.
>> Diagnostics) and move the components that could be re-used by
>> non-C-based languages to LLVM. From our initial investigation we see
>> that these changes will impact many projects (upstream and downstream)
>> and will require big mechanical patches (our first attempt is
>> implemented in [8]). This is not ideal, but seems unavoidable in the
>> long-term. We would like to do this refactoring _before_ we start
>> implementing the Flang driver upstream (OPTION 1 below). This way we
>> avoid:
>>
>> * contaminating Clang with Fortran specific code (and vice versa)
>> * introducing dependency on Clang in Flang
>>
>> The downside is that the refactoring is likely to be disruptive for
many
>> projects that use Clang. We will try our best to minimise this.
>>
>> Does this approach make sense? Are there any preferred alternatives? At
>> this stage we'd like to discuss the overall direction. If folks are
in
>> favour, we'll send a separate RFC with a finer breakdown and more
>> technical details for the refactoring.
>>
>> Below you will find more context for our use-case (the Flang driver)
and
>> possible alternatives. We hope that this will help the discussion. We
>> would really appreciate your feedback!
>>
>
> Generally, I think this is a good idea, and a healthy direction for LLVM
> overall. We need to be careful to do this in a way that doesn't
introduce
> complexity or overheads in Clang, though, so we should proceed very
> cautiously.
>
> I also think you're skewing somewhat too far in favor of code reuse.
Some
> of the Clang code you're identifying below is very carefully tuned and
> tailored to Clang's use, and the amount of it that could reasonably be
> shared with another project (carefully tuned and tailored to that
project's
> use) is probably small, unless we heavily generalize it. Such
> generalization is likely not going to be worth its development and
> maintenance costs.
>
>
>> *BACKGROUND*
>> Flang (formerly known as F18) has recently been merged into LLVM [1].
>> Our ambition, as a community, is to make it as flexible, robust and
nice
>> to work with as Clang. One of the major items to address is the
>> implementation of a driver that would provide the flexibility and user
>> experience similar to that available in Clang. The F18/Flang driver was
>> already discussed on cfe-dev last year [2], but back then F18 (now llvm
>> project/flang) was a separate project. In the original proposal it was
>> assumed that initially Flang would depend (and extend where necessary)
>> Clang's driver/frontend code. Since F18/Flang was an independent
>> project, the refactoring of Clang/LLVM wasn't really considered.
That
>> design has been challenged since ([3], [10]), and also not much
progress
>> has been made. We would like to revisit that RFC from a slightly
>> different angle. Since Flang is now part of LLVM's monorepo, we
feel
>> that refactoring Clang/LLVM _before_ we upstream the driver makes a lot
>> of sense and is the natural first step.
>>
>> *ASSUMPTIONS & DESIGN GOALS*
>> 1. We will re-use as much of the Clang's driver/frontend code as
>> possible (this was previously proposed in [2]).
>>
>> 2. We want to avoid dependencies from Flang to Clang, both long-term
>> (strong requirement) and short-term (might be difficult to achieve).
>> This has recently come up in a discussion on one of our early patches
>> [3] (tl;dr Steve Scalpone, the code owner of Flang, would prefer us to
>> avoid this dependency), and was also suggested before by Eric
>> Christopher [10].
>>
>> 3. We will move the code that can be shared between Flang and Clang
(and
>> other projects) to LLVM. This idea has already come up on llvm-dev
>> before [7] (in a slightly different context, and to a slightly
different
>> extent). The methods that are not language specific would be shared in
>> an LLVM library.
>>
>> 4. The classes/types/methods that need specific changes for Fortran
will
>> be "copied" to Flang and adapted as needed. We should
minimize (or even
>> eliminate) any Fortran specific code from Clang and make sure that that
>> lives in llvm-project/flang.
>>
>> *FLANG'S DEPENDENCIES ON CLANG*
>> These are the dependencies on Clang that we have identified so far
while
>> prototyping the Flang driver.
>>
>> 1. All the machinery related to Diagnostics & SourceLocation.
>>
>> This is currently part of libclangBasic [4] and is used in _many_
places
>> in Clang. The official documentation [5] suggests that this could be
>> re-used for non-C-based languages. In particular, we feel that It would
>> make a lot of sense for Flang to use it. Also, separating Clang's
>> driver/frontend code and the diagnostics would require a lot of
>> refactoring for no real benefit (and we feel that Flang should re-use
>> Clang's driver/frontend code, see below). This dependency is used
in
>> many places, so moving it to LLVM will require a lot of (mostly)
>> mechanical changes. We can't see an obvious way to split it into
smaller
>> chunks (see also below where we discuss the impact).
>>
>
> I do not think it is necessarily going to be reasonable to move all
> machinery related to SourceLocation (in particular, all of clang's
> SourceManager) into LLVM. The ideas and data structure underpinning
> SourceLocation and SourceManager are quite general (a concatenated
> hierarchical slab of contiguous blocks, with linear indexing within those
> blocks), but the details are much more specific to Clang and the C-family
> languages it represents. Things like the support for object-like and
> function-like macros, macro arguments, #include, splitting >> tokens
for
> C++, and so on, all make sense for Clang, but probably make less sense for
> Fortran, where a different set of kinds of block would probably be desired
> instead. This isn't something that can be trivially generalized and
> extended, either; we carefully bit-pack various things into our block
> representations, and as a result, we're quite tightly fitted to the
needs
> of Clang, and would probably not want to move away from that position.
>
> However, I do think there is common infrastructure that can be extracted,
> with some significant work done to generalize the SourceManager
> infrastructure and make it tailorable to the needs of Clang and Flang (and
> any other consumers of it that might come along). I could imagine moving
> all of the complexity to do with what kinds of SLocEntry are supported into
> a traits type, and having a reusable template that can generate a data
> structure that the Clang and Flang SourceManagers can be implemented in
> terms of.
>
> Clang's SourceLocation is probably almost directly useable as-is -- it
has
> hardcoded assumptions about a particular bit being reserved to indicate a
> location within a C preprocessor macro, but we can move that to a static
> method on Clang SourceManager, and then I think SourceLocation can be
> directly shared between the two projects.
>
> (One big asterisk on the above: will Flang want an integrated C
> preprocessor? If so, then we're now talking about a much larger chunk
of
> Clang, including the lexer, preprocessor, identifier tables, the Token
> type, and it may be best to simply acknowledge that Flang has a dependency
> on Clang to supply all that, rather than moving it into LLVM.)
>
> The layers below SourceManager -- FileManager, the VFS, and so on -- all
> seem like they should be reasonable to share between projects.
>
> Some of the diagnostics engine seems reasonable to share: specifically,
> the tablegen-driven diagnostic table generation, most of the diagnostics
> engine (including support for diagnostics pragmas that change the set of
> warnings enabled at different source locations), and the formatting code
> for non-clang-specific types are all relatively reusable. If you want to
> reuse the TextDiagnosticPrinter, I think that will need some refactoring;
> it's currently tied into the specific needs of Clang's
SourceManager (for
> handling textual inclusion and macro expansion in the way that C-family
> languages deal with those things). I expect it would be possible to factor
> out an interface that Clang could implement to provide the necessary
> customizations.
>
> Before we factor out the diagnostics engine, we should fix the
> longstanding issue that it requires a global monolithic table covering all
> diagnostics, and is consequently unable to properly respect layering. I
> think this is very much fixable, but it requires someone to do the work to
> fix it :)
>
> Looking at your branch, I immediately see a few things there that are
> unacceptable changes: moving clang's TokenKinds.def, Specifiers.h, and
> OpenCLImageTypes.def into LLVM is not OK. But I assume you're aware of
that
> already. =)
>
>
>> 2. libclangFrontend & libclangDriver
>>
>> The Flang driver will use many methods from libClangDriver,
>> libClangFrontend and libClangFrontendTool. Driver.h and Compilation.h
>> from libClangDriver are responsible to call, pass the correct arguments
>> and execute the driver. TextDiagnosticPrinter.h takes care of printing
>> the driver diagnostics in case of errors.
>>
>> The Flang frontend will use CompilerInstance, CompilerInvocation,
>> FrontendOptions, FrontendActions and Utils from libClangFrontend and
>> libClangFrontendTool. These methods are responsible for translating the
>> command line arguments to frontend Options and later to Actions to be
>> executed by ExecuteCompilerInvocation. The translation from arguments
to
>> Actions happens with FrontendOption and FrontendActions. But it is the
>> CompilerInvocation that has the pointers for the sequence of Actions
>> that are required in a Compiler Instance. These methods are needed to
>> implement Flang driver/frontend and contain actions/method/functions
>> that seem to be language agnostic.
>>
>
> I think this is going too far in attempting to reuse Clang code.
> CompilerInvocation, for example, is almost exclusively dealing in parsing
> Clang's -cc1 flags, which I would expect to have very little overlap
with
> Flang's flags, and CompilerInstance exists (in part) to manage and own
all
> the Clang-specific global objects (the parser, sema, the module loader, the
> AST consumer). Flang should not be going anywhere near this stuff, and
> should be implementing its own frontend.
>
> There may be some clang-independent parts that can be factored out, but I
> would expect them to be small enough that we can address them on a
> case-by-case basis. The interesting thing to factor out is the parsing of
> command-line options, but that's already been done. I think your
approach
> here should be to assume as a baseline that you reuse none of clang's
> Frontend library, but if you find general pieces that can meaningfully be
> extracted, we can talk about those pieces in isolation.
>
I strongly agree here. It doesn't make sense for `CompilerInvocation` or
`CompilerInstance` to know anything about Fortran as they are entirely
about driving the Clang frontend (-cc1).

> For the driver, I think the picture is very different. It seems to me that
> we should only have one LLVM driver, that can build C-family languages,
> Fortran code, or both at the same time (or invoke lld etc). To that end, I
> think it would be reasonable to move clang's driver out to a separate
LLVM
> project (maybe that's llvm/, maybe it's somewhere new such as
driver/), and
> extend it to be able to invoke flang actions in addition to clang actions.
> Then the only difference between the clang and flang drivers would be which
> frontend is directly linked into the driver binary and which one is invoked
> by exec'ing a different binary. That would imply that all the parts of
> Clang that are depended on by the driver are also moved out (I think the
> main parts here are flags and diagnostics, and via the diagnostics layer,
> source locations).
>
> This will require some decoupling between the Clang driver and frontend
> (currently Clang's Options.td contains various driver options that are
> marked as also being options for Clang's -cc1 mode; duplicating those
in
> CC1Options.td is probably acceptable, if we're going to split the
driver
> and frontend into two different projects), and some shared support code
> (eg, clang's sanitizers list) will presumably end up in the driver,
because
> we don't want a driver -> *lang dependency.
>
Also strongly agree here. Large chunks of the driver will end up being the
same between Clang and Flang, but they should still be separate actions
having separate `FrontendOptions` and `FrontendActions`.

I feel this is a much larger refactoring than the current changeset and
description implies. I'm OK with the direction of Option 1, but am
concerned with the specific implementation details that have been described
so far. I'll feel much better when the concerns Richard expressed have been
addressed.

- Michael Spencer

>
>
>> *ALTERNATIVES*
>> This is a summary of the alternative ways of implementing the Flang
>> driver. We propose OPTION 1. If there are no major objections, we will
>> draft a separate RFC with more technical details (we will also break it
>> down into smaller pieces). Otherwise, what would be your preferred
>> alternative and why?
>>
>> OPTION 1
>> We avoid dependency on Clang from Day 1.
>>
>> This is the ideal scenario that would guarantee that Clang and Flang
are
>> completely separate and that the common bits stay in LLVM instead. It
>> would mean slower progress for us initially, but then other projects
>> could benefit from the refactoring sooner rather than later.
>>
>> OPTION 2
>> We avoid dependency on clangBasic from day 1, but initially allow
>> dependency on libClangFrontend & libClangDriver (or other libs
specific
>> to the driver/frontend).
>>
>> The dependency on libclang{Driver|Frontend} would gradually be
>> removed/refactored out as the driver for Flang gains momentum. As
>> mentioned earlier, there is plenty of code in libClangFrontend and
>> libClangDriver that we'd like to re-use, but the separation between
code
>> that's specific to C-based languages and generic driver/frontend
code is
>> not always obvious. We think that refactoring the common bits in
>> libClangFrontend and libClangDriver might simply be easier once:
>>
>>   * we have a Flang driver that leverages these libraries, and, as a
>> result,
>>   * we understand better what we could re-use and what's not that
>> relevant to non-C-based languages.
>>
>> OPTION 3
>> We initially keep the dependency on Clang and re-visit this RFC later.
>>
>> This would be the least disruptive approach (at least for the time
>> being) and would allow us to make us the most rapid progress (i.e. we
>> would be focusing on implementing the features rather than
refactoring).
>> It would also inform the future refactoring better. But it was already
>> pointed out that we should avoid dependencies on clang [3] and this
>> would be a step in the opposite direction. Also, the build requirements
>> for Flang would increase, and we feel that we should strive to reduce
>> them instead [6].
>>
>> If we missed any alternatives, please bring them up.
>>
>
> I don't think I can express an opinion without knowing whether you
intend
> for Flang to ever support an integrated C preprocessor. If not, then option
> 1 seems appropriate. But if so, then I think we have a choice between
> factoring out all of clang below the parser or just acknowledging that
> Flang depends on Clang for its lexical layer and deciding to keep a flang
> -> clang dependency forever.
>
>
>> *IMPACT ON OTHER PROJECTS*
>> The refactoring will have non-trivial impact on other projects:
>>
>> * OPTION 1 and OPTION 2 - huge impact initially.
>> * OPTION 3 - no impact initially, but most likely similar impact as
>> OPTION 1 and OPTION 2 in the long term.
>>
>>  From our initial investigation, extracting Diagnostics/SourceLocation
>> from clangBasic and moving it to LLVM will be the most impactful
change.
>> Within llvm-project it is used in clang, clang-tools-extra, lldb and
>> polly. Most of the changes will be mechanical, but will require
touching
>> many files. In order to get to a state where we could build libclang
>> using the newly defined LLVM library, we had to touch ~850 files and
>> make ~30k insertions/deletions. The result of this exercise is
available
>> in our development fork of llvm-project [8].
>>
>> Please note: our patches on GitHub [8] are just experiments to
>> illustrate the idea. It's work-in-progress that requires a lot of
>> polishing. When/if up-streaming this, we would need to do some
>> low-impact refactoring first. For example, currently ASTReader &
>> ASTWriter are `friends` with DiagnosticsEngine [9]. That won't be
>> possible when DiagnosticsEngine is moved to LLVM.
>>
>>
>> On behalf of the Arm Fortran Team,
>> Andrzej Warzynski
>>
>> REFERENCES
>>
>> [1]
>>
>>
https://github.com/llvm/llvm-project/commit/b98ad941a40c96c841bceb171725c925500fce6c
>> [2] http://lists.llvm.org/pipermail/cfe-dev/2019-June/062669.html
>> [3] https://reviews.llvm.org/D79092
>> [4]
>>
>>
https://github.com/llvm/llvm-project/blob/ad5d319ee85d31ee2b1ca5c29b3a10b340513fec/clang/lib/Basic/CMakeLists.txt#L45-L47
>> [5]
>>
https://clang.llvm.org/docs/InternalsManual.html#the-clang-basic-library
>> [6] http://lists.llvm.org/pipermail/flang-dev/2019-November/000061.html
>> [7] http://lists.llvm.org/pipermail/llvm-dev/2019-November/136743.html
>> [8]
>>
>>
https://github.com/banach-space/llvm-project/commits/andrzej/refactor_clangBasic
>> [9]
>>
>>
https://github.com/llvm/llvm-project/blob/b11ecd196540d87cb7db190d405056984740d2ce/clang/include/clang/Basic/Diagnostic.h#L985-L986
>> [10] https://reviews.llvm.org/D63607
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200603/d20edc80/attachment.html>

James Y Knight via llvm-dev

2020-Jun-03 22:24 UTC

head link

[llvm-dev] [cfe-dev] [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM

Extracting common code, especially the Driver code, from clang, so that it
can also be used for flang seems entirely reasonable as a high-level goal.

But I'd just like to enter another vote for it to live somewhere other than
under the "llvm" top-level-directory, e.g. "clang-common",
"frontend-support", or any other name people would like to paint that
bikeshed. :)

On Wed, Jun 3, 2020 at 5:50 PM Michael Spencer via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
>
> On Tue, Jun 2, 2020 at 6:38 PM Richard Smith via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> On Tue, 2 Jun 2020 at 05:08, Andrzej Warzynski via cfe-dev <
>> cfe-dev at lists.llvm.org> wrote:
>>
>>> *TL;DR*
>>>
>>> We propose some non-trivial refactoring in Clang and LLVM to enable
>>> further work on Flang driver.
>>>
>>> *SUMMARY*
>>> We would like to start extracting the driver/frontend code from
Clang
>>> (alongside the code that the driver/frontend depends on, e.g.
>>> Diagnostics) and move the components that could be re-used by
>>> non-C-based languages to LLVM. From our initial investigation we
see
>>> that these changes will impact many projects (upstream and
downstream)
>>> and will require big mechanical patches (our first attempt is
>>> implemented in [8]). This is not ideal, but seems unavoidable in
the
>>> long-term. We would like to do this refactoring _before_ we start
>>> implementing the Flang driver upstream (OPTION 1 below). This way
we
>>> avoid:
>>>
>>> * contaminating Clang with Fortran specific code (and vice versa)
>>> * introducing dependency on Clang in Flang
>>>
>>> The downside is that the refactoring is likely to be disruptive for
many
>>> projects that use Clang. We will try our best to minimise this.
>>>
>>> Does this approach make sense? Are there any preferred
alternatives? At
>>> this stage we'd like to discuss the overall direction. If folks
are in
>>> favour, we'll send a separate RFC with a finer breakdown and
more
>>> technical details for the refactoring.
>>>
>>> Below you will find more context for our use-case (the Flang
driver) and
>>> possible alternatives. We hope that this will help the discussion.
We
>>> would really appreciate your feedback!
>>>
>>
>> Generally, I think this is a good idea, and a healthy direction for
LLVM
>> overall. We need to be careful to do this in a way that doesn't
introduce
>> complexity or overheads in Clang, though, so we should proceed very
>> cautiously.
>>
>> I also think you're skewing somewhat too far in favor of code
reuse. Some
>> of the Clang code you're identifying below is very carefully tuned
and
>> tailored to Clang's use, and the amount of it that could reasonably
be
>> shared with another project (carefully tuned and tailored to that
project's
>> use) is probably small, unless we heavily generalize it. Such
>> generalization is likely not going to be worth its development and
>> maintenance costs.
>>
>>
>>> *BACKGROUND*
>>> Flang (formerly known as F18) has recently been merged into LLVM
[1].
>>> Our ambition, as a community, is to make it as flexible, robust and
nice
>>> to work with as Clang. One of the major items to address is the
>>> implementation of a driver that would provide the flexibility and
user
>>> experience similar to that available in Clang. The F18/Flang driver
was
>>> already discussed on cfe-dev last year [2], but back then F18 (now
llvm
>>> project/flang) was a separate project. In the original proposal it
was
>>> assumed that initially Flang would depend (and extend where
necessary)
>>> Clang's driver/frontend code. Since F18/Flang was an
independent
>>> project, the refactoring of Clang/LLVM wasn't really
considered. That
>>> design has been challenged since ([3], [10]), and also not much
progress
>>> has been made. We would like to revisit that RFC from a slightly
>>> different angle. Since Flang is now part of LLVM's monorepo, we
feel
>>> that refactoring Clang/LLVM _before_ we upstream the driver makes a
lot
>>> of sense and is the natural first step.
>>>
>>> *ASSUMPTIONS & DESIGN GOALS*
>>> 1. We will re-use as much of the Clang's driver/frontend code
as
>>> possible (this was previously proposed in [2]).
>>>
>>> 2. We want to avoid dependencies from Flang to Clang, both
long-term
>>> (strong requirement) and short-term (might be difficult to
achieve).
>>> This has recently come up in a discussion on one of our early
patches
>>> [3] (tl;dr Steve Scalpone, the code owner of Flang, would prefer us
to
>>> avoid this dependency), and was also suggested before by Eric
>>> Christopher [10].
>>>
>>> 3. We will move the code that can be shared between Flang and Clang
(and
>>> other projects) to LLVM. This idea has already come up on llvm-dev
>>> before [7] (in a slightly different context, and to a slightly
different
>>> extent). The methods that are not language specific would be shared
in
>>> an LLVM library.
>>>
>>> 4. The classes/types/methods that need specific changes for Fortran
will
>>> be "copied" to Flang and adapted as needed. We should
minimize (or even
>>> eliminate) any Fortran specific code from Clang and make sure that
that
>>> lives in llvm-project/flang.
>>>
>>> *FLANG'S DEPENDENCIES ON CLANG*
>>> These are the dependencies on Clang that we have identified so far
while
>>> prototyping the Flang driver.
>>>
>>> 1. All the machinery related to Diagnostics & SourceLocation.
>>>
>>> This is currently part of libclangBasic [4] and is used in _many_
places
>>> in Clang. The official documentation [5] suggests that this could
be
>>> re-used for non-C-based languages. In particular, we feel that It
would
>>> make a lot of sense for Flang to use it. Also, separating
Clang's
>>> driver/frontend code and the diagnostics would require a lot of
>>> refactoring for no real benefit (and we feel that Flang should
re-use
>>> Clang's driver/frontend code, see below). This dependency is
used in
>>> many places, so moving it to LLVM will require a lot of (mostly)
>>> mechanical changes. We can't see an obvious way to split it
into smaller
>>> chunks (see also below where we discuss the impact).
>>>
>>
>> I do not think it is necessarily going to be reasonable to move all
>> machinery related to SourceLocation (in particular, all of clang's
>> SourceManager) into LLVM. The ideas and data structure underpinning
>> SourceLocation and SourceManager are quite general (a concatenated
>> hierarchical slab of contiguous blocks, with linear indexing within
those
>> blocks), but the details are much more specific to Clang and the
C-family
>> languages it represents. Things like the support for object-like and
>> function-like macros, macro arguments, #include, splitting >>
tokens for
>> C++, and so on, all make sense for Clang, but probably make less sense
for
>> Fortran, where a different set of kinds of block would probably be
desired
>> instead. This isn't something that can be trivially generalized and
>> extended, either; we carefully bit-pack various things into our block
>> representations, and as a result, we're quite tightly fitted to the
needs
>> of Clang, and would probably not want to move away from that position.
>>
>> However, I do think there is common infrastructure that can be
extracted,
>> with some significant work done to generalize the SourceManager
>> infrastructure and make it tailorable to the needs of Clang and Flang
(and
>> any other consumers of it that might come along). I could imagine
moving
>> all of the complexity to do with what kinds of SLocEntry are supported
into
>> a traits type, and having a reusable template that can generate a data
>> structure that the Clang and Flang SourceManagers can be implemented in
>> terms of.
>>
>> Clang's SourceLocation is probably almost directly useable as-is --
it
>> has hardcoded assumptions about a particular bit being reserved to
indicate
>> a location within a C preprocessor macro, but we can move that to a
static
>> method on Clang SourceManager, and then I think SourceLocation can be
>> directly shared between the two projects.
>>
>> (One big asterisk on the above: will Flang want an integrated C
>> preprocessor? If so, then we're now talking about a much larger
chunk of
>> Clang, including the lexer, preprocessor, identifier tables, the Token
>> type, and it may be best to simply acknowledge that Flang has a
dependency
>> on Clang to supply all that, rather than moving it into LLVM.)
>>
>> The layers below SourceManager -- FileManager, the VFS, and so on --
all
>> seem like they should be reasonable to share between projects.
>>
>> Some of the diagnostics engine seems reasonable to share: specifically,
>> the tablegen-driven diagnostic table generation, most of the
diagnostics
>> engine (including support for diagnostics pragmas that change the set
of
>> warnings enabled at different source locations), and the formatting
code
>> for non-clang-specific types are all relatively reusable. If you want
to
>> reuse the TextDiagnosticPrinter, I think that will need some
refactoring;
>> it's currently tied into the specific needs of Clang's
SourceManager (for
>> handling textual inclusion and macro expansion in the way that C-family
>> languages deal with those things). I expect it would be possible to
factor
>> out an interface that Clang could implement to provide the necessary
>> customizations.
>>
>> Before we factor out the diagnostics engine, we should fix the
>> longstanding issue that it requires a global monolithic table covering
all
>> diagnostics, and is consequently unable to properly respect layering. I
>> think this is very much fixable, but it requires someone to do the work
to
>> fix it :)
>>
>> Looking at your branch, I immediately see a few things there that are
>> unacceptable changes: moving clang's TokenKinds.def, Specifiers.h,
and
>> OpenCLImageTypes.def into LLVM is not OK. But I assume you're aware
of that
>> already. =)
>>
>>
>>> 2. libclangFrontend & libclangDriver
>>>
>>> The Flang driver will use many methods from libClangDriver,
>>> libClangFrontend and libClangFrontendTool. Driver.h and
Compilation.h
>>> from libClangDriver are responsible to call, pass the correct
arguments
>>> and execute the driver. TextDiagnosticPrinter.h takes care of
printing
>>> the driver diagnostics in case of errors.
>>>
>>> The Flang frontend will use CompilerInstance, CompilerInvocation,
>>> FrontendOptions, FrontendActions and Utils from libClangFrontend
and
>>> libClangFrontendTool. These methods are responsible for translating
the
>>> command line arguments to frontend Options and later to Actions to
be
>>> executed by ExecuteCompilerInvocation. The translation from
arguments to
>>> Actions happens with FrontendOption and FrontendActions. But it is
the
>>> CompilerInvocation that has the pointers for the sequence of
Actions
>>> that are required in a Compiler Instance. These methods are needed
to
>>> implement Flang driver/frontend and contain
actions/method/functions
>>> that seem to be language agnostic.
>>>
>>
>> I think this is going too far in attempting to reuse Clang code.
>> CompilerInvocation, for example, is almost exclusively dealing in
parsing
>> Clang's -cc1 flags, which I would expect to have very little
overlap with
>> Flang's flags, and CompilerInstance exists (in part) to manage and
own all
>> the Clang-specific global objects (the parser, sema, the module loader,
the
>> AST consumer). Flang should not be going anywhere near this stuff, and
>> should be implementing its own frontend.
>>
>> There may be some clang-independent parts that can be factored out, but
I
>> would expect them to be small enough that we can address them on a
>> case-by-case basis. The interesting thing to factor out is the parsing
of
>> command-line options, but that's already been done. I think your
approach
>> here should be to assume as a baseline that you reuse none of
clang's
>> Frontend library, but if you find general pieces that can meaningfully
be
>> extracted, we can talk about those pieces in isolation.
>>
>
> I strongly agree here. It doesn't make sense for `CompilerInvocation`
or
> `CompilerInstance` to know anything about Fortran as they are entirely
> about driving the Clang frontend (-cc1).
>
>
>> For the driver, I think the picture is very different. It seems to me
>> that we should only have one LLVM driver, that can build C-family
>> languages, Fortran code, or both at the same time (or invoke lld etc).
To
>> that end, I think it would be reasonable to move clang's driver out
to a
>> separate LLVM project (maybe that's llvm/, maybe it's somewhere
new such as
>> driver/), and extend it to be able to invoke flang actions in addition
to
>> clang actions. Then the only difference between the clang and flang
drivers
>> would be which frontend is directly linked into the driver binary and
which
>> one is invoked by exec'ing a different binary. That would imply
that all
>> the parts of Clang that are depended on by the driver are also moved
out (I
>> think the main parts here are flags and diagnostics, and via the
>> diagnostics layer, source locations).
>>
>> This will require some decoupling between the Clang driver and frontend
>> (currently Clang's Options.td contains various driver options that
are
>> marked as also being options for Clang's -cc1 mode; duplicating
those in
>> CC1Options.td is probably acceptable, if we're going to split the
driver
>> and frontend into two different projects), and some shared support code
>> (eg, clang's sanitizers list) will presumably end up in the driver,
because
>> we don't want a driver -> *lang dependency.
>>
>
> Also strongly agree here. Large chunks of the driver will end up being the
> same between Clang and Flang, but they should still be separate actions
> having separate `FrontendOptions` and `FrontendActions`.
>
> I feel this is a much larger refactoring than the current changeset and
> description implies. I'm OK with the direction of Option 1, but am
> concerned with the specific implementation details that have been described
> so far. I'll feel much better when the concerns Richard expressed have
been
> addressed.
>
> - Michael Spencer
>
>
>>
>>
>>> *ALTERNATIVES*
>>> This is a summary of the alternative ways of implementing the Flang
>>> driver. We propose OPTION 1. If there are no major objections, we
will
>>> draft a separate RFC with more technical details (we will also
break it
>>> down into smaller pieces). Otherwise, what would be your preferred
>>> alternative and why?
>>>
>>> OPTION 1
>>> We avoid dependency on Clang from Day 1.
>>>
>>> This is the ideal scenario that would guarantee that Clang and
Flang are
>>> completely separate and that the common bits stay in LLVM instead.
It
>>> would mean slower progress for us initially, but then other
projects
>>> could benefit from the refactoring sooner rather than later.
>>>
>>> OPTION 2
>>> We avoid dependency on clangBasic from day 1, but initially allow
>>> dependency on libClangFrontend & libClangDriver (or other libs
specific
>>> to the driver/frontend).
>>>
>>> The dependency on libclang{Driver|Frontend} would gradually be
>>> removed/refactored out as the driver for Flang gains momentum. As
>>> mentioned earlier, there is plenty of code in libClangFrontend and
>>> libClangDriver that we'd like to re-use, but the separation
between code
>>> that's specific to C-based languages and generic
driver/frontend code is
>>> not always obvious. We think that refactoring the common bits in
>>> libClangFrontend and libClangDriver might simply be easier once:
>>>
>>>   * we have a Flang driver that leverages these libraries, and, as
a
>>> result,
>>>   * we understand better what we could re-use and what's not
that
>>> relevant to non-C-based languages.
>>>
>>> OPTION 3
>>> We initially keep the dependency on Clang and re-visit this RFC
later.
>>>
>>> This would be the least disruptive approach (at least for the time
>>> being) and would allow us to make us the most rapid progress (i.e.
we
>>> would be focusing on implementing the features rather than
refactoring).
>>> It would also inform the future refactoring better. But it was
already
>>> pointed out that we should avoid dependencies on clang [3] and this
>>> would be a step in the opposite direction. Also, the build
requirements
>>> for Flang would increase, and we feel that we should strive to
reduce
>>> them instead [6].
>>>
>>> If we missed any alternatives, please bring them up.
>>>
>>
>> I don't think I can express an opinion without knowing whether you
intend
>> for Flang to ever support an integrated C preprocessor. If not, then
option
>> 1 seems appropriate. But if so, then I think we have a choice between
>> factoring out all of clang below the parser or just acknowledging that
>> Flang depends on Clang for its lexical layer and deciding to keep a
flang
>> -> clang dependency forever.
>>
>>
>>> *IMPACT ON OTHER PROJECTS*
>>> The refactoring will have non-trivial impact on other projects:
>>>
>>> * OPTION 1 and OPTION 2 - huge impact initially.
>>> * OPTION 3 - no impact initially, but most likely similar impact as
>>> OPTION 1 and OPTION 2 in the long term.
>>>
>>>  From our initial investigation, extracting
Diagnostics/SourceLocation
>>> from clangBasic and moving it to LLVM will be the most impactful
change.
>>> Within llvm-project it is used in clang, clang-tools-extra, lldb
and
>>> polly. Most of the changes will be mechanical, but will require
touching
>>> many files. In order to get to a state where we could build
libclang
>>> using the newly defined LLVM library, we had to touch ~850 files
and
>>> make ~30k insertions/deletions. The result of this exercise is
available
>>> in our development fork of llvm-project [8].
>>>
>>> Please note: our patches on GitHub [8] are just experiments to
>>> illustrate the idea. It's work-in-progress that requires a lot
of
>>> polishing. When/if up-streaming this, we would need to do some
>>> low-impact refactoring first. For example, currently ASTReader
&
>>> ASTWriter are `friends` with DiagnosticsEngine [9]. That won't
be
>>> possible when DiagnosticsEngine is moved to LLVM.
>>>
>>>
>>> On behalf of the Arm Fortran Team,
>>> Andrzej Warzynski
>>>
>>> REFERENCES
>>>
>>> [1]
>>>
>>>
https://github.com/llvm/llvm-project/commit/b98ad941a40c96c841bceb171725c925500fce6c
>>> [2] http://lists.llvm.org/pipermail/cfe-dev/2019-June/062669.html
>>> [3] https://reviews.llvm.org/D79092
>>> [4]
>>>
>>>
https://github.com/llvm/llvm-project/blob/ad5d319ee85d31ee2b1ca5c29b3a10b340513fec/clang/lib/Basic/CMakeLists.txt#L45-L47
>>> [5]
>>>
https://clang.llvm.org/docs/InternalsManual.html#the-clang-basic-library
>>> [6]
http://lists.llvm.org/pipermail/flang-dev/2019-November/000061.html
>>> [7]
http://lists.llvm.org/pipermail/llvm-dev/2019-November/136743.html
>>> [8]
>>>
>>>
https://github.com/banach-space/llvm-project/commits/andrzej/refactor_clangBasic
>>> [9]
>>>
>>>
https://github.com/llvm/llvm-project/blob/b11ecd196540d87cb7db190d405056984740d2ce/clang/include/clang/Basic/Diagnostic.h#L985-L986
>>> [10] https://reviews.llvm.org/D63607
>>> _______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200603/1cc5615f/attachment-0001.html>

Andrzej Warzynski via llvm-dev

2020-Jun-09 15:12 UTC

head link

[llvm-dev] [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM

Thank you all for your feedback, this has been very helpful!

There have been a few important points raised by different people. Below 
I try to address them one by one. I hope that you don't mind me sending 
it in one email. Please let me know if I missed something!

*QUESTIONS RAISED IN PREVIOUS EMAILS*

On 02/06/2020 20:51, Eli Friedman wrote:
 > Separate from clang, LLVM itself actually has its own infrastructure 
for source locations and diagnostics, which is used by various tools 
that parse text.

We are only focusing on clang's infrastructure. LLVM's existing 
infrastructure remains unchanged.

On 02/06/2020 18:04, Chris Tetreault wrote:
 > I wonder if, instead of just putting it in llvm, it makes sense to 
have some sort of "llvm-frontend" subproject?

There seems to be preference to move this common-code into a separate 
project. Keep in mind that we already have llvm/lib/Frontend which was 
created recently for this kind of functionality (see [1] & [2], also 
CC'ed Johannes). The new library that we propose would be shared between 
subprojects (initially just clang and flang) and it feels that we tend 
to use LLVM for such things (e.g. FileCheck, LIT, bits of TableGen). Do 
we really want a separate subproject for the common Driver code?

Out of the proposed names I feel that 'frontend-support' would be the 
most accurate name (nothing clang, flang or llvm specific, just language 
agnostic code to share between various frontends, including some infra). 
That's the name I will be using below, though we still need to decide 
whether it's going to be a subdirectory within LLVM or a new subproject.

On 03/06/2020 01:37, Richard Smith wrote:
 >
 > Before we factor out the diagnostics engine, we should fix the
 > longstanding issue that it requires a global monolithic table covering
 > all diagnostics, and is consequently unable to properly respect layering.

I'm guessing that this could be as simple as moving various diagnostics 
to the corresponding namespaces (e.g. diagnostics from 
DiagnosticSemaKinds.td should live in the clang::sema namespace). Or is 
there more to it?

On 03/06/2020 01:37, Richard Smith wrote:
 > One big asterisk on the above: will Flang want an integrated C
 > preprocessor?

As S. Scalpone already mentioned, Flang has its own preprocessor. The 
idea of re-using Clang's C preprocessor is a bit out-of-scope for this 
RFC and we have not considered it. We definitely want to make sure that 
we don't create any unnecessary obstacles in case somebody decides to 
try this in the future.

On 03/06/2020 01:37, Richard Smith wrote:
 > Looking at your branch, I immediately see a few things there that are
 > unacceptable changes: moving clang's TokenKinds.def, Specifiers.h, and
 > OpenCLImageTypes.def into LLVM is not OK. But I assume you're aware of
 > that already. =)

Yes, one of a few shortcuts that I took. Just to clarify - the code 
that's specific to C-like-languages should stay in Clang and we are not 
suggesting otherwise. However, we will need to to work on decoupling 
between the driver (+ infra that it depends on) and the frontend code.

On 03/06/2020 22:50, Michael Spencer wrote:
 >
 > I feel this is a much larger refactoring than the current changeset and
 > description implies.

Yes! Apologies if I didn't emphasise this enough - there's a lot of 
refactoring that needs to happen before Clang and Flang can share the 
driver+infra code. The feedback so far has been helpful in identifying 
the bits that will be easier to implement rather than to refactor & 
re-use (i.e.  CompilerInvocation, CompilerInstance and the frontend 
driver overall). If there's more that we could or should re-implement 
then we are happy to take that approach. But we also want to minimise 
code duplication.

*NEXT STEPS*
Overall, there seems to be an agreement that it would be good to extract 
the driver code out of clang so that it can be shared with flang (and 
other future language frontends). Clang's frontend driver 
(CompilerInvocation, CompilerInstance and `clang -cc1` in general) 
should remain in Clang. Later we can decide that there are some 
commonalities that can be shared, but these are expected to be 
relatively small. In terms of the infra code, libclangDriver requires 
DiagnosticEngine, which contains SourceManager and also requires 
SourceLocation. This would suggest the following steps.

1a. Generalize SourceManager so that it's suitable for both Clang and Flang
1b. Move the updated SourceManager, together with SLocEntry, 
SourceLocation, FileManager, VFS to `frontend-support` (i.e. the new 
location for the common driver code)
2a. Split the diagnostics into separate namespaces so there's proper 
layering of various groups
2b. Move TableGen backend for diagnostics to `frontend-support`. The 
definitions specific to Clang remain in clang (e.g. 
DiagnosticSemaKinds.td), generic diagnostics are moved to 
`frontend-support` (e.g. DiagnosticDriverKinds.td)
3. Move DiagnosticsEngine to `frontend-support`. I suspect that we may 
want to move DiagnosticConsumer, DiagnosticBuilder, etc too.
4. Refactor libclangDriver (so that it can be used by both clang and 
flang), rename it as libLLVMFrontendDriver and move to `frontend-support`
5. Start upstreaming the Flang driver based on the refactored driver code

1{a|b} and 2{a|b} could happen in parallel, but need to precede 3, 4 and 
5. Would this overall direction make sense? If yes then I'd like to 
suggest moving more detailed discussions on 1{a|b}, 2{a|b} and 3 to 
separate RFCs. 4 & 5 should be relatively straightforward after 1, 2 &
3.

*OTHER FRONTENDS*
As mentioned earlier, I've reached out to Julia, Rust and Swift 
communities as well. They all have been very supportive, but overall 
this refactoring is unlikely to make the affected parts of Clang more 
appealing to them in terms of code re-use.

*ABI+MLIR*
Both ABI lowering and MLIR are very exciting, but at this stage a bit 
tangential to the Flang driver, which is our main motivation for this 
RFC. Another time, another RFC :)

Thank you,
-Andrzej

[1] http://lists.llvm.org/pipermail/llvm-dev/2019-November/137311.html
[2] 
https://github.com/llvm/llvm-project/commit/eb3e81f43f019cd90da87169aeff0eaddc4c9ecb


On 03/06/2020 23:24, James Y Knight via cfe-dev wrote:> Extracting common code, especially the Driver code, from clang, so that 
> it can also be used for flang seems entirely reasonable as a high-level 
> goal.
> 
> But I'd just like to enter another vote for it to live somewhere other 
> than under the "llvm" top-level-directory, e.g.
"clang-common",
> "frontend-support", or any other name people would like to paint
that
> bikeshed. :)
> 
> On Wed, Jun 3, 2020 at 5:50 PM Michael Spencer via llvm-dev 
> <llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>> wrote:
> 
> 
>     On Tue, Jun 2, 2020 at 6:38 PM Richard Smith via llvm-dev
>     <llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>> wrote:
> 
>         On Tue, 2 Jun 2020 at 05:08, Andrzej Warzynski via cfe-dev
>         <cfe-dev at lists.llvm.org <mailto:cfe-dev at
lists.llvm.org>> wrote:
> 
>             *TL;DR*
> 
>             We propose some non-trivial refactoring in Clang and LLVM to
>             enable
>             further work on Flang driver.
> 
>             *SUMMARY*
>             We would like to start extracting the driver/frontend code
>             from Clang
>             (alongside the code that the driver/frontend depends on, e.g.
>             Diagnostics) and move the components that could be re-used by
>             non-C-based languages to LLVM. From our initial
>             investigation we see
>             that these changes will impact many projects (upstream and
>             downstream)
>             and will require big mechanical patches (our first attempt is
>             implemented in [8]). This is not ideal, but seems
>             unavoidable in the
>             long-term. We would like to do this refactoring _before_ we
>             start
>             implementing the Flang driver upstream (OPTION 1 below).
>             This way we avoid:
> 
>             * contaminating Clang with Fortran specific code (and vice
>             versa)
>             * introducing dependency on Clang in Flang
> 
>             The downside is that the refactoring is likely to be
>             disruptive for many
>             projects that use Clang. We will try our best to minimise this.
> 
>             Does this approach make sense? Are there any preferred
>             alternatives? At
>             this stage we'd like to discuss the overall direction. If
>             folks are in
>             favour, we'll send a separate RFC with a finer breakdown
and
>             more
>             technical details for the refactoring.
> 
>             Below you will find more context for our use-case (the Flang
>             driver) and
>             possible alternatives. We hope that this will help the
>             discussion. We
>             would really appreciate your feedback!
> 
> 
>         Generally, I think this is a good idea, and a healthy direction
>         for LLVM overall. We need to be careful to do this in a way that
>         doesn't introduce complexity or overheads in Clang, though, so
>         we should proceed very cautiously.
> 
>         I also think you're skewing somewhat too far in favor of code
>         reuse. Some of the Clang code you're identifying below is very
>         carefully tuned and tailored to Clang's use, and the amount of
>         it that could reasonably be shared with another project
>         (carefully tuned and tailored to that project's use) is
probably
>         small, unless we heavily generalize it. Such generalization is
>         likely not going to be worth its development and maintenance costs.
> 
>             *BACKGROUND*
>             Flang (formerly known as F18) has recently been merged into
>             LLVM [1].
>             Our ambition, as a community, is to make it as flexible,
>             robust and nice
>             to work with as Clang. One of the major items to address is the
>             implementation of a driver that would provide the
>             flexibility and user
>             experience similar to that available in Clang. The F18/Flang
>             driver was
>             already discussed on cfe-dev last year [2], but back then
>             F18 (now llvm
>             project/flang) was a separate project. In the original
>             proposal it was
>             assumed that initially Flang would depend (and extend where
>             necessary)
>             Clang's driver/frontend code. Since F18/Flang was an
>             independent
>             project, the refactoring of Clang/LLVM wasn't really
>             considered. That
>             design has been challenged since ([3], [10]), and also not
>             much progress
>             has been made. We would like to revisit that RFC from a
>             slightly
>             different angle. Since Flang is now part of LLVM's
monorepo,
>             we feel
>             that refactoring Clang/LLVM _before_ we upstream the driver
>             makes a lot
>             of sense and is the natural first step.
> 
>             *ASSUMPTIONS & DESIGN GOALS*
>             1. We will re-use as much of the Clang's driver/frontend
>             code as
>             possible (this was previously proposed in [2]).
> 
>             2. We want to avoid dependencies from Flang to Clang, both
>             long-term
>             (strong requirement) and short-term (might be difficult to
>             achieve).
>             This has recently come up in a discussion on one of our
>             early patches
>             [3] (tl;dr Steve Scalpone, the code owner of Flang, would
>             prefer us to
>             avoid this dependency), and was also suggested before by Eric
>             Christopher [10].
> 
>             3. We will move the code that can be shared between Flang
>             and Clang (and
>             other projects) to LLVM. This idea has already come up on
>             llvm-dev
>             before [7] (in a slightly different context, and to a
>             slightly different
>             extent). The methods that are not language specific would be
>             shared in
>             an LLVM library.
> 
>             4. The classes/types/methods that need specific changes for
>             Fortran will
>             be "copied" to Flang and adapted as needed. We should
>             minimize (or even
>             eliminate) any Fortran specific code from Clang and make
>             sure that that
>             lives in llvm-project/flang.
> 
>             *FLANG'S DEPENDENCIES ON CLANG*
>             These are the dependencies on Clang that we have identified
>             so far while
>             prototyping the Flang driver.
> 
>             1. All the machinery related to Diagnostics &
SourceLocation.
> 
>             This is currently part of libclangBasic [4] and is used in
>             _many_ places
>             in Clang. The official documentation [5] suggests that this
>             could be
>             re-used for non-C-based languages. In particular, we feel
>             that It would
>             make a lot of sense for Flang to use it. Also, separating
>             Clang's
>             driver/frontend code and the diagnostics would require a lot of
>             refactoring for no real benefit (and we feel that Flang
>             should re-use
>             Clang's driver/frontend code, see below). This dependency
is
>             used in
>             many places, so moving it to LLVM will require a lot of
>             (mostly)
>             mechanical changes. We can't see an obvious way to split it
>             into smaller
>             chunks (see also below where we discuss the impact).
> 
> 
>         I do not think it is necessarily going to be reasonable to move
>         all machinery related to SourceLocation (in particular, all of
>         clang's SourceManager) into LLVM. The ideas and data structure
>         underpinning SourceLocation and SourceManager are quite general
>         (a concatenated hierarchical slab of contiguous blocks, with
>         linear indexing within those blocks), but the details are much
>         more specific to Clang and the C-family languages it represents.
>         Things like the support for object-like and function-like
>         macros, macro arguments, #include, splitting >> tokens for
C++,
>         and so on, all make sense for Clang, but probably make less
>         sense for Fortran, where a different set of kinds of block would
>         probably be desired instead. This isn't something that can be
>         trivially generalized and extended, either; we carefully
>         bit-pack various things into our block representations, and as a
>         result, we're quite tightly fitted to the needs of Clang, and
>         would probably not want to move away from that position.
> 
>         However, I do think there is common infrastructure that can be
>         extracted, with some significant work done to generalize the
>         SourceManager infrastructure and make it tailorable to the needs
>         of Clang and Flang (and any other consumers of it that might
>         come along). I could imagine moving all of the complexity to do
>         with what kinds of SLocEntry are supported into a traits type,
>         and having a reusable template that can generate a data
>         structure that the Clang and Flang SourceManagers can be
>         implemented in terms of.
> 
>         Clang's SourceLocation is probably almost directly useable
as-is
>         -- it has hardcoded assumptions about a particular bit being
>         reserved to indicate a location within a C preprocessor macro,
>         but we can move that to a static method on Clang SourceManager,
>         and then I think SourceLocation can be directly shared between
>         the two projects.
> 
>         (One big asterisk on the above: will Flang want an integrated C
>         preprocessor? If so, then we're now talking about a much larger
>         chunk of Clang, including the lexer, preprocessor, identifier
>         tables, the Token type, and it may be best to simply acknowledge
>         that Flang has a dependency on Clang to supply all that, rather
>         than moving it into LLVM.)
> 
>         The layers below SourceManager -- FileManager, the VFS, and so
>         on -- all seem like they should be reasonable to share between
>         projects.
> 
>         Some of the diagnostics engine seems reasonable to share:
>         specifically, the tablegen-driven diagnostic table generation,
>         most of the diagnostics engine (including support for
>         diagnostics pragmas that change the set of warnings enabled at
>         different source locations), and the formatting code for
>         non-clang-specific types are all relatively reusable. If you
>         want to reuse the TextDiagnosticPrinter, I think that will need
>         some refactoring; it's currently tied into the specific needs
of
>         Clang's SourceManager (for handling textual inclusion and macro
>         expansion in the way that C-family languages deal with those
>         things). I expect it would be possible to factor out an
>         interface that Clang could implement to provide the necessary
>         customizations.
> 
>         Before we factor out the diagnostics engine, we should fix the
>         longstanding issue that it requires a global monolithic table
>         covering all diagnostics, and is consequently unable to properly
>         respect layering. I think this is very much fixable, but it
>         requires someone to do the work to fix it :)
> 
>         Looking at your branch, I immediately see a few things there
>         that are unacceptable changes: moving clang's TokenKinds.def,
>         Specifiers.h, and OpenCLImageTypes.def into LLVM is not OK. But
>         I assume you're aware of that already. =)
> 
>             2. libclangFrontend & libclangDriver
> 
>             The Flang driver will use many methods from libClangDriver,
>             libClangFrontend and libClangFrontendTool. Driver.h and
>             Compilation.h
>             from libClangDriver are responsible to call, pass the
>             correct arguments
>             and execute the driver. TextDiagnosticPrinter.h takes care
>             of printing
>             the driver diagnostics in case of errors.
> 
>             The Flang frontend will use CompilerInstance,
>             CompilerInvocation,
>             FrontendOptions, FrontendActions and Utils from
>             libClangFrontend and
>             libClangFrontendTool. These methods are responsible for
>             translating the
>             command line arguments to frontend Options and later to
>             Actions to be
>             executed by ExecuteCompilerInvocation. The translation from
>             arguments to
>             Actions happens with FrontendOption and FrontendActions. But
>             it is the
>             CompilerInvocation that has the pointers for the sequence of
>             Actions
>             that are required in a Compiler Instance. These methods are
>             needed to
>             implement Flang driver/frontend and contain
>             actions/method/functions
>             that seem to be language agnostic.
> 
> 
>         I think this is going too far in attempting to reuse Clang code.
>         CompilerInvocation, for example, is almost exclusively dealing
>         in parsing Clang's -cc1 flags, which I would expect to have
very
>         little overlap with Flang's flags, and CompilerInstance exists
>         (in part) to manage and own all the Clang-specific global
>         objects (the parser, sema, the module loader, the AST consumer).
>         Flang should not be going anywhere near this stuff, and should
>         be implementing its own frontend.
> 
>         There may be some clang-independent parts that can be factored
>         out, but I would expect them to be small enough that we can
>         address them on a case-by-case basis. The interesting thing to
>         factor out is the parsing of command-line options, but that's
>         already been done. I think your approach here should be to
>         assume as a baseline that you reuse none of clang's Frontend
>         library, but if you find general pieces that can meaningfully be
>         extracted, we can talk about those pieces in isolation.
> 
> 
>     I strongly agree here. It doesn't make sense for
>     `CompilerInvocation` or `CompilerInstance` to know anything about
>     Fortran as they are entirely about driving the Clang frontend (-cc1).
> 
>         For the driver, I think the picture is very different. It seems
>         to me that we should only have one LLVM driver, that can build
>         C-family languages, Fortran code, or both at the same time (or
>         invoke lld etc). To that end, I think it would be reasonable to
>         move clang's driver out to a separate LLVM project (maybe
that's
>         llvm/, maybe it's somewhere new such as driver/), and extend it
>         to be able to invoke flang actions in addition to clang actions.
>         Then the only difference between the clang and flang drivers
>         would be which frontend is directly linked into the driver
>         binary and which one is invoked by exec'ing a different binary.
>         That would imply that all the parts of Clang that are depended
>         on by the driver are also moved out (I think the main parts here
>         are flags and diagnostics, and via the diagnostics layer, source
>         locations).
> 
>         This will require some decoupling between the Clang driver and
>         frontend (currently Clang's Options.td contains various driver
>         options that are marked as also being options for Clang's -cc1
>         mode; duplicating those in CC1Options.td is probably acceptable,
>         if we're going to split the driver and frontend into two
>         different projects), and some shared support code (eg, clang's
>         sanitizers list) will presumably end up in the driver, because
>         we don't want a driver -> *lang dependency.
> 
> 
>     Also strongly agree here. Large chunks of the driver will end up
>     being the same between Clang and Flang, but they should still be
>     separate actions having separate `FrontendOptions` and
>     `FrontendActions`.
> 
>     I feel this is a much larger refactoring than the current changeset
>     and description implies. I'm OK with the direction of Option 1, but
>     am concerned with the specific implementation details that have been
>     described so far. I'll feel much better when the concerns
>     Richard expressed have been addressed.
> 
>     - Michael Spencer
> 
>             *ALTERNATIVES*
>             This is a summary of the alternative ways of implementing
>             the Flang
>             driver. We propose OPTION 1. If there are no major
>             objections, we will
>             draft a separate RFC with more technical details (we will
>             also break it
>             down into smaller pieces). Otherwise, what would be your
>             preferred
>             alternative and why?
> 
>             OPTION 1
>             We avoid dependency on Clang from Day 1.
> 
>             This is the ideal scenario that would guarantee that Clang
>             and Flang are
>             completely separate and that the common bits stay in LLVM
>             instead. It
>             would mean slower progress for us initially, but then other
>             projects
>             could benefit from the refactoring sooner rather than later.
> 
>             OPTION 2
>             We avoid dependency on clangBasic from day 1, but initially
>             allow
>             dependency on libClangFrontend & libClangDriver (or other
>             libs specific
>             to the driver/frontend).
> 
>             The dependency on libclang{Driver|Frontend} would gradually be
>             removed/refactored out as the driver for Flang gains
>             momentum. As
>             mentioned earlier, there is plenty of code in
>             libClangFrontend and
>             libClangDriver that we'd like to re-use, but the separation
>             between code
>             that's specific to C-based languages and generic
>             driver/frontend code is
>             not always obvious. We think that refactoring the common
>             bits in
>             libClangFrontend and libClangDriver might simply be easier
once:
> 
>                * we have a Flang driver that leverages these libraries,
>             and, as a result,
>                * we understand better what we could re-use and what's
>             not that
>             relevant to non-C-based languages.
> 
>             OPTION 3
>             We initially keep the dependency on Clang and re-visit this
>             RFC later.
> 
>             This would be the least disruptive approach (at least for
>             the time
>             being) and would allow us to make us the most rapid progress
>             (i.e. we
>             would be focusing on implementing the features rather than
>             refactoring).
>             It would also inform the future refactoring better. But it
>             was already
>             pointed out that we should avoid dependencies on clang [3]
>             and this
>             would be a step in the opposite direction. Also, the build
>             requirements
>             for Flang would increase, and we feel that we should strive
>             to reduce
>             them instead [6].
> 
>             If we missed any alternatives, please bring them up.
> 
> 
>         I don't think I can express an opinion without knowing whether
>         you intend for Flang to ever support an integrated C
>         preprocessor. If not, then option 1 seems appropriate. But if
>         so, then I think we have a choice between factoring out all of
>         clang below the parser or just acknowledging that Flang depends
>         on Clang for its lexical layer and deciding to keep a flang ->
>         clang dependency forever.
> 
>             *IMPACT ON OTHER PROJECTS*
>             The refactoring will have non-trivial impact on other projects:
> 
>             * OPTION 1 and OPTION 2 - huge impact initially.
>             * OPTION 3 - no impact initially, but most likely similar
>             impact as
>             OPTION 1 and OPTION 2 in the long term.
> 
>               From our initial investigation, extracting
>             Diagnostics/SourceLocation
>             from clangBasic and moving it to LLVM will be the most
>             impactful change.
>             Within llvm-project it is used in clang, clang-tools-extra,
>             lldb and
>             polly. Most of the changes will be mechanical, but will
>             require touching
>             many files. In order to get to a state where we could build
>             libclang
>             using the newly defined LLVM library, we had to touch ~850
>             files and
>             make ~30k insertions/deletions. The result of this exercise
>             is available
>             in our development fork of llvm-project [8].
> 
>             Please note: our patches on GitHub [8] are just experiments to
>             illustrate the idea. It's work-in-progress that requires a
>             lot of
>             polishing. When/if up-streaming this, we would need to do some
>             low-impact refactoring first. For example, currently
>             ASTReader &
>             ASTWriter are `friends` with DiagnosticsEngine [9]. That
>             won't be
>             possible when DiagnosticsEngine is moved to LLVM.
> 
> 
>             On behalf of the Arm Fortran Team,
>             Andrzej Warzynski
> 
>             REFERENCES
> 
>             [1]
>            
https://github.com/llvm/llvm-project/commit/b98ad941a40c96c841bceb171725c925500fce6c
>             [2]
>             http://lists.llvm.org/pipermail/cfe-dev/2019-June/062669.html
>             [3] https://reviews.llvm.org/D79092
>             [4]
>            
https://github.com/llvm/llvm-project/blob/ad5d319ee85d31ee2b1ca5c29b3a10b340513fec/clang/lib/Basic/CMakeLists.txt#L45-L47
>             [5]
>            
https://clang.llvm.org/docs/InternalsManual.html#the-clang-basic-library
>             [6]
>            
http://lists.llvm.org/pipermail/flang-dev/2019-November/000061.html
>             [7]
>            
http://lists.llvm.org/pipermail/llvm-dev/2019-November/136743.html
>             [8]
>            
https://github.com/banach-space/llvm-project/commits/andrzej/refactor_clangBasic
>             [9]
>            
https://github.com/llvm/llvm-project/blob/b11ecd196540d87cb7db190d405056984740d2ce/clang/include/clang/Basic/Diagnostic.h#L985-L986
>             [10] https://reviews.llvm.org/D63607
>             _______________________________________________
>             cfe-dev mailing list
>             cfe-dev at lists.llvm.org <mailto:cfe-dev at
lists.llvm.org>
>             https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> 
>         _______________________________________________
>         LLVM Developers mailing list
>         llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>
>         https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> 
>     _______________________________________________
>     LLVM Developers mailing list
>     llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>     https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> 
> 
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>

llvm dev - Jun 2020 - [cfe-dev] [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM

[llvm-dev] [cfe-dev] [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM

[llvm-dev] [cfe-dev] [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM

[llvm-dev] [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM