thr3ads.net - llvm dev - [llvm-dev] New pass manager for optimization pipeline status and questions [Jul 2020]

If this information is useful, please help other people find it:
Share via:

Alina Sbirlea via llvm-dev

2020-Jul-28 17:23 UTC

[llvm-dev] New pass manager for optimization pipeline status and questions

On Fri, Jul 24, 2020 at 12:54 PM Sjoerd Meijer <Sjoerd.Meijer at arm.com>
wrote:
> Hi Alina,
>
> I think this is an excellent direction, this is the direction we should
> take here.  Just a somewhat irrelevant disagreement on this though:
>
> > Philip's point is spot on that we are deficient now in the testing
of
> the LegacyPassManager,
>
> I disagree because the LPM is still the default and I appreciated Hans'
> reply: "Defaults tend to be popular". But this is the direction I
like:
>
> > This means prioritizing those blockers over other LLVM work. The
current
> umbrella bug is PR46649
<https://bugs.llvm.org/show_bug.cgi?id=46649>.
>
> Just checking: do you accept both performance and code-size regressions as
> blockers here?
>
Yes, I think big performance and code-size regressions need to be
investigated. It will be hard to quantify what "big" is though, but we
should definitely track such regressions.
Due to the time-boxed approach we may need to discuss moving forward with
the switch with some regressions deemed acceptable, but those considered
blockers should be prioritized of course.

>
> > My point is that we want and should work with users to make the
> transition smooth, but we do very much need user (meaning companies using
> LLVM) involvement here in order to not delay the switch further.
>
> That's clear, and agreed.
> I would like to remark here that currently, when a commit regresses one
> benchmark that is important for someone, that is enough justification most
> of the time for a revert of that commit. That's why I surprised that it
> looked like we were not setting code-quality goals and requirements before
> switching. And what I would like to ask here is to provide reasonable
> enough time for people to look into switching to the NPM, to evaluate this,
> and then file bugs under PR46649. Just collecting data, evaluating
> problems, filings bugs can already time-consuming, and then I guess they
> need fixing too. This also needs to fit in people's plans right now.
>
> But it sounds reasonable to me that this is time-boxed. Given that
> switching is quite some work I think, switching before the clang-12 release
> would be unreasonable, and if clang-13 is in half a year from now, that
> already sounds perhaps somewhat reasonable, but might be tight.
>
I think it's reasonable to add a firm switch before clang-13 and before the
end of this year, with intermediary milestones (e.g. filing blockers for
user regressions in the next 1-2 months).
I'm inclined to favor a tighter deadline, the motivation here being to
ensure that working on potential blockers is prioritized with plenty of
time to spare, so the switch remains time-boxed.

Best,
Alina

> Thanks,
> Sjoerd.
>
> ------------------------------
> *From:* Alina Sbirlea <asbirlea at google.com>
> *Sent:* 24 July 2020 19:51
> *To:* Sjoerd Meijer <Sjoerd.Meijer at arm.com>
> *Cc:* Philip Reames <listmail at philipreames.com>; Chandler Carruth
<
> chandlerc at gmail.com>; Eric Christopher <echristo at gmail.com>;
llvm-dev <
> llvm-dev at lists.llvm.org>
> *Subject:* Re: [llvm-dev] New pass manager for optimization pipeline
> status and questions
>
> Hi all,
>
> The current plan is to prioritize enabling the NPM as soon as possible,
> and that includes addressing any blockers that are known or arise. This
> means prioritizing those blockers over other LLVM work. The current
> umbrella bug is PR46649
<https://bugs.llvm.org/show_bug.cgi?id=46649>.
>
> Philip's point is spot on that we are deficient now in the testing of
the
> LegacyPassManager, because so many have already made the switch (FWIW
> Google switched more than 2 years ago).
>
> It's not constructive for the LLVM community to just flip the switch
and
> break current LPM users. The purpose of these communications to llvm-dev
> and the bug tracking is to be informative as to the planned direction and
> make as quick of a progress as possible.
> Please keep in mind that the work on the NPM has been going on for many
> years and many customers have switched years ago, and delaying this for
> even an additional year is not acceptable for the code health and stability
> of LLVM.
>
> My point is that we want and should work with users to make the transition
> smooth, but we do very much need user (meaning companies using LLVM)
> involvement here in order to not delay the switch further.
>
> Best,
> Alina
>
>
> On Thu, Jul 23, 2020 at 2:59 AM Sjoerd Meijer <Sjoerd.Meijer at
arm.com>
> wrote:
>
> I am not in favour of just flipping the switch and then deal with all the
> fall-out, because we see major regressions that would be unacceptable for
> our users. Thus, not only would this be very disruptive, also our releases
> are based on a certain trunk versions, so we would need to revert back to
> the legacy PM downstream and thus diverge from upstream which wouldn't
be
> ideal for us. About the regressions, see the message/thread that I kicked
> off earlier (
> http://lists.llvm.org/pipermail/llvm-dev/2020-July/143646.html) which was
> quickly followed up by this thread.
>
> I would like to see here if we are interesting in defining a few criteria
> that must be met before we switch:
>
>    1. Correctness, which obviously always must come first: looks like
>    this is covered by bots that are running with the NPM, and by downstream
>    users. From the latest messages, I am getting we are there, or nearly
there.
>    2. Performance (i.e. optimising for speed),
>    3. Code-size.
>
> With 1) correctness box covered and ticked, is now the time to look at
> codegen quality: 2) performance and 3) code-size? Would it be reasonable
> that we create a plan or timeline to address this, and thus allow time that
> these issues can be addressed?
>
> We are now ready to start tuning the NPM for code-size. Perhaps we are
> late to the NPM party (but that was a priority and bandwidth issue), but
> perhaps with correctness fixed this is actually the right time. I only ran
> numbers for code-size, and haven't even looked at performance numbers
yet,
> which we would also need to do and takes time.
>
> Cheers,
> Sjoerd.
>
>
> ------------------------------
> *From:* llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of
Eric
> Christopher via llvm-dev <llvm-dev at lists.llvm.org>
> *Sent:* 23 July 2020 02:05
> *To:* Philip Reames <listmail at philipreames.com>; Alina Sbirlea
<
> asbirlea at google.com>; Chandler Carruth <chandlerc at gmail.com>
> *Cc:* llvm-dev <llvm-dev at lists.llvm.org>
> *Subject:* Re: [llvm-dev] New pass manager for optimization pipeline
> status and questions
>
> FWIW I'm in favor of this direction while making sure that we keep
focus
> on removing the vestiges of the old pass manager for the code health impact
> to the project.
>
> -eric
>
> On Wed, Jul 22, 2020 at 3:15 PM Philip Reames via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> (I'm probably going to derail your thread, sorry about that.)
>
> I think at this point, we should just bite the bullet and make the switch
> to NPM by default for Clang's optimization pipeline.  Today.
>
> Why?  Because many of our downstream consumers have already switched.
> Google has.  We (Azul) have.  I think I've heard the same for a couple
> other major contributors.  Why does this matter?  Testing.  At the current
> moment, the vast majority of testing the project gets is exercising NPM,
> not LPM.
>
> NPM is functionally complete for Clang optimization.  There might be a few
> missing cases around the sanitizers, but last I heard those were on the
> edge of being fixed.
>
> I think we should make the switch, and deal with any fall out as
> regressions.  If we made the change immediately after a release branch,
> we'd have several months to address any major issues before the next
> release.
>
> Philip
> On 7/22/20 2:39 PM, Arthur Eubanks via llvm-dev wrote:
>
> Hi all,
>
> I wanted to give a quick update on the status of NPM for the IR
> optimization pipeline and ask some questions.
>
> In the past I believe there were thoughts that NPM was basically ready
> because all of check-llvm and check-clang passed when
> -DENABLE_EXPERIMENTAL_NEW_PASS_MANAGER=ON was specified. But that CMake
> flag did not apply to opt and any tests running something like `opt
> -foo-pass -bar-pass` (which is the vast majority of check-llvm tests) were
> still using the legacy PM. The intended way to use NPM was to use the
> -passes flag, e.g. `opt -passes='foo,bar'`.
>
> I've added a -enable-new-pm flag to opt to force running NPM passes
even
> when `opt -foo-pass` is used. This is because I didn't want to go
through
> every single test and figure out which ones should be using both -foo-pass
> and -passes=foo. Switching on -enable-new-pm currently leads to ~1800
> check-llvm failures. I've documented the failed tests count per
directory
> in https://bugs.llvm.org/show_bug.cgi?id=46651 (some have been fixed
> since that was posted).
>
> This has led to real bugs in NPM being discovered and fixed (e.g. some
> optnone issues).
>
> But a large portion of the remaining failures are because codegen-only
> passes haven't been ported to NPM yet. That's fine for the
optimization
> pipeline NPM transition since it doesn't affect the optimization
pipeline,
> but it does present an issue with the approach of the -enable-new-pm flag
> (which would by default become true alongside the NPM transition). Lots of
> tests are testing codegen-specific passes via opt (e.g. `opt
> -amdgpu-lower-intrinsics`) and they can't use NPM (yet).
>
> I was thinking either we have a way of identifying codegen-only passes and
> revert back to the legacy PM in opt whenever we see one, or we go back to
> considering the originally intended approach of adding an equivalent
> `-passes=` RUN to all tests that should be also running under NPM.
>
> I'm not sure of a nice and clean solution to identify codegen-only
passes.
> We could go and update every instance of INITIALIZE_PASS to take another
> parameter indicating if it's codegen-only. Or we could just have a
central
> list somewhere where we check if the pass is in some hardcoded list or has
> some prefix (e.g. "x86-").
>
> The approach of adding equivalent `-passes=` RUN lines to all relevant
> tests seems daunting, but not exactly sure how daunting. Maybe it's
> possible to script something and see what fails? We'd still need some
way
> to identify codegen-only passes to make sure we don't miss anything,
and
> we'd need to distinguish between analyses and normal passes. Also, it
would
> slow down test execution since we'd run a lot more tests twice, but
maybe
> that's not such a big deal? Maybe it's good to have most tests
running
> against the legacy PM even when NPM is on by default?
>
> Thoughts?
>
> This is split off from
> http://lists.llvm.org/pipermail/llvm-dev/2020-July/143395.html.
>
>
>
> _______________________________________________
> LLVM Developers mailing listllvm-dev at
lists.llvm.orghttps://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200728/7808a2c4/attachment-0001.html>

Sjoerd Meijer via llvm-dev

2020-Jul-29 10:37 UTC

head link

[llvm-dev] New pass manager for optimization pipeline status and questions

Hi,
> Yes, I think big performance and code-size regressions need to be
investigated. It will be hard to quantify what "big" is though, but we
should definitely track such regressions.
> Due to the time-boxed approach we may need to discuss moving forward with
the switch with some regressions deemed acceptable, but those considered
blockers should be prioritized of course.
Agreed, SGTM.
> I think it's reasonable to add a firm switch before clang-13 and before
the end of this year, with intermediary milestones (e.g. filing  blockers for
user regressions in the next 1-2 months).
> I'm inclined to favor a tighter deadline, the motivation here being to
ensure that working on potential blockers is prioritized with plenty of time to
spare, so the switch remains time-boxed.
I am not entirely sure yet about the tighter deadline, but I am optimistic
things might start falling into place soon. Let me explain.

I think you've noticed the code-size issue that I raised. Switching before
that being fixed would be a real non-starter for us, but if there's
consensus the community want to switch then we'll deal with that of course.
Because (some) of these regression are so big, I am hoping we are missing
something obvious and that with a few things fixed we solve most of these cases,
which is why I am optimistic, and also because of next point:

After code-size, I'm now looking at performance, and things look a lot
better there for us. I do see some up and down behaviour: some wins in some
benchmarks overall, but also some regressions. I will need to do some more
homework here, as I need to check if small downstream divergence play a role
here, and need to eliminate that. In some benchmarks where we win overall, there
are a few cases that we really don't want to regress, so I need to look into
this. I think these are problems that we need to fix, but I will start raising
issues for them just for visibility. Overall, there are less to none
non-starters in this area I think.

But because these are time-consuming exercises, and it's the summer months,
I am getting slightly nervous about the tight(er) deadline. :-)

Cheers,
Sjoerd.

________________________________
From: Alina Sbirlea <asbirlea at google.com>
Sent: 28 July 2020 18:23
To: Sjoerd Meijer <Sjoerd.Meijer at arm.com>
Cc: Philip Reames <listmail at philipreames.com>; Chandler Carruth
<chandlerc at gmail.com>; Eric Christopher <echristo at gmail.com>;
llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] New pass manager for optimization pipeline status and
questions

On Fri, Jul 24, 2020 at 12:54 PM Sjoerd Meijer <Sjoerd.Meijer at
arm.com<mailto:Sjoerd.Meijer at arm.com>> wrote:
Hi Alina,

I think this is an excellent direction, this is the direction we should take
here.  Just a somewhat irrelevant disagreement on this though:
> Philip's point is spot on that we are deficient now in the testing of
the LegacyPassManager,
I disagree because the LPM is still the default and I appreciated Hans'
reply: "Defaults tend to be popular". But this is the direction I
like:
> This means prioritizing those blockers over other LLVM work. The current
umbrella bug is PR46649<https://bugs.llvm.org/show_bug.cgi?id=46649>.
Just checking: do you accept both performance and code-size regressions as
blockers here?

Yes, I think big performance and code-size regressions need to be investigated.
It will be hard to quantify what "big" is though, but we should
definitely track such regressions.
Due to the time-boxed approach we may need to discuss moving forward with the
switch with some regressions deemed acceptable, but those considered blockers
should be prioritized of course.

> My point is that we want and should work with users to make the transition
smooth, but we do very much need user (meaning companies using LLVM) involvement
here in order to not delay the switch further.
That's clear, and agreed.
I would like to remark here that currently, when a commit regresses one
benchmark that is important for someone, that is enough justification most of
the time for a revert of that commit. That's why I surprised that it looked
like we were not setting code-quality goals and requirements before switching.
And what I would like to ask here is to provide reasonable enough time for
people to look into switching to the NPM, to evaluate this, and then file bugs
under PR46649. Just collecting data, evaluating problems, filings bugs can
already time-consuming, and then I guess they need fixing too. This also needs
to fit in people's plans right now.

But it sounds reasonable to me that this is time-boxed. Given that switching is
quite some work I think, switching before the clang-12 release would be
unreasonable, and if clang-13 is in half a year from now, that already sounds
perhaps somewhat reasonable, but might be tight.

I think it's reasonable to add a firm switch before clang-13 and before the
end of this year, with intermediary milestones (e.g. filing blockers for user
regressions in the next 1-2 months).
I'm inclined to favor a tighter deadline, the motivation here being to
ensure that working on potential blockers is prioritized with plenty of time to
spare, so the switch remains time-boxed.

Best,
Alina

Thanks,
Sjoerd.

________________________________
From: Alina Sbirlea <asbirlea at google.com<mailto:asbirlea at
google.com>>
Sent: 24 July 2020 19:51
To: Sjoerd Meijer <Sjoerd.Meijer at arm.com<mailto:Sjoerd.Meijer at
arm.com>>
Cc: Philip Reames <listmail at philipreames.com<mailto:listmail at
philipreames.com>>; Chandler Carruth <chandlerc at
gmail.com<mailto:chandlerc at gmail.com>>; Eric Christopher
<echristo at gmail.com<mailto:echristo at gmail.com>>; llvm-dev
<llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>>
Subject: Re: [llvm-dev] New pass manager for optimization pipeline status and
questions

Hi all,

The current plan is to prioritize enabling the NPM as soon as possible, and that
includes addressing any blockers that are known or arise. This means
prioritizing those blockers over other LLVM work. The current umbrella bug is
PR46649<https://bugs.llvm.org/show_bug.cgi?id=46649>.

Philip's point is spot on that we are deficient now in the testing of the
LegacyPassManager, because so many have already made the switch (FWIW Google
switched more than 2 years ago).

It's not constructive for the LLVM community to just flip the switch and
break current LPM users. The purpose of these communications to llvm-dev and the
bug tracking is to be informative as to the planned direction and make as quick
of a progress as possible.
Please keep in mind that the work on the NPM has been going on for many years
and many customers have switched years ago, and delaying this for even an
additional year is not acceptable for the code health and stability of LLVM.

My point is that we want and should work with users to make the transition
smooth, but we do very much need user (meaning companies using LLVM) involvement
here in order to not delay the switch further.

Best,
Alina

On Thu, Jul 23, 2020 at 2:59 AM Sjoerd Meijer <Sjoerd.Meijer at
arm.com<mailto:Sjoerd.Meijer at arm.com>> wrote:
I am not in favour of just flipping the switch and then deal with all the
fall-out, because we see major regressions that would be unacceptable for our
users. Thus, not only would this be very disruptive, also our releases are based
on a certain trunk versions, so we would need to revert back to the legacy PM
downstream and thus diverge from upstream which wouldn't be ideal for us.
About the regressions, see the message/thread that I kicked off earlier
(http://lists.llvm.org/pipermail/llvm-dev/2020-July/143646.html) which was
quickly followed up by this thread.

I would like to see here if we are interesting in defining a few criteria that
must be met before we switch:

  1.  Correctness, which obviously always must come first: looks like this is
covered by bots that are running with the NPM, and by downstream users. >From
the latest messages, I am getting we are there, or nearly there.
  2.  Performance (i.e. optimising for speed),
  3.  Code-size.

With 1) correctness box covered and ticked, is now the time to look at codegen
quality: 2) performance and 3) code-size? Would it be reasonable that we create
a plan or timeline to address this, and thus allow time that these issues can be
addressed?

We are now ready to start tuning the NPM for code-size. Perhaps we are late to
the NPM party (but that was a priority and bandwidth issue), but perhaps with
correctness fixed this is actually the right time. I only ran numbers for
code-size, and haven't even looked at performance numbers yet, which we
would also need to do and takes time.

Cheers,
Sjoerd.

________________________________
From: llvm-dev <llvm-dev-bounces at lists.llvm.org<mailto:llvm-dev-bounces
at lists.llvm.org>> on behalf of Eric Christopher via llvm-dev
<llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>>
Sent: 23 July 2020 02:05
To: Philip Reames <listmail at philipreames.com<mailto:listmail at
philipreames.com>>; Alina Sbirlea <asbirlea at
google.com<mailto:asbirlea at google.com>>; Chandler Carruth
<chandlerc at gmail.com<mailto:chandlerc at gmail.com>>
Cc: llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at
lists.llvm.org>>
Subject: Re: [llvm-dev] New pass manager for optimization pipeline status and
questions

FWIW I'm in favor of this direction while making sure that we keep focus on
removing the vestiges of the old pass manager for the code health impact to the
project.

-eric

On Wed, Jul 22, 2020 at 3:15 PM Philip Reames via llvm-dev <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote:

(I'm probably going to derail your thread, sorry about that.)

I think at this point, we should just bite the bullet and make the switch to NPM
by default for Clang's optimization pipeline.  Today.

Why?  Because many of our downstream consumers have already switched.  Google
has.  We (Azul) have.  I think I've heard the same for a couple other major
contributors.  Why does this matter?  Testing.  At the current moment, the vast
majority of testing the project gets is exercising NPM, not LPM.

NPM is functionally complete for Clang optimization.  There might be a few
missing cases around the sanitizers, but last I heard those were on the edge of
being fixed.

I think we should make the switch, and deal with any fall out as regressions. 
If we made the change immediately after a release branch, we'd have several
months to address any major issues before the next release.

Philip

On 7/22/20 2:39 PM, Arthur Eubanks via llvm-dev wrote:
Hi all,

I wanted to give a quick update on the status of NPM for the IR optimization
pipeline and ask some questions.

In the past I believe there were thoughts that NPM was basically ready because
all of check-llvm and check-clang passed when
-DENABLE_EXPERIMENTAL_NEW_PASS_MANAGER=ON was specified. But that CMake flag did
not apply to opt and any tests running something like `opt -foo-pass -bar-pass`
(which is the vast majority of check-llvm tests) were still using the legacy PM.
The intended way to use NPM was to use the -passes flag, e.g. `opt
-passes='foo,bar'`.

I've added a -enable-new-pm flag to opt to force running NPM passes even
when `opt -foo-pass` is used. This is because I didn't want to go through
every single test and figure out which ones should be using both -foo-pass and
-passes=foo. Switching on -enable-new-pm currently leads to ~1800 check-llvm
failures. I've documented the failed tests count per directory in
https://bugs.llvm.org/show_bug.cgi?id=46651 (some have been fixed since that was
posted).

This has led to real bugs in NPM being discovered and fixed (e.g. some optnone
issues).

But a large portion of the remaining failures are because codegen-only passes
haven't been ported to NPM yet. That's fine for the optimization
pipeline NPM transition since it doesn't affect the optimization pipeline,
but it does present an issue with the approach of the -enable-new-pm flag (which
would by default become true alongside the NPM transition). Lots of tests are
testing codegen-specific passes via opt (e.g. `opt -amdgpu-lower-intrinsics`)
and they can't use NPM (yet).

I was thinking either we have a way of identifying codegen-only passes and
revert back to the legacy PM in opt whenever we see one, or we go back to
considering the originally intended approach of adding an equivalent `-passes=`
RUN to all tests that should be also running under NPM.

I'm not sure of a nice and clean solution to identify codegen-only passes.
We could go and update every instance of INITIALIZE_PASS to take another
parameter indicating if it's codegen-only. Or we could just have a central
list somewhere where we check if the pass is in some hardcoded list or has some
prefix (e.g. "x86-").

The approach of adding equivalent `-passes=` RUN lines to all relevant tests
seems daunting, but not exactly sure how daunting. Maybe it's possible to
script something and see what fails? We'd still need some way to identify
codegen-only passes to make sure we don't miss anything, and we'd need
to distinguish between analyses and normal passes. Also, it would slow down test
execution since we'd run a lot more tests twice, but maybe that's not
such a big deal? Maybe it's good to have most tests running against the
legacy PM even when NPM is on by default?

Thoughts?

This is split off from
http://lists.llvm.org/pipermail/llvm-dev/2020-July/143395.html.

_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200729/4d3255a6/attachment.html>

Alina Sbirlea via llvm-dev

2020-Jul-30 05:13 UTC

head link

[llvm-dev] New pass manager for optimization pipeline status and questions

On Wed, Jul 29, 2020 at 3:37 AM Sjoerd Meijer <Sjoerd.Meijer at arm.com>
wrote:
> Hi,
>
> > Yes, I think big performance and code-size regressions need to be
> investigated. It will be hard to quantify what "big" is though,
but we
> should definitely track such regressions.
> > Due to the time-boxed approach we may need to discuss moving forward
> with the switch with some regressions deemed acceptable, but those
> considered blockers should be prioritized of course.
>
> Agreed, SGTM.
>
> > I think it's reasonable to add a firm switch before clang-13 and
before
> the end of this year, with intermediary milestones (e.g. filing  blockers
> for user regressions in the next 1-2 months).
> > I'm inclined to favor a tighter deadline, the motivation here
being to
> ensure that working on potential blockers is prioritized with plenty of
> time to spare, so the switch remains time-boxed.
>
> I am not entirely sure yet about the tighter deadline, but I am optimistic
> things might start falling into place soon. Let me explain.
>
> I think you've noticed the code-size issue that I raised. Switching
before
> that being fixed would be a real non-starter for us, but if there's
> consensus the community want to switch then we'll deal with that of
course.
>Because (some) of these regression are so big, I am hoping we are
missing> something obvious and that with a few things fixed we solve most of these
> cases, which is why I am optimistic, and also because of next point:
>
> After code-size, I'm now looking at performance, and things look a lot
> better there for us. I do see some up and down behaviour: some wins in some
> benchmarks overall, but also some regressions. I will need to do some more
> homework here, as I need to check if small downstream divergence play a
> role here, and need to eliminate that. In some benchmarks where we win
> overall, there are a few cases that we really don't want to regress, so
I
> need to look into this. I think these are problems that we need to fix, but
> I will start raising issues for them just for visibility. Overall, there
> are less to none non-starters in this area I think.
>
> But because these are time-consuming exercises, and it's the summer
> months, I am getting slightly nervous about the tight(er) deadline. :-)
>
I think it's great to make it clear which items are critical for you, and
having those as blocker makes perfect sense. The code-size issue you added
to the umbrella bug makes sense to me, thank you for filing that. I think
it's also fair to ask for reasonable delays or help addressing such issues.

I completely agree that discovering and addressing these is a
time-consuming endeavour, so I think there's some room for flexibility
here. Again, the main reason I'm advocating for a preliminary tighter
deadline is so we can ensure prioritization of discovering and addressing
such blockers early. As long as the extended deadline (end of this year and
clang-13) remains firm I think we'd be in a good position.
A tighter deadline is useful as a checkpoint with the community of: "who is
regressed if we switched today", "are the regressions at this point
acceptable to make the switch", "what is the status of the
investigations
into these regressions", "what resources are needed to make progress
in the
remaining blockers". I propose having the tighter deadline half-way through
(middle of October) and work towards evaluating the status of the switch
then and decide on extensions as needed. Does this sound like a reasonable
plan?

The end goal is for the flag flip to be an overall benefit, not to cause
disruptions :-).

Best,
Alina

> Cheers,
> Sjoerd.
>


I think the progress so far looks great

>
>
> ------------------------------
> *From:* Alina Sbirlea <asbirlea at google.com>
> *Sent:* 28 July 2020 18:23
> *To:* Sjoerd Meijer <Sjoerd.Meijer at arm.com>
> *Cc:* Philip Reames <listmail at philipreames.com>; Chandler Carruth
<
> chandlerc at gmail.com>; Eric Christopher <echristo at gmail.com>;
llvm-dev <
> llvm-dev at lists.llvm.org>
> *Subject:* Re: [llvm-dev] New pass manager for optimization pipeline
> status and questions
>
>
>
> On Fri, Jul 24, 2020 at 12:54 PM Sjoerd Meijer <Sjoerd.Meijer at
arm.com>
> wrote:
>
> Hi Alina,
>
> I think this is an excellent direction, this is the direction we should
> take here.  Just a somewhat irrelevant disagreement on this though:
>
> > Philip's point is spot on that we are deficient now in the testing
of
> the LegacyPassManager,
>
> I disagree because the LPM is still the default and I appreciated Hans'
> reply: "Defaults tend to be popular". But this is the direction I
like:
>
> > This means prioritizing those blockers over other LLVM work. The
current
> umbrella bug is PR46649
<https://bugs.llvm.org/show_bug.cgi?id=46649>.
>
> Just checking: do you accept both performance and code-size regressions as
> blockers here?
>
>
> Yes, I think big performance and code-size regressions need to be
> investigated. It will be hard to quantify what "big" is though,
but we
> should definitely track such regressions.
> Due to the time-boxed approach we may need to discuss moving forward with
> the switch with some regressions deemed acceptable, but those considered
> blockers should be prioritized of course.
>
>
>
> > My point is that we want and should work with users to make the
> transition smooth, but we do very much need user (meaning companies using
> LLVM) involvement here in order to not delay the switch further.
>
> That's clear, and agreed.
> I would like to remark here that currently, when a commit regresses one
> benchmark that is important for someone, that is enough justification most
> of the time for a revert of that commit. That's why I surprised that it
> looked like we were not setting code-quality goals and requirements before
> switching. And what I would like to ask here is to provide reasonable
> enough time for people to look into switching to the NPM, to evaluate this,
> and then file bugs under PR46649. Just collecting data, evaluating
> problems, filings bugs can already time-consuming, and then I guess they
> need fixing too. This also needs to fit in people's plans right now.
>
> But it sounds reasonable to me that this is time-boxed. Given that
> switching is quite some work I think, switching before the clang-12 release
> would be unreasonable, and if clang-13 is in half a year from now, that
> already sounds perhaps somewhat reasonable, but might be tight.
>
>
> I think it's reasonable to add a firm switch before clang-13 and before
> the end of this year, with intermediary milestones (e.g. filing blockers
> for user regressions in the next 1-2 months).
> I'm inclined to favor a tighter deadline, the motivation here being to
> ensure that working on potential blockers is prioritized with plenty of
> time to spare, so the switch remains time-boxed.
>
> Best,
> Alina
>
>
> Thanks,
> Sjoerd.
>
> ------------------------------
> *From:* Alina Sbirlea <asbirlea at google.com>
> *Sent:* 24 July 2020 19:51
> *To:* Sjoerd Meijer <Sjoerd.Meijer at arm.com>
> *Cc:* Philip Reames <listmail at philipreames.com>; Chandler Carruth
<
> chandlerc at gmail.com>; Eric Christopher <echristo at gmail.com>;
llvm-dev <
> llvm-dev at lists.llvm.org>
> *Subject:* Re: [llvm-dev] New pass manager for optimization pipeline
> status and questions
>
> Hi all,
>
> The current plan is to prioritize enabling the NPM as soon as possible,
> and that includes addressing any blockers that are known or arise. This
> means prioritizing those blockers over other LLVM work. The current
> umbrella bug is PR46649
<https://bugs.llvm.org/show_bug.cgi?id=46649>.
>
> Philip's point is spot on that we are deficient now in the testing of
the
> LegacyPassManager, because so many have already made the switch (FWIW
> Google switched more than 2 years ago).
>
> It's not constructive for the LLVM community to just flip the switch
and
> break current LPM users. The purpose of these communications to llvm-dev
> and the bug tracking is to be informative as to the planned direction and
> make as quick of a progress as possible.
> Please keep in mind that the work on the NPM has been going on for many
> years and many customers have switched years ago, and delaying this for
> even an additional year is not acceptable for the code health and stability
> of LLVM.
>
> My point is that we want and should work with users to make the transition
> smooth, but we do very much need user (meaning companies using LLVM)
> involvement here in order to not delay the switch further.
>
> Best,
> Alina
>
>
> On Thu, Jul 23, 2020 at 2:59 AM Sjoerd Meijer <Sjoerd.Meijer at
arm.com>
> wrote:
>
> I am not in favour of just flipping the switch and then deal with all the
> fall-out, because we see major regressions that would be unacceptable for
> our users. Thus, not only would this be very disruptive, also our releases
> are based on a certain trunk versions, so we would need to revert back to
> the legacy PM downstream and thus diverge from upstream which wouldn't
be
> ideal for us. About the regressions, see the message/thread that I kicked
> off earlier (
> http://lists.llvm.org/pipermail/llvm-dev/2020-July/143646.html) which was
> quickly followed up by this thread.
>
> I would like to see here if we are interesting in defining a few criteria
> that must be met before we switch:
>
>    1. Correctness, which obviously always must come first: looks like
>    this is covered by bots that are running with the NPM, and by downstream
>    users. From the latest messages, I am getting we are there, or nearly
there.
>    2. Performance (i.e. optimising for speed),
>    3. Code-size.
>
> With 1) correctness box covered and ticked, is now the time to look at
> codegen quality: 2) performance and 3) code-size? Would it be reasonable
> that we create a plan or timeline to address this, and thus allow time that
> these issues can be addressed?
>
> We are now ready to start tuning the NPM for code-size. Perhaps we are
> late to the NPM party (but that was a priority and bandwidth issue), but
> perhaps with correctness fixed this is actually the right time. I only ran
> numbers for code-size, and haven't even looked at performance numbers
yet,
> which we would also need to do and takes time.
>
> Cheers,
> Sjoerd.
>
>
> ------------------------------
> *From:* llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of
Eric
> Christopher via llvm-dev <llvm-dev at lists.llvm.org>
> *Sent:* 23 July 2020 02:05
> *To:* Philip Reames <listmail at philipreames.com>; Alina Sbirlea
<
> asbirlea at google.com>; Chandler Carruth <chandlerc at gmail.com>
> *Cc:* llvm-dev <llvm-dev at lists.llvm.org>
> *Subject:* Re: [llvm-dev] New pass manager for optimization pipeline
> status and questions
>
> FWIW I'm in favor of this direction while making sure that we keep
focus
> on removing the vestiges of the old pass manager for the code health impact
> to the project.
>
> -eric
>
> On Wed, Jul 22, 2020 at 3:15 PM Philip Reames via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> (I'm probably going to derail your thread, sorry about that.)
>
> I think at this point, we should just bite the bullet and make the switch
> to NPM by default for Clang's optimization pipeline.  Today.
>
> Why?  Because many of our downstream consumers have already switched.
> Google has.  We (Azul) have.  I think I've heard the same for a couple
> other major contributors.  Why does this matter?  Testing.  At the current
> moment, the vast majority of testing the project gets is exercising NPM,
> not LPM.
>
> NPM is functionally complete for Clang optimization.  There might be a few
> missing cases around the sanitizers, but last I heard those were on the
> edge of being fixed.
>
> I think we should make the switch, and deal with any fall out as
> regressions.  If we made the change immediately after a release branch,
> we'd have several months to address any major issues before the next
> release.
>
> Philip
> On 7/22/20 2:39 PM, Arthur Eubanks via llvm-dev wrote:
>
> Hi all,
>
> I wanted to give a quick update on the status of NPM for the IR
> optimization pipeline and ask some questions.
>
> In the past I believe there were thoughts that NPM was basically ready
> because all of check-llvm and check-clang passed when
> -DENABLE_EXPERIMENTAL_NEW_PASS_MANAGER=ON was specified. But that CMake
> flag did not apply to opt and any tests running something like `opt
> -foo-pass -bar-pass` (which is the vast majority of check-llvm tests) were
> still using the legacy PM. The intended way to use NPM was to use the
> -passes flag, e.g. `opt -passes='foo,bar'`.
>
> I've added a -enable-new-pm flag to opt to force running NPM passes
even
> when `opt -foo-pass` is used. This is because I didn't want to go
through
> every single test and figure out which ones should be using both -foo-pass
> and -passes=foo. Switching on -enable-new-pm currently leads to ~1800
> check-llvm failures. I've documented the failed tests count per
directory
> in https://bugs.llvm.org/show_bug.cgi?id=46651 (some have been fixed
> since that was posted).
>
> This has led to real bugs in NPM being discovered and fixed (e.g. some
> optnone issues).
>
> But a large portion of the remaining failures are because codegen-only
> passes haven't been ported to NPM yet. That's fine for the
optimization
> pipeline NPM transition since it doesn't affect the optimization
pipeline,
> but it does present an issue with the approach of the -enable-new-pm flag
> (which would by default become true alongside the NPM transition). Lots of
> tests are testing codegen-specific passes via opt (e.g. `opt
> -amdgpu-lower-intrinsics`) and they can't use NPM (yet).
>
> I was thinking either we have a way of identifying codegen-only passes and
> revert back to the legacy PM in opt whenever we see one, or we go back to
> considering the originally intended approach of adding an equivalent
> `-passes=` RUN to all tests that should be also running under NPM.
>
> I'm not sure of a nice and clean solution to identify codegen-only
passes.
> We could go and update every instance of INITIALIZE_PASS to take another
> parameter indicating if it's codegen-only. Or we could just have a
central
> list somewhere where we check if the pass is in some hardcoded list or has
> some prefix (e.g. "x86-").
>
> The approach of adding equivalent `-passes=` RUN lines to all relevant
> tests seems daunting, but not exactly sure how daunting. Maybe it's
> possible to script something and see what fails? We'd still need some
way
> to identify codegen-only passes to make sure we don't miss anything,
and
> we'd need to distinguish between analyses and normal passes. Also, it
would
> slow down test execution since we'd run a lot more tests twice, but
maybe
> that's not such a big deal? Maybe it's good to have most tests
running
> against the legacy PM even when NPM is on by default?
>
> Thoughts?
>
> This is split off from
> http://lists.llvm.org/pipermail/llvm-dev/2020-July/143395.html.
>
>
>
> _______________________________________________
> LLVM Developers mailing listllvm-dev at
lists.llvm.orghttps://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200729/ad17f303/attachment.html>

llvm dev - Jul 2020 - New pass manager for optimization pipeline status and questions

[llvm-dev] New pass manager for optimization pipeline status and questions

[llvm-dev] New pass manager for optimization pipeline status and questions

[llvm-dev] New pass manager for optimization pipeline status and questions