thr3ads.net - llvm dev - [llvm-dev] RFC: Enabling Module passes post-ISel [Jul 2016]

James Molloy via llvm-dev

2016-Jul-17 16:34 UTC

[llvm-dev] RFC: Enabling Module passes post-ISel

Hi,

[Apologies to those receiving this mail twice - used the old list address
by accident]

In LLVM it is currently not possible to write a Module-level pass (a pass
that modifies or analyzes multiple MachineFunctions) after DAG formation.
This inhibits some optimizations[1] and is something I'd like to see
changed.

The problem is that in the backend, we emit a function at a time, from DAG
formation to object emission. So no two MachineFunctions ever exist at any
one time. Changing this necessarily means increasing memory usage.

I've prototyped this change and have measured peak memory usage in the
worst case scenario - LTO'ing llc and clang. Without further ado:

  llvm-lto llc:   before: 1.44GB maximum resident set size
                  after:  1.68GB (+17%)

  llvm-lto clang: before: 2.48GB maximum resident set size
                  after:  3.42GB (+33%)
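
As a quick check, the relative increases can be recomputed from the before/after figures as posted. This is only a small sketch; `percent_increase` is a helper name introduced here for illustration.

```python
def percent_increase(before, after):
    """Relative growth from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Peak resident set sizes in GB, copied from the figures above.
measurements = [
    ("llvm-lto llc", 1.44, 1.68),
    ("llvm-lto clang", 2.48, 3.42),
]

for name, before, after in measurements:
    print(f"{name}: +{percent_increase(before, after):.1f}%")
# llvm-lto llc: +16.7%
# llvm-lto clang: +37.9%
```

With the figures as posted, the llc pair rounds to the quoted +17%, while the clang pair comes out nearer +38% than +33%, so one of the clang numbers may have been rounded differently or mistyped in the original mail.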

The increases are very large. This is worst-case (non-LTO builds would see
the peak usage of the backend masked by the peak of the midend) but still -
pretty big. Thoughts? Is this completely no-go? Is this something that we
*just need* to do? Is crippling the backend architecture to keep memory
down justified? Is this something we could enable under an option?

Any input appreciated. I can also provide a proof-of-concept patch for
people to test if wanted.

Cheers,

James

[1] A concrete (and the motivating) example is the sharing of constants in
constant pools across multiple functions. Two small functions that use the
same constant must currently create their own copy of that constant. For
many reasons that I'd not like to go into in this thread, we can't
implement this *at all* in the current infrastructure.
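
To make the footnote concrete, here is a toy model of the cost being described. It is not LLVM code: functions are represented as plain lists of constant values, and both helper names are made up for this sketch.

```python
def pool_entries_per_function(functions):
    """Total pool entries when every function owns a private constant pool.

    Duplicates inside one function are merged, duplicates across functions are not.
    """
    return sum(len(set(constants)) for constants in functions.values())


def pool_entries_shared(functions):
    """Total pool entries when one pool is shared by the whole module."""
    shared = set()
    for constants in functions.values():
        shared.update(constants)
    return len(shared)


# Hypothetical module: three small functions that reuse the same two constants.
functions = {
    "f": [0x3FF00000, 0xDEADBEEF],
    "g": [0xDEADBEEF],
    "h": [0xDEADBEEF, 0x3FF00000],
}

print(pool_entries_per_function(functions))  # 5
print(pool_entries_shared(functions))        # 2
```

The shared count can only be computed by something that sees all the functions at once, which is what a module-level pass after ISel would provide.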

Matthias Braun via llvm-dev

2016-Jul-17 17:57 UTC

> On Jul 17, 2016, at 9:34 AM, James Molloy via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> [...]
>
> The increases are very large. This is worst-case (non-LTO builds would see
> the peak usage of the backend masked by the peak of the midend) but still -
> pretty big. Thoughts? Is this completely no-go? Is this something that we
> *just need* to do? Is crippling the backend architecture to keep memory
> down justified? Is this something we could enable under an option?

I also recently did a prototype that enables MachineModulePasses
(https://github.com/MatzeB/llvm/tree/MachineModulePass). I assume your
patch looks similar (I did not do any measurements with mine).

I believe this could be done in a way so the memory stays at current levels if
no machine module pass is used (simply free the memory used by the
MachineFunction after we AsmPrinted it). So that would at least make the
infrastructure available for people to experiment with. So maybe that is a good
first step?
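
A minimal sketch of that opt-in behaviour, as a toy driver rather than the actual LLVM pass manager: `run_backend` and its arguments are hypothetical names, strings stand in for MachineFunctions, and appending to `emitted` stands in for the AsmPrinter.

```python
def run_backend(functions, module_passes=()):
    """Toy backend driver.

    Returns (emitted_names, peak_number_of_live_machine_functions).
    """
    live = {}        # MachineFunctions currently held in memory
    emitted = []
    peak = 0

    for name in functions:
        live[name] = f"MF({name})"      # stand-in for instruction selection
        peak = max(peak, len(live))
        if not module_passes:
            # No module pass scheduled: print and free immediately,
            # so at most one MachineFunction is ever live.
            emitted.append(name)
            del live[name]

    if module_passes:
        # Opt-in path: every MachineFunction is kept until the module
        # passes have seen all of them, and only then printed and freed.
        for module_pass in module_passes:
            module_pass(live)
        for name in list(live):
            emitted.append(name)
            del live[name]

    return emitted, peak


print(run_backend(["a", "b", "c"]))                         # (['a', 'b', 'c'], 1)
print(run_backend(["a", "b", "c"], [lambda live: None]))    # (['a', 'b', 'c'], 3)
```

The point of the sketch is that the default path keeps today's memory profile, and only a pipeline that actually schedules a module pass pays for holding everything.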

So far all the uses I have seen did not convince me that the increased memory
consumption is worth it (by default). But IMO if we can provide the
infrastructure for MachineModulePasses with nearly zero cost in case no
MachineModulePass is used, then I’d say we should go for it.

- Matthias

Pete Cooper via llvm-dev

2016-Jul-17 19:52 UTC

Hi James, Matthias

I recently proposed the idea of deleting the IR for a function after it has
reached the AsmPrinter.  This was before the idea of MachineModulePasses.  I was
seeing peak memory savings of 20% when LTOing clang itself.

So it might be possible to trade the memory so that IR and MachineIR aren’t live
at the same time for all functions.  But that depends on whether a
MachineModulePass would want access to the IR.  AA is the typical example of
MachineIR referencing back to IR.  I knew it was safe to delete IR when we hit
the AsmPrinter because nothing at (or beyond) that point needs AA.
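
The trade described here can be illustrated with a deliberately crude model. The sizes are invented, `peak_memory` is a hypothetical helper, and the model ignores everything else the backend allocates, so it does not reproduce any measured number. Dropping a function's IR as soon as it is lowered is also only valid if nothing later (alias analysis, for example) needs that IR.

```python
def peak_memory(ir_sizes, mir_sizes, keep_all_mir, drop_ir_when_lowered):
    """Peak of (live IR + live MachineIR) while lowering functions in order."""
    live_ir = sum(ir_sizes)   # the whole module's IR exists when the backend starts
    live_mir = 0
    peak = live_ir
    for ir_size, mir_size in zip(ir_sizes, mir_sizes):
        live_mir += mir_size                  # MachineFunction is built
        peak = max(peak, live_ir + live_mir)
        if drop_ir_when_lowered:
            live_ir -= ir_size                # IR for this function is deleted
        if not keep_all_mir:
            live_mir -= mir_size              # emitted and freed right away
    return peak


ir = [30, 20, 10]    # made-up per-function IR sizes
mir = [15, 10, 5]    # made-up per-function MachineIR sizes

print(peak_memory(ir, mir, keep_all_mir=False, drop_ir_when_lowered=False))  # 75
print(peak_memory(ir, mir, keep_all_mir=True, drop_ir_when_lowered=False))   # 90
print(peak_memory(ir, mir, keep_all_mir=True, drop_ir_when_lowered=True))    # 75
```

In this toy setup, keeping every MachineFunction alive raises the peak from 75 to 90, and deleting each function's IR once it has been lowered brings it back to 75.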

Cheers,
Pete

Mehdi Amini via llvm-dev

2016-Jul-18 00:15 UTC

> On Jul 17, 2016, at 9:34 AM, James Molloy via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> [...]
>
> I've prototyped this change and have measured peak memory usage in the
> worst case scenario - LTO'ing llc and clang. Without further ado:
>
>   llvm-lto llc:   before: 1.44GB maximum resident set size
>                   after:  1.68GB (+17%)
> 
>   llvm-lto clang: before: 2.48GB maximum resident set size
>                   after:  3.42GB (+33%)
These are non-debug builds; would you mind trying with debug info?

Thanks,

— 
Mehdi

James Molloy via llvm-dev

2016-Jul-18 09:23 UTC

Hi Mehdi,

I've just done an LTO build of llc with debug info:
  before: 7.66GB maximum resident set size
  after:   8.05GB (+5.1%)

Cheers,

James

Justin Bogner via llvm-dev

2016-Jul-19 07:21 UTC

James Molloy via llvm-dev <llvm-dev at lists.llvm.org> writes:
> [...]
> The increases are very large. This is worst-case (non-LTO builds would see
> the peak usage of the backend masked by the peak of the midend) but still -
> pretty big. Thoughts? Is this completely no-go? Is this something that we
> *just need* to do? Is crippling the backend architecture to keep memory
> down justified? Is this something we could enable under an option?

Personally, I think this price is too high. I think that if we want to
enable machine module passes (which we probably do) we need to turn
MachineFunction into more of a first class object that isn't just a
wrapper around IR.

This can and should be designed to work something like Pete's solution,
where we get rid of the IR and just have machine level stuff in memory.
This way, we may still increase the memory usage here, but it should be
far less dramatic.

You'll note that doing this also has tangential benefits - it should be
helpful for simplifying MIR and generally improving testability of the
backends.

James Molloy via llvm-dev

2016-Jul-19 14:16 UTC

Hi all,

I like all the ideas so far. Here are my thoughts:

I think that fundamentally users of LLVM should be able to opt in to more
aggressive or intensive computation at compile time if they wish. Users'
needs differ, and while a 33% increase in peak memory for a clang LTO build
is absolutely out of the question for some people, for those developing
microcontrollers or HPC applications it may well be irrelevant. Either the
volume of code expected is significantly smaller or they're happy to trade
off compile time for expensive server time. That does not mean that we
shouldn't strive for a solution that is acceptable to all users. On the
other hand, making something opt-in makes it non-default, and that
increases the testing surface.

Tangentially, I think that LLVM currently doesn't have the right tuning
knobs to allow the user to select their desired tradeoff. We have one
optimization flag -O{s,z,0,1,2,3} which encodes both the optimization *goal*
(a point on the Pareto curve between size and speed) and the amount of
effort to expend at compile time achieving that goal. Anyway, that's beside
the point.

I like Justin's idea of removing IR from the backend to free up memory. I
think it's a very long-term project though, one that requires significant
(re)design: alias analysis access in the backend would be completely broken,
since BasicAA among others depends on seeing the IR at query time. We'd need
to work out a way of providing alias analysis with no IR present. I don't
think that is feasible for the near future.

So my suggestion is that we go with Matthias' idea - do the small amount of
refactoring needed to allow MachineModulePasses on an opt-in basis. The
knobs to enable that opt-in might need some more bikeshedding.

Cheers,

James
