thr3ads.net - llvm dev - [LLVMdev] [cfe-dev] costing optimisations [Nov 2012]

If this information is useful, please help other people find it:
Share via:

john skaller

2012-Nov-23 10:19 UTC

[LLVMdev] [cfe-dev] costing optimisations

On 23/11/2012, at 8:28 PM, David Tweed wrote:
> Firstly I don't think TeX does pre-estimation of work, it's just
the
> standard stepwise "refinement stuff".
I didn't mean that example to be taken so literally.
When TeX formats a paragraph, it has to decide
where to put line breaks. Breaking a line on a space
has less badness that hyphenating a word. If I recall
hyphenations can have different badnesses. The thing is
that stretching whitespace also has a badness dependent on
how much you stretch it.

So TeX actually considers several possible break points,
and calculates a badness for each one.  but there's more.
The choice of a break point in a line affects the next line.
And the one after that, etc. TeX tries to solve for optimal layout
of the whole paragraph. That's why it produces such awesome
results.
> However, it's also potentially introducing highly
> non-predictable/principle-of-least-surprise behaviour into the compiler,
> particularly since llvm can be used as a JIT. So I suspect it wouldn't
be
> something most people would want for mainstream clang/llvm, but if might be
> interesting if someone were to implement some passes like this (probably
> out-of-tree) and see if it proves beneficial.
My problem is that I get quantum non-determinism :)
Either the compiler finishes and does a "reasonable job"
or it takes so long the whole thing gets killed.

Of course, I could re-run the compiler with a lower optimisation
switch in that case. I think I might try that.
> 
> Of course Sean's point that superlinear behaviour ought to be reported
as
> it's probably a bug is very true,

Isn't data flow analysis O(N^3)?
You cannot do proper alias analysis without it.


--
john skaller
skaller at users.sourceforge.net
http://felix-lang.org

David Tweed

2012-Nov-23 10:36 UTC

head link

[LLVMdev] [cfe-dev] costing optimisations

On 23/11/2012, at 8:28 PM, David Tweed wrote:

| So TeX actually considers several possible break points,
| and calculates a badness for each one.  but there's more.
| The choice of a break point in a line affects the next line.
| And the one after that, etc. TeX tries to solve for optimal layout
| of the whole paragraph. That's why it produces such awesome
| results.

Yep; the project part of my MSc was on dynamic programming problems similar to
the TeX one. However, it doesn't actually first attempt to estimate how much
work a paragraph is likely to take to break, it just uses a clever dynamic
programming algorithm to build up a globally optimal set of breaks regardless of
work cost. Only if the badness is still too high does it try to do extra breaks
with hyphenations (and it was a long time ago I looked at this, but I think once
it's decided to hyphenate a given word it stops caring about the whitespace
badness of the break in favour of a "non-misleading" hyphenation --
such as avoiding weeknights becoming wee-knights). Anyway, this is becoming a
digression.
> However, it's also potentially introducing highly
> non-predictable/principle-of-least-surprise behaviour into the compiler,
> particularly since llvm can be used as a JIT. So I suspect it wouldn't
be
> something most people would want for mainstream clang/llvm, but if might be
> interesting if someone were to implement some passes like this (probably
> out-of-tree) and see if it proves beneficial.
My problem is that I get quantum non-determinism :)
Either the compiler finishes and does a "reasonable job"
or it takes so long the whole thing gets killed.

That really sounds like a bug (either software or hardware): whether it's
fast or slow, the compilation time (for the same options) should be essentially
the same for the same piece of code.

Regards,
Dave

David Chisnall

2012-Nov-23 11:39 UTC

head link

[LLVMdev] [cfe-dev] costing optimisations

{Veering a bit off-topic}

On 23 Nov 2012, at 10:36, David Tweed wrote:
> Yep; the project part of my MSc was on dynamic programming problems similar
to the TeX one. However, it doesn't actually first attempt to estimate how
much work a paragraph is likely to take to break, it just uses a clever dynamic
programming algorithm to build up a globally optimal set of breaks regardless of
work cost. Only if the badness is still too high does it try to do extra breaks
with hyphenations (and it was a long time ago I looked at this, but I think once
it's decided to hyphenate a given word it stops caring about the whitespace
badness of the break in favour of a "non-misleading" hyphenation --
such as avoiding weeknights becoming wee-knights). Anyway, this is becoming a
digression.
Franklin Mark Liang's PhD thesis is quite readable and very informative for
anyone actually interested in this.  The most interesting feature is that TeX
uses a very good dynamic programming approach for laying out text in paragraphs
but a uses an ugly greedy algorithm for laying out paragraphs on a page.  This
decision was made because doing the same dynamic processing approach for every
paragraph in the document would have required several megabytes of memory for
large documents, which was completely unfeasible on the computers of the time. 
It provides a good lesson that optimisations that made sense for one system are
worth revisiting periodically.

David

Maybe Matching Threads

Search for more possibly parallel threads

llvm dev - Nov 2012 - [LLVMdev] [cfe-dev] costing optimisations

[LLVMdev] [cfe-dev] costing optimisations

[LLVMdev] [cfe-dev] costing optimisations

[LLVMdev] [cfe-dev] costing optimisations

Maybe Matching Threads