On 23/11/2012, at 8:28 PM, David Tweed wrote:> Firstly I don't think TeX does pre-estimation of work, it's just the > standard stepwise "refinement stuff".I didn't mean that example to be taken so literally. When TeX formats a paragraph, it has to decide where to put line breaks. Breaking a line on a space has less badness that hyphenating a word. If I recall hyphenations can have different badnesses. The thing is that stretching whitespace also has a badness dependent on how much you stretch it. So TeX actually considers several possible break points, and calculates a badness for each one. but there's more. The choice of a break point in a line affects the next line. And the one after that, etc. TeX tries to solve for optimal layout of the whole paragraph. That's why it produces such awesome results.> However, it's also potentially introducing highly > non-predictable/principle-of-least-surprise behaviour into the compiler, > particularly since llvm can be used as a JIT. So I suspect it wouldn't be > something most people would want for mainstream clang/llvm, but if might be > interesting if someone were to implement some passes like this (probably > out-of-tree) and see if it proves beneficial.My problem is that I get quantum non-determinism :) Either the compiler finishes and does a "reasonable job" or it takes so long the whole thing gets killed. Of course, I could re-run the compiler with a lower optimisation switch in that case. I think I might try that.> > Of course Sean's point that superlinear behaviour ought to be reported as > it's probably a bug is very true,Isn't data flow analysis O(N^3)? You cannot do proper alias analysis without it. -- john skaller skaller at users.sourceforge.net http://felix-lang.org
On 23/11/2012, at 8:28 PM, David Tweed wrote: | So TeX actually considers several possible break points, | and calculates a badness for each one. but there's more. | The choice of a break point in a line affects the next line. | And the one after that, etc. TeX tries to solve for optimal layout | of the whole paragraph. That's why it produces such awesome | results. Yep; the project part of my MSc was on dynamic programming problems similar to the TeX one. However, it doesn't actually first attempt to estimate how much work a paragraph is likely to take to break, it just uses a clever dynamic programming algorithm to build up a globally optimal set of breaks regardless of work cost. Only if the badness is still too high does it try to do extra breaks with hyphenations (and it was a long time ago I looked at this, but I think once it's decided to hyphenate a given word it stops caring about the whitespace badness of the break in favour of a "non-misleading" hyphenation -- such as avoiding weeknights becoming wee-knights). Anyway, this is becoming a digression.> However, it's also potentially introducing highly > non-predictable/principle-of-least-surprise behaviour into the compiler, > particularly since llvm can be used as a JIT. So I suspect it wouldn't be > something most people would want for mainstream clang/llvm, but if might be > interesting if someone were to implement some passes like this (probably > out-of-tree) and see if it proves beneficial.My problem is that I get quantum non-determinism :) Either the compiler finishes and does a "reasonable job" or it takes so long the whole thing gets killed. That really sounds like a bug (either software or hardware): whether it's fast or slow, the compilation time (for the same options) should be essentially the same for the same piece of code. Regards, Dave
{Veering a bit off-topic} On 23 Nov 2012, at 10:36, David Tweed wrote:> Yep; the project part of my MSc was on dynamic programming problems similar to the TeX one. However, it doesn't actually first attempt to estimate how much work a paragraph is likely to take to break, it just uses a clever dynamic programming algorithm to build up a globally optimal set of breaks regardless of work cost. Only if the badness is still too high does it try to do extra breaks with hyphenations (and it was a long time ago I looked at this, but I think once it's decided to hyphenate a given word it stops caring about the whitespace badness of the break in favour of a "non-misleading" hyphenation -- such as avoiding weeknights becoming wee-knights). Anyway, this is becoming a digression.Franklin Mark Liang's PhD thesis is quite readable and very informative for anyone actually interested in this. The most interesting feature is that TeX uses a very good dynamic programming approach for laying out text in paragraphs but a uses an ugly greedy algorithm for laying out paragraphs on a page. This decision was made because doing the same dynamic processing approach for every paragraph in the document would have required several megabytes of memory for large documents, which was completely unfeasible on the computers of the time. It provides a good lesson that optimisations that made sense for one system are worth revisiting periodically. David