* Scott Michel:

> On my _local_ x86_64 Ubuntu 7.10 machine, the shift_ops.ll test is an
> unexpected success (i.e., "grep -w shlh %t1.s | count 9" succeeds.)
>
> I get the same unexpected success on my x86_64 Mac 10.4.11.
>
> On the x86_64 buildbot, the same test fails. The culprit is grep,
> evidently. It's just that simple.

There have been issues with the GNU libc regular expression code. Try
running with "unset LANG" (or "LC_ALL=C") and see if it improves
things.

The problem is that the regexp code used to be unacceptably slow in
multi-byte locales such as UTF-8, and the patch Debian applied to
improve its speed wasn't 100% correct.
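The locale workaround can be scripted when rerunning the failing check by hand. Below is a minimal Python sketch, assuming `grep` is on `PATH`; the helper names `c_locale_env` and `grep_word_count` are hypothetical and not part of LLVM's test harness. It mirrors the `grep -w shlh %t1.s | count 9` check with LANG unset and LC_ALL forced to C, so the C library treats the input as single-byte text.

```python
import os
import subprocess
from typing import Mapping, Optional


def c_locale_env(base: Optional[Mapping[str, str]] = None) -> dict:
    """Copy of the environment with LANG unset and LC_ALL forced to "C"."""
    env = dict(os.environ if base is None else base)
    env.pop("LANG", None)   # the "unset LANG" suggestion
    env["LC_ALL"] = "C"     # LC_ALL takes precedence over LANG and the other LC_* variables
    return env


def grep_word_count(pattern: str, path: str) -> int:
    """Roughly `grep -w PATTERN PATH | count N`: number of matching lines."""
    result = subprocess.run(
        ["grep", "-w", "-e", pattern, path],
        env=c_locale_env(),
        capture_output=True,
        text=True,
    )
    # grep exits 0 if something matched, 1 if nothing did, 2 on error.
    if result.returncode > 1:
        raise RuntimeError(result.stderr.strip())
    return len(result.stdout.splitlines())
```

For example, comparing `grep_word_count("shlh", "shift_ops.s")` against 9 (the path is a placeholder for the assembly file the test emits) should show whether the mismatch disappears under the C locale.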
On Wed, Dec 31, 2008 at 3:35 AM, Florian Weimer <fw at deneb.enyo.de> wrote:
> * Scott Michel:
>
>> On my _local_ x86_64 Ubuntu 7.10 machine, the shift_ops.ll test is an
>> unexpected success (i.e., "grep -w shlh %t1.s | count 9" succeeds.)
>>
>> I get the same unexpected success on my x86_64 Mac 10.4.11.
>>
>> On the x86_64 buildbot, the same test fails. The culprit is grep,
>> evidently. It's just that simple.
>
> There have been issues with the GNU libc regular expression code. Try
> running with "unset LANG" (or "LC_ALL=C") and see if it improves
> things.
>
> The problem is that the regexp code used to be unacceptably slow in
> multi-byte locales such as UTF-8, and the patch Debian applied to
> improve its speed wasn't 100% correct.

Considering most regexps can be matched in linear time, it seems fairly
dumb to break them to get speed instead of simply changing algorithms.
(The fact that most implementations suck badly, well....)
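To make the linear-time point concrete, here is a minimal sketch of Thompson-style NFA simulation in Python. It assumes a toy pattern language (literal characters, `.`, and postfix `*`, with no groups or alternation) and works on already-decoded `str` text, so it says nothing about multi-byte conversion cost. Instead of backtracking, it tracks the set of pattern positions that are still live; each input character updates that set once, giving O(len(text) × len(pattern)) time. The function names are illustrative, not taken from any real regex library.

```python
from typing import List, Set, Tuple

Token = Tuple[str, bool]  # (atom, is_starred); "." matches any character


def parse(pattern: str) -> List[Token]:
    """Split a toy pattern into atoms, each optionally followed by '*'."""
    tokens: List[Token] = []
    i = 0
    while i < len(pattern):
        starred = i + 1 < len(pattern) and pattern[i + 1] == "*"
        tokens.append((pattern[i], starred))
        i += 2 if starred else 1
    return tokens


def closure(states: Set[int], tokens: List[Token]) -> Set[int]:
    """Add states reachable without input: a starred atom may match zero times."""
    result: Set[int] = set()
    stack = list(states)
    while stack:
        k = stack.pop()
        if k in result:
            continue
        result.add(k)
        if k < len(tokens) and tokens[k][1]:
            stack.append(k + 1)
    return result


def step(states: Set[int], ch: str, tokens: List[Token]) -> Set[int]:
    """Advance every live state over one input character."""
    nxt: Set[int] = set()
    for k in states:
        if k < len(tokens):
            atom, starred = tokens[k]
            if atom == "." or atom == ch:
                nxt.add(k if starred else k + 1)  # a starred atom may repeat
    return closure(nxt, tokens)


def fullmatch(pattern: str, text: str) -> bool:
    """True if the whole of `text` matches `pattern`."""
    tokens = parse(pattern)
    states = closure({0}, tokens)
    for ch in text:
        states = step(states, ch, tokens)
    return len(tokens) in states


def search(pattern: str, text: str) -> bool:
    """True if some substring of `text` matches (grep-style)."""
    tokens = parse(pattern)
    start = closure({0}, tokens)
    states = set(start)
    for ch in text:
        if len(tokens) in states:
            return True
        # A new match attempt may begin at the next position.
        states = step(states, ch, tokens) | start
    return len(tokens) in states
```

Inputs such as `search("a*a*a*b", "a" * 30)`, where a naive backtracking matcher would try many ways of splitting the `a`s among the stars, are handled here in a single pass over the text.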
* Daniel Berlin:

>> There have been issues with the GNU libc regular expression code. Try
>> running with "unset LANG" (or "LC_ALL=C") and see if it improves
>> things.
>>
>> The problem is that the regexp code used to be unacceptably slow in
>> multi-byte locales such as UTF-8, and the patch Debian applied to
>> improve its speed wasn't 100% correct.
>
> Considering most regexps can be matched in linear time, it seems fairly
> dumb to break them to get speed instead of simply changing
> algorithms.

IIRC, it's not an issue of complexity classes. With multi-byte
character set conversion, the constant factor is just too large.