On Sun, Aug 23, 2009 at 3:29 PM, Kenneth Uildriks <kennethuil at gmail.com> wrote:
> On Sun, Aug 23, 2009 at 4:56 PM, Daniel Dunbar <daniel at zuster.org> wrote:
>> We would like to have access to some kind of regular expression
>> library inside LLVM. For example, we need this to extend the FileCheck
>> test case checking tool to support regular expressions.
>>
>> There are three obvious options:
>> 1. Roll our own library. Multiple unnamed individuals may even
>> already have implementations lying around! :)
>> 2. Use POSIX regcomp facilities. This implies importing some
>> implementation of this interface, e.g., Windows. On Linux, BSD, etc.
>> we would try to use the platform version if available (and non-buggy).
>>
>> 3. Import a more heavy weight library such as PCRE, and use it universally.
>
>
> Personally, I'm a big fan of the Boost libraries. They've got a regex
> library, and a full-blown parser library (which I am using in my
> front-end). It's definitely heavier than POSIX, but it's portable,
> well-tested, and loaded with features.

This is too heavy, and we don't need the extra features, and regexec is well tested and much more standard. Unless there is an overwhelming agreement to add another option, I'd like to keep the discussion to the obvious choices. That is, I need to be convinced *not* to use #2, before I get derailed into discussing which form of #3 to take.

- Daniel
Blast, LLVM list not filling in the response headers still...

On Sun, Aug 23, 2009 at 6:32 PM, Daniel Dunbar <daniel at zuster.org> wrote:
> On Sun, Aug 23, 2009 at 3:29 PM, Kenneth Uildriks <kennethuil at gmail.com> wrote:
>> On Sun, Aug 23, 2009 at 4:56 PM, Daniel Dunbar <daniel at zuster.org> wrote:
>>> We would like to have access to some kind of regular expression
>>> library inside LLVM. For example, we need this to extend the FileCheck
>>> test case checking tool to support regular expressions.
>>>
>>> There are three obvious options:
>>> 1. Roll our own library. Multiple unnamed individuals may even
>>> already have implementations lying around! :)
>>> 2. Use POSIX regcomp facilities. This implies importing some
>>> implementation of this interface, e.g., Windows. On Linux, BSD, etc.
>>> we would try to use the platform version if available (and non-buggy).
>>>
>>> 3. Import a more heavy weight library such as PCRE, and use it universally.
>>
>>
>> Personally, I'm a big fan of the Boost libraries. They've got a regex
>> library, and a full-blown parser library (which I am using in my
>> front-end). It's definitely heavier than POSIX, but it's portable,
>> well-tested, and loaded with features.
>
> This is too heavy, and we don't need the extra features, and regexec
> is well tested and much more standard. Unless there is an overwhelming
> agreement to add another option, I'd like to keep the discussion to
> the obvious choices. That is, I need to be convinced *not* to use #2,
> before I get derailed into discussing which form of #3 to take.

POSIX has a tendency to be rather useless on some major platforms, notably Windows (which has no built-in regex library). That is why I still recommend Spirit: it can be as lightweight as you want, you only pay for what you use, and it works fast and everywhere.
Boost.xpressive is also quite good if you need the dynamic functionality (you really should not, how often is the grammar/regex going to be generated at runtime anyway?).
On Sun, Aug 23, 2009 at 6:32 PM, Daniel Dunbar <daniel at zuster.org> wrote:
> This is too heavy, and we don't need the extra features, and regexec
> is well tested and much more standard. Unless there is an overwhelming

I had never heard of 'regexec' and figured it was a library; it turns out to be a function call on *nix systems. Yeah, that is very much not usable in any way, shape, or form, and it is certainly not a standard if it does not work on one of the major LLVM platforms (and it is still not a standard in any pure form, since it is not part of the C/C++ standard headers). If that is option #2, then option #2 is very unusable.

And yes, if you must know, I program on Windows, which is why I am pushing to use something that actually works everywhere instead of just someone's favorite OS (I prefer BSD honestly, but Windows is what the desktop world is sadly stuck on, so that is what I have to program for).
Daniel Dunbar wrote:
> This is too heavy, and we don't need the extra features, and regexec
> is well tested and much more standard. Unless there is an overwhelming

Actually, Boost is much more standard. IIRC the Boost library corresponds to tr1::regex (or std::regex in C++0x), which has the incredible advantage of being available on all standard-conforming C++ compilers. Admittedly the C++ compiler has to be fairly new to support this, but you can use Boost as a drop-in replacement until the compiler supports it on its own.

Thomas
On Aug 23, 2009, at 5:50 PM, OvermindDL1 wrote:
> On Sun, Aug 23, 2009 at 6:32 PM, Daniel Dunbar <daniel at zuster.org>
> wrote:
>> This is too heavy, and we don't need the extra features, and regexec
>> is well tested and much more standard. Unless there is an
>> overwhelming
>
> 'regexec' I had never heard of, figured it was a library, turns out it
> is a function call on *nix systems, yea, that is very much not usable
> in any way shape or form, and is certainly not a standard if it does
> not work on one of the major LLVM platforms (and it is still not a
> standard in any pure form since it is not part of the C/C++ standard
> headers). If that is option #2, then option #2 is very unusable.
>
> And yes, if you must know, I program on Windows, which is why I am
> pushing to use something that actually works everywhere instead of
> just someone's favorite OS (I prefer BSD honestly, but Windows is what
> the desktop world is sadly stuck on, so that is what I have to program
> for).

I think you're seriously confused about the proposal. To put it bluntly, there is no way we'll use Boost's regex support, sorry.

The proposal is to use the Unix standard regexec library interface. The LLVM tree would include an imported BSD-licenced implementation from one of many sources. We'd then have configury logic detect when the host OS already supports the regexec interfaces, and if so, not build our imported copy.

We'd have a simple layer on top of it to make the interface to the regex library less horrible than what regexec provides.

Again, forget Boost regex. :)

-Chris
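
For illustration only, here is a minimal sketch of what such a "simple layer" over the POSIX regcomp/regexec interface might look like. It assumes a <regex.h> is available, whether from the host OS or from an imported copy. The class name SimpleRegex and its methods are hypothetical and are not taken from the thread or from LLVM.

```cpp
#include <regex.h>   // POSIX regcomp / regexec / regerror / regfree
#include <string>

// Hypothetical wrapper (name and methods are assumptions for illustration).
// It hides the regex_t lifetime and the C-style error reporting.
class SimpleRegex {
public:
  // Compile Pattern as a POSIX extended regular expression.
  explicit SimpleRegex(const std::string &Pattern) {
    int RC = regcomp(&Preg, Pattern.c_str(), REG_EXTENDED);
    Valid = (RC == 0);
    if (!Valid) {
      // Turn the numeric error code into a readable message.
      char Buf[256];
      regerror(RC, &Preg, Buf, sizeof(Buf));
      Error = Buf;
    }
  }

  // Only free the compiled pattern if compilation succeeded.
  ~SimpleRegex() {
    if (Valid)
      regfree(&Preg);
  }

  // The compiled pattern owns resources, so forbid copying.
  SimpleRegex(const SimpleRegex &) = delete;
  SimpleRegex &operator=(const SimpleRegex &) = delete;

  bool isValid() const { return Valid; }
  const std::string &error() const { return Error; }

  // Return true if the pattern matches anywhere in Text. If Matched is
  // non-null, it receives the text of the overall match.
  bool match(const std::string &Text, std::string *Matched = nullptr) const {
    if (!Valid)
      return false;
    regmatch_t M[1];
    if (regexec(&Preg, Text.c_str(), 1, M, 0) != 0)
      return false;
    if (Matched)
      *Matched = Text.substr(M[0].rm_so, M[0].rm_eo - M[0].rm_so);
    return true;
  }

private:
  regex_t Preg;
  bool Valid = false;
  std::string Error;
};
```

A caller would construct a SimpleRegex from a pattern string, check isValid(), and then call match() on each candidate line. The build-time choice between the host's regexec and an imported implementation would sit below this layer and is not shown here.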