Richard W.M. Jones
2017-Aug-02 11:52 UTC
Re: [Libguestfs] [PATCH 0/2] Add lightweight bindings for PCRE.
On Wed, Aug 02, 2017 at 12:33:14PM +0200, Pino Toscano wrote:> Hi, > > (replying here since v2 of the series does not have this explanation.) > > On Tuesday, 1 August 2017 16:00:15 CEST Richard W.M. Jones wrote: > > We'd like to use PCRE instead of the awful Str module. However I > > don't necessarily want to pull in the extra dependency of ocaml-pcre, > > and in any case ocaml-pcre is rather difficult to use. > > > > This introduces very simplified and lightweight bindings for PCRE. > > > > They work rather like Str in that there is some global state (actually > > thread-local in this implementation) between the matching and the > > getting the substring, so you can write code like this: > > > > let re = PCRE.compile "(a+)b" > > ... > > > > if PCRE.matches re "ccaaaabb" then ( > > let whole = PCRE.sub 0 in (* returns "aaaab" *) > > let first = PCRE.sub 1 in (* returns "aaaa" *) > > ... > > Since we are providing a better module, with a different API (which > needs changes), what about removing the usage of a global state, in > favour of a match object holding the captures? Something like > (starting from your example above): > > let re = PCRE.compile "(a+)b" in > try > let m = PCRE.match re "ccaaaabb" in > let whole = PCRE.sub m 0 in (* returns "aaaab" *) > let first = PCRE.sub m 1 in (* returns "aaaa" *) > with Not_matched _ -> > ...That's what I was trying to avoid. I think the if statement with global state is much easier to use.> This makes it possible to stop thinking about what was the last saved > state, and even keep the multiple results of matches at the same time.I've converted all of the daemon code to this form, and this is not an issue that came up.> Also the results are properly GC'ed once they get out of scope, and not > linger until the thread finish (or the program shutdown). > The drawback I see is that many of the Str usages are in chains of > "if ... else if ...", which could make the code slightly more complex. > > Of course PCRE.matches ought to be left, but it would just return > whether the re matched, without changing any global state, and without > any result available.I think you're suggesting this: let m = PCRE.exec re "ccaaaabb" in if PCRE.matches m then ( let whole = PCRE.sub m 0 in ... (‘match’ is a reserved word, and I've got rid of the ‘Not_matched’ exception which is a pain to deal with), which is sort of OK but still more code than what I'm writing which is: else if PCRE.matches re_xdev spec then ( debug_matching "xdev"; let typ = PCRE.sub 1 and disk = PCRE.sub 2 and part = int_of_string (PCRE.sub 3) in resolve_xdev typ disk part default ) else if PCRE.matches re_cciss spec then ( debug_matching "cciss"; let disk = PCRE.sub 1 and part = try Some (int_of_string (PCRE.sub 2)) with Not_found -> None in resolve_cciss disk part default ) else if PCRE.matches re_mdN spec then ( debug_matching "md<N>"; try Mountable.of_device (StringMap.find spec md_map) with | Not_found -> default ) Rewriting if/else chains without the global state is a pain too. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com Fedora Windows cross-compiler. Compile Windows programs, test, and build Windows installers. Over 100 libraries supported. http://fedoraproject.org/wiki/MinGW
Pino Toscano
2017-Aug-02 13:01 UTC
Re: [Libguestfs] [PATCH 0/2] Add lightweight bindings for PCRE.
On Wednesday, 2 August 2017 13:52:06 CEST Richard W.M. Jones wrote:> On Wed, Aug 02, 2017 at 12:33:14PM +0200, Pino Toscano wrote: > > Hi, > > > > (replying here since v2 of the series does not have this explanation.) > > > > On Tuesday, 1 August 2017 16:00:15 CEST Richard W.M. Jones wrote: > > > We'd like to use PCRE instead of the awful Str module. However I > > > don't necessarily want to pull in the extra dependency of ocaml-pcre, > > > and in any case ocaml-pcre is rather difficult to use. > > > > > > This introduces very simplified and lightweight bindings for PCRE. > > > > > > They work rather like Str in that there is some global state (actually > > > thread-local in this implementation) between the matching and the > > > getting the substring, so you can write code like this: > > > > > > let re = PCRE.compile "(a+)b" > > > ... > > > > > > if PCRE.matches re "ccaaaabb" then ( > > > let whole = PCRE.sub 0 in (* returns "aaaab" *) > > > let first = PCRE.sub 1 in (* returns "aaaa" *) > > > ... > > > > Since we are providing a better module, with a different API (which > > needs changes), what about removing the usage of a global state, in > > favour of a match object holding the captures? Something like > > (starting from your example above): > > > > let re = PCRE.compile "(a+)b" in > > try > > let m = PCRE.match re "ccaaaabb" in > > let whole = PCRE.sub m 0 in (* returns "aaaab" *) > > let first = PCRE.sub m 1 in (* returns "aaaa" *) > > with Not_matched _ -> > > ... > > That's what I was trying to avoid. I think the if statement with > global state is much easier to use. > > > This makes it possible to stop thinking about what was the last saved > > state, and even keep the multiple results of matches at the same time. > > I've converted all of the daemon code to this form, and this is > not an issue that came up.Right, because we have already these constraints because of Str.> > > Also the results are properly GC'ed once they get out of scope, and not > > linger until the thread finish (or the program shutdown). > > The drawback I see is that many of the Str usages are in chains of > > "if ... else if ...", which could make the code slightly more complex. > > > > Of course PCRE.matches ought to be left, but it would just return > > whether the re matched, without changing any global state, and without > > any result available. > > I think you're suggesting this: > > let m = PCRE.exec re "ccaaaabb" in > if PCRE.matches m then ( > let whole = PCRE.sub m 0 inNot really, my suggestion was to have a separate object representing the result of a regex match -- much like other language have in their regex APIs. OTOH, this solution LGTM as well: the result of the regex is not saved in a thread-local variable, but directly in the same regex object, so can be kept/used around, and it is GC'ed when not needed anymore. If you could apply that change, that'd be a LGTM. Thanks, -- Pino Toscano
Richard W.M. Jones
2017-Aug-02 18:18 UTC
Re: [Libguestfs] [PATCH 0/2] Add lightweight bindings for PCRE.
On Wed, Aug 02, 2017 at 03:01:53PM +0200, Pino Toscano wrote:> On Wednesday, 2 August 2017 13:52:06 CEST Richard W.M. Jones wrote: > > I think you're suggesting this: > > > > let m = PCRE.exec re "ccaaaabb" in > > if PCRE.matches m then ( > > let whole = PCRE.sub m 0 in > > Not really, my suggestion was to have a separate object representing > the result of a regex match -- much like other language have in their > regex APIs. > > OTOH, this solution LGTM as well: the result of the regex is not saved > in a thread-local variable, but directly in the same regex object, so > can be kept/used around, and it is GC'ed when not needed anymore. > If you could apply that change, that'd be a LGTM.But the problem still is this makes the if/else chains more awkward. What's really wrong with the thread-local variable? It only stores a single string and a vector of integers per thread (at most). The final string/vector is reliably freed on exit - I know that because I've checked it with valgrind. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into KVM guests. http://libguestfs.org/virt-v2v