Peter Collingbourne
2013-Jul-13 00:38 UTC
[LLVMdev] Special case list files; a bug and a slowness issue
Hi,

I need to be able to use a special case list file containing thousands of entries (namely, a list of libc symbols, to be used when using DFSan with an uninstrumented libc). Initially I built the symbol list like this:

fun:sym1=uninstrumented
fun:sym2=uninstrumented
fun:sym3=uninstrumented
...
fun:sym6000=uninstrumented

What I found was that, despite various bits of documentation [1,2], the symbol names are matched as substrings, the root cause being that the regular expressions built by the SpecialCaseList class do not contain anchors. The attached unit test demonstrates the problem.

If I modify my symbol list to contain anchors:

fun:^sym1$=uninstrumented
fun:^sym2$=uninstrumented
fun:^sym3$=uninstrumented
...
fun:^sym6000$=uninstrumented

the behaviour is as expected, but compiler run time is slow (on the order of seconds), presumably because our regex library doesn't cope with anchors very efficiently.

I intend to resolve the substring bug and the slow run time issue by using a StringSet for symbol patterns which do not contain regex metacharacters. There would still be a regex for any other patterns, which would have anchors added automatically.

Thoughts?

Thanks,
--
Peter

[1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizer
[2] https://code.google.com/p/thread-sanitizer/wiki/Flags

Attachment: scl.patch (text/x-diff, 842 bytes)
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130712/aebdb84b/attachment.patch>
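
For illustration only, here is a minimal sketch of the scheme proposed above. It is written in Python and is not LLVM's SpecialCaseList code. The class name SymbolMatcher, the choice of metacharacters, and the sample symbol names are all assumptions made for this example. It first shows the substring behaviour of an unanchored regex, then the split between an exact-match set (standing in for the StringSet) and a single automatically anchored regex for the remaining patterns.

```python
import re

# Characters treated as regex metacharacters in this sketch (an assumption,
# not the list used by LLVM).
METACHARS = set("()^$|*+?.[]\\{}")


class SymbolMatcher:
    """Hypothetical stand-in for the proposed matching scheme."""

    def __init__(self, patterns):
        self.literals = set()  # plays the role of the StringSet
        regex_patterns = []
        for pattern in patterns:
            if METACHARS.isdisjoint(pattern):
                # Plain symbol name: exact lookup, no regex needed.
                self.literals.add(pattern)
            else:
                regex_patterns.append(pattern)
        # One combined regex for everything else, anchored automatically.
        self.regex = (
            re.compile("^(?:" + "|".join(regex_patterns) + ")$")
            if regex_patterns
            else None
        )

    def matches(self, name):
        if name in self.literals:
            return True
        return self.regex is not None and self.regex.search(name) is not None


# The bug: an unanchored regex also matches longer names containing "sym1".
unanchored = re.compile("sym1")
substring_hit = unanchored.search("my_sym10_helper") is not None

# The proposed behaviour: whole-string matching for both kinds of pattern.
matcher = SymbolMatcher(["sym1", "sym2", "mem.*"])
```

With this split, thousands of plain symbol names cost one set lookup each, and only the patterns that really use regex syntax reach the regex engine.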
Alexey Samsonov
2013-Jul-16 09:23 UTC
[LLVMdev] Special case list files; a bug and a slowness issue
Hi Peter!

On Sat, Jul 13, 2013 at 4:38 AM, Peter Collingbourne <peter at pcc.me.uk> wrote:

> Hi,
>
> I need to be able to use a special case list file containing thousands
> of entries (namely, a list of libc symbols, to be used when using
> DFSan with an uninstrumented libc). Initially I built the symbol
> list like this:
>
> fun:sym1=uninstrumented
> fun:sym2=uninstrumented
> fun:sym3=uninstrumented
> ...
> fun:sym6000=uninstrumented
>
> What I found was that, despite various bits of documentation [1,2],
> the symbol names are matched as substrings, the root cause being that
> the regular expressions built by the SpecialCaseList class do not
> contain anchors. The attached unit test demonstrates the problem.
> If I modify my symbol list to contain anchors:
>
> fun:^sym1$=uninstrumented
> fun:^sym2$=uninstrumented
> fun:^sym3$=uninstrumented
> ...
> fun:^sym6000$=uninstrumented
>
> the behaviour is as expected, but compiler run time is slow (on the
> order of seconds), presumably because our regex library doesn't cope
> with anchors very efficiently.
>
> I intend to resolve the substring bug and the slow run time issue
> by using a StringSet for symbol patterns which do not contain regex
> metacharacters. There would still be a regex for any other patterns,
> which would have anchors added automatically.
>

I think that it's fine to add anchors automatically to implement the behavior described in the docs (I've LGTMed that patch). Do you want to avoid adding anchors for dfsan SpecialCaseList?

>
> Thoughts?
>
> Thanks,
> --
> Peter
>
> [1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizer
> [2] https://code.google.com/p/thread-sanitizer/wiki/Flags
>

--
Alexey Samsonov, MSK
Peter Collingbourne
2013-Jul-16 18:10 UTC
[LLVMdev] Special case list files; a bug and a slowness issue
On Tue, Jul 16, 2013 at 01:23:30PM +0400, Alexey Samsonov wrote:
> Do you want to avoid adding anchors for dfsan SpecialCaseList?

No, I need the (documented) whole string semantics in dfsan.

Thanks,
--
Peter