Zachary Turner via llvm-dev
2016-Nov-28 19:38 UTC
[llvm-dev] RFC: Constructing StringRefs at compile time
The fact that the templatized constructor falls down because of the possibility of initializing StringRef with a stack-allocated char array kills that idea in my mind. I feel like the only two reasonable solutions are 1) allow UDL for this case, document that this is an exception and that UDLs are still not permitted anywhere else, and require (by policy, since I don't know of a way to have the compiler force it) that this UDL be used only in global constructors. One idea to help "enforce" this policy would be to give the UDL a ridiculously convoluted name, like `string_ref_literal`, so that one would have to write "foo"_string_ref_literal, and then provide a macro like `#define LITERAL(x) x_string_ref_literal`, so that the user writes `StringRef s[] = { LITERAL("a"), LITERAL("b") }; I'm not sure if that's better or worse than `StringRef s[] = { "a"_sr, "b"_sr };`, but at least it's greppable this way. 2) Don't allow global tables of StringRefs. On Mon, Nov 28, 2016 at 11:30 AM Mehdi Amini via llvm-dev < llvm-dev at lists.llvm.org> wrote:> On Nov 28, 2016, at 11:27 AM, David Blaikie <dblaikie at gmail.com> wrote: > > > > On Mon, Nov 28, 2016 at 11:01 AM Mehdi Amini <mehdi.amini at apple.com> > wrote: > > On Nov 28, 2016, at 9:47 AM, David Blaikie via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > > OK - good to know. (not sure we're talking about pessimizing it - just not > adding a new/possible optimization, to be clear) > > > This does not seem that clear to me. The motivation seems to be able to > create global table of StringRef, which we don’t do because the lack fo > constexpr of static initializers right now. > Moving forward it would mean making clang a lot slower when built with > MSVC if we were going this route. > > > Ah, fair - perhaps I misunderstood/misrepresented, apologies. Figured this > was just an attempt to reduce global initializers in arrays we already > have. Any pointers on where the motivation is described/discussed? > > > This thread started with: "There is a desire to be able to create > constexpr StringRefs to avoid static initializers for global tables > of/containing StringRefs.” > > I don’t have more information, but maybe Malcolm can elaborate? > > — > Mehdi > > > > > > Just out of curiosity - are there particular reasons you prefer or need to > ship an MSVC built version, rather than a bootstrapped Clang? > > On Mon, Nov 28, 2016 at 9:24 AM Robinson, Paul <paul.robinson at sony.com> > wrote: > > So I wouldn't personally worry too much about performance degredation when > built with MSVC - if, when building a stage 2 on Windows (building Clang > with MSVC build Clang) you do end up with a compiler with the desired > performance characteristics - then that's probably sufficient. > > > > Hold on there—we deliver an MSVC-built Clang to our licensees, and I would > really rather not pessimize it. > > --paulr > > > > *From:* llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] *On Behalf Of *David > Blaikie via llvm-dev > *Sent:* Friday, November 25, 2016 8:52 AM > *To:* Mueller-Roemer, Johannes Sebastian; Malcolm Parsons; Hal Finkel; > llvm-dev at lists.llvm.org > > > *Subject:* Re: [llvm-dev] RFC: Constructing StringRefs at compile time > > > > > > On Fri, Nov 25, 2016 at 6:10 AM Mueller-Roemer, Johannes Sebastian via > llvm-dev <llvm-dev at lists.llvm.org> wrote: > > What about going for > > template<unsigned N> > constexpr StringRef(const char (&Str)[N]) > > and avoiding strlen entirely for string literals? > > > > You'd at least want an assert in there (that N - 1 == strlen(Str)) in case > a StringRef is ever constructed from a non-const char buffer that's only > partially filled. > > But if we can write this in such a way that it performs well on good > implementations - that seems sufficient. If getting good performance out of > the compiler means bootstrapping - that's pretty much the status quo > already, as I understand it. > > So I wouldn't personally worry too much about performance degredation when > built with MSVC - if, when building a stage 2 on Windows (building Clang > with MSVC build Clang) you do end up with a compiler with the desired > performance characteristics - then that's probably sufficient. > > > > > -----Original Message----- > From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of > Malcolm Parsons via llvm-dev > Sent: Friday, November 25, 2016 13:34 > To: Hal Finkel <hfinkel at anl.gov> > Cc: llvm-dev at lists.llvm.org > Subject: Re: [llvm-dev] RFC: Constructing StringRefs at compile time > > On 24 November 2016 at 15:04, Hal Finkel <hfinkel at anl.gov> wrote: > >> Creating constexpr StringRefs isn't trivial as strlen isn't portably > >> constexpr and std::char_traits<char>::length is only constexpr in > >> C++17. > > > > Why don't we just create our own traits class that has a constexpr > length, and then we can switch over to the standard one when we switch to > C++17? > > GCC and Clang treat __builtin_strlen as constexpr. > MSVC 2015 doesn't support C++14 extended constexpr. I don't know how well > it optimises a recursive strlen. > > This works as an optimisation for GCC and Clang, and doesn't make things > worse for MSVC: > > /// Construct a string ref from a cstring. > LLVM_ATTRIBUTE_ALWAYS_INLINE > +#if __has_builtin(__builtin_strlen) > + /*implicit*/ constexpr StringRef(const char *Str) > + : Data(Str), Length(Str ? __builtin_strlen(Str) : 0) {} #else > /*implicit*/ StringRef(const char *Str) > : Data(Str), Length(Str ? ::strlen(Str) : 0) {} > +#endif > > -- > Malcolm Parsons > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161128/8cd75508/attachment-0001.html>
Zachary Turner via llvm-dev
2016-Nov-28 20:51 UTC
[llvm-dev] RFC: Constructing StringRefs at compile time
What about this? diff --git a/include/llvm/ADT/StringRef.h b/include/llvm/ADT/StringRef.h index d8e0732..5b8503a 100644 --- a/include/llvm/ADT/StringRef.h +++ b/include/llvm/ADT/StringRef.h @@ -84,10 +84,10 @@ namespace llvm { /// Construct a string ref from a pointer and length. LLVM_ATTRIBUTE_ALWAYS_INLINE - /*implicit*/ StringRef(const char *data, size_t length) + /*implicit*/ constexpr StringRef(const char *data, size_t length) : Data(data), Length(length) { - assert((data || length == 0) && - "StringRef cannot be built from a NULL argument with non-null length"); + //assert((data || length == 0) && + //"StringRef cannot be built from a NULL argument with non-null length"); } /// Construct a string ref from an std::string. @@ -839,6 +839,24 @@ namespace llvm { /// @} }; + class StringLiteral { + public: + template<size_t N> + constexpr StringLiteral(const char(&Str)[N]) + : Str(Str), Length(N) { + + } + + constexpr operator StringRef() const { + return StringRef(Str, Length); + } + + private: + const char *Str; + unsigned Length; + }; + + /// @name StringRef Comparison Operators /// @{ diff --git a/unittests/ADT/StringRefTest.cpp b/unittests/ADT/StringRefTest.cpp index c1cc558..0876f2e 100644 --- a/unittests/ADT/StringRefTest.cpp +++ b/unittests/ADT/StringRefTest.cpp @@ -17,6 +17,11 @@ #include "gtest/gtest.h" using namespace llvm; + +constexpr StringLiteral Strings[] = { + "Test" +}; + namespace llvm { The basic idea here is that you introduce a StringLiteral class and anywhere you want to use a global constructor, you make sure to declare a constexpr array instead of a normal array, and you make it of type StringLiteral. This avoids the issue of having to make the StringRef constructor explicit since it doesn't touch StringRef, and it also avoids the ambiguity between constructing a StringRef with a literal versus a char array, because even though it is still possible to construct a StringLiteral with a char array, it is not possible to do so in a constexpr context. If your StringRef is nested in part of another class, and you want an array of that class, it still works. For example: struct Foo { StringLiteral X; StringLiteral Y; int Z; }; constexpr Foo[] Foos = { {"A", "B", 1}, {"C", "D", 2} }; On Mon, Nov 28, 2016 at 11:38 AM Zachary Turner <zturner at google.com> wrote:> The fact that the templatized constructor falls down because of the > possibility of initializing StringRef with a stack-allocated char array > kills that idea in my mind. > > I feel like the only two reasonable solutions are > > 1) allow UDL for this case, document that this is an exception and that > UDLs are still not permitted anywhere else, and require (by policy, since I > don't know of a way to have the compiler force it) that this UDL be used > only in global constructors. One idea to help "enforce" this policy would > be to give the UDL a ridiculously convoluted name, like > `string_ref_literal`, so that one would have to write > "foo"_string_ref_literal, and then provide a macro like `#define LITERAL(x) > x_string_ref_literal`, so that the user writes `StringRef s[] = { > LITERAL("a"), LITERAL("b") }; I'm not sure if that's better or worse than > `StringRef s[] = { "a"_sr, "b"_sr };`, but at least it's greppable this way. > > 2) Don't allow global tables of StringRefs. > > > > On Mon, Nov 28, 2016 at 11:30 AM Mehdi Amini via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > > On Nov 28, 2016, at 11:27 AM, David Blaikie <dblaikie at gmail.com> wrote: > > > > On Mon, Nov 28, 2016 at 11:01 AM Mehdi Amini <mehdi.amini at apple.com> > wrote: > > On Nov 28, 2016, at 9:47 AM, David Blaikie via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > > OK - good to know. (not sure we're talking about pessimizing it - just not > adding a new/possible optimization, to be clear) > > > This does not seem that clear to me. The motivation seems to be able to > create global table of StringRef, which we don’t do because the lack fo > constexpr of static initializers right now. > Moving forward it would mean making clang a lot slower when built with > MSVC if we were going this route. > > > Ah, fair - perhaps I misunderstood/misrepresented, apologies. Figured this > was just an attempt to reduce global initializers in arrays we already > have. Any pointers on where the motivation is described/discussed? > > > This thread started with: "There is a desire to be able to create > constexpr StringRefs to avoid static initializers for global tables > of/containing StringRefs.” > > I don’t have more information, but maybe Malcolm can elaborate? > > — > Mehdi > > > > > > Just out of curiosity - are there particular reasons you prefer or need to > ship an MSVC built version, rather than a bootstrapped Clang? > > On Mon, Nov 28, 2016 at 9:24 AM Robinson, Paul <paul.robinson at sony.com> > wrote: > > So I wouldn't personally worry too much about performance degredation when > built with MSVC - if, when building a stage 2 on Windows (building Clang > with MSVC build Clang) you do end up with a compiler with the desired > performance characteristics - then that's probably sufficient. > > > > Hold on there—we deliver an MSVC-built Clang to our licensees, and I would > really rather not pessimize it. > > --paulr > > > > *From:* llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] *On Behalf Of *David > Blaikie via llvm-dev > *Sent:* Friday, November 25, 2016 8:52 AM > *To:* Mueller-Roemer, Johannes Sebastian; Malcolm Parsons; Hal Finkel; > llvm-dev at lists.llvm.org > > > *Subject:* Re: [llvm-dev] RFC: Constructing StringRefs at compile time > > > > > > On Fri, Nov 25, 2016 at 6:10 AM Mueller-Roemer, Johannes Sebastian via > llvm-dev <llvm-dev at lists.llvm.org> wrote: > > What about going for > > template<unsigned N> > constexpr StringRef(const char (&Str)[N]) > > and avoiding strlen entirely for string literals? > > > > You'd at least want an assert in there (that N - 1 == strlen(Str)) in case > a StringRef is ever constructed from a non-const char buffer that's only > partially filled. > > But if we can write this in such a way that it performs well on good > implementations - that seems sufficient. If getting good performance out of > the compiler means bootstrapping - that's pretty much the status quo > already, as I understand it. > > So I wouldn't personally worry too much about performance degredation when > built with MSVC - if, when building a stage 2 on Windows (building Clang > with MSVC build Clang) you do end up with a compiler with the desired > performance characteristics - then that's probably sufficient. > > > > > -----Original Message----- > From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of > Malcolm Parsons via llvm-dev > Sent: Friday, November 25, 2016 13:34 > To: Hal Finkel <hfinkel at anl.gov> > Cc: llvm-dev at lists.llvm.org > Subject: Re: [llvm-dev] RFC: Constructing StringRefs at compile time > > On 24 November 2016 at 15:04, Hal Finkel <hfinkel at anl.gov> wrote: > >> Creating constexpr StringRefs isn't trivial as strlen isn't portably > >> constexpr and std::char_traits<char>::length is only constexpr in > >> C++17. > > > > Why don't we just create our own traits class that has a constexpr > length, and then we can switch over to the standard one when we switch to > C++17? > > GCC and Clang treat __builtin_strlen as constexpr. > MSVC 2015 doesn't support C++14 extended constexpr. I don't know how well > it optimises a recursive strlen. > > This works as an optimisation for GCC and Clang, and doesn't make things > worse for MSVC: > > /// Construct a string ref from a cstring. > LLVM_ATTRIBUTE_ALWAYS_INLINE > +#if __has_builtin(__builtin_strlen) > + /*implicit*/ constexpr StringRef(const char *Str) > + : Data(Str), Length(Str ? __builtin_strlen(Str) : 0) {} #else > /*implicit*/ StringRef(const char *Str) > : Data(Str), Length(Str ? ::strlen(Str) : 0) {} > +#endif > > -- > Malcolm Parsons > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161128/a2302356/attachment.html>
Malcolm Parsons via llvm-dev
2016-Nov-29 10:20 UTC
[llvm-dev] RFC: Constructing StringRefs at compile time
On 28 November 2016 at 20:51, Zachary Turner via llvm-dev <llvm-dev at lists.llvm.org> wrote:> The basic idea here is that you introduce a StringLiteral class and anywhere > you want to use a global constructor, you make sure to declare a constexpr > array instead of a normal array, and you make it of type StringLiteral.I prefer constexpr llvm_strlen() over StringLiteral because it doesn't require code changes outside StringRef - all StringRefs constructed from a literal can benefit. But there are concerns about MSVC. I prefer StringLiteral over UDL because the type requires code changes, but the values don't. I prefer StringLiteral over explicit StringRef constructor because it's safer. -- Malcolm Parsons
Malcolm Parsons via llvm-dev
2017-Jan-09 18:13 UTC
[llvm-dev] RFC: Constructing StringRefs at compile time
On 28 November 2016 at 20:51, Zachary Turner via llvm-dev <llvm-dev at lists.llvm.org> wrote:> If your StringRef is nested in part of another class, and you want an array > of that class, it still works. For example: > > struct Foo { > StringLiteral X; > StringLiteral Y; > int Z; > }; > > constexpr Foo Foos[] = { {"A", "B", 1}, {"C", "D", 2} };gcc <= 5.4 report an error: error: could not convert '(const char*)"A"' from 'const char*' to 'StringLiteral' https://godbolt.org/g/ihxrwr -- Malcolm Parsons