Zachary Turner via llvm-dev
2016-Sep-25 16:10 UTC
[llvm-dev] Is it time to allow StringRef to be constructed from nullptr?
While porting LLDB over to StringRef, I am continuously running into difficulties caused by the fact that StringRef cannot be constructed from nullptr. So I wanted to see peoples' thoughts on removing this restriction from StringRef. To be clear, I'm only using LLDB as a motivating example, but I'm not requesting that it be done because LLDB is some kind of special case. If it is to be done it should be on its own merits. That said, here is some context: LLDB has a lot of functions that look like this: void foo(const char *, Bar, const char *). I'm trying to port these to functions that look like this: void foo(StringRef, Bar, StringRef). Often times the parameters are string literals or char arrays, but equally often they are another const char* that got passed into the calling function, or a return value from a CRT function like strstr(), or many other possible sources. This latter category presents a problem for porting code to StringRef, because if I simply change the function signature and fix up compile errors, I will probably have introduced a bug because hundreds of callers will now be implicitly converting from const char* to StringRef, leaving open the possibility that one of those was null. To work around this, I've started doing the following every time I port a function: void foo(const char *, Bar, const char*) = delete; This is pretty hackish, but it gets the job done. At least the compiler warns me and forces me to go inspect every callsite where there's an implicit conversion. Unfortunately it also makes for extremely verbose code. Now instead of: foo("bar", baz, "buzz") I have to write foo(StringRef("bar"), baz, StringRef("buzz")) even for string literals and char arrays, which will obviously never be null! If StringRef would handle a null argument gracefully, it would make my life much easier. With that out of the way, here are some reasons I can see to allow StringRef accept null to its constructor which are independent of LLDB and stand on their own merit. 1) std::string_view<> can be constructed with null. I don't know when we will be able to use std::string_view<>, but there's a chance that at some point in the future we may wish to remove StringRef in favor of string_view. That day isn't soon, but in any case, it will be easier if our assumptions are the same. 2) [nullptr, nullptr+0) is a valid range. Why shouldn't we be able to construct a StringRef from an otherwise perfectly valid range? 3) StringRef() can *already* be constructed from nullptr (!) Surprised? That's what happens when you invoke the default constructor. It happily initializes the internal Data with null. So why not allow the same behavior when invoking the const char * constructor? Thoughts? -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160925/2ac6479f/attachment.html>
Chris Lattner via llvm-dev
2016-Sep-25 19:53 UTC
[llvm-dev] Is it time to allow StringRef to be constructed from nullptr?
> On Sep 25, 2016, at 6:10 AM, Zachary Turner via llvm-dev <llvm-dev at lists.llvm.org> wrote: > With that out of the way, here are some reasons I can see to allow StringRef accept null to its constructor which are independent of LLDB and stand on their own merit. > > 1) std::string_view<> can be constructed with null. I don't know when we will be able to use std::string_view<>, but there's a chance that at some point in the future we may wish to remove StringRef in favor of string_view. That day isn't soon, but in any case, it will be easier if our assumptions are the same. > > 2) [nullptr, nullptr+0) is a valid range. Why shouldn't we be able to construct a StringRef from an otherwise perfectly valid range? > > 3) StringRef() can already be constructed from nullptr (!) Surprised? That's what happens when you invoke the default constructor. It happily initializes the internal Data with null. So why not allow the same behavior when invoking the const char * constructor? > > > Thoughts?Seems fine to me, I think the default constructor behavior makes it clear that this is fine. I don’t recall why the assert was originally added, but I don’t think it has added much (if any) value over the years. -Chris -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160925/fa3f7c7a/attachment.html>
Mehdi Amini via llvm-dev
2016-Sep-25 20:49 UTC
[llvm-dev] Is it time to allow StringRef to be constructed from nullptr?
> On Sep 25, 2016, at 9:10 AM, Zachary Turner via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > While porting LLDB over to StringRef, I am continuously running into difficulties caused by the fact that StringRef cannot be constructed from nullptr. So I wanted to see peoples' thoughts on removing this restriction from StringRef. To be clear, I'm only using LLDB as a motivating example, but I'm not requesting that it be done because LLDB is some kind of special case. If it is to be done it should be on its own merits. That said, here is some context: > > LLDB has a lot of functions that look like this: > > void foo(const char *, Bar, const char *). > > I'm trying to port these to functions that look like this: > > void foo(StringRef, Bar, StringRef). > > Often times the parameters are string literals or char arrays, but equally often they are another const char* that got passed into the calling function, or a return value from a CRT function like strstr(), or many other possible sources. This latter category presents a problem for porting code to StringRef, because if I simply change the function signature and fix up compile errors, I will probably have introduced a bug because hundreds of callers will now be implicitly converting from const char* to StringRef, leaving open the possibility that one of those was null. > > To work around this, I've started doing the following every time I port a function: > > void foo(const char *, Bar, const char*) = delete; > > This is pretty hackish, but it gets the job done. At least the compiler warns me and forces me to go inspect every callsite where there's an implicit conversion. Unfortunately it also makes for extremely verbose code. Now instead of: > > foo("bar", baz, "buzz") > > I have to write > > foo(StringRef("bar"), baz, StringRef("buzz")) > > even for string literals and char arrays, which will obviously never be null! If StringRef would handle a null argument gracefully, it would make my life much easier. > > > With that out of the way, here are some reasons I can see to allow StringRef accept null to its constructor which are independent of LLDB and stand on their own merit. > > 1) std::string_view<> can be constructed with null. I don't know when we will be able to use std::string_view<>, but there's a chance that at some point in the future we may wish to remove StringRef in favor of string_view. That day isn't soon, but in any case, it will be easier if our assumptions are the same. > > 2) [nullptr, nullptr+0) is a valid range. Why shouldn't we be able to construct a StringRef from an otherwise perfectly valid range? > > 3) StringRef() can already be constructed from nullptr (!) Surprised? That's what happens when you invoke the default constructor. It happily initializes the internal Data with null. So why not allow the same behavior when invoking the const char * constructor? > > > Thoughts?As a tangent: I don’t like the fact that StringRef is implicitly built out of “const char *”, this is calling strlen() and because it is implicit folks don’t realize when they go from string -> char * -> StringRef. I rather have this constructor explicit, and provide an implicit one for string literal. To come back to your point, I’m not sure if we should leave the internal pointer null or always set it to “”? This would provide the guarantee that dereferencing a StringRef is always valid without checking. — Mehdi -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160925/8ca02bd8/attachment.html>
Zachary Turner via llvm-dev
2016-Sep-25 20:57 UTC
[llvm-dev] Is it time to allow StringRef to be constructed from nullptr?
For now, leaving it returning null seems the least risk, since in theory someone could be relying on that (that would be horrible, of course, but I've seen worse). On Sun, Sep 25, 2016 at 1:49 PM Mehdi Amini <mehdi.amini at apple.com> wrote:> > On Sep 25, 2016, at 9:10 AM, Zachary Turner via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > > While porting LLDB over to StringRef, I am continuously running into > difficulties caused by the fact that StringRef cannot be constructed from > nullptr. So I wanted to see peoples' thoughts on removing this restriction > from StringRef. To be clear, I'm only using LLDB as a motivating example, > but I'm not requesting that it be done because LLDB is some kind of special > case. If it is to be done it should be on its own merits. That said, here > is some context: > > LLDB has a lot of functions that look like this: > > void foo(const char *, Bar, const char *). > > I'm trying to port these to functions that look like this: > > void foo(StringRef, Bar, StringRef). > > Often times the parameters are string literals or char arrays, but equally > often they are another const char* that got passed into the calling > function, or a return value from a CRT function like strstr(), or many > other possible sources. This latter category presents a problem for > porting code to StringRef, because if I simply change the function > signature and fix up compile errors, I will probably have introduced a bug > because hundreds of callers will now be implicitly converting from const > char* to StringRef, leaving open the possibility that one of those was null. > > To work around this, I've started doing the following every time I port a > function: > > void foo(const char *, Bar, const char*) = delete; > > This is pretty hackish, but it gets the job done. At least the compiler > warns me and forces me to go inspect every callsite where there's an > implicit conversion. Unfortunately it also makes for extremely verbose > code. Now instead of: > > foo("bar", baz, "buzz") > > I have to write > > foo(StringRef("bar"), baz, StringRef("buzz")) > > even for string literals and char arrays, which will obviously never be > null! If StringRef would handle a null argument gracefully, it would make > my life much easier. > > > With that out of the way, here are some reasons I can see to allow > StringRef accept null to its constructor which are independent of LLDB and > stand on their own merit. > > 1) std::string_view<> can be constructed with null. I don't know when we > will be able to use std::string_view<>, but there's a chance that at some > point in the future we may wish to remove StringRef in favor of > string_view. That day isn't soon, but in any case, it will be easier if > our assumptions are the same. > > 2) [nullptr, nullptr+0) is a valid range. Why shouldn't we be able to > construct a StringRef from an otherwise perfectly valid range? > > 3) StringRef() can *already* be constructed from nullptr (!) Surprised? > That's what happens when you invoke the default constructor. It happily > initializes the internal Data with null. So why not allow the same > behavior when invoking the const char * constructor? > > > Thoughts? > > > > As a tangent: I don’t like the fact that StringRef is implicitly built out > of “const char *”, this is calling strlen() and because it is implicit > folks don’t realize when they go from string -> char * -> StringRef. > I rather have this constructor explicit, and provide an implicit one for > string literal. > > To come back to your point, I’m not sure if we should leave the internal > pointer null or always set it to “”? This would provide the guarantee that > dereferencing a StringRef is always valid without checking. > > — > Mehdi > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160925/2db251d5/attachment.html>
Zachary Turner via llvm-dev
2016-Sep-25 21:55 UTC
[llvm-dev] Is it time to allow StringRef to be constructed from nullptr?
On Sun, Sep 25, 2016 at 12:53 PM Chris Lattner <clattner at apple.com> wrote:> > On Sep 25, 2016, at 6:10 AM, Zachary Turner via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > With that out of the way, here are some reasons I can see to allow > StringRef accept null to its constructor which are independent of LLDB and > stand on their own merit. > > 1) std::string_view<> can be constructed with null. I don't know when we > will be able to use std::string_view<>, but there's a chance that at some > point in the future we may wish to remove StringRef in favor of > string_view. That day isn't soon, but in any case, it will be easier if > our assumptions are the same. > > 2) [nullptr, nullptr+0) is a valid range. Why shouldn't we be able to > construct a StringRef from an otherwise perfectly valid range? > > 3) StringRef() can *already* be constructed from nullptr (!) Surprised? > That's what happens when you invoke the default constructor. It happily > initializes the internal Data with null. So why not allow the same > behavior when invoking the const char * constructor? > > > Thoughts? > > > Seems fine to me, I think the default constructor behavior makes it clear > that this is fine. I don’t recall why the assert was originally added, but > I don’t think it has added much (if any) value over the years. > > -Chris >I just noticed it's not just the default constructor. Even the const char*, size_t constructor allows null as long as you specify a 0 length. LLVM_ATTRIBUTE_ALWAYS_INLINE /*implicit*/ StringRef(const char *data, size_t length) : Data(data), Length(length) { assert((data || length == 0) && "StringRef cannot be built from a NULL argument with non-null length"); } It seems that the only benefit of the asserting constructor is that it saves a branch when invoked with a character pointer (I would expect the branch to be optimized away when constructed with a string literal or char array). -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160925/dfe7db62/attachment.html>
Pete Cooper via llvm-dev
2016-Sep-25 22:58 UTC
[llvm-dev] Is it time to allow StringRef to be constructed from nullptr?
> On Sep 25, 2016, at 1:49 PM, Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org> wrote: > >> >> On Sep 25, 2016, at 9:10 AM, Zachary Turner via llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: >> >> While porting LLDB over to StringRef, I am continuously running into difficulties caused by the fact that StringRef cannot be constructed from nullptr. So I wanted to see peoples' thoughts on removing this restriction from StringRef. To be clear, I'm only using LLDB as a motivating example, but I'm not requesting that it be done because LLDB is some kind of special case. If it is to be done it should be on its own merits. That said, here is some context: >> >> LLDB has a lot of functions that look like this: >> >> void foo(const char *, Bar, const char *). >> >> I'm trying to port these to functions that look like this: >> >> void foo(StringRef, Bar, StringRef). >> >> Often times the parameters are string literals or char arrays, but equally often they are another const char* that got passed into the calling function, or a return value from a CRT function like strstr(), or many other possible sources. This latter category presents a problem for porting code to StringRef, because if I simply change the function signature and fix up compile errors, I will probably have introduced a bug because hundreds of callers will now be implicitly converting from const char* to StringRef, leaving open the possibility that one of those was null. >> >> To work around this, I've started doing the following every time I port a function: >> >> void foo(const char *, Bar, const char*) = delete; >> >> This is pretty hackish, but it gets the job done. At least the compiler warns me and forces me to go inspect every callsite where there's an implicit conversion. Unfortunately it also makes for extremely verbose code. Now instead of: >> >> foo("bar", baz, "buzz") >> >> I have to write >> >> foo(StringRef("bar"), baz, StringRef("buzz")) >> >> even for string literals and char arrays, which will obviously never be null! If StringRef would handle a null argument gracefully, it would make my life much easier. >> >> >> With that out of the way, here are some reasons I can see to allow StringRef accept null to its constructor which are independent of LLDB and stand on their own merit. >> >> 1) std::string_view<> can be constructed with null. I don't know when we will be able to use std::string_view<>, but there's a chance that at some point in the future we may wish to remove StringRef in favor of string_view. That day isn't soon, but in any case, it will be easier if our assumptions are the same. >> >> 2) [nullptr, nullptr+0) is a valid range. Why shouldn't we be able to construct a StringRef from an otherwise perfectly valid range? >> >> 3) StringRef() can already be constructed from nullptr (!) Surprised? That's what happens when you invoke the default constructor. It happily initializes the internal Data with null. So why not allow the same behavior when invoking the const char * constructor? >> >> >> Thoughts? > > > As a tangent: I don’t like the fact that StringRef is implicitly built out of “const char *”, this is calling strlen() and because it is implicit folks don’t realize when they go from string -> char * -> StringRef. > I rather have this constructor explicit, and provide an implicit one for string literal.I wonder if we could change that call site to be deleted (or at least explicit), and add support for literal strings with a StringRef version of this: /// Construct an ArrayRef from a C array. template <size_t N> /*implicit*/ LLVM_CONSTEXPR ArrayRef(const T (&Arr)[N]) : Data(Arr), Length(N) {} This way we’ll avoid the strlen on quoted strings which is the common case anyway, and then can see how many other cases we have from const char* remaining. Pete> > To come back to your point, I’m not sure if we should leave the internal pointer null or always set it to “”? This would provide the guarantee that dereferencing a StringRef is always valid without checking. > > — > Mehdi > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160925/9a64e131/attachment.html>
Chris Lattner via llvm-dev
2016-Sep-26 00:40 UTC
[llvm-dev] Is it time to allow StringRef to be constructed from nullptr?
The pointer could only be null if the length is zero. If the length is zero, you shouldn't be loading it. Defaulting to null instead of "" is also a microoptimization. -Chris> On Sep 25, 2016, at 10:49 AM, Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > >> On Sep 25, 2016, at 9:10 AM, Zachary Turner via llvm-dev <llvm-dev at lists.llvm.org> wrote: >> >> While porting LLDB over to StringRef, I am continuously running into difficulties caused by the fact that StringRef cannot be constructed from nullptr. So I wanted to see peoples' thoughts on removing this restriction from StringRef. To be clear, I'm only using LLDB as a motivating example, but I'm not requesting that it be done because LLDB is some kind of special case. If it is to be done it should be on its own merits. That said, here is some context: >> >> LLDB has a lot of functions that look like this: >> >> void foo(const char *, Bar, const char *). >> >> I'm trying to port these to functions that look like this: >> >> void foo(StringRef, Bar, StringRef). >> >> Often times the parameters are string literals or char arrays, but equally often they are another const char* that got passed into the calling function, or a return value from a CRT function like strstr(), or many other possible sources. This latter category presents a problem for porting code to StringRef, because if I simply change the function signature and fix up compile errors, I will probably have introduced a bug because hundreds of callers will now be implicitly converting from const char* to StringRef, leaving open the possibility that one of those was null. >> >> To work around this, I've started doing the following every time I port a function: >> >> void foo(const char *, Bar, const char*) = delete; >> >> This is pretty hackish, but it gets the job done. At least the compiler warns me and forces me to go inspect every callsite where there's an implicit conversion. Unfortunately it also makes for extremely verbose code. Now instead of: >> >> foo("bar", baz, "buzz") >> >> I have to write >> >> foo(StringRef("bar"), baz, StringRef("buzz")) >> >> even for string literals and char arrays, which will obviously never be null! If StringRef would handle a null argument gracefully, it would make my life much easier. >> >> >> With that out of the way, here are some reasons I can see to allow StringRef accept null to its constructor which are independent of LLDB and stand on their own merit. >> >> 1) std::string_view<> can be constructed with null. I don't know when we will be able to use std::string_view<>, but there's a chance that at some point in the future we may wish to remove StringRef in favor of string_view. That day isn't soon, but in any case, it will be easier if our assumptions are the same. >> >> 2) [nullptr, nullptr+0) is a valid range. Why shouldn't we be able to construct a StringRef from an otherwise perfectly valid range? >> >> 3) StringRef() can already be constructed from nullptr (!) Surprised? That's what happens when you invoke the default constructor. It happily initializes the internal Data with null. So why not allow the same behavior when invoking the const char * constructor? >> >> >> Thoughts? > > > As a tangent: I don’t like the fact that StringRef is implicitly built out of “const char *”, this is calling strlen() and because it is implicit folks don’t realize when they go from string -> char * -> StringRef. > I rather have this constructor explicit, and provide an implicit one for string literal. > > To come back to your point, I’m not sure if we should leave the internal pointer null or always set it to “”? This would provide the guarantee that dereferencing a StringRef is always valid without checking. > > — > Mehdi > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160925/57747cc1/attachment.html>
Joerg Sonnenberger via llvm-dev
2016-Sep-26 09:54 UTC
[llvm-dev] Is it time to allow StringRef to be constructed from nullptr?
On Sun, Sep 25, 2016 at 04:10:24PM +0000, Zachary Turner via llvm-dev wrote:> 2) [nullptr, nullptr+0) is a valid range. Why shouldn't we be able to > construct a StringRef from an otherwise perfectly valid range?The problem is that for a lot of functions, esp. the mem* family, NULL as argument is triggering UB, even if size is 0. Since the GNU folks started to attribute the functions like that, certain compilers will pretty aggressively fold code based on that. Joerg
Zachary Turner via llvm-dev
2016-Sep-26 15:54 UTC
[llvm-dev] Is it time to allow StringRef to be constructed from nullptr?
I put up a CL to remove the assert https://reviews.llvm.org/D24904 I'm not seeing much objection to this proposal but I posted on a weekend, so maybe someone will have something to say now. If there's no objections I'll wait a few hours after getting an lgtm to commit On Mon, Sep 26, 2016 at 2:54 AM Joerg Sonnenberger via llvm-dev < llvm-dev at lists.llvm.org> wrote:> On Sun, Sep 25, 2016 at 04:10:24PM +0000, Zachary Turner via llvm-dev > wrote: > > 2) [nullptr, nullptr+0) is a valid range. Why shouldn't we be able to > > construct a StringRef from an otherwise perfectly valid range? > > The problem is that for a lot of functions, esp. the mem* family, NULL > as argument is triggering UB, even if size is 0. Since the GNU folks > started to attribute the functions like that, certain compilers will > pretty aggressively fold code based on that. > > Joerg > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160926/cb81770e/attachment.html>