On Jul 22, 2011, at 2:59 PM, David Blaikie wrote:>> The dangerous part of this is that characters are integers, so "foo" + 'x' is very likely to cause serious problems. > > std::string already provides such overloads though, doesn't it? So the > code isn't any safer from accidental "foo" + 'x' expressions that > don't include Twine/StringRef/std::string than it was before. But if > the argument is that std::string's interface was poorly > designed/unsafe & we can do better/safer, I'm OK with making the ctor > explicit as you've suggested.Yes, exactly. I'm just saying that I think the additional clarity of: "foo" + Twine('x') is worth the inconvenience.>> You should also probably add a ctor for signed/unsigned char as well (which reuse the existing CharKind enum). > > Hmm - would it be safe to cast those signed/unsigned chars to straight > char? (is it guaranteed that the signed & unsigned values with the > same representation map to the same glyph?)Yes. I consider 'signed vs unsigned char vs char' to be a blight on the C type system. Just casting to char internally would be fine.> As a side note on Twine's design: Is there a particular reason it uses > void*s rather than unions?I'm not sure what you're proposing specifically.> and chars rather than enums?char vs enum is because of visual studio compatibility and because enums often are stored as 32-bit values instead of 8-bit values.> (sorry if I'm asking lots of "why is this like this" questions all > over the code base - I just don't want to assume that it's intentional > and replicate a pattern elsewhere that I don't understand only to find > it's unintentional "not fixed yet" sort of stuff. I suppose at the > very least it'll be a chance to add in some explanatory comments if I > do find things that are by design but weren't clear to me)No problem at all, happy to help answer the questions. Forcing a reexamination of past decisions is not a bad thing at all in this case :)>> Right. Something like this could work: >> >> foo(const Twine &T) { >> ... >> TwineTmpBuffer Tmp; >> StringSwitch(T.toString(Tmp))..... >> >> Which doesn't seem too horrible, just needs a typedef of smallvector to TwineTmpBuffer. > > In a few choice places, maybe, but as the default way to pass string > parameters I think that'd be a hard sell as a general practice.Yes, from a general API design perspective, I hate having to force a choice between "convenience to implementor of an API to just take a StringRef" vs "convenience to client of API for it to take a Twine". It really stinks. I was chatting with Howard Hinnant about this and he suggested replacing Twine with a template metaprogramming expression-template system. I haven't thought through all the details, but perhaps this would allow us to get the best of both worlds?> Hmm, this is one bit I'm not sure about. As I tried to explain above, > it seems problematic to have to choose your argument type on the basis > of how you think callers might use your API. From the perspective of a > caller, a Twine argument is at least as expressive as a StringRef > (since all StringRefs can be Twined implicitly), but it takes that > extra step to write the implementation. > > Perhaps I'm aiming for some kind of purist/perfectionist argument that > isn't necessary or practical, but I hope I've been clear in explaining > my uncertainty/issue here.Yes, I'm deeply unhappy about this aspect of our string api's. -Chris
> Yes, exactly. I'm just saying that I think the additional clarity of: > "foo" + Twine('x') > > is worth the inconvenience.Ok, attached a modified version of my patch with an Twine(char), Twine(unsigned char), and Twine(signed char). All three are explicit & have included test cases. Speaking of which - is there any LLVM dev policy or preference around C-style casts versus C++-style casts? Twine is littered with the former but I generally use the latter for safety/clarity of intend, etc. (I'm not going to try to fix everything in Twine just now if we're considering alternative implementations like template metaprogramming anyway)>> As a side note on Twine's design: Is there a particular reason it uses >> void*s rather than unions? > > I'm not sure what you're proposing specifically.I've attached an version of my fix that includes the union use I was referring to (sorry for being vague in the previous email). Though if we're considering a more explicitly structured solution with templates/subtypes/etc we could skip the whole mess of enums & unions/casts.>> and chars rather than enums? > > char vs enum is because of visual studio compatibility and because enums often are stored as 32-bit values instead of 8-bit values.Curious - seems VS >2k3 ( http://msdn.microsoft.com/en-us/library/2dzy4k6e(v=VS.80).aspx ) has supported specifying enum backing type (so you can specify it as char: "enum foo : char { ... }") if that was the feature you were wanting to use but avoiding.> Yes, from a general API design perspective, I hate having to force a choice between "convenience to implementor of an API to just take a StringRef" vs "convenience to client of API for it to take a Twine". It really stinks.Ok - glad we're on the same page there, wasn't quite sure whether we were talking past one another in previous emails. I have a couple of ideas about how to lower the developer cost of using Twine ubiquitously. I'll try to describe them in some detail.> I was chatting with Howard Hinnant about this and he suggested replacing Twine with a template metaprogramming expression-template system. I haven't thought through all the details, but perhaps this would allow us to get the best of both worlds?I think I'd prefer a template metaprogramming solution to Twine just from a purist perspective compared to all the casting/reinterpreting/enums (even with the union) - that sort of stuff makes me a bit uncomfortable, but I realize it's well contained enough that it's not too risky (& I know some of the other tricks LLVM uses like low bit twiddling in pointers, etc, so this seems relatively sane in the grand scheme of things). But I'm not sure it'll fundamentally change how we might be able to solve this caller convenience issue. So here are my ideas: 1) the easy solution: create a StringRef subclass (or new type with a similarly expressive API) that wraps a buffer that it may or may not use. Give it a ctor that takes a Twine. The use case then goes from this: SmallVectorImpl<char> v; StringRef s = theTwine.GetStringRef(v); to this: TwineString s(theTwine); (& the ctor looks like: TwineString(const Twine& t) { *this t.GetStringRef(this->buffer); }) So any Twine sink now has to pay one line of code to get a usable string out of a Twine. Cheap to implement, easy to fix existing use cases (privatize GetStringRef(SmallVectorImpl<char>) and make TwineString a friend - fix everything that fails to compile). This keeps things simple & seems to be "good enough" to me, but we could perhaps do better (at the very least, again, if we did do better, we could go back & remove TwineString & again fix all the places that fail to compile with whatever new hotness we invent) Hmm - new thought: what would happen if my argument type was simply TwineString? If I knew my function needed access to the values (like the Host.cpp example) but didn't have any buffer of its own that it needed to use, that seems OK. This new thought perhaps renders suggestion (2) uninteresting. Functions that know they always need the string contents take a TwineString. Functions that might be able to skip realizing the string entirely - or have their own buffer (eg: a member std::string) to populate - take a Twine & explicitly construct a TwineString only in the path that needs the value. That seems optimal to me. 1.1) We probably want a better way to get a Twine into a std::string - while the change to Triple's ctor works because returning a std::string from a function into a ctor can do fancy in-place construction - in any other case like "SetFoo(const Twine& t) { this->s = t; }" we'd want something better. Perhaps simply t.AssignTo(s); 2) This one isn't quite as fleshed out, but see if the general outline makes sense: Introduce an extra type that's Twine-esque, but is only the top level (so it's implicitly constructible from a const Twine&), rather than all the branches of a Twine. If a function wants to accept a Twine that it needs to manipulate, it can mark its argument as TwineString2 (for want of a better name, just using 2 to distinguish from the (1) example). TwineString2 has a buffer in it, but it starts uninitialized. Any attempt to retrieve/inspect values causes the string to be initialized. [downside: all string operations on a TwineString2 incur an "if not initialized" check] Twines could be constructed from TwineString2s if the value needed to be passed on to another function. (& still take advantage of any initialized string buffer it might have, rather than having to remanifest from its child nodes repeatedly) One fancy enhancement to this is that TwineString2 could partially manifest - eg: if you want ts[2] it could walk its children (dropping its references to children & updating the buffer with the contents at each step) until it found the 2nd character - then stop, with a partially expanded tree/buffer. Though there's probably nothing stopping us implementing such smarts in the (1) solution eventually either. Thanks, - David -------------- next part -------------- A non-text attachment was scrubbed... Name: twine_explicit_char.patch Type: application/octet-stream Size: 4365 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110724/852ca59f/attachment.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: twine_explicit_char_union.patch Type: application/octet-stream Size: 15243 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110724/852ca59f/attachment-0001.obj>
On Jul 24, 2011, at 12:09 AM, David Blaikie wrote:>> Yes, exactly. I'm just saying that I think the additional clarity of: >> "foo" + Twine('x') >> >> is worth the inconvenience. > > Ok, attached a modified version of my patch with an Twine(char), > Twine(unsigned char), and Twine(signed char). All three are explicit & > have included test cases.Ok, great.> Speaking of which - is there any LLVM dev policy or preference around > C-style casts versus C++-style casts? Twine is littered with theNope, use the one that is the most clear in a given case. Except in extreme cases, the C++ style casts seem overly verbose and don't add a lot of value.>>> As a side note on Twine's design: Is there a particular reason it uses >>> void*s rather than unions? >> >> I'm not sure what you're proposing specifically. > > I've attached an version of my fix that includes the union use I was > referring to (sorry for being vague in the previous email). Though if > we're considering a more explicitly structured solution with > templates/subtypes/etc we could skip the whole mess of enums & > unions/casts.aha, that *is* much nicer! Applied with a few tweaks (remove some extraneous 'explicit' keywords and wrapped to 80 cols). Thanks!>>> and chars rather than enums? >> >> char vs enum is because of visual studio compatibility and because enums often are stored as 32-bit values instead of 8-bit values. > > Curious - seems VS >2k3 ( > http://msdn.microsoft.com/en-us/library/2dzy4k6e(v=VS.80).aspx ) has > supported specifying enum backing type (so you can specify it as char: > "enum foo : char { ... }") if that was the feature you were wanting to > use but avoiding.Right, but that requires VS #ifdefs. You can also use enum bitfields, but VS has different promotion rules from them than other compilers.>> Yes, from a general API design perspective, I hate having to force a choice between "convenience to implementor of an API to just take a StringRef" vs "convenience to client of API for it to take a Twine". It really stinks. > > Ok - glad we're on the same page there, wasn't quite sure whether we > were talking past one another in previous emails. > > I have a couple of ideas about how to lower the developer cost of > using Twine ubiquitously. I'll try to describe them in some detail.Ok!>> I was chatting with Howard Hinnant about this and he suggested replacing Twine with a template metaprogramming expression-template system. I haven't thought through all the details, but perhaps this would allow us to get the best of both worlds? > > I think I'd prefer a template metaprogramming solution to Twine just > from a purist perspective compared to all the > casting/reinterpreting/enums (even with the union) - that sort of > stuff makes me a bit uncomfortable, but I realize it's well contained > enough that it's not too risky (& I know some of the other tricks LLVM > uses like low bit twiddling in pointers, etc, so this seems relatively > sane in the grand scheme of things).Right. As long as the damage is well contained and isolated in the class, I don't care much about stuff like this. One of the great features of C++ is that it lets you do whatever insane thing you want and draw a nice interface around it :)> 1) the easy solution: create a StringRef subclass (or new type with a > similarly expressive API) that wraps a buffer that it may or may not > use. Give it a ctor that takes a Twine. The use case then goes from > this: > > SmallVectorImpl<char> v; > StringRef s = theTwine.GetStringRef(v); > > to this: > > TwineString s(theTwine); > > (& the ctor looks like: TwineString(const Twine& t) { *this > t.GetStringRef(this->buffer); }) > So any Twine sink now has to pay one line of code to get a usable > string out of a Twine. Cheap to implement, easy to fix existing use > cases (privatize GetStringRef(SmallVectorImpl<char>) and make > TwineString a friend - fix everything that fails to compile). > > This keeps things simple & seems to be "good enough" to me, but we > could perhaps do better (at the very least, again, if we did do > better, we could go back & remove TwineString & again fix all the > places that fail to compile with whatever new hotness we invent)This definitely seems like a step forward. While it is kinda gross, a subclass of StringRef is probably the lowest friction path to do this.> Hmm - new thought: what would happen if my argument type was simply > TwineString? If I knew my function needed access to the values (like > the Host.cpp example) but didn't have any buffer of its own that it > needed to use, that seems OK. This new thought perhaps renders > suggestion (2) uninteresting. Functions that know they always need the > string contents take a TwineString. Functions that might be able to > skip realizing the string entirely - or have their own buffer (eg: a > member std::string) to populate - take a Twine & explicitly construct > a TwineString only in the path that needs the value. That seems > optimal to me.I'm dubious of this. This is exposing an implementation detail to clients that they shouldn't have to know, and if passed by value, this could be expensive.> 1.1) We probably want a better way to get a Twine into a std::string - > while the change to Triple's ctor works because returning a > std::string from a function into a ctor can do fancy in-place > construction - in any other case like "SetFoo(const Twine& t) { > this->s = t; }" we'd want something better. Perhaps simply > t.AssignTo(s);Adding a Twine::assignTo method would make sense to me!> 2) This one isn't quite as fleshed out, but see if the general outline > makes sense: Introduce an extra type that's Twine-esque, but is only > the top level (so it's implicitly constructible from a const Twine&), > rather than all the branches of a Twine. If a function wants to accept > a Twine that it needs to manipulate, it can mark its argument as > TwineString2 (for want of a better name, just using 2 to distinguish > from the (1) example). TwineString2 has a buffer in it, but it starts > uninitialized. Any attempt to retrieve/inspect values causes the > string to be initialized. [downside: all string operations on a > TwineString2 incur an "if not initialized" check] > > Twines could be constructed from TwineString2s if the value needed to > be passed on to another function. (& still take advantage of any > initialized string buffer it might have, rather than having to > remanifest from its child nodes repeatedly) > > One fancy enhancement to this is that TwineString2 could partially > manifest - eg: if you want ts[2] it could walk its children (dropping > its references to children & updating the buffer with the contents at > each step) until it found the 2nd character - then stop, with a > partially expanded tree/buffer. Though there's probably nothing > stopping us implementing such smarts in the (1) solution eventually > either.I think you're flying way over my head here :) -Chris