thr3ads.net - llvm dev - [LLVMdev] Correct use of StringRef and Twine [Jul 2011]


Chris Lattner

2011-Jul-22 18:02 UTC

[LLVMdev] Correct use of StringRef and Twine

On Jul 21, 2011, at 12:30 AM, David Blaikie wrote:
>> [diff attached]
> 
> Updated diff with test fix. (since this broke a test (printing chars
> as numerical values, rather than characters) it's possible this change
> is a bad idea & it could break the product code itself. Though
> strangely I wasn't able to do character concatenation without my
> change, so I have a sneaking suspicion that while the test passed, it
> didn't actually expose this case to the common Twine use cases.
> Perhaps only explicitly invoking the Twine ctor would've got the
> char-as-number behavior previously)
> <twine_triple.diff>

The dangerous part of this is that characters are integers, so "foo" +
'x' is very likely to cause serious problems. This is the reason that
the integer versions of the Twine ctor are marked 'explicit'. I'm
OK with the Twine class changes in this patch if the ctor is marked
'explicit'. You should also probably add ctors for signed/unsigned
char as well (which can reuse the existing CharKind enum).
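
To illustrate why an 'explicit' char ctor helps, here is a minimal standalone sketch. MiniTwine is a hypothetical stand-in written for this example, not the real llvm::Twine; it only shows that an explicit ctor blocks the silent char-to-object conversion while still allowing the spelled-out form.

```cpp
#include <string>
#include <type_traits>

// Hypothetical stand-in for Twine (NOT the real llvm::Twine).
class MiniTwine {
  std::string Text;

public:
  // Implicit: string literals convert silently, as callers expect.
  MiniTwine(const char *S) : Text(S) {}
  // Explicit: a char must be wrapped by hand, e.g. MiniTwine('x').
  explicit MiniTwine(char C) : Text(1, C) {}

  const std::string &str() const { return Text; }

  friend MiniTwine operator+(const MiniTwine &L, const MiniTwine &R) {
    MiniTwine Out(L);
    Out.Text += R.Text;
    return Out;
  }
};

// Because the char ctor is explicit, a bare char never turns into a MiniTwine.
static_assert(!std::is_convertible<char, MiniTwine>::value,
              "char must not convert implicitly");
```

With this shape, "foo" + MiniTwine('x') concatenates, while MiniTwine("foo") + 'x' is rejected at compile time instead of being accepted quietly.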

I'll respond to Triple specific issues in response to your previous email. 
Thanks for pushing this forward David!

-Chris

David Blaikie

2011-Jul-22 21:59 UTC


> The dangerous part of this is that characters are integers, so
> "foo" + 'x' is very likely to cause serious problems.

std::string already provides such overloads though, doesn't it? So the
code isn't any safer from accidental "foo" + 'x' expressions that
don't include Twine/StringRef/std::string than it was before. But if
the argument is that std::string's interface was poorly
designed/unsafe & we can do better/safer, I'm OK with making the ctor
explicit as you've suggested.
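
A small sketch of the two cases being compared here, using only the standard library. appendChar is a hypothetical helper name chosen for this example. The static_assert is checked without evaluating the expression, so no out-of-bounds pointer is ever formed.

```cpp
#include <string>
#include <type_traits>

// With a std::string operand, operator+(const std::string &, char) is
// selected and the character is appended.
std::string appendChar(const std::string &Base, char C) {
  return Base + C;
}

// With no string class involved, the char is promoted to int and added to
// the decayed pointer, so the result type is still 'const char *'.
static_assert(std::is_same<decltype("foo" + 'x'), const char *>::value,
              "literal + char is pointer arithmetic, not concatenation");
```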

> You should also probably add a ctor for signed/unsigned char as well
> (which reuse the existing CharKind enum).

Hmm - would it be safe to cast those signed/unsigned chars to straight
char? (is it guaranteed that the signed & unsigned values with the
same representation map to the same glyph?)

As a side note on Twine's design: Is there a particular reason it uses
void*s rather than unions? and chars rather than enums?

(sorry if I'm asking lots of "why is this like this" questions all
over the code base - I just don't want to assume that it's intentional
and replicate a pattern elsewhere that I don't understand only to find
it's unintentional "not fixed yet" sort of stuff. I suppose at the
very least it'll be a chance to add in some explanatory comments if I
do find things that are by design but weren't clear to me)

Thanks,
- David

Chris Lattner

2011-Jul-23 00:29 UTC


On Jul 22, 2011, at 2:59 PM, David Blaikie wrote:
>> The dangerous part of this is that characters are integers, so
>> "foo" + 'x' is very likely to cause serious problems.
>
> std::string already provides such overloads though, doesn't it? So the
> code isn't any safer from accidental "foo" + 'x' expressions that
> don't include Twine/StringRef/std::string than it was before. But if
> the argument is that std::string's interface was poorly
> designed/unsafe & we can do better/safer, I'm OK with making the ctor
> explicit as you've suggested.

Yes, exactly. I'm just saying that I think the additional clarity of:

  "foo" + Twine('x')

is worth the inconvenience.

>> You should also probably add a ctor for signed/unsigned char as well
>> (which reuse the existing CharKind enum).
>
> Hmm - would it be safe to cast those signed/unsigned chars to straight
> char? (is it guaranteed that the signed & unsigned values with the
> same representation map to the same glyph?)

Yes. I consider 'signed vs unsigned char vs char' to be a blight on the
C type system. Just casting to char internally would be fine.
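
A minimal sketch of what "just casting to char" looks like. toPlainChar and sameByte are hypothetical helper names for this example. One stated assumption: before C++20, converting an out-of-range unsigned value to a signed char type is implementation-defined, although mainstream two's-complement compilers keep the bit pattern, which is what the byte comparison checks.

```cpp
#include <cstring>

// Forward both explicitly signed and unsigned chars to plain char.
char toPlainChar(unsigned char C) { return static_cast<char>(C); }
char toPlainChar(signed char C) { return static_cast<char>(C); }

// Compare the stored byte directly rather than the (possibly negative) value.
bool sameByte(char A, unsigned char B) {
  return std::memcmp(&A, &B, 1) == 0;
}
```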

> As a side note on Twine's design: Is there a particular reason it uses
> void*s rather than unions?

I'm not sure what you're proposing specifically.

> and chars rather than enums?

char vs enum is because of Visual Studio compatibility and because enums often
are stored as 32-bit values instead of 8-bit values.
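
The size point can be sketched as follows. The type names are hypothetical and the enumerators are only illustrative; the exact size of a plain enum is up to the compiler, so the comparison is written as "no larger than" rather than as a fixed number.

```cpp
// Illustrative kind tags; a plain enum like this commonly occupies 4 bytes.
enum NodeKind { EmptyKind, CStringKind, CharKind };

// Two tags stored with the enum type itself.
struct EnumTags {
  NodeKind LHSKind;
  NodeKind RHSKind;
};

// The same two tags stored as single bytes and cast back on use.
struct CharTags {
  unsigned char LHSKind;
  unsigned char RHSKind;
};

inline NodeKind lhsKind(const CharTags &T) {
  return static_cast<NodeKind>(T.LHSKind);
}
```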

> (sorry if I'm asking lots of "why is this like this" questions all
> over the code base - I just don't want to assume that it's intentional
> and replicate a pattern elsewhere that I don't understand only to find
> it's unintentional "not fixed yet" sort of stuff. I suppose at the
> very least it'll be a chance to add in some explanatory comments if I
> do find things that are by design but weren't clear to me)

No problem at all, happy to help answer the questions. Forcing a reexamination
of past decisions is not a bad thing at all in this case :)

>> Right.  Something like this could work:
>>
>> foo(const Twine &T) {
>> ...
>>  TwineTmpBuffer Tmp;
>>  StringSwitch(T.toString(Tmp)).....
>>
>> Which doesn't seem too horrible, just needs a typedef of
>> smallvector to TwineTmpBuffer.
>
> In a few choice places, maybe, but as the default way to pass string
> parameters I think that'd be a hard sell as a general practice.

Yes, from a general API design perspective, I hate having to force a choice
between "convenience to implementor of an API to just take a StringRef"
vs "convenience to client of API for it to take a Twine". It really stinks.
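
The tradeoff can be sketched without LLVM headers. Everything below is a hypothetical stand-in: MiniTwine is not llvm::Twine, the std::string scratch buffer plays the role of the proposed TwineTmpBuffer, and isKnownArch is an invented callee. The point is only that the caller gets cheap concatenation syntax while the implementor pays one extra flattening step.

```cpp
#include <string>

// Hypothetical lazy concatenation: remembers two C strings without copying.
class MiniTwine {
  const char *LHS;
  const char *RHS;

public:
  MiniTwine(const char *L, const char *R = "") : LHS(L), RHS(R) {}

  // Flatten into caller-provided scratch storage and return a reference to it.
  const std::string &toString(std::string &Scratch) const {
    Scratch.assign(LHS);
    Scratch.append(RHS);
    return Scratch;
  }
};

// The callee accepts the lazy type, then needs a temporary buffer before it
// can inspect the text.
bool isKnownArch(const MiniTwine &Name) {
  std::string Tmp; // stands in for TwineTmpBuffer
  const std::string &S = Name.toString(Tmp);
  return S == "x86_64" || S == "arm";
}
```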

I was chatting with Howard Hinnant about this and he suggested replacing Twine
with a template metaprogramming expression-template system.  I haven't
thought through all the details, but perhaps this would allow us to get the best
of both worlds?
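
For readers unfamiliar with the idea, here is a rough sketch of what an expression-template concatenation could look like. This is my own illustration under stated assumptions, not a design from this thread or from LLVM: Leaf, Concat, and flatten are hypothetical names, and nodes are stored by value to keep the example free of lifetime issues.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Leaf node: a borrowed C string plus its length.
struct Leaf {
  const char *Data;
  std::size_t Len;
  explicit Leaf(const char *S) : Data(S), Len(std::strlen(S)) {}
  std::size_t size() const { return Len; }
  void appendTo(std::string &Out) const { Out.append(Data, Len); }
};

// Interior node: the shape of the concatenation is encoded in the type.
template <typename L, typename R> struct Concat {
  L Lhs;
  R Rhs;
  Concat(const L &A, const R &B) : Lhs(A), Rhs(B) {}
  std::size_t size() const { return Lhs.size() + Rhs.size(); }
  void appendTo(std::string &Out) const {
    Lhs.appendTo(Out);
    Rhs.appendTo(Out);
  }
};

inline Concat<Leaf, Leaf> operator+(const Leaf &A, const Leaf &B) {
  return Concat<Leaf, Leaf>(A, B);
}

template <typename L, typename R>
Concat<Concat<L, R>, Leaf> operator+(const Concat<L, R> &A, const Leaf &B) {
  return Concat<Concat<L, R>, Leaf>(A, B);
}

// Materialize once: the total size is known up front, so reserve first.
template <typename Expr> std::string flatten(const Expr &E) {
  std::string Out;
  Out.reserve(E.size());
  E.appendTo(Out);
  return Out;
}
```

A callee written as a template over Expr can accept any such expression, at the cost of moving its implementation into a header.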

> Hmm, this is one bit I'm not sure about. As I tried to explain above,
> it seems problematic to have to choose your argument type on the basis
> of how you think callers might use your API. From the perspective of a
> caller, a Twine argument is at least as expressive as a StringRef
> (since all StringRefs can be Twined implicitly), but it takes that
> extra step to write the implementation.
>
> Perhaps I'm aiming for some kind of purist/perfectionist argument that
> isn't necessary or practical, but I hope I've been clear in explaining
> my uncertainty/issue here.

Yes, I'm deeply unhappy about this aspect of our string APIs.

-Chris
