thr3ads.net - llvm dev - [LLVMdev] Correct use of StringRef and Twine [Jul 2011]

If this information is useful, please help other people find it:
Share via:

David Blaikie

2011-Jul-18 19:38 UTC

[LLVMdev] Correct use of StringRef and Twine

I'm attempting to do some amount of mass migration from const
std::string&
(& const char*) to StringRef. In doing so I stumbled across the fact that
while StringRef has no op+ to speak of (well, it has += and I added a manual
StringRef+std::string and std::string+StringRef for now - though I'm
thinking perhaps they can be skipped in favor of Twine operations), Twine
does provide a variety of handy op+ implementations that get picked up
mostly for free.

A few questions from this:

* When should a StringRef parameter type be preferred over a const Twine&
parameter type?
* Would it be OK to implement an implicit conversion from Twine to
std::string rather than using the explicit str() function? (when switching
to StringRef many expressions became Twine concatenation when they were
std::string concatenation - this isn't a drop-in replacement due to the
absence of std::string conversion from Twine (which should be a perf win,
I'd think - delaying concatenation into a single operation instead of (((a +
b) + c) + d)), so I've had to wrap various concatenation expressions in
(...).str())
* What would happen if Twine gave back a pointer/reference to some internal
storage? In the common/correct use case (taking a Twine as a const ref, or
using a Twine entirely in a temporary concatenation expression) wouldn't the
Twine's internal storage live long enough to allow the caller to use that
buffer within the life of the statement (as in, say, o << (aStringRef +
"foo"))?

This last one is probably the bigger question & relates to the other two. It
seems like Twine is nice for cases where deferred concatenation is required
& might be able to be avoided altogether - if the majority case is that the
concatenation ends up being thrown away, you win. But as you approach the
100% usage (not throwing away ever) are you losing perf in the simple
non-concatenative case (all those callers who just pass in a straight string
without any concatenation - the Twine then returns that string by value when
str() is called, right?)?

At the moment, if the lifetime works out & Twine can return a reference to a
member (this would mean unrealized Twines would have a bunch of empty/spare
std::strings lying around, but I expect those could be optimized away fairly
reliably), that Twine would be strictly superior to StringRef & could be
preferred in all cases, potentially even replacing/simplifying StringRef
substantially.

Thoughts?

- David
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20110718/be1f6564/attachment.html>

Chris Lattner

2011-Jul-19 17:07 UTC

head link

[LLVMdev] Correct use of StringRef and Twine

On Jul 18, 2011, at 12:38 PM, David Blaikie wrote:
> I'm attempting to do some amount of mass migration from const
std::string& (& const char*) to StringRef.
Great!
> In doing so I stumbled across the fact that while StringRef has no op+ to
speak of (well, it has += and I added a manual StringRef+std::string and
std::string+StringRef for now - though I'm thinking perhaps they can be
skipped in favor of Twine operations), Twine does provide a variety of handy op+
implementations that get picked up mostly for free.
Yeah, I don't think we want to support + on StringRef without going through
Twine.
> A few questions from this:
> 
> * When should a StringRef parameter type be preferred over a const
Twine& parameter type?
Generally, here are the guidelines for return values:
1. Use std::string when the value is computed, or there are other lifetime
issues.
2. Use StringRef otherwise.

And for arguments, generally always use Twine as the default, it allows
construction of complex things, and is still efficient when passed the equiv of
a StringRef (with the toStringRef method).  The only annoying thing about it is
that the API to do this requires a temporary SmallVector to scribble in, which
makes it more difficult to use.

It seems that there should be a good way to fix this API problem.
> * Would it be OK to implement an implicit conversion from Twine to
std::string rather than using the explicit str() function? (when switching to
StringRef many expressions became Twine concatenation when they were std::string
concatenation - this isn't a drop-in replacement due to the absence of
std::string conversion from Twine (which should be a perf win, I'd think -
delaying concatenation into a single operation instead of (((a + b) + c) + d)),
so I've had to wrap various concatenation expressions in (...).str())
I'd prefer not.  I'd rather convert the things that want an std::string
to take a Twine.
> * What would happen if Twine gave back a pointer/reference to some internal
storage? In the common/correct use case (taking a Twine as a const ref, or using
a Twine entirely in a temporary concatenation expression) wouldn't the
Twine's internal storage live long enough to allow the caller to use that
buffer within the life of the statement (as in, say, o << (aStringRef +
"foo"))?
This is really dangerous.  I'd much rather extend raw_ostream to take
twines.
> This last one is probably the bigger question & relates to the other
two. It seems like Twine is nice for cases where deferred concatenation is
required & might be able to be avoided altogether - if the majority case is
that the concatenation ends up being thrown away, you win. But as you approach
the 100% usage (not throwing away ever) are you losing perf in the simple
non-concatenative case (all those callers who just pass in a straight string
without any concatenation - the Twine then returns that string by value when
str() is called, right?)?
The major win is actually when you have clients that don't *want* an
std::string.  std::string is really slow in most operations (it does atomic
operations and COW with common implementations, not to mention heap allocations,
though libc++ is much much better about both of these).  Twine is optimized to
avoid creating the temporary std::string at all.
> At the moment, if the lifetime works out & Twine can return a reference
to a member (this would mean unrealized Twines would have a bunch of empty/spare
std::strings lying around, but I expect those could be optimized away fairly
reliably), that Twine would be strictly superior to StringRef & could be
preferred in all cases, potentially even replacing/simplifying StringRef
substantially.
The toStringRef(x) method is what you want.  It is really fast and does no copy
if the twine *just* contains a C string, std::string or StringRef, and in the
concat cases it does no memory allocation in the common case where the
SmallVector is big enough for the result.

-Chris

David Blaikie

2011-Jul-19 17:24 UTC

head link

[LLVMdev] Correct use of StringRef and Twine

> And for arguments, generally always use Twine as the default, it allows
construction of complex things, and is still efficient when passed the equiv of
a StringRef (with the toStringRef method).  The only annoying thing about it is
that the API to do this requires a temporary SmallVector to scribble in, which
makes it more difficult to use.
Yes, I noticed this - which was one of my concerns about migrating
lots of stuff to Twine: that efficient Twine-sinks would be tricky
and/or verbose, so I was wondering how to make Twine-sinks easier to
write (hence the ref return I mentioned below)
> > * Would it be OK to implement an implicit conversion from Twine to
std::string rather than using the explicit str() function? (when switching to
StringRef many expressions became Twine concatenation when they were std::string
concatenation - this isn't a drop-in replacement due to the absence of
std::string conversion from Twine (which should be a perf win, I'd think -
delaying concatenation into a single operation instead of (((a + b) + c) + d)),
so I've had to wrap various concatenation expressions in (...).str())
>
> I'd prefer not.  I'd rather convert the things that want an
std::string to take a Twine.
Then it's a question of what sort of final destinations/sinks there
are & how many, given the slight complexity of writing an efficient
Twine sink.
> > * What would happen if Twine gave back a pointer/reference to some
internal storage? In the common/correct use case (taking a Twine as a const ref,
or using a Twine entirely in a temporary concatenation expression) wouldn't
the Twine's internal storage live long enough to allow the caller to use
that buffer within the life of the statement (as in, say, o << (aStringRef
+ "foo"))?
>
> This is really dangerous.
Is it much/any more dangerous than Twine's existing functionality,
though? Twine's already referring to temporaries & will break in fun
ways if used outside that cliche.
> I'd much rather extend raw_ostream to take twines.
Certainly that covers this case, but I'm not sure how many different
sinks there are that would need the nuanced efficient Twine handling
code. Perhaps it's not many - though some easy way to do "stdStr
twine" would be great since quite often we want to take a string & put
it in a member, etc. Would it be worth having "assign(std::string&,
const Twine&)" or similar - or is going through Twine::str() &
expecting the compiler to optimize away the temporary std::string
acceptable? (I assume not, or Twine wouldn't have all those fancy
functions for getting a StringRef & providing a buffer, etc)
> The major win is actually when you have clients that don't *want* an
std::string.
Ah, right, that too.
> std::string is really slow in most operations (it does atomic operations
and COW with common implementations,
Hmm, which implementations still use COW, I wonder? I think there's
something in C++0x that makes it impossible to implement std::string
as COW.
> The toStringRef(x) method is what you want.  It is really fast and does no
copy if the twine *just* contains a C string, std::string or StringRef, and in
the concat cases it does no memory allocation in the common case where the
SmallVector is big enough for the result.
curiosity question: how much more efficient (vague question, I know)
is the StringRef + SmallVector than a good (eg: libc++) std::string
implementation? I know, for example, that Visual C++ 2010's
std::string does perform the small string optimization which I guess
is what SmallVector is doing.

- David

David Blaikie

2011-Jul-21 07:00 UTC

head link

[LLVMdev] Correct use of StringRef and Twine

> And for arguments, generally always use Twine as the default, it allows
construction of complex things, and is still efficient when passed the equiv of
a StringRef (with the toStringRef method).  The only annoying thing about it is
that the API to do this requires a temporary SmallVector to scribble in, which
makes it more difficult to use.
>
> It seems that there should be a good way to fix this API problem.
This is the problem I'm still trying to figure out. It seems that
while Twine is efficient to pass, it's not easy to use efficiently & I
don't think it'd be appropriate (correct me if I'm wrong) to go
adding
in temporary SmallVectors all over the place to try to consume Twine
parameters in any code that needs to handle string contents after
migrating the argument types to Twines.

One place I experimented with fixing tonight (after trying broader
goals - like changing all StringRef args in clang only, say) was to
add a Twine(char) ctor to enable llvm::Triple's ctor to take Twines
rather than StringRefs, and then do Twine op+ to build the Data
member.

The problem I see with this is that the current implementation of
Triple's ctor is still more efficient than the simple Twine version:

: Data(x)
{
Data += '-';
Data += x;
Data += '-';
Data += 'z';
}

(essentially), as opposed to:

: Data((x + '-' + y + '-' + z).str())

Which requires an extra string copy of the final value in all cases...
actually, now that I think about it, since it's returning into a ctor,
this might be nicely optimized - so perhaps this is the "right way" to
write this code. [diff attached]

So then here's another example (can't find the exact piece of code I
had been working on, but taking
tools/clang/lib/Basic/Targets.cpp::getCPUDefineSuffix as an example
anyway - just something using a StringSwitch, really). If this
function were to take a const Twine& and pass it along to
StringSwitch, ultimately StringSwitch (being the sink for the Twine -
the code needing to read the individual character elements) would be
responsible for allocating a temporary SmallVectorImpl, passing that
in, getting a StringRef & then using that for its work.

Another example of a string sink is, say, tools/llvmc/src/Hooks.cpp. I
found this while looking for uses of "+= '-'" to use the char
support
for Twine I'd just added. But I can't upgrade the Arg argument from
const std::string& to const Twine& easily since it needs to actually
manipulate the string - find, access elements, and substr it.

Should this work just be done case by case? If so, I don't think I'll
end up with much Twine usage, probably just a lot of StringRef usage &
lots of str() calls.
If Twine were able to be a more drop-in replacement (by providing a
StringRef in a more convenient manner - or, ideally, subsuming
StringRef's functionality entirely I think (& simply allocating a
backing buffer when necessary - the suggesting you mentioned was
dangerous, though I'm still tossing it around as something to try))
it'd be more practical to use it as the go-to argument for anything
that needs a string.

Looking forward to hearing anyone's thoughts on this.

- David
-------------- next part --------------
A non-text attachment was scrubbed...
Name: twine_triple.diff
Type: application/octet-stream
Size: 3201 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20110721/eab246db/attachment.obj>

Seemingly Similar Threads

Search for more possibly parallel threads

llvm dev - Jul 2011 - [LLVMdev] Correct use of StringRef and Twine

[LLVMdev] Correct use of StringRef and Twine

[LLVMdev] Correct use of StringRef and Twine

[LLVMdev] Correct use of StringRef and Twine

[LLVMdev] Correct use of StringRef and Twine

Seemingly Similar Threads