I'm attempting to do some amount of mass migration from const std::string& (& const char*) to StringRef. In doing so I stumbled across the fact that while StringRef has no op+ to speak of (well, it has += and I added a manual StringRef+std::string and std::string+StringRef for now - though I'm thinking perhaps they can be skipped in favor of Twine operations), Twine does provide a variety of handy op+ implementations that get picked up mostly for free. A few questions from this: * When should a StringRef parameter type be preferred over a const Twine& parameter type? * Would it be OK to implement an implicit conversion from Twine to std::string rather than using the explicit str() function? (when switching to StringRef many expressions became Twine concatenation when they were std::string concatenation - this isn't a drop-in replacement due to the absence of std::string conversion from Twine (which should be a perf win, I'd think - delaying concatenation into a single operation instead of (((a + b) + c) + d)), so I've had to wrap various concatenation expressions in (...).str()) * What would happen if Twine gave back a pointer/reference to some internal storage? In the common/correct use case (taking a Twine as a const ref, or using a Twine entirely in a temporary concatenation expression) wouldn't the Twine's internal storage live long enough to allow the caller to use that buffer within the life of the statement (as in, say, o << (aStringRef + "foo"))? This last one is probably the bigger question & relates to the other two. It seems like Twine is nice for cases where deferred concatenation is required & might be able to be avoided altogether - if the majority case is that the concatenation ends up being thrown away, you win. But as you approach the 100% usage (not throwing away ever) are you losing perf in the simple non-concatenative case (all those callers who just pass in a straight string without any concatenation - the Twine then returns that string by value when str() is called, right?)? At the moment, if the lifetime works out & Twine can return a reference to a member (this would mean unrealized Twines would have a bunch of empty/spare std::strings lying around, but I expect those could be optimized away fairly reliably), that Twine would be strictly superior to StringRef & could be preferred in all cases, potentially even replacing/simplifying StringRef substantially. Thoughts? - David -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110718/be1f6564/attachment.html>
On Jul 18, 2011, at 12:38 PM, David Blaikie wrote:> I'm attempting to do some amount of mass migration from const std::string& (& const char*) to StringRef.Great!> In doing so I stumbled across the fact that while StringRef has no op+ to speak of (well, it has += and I added a manual StringRef+std::string and std::string+StringRef for now - though I'm thinking perhaps they can be skipped in favor of Twine operations), Twine does provide a variety of handy op+ implementations that get picked up mostly for free.Yeah, I don't think we want to support + on StringRef without going through Twine.> A few questions from this: > > * When should a StringRef parameter type be preferred over a const Twine& parameter type?Generally, here are the guidelines for return values: 1. Use std::string when the value is computed, or there are other lifetime issues. 2. Use StringRef otherwise. And for arguments, generally always use Twine as the default, it allows construction of complex things, and is still efficient when passed the equiv of a StringRef (with the toStringRef method). The only annoying thing about it is that the API to do this requires a temporary SmallVector to scribble in, which makes it more difficult to use. It seems that there should be a good way to fix this API problem.> * Would it be OK to implement an implicit conversion from Twine to std::string rather than using the explicit str() function? (when switching to StringRef many expressions became Twine concatenation when they were std::string concatenation - this isn't a drop-in replacement due to the absence of std::string conversion from Twine (which should be a perf win, I'd think - delaying concatenation into a single operation instead of (((a + b) + c) + d)), so I've had to wrap various concatenation expressions in (...).str())I'd prefer not. I'd rather convert the things that want an std::string to take a Twine.> * What would happen if Twine gave back a pointer/reference to some internal storage? In the common/correct use case (taking a Twine as a const ref, or using a Twine entirely in a temporary concatenation expression) wouldn't the Twine's internal storage live long enough to allow the caller to use that buffer within the life of the statement (as in, say, o << (aStringRef + "foo"))?This is really dangerous. I'd much rather extend raw_ostream to take twines.> This last one is probably the bigger question & relates to the other two. It seems like Twine is nice for cases where deferred concatenation is required & might be able to be avoided altogether - if the majority case is that the concatenation ends up being thrown away, you win. But as you approach the 100% usage (not throwing away ever) are you losing perf in the simple non-concatenative case (all those callers who just pass in a straight string without any concatenation - the Twine then returns that string by value when str() is called, right?)?The major win is actually when you have clients that don't *want* an std::string. std::string is really slow in most operations (it does atomic operations and COW with common implementations, not to mention heap allocations, though libc++ is much much better about both of these). Twine is optimized to avoid creating the temporary std::string at all.> At the moment, if the lifetime works out & Twine can return a reference to a member (this would mean unrealized Twines would have a bunch of empty/spare std::strings lying around, but I expect those could be optimized away fairly reliably), that Twine would be strictly superior to StringRef & could be preferred in all cases, potentially even replacing/simplifying StringRef substantially.The toStringRef(x) method is what you want. It is really fast and does no copy if the twine *just* contains a C string, std::string or StringRef, and in the concat cases it does no memory allocation in the common case where the SmallVector is big enough for the result. -Chris
> And for arguments, generally always use Twine as the default, it allows construction of complex things, and is still efficient when passed the equiv of a StringRef (with the toStringRef method). The only annoying thing about it is that the API to do this requires a temporary SmallVector to scribble in, which makes it more difficult to use.Yes, I noticed this - which was one of my concerns about migrating lots of stuff to Twine: that efficient Twine-sinks would be tricky and/or verbose, so I was wondering how to make Twine-sinks easier to write (hence the ref return I mentioned below)> > * Would it be OK to implement an implicit conversion from Twine to std::string rather than using the explicit str() function? (when switching to StringRef many expressions became Twine concatenation when they were std::string concatenation - this isn't a drop-in replacement due to the absence of std::string conversion from Twine (which should be a perf win, I'd think - delaying concatenation into a single operation instead of (((a + b) + c) + d)), so I've had to wrap various concatenation expressions in (...).str()) > > I'd prefer not. I'd rather convert the things that want an std::string to take a Twine.Then it's a question of what sort of final destinations/sinks there are & how many, given the slight complexity of writing an efficient Twine sink.> > * What would happen if Twine gave back a pointer/reference to some internal storage? In the common/correct use case (taking a Twine as a const ref, or using a Twine entirely in a temporary concatenation expression) wouldn't the Twine's internal storage live long enough to allow the caller to use that buffer within the life of the statement (as in, say, o << (aStringRef + "foo"))? > > This is really dangerous.Is it much/any more dangerous than Twine's existing functionality, though? Twine's already referring to temporaries & will break in fun ways if used outside that cliche.> I'd much rather extend raw_ostream to take twines.Certainly that covers this case, but I'm not sure how many different sinks there are that would need the nuanced efficient Twine handling code. Perhaps it's not many - though some easy way to do "stdStr twine" would be great since quite often we want to take a string & put it in a member, etc. Would it be worth having "assign(std::string&, const Twine&)" or similar - or is going through Twine::str() & expecting the compiler to optimize away the temporary std::string acceptable? (I assume not, or Twine wouldn't have all those fancy functions for getting a StringRef & providing a buffer, etc)> The major win is actually when you have clients that don't *want* an std::string.Ah, right, that too.> std::string is really slow in most operations (it does atomic operations and COW with common implementations,Hmm, which implementations still use COW, I wonder? I think there's something in C++0x that makes it impossible to implement std::string as COW.> The toStringRef(x) method is what you want. It is really fast and does no copy if the twine *just* contains a C string, std::string or StringRef, and in the concat cases it does no memory allocation in the common case where the SmallVector is big enough for the result.curiosity question: how much more efficient (vague question, I know) is the StringRef + SmallVector than a good (eg: libc++) std::string implementation? I know, for example, that Visual C++ 2010's std::string does perform the small string optimization which I guess is what SmallVector is doing. - David
> And for arguments, generally always use Twine as the default, it allows construction of complex things, and is still efficient when passed the equiv of a StringRef (with the toStringRef method). The only annoying thing about it is that the API to do this requires a temporary SmallVector to scribble in, which makes it more difficult to use. > > It seems that there should be a good way to fix this API problem.This is the problem I'm still trying to figure out. It seems that while Twine is efficient to pass, it's not easy to use efficiently & I don't think it'd be appropriate (correct me if I'm wrong) to go adding in temporary SmallVectors all over the place to try to consume Twine parameters in any code that needs to handle string contents after migrating the argument types to Twines. One place I experimented with fixing tonight (after trying broader goals - like changing all StringRef args in clang only, say) was to add a Twine(char) ctor to enable llvm::Triple's ctor to take Twines rather than StringRefs, and then do Twine op+ to build the Data member. The problem I see with this is that the current implementation of Triple's ctor is still more efficient than the simple Twine version: : Data(x) { Data += '-'; Data += x; Data += '-'; Data += 'z'; } (essentially), as opposed to: : Data((x + '-' + y + '-' + z).str()) Which requires an extra string copy of the final value in all cases... actually, now that I think about it, since it's returning into a ctor, this might be nicely optimized - so perhaps this is the "right way" to write this code. [diff attached] So then here's another example (can't find the exact piece of code I had been working on, but taking tools/clang/lib/Basic/Targets.cpp::getCPUDefineSuffix as an example anyway - just something using a StringSwitch, really). If this function were to take a const Twine& and pass it along to StringSwitch, ultimately StringSwitch (being the sink for the Twine - the code needing to read the individual character elements) would be responsible for allocating a temporary SmallVectorImpl, passing that in, getting a StringRef & then using that for its work. Another example of a string sink is, say, tools/llvmc/src/Hooks.cpp. I found this while looking for uses of "+= '-'" to use the char support for Twine I'd just added. But I can't upgrade the Arg argument from const std::string& to const Twine& easily since it needs to actually manipulate the string - find, access elements, and substr it. Should this work just be done case by case? If so, I don't think I'll end up with much Twine usage, probably just a lot of StringRef usage & lots of str() calls. If Twine were able to be a more drop-in replacement (by providing a StringRef in a more convenient manner - or, ideally, subsuming StringRef's functionality entirely I think (& simply allocating a backing buffer when necessary - the suggesting you mentioned was dangerous, though I'm still tossing it around as something to try)) it'd be more practical to use it as the go-to argument for anything that needs a string. Looking forward to hearing anyone's thoughts on this. - David -------------- next part -------------- A non-text attachment was scrubbed... Name: twine_triple.diff Type: application/octet-stream Size: 3201 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110721/eab246db/attachment.obj>