> And for arguments, generally always use Twine as the default, it allows construction of complex things, and is still efficient when passed the equiv of a StringRef (with the toStringRef method). The only annoying thing about it is that the API to do this requires a temporary SmallVector to scribble in, which makes it more difficult to use. > > It seems that there should be a good way to fix this API problem.This is the problem I'm still trying to figure out. It seems that while Twine is efficient to pass, it's not easy to use efficiently & I don't think it'd be appropriate (correct me if I'm wrong) to go adding in temporary SmallVectors all over the place to try to consume Twine parameters in any code that needs to handle string contents after migrating the argument types to Twines. One place I experimented with fixing tonight (after trying broader goals - like changing all StringRef args in clang only, say) was to add a Twine(char) ctor to enable llvm::Triple's ctor to take Twines rather than StringRefs, and then do Twine op+ to build the Data member. The problem I see with this is that the current implementation of Triple's ctor is still more efficient than the simple Twine version: : Data(x) { Data += '-'; Data += x; Data += '-'; Data += 'z'; } (essentially), as opposed to: : Data((x + '-' + y + '-' + z).str()) Which requires an extra string copy of the final value in all cases... actually, now that I think about it, since it's returning into a ctor, this might be nicely optimized - so perhaps this is the "right way" to write this code. [diff attached] So then here's another example (can't find the exact piece of code I had been working on, but taking tools/clang/lib/Basic/Targets.cpp::getCPUDefineSuffix as an example anyway - just something using a StringSwitch, really). If this function were to take a const Twine& and pass it along to StringSwitch, ultimately StringSwitch (being the sink for the Twine - the code needing to read the individual character elements) would be responsible for allocating a temporary SmallVectorImpl, passing that in, getting a StringRef & then using that for its work. Another example of a string sink is, say, tools/llvmc/src/Hooks.cpp. I found this while looking for uses of "+= '-'" to use the char support for Twine I'd just added. But I can't upgrade the Arg argument from const std::string& to const Twine& easily since it needs to actually manipulate the string - find, access elements, and substr it. Should this work just be done case by case? If so, I don't think I'll end up with much Twine usage, probably just a lot of StringRef usage & lots of str() calls. If Twine were able to be a more drop-in replacement (by providing a StringRef in a more convenient manner - or, ideally, subsuming StringRef's functionality entirely I think (& simply allocating a backing buffer when necessary - the suggesting you mentioned was dangerous, though I'm still tossing it around as something to try)) it'd be more practical to use it as the go-to argument for anything that needs a string. Looking forward to hearing anyone's thoughts on this. - David -------------- next part -------------- A non-text attachment was scrubbed... Name: twine_triple.diff Type: application/octet-stream Size: 3201 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110721/eab246db/attachment.obj>
> [diff attached]Updated diff with test fix. (since this broke a test (printing chars as numerical values, rather than characters) it's possible this change is a bad idea & it could break the product code itself. Though strangely I wasn't able to do character concatenation without my change, so I have a sneaking suspicion that while the test passed, it didn't actually expose this case to the common Twine use cases. Perhaps only explicitly invoking the Twine ctor would've got the char-as-number behavior previously) -------------- next part -------------- A non-text attachment was scrubbed... Name: twine_triple.diff Type: application/octet-stream Size: 3732 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110721/8cd5b77a/attachment.obj>
On Jul 21, 2011, at 12:30 AM, David Blaikie wrote:>> [diff attached] > > Updated diff with test fix. (since this broke a test (printing chars > as numerical values, rather than characters) it's possible this change > is a bad idea & it could break the product code itself. Though > strangely I wasn't able to do character concatenation without my > change, so I have a sneaking suspicion that while the test passed, it > didn't actually expose this case to the common Twine use cases. > Perhaps only explicitly invoking the Twine ctor would've got the > char-as-number behavior previously) > <twine_triple.diff>The dangerous part of this is that characters are integers, so "foo" + 'x' is very likely to cause serious problems. This is the reason that the integer versions of the twine ctor are marked 'explicit'. I'm ok with the Twine class changes in this patch if the ctor is marked 'explicit'. You should also probably add a ctor for signed/unsigned char as well (which reuse the existing CharKind enum). I'll respond to Triple specific issues in response to your previous email. Thanks for pushing this forward David! -Chris
On Jul 21, 2011, at 12:00 AM, David Blaikie wrote:>> And for arguments, generally always use Twine as the default, it allows construction of complex things, and is still efficient when passed the equiv of a StringRef (with the toStringRef method). The only annoying thing about it is that the API to do this requires a temporary SmallVector to scribble in, which makes it more difficult to use. >> >> It seems that there should be a good way to fix this API problem. > > This is the problem I'm still trying to figure out. It seems that > while Twine is efficient to pass, it's not easy to use efficiently & I > don't think it'd be appropriate (correct me if I'm wrong) to go adding > in temporary SmallVectors all over the place to try to consume Twine > parameters in any code that needs to handle string contents after > migrating the argument types to Twines.Are you concerned about the stack size of the smallvector's, or the amount of code that needs to be written?> One place I experimented with fixing tonight (after trying broader > goals - like changing all StringRef args in clang only, say) was to > add a Twine(char) ctor to enable llvm::Triple's ctor to take Twines > rather than StringRefs, and then do Twine op+ to build the Data > member.Nice.> The problem I see with this is that the current implementation of > Triple's ctor is still more efficient than the simple Twine version: > > : Data(x) > { > Data += '-'; > Data += x; > Data += '-'; > Data += 'z'; > } > > (essentially), as opposed to: > > : Data((x + '-' + y + '-' + z).str()) > > Which requires an extra string copy of the final value in all cases... > actually, now that I think about it, since it's returning into a ctor, > this might be nicely optimized - so perhaps this is the "right way" to > write this code. [diff attached]The former is relying on how append works on std::string. Beyond the number of bytes copied, the former could cause (depending on the implementation of std::string) a reallocation for every append. Well written, the Twine version could just do a resize/reserve once, then fill in the bytes. I don't know if Twine does that yet though.> So then here's another example (can't find the exact piece of code I > had been working on, but taking > tools/clang/lib/Basic/Targets.cpp::getCPUDefineSuffix as an example > anyway - just something using a StringSwitch, really). If this > function were to take a const Twine& and pass it along to > StringSwitch, ultimately StringSwitch (being the sink for the Twine - > the code needing to read the individual character elements) would be > responsible for allocating a temporary SmallVectorImpl, passing that > in, getting a StringRef & then using that for its work.Right. Something like this could work: foo(const Twine &T) { ... TwineTmpBuffer Tmp; StringSwitch(T.toString(Tmp))..... Which doesn't seem too horrible, just needs a typedef of smallvector to TwineTmpBuffer.> Another example of a string sink is, say, tools/llvmc/src/Hooks.cpp. I > found this while looking for uses of "+= '-'" to use the char support > for Twine I'd just added. But I can't upgrade the Arg argument from > const std::string& to const Twine& easily since it needs to actually > manipulate the string - find, access elements, and substr it.We should make ".str()" an efficient way to get a mutable std::string out.> Should this work just be done case by case? If so, I don't think I'll > end up with much Twine usage, probably just a lot of StringRef usage & > lots of str() calls.Yes, I think it should be done carefully, one API at a time. I think that would make sense. Sticking with StringRef where it is possible is simple and good. -Chris
> Are you concerned about the stack size of the smallvector's, or the amount of code that needs to be written?Mostly the code that needs to be written (my suggestion of having Twines be able to create mutable buffers internally would increase stack size by even more than just one SmallVector per Twine sink, so I haven't really been thinking stack space efficiency so much as authoring efficiency). It would be nice if authors didn't have to guess at how their callers might construct strings & could always provide Twine argument types without making a tradeoff between implementation simplicity/rapidity & runtime efficiency when consuming such strings.> The former is relying on how append works on std::string. Beyond the number of bytes copied, the former could cause (depending on the implementation of std::string) a reallocation for everyappend. Ah, indeed - I can see how an implementation of std::string might not want to do the same kind of growth mechanisms as is more common in, say, std::vector, whereas Twine knows it's got to put a bunch of things into the buffer so it can be designed with that in mind.> Well written, the Twine version could just do a resize/reserve once, then fill in the bytes. I don't know if Twine does that yet though.Yep - something I'll keep in mind to take a look at.> Right. Something like this could work: > > foo(const Twine &T) { > ... > TwineTmpBuffer Tmp; > StringSwitch(T.toString(Tmp))..... > > Which doesn't seem too horrible, just needs a typedef of smallvector to TwineTmpBuffer.In a few choice places, maybe, but as the default way to pass string parameters I think that'd be a hard sell as a general practice.>> Another example of a string sink is, say, tools/llvmc/src/Hooks.cpp. I >> found this while looking for uses of "+= '-'" to use the char support >> for Twine I'd just added. But I can't upgrade the Arg argument from >> const std::string& to const Twine& easily since it needs to actually >> manipulate the string - find, access elements, and substr it. > > We should make ".str()" an efficient way to get a mutable std::string out.In the case of Hooks.cpp I could do a similar transformation as the one you suggested above, using the SmallVectorImpl - but my point is that doing that to every string sink seems like it'd be a little annoying. If you think it's suitable/the right thing I can do that & if we find a terse/tidier/more streamlined way to handle string sinks later we can always update this usage.>> Should this work just be done case by case? If so, I don't think I'll >> end up with much Twine usage, probably just a lot of StringRef usage & >> lots of str() calls. > > Yes, I think it should be done carefully, one API at a time. I think that would make sense. Sticking with StringRef where it is possible is simple and good.Hmm, this is one bit I'm not sure about. As I tried to explain above, it seems problematic to have to choose your argument type on the basis of how you think callers might use your API. From the perspective of a caller, a Twine argument is at least as expressive as a StringRef (since all StringRefs can be Twined implicitly), but it takes that extra step to write the implementation. Perhaps I'm aiming for some kind of purist/perfectionist argument that isn't necessary or practical, but I hope I've been clear in explaining my uncertainty/issue here. Thanks, - David