Yaron Keren
2015-Apr-20 21:10 UTC
[LLVMdev] SmallString + raw_svector_ostream combination should be more efficient
Sean, thanks for reminding this, Alp did commit a class derived from raw_svector_ostream conatining an internal SmallString he called small_string_ostream. The commit was reverted after a day due to a disagreement about the commit approval and apparently abandoned. http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223393.html http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223557.html http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140630/223986.html The class Alp comitted did solve the possible mismatch between the SmallString and the stream by making the SmallString private to the class. This however did not treat the root problem, the duplication of the information about the buffer between SmallString and the stream. I can make performance measurements but even if the performance difference is neglible (and looking at all the code paths and conditionals of non-inlined functions at raw_ostream.cpp that's hard to believe), the current SmallString-raw_svector_ostream is simply wrong. Personally, after the previous "Alp" discussion I found and fixed several such issues in my out-of-tree code only to make new similar new error earlier this year, which I caught only months after, when Rafael committed the pwrite which reminded me the issue. Due ot the information duplication Rafael commit also had a bug and Mehdi reports similar experience. Back then Alp reported similar problems he found in LLVM code which were hopefully fixed otherwise. In addition to the information duplication, the design is quite confusing to use - Should one use the stream.str() function or the SmallString itself? - flush or str? - How do you properly clear the combination for reuse (that was my mistake, I forgot to resync after OS.clear())? It's safe to say this design pattern is very easy to get wrong one way or another, we got burned by it multiple times and it should be replaced. Alp suggested a derived class containing its own SmallString. That's a simple and effective approach to avoid the bugs mentioned but does not fix the memory or runtime issues. Small as they may be, why have them at a fundemental data structure? I was thinking about going further avoiding all duplication with a templated class derived with its own internal buffer storage. Rafael suggested a more modular approach, a derived adpater class adapter to a *simple* buffer (or nullptr for fully-dynamic operation) which also won't duplicate any information the buffer is dumb and has no state. That solution sounds simple, efficient and safe to use. The implementation would be actually simpler then raw_svector_ostream since all the coordination logic is not required anymore. 2015-04-20 22:17 GMT+03:00 Sean Silva <chisophugis at gmail.com>:> > > On Sun, Apr 19, 2015 at 7:40 AM, Yaron Keren <yaron.keren at gmail.com> > wrote: > >> A very common code pattern in LLVM is >> >> SmallString<128> S; >> raw_svector_ostream OS(S); >> OS<< ... >> Use OS.str() >> >> While raw_svector_ostream is smart to share the text buffer itself, it's >> inefficient keeping two sets of pointers to the same buffer: >> >> In SmallString: void *BeginX, *EndX, *CapacityX >> In raw_ostream: char *OutBufStart, *OutBufEnd, *OutBufCur >> >> Moreover, at runtime the two sets of pointers need to be coordinated >> between the SmallString and raw_svector_ostream using >> raw_svector_ostream::init, raw_svector_ostream::pwrite, raw_svector_ostream::resync >> and raw_svector_ostream::write_impl. >> All these functions have non-inlined implementations in raw_ostream.cpp. >> >> Finally, this may cause subtle bugs if S is modified without calling >> OS::resync(). This is too easy to do by mistake. >> >> In this frequent case usage the client does not really care about S being >> a SmallString with its many useful string helper function. It's just >> boilerplate code for raw_svector_ostream. But it does cost three extra >> pointers, some runtime performance and possible bugs. >> > > I agree the bugs are real (Alp proposed something a while back regarding > this?), but you will need to provide measurements to justify the cost in > runtime performance. One technique I have used in the past to measure these > sorts of things I call "stuffing": take the operation that you want to > measure, then essentially change the logic so that you pay the cost 2 > times, 3 times, etc. You can then look at the trend in performance as N > varies and extrapolate back to the case where N = 0 (i.e. you don't pay the > cost). > > For example, in one situation where I used this method it was to measure > the cost of stat'ing files (sys::fs::status) across a holistic build, using > only "time" on the command line (it was on Windows and I didn't have any > tools like DTrace available that can directly measure this). In order to do > this, I changed sys::fs::status to call stat N times instead of 1, and > measured with N=1 N=2 N=3 etc. The result was that the difference between > the N and N+1 versions was about 1-2% across N=1..10 (or whatever I > measured). In order to negate caching and other confounding effects, it is > important to try different distributions of stats; e.g. the extra stats are > on the same file as the "real" stat vs. the extra stats are on nonexistent > files in the same directory as the "real" file vs. parent directories of > the "real" file; if these match up fairly well (they did), then you have > some indication that the "stuffing" is measuring what you want to measure. > > So e.g. if you think the cost of 3 extra pointers is significant, then > "stuff" the struct with 3, 6, 9, ... extra pointers and measure the > difference in performance (e.g. measure the time building a real project). > > -- Sean Silva > > >> >> To solve all three issues, would it make sense to have >> raw_ostream-derived container with a its own SmallString like >> templated-size built-in buffer? >> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150421/7b4a745d/attachment.html>
Yaron Keren
2015-Apr-24 10:46 UTC
[LLVMdev] SmallString + raw_svector_ostream combination should be more efficient
Hi, Is this what you're thinking about? The code is not tested yet, I'd like to know if the overall direction and design are correct. Yaron 2015-04-21 0:10 GMT+03:00 Yaron Keren <yaron.keren at gmail.com>:> Sean, thanks for reminding this, Alp did commit a class derived from > raw_svector_ostream conatining an internal SmallString he called small_string_ostream. > The commit was reverted after a day due to a disagreement about the commit > approval and apparently abandoned. > > > http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223393.html > > http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223557.html > > http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140630/223986.html > > The class Alp comitted did solve the possible mismatch between the > SmallString and the stream by making the SmallString private to the class. > This however did not treat the root problem, the duplication of the > information about the buffer between SmallString and the stream. > > I can make performance measurements but even if the performance difference > is neglible (and looking at all the code paths and conditionals of > non-inlined functions at raw_ostream.cpp that's hard to believe), the > current SmallString-raw_svector_ostream is simply wrong. > > Personally, after the previous "Alp" discussion I found and fixed several > such issues in my out-of-tree code only to make new similar new error > earlier this year, which I caught only months after, when Rafael committed > the pwrite which reminded me the issue. Due ot the information > duplication Rafael commit also had a bug and Mehdi reports similar > experience. Back then Alp reported similar problems he found in LLVM code > which were hopefully fixed otherwise. > > In addition to the information duplication, the design is quite confusing > to use > > - Should one use the stream.str() function or the SmallString itself? > - flush or str? > - How do you properly clear the combination for reuse (that was my > mistake, I forgot to resync after OS.clear())? > > It's safe to say this design pattern is very easy to get wrong one way or > another, we got burned by it multiple times and it should be replaced. > > Alp suggested a derived class containing its own SmallString. That's a > simple and effective approach to avoid the bugs mentioned but does not fix > the memory or runtime issues. Small as they may be, why have them at a > fundemental data structure? > > I was thinking about going further avoiding all duplication with a > templated class derived with its own internal buffer storage. > > Rafael suggested a more modular approach, a derived adpater class adapter > to a *simple* buffer (or nullptr for fully-dynamic operation) which also > won't duplicate any information the buffer is dumb and has no state. > > That solution sounds simple, efficient and safe to use. The implementation > would be actually simpler then raw_svector_ostream since all the > coordination logic is not required anymore. > > > 2015-04-20 22:17 GMT+03:00 Sean Silva <chisophugis at gmail.com>: > >> >> >> On Sun, Apr 19, 2015 at 7:40 AM, Yaron Keren <yaron.keren at gmail.com> >> wrote: >> >>> A very common code pattern in LLVM is >>> >>> SmallString<128> S; >>> raw_svector_ostream OS(S); >>> OS<< ... >>> Use OS.str() >>> >>> While raw_svector_ostream is smart to share the text buffer itself, it's >>> inefficient keeping two sets of pointers to the same buffer: >>> >>> In SmallString: void *BeginX, *EndX, *CapacityX >>> In raw_ostream: char *OutBufStart, *OutBufEnd, *OutBufCur >>> >>> Moreover, at runtime the two sets of pointers need to be coordinated >>> between the SmallString and raw_svector_ostream using >>> raw_svector_ostream::init, raw_svector_ostream::pwrite, raw_svector_ostream::resync >>> and raw_svector_ostream::write_impl. >>> All these functions have non-inlined implementations in raw_ostream.cpp. >>> >>> Finally, this may cause subtle bugs if S is modified without calling >>> OS::resync(). This is too easy to do by mistake. >>> >>> In this frequent case usage the client does not really care about S >>> being a SmallString with its many useful string helper function. It's just >>> boilerplate code for raw_svector_ostream. But it does cost three extra >>> pointers, some runtime performance and possible bugs. >>> >> >> I agree the bugs are real (Alp proposed something a while back regarding >> this?), but you will need to provide measurements to justify the cost in >> runtime performance. One technique I have used in the past to measure these >> sorts of things I call "stuffing": take the operation that you want to >> measure, then essentially change the logic so that you pay the cost 2 >> times, 3 times, etc. You can then look at the trend in performance as N >> varies and extrapolate back to the case where N = 0 (i.e. you don't pay the >> cost). >> >> For example, in one situation where I used this method it was to measure >> the cost of stat'ing files (sys::fs::status) across a holistic build, using >> only "time" on the command line (it was on Windows and I didn't have any >> tools like DTrace available that can directly measure this). In order to do >> this, I changed sys::fs::status to call stat N times instead of 1, and >> measured with N=1 N=2 N=3 etc. The result was that the difference between >> the N and N+1 versions was about 1-2% across N=1..10 (or whatever I >> measured). In order to negate caching and other confounding effects, it is >> important to try different distributions of stats; e.g. the extra stats are >> on the same file as the "real" stat vs. the extra stats are on nonexistent >> files in the same directory as the "real" file vs. parent directories of >> the "real" file; if these match up fairly well (they did), then you have >> some indication that the "stuffing" is measuring what you want to measure. >> >> So e.g. if you think the cost of 3 extra pointers is significant, then >> "stuff" the struct with 3, 6, 9, ... extra pointers and measure the >> difference in performance (e.g. measure the time building a real project). >> >> -- Sean Silva >> >> >>> >>> To solve all three issues, would it make sense to have >>> raw_ostream-derived container with a its own SmallString like >>> templated-size built-in buffer? >>> >>> >>-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150424/606355f1/attachment.html> -------------- next part -------------- Index: include/llvm/Support/raw_ostream.h ==================================================================--- include/llvm/Support/raw_ostream.h (revision 235696) +++ include/llvm/Support/raw_ostream.h (working copy) @@ -58,6 +58,7 @@ /// If a subclass installs an external buffer using SetBuffer then it can wait /// for a \see write_impl() call to handle the data which has been put into /// this buffer. +protected: char *OutBufStart, *OutBufEnd, *OutBufCur; enum BufferKind { @@ -131,7 +132,7 @@ // Data Output Interface //===--------------------------------------------------------------------===// - void flush() { + virtual void flush() { if (OutBufCur != OutBufStart) flush_nonempty(); } @@ -208,8 +209,8 @@ /// satisfy std::isprint into an escape sequence. raw_ostream &write_escaped(StringRef Str, bool UseHexEscapes = false); - raw_ostream &write(unsigned char C); - raw_ostream &write(const char *Ptr, size_t Size); + virtual raw_ostream &write(unsigned char C); + virtual raw_ostream &write(const char *Ptr, size_t Size); // Formatted output, see the format() function in Support/Format.h. raw_ostream &operator<<(const format_object_base &Fmt); @@ -475,6 +476,51 @@ } }; +/// A raw_ostream that writes to a char array. This is a simple adaptor class. +/// This class does not encounter output errors. +class raw_char_ostream : public raw_pwrite_stream { + void write_impl(const char *Ptr, size_t Size) override{}; + void pwrite_impl(const char *Ptr, size_t Size, uint64_t Offset) override { + memcpy(OutBufStart + Offset, Ptr, Size); + } + void grow(size_t Size) { + size_t CurBytesAvailable = OutBufEnd - OutBufCur; + if (Size > CurBytesAvailable) { + size_t CurPos = OutBufCur - OutBufStart; + size_t NewSize = size_t(NextPowerOf2(OutBufCur - OutBufStart + Size + 2)); + SetBufferSize(NewSize); + OutBufCur = OutBufStart + CurPos; + } + } + +public: + explicit raw_char_ostream(char *Buffer, unsigned BufferSize) { + if (Buffer) + SetBuffer(Buffer, BufferSize); + else if (BufferSize) + SetBufferSize(BufferSize); + else + SetBufferSize(preferred_buffer_size()); + } + ~raw_char_ostream() override { + if (BufferMode == InternalBuffer) + delete[] OutBufStart; + } + raw_ostream &write(unsigned char C) override { + grow(1); + *OutBufCur++ = C; + } + raw_ostream &write(const char *Ptr, size_t Size) override { + grow(Size); + memcpy(OutBufCur, Ptr, Size); + OutBufCur += Size; + } + /// raw_char_ostream is always unflushed. + void flush() override {} + /// Returns the buffer. + StringRef str() { return StringRef(OutBufStart, GetBufferSize()); } +}; + /// A raw_ostream that writes to an SmallVector or SmallString. This is a /// simple adaptor class. This class does not encounter output errors. class raw_svector_ostream : public raw_pwrite_stream {
Rafael EspĂndola
2015-Apr-30 19:48 UTC
[LLVMdev] SmallString + raw_svector_ostream combination should be more efficient
I don't think we should make flush virtual. Why do you need to do it? Can't you set up the base class to write to use the tail of the memory region as the buffer? On 24 April 2015 at 06:46, Yaron Keren <yaron.keren at gmail.com> wrote:> Hi, > > Is this what you're thinking about? > The code is not tested yet, I'd like to know if the overall direction and > design are correct. > > Yaron > > > 2015-04-21 0:10 GMT+03:00 Yaron Keren <yaron.keren at gmail.com>: >> >> Sean, thanks for reminding this, Alp did commit a class derived from >> raw_svector_ostream conatining an internal SmallString he called >> small_string_ostream. The commit was reverted after a day due to a >> disagreement about the commit approval and apparently abandoned. >> >> >> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223393.html >> >> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223557.html >> >> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140630/223986.html >> >> The class Alp comitted did solve the possible mismatch between the >> SmallString and the stream by making the SmallString private to the class. >> This however did not treat the root problem, the duplication of the >> information about the buffer between SmallString and the stream. >> >> I can make performance measurements but even if the performance difference >> is neglible (and looking at all the code paths and conditionals of >> non-inlined functions at raw_ostream.cpp that's hard to believe), the >> current SmallString-raw_svector_ostream is simply wrong. >> >> Personally, after the previous "Alp" discussion I found and fixed several >> such issues in my out-of-tree code only to make new similar new error >> earlier this year, which I caught only months after, when Rafael committed >> the pwrite which reminded me the issue. Due ot the information duplication >> Rafael commit also had a bug and Mehdi reports similar experience. Back then >> Alp reported similar problems he found in LLVM code which were hopefully >> fixed otherwise. >> >> In addition to the information duplication, the design is quite confusing >> to use >> >> - Should one use the stream.str() function or the SmallString itself? >> - flush or str? >> - How do you properly clear the combination for reuse (that was my >> mistake, I forgot to resync after OS.clear())? >> >> It's safe to say this design pattern is very easy to get wrong one way or >> another, we got burned by it multiple times and it should be replaced. >> >> Alp suggested a derived class containing its own SmallString. That's a >> simple and effective approach to avoid the bugs mentioned but does not fix >> the memory or runtime issues. Small as they may be, why have them at a >> fundemental data structure? >> >> I was thinking about going further avoiding all duplication with a >> templated class derived with its own internal buffer storage. >> >> Rafael suggested a more modular approach, a derived adpater class adapter >> to a *simple* buffer (or nullptr for fully-dynamic operation) which also >> won't duplicate any information the buffer is dumb and has no state. >> >> That solution sounds simple, efficient and safe to use. The implementation >> would be actually simpler then raw_svector_ostream since all the >> coordination logic is not required anymore. >> >> >> 2015-04-20 22:17 GMT+03:00 Sean Silva <chisophugis at gmail.com>: >>> >>> >>> >>> On Sun, Apr 19, 2015 at 7:40 AM, Yaron Keren <yaron.keren at gmail.com> >>> wrote: >>>> >>>> A very common code pattern in LLVM is >>>> >>>> SmallString<128> S; >>>> raw_svector_ostream OS(S); >>>> OS<< ... >>>> Use OS.str() >>>> >>>> While raw_svector_ostream is smart to share the text buffer itself, it's >>>> inefficient keeping two sets of pointers to the same buffer: >>>> >>>> In SmallString: void *BeginX, *EndX, *CapacityX >>>> In raw_ostream: char *OutBufStart, *OutBufEnd, *OutBufCur >>>> >>>> Moreover, at runtime the two sets of pointers need to be coordinated >>>> between the SmallString and raw_svector_ostream using >>>> raw_svector_ostream::init, raw_svector_ostream::pwrite, >>>> raw_svector_ostream::resync and raw_svector_ostream::write_impl. >>>> All these functions have non-inlined implementations in raw_ostream.cpp. >>>> >>>> Finally, this may cause subtle bugs if S is modified without calling >>>> OS::resync(). This is too easy to do by mistake. >>>> >>>> In this frequent case usage the client does not really care about S >>>> being a SmallString with its many useful string helper function. It's just >>>> boilerplate code for raw_svector_ostream. But it does cost three extra >>>> pointers, some runtime performance and possible bugs. >>> >>> >>> I agree the bugs are real (Alp proposed something a while back regarding >>> this?), but you will need to provide measurements to justify the cost in >>> runtime performance. One technique I have used in the past to measure these >>> sorts of things I call "stuffing": take the operation that you want to >>> measure, then essentially change the logic so that you pay the cost 2 times, >>> 3 times, etc. You can then look at the trend in performance as N varies and >>> extrapolate back to the case where N = 0 (i.e. you don't pay the cost). >>> >>> For example, in one situation where I used this method it was to measure >>> the cost of stat'ing files (sys::fs::status) across a holistic build, using >>> only "time" on the command line (it was on Windows and I didn't have any >>> tools like DTrace available that can directly measure this). In order to do >>> this, I changed sys::fs::status to call stat N times instead of 1, and >>> measured with N=1 N=2 N=3 etc. The result was that the difference between >>> the N and N+1 versions was about 1-2% across N=1..10 (or whatever I >>> measured). In order to negate caching and other confounding effects, it is >>> important to try different distributions of stats; e.g. the extra stats are >>> on the same file as the "real" stat vs. the extra stats are on nonexistent >>> files in the same directory as the "real" file vs. parent directories of the >>> "real" file; if these match up fairly well (they did), then you have some >>> indication that the "stuffing" is measuring what you want to measure. >>> >>> So e.g. if you think the cost of 3 extra pointers is significant, then >>> "stuff" the struct with 3, 6, 9, ... extra pointers and measure the >>> difference in performance (e.g. measure the time building a real project). >>> >>> -- Sean Silva >>> >>>> >>>> >>>> To solve all three issues, would it make sense to have >>>> raw_ostream-derived container with a its own SmallString like templated-size >>>> built-in buffer? >>>> >>> >