Yaron Keren
2015-May-02 04:41 UTC
[LLVMdev] SmallString + raw_svector_ostream combination should be more efficient
+update diff 2015-05-02 7:38 GMT+03:00 Yaron Keren <yaron.keren at gmail.com>:> I outlined (is that the right > word?) raw_char_ostream::grow, raw_char_ostream::write (both) into > raw_ostream.cpp with less than 10% difference in performance. > > Profiling reveals that the real culprit is the code line > > OS.reserve(OS.size() + 64); > > called from raw_svector_ostream::write_impl called from > raw_svector_ostream::~raw_svector_ostream. > The issue is already documented: > > raw_svector_ostream::~raw_svector_ostream() { > // FIXME: Prevent resizing during this flush(). > flush(); > } > > And a solution is provided in raw_svector_ostream::init(): > > // Set up the initial external buffer. We make sure that the buffer has > at > // least 128 bytes free; raw_ostream itself only requires 64, but we > want to > // make sure that we don't grow the buffer unnecessarily on destruction > (when > // the data is flushed). See the FIXME below. > OS.reserve(OS.size() + 128); > > This solution may be worse than the problem. In total: > > * If the SmallString was less than 128 bytes init() will always(!) heap > allocate. > * If it's more or equal to 128 bytes but upon destruction less than 64 > bytes are left unused it will heap allocate for no reason. > > A careful programmer who did size the SmallString to match its use + small > reserve will find either of these heap allocations surprising, negating the > SmallString advantage and then paying memcpy cost. > > I filed http://llvm.org/pr23395 about this. To temporarly avoid this > issue, in addition to the outline methods change I commented the OS.reserve > line in flush(). (This change of course requires further review.) > > With reserve being out of the way, the gap now is smaller but still > significant: raw_char_ostream is about 20% faster than > raw_svector_ostream in Release=optimized configuration. > > > 2015-05-02 4:39 GMT+03:00 Sean Silva <chisophugis at gmail.com>: > >> >> Could you dig into why: >> + raw_ostream &write(unsigned char C) override { >> + grow(1); >> + *OutBufCur++ = C; >> + return *this; >> + } >> >> Is 3 times as fast as raw_svector_ostream? I don't see a good reason why >> that should be any faster than: >> >> raw_ostream &operator<<(char C) { >> if (OutBufCur >= OutBufEnd) >> return write(C); >> *OutBufCur++ = C; >> return *this; >> } >> >> >> You might just be seeing the difference between having write inlined vs. >> non-inlined, in which case your patch might be a complicated way to pull >> the likely cases of some raw_ostream methods into the header. >> >> >> -- Sean Silva >> > > On Thu, Apr 30, 2015 at 8:02 PM, Yaron Keren <yaron.keren at gmail.com> > wrote: > >> Yes, we should do without virtual flush. The problem was raw_char_ostream >> should not be flushed - it's always current - so I overloaded flush to a >> no-op. A cleaner solution is attached, adding a DirectBuffer mode to the >> raw_ostream. >> >> Also added a simple performance comparison project >> between raw_char_ostream and raw_svector_ostream. On Window 7 x64 >> machine, raw_char_ostream was three times faster than raw_svector_ostream >> when the provided buffer size is large enough and two times as fast with >> one dynamic allocation resize. >> >> >> 2015-04-30 22:48 GMT+03:00 Rafael Espíndola <rafael.espindola at gmail.com>: >> >>> I don't think we should make flush virtual. Why do you need to do it? >>> Can't you set up the base class to write to use the tail of the memory >>> region as the buffer? >>> >>> >>> >>> On 24 April 2015 at 06:46, Yaron Keren <yaron.keren at gmail.com> wrote: >>> > Hi, >>> > >>> > Is this what you're thinking about? >>> > The code is not tested yet, I'd like to know if the overall direction >>> and >>> > design are correct. >>> > >>> > Yaron >>> > >>> > >>> > 2015-04-21 0:10 GMT+03:00 Yaron Keren <yaron.keren at gmail.com>: >>> >> >>> >> Sean, thanks for reminding this, Alp did commit a class derived from >>> >> raw_svector_ostream conatining an internal SmallString he called >>> >> small_string_ostream. The commit was reverted after a day due to a >>> >> disagreement about the commit approval and apparently abandoned. >>> >> >>> >> >>> >> >>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223393.html >>> >> >>> >> >>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223557.html >>> >> >>> >> >>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140630/223986.html >>> >> >>> >> The class Alp comitted did solve the possible mismatch between the >>> >> SmallString and the stream by making the SmallString private to the >>> class. >>> >> This however did not treat the root problem, the duplication of the >>> >> information about the buffer between SmallString and the stream. >>> >> >>> >> I can make performance measurements but even if the performance >>> difference >>> >> is neglible (and looking at all the code paths and conditionals of >>> >> non-inlined functions at raw_ostream.cpp that's hard to believe), the >>> >> current SmallString-raw_svector_ostream is simply wrong. >>> >> >>> >> Personally, after the previous "Alp" discussion I found and fixed >>> several >>> >> such issues in my out-of-tree code only to make new similar new error >>> >> earlier this year, which I caught only months after, when Rafael >>> committed >>> >> the pwrite which reminded me the issue. Due ot the information >>> duplication >>> >> Rafael commit also had a bug and Mehdi reports similar experience. >>> Back then >>> >> Alp reported similar problems he found in LLVM code which were >>> hopefully >>> >> fixed otherwise. >>> >> >>> >> In addition to the information duplication, the design is quite >>> confusing >>> >> to use >>> >> >>> >> - Should one use the stream.str() function or the SmallString itself? >>> >> - flush or str? >>> >> - How do you properly clear the combination for reuse (that was my >>> >> mistake, I forgot to resync after OS.clear())? >>> >> >>> >> It's safe to say this design pattern is very easy to get wrong one >>> way or >>> >> another, we got burned by it multiple times and it should be replaced. >>> >> >>> >> Alp suggested a derived class containing its own SmallString. That's a >>> >> simple and effective approach to avoid the bugs mentioned but does >>> not fix >>> >> the memory or runtime issues. Small as they may be, why have them at a >>> >> fundemental data structure? >>> >> >>> >> I was thinking about going further avoiding all duplication with a >>> >> templated class derived with its own internal buffer storage. >>> >> >>> >> Rafael suggested a more modular approach, a derived adpater class >>> adapter >>> >> to a *simple* buffer (or nullptr for fully-dynamic operation) which >>> also >>> >> won't duplicate any information the buffer is dumb and has no state. >>> >> >>> >> That solution sounds simple, efficient and safe to use. The >>> implementation >>> >> would be actually simpler then raw_svector_ostream since all the >>> >> coordination logic is not required anymore. >>> >> >>> >> >>> >> 2015-04-20 22:17 GMT+03:00 Sean Silva <chisophugis at gmail.com>: >>> >>> >>> >>> >>> >>> >>> >>> On Sun, Apr 19, 2015 at 7:40 AM, Yaron Keren <yaron.keren at gmail.com> >>> >>> wrote: >>> >>>> >>> >>>> A very common code pattern in LLVM is >>> >>>> >>> >>>> SmallString<128> S; >>> >>>> raw_svector_ostream OS(S); >>> >>>> OS<< ... >>> >>>> Use OS.str() >>> >>>> >>> >>>> While raw_svector_ostream is smart to share the text buffer itself, >>> it's >>> >>>> inefficient keeping two sets of pointers to the same buffer: >>> >>>> >>> >>>> In SmallString: void *BeginX, *EndX, *CapacityX >>> >>>> In raw_ostream: char *OutBufStart, *OutBufEnd, *OutBufCur >>> >>>> >>> >>>> Moreover, at runtime the two sets of pointers need to be coordinated >>> >>>> between the SmallString and raw_svector_ostream using >>> >>>> raw_svector_ostream::init, raw_svector_ostream::pwrite, >>> >>>> raw_svector_ostream::resync and raw_svector_ostream::write_impl. >>> >>>> All these functions have non-inlined implementations in >>> raw_ostream.cpp. >>> >>>> >>> >>>> Finally, this may cause subtle bugs if S is modified without calling >>> >>>> OS::resync(). This is too easy to do by mistake. >>> >>>> >>> >>>> In this frequent case usage the client does not really care about S >>> >>>> being a SmallString with its many useful string helper function. >>> It's just >>> >>>> boilerplate code for raw_svector_ostream. But it does cost three >>> extra >>> >>>> pointers, some runtime performance and possible bugs. >>> >>> >>> >>> >>> >>> I agree the bugs are real (Alp proposed something a while back >>> regarding >>> >>> this?), but you will need to provide measurements to justify the >>> cost in >>> >>> runtime performance. One technique I have used in the past to >>> measure these >>> >>> sorts of things I call "stuffing": take the operation that you want >>> to >>> >>> measure, then essentially change the logic so that you pay the cost >>> 2 times, >>> >>> 3 times, etc. You can then look at the trend in performance as N >>> varies and >>> >>> extrapolate back to the case where N = 0 (i.e. you don't pay the >>> cost). >>> >>> >>> >>> For example, in one situation where I used this method it was to >>> measure >>> >>> the cost of stat'ing files (sys::fs::status) across a holistic >>> build, using >>> >>> only "time" on the command line (it was on Windows and I didn't have >>> any >>> >>> tools like DTrace available that can directly measure this). In >>> order to do >>> >>> this, I changed sys::fs::status to call stat N times instead of 1, >>> and >>> >>> measured with N=1 N=2 N=3 etc. The result was that the difference >>> between >>> >>> the N and N+1 versions was about 1-2% across N=1..10 (or whatever I >>> >>> measured). In order to negate caching and other confounding effects, >>> it is >>> >>> important to try different distributions of stats; e.g. the extra >>> stats are >>> >>> on the same file as the "real" stat vs. the extra stats are on >>> nonexistent >>> >>> files in the same directory as the "real" file vs. parent >>> directories of the >>> >>> "real" file; if these match up fairly well (they did), then you have >>> some >>> >>> indication that the "stuffing" is measuring what you want to measure. >>> >>> >>> >>> So e.g. if you think the cost of 3 extra pointers is significant, >>> then >>> >>> "stuff" the struct with 3, 6, 9, ... extra pointers and measure the >>> >>> difference in performance (e.g. measure the time building a real >>> project). >>> >>> >>> >>> -- Sean Silva >>> >>> >>> >>>> >>> >>>> >>> >>>> To solve all three issues, would it make sense to have >>> >>>> raw_ostream-derived container with a its own SmallString like >>> templated-size >>> >>>> built-in buffer? >>> >>>> >>> >>> >>> > >>> >> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150502/cf76e5ee/attachment.html> -------------- next part -------------- Index: include/llvm/Support/raw_ostream.h ==================================================================--- include/llvm/Support/raw_ostream.h (revision 236304) +++ include/llvm/Support/raw_ostream.h (working copy) @@ -52,17 +52,19 @@ /// OutBufEnd - OutBufStart >= 1). /// /// If buffered, then the raw_ostream owns the buffer if (BufferMode =- /// InternalBuffer); otherwise the buffer has been set via SetBuffer and is - /// managed by the subclass. + /// InternalBuffer or BufferMode == DirectBuffer); otherwise the buffer has + /// been set via SetBuffer and is managed by the subclass. /// /// If a subclass installs an external buffer using SetBuffer then it can wait /// for a \see write_impl() call to handle the data which has been put into /// this buffer. +protected: char *OutBufStart, *OutBufEnd, *OutBufCur; enum BufferKind { Unbuffered = 0, InternalBuffer, + DirectBuffer, ExternalBuffer } BufferMode; @@ -132,7 +134,7 @@ //===--------------------------------------------------------------------===// void flush() { - if (OutBufCur != OutBufStart) + if (BufferMode != DirectBuffer && OutBufCur != OutBufStart) flush_nonempty(); } @@ -208,8 +210,8 @@ /// satisfy std::isprint into an escape sequence. raw_ostream &write_escaped(StringRef Str, bool UseHexEscapes = false); - raw_ostream &write(unsigned char C); - raw_ostream &write(const char *Ptr, size_t Size); + virtual raw_ostream &write(unsigned char C); + virtual raw_ostream &write(const char *Ptr, size_t Size); // Formatted output, see the format() function in Support/Format.h. raw_ostream &operator<<(const format_object_base &Fmt); @@ -297,13 +299,13 @@ /// unbuffered. const char *getBufferStart() const { return OutBufStart; } + /// Install the given buffer and mode. + void SetBufferAndMode(char *BufferStart, size_t Size, BufferKind Mode); + //===--------------------------------------------------------------------===// // Private Interface //===--------------------------------------------------------------------===// private: - /// Install the given buffer and mode. - void SetBufferAndMode(char *BufferStart, size_t Size, BufferKind Mode); - /// Flush the current buffer, which is known to be non-empty. This outputs the /// currently buffered data and resets the buffer to empty. void flush_nonempty(); @@ -481,6 +483,33 @@ } }; +/// A raw_ostream that writes to a char array. This is a simple adaptor class. +/// This class does not encounter output errors. +class raw_char_ostream : public raw_pwrite_stream { + void write_impl(const char *Ptr, size_t Size) override{}; + void pwrite_impl(const char *Ptr, size_t Size, uint64_t Offset) override { + memcpy(OutBufStart + Offset, Ptr, Size); + } + uint64_t current_pos() const override { return 0; } + void grow(size_t Size); + +public: + explicit raw_char_ostream(char *Buffer, unsigned BufferSize) { + if (Buffer) { + SetBufferAndMode(Buffer, BufferSize, ExternalBuffer); + } else { + if (!BufferSize) + BufferSize = preferred_buffer_size(); + SetBufferAndMode(new char[BufferSize], BufferSize, DirectBuffer); + } + } + ~raw_char_ostream() override { OutBufCur = OutBufStart; } + raw_ostream &write(unsigned char C) override; + raw_ostream &write(const char *Ptr, size_t Size) override; + /// Returns the buffer. + StringRef str() { return StringRef(OutBufStart, GetNumBytesInBuffer()); } +}; + /// A raw_ostream that writes to an SmallVector or SmallString. This is a /// simple adaptor class. This class does not encounter output errors. class raw_svector_ostream : public raw_pwrite_stream { Index: lib/Support/raw_ostream.cpp ==================================================================--- lib/Support/raw_ostream.cpp (revision 236304) +++ lib/Support/raw_ostream.cpp (working copy) @@ -65,7 +65,7 @@ assert(OutBufCur == OutBufStart && "raw_ostream destructor called with non-empty buffer!"); - if (BufferMode == InternalBuffer) + if (BufferMode == InternalBuffer || BufferMode == DirectBuffer) delete [] OutBufStart; } @@ -95,7 +95,7 @@ // child buffer management logic will be in write_impl). assert(GetNumBytesInBuffer() == 0 && "Current buffer is non-empty!"); - if (BufferMode == InternalBuffer) + if (BufferMode == InternalBuffer || BufferMode == DirectBuffer) delete [] OutBufStart; OutBufStart = BufferStart; OutBufEnd = OutBufStart+Size; @@ -751,6 +751,29 @@ OS.append(Ptr, Size); } +void raw_char_ostream::grow(size_t Size) { + size_t CurBytesAvailable = OutBufEnd - OutBufCur; + if (Size > CurBytesAvailable) { + size_t CurPos = OutBufCur - OutBufStart; + OutBufCur = OutBufStart; + size_t NewSize = size_t(NextPowerOf2(CurPos + Size + 2)); + SetBufferAndMode(new char[NewSize], NewSize, DirectBuffer); + OutBufCur = OutBufStart + CurPos; + } +} + +raw_ostream &raw_char_ostream::write(unsigned char C) { + grow(1); + *OutBufCur++ = C; + return *this; +} +raw_ostream &raw_char_ostream::write(const char *Ptr, size_t Size) { + grow(Size); + memcpy(OutBufCur, Ptr, Size); + OutBufCur += Size; + return *this; +} + //===----------------------------------------------------------------------===// // raw_svector_ostream //===----------------------------------------------------------------------===// @@ -810,6 +833,7 @@ OS.append(Ptr, Ptr + Size); } + // TESTME: comment out reserve call. OS.reserve(OS.size() + 64); SetBuffer(OS.end(), OS.capacity() - OS.size()); } Index: projects/raw_char_ostream_speed/CMakeLists.txt ==================================================================--- projects/raw_char_ostream_speed/CMakeLists.txt (revision 0) +++ projects/raw_char_ostream_speed/CMakeLists.txt (working copy) @@ -0,0 +1,7 @@ +set(LLVM_LINK_COMPONENTS + Support +) + +add_llvm_tool(raw_char_ostream + raw_char_ostream_speed.cpp +) Index: projects/raw_char_ostream_speed/raw_char_ostream_speed.cpp ==================================================================--- projects/raw_char_ostream_speed/raw_char_ostream_speed.cpp (revision 0) +++ projects/raw_char_ostream_speed/raw_char_ostream_speed.cpp (working copy) @@ -0,0 +1,45 @@ +#include "llvm/ADT/SmallString.h" +#include "llvm/Support/raw_ostream.h" + +#include <ctime> + +using namespace llvm; + +namespace { +const StringRef FuncName + "thadddddddddsrttttttttttttttttttttttttjouhojkirjdjdsjoiuosiasddjisjsrof"; + +void writeStream(raw_ostream &OS) { + OS << "_Z" << FuncName.size() << FuncName << 'v'; +} + +const unsigned BufferSize = 128; +void test_raw_char_ostream() { + char Buffer[BufferSize]; + raw_char_ostream OS(Buffer, sizeof(Buffer)); + writeStream(OS); +} + +void test_raw_svector_ostream() { + SmallString<BufferSize> Buffer; + raw_svector_ostream OS(Buffer); + writeStream(OS); + // Make sure Buffer was not reallocated. + assert(Buffer.capacity() == BufferSize); +} + +void testIt(void (*F)()) { + std::clock_t StartTime = std::clock(); + for (unsigned i = 1e7; i; --i) + F(); + std::clock_t EndTime = std::clock(); + unsigned ms = unsigned(double(EndTime - StartTime) * 1000.0 / CLOCKS_PER_SEC); + outs() << ms << "ms\n"; +} +} + +int main() { + testIt(test_raw_char_ostream); + testIt(test_raw_svector_ostream); + return 0; +}
Yaron Keren
2015-May-02 18:31 UTC
[LLVMdev] SmallString + raw_svector_ostream combination should be more efficient
Following a hint from Duncan in http://llvm.org/pr23395, here is a revised patch. Rather then introduce the raw_char_ostream adapter, this version improves raw_svector_ostream. raw_svector_ostream is now a very lightweight adapter, running without a buffer and fowarding almost anything to the SmallString. This solves both the performance issue and the information duplication issue. The SmallString is always up-to-date and can be used at any time. flush() does nothing and is not required. resync() was kept in this patch as a no-op and will be removed with its users in a follow-up patch. I'll also try to remove the superfluous flush calls() along the way. 2015-05-02 7:41 GMT+03:00 Yaron Keren <yaron.keren at gmail.com>:> +update diff > > > 2015-05-02 7:38 GMT+03:00 Yaron Keren <yaron.keren at gmail.com>: > >> I outlined (is that the right >> word?) raw_char_ostream::grow, raw_char_ostream::write (both) into >> raw_ostream.cpp with less than 10% difference in performance. >> >> Profiling reveals that the real culprit is the code line >> >> OS.reserve(OS.size() + 64); >> >> called from raw_svector_ostream::write_impl called from >> raw_svector_ostream::~raw_svector_ostream. >> The issue is already documented: >> >> raw_svector_ostream::~raw_svector_ostream() { >> // FIXME: Prevent resizing during this flush(). >> flush(); >> } >> >> And a solution is provided in raw_svector_ostream::init(): >> >> // Set up the initial external buffer. We make sure that the buffer has >> at >> // least 128 bytes free; raw_ostream itself only requires 64, but we >> want to >> // make sure that we don't grow the buffer unnecessarily on destruction >> (when >> // the data is flushed). See the FIXME below. >> OS.reserve(OS.size() + 128); >> >> This solution may be worse than the problem. In total: >> >> * If the SmallString was less than 128 bytes init() will always(!) heap >> allocate. >> * If it's more or equal to 128 bytes but upon destruction less than 64 >> bytes are left unused it will heap allocate for no reason. >> >> A careful programmer who did size the SmallString to match its use + >> small reserve will find either of these heap allocations surprising, >> negating the SmallString advantage and then paying memcpy cost. >> >> I filed http://llvm.org/pr23395 about this. To temporarly avoid this >> issue, in addition to the outline methods change I commented the OS.reserve >> line in flush(). (This change of course requires further review.) >> >> With reserve being out of the way, the gap now is smaller but still >> significant: raw_char_ostream is about 20% faster than >> raw_svector_ostream in Release=optimized configuration. >> >> >> 2015-05-02 4:39 GMT+03:00 Sean Silva <chisophugis at gmail.com>: >> >>> >>> Could you dig into why: >>> + raw_ostream &write(unsigned char C) override { >>> + grow(1); >>> + *OutBufCur++ = C; >>> + return *this; >>> + } >>> >>> Is 3 times as fast as raw_svector_ostream? I don't see a good reason why >>> that should be any faster than: >>> >>> raw_ostream &operator<<(char C) { >>> if (OutBufCur >= OutBufEnd) >>> return write(C); >>> *OutBufCur++ = C; >>> return *this; >>> } >>> >>> >>> You might just be seeing the difference between having write inlined vs. >>> non-inlined, in which case your patch might be a complicated way to pull >>> the likely cases of some raw_ostream methods into the header. >>> >>> >>> -- Sean Silva >>> >> >> On Thu, Apr 30, 2015 at 8:02 PM, Yaron Keren <yaron.keren at gmail.com> >> wrote: >> >>> Yes, we should do without virtual flush. The problem >>> was raw_char_ostream should not be flushed - it's always current - so I >>> overloaded flush to a no-op. A cleaner solution is attached, adding >>> a DirectBuffer mode to the raw_ostream. >>> >>> Also added a simple performance comparison project >>> between raw_char_ostream and raw_svector_ostream. On Window 7 x64 >>> machine, raw_char_ostream was three times faster than raw_svector_ostream >>> when the provided buffer size is large enough and two times as fast with >>> one dynamic allocation resize. >>> >>> >>> 2015-04-30 22:48 GMT+03:00 Rafael Espíndola <rafael.espindola at gmail.com> >>> : >>> >>>> I don't think we should make flush virtual. Why do you need to do it? >>>> Can't you set up the base class to write to use the tail of the memory >>>> region as the buffer? >>>> >>>> >>>> >>>> On 24 April 2015 at 06:46, Yaron Keren <yaron.keren at gmail.com> wrote: >>>> > Hi, >>>> > >>>> > Is this what you're thinking about? >>>> > The code is not tested yet, I'd like to know if the overall direction >>>> and >>>> > design are correct. >>>> > >>>> > Yaron >>>> > >>>> > >>>> > 2015-04-21 0:10 GMT+03:00 Yaron Keren <yaron.keren at gmail.com>: >>>> >> >>>> >> Sean, thanks for reminding this, Alp did commit a class derived from >>>> >> raw_svector_ostream conatining an internal SmallString he called >>>> >> small_string_ostream. The commit was reverted after a day due to a >>>> >> disagreement about the commit approval and apparently abandoned. >>>> >> >>>> >> >>>> >> >>>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223393.html >>>> >> >>>> >> >>>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223557.html >>>> >> >>>> >> >>>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140630/223986.html >>>> >> >>>> >> The class Alp comitted did solve the possible mismatch between the >>>> >> SmallString and the stream by making the SmallString private to the >>>> class. >>>> >> This however did not treat the root problem, the duplication of the >>>> >> information about the buffer between SmallString and the stream. >>>> >> >>>> >> I can make performance measurements but even if the performance >>>> difference >>>> >> is neglible (and looking at all the code paths and conditionals of >>>> >> non-inlined functions at raw_ostream.cpp that's hard to believe), the >>>> >> current SmallString-raw_svector_ostream is simply wrong. >>>> >> >>>> >> Personally, after the previous "Alp" discussion I found and fixed >>>> several >>>> >> such issues in my out-of-tree code only to make new similar new error >>>> >> earlier this year, which I caught only months after, when Rafael >>>> committed >>>> >> the pwrite which reminded me the issue. Due ot the information >>>> duplication >>>> >> Rafael commit also had a bug and Mehdi reports similar experience. >>>> Back then >>>> >> Alp reported similar problems he found in LLVM code which were >>>> hopefully >>>> >> fixed otherwise. >>>> >> >>>> >> In addition to the information duplication, the design is quite >>>> confusing >>>> >> to use >>>> >> >>>> >> - Should one use the stream.str() function or the SmallString itself? >>>> >> - flush or str? >>>> >> - How do you properly clear the combination for reuse (that was my >>>> >> mistake, I forgot to resync after OS.clear())? >>>> >> >>>> >> It's safe to say this design pattern is very easy to get wrong one >>>> way or >>>> >> another, we got burned by it multiple times and it should be >>>> replaced. >>>> >> >>>> >> Alp suggested a derived class containing its own SmallString. That's >>>> a >>>> >> simple and effective approach to avoid the bugs mentioned but does >>>> not fix >>>> >> the memory or runtime issues. Small as they may be, why have them at >>>> a >>>> >> fundemental data structure? >>>> >> >>>> >> I was thinking about going further avoiding all duplication with a >>>> >> templated class derived with its own internal buffer storage. >>>> >> >>>> >> Rafael suggested a more modular approach, a derived adpater class >>>> adapter >>>> >> to a *simple* buffer (or nullptr for fully-dynamic operation) which >>>> also >>>> >> won't duplicate any information the buffer is dumb and has no state. >>>> >> >>>> >> That solution sounds simple, efficient and safe to use. The >>>> implementation >>>> >> would be actually simpler then raw_svector_ostream since all the >>>> >> coordination logic is not required anymore. >>>> >> >>>> >> >>>> >> 2015-04-20 22:17 GMT+03:00 Sean Silva <chisophugis at gmail.com>: >>>> >>> >>>> >>> >>>> >>> >>>> >>> On Sun, Apr 19, 2015 at 7:40 AM, Yaron Keren <yaron.keren at gmail.com >>>> > >>>> >>> wrote: >>>> >>>> >>>> >>>> A very common code pattern in LLVM is >>>> >>>> >>>> >>>> SmallString<128> S; >>>> >>>> raw_svector_ostream OS(S); >>>> >>>> OS<< ... >>>> >>>> Use OS.str() >>>> >>>> >>>> >>>> While raw_svector_ostream is smart to share the text buffer >>>> itself, it's >>>> >>>> inefficient keeping two sets of pointers to the same buffer: >>>> >>>> >>>> >>>> In SmallString: void *BeginX, *EndX, *CapacityX >>>> >>>> In raw_ostream: char *OutBufStart, *OutBufEnd, *OutBufCur >>>> >>>> >>>> >>>> Moreover, at runtime the two sets of pointers need to be >>>> coordinated >>>> >>>> between the SmallString and raw_svector_ostream using >>>> >>>> raw_svector_ostream::init, raw_svector_ostream::pwrite, >>>> >>>> raw_svector_ostream::resync and raw_svector_ostream::write_impl. >>>> >>>> All these functions have non-inlined implementations in >>>> raw_ostream.cpp. >>>> >>>> >>>> >>>> Finally, this may cause subtle bugs if S is modified without >>>> calling >>>> >>>> OS::resync(). This is too easy to do by mistake. >>>> >>>> >>>> >>>> In this frequent case usage the client does not really care about S >>>> >>>> being a SmallString with its many useful string helper function. >>>> It's just >>>> >>>> boilerplate code for raw_svector_ostream. But it does cost three >>>> extra >>>> >>>> pointers, some runtime performance and possible bugs. >>>> >>> >>>> >>> >>>> >>> I agree the bugs are real (Alp proposed something a while back >>>> regarding >>>> >>> this?), but you will need to provide measurements to justify the >>>> cost in >>>> >>> runtime performance. One technique I have used in the past to >>>> measure these >>>> >>> sorts of things I call "stuffing": take the operation that you want >>>> to >>>> >>> measure, then essentially change the logic so that you pay the cost >>>> 2 times, >>>> >>> 3 times, etc. You can then look at the trend in performance as N >>>> varies and >>>> >>> extrapolate back to the case where N = 0 (i.e. you don't pay the >>>> cost). >>>> >>> >>>> >>> For example, in one situation where I used this method it was to >>>> measure >>>> >>> the cost of stat'ing files (sys::fs::status) across a holistic >>>> build, using >>>> >>> only "time" on the command line (it was on Windows and I didn't >>>> have any >>>> >>> tools like DTrace available that can directly measure this). In >>>> order to do >>>> >>> this, I changed sys::fs::status to call stat N times instead of 1, >>>> and >>>> >>> measured with N=1 N=2 N=3 etc. The result was that the difference >>>> between >>>> >>> the N and N+1 versions was about 1-2% across N=1..10 (or whatever I >>>> >>> measured). In order to negate caching and other confounding >>>> effects, it is >>>> >>> important to try different distributions of stats; e.g. the extra >>>> stats are >>>> >>> on the same file as the "real" stat vs. the extra stats are on >>>> nonexistent >>>> >>> files in the same directory as the "real" file vs. parent >>>> directories of the >>>> >>> "real" file; if these match up fairly well (they did), then you >>>> have some >>>> >>> indication that the "stuffing" is measuring what you want to >>>> measure. >>>> >>> >>>> >>> So e.g. if you think the cost of 3 extra pointers is significant, >>>> then >>>> >>> "stuff" the struct with 3, 6, 9, ... extra pointers and measure the >>>> >>> difference in performance (e.g. measure the time building a real >>>> project). >>>> >>> >>>> >>> -- Sean Silva >>>> >>> >>>> >>>> >>>> >>>> >>>> >>>> To solve all three issues, would it make sense to have >>>> >>>> raw_ostream-derived container with a its own SmallString like >>>> templated-size >>>> >>>> built-in buffer? >>>> >>>> >>>> >>> >>>> > >>>> >>> >>> >>-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150502/f543d9cb/attachment.html> -------------- next part -------------- Index: include/llvm/Support/raw_ostream.h ==================================================================--- include/llvm/Support/raw_ostream.h (revision 236389) +++ include/llvm/Support/raw_ostream.h (working copy) @@ -208,8 +208,8 @@ /// satisfy std::isprint into an escape sequence. raw_ostream &write_escaped(StringRef Str, bool UseHexEscapes = false); - raw_ostream &write(unsigned char C); - raw_ostream &write(const char *Ptr, size_t Size); + virtual raw_ostream &write(unsigned char C); + virtual raw_ostream &write(const char *Ptr, size_t Size); // Formatted output, see the format() function in Support/Format.h. raw_ostream &operator<<(const format_object_base &Fmt); @@ -483,40 +483,42 @@ /// A raw_ostream that writes to an SmallVector or SmallString. This is a /// simple adaptor class. This class does not encounter output errors. +/// raw_svector_ostream operates without a buffer, delegating all memory +/// management to the SmallString. Thus the SmallString is always up-to-date, +/// may be used directly and there is no need to call flush(). class raw_svector_ostream : public raw_pwrite_stream { SmallVectorImpl<char> &OS; /// See raw_ostream::write_impl. - void write_impl(const char *Ptr, size_t Size) override; + void write_impl(const char *Ptr, size_t Size) override{}; + void pwrite_impl(const char *Ptr, size_t Size, uint64_t Offset) override { + memcpy(OS.data() + Offset, Ptr, Size); + } - void pwrite_impl(const char *Ptr, size_t Size, uint64_t Offset) override; - /// Return the current position within the stream, not counting the bytes /// currently in the buffer. - uint64_t current_pos() const override; + uint64_t current_pos() const override { return OS.size(); } -protected: - // Like the regular constructor, but doesn't call init. - explicit raw_svector_ostream(SmallVectorImpl<char> &O, unsigned); - void init(); - public: /// Construct a new raw_svector_ostream. /// - /// \param O The vector to write to; this should generally have at least 128 - /// bytes free to avoid any extraneous memory overhead. - explicit raw_svector_ostream(SmallVectorImpl<char> &O); - ~raw_svector_ostream() override; + explicit raw_svector_ostream(SmallVectorImpl<char> &O) : OS(O) {} + ~raw_svector_ostream() override {} + raw_ostream &write(unsigned char C) override { + OS.push_back(C); + return *this; + } + raw_ostream &write(const char *Ptr, size_t Size) override { + OS.append(Ptr, Ptr + Size); + return *this; + } - /// This is called when the SmallVector we're appending to is changed outside - /// of the raw_svector_ostream's control. It is only safe to do this if the - /// raw_svector_ostream has previously been flushed. - void resync(); + // FIXME: remove resync() and users. + void resync() {} - /// Flushes the stream contents to the target vector and return a StringRef - /// for the vector contents. - StringRef str(); + /// Return a StringRef for the vector contents. + StringRef str() { return StringRef(OS.data(), OS.size()); } }; /// A raw_ostream that discards all output. @@ -539,9 +541,7 @@ SmallVector<char, 0> Buffer; public: - buffer_ostream(raw_ostream &OS) : raw_svector_ostream(Buffer, 0), OS(OS) { - init(); - } + buffer_ostream(raw_ostream &OS) : raw_svector_ostream(Buffer), OS(OS) {} ~buffer_ostream() { OS << str(); } }; Index: lib/Support/raw_ostream.cpp ==================================================================--- lib/Support/raw_ostream.cpp (revision 236389) +++ lib/Support/raw_ostream.cpp (working copy) @@ -752,78 +752,6 @@ } //===----------------------------------------------------------------------===// -// raw_svector_ostream -//===----------------------------------------------------------------------===// - -// The raw_svector_ostream implementation uses the SmallVector itself as the -// buffer for the raw_ostream. We guarantee that the raw_ostream buffer is -// always pointing past the end of the vector, but within the vector -// capacity. This allows raw_ostream to write directly into the correct place, -// and we only need to set the vector size when the data is flushed. - -raw_svector_ostream::raw_svector_ostream(SmallVectorImpl<char> &O, unsigned) - : OS(O) {} - -raw_svector_ostream::raw_svector_ostream(SmallVectorImpl<char> &O) : OS(O) { - init(); -} - -void raw_svector_ostream::init() { - // Set up the initial external buffer. We make sure that the buffer has at - // least 128 bytes free; raw_ostream itself only requires 64, but we want to - // make sure that we don't grow the buffer unnecessarily on destruction (when - // the data is flushed). See the FIXME below. - OS.reserve(OS.size() + 128); - SetBuffer(OS.end(), OS.capacity() - OS.size()); -} - -raw_svector_ostream::~raw_svector_ostream() { - // FIXME: Prevent resizing during this flush(). - flush(); -} - -void raw_svector_ostream::pwrite_impl(const char *Ptr, size_t Size, - uint64_t Offset) { - flush(); - memcpy(OS.begin() + Offset, Ptr, Size); -} - -/// resync - This is called when the SmallVector we're appending to is changed -/// outside of the raw_svector_ostream's control. It is only safe to do this -/// if the raw_svector_ostream has previously been flushed. -void raw_svector_ostream::resync() { - assert(GetNumBytesInBuffer() == 0 && "Didn't flush before mutating vector"); - - if (OS.capacity() - OS.size() < 64) - OS.reserve(OS.capacity() * 2); - SetBuffer(OS.end(), OS.capacity() - OS.size()); -} - -void raw_svector_ostream::write_impl(const char *Ptr, size_t Size) { - if (Ptr == OS.end()) { - // Grow the buffer to include the scratch area without copying. - size_t NewSize = OS.size() + Size; - assert(NewSize <= OS.capacity() && "Invalid write_impl() call!"); - OS.set_size(NewSize); - } else { - assert(!GetNumBytesInBuffer()); - OS.append(Ptr, Ptr + Size); - } - - OS.reserve(OS.size() + 64); - SetBuffer(OS.end(), OS.capacity() - OS.size()); -} - -uint64_t raw_svector_ostream::current_pos() const { - return OS.size(); -} - -StringRef raw_svector_ostream::str() { - flush(); - return StringRef(OS.begin(), OS.size()); -} - -//===----------------------------------------------------------------------===// // raw_null_ostream //===----------------------------------------------------------------------===//
Yaron Keren
2015-May-22 14:17 UTC
[LLVMdev] SmallString + raw_svector_ostream combination should be more efficient
Here's a performance testcase for the raw_svector_ostream patch. On my WIndows x64 machine it runs in 1010ms with the current code and in 440ms with the patch applies. Is this OK to commit? 2015-05-02 21:31 GMT+03:00 Yaron Keren <yaron.keren at gmail.com>:> Following a hint from Duncan in http://llvm.org/pr23395, here is a > revised patch. Rather then introduce the raw_char_ostream adapter, this > version improves raw_svector_ostream. > > raw_svector_ostream is now a very lightweight adapter, running without a > buffer and fowarding almost anything to the SmallString. This solves both > the performance issue and the information duplication issue. The > SmallString is always up-to-date and can be used at any time. flush() does > nothing and is not required. resync() was kept in this patch as a no-op and > will be removed with its users in a follow-up patch. I'll also try to > remove the superfluous flush calls() along the way. > > > > 2015-05-02 7:41 GMT+03:00 Yaron Keren <yaron.keren at gmail.com>: > >> +update diff >> >> >> 2015-05-02 7:38 GMT+03:00 Yaron Keren <yaron.keren at gmail.com>: >> >>> I outlined (is that the right >>> word?) raw_char_ostream::grow, raw_char_ostream::write (both) into >>> raw_ostream.cpp with less than 10% difference in performance. >>> >>> Profiling reveals that the real culprit is the code line >>> >>> OS.reserve(OS.size() + 64); >>> >>> called from raw_svector_ostream::write_impl called from >>> raw_svector_ostream::~raw_svector_ostream. >>> The issue is already documented: >>> >>> raw_svector_ostream::~raw_svector_ostream() { >>> // FIXME: Prevent resizing during this flush(). >>> flush(); >>> } >>> >>> And a solution is provided in raw_svector_ostream::init(): >>> >>> // Set up the initial external buffer. We make sure that the buffer >>> has at >>> // least 128 bytes free; raw_ostream itself only requires 64, but we >>> want to >>> // make sure that we don't grow the buffer unnecessarily on >>> destruction (when >>> // the data is flushed). See the FIXME below. >>> OS.reserve(OS.size() + 128); >>> >>> This solution may be worse than the problem. In total: >>> >>> * If the SmallString was less than 128 bytes init() will always(!) heap >>> allocate. >>> * If it's more or equal to 128 bytes but upon destruction less than 64 >>> bytes are left unused it will heap allocate for no reason. >>> >>> A careful programmer who did size the SmallString to match its use + >>> small reserve will find either of these heap allocations surprising, >>> negating the SmallString advantage and then paying memcpy cost. >>> >>> I filed http://llvm.org/pr23395 about this. To temporarly avoid this >>> issue, in addition to the outline methods change I commented the OS.reserve >>> line in flush(). (This change of course requires further review.) >>> >>> With reserve being out of the way, the gap now is smaller but still >>> significant: raw_char_ostream is about 20% faster than >>> raw_svector_ostream in Release=optimized configuration. >>> >>> >>> 2015-05-02 4:39 GMT+03:00 Sean Silva <chisophugis at gmail.com>: >>> >>>> >>>> Could you dig into why: >>>> + raw_ostream &write(unsigned char C) override { >>>> + grow(1); >>>> + *OutBufCur++ = C; >>>> + return *this; >>>> + } >>>> >>>> Is 3 times as fast as raw_svector_ostream? I don't see a good reason >>>> why that should be any faster than: >>>> >>>> raw_ostream &operator<<(char C) { >>>> if (OutBufCur >= OutBufEnd) >>>> return write(C); >>>> *OutBufCur++ = C; >>>> return *this; >>>> } >>>> >>>> >>>> You might just be seeing the difference between having write inlined >>>> vs. non-inlined, in which case your patch might be a complicated way to >>>> pull the likely cases of some raw_ostream methods into the header. >>>> >>>> >>>> -- Sean Silva >>>> >>> >>> On Thu, Apr 30, 2015 at 8:02 PM, Yaron Keren <yaron.keren at gmail.com> >>> wrote: >>> >>>> Yes, we should do without virtual flush. The problem >>>> was raw_char_ostream should not be flushed - it's always current - so I >>>> overloaded flush to a no-op. A cleaner solution is attached, adding >>>> a DirectBuffer mode to the raw_ostream. >>>> >>>> Also added a simple performance comparison project >>>> between raw_char_ostream and raw_svector_ostream. On Window 7 x64 >>>> machine, raw_char_ostream was three times faster than raw_svector_ostream >>>> when the provided buffer size is large enough and two times as fast with >>>> one dynamic allocation resize. >>>> >>>> >>>> 2015-04-30 22:48 GMT+03:00 Rafael Espíndola <rafael.espindola at gmail.com >>>> >: >>>> >>>>> I don't think we should make flush virtual. Why do you need to do it? >>>>> Can't you set up the base class to write to use the tail of the memory >>>>> region as the buffer? >>>>> >>>>> >>>>> >>>>> On 24 April 2015 at 06:46, Yaron Keren <yaron.keren at gmail.com> wrote: >>>>> > Hi, >>>>> > >>>>> > Is this what you're thinking about? >>>>> > The code is not tested yet, I'd like to know if the overall >>>>> direction and >>>>> > design are correct. >>>>> > >>>>> > Yaron >>>>> > >>>>> > >>>>> > 2015-04-21 0:10 GMT+03:00 Yaron Keren <yaron.keren at gmail.com>: >>>>> >> >>>>> >> Sean, thanks for reminding this, Alp did commit a class derived from >>>>> >> raw_svector_ostream conatining an internal SmallString he called >>>>> >> small_string_ostream. The commit was reverted after a day due to a >>>>> >> disagreement about the commit approval and apparently abandoned. >>>>> >> >>>>> >> >>>>> >> >>>>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223393.html >>>>> >> >>>>> >> >>>>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223557.html >>>>> >> >>>>> >> >>>>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140630/223986.html >>>>> >> >>>>> >> The class Alp comitted did solve the possible mismatch between the >>>>> >> SmallString and the stream by making the SmallString private to the >>>>> class. >>>>> >> This however did not treat the root problem, the duplication of the >>>>> >> information about the buffer between SmallString and the stream. >>>>> >> >>>>> >> I can make performance measurements but even if the performance >>>>> difference >>>>> >> is neglible (and looking at all the code paths and conditionals of >>>>> >> non-inlined functions at raw_ostream.cpp that's hard to believe), >>>>> the >>>>> >> current SmallString-raw_svector_ostream is simply wrong. >>>>> >> >>>>> >> Personally, after the previous "Alp" discussion I found and fixed >>>>> several >>>>> >> such issues in my out-of-tree code only to make new similar new >>>>> error >>>>> >> earlier this year, which I caught only months after, when Rafael >>>>> committed >>>>> >> the pwrite which reminded me the issue. Due ot the information >>>>> duplication >>>>> >> Rafael commit also had a bug and Mehdi reports similar experience. >>>>> Back then >>>>> >> Alp reported similar problems he found in LLVM code which were >>>>> hopefully >>>>> >> fixed otherwise. >>>>> >> >>>>> >> In addition to the information duplication, the design is quite >>>>> confusing >>>>> >> to use >>>>> >> >>>>> >> - Should one use the stream.str() function or the SmallString >>>>> itself? >>>>> >> - flush or str? >>>>> >> - How do you properly clear the combination for reuse (that was my >>>>> >> mistake, I forgot to resync after OS.clear())? >>>>> >> >>>>> >> It's safe to say this design pattern is very easy to get wrong one >>>>> way or >>>>> >> another, we got burned by it multiple times and it should be >>>>> replaced. >>>>> >> >>>>> >> Alp suggested a derived class containing its own SmallString. >>>>> That's a >>>>> >> simple and effective approach to avoid the bugs mentioned but does >>>>> not fix >>>>> >> the memory or runtime issues. Small as they may be, why have them >>>>> at a >>>>> >> fundemental data structure? >>>>> >> >>>>> >> I was thinking about going further avoiding all duplication with a >>>>> >> templated class derived with its own internal buffer storage. >>>>> >> >>>>> >> Rafael suggested a more modular approach, a derived adpater class >>>>> adapter >>>>> >> to a *simple* buffer (or nullptr for fully-dynamic operation) which >>>>> also >>>>> >> won't duplicate any information the buffer is dumb and has no state. >>>>> >> >>>>> >> That solution sounds simple, efficient and safe to use. The >>>>> implementation >>>>> >> would be actually simpler then raw_svector_ostream since all the >>>>> >> coordination logic is not required anymore. >>>>> >> >>>>> >> >>>>> >> 2015-04-20 22:17 GMT+03:00 Sean Silva <chisophugis at gmail.com>: >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> On Sun, Apr 19, 2015 at 7:40 AM, Yaron Keren < >>>>> yaron.keren at gmail.com> >>>>> >>> wrote: >>>>> >>>> >>>>> >>>> A very common code pattern in LLVM is >>>>> >>>> >>>>> >>>> SmallString<128> S; >>>>> >>>> raw_svector_ostream OS(S); >>>>> >>>> OS<< ... >>>>> >>>> Use OS.str() >>>>> >>>> >>>>> >>>> While raw_svector_ostream is smart to share the text buffer >>>>> itself, it's >>>>> >>>> inefficient keeping two sets of pointers to the same buffer: >>>>> >>>> >>>>> >>>> In SmallString: void *BeginX, *EndX, *CapacityX >>>>> >>>> In raw_ostream: char *OutBufStart, *OutBufEnd, *OutBufCur >>>>> >>>> >>>>> >>>> Moreover, at runtime the two sets of pointers need to be >>>>> coordinated >>>>> >>>> between the SmallString and raw_svector_ostream using >>>>> >>>> raw_svector_ostream::init, raw_svector_ostream::pwrite, >>>>> >>>> raw_svector_ostream::resync and raw_svector_ostream::write_impl. >>>>> >>>> All these functions have non-inlined implementations in >>>>> raw_ostream.cpp. >>>>> >>>> >>>>> >>>> Finally, this may cause subtle bugs if S is modified without >>>>> calling >>>>> >>>> OS::resync(). This is too easy to do by mistake. >>>>> >>>> >>>>> >>>> In this frequent case usage the client does not really care about >>>>> S >>>>> >>>> being a SmallString with its many useful string helper function. >>>>> It's just >>>>> >>>> boilerplate code for raw_svector_ostream. But it does cost three >>>>> extra >>>>> >>>> pointers, some runtime performance and possible bugs. >>>>> >>> >>>>> >>> >>>>> >>> I agree the bugs are real (Alp proposed something a while back >>>>> regarding >>>>> >>> this?), but you will need to provide measurements to justify the >>>>> cost in >>>>> >>> runtime performance. One technique I have used in the past to >>>>> measure these >>>>> >>> sorts of things I call "stuffing": take the operation that you >>>>> want to >>>>> >>> measure, then essentially change the logic so that you pay the >>>>> cost 2 times, >>>>> >>> 3 times, etc. You can then look at the trend in performance as N >>>>> varies and >>>>> >>> extrapolate back to the case where N = 0 (i.e. you don't pay the >>>>> cost). >>>>> >>> >>>>> >>> For example, in one situation where I used this method it was to >>>>> measure >>>>> >>> the cost of stat'ing files (sys::fs::status) across a holistic >>>>> build, using >>>>> >>> only "time" on the command line (it was on Windows and I didn't >>>>> have any >>>>> >>> tools like DTrace available that can directly measure this). In >>>>> order to do >>>>> >>> this, I changed sys::fs::status to call stat N times instead of 1, >>>>> and >>>>> >>> measured with N=1 N=2 N=3 etc. The result was that the difference >>>>> between >>>>> >>> the N and N+1 versions was about 1-2% across N=1..10 (or whatever I >>>>> >>> measured). In order to negate caching and other confounding >>>>> effects, it is >>>>> >>> important to try different distributions of stats; e.g. the extra >>>>> stats are >>>>> >>> on the same file as the "real" stat vs. the extra stats are on >>>>> nonexistent >>>>> >>> files in the same directory as the "real" file vs. parent >>>>> directories of the >>>>> >>> "real" file; if these match up fairly well (they did), then you >>>>> have some >>>>> >>> indication that the "stuffing" is measuring what you want to >>>>> measure. >>>>> >>> >>>>> >>> So e.g. if you think the cost of 3 extra pointers is significant, >>>>> then >>>>> >>> "stuff" the struct with 3, 6, 9, ... extra pointers and measure the >>>>> >>> difference in performance (e.g. measure the time building a real >>>>> project). >>>>> >>> >>>>> >>> -- Sean Silva >>>>> >>> >>>>> >>>> >>>>> >>>> >>>>> >>>> To solve all three issues, would it make sense to have >>>>> >>>> raw_ostream-derived container with a its own SmallString like >>>>> templated-size >>>>> >>>> built-in buffer? >>>>> >>>> >>>>> >>> >>>>> > >>>>> >>>> >>>> >>>-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150522/ee08391a/attachment.html> -------------- next part -------------- set(LLVM_LINK_COMPONENTS Support ) add_llvm_tool(raw_svector_ostream_speed raw_svector_ostream_speed.cpp ) -------------- next part -------------- Index: include/llvm/Support/raw_ostream.h ==================================================================--- include/llvm/Support/raw_ostream.h (revision 237261) +++ include/llvm/Support/raw_ostream.h (working copy) @@ -208,8 +208,8 @@ /// satisfy std::isprint into an escape sequence. raw_ostream &write_escaped(StringRef Str, bool UseHexEscapes = false); - raw_ostream &write(unsigned char C); - raw_ostream &write(const char *Ptr, size_t Size); + virtual raw_ostream &write(unsigned char C); + virtual raw_ostream &write(const char *Ptr, size_t Size); // Formatted output, see the format() function in Support/Format.h. raw_ostream &operator<<(const format_object_base &Fmt); @@ -483,40 +483,42 @@ /// A raw_ostream that writes to an SmallVector or SmallString. This is a /// simple adaptor class. This class does not encounter output errors. +/// raw_svector_ostream operates without a buffer, delegating all memory +/// management to the SmallString. Thus the SmallString is always up-to-date, +/// may be used directly and there is no need to call flush(). class raw_svector_ostream : public raw_pwrite_stream { SmallVectorImpl<char> &OS; /// See raw_ostream::write_impl. - void write_impl(const char *Ptr, size_t Size) override; + void write_impl(const char *Ptr, size_t Size) override{}; + void pwrite_impl(const char *Ptr, size_t Size, uint64_t Offset) override { + memcpy(OS.data() + Offset, Ptr, Size); + } - void pwrite_impl(const char *Ptr, size_t Size, uint64_t Offset) override; - /// Return the current position within the stream, not counting the bytes /// currently in the buffer. - uint64_t current_pos() const override; + uint64_t current_pos() const override { return OS.size(); } -protected: - // Like the regular constructor, but doesn't call init. - explicit raw_svector_ostream(SmallVectorImpl<char> &O, unsigned); - void init(); - public: /// Construct a new raw_svector_ostream. /// - /// \param O The vector to write to; this should generally have at least 128 - /// bytes free to avoid any extraneous memory overhead. - explicit raw_svector_ostream(SmallVectorImpl<char> &O); - ~raw_svector_ostream() override; + explicit raw_svector_ostream(SmallVectorImpl<char> &O) : OS(O) {} + ~raw_svector_ostream() override {} + raw_ostream &write(unsigned char C) override { + OS.push_back(C); + return *this; + } + raw_ostream &write(const char *Ptr, size_t Size) override { + OS.append(Ptr, Ptr + Size); + return *this; + } - /// This is called when the SmallVector we're appending to is changed outside - /// of the raw_svector_ostream's control. It is only safe to do this if the - /// raw_svector_ostream has previously been flushed. - void resync(); + // FIXME: remove resync() and users. + void resync() {} - /// Flushes the stream contents to the target vector and return a StringRef - /// for the vector contents. - StringRef str(); + /// Return a StringRef for the vector contents. + StringRef str() { return StringRef(OS.data(), OS.size()); } }; /// A raw_ostream that discards all output. @@ -539,9 +541,7 @@ SmallVector<char, 0> Buffer; public: - buffer_ostream(raw_ostream &OS) : raw_svector_ostream(Buffer, 0), OS(OS) { - init(); - } + buffer_ostream(raw_ostream &OS) : raw_svector_ostream(Buffer), OS(OS) {} ~buffer_ostream() { OS << str(); } }; Index: lib/Support/raw_ostream.cpp ==================================================================--- lib/Support/raw_ostream.cpp (revision 237261) +++ lib/Support/raw_ostream.cpp (working copy) @@ -752,78 +752,6 @@ } //===----------------------------------------------------------------------===// -// raw_svector_ostream -//===----------------------------------------------------------------------===// - -// The raw_svector_ostream implementation uses the SmallVector itself as the -// buffer for the raw_ostream. We guarantee that the raw_ostream buffer is -// always pointing past the end of the vector, but within the vector -// capacity. This allows raw_ostream to write directly into the correct place, -// and we only need to set the vector size when the data is flushed. - -raw_svector_ostream::raw_svector_ostream(SmallVectorImpl<char> &O, unsigned) - : OS(O) {} - -raw_svector_ostream::raw_svector_ostream(SmallVectorImpl<char> &O) : OS(O) { - init(); -} - -void raw_svector_ostream::init() { - // Set up the initial external buffer. We make sure that the buffer has at - // least 128 bytes free; raw_ostream itself only requires 64, but we want to - // make sure that we don't grow the buffer unnecessarily on destruction (when - // the data is flushed). See the FIXME below. - OS.reserve(OS.size() + 128); - SetBuffer(OS.end(), OS.capacity() - OS.size()); -} - -raw_svector_ostream::~raw_svector_ostream() { - // FIXME: Prevent resizing during this flush(). - flush(); -} - -void raw_svector_ostream::pwrite_impl(const char *Ptr, size_t Size, - uint64_t Offset) { - flush(); - memcpy(OS.begin() + Offset, Ptr, Size); -} - -/// resync - This is called when the SmallVector we're appending to is changed -/// outside of the raw_svector_ostream's control. It is only safe to do this -/// if the raw_svector_ostream has previously been flushed. -void raw_svector_ostream::resync() { - assert(GetNumBytesInBuffer() == 0 && "Didn't flush before mutating vector"); - - if (OS.capacity() - OS.size() < 64) - OS.reserve(OS.capacity() * 2); - SetBuffer(OS.end(), OS.capacity() - OS.size()); -} - -void raw_svector_ostream::write_impl(const char *Ptr, size_t Size) { - if (Ptr == OS.end()) { - // Grow the buffer to include the scratch area without copying. - size_t NewSize = OS.size() + Size; - assert(NewSize <= OS.capacity() && "Invalid write_impl() call!"); - OS.set_size(NewSize); - } else { - assert(!GetNumBytesInBuffer()); - OS.append(Ptr, Ptr + Size); - } - - OS.reserve(OS.size() + 64); - SetBuffer(OS.end(), OS.capacity() - OS.size()); -} - -uint64_t raw_svector_ostream::current_pos() const { - return OS.size(); -} - -StringRef raw_svector_ostream::str() { - flush(); - return StringRef(OS.begin(), OS.size()); -} - -//===----------------------------------------------------------------------===// // raw_null_ostream //===----------------------------------------------------------------------===// -------------- next part -------------- A non-text attachment was scrubbed... Name: raw_svector_ostream_speed.cpp Type: text/x-c++src Size: 768 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150522/ee08391a/attachment.cpp>
Reasonably Related Threads
- SmallString + raw_svector_ostream combination should be more efficient
- [LLVMdev] SmallString + raw_svector_ostream combination should be more efficient
- [LLVMdev] SmallString + raw_svector_ostream combination should be more efficient
- SmallString + raw_svector_ostream combination should be more efficient
- [LLVMdev] SmallString + raw_svector_ostream combination should be more efficient