thr3ads.net - llvm dev - [llvm-dev] SmallString + raw_svector_ostream combination should be more efficient [Aug 2015]

If this information is useful, please help other people find it:
Share via:

Yaron Keren

2015-May-22 14:17 UTC

[LLVMdev] SmallString + raw_svector_ostream combination should be more efficient

Here's a performance testcase for the raw_svector_ostream patch. On my
WIndows x64 machine it runs in 1010ms with the current code and in 440ms
with the patch applies. Is this OK to commit?


2015-05-02 21:31 GMT+03:00 Yaron Keren <yaron.keren at gmail.com>:
> Following a hint from Duncan in http://llvm.org/pr23395, here is a
> revised patch. Rather then introduce the raw_char_ostream adapter, this
> version improves raw_svector_ostream.
>
> raw_svector_ostream is now a very lightweight adapter, running without a
> buffer and fowarding almost anything to the SmallString. This solves both
> the performance issue and the information duplication issue. The
> SmallString is always up-to-date and can be used at any time. flush() does
> nothing and is not required. resync() was kept in this patch as a no-op and
> will be removed with its users in a follow-up patch. I'll also try to
> remove the superfluous flush calls() along the way.
>
>
>
> 2015-05-02 7:41 GMT+03:00 Yaron Keren <yaron.keren at gmail.com>:
>
>> +update diff
>>
>>
>> 2015-05-02 7:38 GMT+03:00 Yaron Keren <yaron.keren at gmail.com>:
>>
>>> I outlined (is that the right
>>> word?) raw_char_ostream::grow, raw_char_ostream::write (both) into
>>> raw_ostream.cpp with less than 10% difference in performance.
>>>
>>> Profiling reveals that the real culprit is the code line
>>>
>>>  OS.reserve(OS.size() + 64);
>>>
>>> called from raw_svector_ostream::write_impl called from
>>> raw_svector_ostream::~raw_svector_ostream.
>>> The issue is already documented:
>>>
>>> raw_svector_ostream::~raw_svector_ostream() {
>>>   // FIXME: Prevent resizing during this flush().
>>>   flush();
>>> }
>>>
>>> And a solution is provided in raw_svector_ostream::init():
>>>
>>>   // Set up the initial external buffer. We make sure that the
buffer
>>> has at
>>>   // least 128 bytes free; raw_ostream itself only requires 64, but
we
>>> want to
>>>   // make sure that we don't grow the buffer unnecessarily on
>>> destruction (when
>>>   // the data is flushed). See the FIXME below.
>>>   OS.reserve(OS.size() + 128);
>>>
>>> This solution may be worse than the problem. In total:
>>>
>>> * If the SmallString was less than 128 bytes init() will always(!)
heap
>>> allocate.
>>> * If it's more or equal to 128 bytes but upon destruction less
than 64
>>> bytes are left unused it will heap allocate for no reason.
>>>
>>> A careful programmer who did size the SmallString to match its use
+
>>> small reserve will find either of these heap allocations
surprising,
>>> negating the SmallString advantage and then paying memcpy cost.
>>>
>>> I filed http://llvm.org/pr23395 about this. To temporarly avoid
this
>>> issue, in addition to the outline methods change I commented the
OS.reserve
>>> line in flush(). (This change of course requires further review.)
>>>
>>> With reserve being out of the way, the gap now is smaller but still
>>> significant: raw_char_ostream is about 20% faster than
>>> raw_svector_ostream in Release=optimized configuration.
>>>
>>>
>>> 2015-05-02 4:39 GMT+03:00 Sean Silva <chisophugis at
gmail.com>:
>>>
>>>>
>>>> Could you dig into why:
>>>> +  raw_ostream &write(unsigned char C) override {
>>>> +    grow(1);
>>>> +    *OutBufCur++ = C;
>>>> +    return *this;
>>>> +  }
>>>>
>>>> Is 3 times as fast as raw_svector_ostream? I don't see a
good reason
>>>> why that should be any faster than:
>>>>
>>>>   raw_ostream &operator<<(char C) {
>>>>     if (OutBufCur >= OutBufEnd)
>>>>       return write(C);
>>>>     *OutBufCur++ = C;
>>>>     return *this;
>>>>   }
>>>>
>>>>
>>>> You might just be seeing the difference between having write
inlined
>>>> vs. non-inlined, in which case your patch might be a
complicated way to
>>>> pull the likely cases of some raw_ostream methods into the
header.
>>>>
>>>>
>>>> -- Sean Silva
>>>>
>>>
>>> On Thu, Apr 30, 2015 at 8:02 PM, Yaron Keren <yaron.keren at
gmail.com>
>>> wrote:
>>>
>>>> Yes, we should do without virtual flush. The problem
>>>> was raw_char_ostream should not be flushed - it's always
current - so I
>>>> overloaded flush to a no-op. A cleaner solution is attached,
adding
>>>> a DirectBuffer mode to the raw_ostream.
>>>>
>>>> Also added a simple performance comparison project
>>>> between raw_char_ostream and raw_svector_ostream. On Window 7
x64
>>>> machine, raw_char_ostream was three times faster than
raw_svector_ostream
>>>> when the provided buffer size is large enough and two times as
fast with
>>>> one dynamic allocation resize.
>>>>
>>>>
>>>> 2015-04-30 22:48 GMT+03:00 Rafael Espíndola
<rafael.espindola at gmail.com
>>>> >:
>>>>
>>>>> I don't think we should make flush virtual. Why do you
need to do it?
>>>>> Can't you set up the base class to write to use the
tail of the memory
>>>>> region as the buffer?
>>>>>
>>>>>
>>>>>
>>>>> On 24 April 2015 at 06:46, Yaron Keren <yaron.keren at
gmail.com> wrote:
>>>>> > Hi,
>>>>> >
>>>>> > Is this what you're thinking about?
>>>>> > The code is not tested yet, I'd like to know if
the overall
>>>>> direction and
>>>>> > design are correct.
>>>>> >
>>>>> > Yaron
>>>>> >
>>>>> >
>>>>> > 2015-04-21 0:10 GMT+03:00 Yaron Keren <yaron.keren
at gmail.com>:
>>>>> >>
>>>>> >> Sean, thanks for reminding this, Alp did commit a
class derived from
>>>>> >> raw_svector_ostream conatining an internal
SmallString he called
>>>>> >> small_string_ostream. The commit was reverted
after a day due to a
>>>>> >> disagreement about the commit approval and
apparently abandoned.
>>>>> >>
>>>>> >>
>>>>> >>
>>>>>
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223393.html
>>>>> >>
>>>>> >>
>>>>>
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223557.html
>>>>> >>
>>>>> >>
>>>>>
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140630/223986.html
>>>>> >>
>>>>> >> The class Alp comitted did solve the possible
mismatch between the
>>>>> >> SmallString and the stream by making the
SmallString private to the
>>>>> class.
>>>>> >> This however did not treat the root problem, the
duplication of the
>>>>> >> information about the buffer between SmallString
and the stream.
>>>>> >>
>>>>> >> I can make performance measurements but even if
the performance
>>>>> difference
>>>>> >> is neglible (and looking at all the code paths and
conditionals of
>>>>> >> non-inlined functions at raw_ostream.cpp
that's hard to believe),
>>>>> the
>>>>> >> current SmallString-raw_svector_ostream is simply
wrong.
>>>>> >>
>>>>> >> Personally, after the previous "Alp"
discussion I found and fixed
>>>>> several
>>>>> >> such issues in my out-of-tree code only to make
new similar new
>>>>> error
>>>>> >> earlier this year, which I caught only months
after, when Rafael
>>>>> committed
>>>>> >> the pwrite which reminded me the issue. Due ot the
information
>>>>> duplication
>>>>> >> Rafael commit also had a bug and Mehdi reports
similar experience.
>>>>> Back then
>>>>> >> Alp reported similar problems he found in LLVM
code which were
>>>>> hopefully
>>>>> >> fixed otherwise.
>>>>> >>
>>>>> >> In addition to the information duplication, the
design is quite
>>>>> confusing
>>>>> >> to use
>>>>> >>
>>>>> >> - Should one use the stream.str() function or the
SmallString
>>>>> itself?
>>>>> >> - flush or str?
>>>>> >> - How do you properly clear the combination for
reuse (that was my
>>>>> >> mistake, I forgot to resync after OS.clear())?
>>>>> >>
>>>>> >> It's safe to say this design pattern is very
easy to get wrong one
>>>>> way or
>>>>> >> another, we got burned by it multiple times and it
should be
>>>>> replaced.
>>>>> >>
>>>>> >> Alp suggested a derived class containing its own
SmallString.
>>>>> That's a
>>>>> >> simple and effective approach to avoid the bugs
mentioned but does
>>>>> not fix
>>>>> >> the memory or runtime issues. Small as they may
be, why have them
>>>>> at a
>>>>> >> fundemental data structure?
>>>>> >>
>>>>> >> I was thinking about going further avoiding all
duplication with a
>>>>> >> templated class derived with its own internal
buffer storage.
>>>>> >>
>>>>> >> Rafael suggested a more modular approach, a
derived adpater class
>>>>> adapter
>>>>> >> to a *simple* buffer (or nullptr for fully-dynamic
operation) which
>>>>> also
>>>>> >> won't duplicate any information the buffer is
dumb and has no state.
>>>>> >>
>>>>> >> That solution sounds simple, efficient and safe to
use. The
>>>>> implementation
>>>>> >> would be actually simpler then raw_svector_ostream
since all the
>>>>> >> coordination logic is not required anymore.
>>>>> >>
>>>>> >>
>>>>> >> 2015-04-20 22:17 GMT+03:00 Sean Silva
<chisophugis at gmail.com>:
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> On Sun, Apr 19, 2015 at 7:40 AM, Yaron Keren
<
>>>>> yaron.keren at gmail.com>
>>>>> >>> wrote:
>>>>> >>>>
>>>>> >>>> A very common code pattern in LLVM is
>>>>> >>>>
>>>>> >>>>  SmallString<128> S;
>>>>> >>>>  raw_svector_ostream OS(S);
>>>>> >>>>  OS<< ...
>>>>> >>>>  Use OS.str()
>>>>> >>>>
>>>>> >>>> While raw_svector_ostream is smart to
share the text buffer
>>>>> itself, it's
>>>>> >>>> inefficient keeping two sets of pointers
to the same buffer:
>>>>> >>>>
>>>>> >>>>  In SmallString: void *BeginX, *EndX,
*CapacityX
>>>>> >>>>  In raw_ostream: char *OutBufStart,
*OutBufEnd, *OutBufCur
>>>>> >>>>
>>>>> >>>> Moreover, at runtime the two sets of
pointers need to be
>>>>> coordinated
>>>>> >>>> between the SmallString and
raw_svector_ostream using
>>>>> >>>> raw_svector_ostream::init,
raw_svector_ostream::pwrite,
>>>>> >>>> raw_svector_ostream::resync and
raw_svector_ostream::write_impl.
>>>>> >>>> All these functions have non-inlined
implementations in
>>>>> raw_ostream.cpp.
>>>>> >>>>
>>>>> >>>> Finally, this may cause subtle bugs if S
is modified without
>>>>> calling
>>>>> >>>> OS::resync(). This is too easy to do by
mistake.
>>>>> >>>>
>>>>> >>>> In this frequent case usage the client
does not really care about
>>>>> S
>>>>> >>>> being a SmallString with its many useful
string helper function.
>>>>> It's just
>>>>> >>>> boilerplate code for raw_svector_ostream.
But it does cost three
>>>>> extra
>>>>> >>>> pointers, some runtime performance and
possible bugs.
>>>>> >>>
>>>>> >>>
>>>>> >>> I agree the bugs are real (Alp proposed
something a while back
>>>>> regarding
>>>>> >>> this?), but you will need to provide
measurements to justify the
>>>>> cost in
>>>>> >>> runtime performance. One technique I have used
in the past to
>>>>> measure these
>>>>> >>> sorts of things I call "stuffing":
take the operation that you
>>>>> want to
>>>>> >>> measure, then essentially change the logic so
that you pay the
>>>>> cost 2 times,
>>>>> >>> 3 times, etc. You can then look at the trend
in performance as N
>>>>> varies and
>>>>> >>> extrapolate back to the case where N = 0 (i.e.
you don't pay the
>>>>> cost).
>>>>> >>>
>>>>> >>> For example, in one situation where I used
this method it was to
>>>>> measure
>>>>> >>> the cost of stat'ing files
(sys::fs::status) across a holistic
>>>>> build, using
>>>>> >>> only "time" on the command line (it
was on Windows and I didn't
>>>>> have any
>>>>> >>> tools like DTrace available that can directly
measure this). In
>>>>> order to do
>>>>> >>> this, I changed sys::fs::status to call stat N
times instead of 1,
>>>>> and
>>>>> >>> measured with N=1 N=2 N=3 etc. The result was
that the difference
>>>>> between
>>>>> >>> the N and N+1 versions was about 1-2% across
N=1..10 (or whatever I
>>>>> >>> measured). In order to negate caching and
other confounding
>>>>> effects, it is
>>>>> >>> important to try different distributions of
stats; e.g. the extra
>>>>> stats are
>>>>> >>> on the same file as the "real" stat
vs. the extra stats are on
>>>>> nonexistent
>>>>> >>> files in the same directory as the
"real" file vs. parent
>>>>> directories of the
>>>>> >>> "real" file; if these match up
fairly well (they did), then you
>>>>> have some
>>>>> >>> indication that the "stuffing" is
measuring what you want to
>>>>> measure.
>>>>> >>>
>>>>> >>> So e.g. if you think the cost of 3 extra
pointers is significant,
>>>>> then
>>>>> >>> "stuff" the struct with 3, 6, 9, ...
extra pointers and measure the
>>>>> >>> difference in performance (e.g. measure the
time building a real
>>>>> project).
>>>>> >>>
>>>>> >>> -- Sean Silva
>>>>> >>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> To solve all three issues, would it make
sense to have
>>>>> >>>> raw_ostream-derived container with a its
own SmallString like
>>>>> templated-size
>>>>> >>>> built-in buffer?
>>>>> >>>>
>>>>> >>>
>>>>> >
>>>>>
>>>>
>>>>
>>>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150522/ee08391a/attachment.html>
-------------- next part --------------
set(LLVM_LINK_COMPONENTS
  Support
)

add_llvm_tool(raw_svector_ostream_speed
  raw_svector_ostream_speed.cpp
)
-------------- next part --------------
Index: include/llvm/Support/raw_ostream.h
==================================================================---
include/llvm/Support/raw_ostream.h	(revision 237261)
+++ include/llvm/Support/raw_ostream.h	(working copy)
@@ -208,8 +208,8 @@
   /// satisfy std::isprint into an escape sequence.
   raw_ostream &write_escaped(StringRef Str, bool UseHexEscapes = false);
 
-  raw_ostream &write(unsigned char C);
-  raw_ostream &write(const char *Ptr, size_t Size);
+  virtual raw_ostream &write(unsigned char C);
+  virtual raw_ostream &write(const char *Ptr, size_t Size);
 
   // Formatted output, see the format() function in Support/Format.h.
   raw_ostream &operator<<(const format_object_base &Fmt);
@@ -483,40 +483,42 @@
 
 /// A raw_ostream that writes to an SmallVector or SmallString.  This is a
 /// simple adaptor class. This class does not encounter output errors.
+/// raw_svector_ostream operates without a buffer, delegating all memory
+/// management to the SmallString. Thus the SmallString is always up-to-date,
+/// may be used directly and there is no need to call flush().
 class raw_svector_ostream : public raw_pwrite_stream {
   SmallVectorImpl<char> &OS;
 
   /// See raw_ostream::write_impl.
-  void write_impl(const char *Ptr, size_t Size) override;
+  void write_impl(const char *Ptr, size_t Size) override{};
+  void pwrite_impl(const char *Ptr, size_t Size, uint64_t Offset) override {
+    memcpy(OS.data() + Offset, Ptr, Size);
+  }
 
-  void pwrite_impl(const char *Ptr, size_t Size, uint64_t Offset) override;
-
   /// Return the current position within the stream, not counting the bytes
   /// currently in the buffer.
-  uint64_t current_pos() const override;
+  uint64_t current_pos() const override { return OS.size(); }
 
-protected:
-  // Like the regular constructor, but doesn't call init.
-  explicit raw_svector_ostream(SmallVectorImpl<char> &O, unsigned);
-  void init();
-
 public:
   /// Construct a new raw_svector_ostream.
   ///
-  /// \param O The vector to write to; this should generally have at least 128
-  /// bytes free to avoid any extraneous memory overhead.
-  explicit raw_svector_ostream(SmallVectorImpl<char> &O);
-  ~raw_svector_ostream() override;
+  explicit raw_svector_ostream(SmallVectorImpl<char> &O) : OS(O) {}
+  ~raw_svector_ostream() override {}
 
+  raw_ostream &write(unsigned char C) override {
+    OS.push_back(C);
+    return *this;
+  }
+  raw_ostream &write(const char *Ptr, size_t Size) override {
+    OS.append(Ptr, Ptr + Size);
+    return *this;
+  }
 
-  /// This is called when the SmallVector we're appending to is changed
outside
-  /// of the raw_svector_ostream's control.  It is only safe to do this if
the
-  /// raw_svector_ostream has previously been flushed.
-  void resync();
+  // FIXME: remove resync() and users.
+  void resync() {}
 
-  /// Flushes the stream contents to the target vector and return a StringRef
-  /// for the vector contents.
-  StringRef str();
+  /// Return a StringRef for the vector contents.
+  StringRef str() { return StringRef(OS.data(), OS.size()); }
 };
 
 /// A raw_ostream that discards all output.
@@ -539,9 +541,7 @@
   SmallVector<char, 0> Buffer;
 
 public:
-  buffer_ostream(raw_ostream &OS) : raw_svector_ostream(Buffer, 0), OS(OS)
{
-    init();
-  }
+  buffer_ostream(raw_ostream &OS) : raw_svector_ostream(Buffer), OS(OS) {}
   ~buffer_ostream() { OS << str(); }
 };
 
Index: lib/Support/raw_ostream.cpp
==================================================================---
lib/Support/raw_ostream.cpp	(revision 237261)
+++ lib/Support/raw_ostream.cpp	(working copy)
@@ -752,78 +752,6 @@
 }
 
 //===----------------------------------------------------------------------===//
-//  raw_svector_ostream
-//===----------------------------------------------------------------------===//
-
-// The raw_svector_ostream implementation uses the SmallVector itself as the
-// buffer for the raw_ostream. We guarantee that the raw_ostream buffer is
-// always pointing past the end of the vector, but within the vector
-// capacity. This allows raw_ostream to write directly into the correct place,
-// and we only need to set the vector size when the data is flushed.
-
-raw_svector_ostream::raw_svector_ostream(SmallVectorImpl<char> &O,
unsigned)
-    : OS(O) {}
-
-raw_svector_ostream::raw_svector_ostream(SmallVectorImpl<char> &O) :
OS(O) {
-  init();
-}
-
-void raw_svector_ostream::init() {
-  // Set up the initial external buffer. We make sure that the buffer has at
-  // least 128 bytes free; raw_ostream itself only requires 64, but we want to
-  // make sure that we don't grow the buffer unnecessarily on destruction
(when
-  // the data is flushed). See the FIXME below.
-  OS.reserve(OS.size() + 128);
-  SetBuffer(OS.end(), OS.capacity() - OS.size());
-}
-
-raw_svector_ostream::~raw_svector_ostream() {
-  // FIXME: Prevent resizing during this flush().
-  flush();
-}
-
-void raw_svector_ostream::pwrite_impl(const char *Ptr, size_t Size,
-                                      uint64_t Offset) {
-  flush();
-  memcpy(OS.begin() + Offset, Ptr, Size);
-}
-
-/// resync - This is called when the SmallVector we're appending to is
changed
-/// outside of the raw_svector_ostream's control.  It is only safe to do
this
-/// if the raw_svector_ostream has previously been flushed.
-void raw_svector_ostream::resync() {
-  assert(GetNumBytesInBuffer() == 0 && "Didn't flush before
mutating vector");
-
-  if (OS.capacity() - OS.size() < 64)
-    OS.reserve(OS.capacity() * 2);
-  SetBuffer(OS.end(), OS.capacity() - OS.size());
-}
-
-void raw_svector_ostream::write_impl(const char *Ptr, size_t Size) {
-  if (Ptr == OS.end()) {
-    // Grow the buffer to include the scratch area without copying.
-    size_t NewSize = OS.size() + Size;
-    assert(NewSize <= OS.capacity() && "Invalid write_impl()
call!");
-    OS.set_size(NewSize);
-  } else {
-    assert(!GetNumBytesInBuffer());
-    OS.append(Ptr, Ptr + Size);
-  }
-
-  OS.reserve(OS.size() + 64);
-  SetBuffer(OS.end(), OS.capacity() - OS.size());
-}
-
-uint64_t raw_svector_ostream::current_pos() const {
-   return OS.size();
-}
-
-StringRef raw_svector_ostream::str() {
-  flush();
-  return StringRef(OS.begin(), OS.size());
-}
-
-//===----------------------------------------------------------------------===//
 //  raw_null_ostream
 //===----------------------------------------------------------------------===//
 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: raw_svector_ostream_speed.cpp
Type: text/x-c++src
Size: 768 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150522/ee08391a/attachment.cpp>

Rafael Espíndola

2015-Jul-11 17:48 UTC

head link

[LLVMdev] SmallString + raw_svector_ostream combination should be more efficient

This makes write virtual. What is the impact of that on the other streamers?



On 22 May 2015 at 10:17, Yaron Keren <yaron.keren at gmail.com>
wrote:> Here's a performance testcase for the raw_svector_ostream patch. On my
> WIndows x64 machine it runs in 1010ms with the current code and in 440ms
> with the patch applies. Is this OK to commit?
>
>
> 2015-05-02 21:31 GMT+03:00 Yaron Keren <yaron.keren at gmail.com>:
>>
>> Following a hint from Duncan in http://llvm.org/pr23395, here is a
revised
>> patch. Rather then introduce the raw_char_ostream adapter, this version
>> improves raw_svector_ostream.
>>
>> raw_svector_ostream is now a very lightweight adapter, running without
a
>> buffer and fowarding almost anything to the SmallString. This solves
both
>> the performance issue and the information duplication issue. The
SmallString
>> is always up-to-date and can be used at any time. flush() does nothing
and
>> is not required. resync() was kept in this patch as a no-op and will be
>> removed with its users in a follow-up patch. I'll also try to
remove the
>> superfluous flush calls() along the way.
>>
>>
>>
>> 2015-05-02 7:41 GMT+03:00 Yaron Keren <yaron.keren at gmail.com>:
>>>
>>> +update diff
>>>
>>>
>>> 2015-05-02 7:38 GMT+03:00 Yaron Keren <yaron.keren at
gmail.com>:
>>>>
>>>> I outlined (is that the right word?) raw_char_ostream::grow,
>>>> raw_char_ostream::write (both) into raw_ostream.cpp with less
than 10%
>>>> difference in performance.
>>>>
>>>> Profiling reveals that the real culprit is the code line
>>>>
>>>>  OS.reserve(OS.size() + 64);
>>>>
>>>> called from raw_svector_ostream::write_impl called from
>>>> raw_svector_ostream::~raw_svector_ostream.
>>>> The issue is already documented:
>>>>
>>>> raw_svector_ostream::~raw_svector_ostream() {
>>>>   // FIXME: Prevent resizing during this flush().
>>>>   flush();
>>>> }
>>>>
>>>> And a solution is provided in raw_svector_ostream::init():
>>>>
>>>>   // Set up the initial external buffer. We make sure that the
buffer
>>>> has at
>>>>   // least 128 bytes free; raw_ostream itself only requires 64,
but we
>>>> want to
>>>>   // make sure that we don't grow the buffer unnecessarily
on
>>>> destruction (when
>>>>   // the data is flushed). See the FIXME below.
>>>>   OS.reserve(OS.size() + 128);
>>>>
>>>> This solution may be worse than the problem. In total:
>>>>
>>>> * If the SmallString was less than 128 bytes init() will
always(!) heap
>>>> allocate.
>>>> * If it's more or equal to 128 bytes but upon destruction
less than 64
>>>> bytes are left unused it will heap allocate for no reason.
>>>>
>>>> A careful programmer who did size the SmallString to match its
use +
>>>> small reserve will find either of these heap allocations
surprising,
>>>> negating the SmallString advantage and then paying memcpy cost.
>>>>
>>>> I filed http://llvm.org/pr23395 about this. To temporarly avoid
this
>>>> issue, in addition to the outline methods change I commented
the OS.reserve
>>>> line in flush(). (This change of course requires further
review.)
>>>>
>>>> With reserve being out of the way, the gap now is smaller but
still
>>>> significant: raw_char_ostream is about 20% faster than
raw_svector_ostream
>>>> in Release=optimized configuration.
>>>>
>>>>
>>>> 2015-05-02 4:39 GMT+03:00 Sean Silva <chisophugis at
gmail.com>:
>>>>>
>>>>>
>>>>> Could you dig into why:
>>>>> +  raw_ostream &write(unsigned char C) override {
>>>>> +    grow(1);
>>>>> +    *OutBufCur++ = C;
>>>>> +    return *this;
>>>>> +  }
>>>>>
>>>>> Is 3 times as fast as raw_svector_ostream? I don't see
a good reason
>>>>> why that should be any faster than:
>>>>>
>>>>>   raw_ostream &operator<<(char C) {
>>>>>     if (OutBufCur >= OutBufEnd)
>>>>>       return write(C);
>>>>>     *OutBufCur++ = C;
>>>>>     return *this;
>>>>>   }
>>>>>
>>>>>
>>>>> You might just be seeing the difference between having
write inlined
>>>>> vs. non-inlined, in which case your patch might be a
complicated way to pull
>>>>> the likely cases of some raw_ostream methods into the
header.
>>>>>
>>>>>
>>>>> -- Sean Silva
>>>>
>>>>
>>>> On Thu, Apr 30, 2015 at 8:02 PM, Yaron Keren <yaron.keren at
gmail.com>
>>>> wrote:
>>>>>
>>>>> Yes, we should do without virtual flush. The problem was
>>>>> raw_char_ostream should not be flushed - it's always
current - so I
>>>>> overloaded flush to a no-op. A cleaner solution is
attached, adding a
>>>>> DirectBuffer mode to the raw_ostream.
>>>>>
>>>>> Also added a simple performance comparison project between
>>>>> raw_char_ostream and raw_svector_ostream. On Window 7 x64
machine,
>>>>> raw_char_ostream was three times faster than
raw_svector_ostream when the
>>>>> provided buffer size is large enough and two times as fast
with one dynamic
>>>>> allocation resize.
>>>>>
>>>>>
>>>>> 2015-04-30 22:48 GMT+03:00 Rafael Espíndola
>>>>> <rafael.espindola at gmail.com>:
>>>>>>
>>>>>> I don't think we should make flush virtual. Why do
you need to do it?
>>>>>> Can't you set up the base class to write to use the
tail of the memory
>>>>>> region as the buffer?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 24 April 2015 at 06:46, Yaron Keren <yaron.keren
at gmail.com> wrote:
>>>>>> > Hi,
>>>>>> >
>>>>>> > Is this what you're thinking about?
>>>>>> > The code is not tested yet, I'd like to know
if the overall
>>>>>> > direction and
>>>>>> > design are correct.
>>>>>> >
>>>>>> > Yaron
>>>>>> >
>>>>>> >
>>>>>> > 2015-04-21 0:10 GMT+03:00 Yaron Keren
<yaron.keren at gmail.com>:
>>>>>> >>
>>>>>> >> Sean, thanks for reminding this, Alp did
commit a class derived
>>>>>> >> from
>>>>>> >> raw_svector_ostream conatining an internal
SmallString he called
>>>>>> >> small_string_ostream. The commit was reverted
after a day due to a
>>>>>> >> disagreement about the commit approval and
apparently abandoned.
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223393.html
>>>>>> >>
>>>>>> >>
>>>>>> >>
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223557.html
>>>>>> >>
>>>>>> >>
>>>>>> >>
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140630/223986.html
>>>>>> >>
>>>>>> >> The class Alp comitted did solve the possible
mismatch between the
>>>>>> >> SmallString and the stream by making the
SmallString private to the
>>>>>> >> class.
>>>>>> >> This however did not treat the root problem,
the duplication of the
>>>>>> >> information about the buffer between
SmallString and the stream.
>>>>>> >>
>>>>>> >> I can make performance measurements but even
if the performance
>>>>>> >> difference
>>>>>> >> is neglible (and looking at all the code paths
and conditionals of
>>>>>> >> non-inlined functions at raw_ostream.cpp
that's hard to believe),
>>>>>> >> the
>>>>>> >> current SmallString-raw_svector_ostream is
simply wrong.
>>>>>> >>
>>>>>> >> Personally, after the previous "Alp"
discussion I found and fixed
>>>>>> >> several
>>>>>> >> such issues in my out-of-tree code only to
make new similar new
>>>>>> >> error
>>>>>> >> earlier this year, which I caught only months
after, when Rafael
>>>>>> >> committed
>>>>>> >> the pwrite which reminded me the issue. Due ot
the information
>>>>>> >> duplication
>>>>>> >> Rafael commit also had a bug and Mehdi reports
similar experience.
>>>>>> >> Back then
>>>>>> >> Alp reported similar problems he found in LLVM
code which were
>>>>>> >> hopefully
>>>>>> >> fixed otherwise.
>>>>>> >>
>>>>>> >> In addition to the information duplication,
the design is quite
>>>>>> >> confusing
>>>>>> >> to use
>>>>>> >>
>>>>>> >> - Should one use the stream.str() function or
the SmallString
>>>>>> >> itself?
>>>>>> >> - flush or str?
>>>>>> >> - How do you properly clear the combination
for reuse (that was my
>>>>>> >> mistake, I forgot to resync after OS.clear())?
>>>>>> >>
>>>>>> >> It's safe to say this design pattern is
very easy to get wrong one
>>>>>> >> way or
>>>>>> >> another, we got burned by it multiple times
and it should be
>>>>>> >> replaced.
>>>>>> >>
>>>>>> >> Alp suggested a derived class containing its
own SmallString.
>>>>>> >> That's a
>>>>>> >> simple and effective approach to avoid the
bugs mentioned but does
>>>>>> >> not fix
>>>>>> >> the memory or runtime issues. Small as they
may be, why have them
>>>>>> >> at a
>>>>>> >> fundemental data structure?
>>>>>> >>
>>>>>> >> I was thinking about going further avoiding
all duplication with a
>>>>>> >> templated class derived with its own internal
buffer storage.
>>>>>> >>
>>>>>> >> Rafael suggested a more modular approach, a
derived adpater class
>>>>>> >> adapter
>>>>>> >> to a *simple* buffer (or nullptr for
fully-dynamic operation) which
>>>>>> >> also
>>>>>> >> won't duplicate any information the buffer
is dumb and has no
>>>>>> >> state.
>>>>>> >>
>>>>>> >> That solution sounds simple, efficient and
safe to use. The
>>>>>> >> implementation
>>>>>> >> would be actually simpler then
raw_svector_ostream since all the
>>>>>> >> coordination logic is not required anymore.
>>>>>> >>
>>>>>> >>
>>>>>> >> 2015-04-20 22:17 GMT+03:00 Sean Silva
<chisophugis at gmail.com>:
>>>>>> >>>
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> On Sun, Apr 19, 2015 at 7:40 AM, Yaron
Keren
>>>>>> >>> <yaron.keren at gmail.com>
>>>>>> >>> wrote:
>>>>>> >>>>
>>>>>> >>>> A very common code pattern in LLVM is
>>>>>> >>>>
>>>>>> >>>>  SmallString<128> S;
>>>>>> >>>>  raw_svector_ostream OS(S);
>>>>>> >>>>  OS<< ...
>>>>>> >>>>  Use OS.str()
>>>>>> >>>>
>>>>>> >>>> While raw_svector_ostream is smart to
share the text buffer
>>>>>> >>>> itself, it's
>>>>>> >>>> inefficient keeping two sets of
pointers to the same buffer:
>>>>>> >>>>
>>>>>> >>>>  In SmallString: void *BeginX, *EndX,
*CapacityX
>>>>>> >>>>  In raw_ostream: char *OutBufStart,
*OutBufEnd, *OutBufCur
>>>>>> >>>>
>>>>>> >>>> Moreover, at runtime the two sets of
pointers need to be
>>>>>> >>>> coordinated
>>>>>> >>>> between the SmallString and
raw_svector_ostream using
>>>>>> >>>> raw_svector_ostream::init,
raw_svector_ostream::pwrite,
>>>>>> >>>> raw_svector_ostream::resync and
raw_svector_ostream::write_impl.
>>>>>> >>>> All these functions have non-inlined
implementations in
>>>>>> >>>> raw_ostream.cpp.
>>>>>> >>>>
>>>>>> >>>> Finally, this may cause subtle bugs if
S is modified without
>>>>>> >>>> calling
>>>>>> >>>> OS::resync(). This is too easy to do
by mistake.
>>>>>> >>>>
>>>>>> >>>> In this frequent case usage the client
does not really care about
>>>>>> >>>> S
>>>>>> >>>> being a SmallString with its many
useful string helper function.
>>>>>> >>>> It's just
>>>>>> >>>> boilerplate code for
raw_svector_ostream. But it does cost three
>>>>>> >>>> extra
>>>>>> >>>> pointers, some runtime performance and
possible bugs.
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> I agree the bugs are real (Alp proposed
something a while back
>>>>>> >>> regarding
>>>>>> >>> this?), but you will need to provide
measurements to justify the
>>>>>> >>> cost in
>>>>>> >>> runtime performance. One technique I have
used in the past to
>>>>>> >>> measure these
>>>>>> >>> sorts of things I call
"stuffing": take the operation that you
>>>>>> >>> want to
>>>>>> >>> measure, then essentially change the logic
so that you pay the
>>>>>> >>> cost 2 times,
>>>>>> >>> 3 times, etc. You can then look at the
trend in performance as N
>>>>>> >>> varies and
>>>>>> >>> extrapolate back to the case where N = 0
(i.e. you don't pay the
>>>>>> >>> cost).
>>>>>> >>>
>>>>>> >>> For example, in one situation where I used
this method it was to
>>>>>> >>> measure
>>>>>> >>> the cost of stat'ing files
(sys::fs::status) across a holistic
>>>>>> >>> build, using
>>>>>> >>> only "time" on the command line
(it was on Windows and I didn't
>>>>>> >>> have any
>>>>>> >>> tools like DTrace available that can
directly measure this). In
>>>>>> >>> order to do
>>>>>> >>> this, I changed sys::fs::status to call
stat N times instead of 1,
>>>>>> >>> and
>>>>>> >>> measured with N=1 N=2 N=3 etc. The result
was that the difference
>>>>>> >>> between
>>>>>> >>> the N and N+1 versions was about 1-2%
across N=1..10 (or whatever
>>>>>> >>> I
>>>>>> >>> measured). In order to negate caching and
other confounding
>>>>>> >>> effects, it is
>>>>>> >>> important to try different distributions
of stats; e.g. the extra
>>>>>> >>> stats are
>>>>>> >>> on the same file as the "real"
stat vs. the extra stats are on
>>>>>> >>> nonexistent
>>>>>> >>> files in the same directory as the
"real" file vs. parent
>>>>>> >>> directories of the
>>>>>> >>> "real" file; if these match up
fairly well (they did), then you
>>>>>> >>> have some
>>>>>> >>> indication that the "stuffing"
is measuring what you want to
>>>>>> >>> measure.
>>>>>> >>>
>>>>>> >>> So e.g. if you think the cost of 3 extra
pointers is significant,
>>>>>> >>> then
>>>>>> >>> "stuff" the struct with 3, 6, 9,
... extra pointers and measure
>>>>>> >>> the
>>>>>> >>> difference in performance (e.g. measure
the time building a real
>>>>>> >>> project).
>>>>>> >>>
>>>>>> >>> -- Sean Silva
>>>>>> >>>
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>> To solve all three issues, would it
make sense to have
>>>>>> >>>> raw_ostream-derived container with a
its own SmallString like
>>>>>> >>>> templated-size
>>>>>> >>>> built-in buffer?
>>>>>> >>>>
>>>>>> >>>
>>>>>> >
>>>>>
>>>>>
>>>>
>

Yaron Keren via llvm-dev

2015-Aug-12 05:53 UTC

head link

[llvm-dev] SmallString + raw_svector_ostream combination should be more efficient

+llvm-dev at lists.llvm.org

The impact should be small as all the other streamers usually write
directly to the memory buffer and only when out of buffer they call write().

OTOH, raw_svector_ostream (without a buffer) goes though write for every
character or block it writes. It can work without virtual write() by
overriding the existing virtual write_impl() but this is a slower code path
for raw_svector_ostream.

Even though the attached raw_svector_ostream patch without virtual write()
is still significantly faster then the current raw_svector_ostream since it
avoids the double buffering. With this patch, on my system the example runs
at 750ms compared with about 1000ms for the current raw_svector_ostream.

Would you prefer to continue review on Phabricator?

2015-07-11 20:48 GMT+03:00 Rafael Espíndola <rafael.espindola at
gmail.com>:
> This makes write virtual. What is the impact of that on the other
> streamers?
>
>
>
> On 22 May 2015 at 10:17, Yaron Keren <yaron.keren at gmail.com>
wrote:
> > Here's a performance testcase for the raw_svector_ostream patch.
On my
> > WIndows x64 machine it runs in 1010ms with the current code and in
440ms
> > with the patch applies. Is this OK to commit?
> >
> >
> > 2015-05-02 21:31 GMT+03:00 Yaron Keren <yaron.keren at
gmail.com>:
> >>
> >> Following a hint from Duncan in http://llvm.org/pr23395, here is a
> revised
> >> patch. Rather then introduce the raw_char_ostream adapter, this
version
> >> improves raw_svector_ostream.
> >>
> >> raw_svector_ostream is now a very lightweight adapter, running
without a
> >> buffer and fowarding almost anything to the SmallString. This
solves
> both
> >> the performance issue and the information duplication issue. The
> SmallString
> >> is always up-to-date and can be used at any time. flush() does
nothing
> and
> >> is not required. resync() was kept in this patch as a no-op and
will be
> >> removed with its users in a follow-up patch. I'll also try to
remove the
> >> superfluous flush calls() along the way.
> >>
> >>
> >>
> >> 2015-05-02 7:41 GMT+03:00 Yaron Keren <yaron.keren at
gmail.com>:
> >>>
> >>> +update diff
> >>>
> >>>
> >>> 2015-05-02 7:38 GMT+03:00 Yaron Keren <yaron.keren at
gmail.com>:
> >>>>
> >>>> I outlined (is that the right word?)
raw_char_ostream::grow,
> >>>> raw_char_ostream::write (both) into raw_ostream.cpp with
less than 10%
> >>>> difference in performance.
> >>>>
> >>>> Profiling reveals that the real culprit is the code line
> >>>>
> >>>>  OS.reserve(OS.size() + 64);
> >>>>
> >>>> called from raw_svector_ostream::write_impl called from
> >>>> raw_svector_ostream::~raw_svector_ostream.
> >>>> The issue is already documented:
> >>>>
> >>>> raw_svector_ostream::~raw_svector_ostream() {
> >>>>   // FIXME: Prevent resizing during this flush().
> >>>>   flush();
> >>>> }
> >>>>
> >>>> And a solution is provided in raw_svector_ostream::init():
> >>>>
> >>>>   // Set up the initial external buffer. We make sure that
the buffer
> >>>> has at
> >>>>   // least 128 bytes free; raw_ostream itself only
requires 64, but we
> >>>> want to
> >>>>   // make sure that we don't grow the buffer
unnecessarily on
> >>>> destruction (when
> >>>>   // the data is flushed). See the FIXME below.
> >>>>   OS.reserve(OS.size() + 128);
> >>>>
> >>>> This solution may be worse than the problem. In total:
> >>>>
> >>>> * If the SmallString was less than 128 bytes init() will
always(!)
> heap
> >>>> allocate.
> >>>> * If it's more or equal to 128 bytes but upon
destruction less than 64
> >>>> bytes are left unused it will heap allocate for no reason.
> >>>>
> >>>> A careful programmer who did size the SmallString to match
its use +
> >>>> small reserve will find either of these heap allocations
surprising,
> >>>> negating the SmallString advantage and then paying memcpy
cost.
> >>>>
> >>>> I filed http://llvm.org/pr23395 about this. To temporarly
avoid this
> >>>> issue, in addition to the outline methods change I
commented the
> OS.reserve
> >>>> line in flush(). (This change of course requires further
review.)
> >>>>
> >>>> With reserve being out of the way, the gap now is smaller
but still
> >>>> significant: raw_char_ostream is about 20% faster than
> raw_svector_ostream
> >>>> in Release=optimized configuration.
> >>>>
> >>>>
> >>>> 2015-05-02 4:39 GMT+03:00 Sean Silva <chisophugis at
gmail.com>:
> >>>>>
> >>>>>
> >>>>> Could you dig into why:
> >>>>> +  raw_ostream &write(unsigned char C) override {
> >>>>> +    grow(1);
> >>>>> +    *OutBufCur++ = C;
> >>>>> +    return *this;
> >>>>> +  }
> >>>>>
> >>>>> Is 3 times as fast as raw_svector_ostream? I don't
see a good reason
> >>>>> why that should be any faster than:
> >>>>>
> >>>>>   raw_ostream &operator<<(char C) {
> >>>>>     if (OutBufCur >= OutBufEnd)
> >>>>>       return write(C);
> >>>>>     *OutBufCur++ = C;
> >>>>>     return *this;
> >>>>>   }
> >>>>>
> >>>>>
> >>>>> You might just be seeing the difference between having
write inlined
> >>>>> vs. non-inlined, in which case your patch might be a
complicated way
> to pull
> >>>>> the likely cases of some raw_ostream methods into the
header.
> >>>>>
> >>>>>
> >>>>> -- Sean Silva
> >>>>
> >>>>
> >>>> On Thu, Apr 30, 2015 at 8:02 PM, Yaron Keren
<yaron.keren at gmail.com>
> >>>> wrote:
> >>>>>
> >>>>> Yes, we should do without virtual flush. The problem
was
> >>>>> raw_char_ostream should not be flushed - it's
always current - so I
> >>>>> overloaded flush to a no-op. A cleaner solution is
attached, adding a
> >>>>> DirectBuffer mode to the raw_ostream.
> >>>>>
> >>>>> Also added a simple performance comparison project
between
> >>>>> raw_char_ostream and raw_svector_ostream. On Window 7
x64 machine,
> >>>>> raw_char_ostream was three times faster than
raw_svector_ostream
> when the
> >>>>> provided buffer size is large enough and two times as
fast with one
> dynamic
> >>>>> allocation resize.
> >>>>>
> >>>>>
> >>>>> 2015-04-30 22:48 GMT+03:00 Rafael Espíndola
> >>>>> <rafael.espindola at gmail.com>:
> >>>>>>
> >>>>>> I don't think we should make flush virtual.
Why do you need to do
> it?
> >>>>>> Can't you set up the base class to write to
use the tail of the
> memory
> >>>>>> region as the buffer?
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 24 April 2015 at 06:46, Yaron Keren
<yaron.keren at gmail.com>
> wrote:
> >>>>>> > Hi,
> >>>>>> >
> >>>>>> > Is this what you're thinking about?
> >>>>>> > The code is not tested yet, I'd like to
know if the overall
> >>>>>> > direction and
> >>>>>> > design are correct.
> >>>>>> >
> >>>>>> > Yaron
> >>>>>> >
> >>>>>> >
> >>>>>> > 2015-04-21 0:10 GMT+03:00 Yaron Keren
<yaron.keren at gmail.com>:
> >>>>>> >>
> >>>>>> >> Sean, thanks for reminding this, Alp did
commit a class derived
> >>>>>> >> from
> >>>>>> >> raw_svector_ostream conatining an
internal SmallString he called
> >>>>>> >> small_string_ostream. The commit was
reverted after a day due to
> a
> >>>>>> >> disagreement about the commit approval
and apparently abandoned.
> >>>>>> >>
> >>>>>> >>
> >>>>>> >>
> >>>>>> >>
>
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223393.html
> >>>>>> >>
> >>>>>> >>
> >>>>>> >>
>
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223557.html
> >>>>>> >>
> >>>>>> >>
> >>>>>> >>
>
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140630/223986.html
> >>>>>> >>
> >>>>>> >> The class Alp comitted did solve the
possible mismatch between
> the
> >>>>>> >> SmallString and the stream by making the
SmallString private to
> the
> >>>>>> >> class.
> >>>>>> >> This however did not treat the root
problem, the duplication of
> the
> >>>>>> >> information about the buffer between
SmallString and the stream.
> >>>>>> >>
> >>>>>> >> I can make performance measurements but
even if the performance
> >>>>>> >> difference
> >>>>>> >> is neglible (and looking at all the code
paths and conditionals
> of
> >>>>>> >> non-inlined functions at raw_ostream.cpp
that's hard to believe),
> >>>>>> >> the
> >>>>>> >> current SmallString-raw_svector_ostream
is simply wrong.
> >>>>>> >>
> >>>>>> >> Personally, after the previous
"Alp" discussion I found and fixed
> >>>>>> >> several
> >>>>>> >> such issues in my out-of-tree code only
to make new similar new
> >>>>>> >> error
> >>>>>> >> earlier this year, which I caught only
months after, when Rafael
> >>>>>> >> committed
> >>>>>> >> the pwrite which reminded me the issue.
Due ot the information
> >>>>>> >> duplication
> >>>>>> >> Rafael commit also had a bug and Mehdi
reports similar
> experience.
> >>>>>> >> Back then
> >>>>>> >> Alp reported similar problems he found in
LLVM code which were
> >>>>>> >> hopefully
> >>>>>> >> fixed otherwise.
> >>>>>> >>
> >>>>>> >> In addition to the information
duplication, the design is quite
> >>>>>> >> confusing
> >>>>>> >> to use
> >>>>>> >>
> >>>>>> >> - Should one use the stream.str()
function or the SmallString
> >>>>>> >> itself?
> >>>>>> >> - flush or str?
> >>>>>> >> - How do you properly clear the
combination for reuse (that was
> my
> >>>>>> >> mistake, I forgot to resync after
OS.clear())?
> >>>>>> >>
> >>>>>> >> It's safe to say this design pattern
is very easy to get wrong
> one
> >>>>>> >> way or
> >>>>>> >> another, we got burned by it multiple
times and it should be
> >>>>>> >> replaced.
> >>>>>> >>
> >>>>>> >> Alp suggested a derived class containing
its own SmallString.
> >>>>>> >> That's a
> >>>>>> >> simple and effective approach to avoid
the bugs mentioned but
> does
> >>>>>> >> not fix
> >>>>>> >> the memory or runtime issues. Small as
they may be, why have them
> >>>>>> >> at a
> >>>>>> >> fundemental data structure?
> >>>>>> >>
> >>>>>> >> I was thinking about going further
avoiding all duplication with
> a
> >>>>>> >> templated class derived with its own
internal buffer storage.
> >>>>>> >>
> >>>>>> >> Rafael suggested a more modular approach,
a derived adpater class
> >>>>>> >> adapter
> >>>>>> >> to a *simple* buffer (or nullptr for
fully-dynamic operation)
> which
> >>>>>> >> also
> >>>>>> >> won't duplicate any information the
buffer is dumb and has no
> >>>>>> >> state.
> >>>>>> >>
> >>>>>> >> That solution sounds simple, efficient
and safe to use. The
> >>>>>> >> implementation
> >>>>>> >> would be actually simpler then
raw_svector_ostream since all the
> >>>>>> >> coordination logic is not required
anymore.
> >>>>>> >>
> >>>>>> >>
> >>>>>> >> 2015-04-20 22:17 GMT+03:00 Sean Silva
<chisophugis at gmail.com>:
> >>>>>> >>>
> >>>>>> >>>
> >>>>>> >>>
> >>>>>> >>> On Sun, Apr 19, 2015 at 7:40 AM,
Yaron Keren
> >>>>>> >>> <yaron.keren at gmail.com>
> >>>>>> >>> wrote:
> >>>>>> >>>>
> >>>>>> >>>> A very common code pattern in
LLVM is
> >>>>>> >>>>
> >>>>>> >>>>  SmallString<128> S;
> >>>>>> >>>>  raw_svector_ostream OS(S);
> >>>>>> >>>>  OS<< ...
> >>>>>> >>>>  Use OS.str()
> >>>>>> >>>>
> >>>>>> >>>> While raw_svector_ostream is
smart to share the text buffer
> >>>>>> >>>> itself, it's
> >>>>>> >>>> inefficient keeping two sets of
pointers to the same buffer:
> >>>>>> >>>>
> >>>>>> >>>>  In SmallString: void *BeginX,
*EndX, *CapacityX
> >>>>>> >>>>  In raw_ostream: char
*OutBufStart, *OutBufEnd, *OutBufCur
> >>>>>> >>>>
> >>>>>> >>>> Moreover, at runtime the two sets
of pointers need to be
> >>>>>> >>>> coordinated
> >>>>>> >>>> between the SmallString and
raw_svector_ostream using
> >>>>>> >>>> raw_svector_ostream::init,
raw_svector_ostream::pwrite,
> >>>>>> >>>> raw_svector_ostream::resync and
> raw_svector_ostream::write_impl.
> >>>>>> >>>> All these functions have
non-inlined implementations in
> >>>>>> >>>> raw_ostream.cpp.
> >>>>>> >>>>
> >>>>>> >>>> Finally, this may cause subtle
bugs if S is modified without
> >>>>>> >>>> calling
> >>>>>> >>>> OS::resync(). This is too easy to
do by mistake.
> >>>>>> >>>>
> >>>>>> >>>> In this frequent case usage the
client does not really care
> about
> >>>>>> >>>> S
> >>>>>> >>>> being a SmallString with its many
useful string helper
> function.
> >>>>>> >>>> It's just
> >>>>>> >>>> boilerplate code for
raw_svector_ostream. But it does cost
> three
> >>>>>> >>>> extra
> >>>>>> >>>> pointers, some runtime
performance and possible bugs.
> >>>>>> >>>
> >>>>>> >>>
> >>>>>> >>> I agree the bugs are real (Alp
proposed something a while back
> >>>>>> >>> regarding
> >>>>>> >>> this?), but you will need to provide
measurements to justify the
> >>>>>> >>> cost in
> >>>>>> >>> runtime performance. One technique I
have used in the past to
> >>>>>> >>> measure these
> >>>>>> >>> sorts of things I call
"stuffing": take the operation that you
> >>>>>> >>> want to
> >>>>>> >>> measure, then essentially change the
logic so that you pay the
> >>>>>> >>> cost 2 times,
> >>>>>> >>> 3 times, etc. You can then look at
the trend in performance as N
> >>>>>> >>> varies and
> >>>>>> >>> extrapolate back to the case where N
= 0 (i.e. you don't pay the
> >>>>>> >>> cost).
> >>>>>> >>>
> >>>>>> >>> For example, in one situation where I
used this method it was to
> >>>>>> >>> measure
> >>>>>> >>> the cost of stat'ing files
(sys::fs::status) across a holistic
> >>>>>> >>> build, using
> >>>>>> >>> only "time" on the command
line (it was on Windows and I didn't
> >>>>>> >>> have any
> >>>>>> >>> tools like DTrace available that can
directly measure this). In
> >>>>>> >>> order to do
> >>>>>> >>> this, I changed sys::fs::status to
call stat N times instead of
> 1,
> >>>>>> >>> and
> >>>>>> >>> measured with N=1 N=2 N=3 etc. The
result was that the
> difference
> >>>>>> >>> between
> >>>>>> >>> the N and N+1 versions was about 1-2%
across N=1..10 (or
> whatever
> >>>>>> >>> I
> >>>>>> >>> measured). In order to negate caching
and other confounding
> >>>>>> >>> effects, it is
> >>>>>> >>> important to try different
distributions of stats; e.g. the
> extra
> >>>>>> >>> stats are
> >>>>>> >>> on the same file as the
"real" stat vs. the extra stats are on
> >>>>>> >>> nonexistent
> >>>>>> >>> files in the same directory as the
"real" file vs. parent
> >>>>>> >>> directories of the
> >>>>>> >>> "real" file; if these match
up fairly well (they did), then you
> >>>>>> >>> have some
> >>>>>> >>> indication that the
"stuffing" is measuring what you want to
> >>>>>> >>> measure.
> >>>>>> >>>
> >>>>>> >>> So e.g. if you think the cost of 3
extra pointers is
> significant,
> >>>>>> >>> then
> >>>>>> >>> "stuff" the struct with 3,
6, 9, ... extra pointers and measure
> >>>>>> >>> the
> >>>>>> >>> difference in performance (e.g.
measure the time building a real
> >>>>>> >>> project).
> >>>>>> >>>
> >>>>>> >>> -- Sean Silva
> >>>>>> >>>
> >>>>>> >>>>
> >>>>>> >>>>
> >>>>>> >>>> To solve all three issues, would
it make sense to have
> >>>>>> >>>> raw_ostream-derived container
with a its own SmallString like
> >>>>>> >>>> templated-size
> >>>>>> >>>> built-in buffer?
> >>>>>> >>>>
> >>>>>> >>>
> >>>>>> >
> >>>>>
> >>>>>
> >>>>
> >
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150812/15243f99/attachment-0001.html>

Possibly Parallel Threads

Search for more seemingly similar threads

llvm dev - Aug 2015 - SmallString + raw_svector_ostream combination should be more efficient

[LLVMdev] SmallString + raw_svector_ostream combination should be more efficient

[LLVMdev] SmallString + raw_svector_ostream combination should be more efficient

[llvm-dev] SmallString + raw_svector_ostream combination should be more efficient

Possibly Parallel Threads