thr3ads.net - llvm dev - [LLVMdev] SmallString + raw_svector_ostream combination should be more efficient [May 2015]

If this information is useful, please help other people find it:
Share via:

Sean Silva

2015-May-02 01:39 UTC

[LLVMdev] SmallString + raw_svector_ostream combination should be more efficient

Could you dig into why:
+  raw_ostream &write(unsigned char C) override {
+    grow(1);
+    *OutBufCur++ = C;
+    return *this;
+  }

Is 3 times as fast as raw_svector_ostream? I don't see a good reason why
that should be any faster than:

  raw_ostream &operator<<(char C) {
    if (OutBufCur >= OutBufEnd)
      return write(C);
    *OutBufCur++ = C;
    return *this;
  }


You might just be seeing the difference between having write inlined vs.
non-inlined, in which case your patch might be a complicated way to pull
the likely cases of some raw_ostream methods into the header.


-- Sean Silva

On Thu, Apr 30, 2015 at 8:02 PM, Yaron Keren <yaron.keren at gmail.com>
wrote:
> Yes, we should do without virtual flush. The problem was raw_char_ostream
> should not be flushed - it's always current - so I overloaded flush to
a
> no-op. A cleaner solution is attached, adding a DirectBuffer mode to the
> raw_ostream.
>
> Also added a simple performance comparison project
> between raw_char_ostream and raw_svector_ostream. On Window 7 x64
> machine, raw_char_ostream was three times faster than raw_svector_ostream
> when the provided buffer size is large enough and two times as fast with
> one dynamic allocation resize.
>
>
> 2015-04-30 22:48 GMT+03:00 Rafael Espíndola <rafael.espindola at
gmail.com>:
>
>> I don't think we should make flush virtual. Why do you need to do
it?
>> Can't you set up the base class to write to use the tail of the
memory
>> region as the buffer?
>>
>>
>>
>> On 24 April 2015 at 06:46, Yaron Keren <yaron.keren at gmail.com>
wrote:
>> > Hi,
>> >
>> > Is this what you're thinking about?
>> > The code is not tested yet, I'd like to know if the overall
direction
>> and
>> > design are correct.
>> >
>> > Yaron
>> >
>> >
>> > 2015-04-21 0:10 GMT+03:00 Yaron Keren <yaron.keren at
gmail.com>:
>> >>
>> >> Sean, thanks for reminding this, Alp did commit a class
derived from
>> >> raw_svector_ostream conatining an internal SmallString he
called
>> >> small_string_ostream. The commit was reverted after a day due
to a
>> >> disagreement about the commit approval and apparently
abandoned.
>> >>
>> >>
>> >>
>>
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223393.html
>> >>
>> >>
>>
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223557.html
>> >>
>> >>
>>
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140630/223986.html
>> >>
>> >> The class Alp comitted did solve the possible mismatch between
the
>> >> SmallString and the stream by making the SmallString private
to the
>> class.
>> >> This however did not treat the root problem, the duplication
of the
>> >> information about the buffer between SmallString and the
stream.
>> >>
>> >> I can make performance measurements but even if the
performance
>> difference
>> >> is neglible (and looking at all the code paths and
conditionals of
>> >> non-inlined functions at raw_ostream.cpp that's hard to
believe), the
>> >> current SmallString-raw_svector_ostream is simply wrong.
>> >>
>> >> Personally, after the previous "Alp" discussion I
found and fixed
>> several
>> >> such issues in my out-of-tree code only to make new similar
new error
>> >> earlier this year, which I caught only months after, when
Rafael
>> committed
>> >> the pwrite which reminded me the issue. Due ot the information
>> duplication
>> >> Rafael commit also had a bug and Mehdi reports similar
experience.
>> Back then
>> >> Alp reported similar problems he found in LLVM code which were
>> hopefully
>> >> fixed otherwise.
>> >>
>> >> In addition to the information duplication, the design is
quite
>> confusing
>> >> to use
>> >>
>> >> - Should one use the stream.str() function or the SmallString
itself?
>> >> - flush or str?
>> >> - How do you properly clear the combination for reuse (that
was my
>> >> mistake, I forgot to resync after OS.clear())?
>> >>
>> >> It's safe to say this design pattern is very easy to get
wrong one way
>> or
>> >> another, we got burned by it multiple times and it should be
replaced.
>> >>
>> >> Alp suggested a derived class containing its own SmallString.
That's a
>> >> simple and effective approach to avoid the bugs mentioned but
does not
>> fix
>> >> the memory or runtime issues. Small as they may be, why have
them at a
>> >> fundemental data structure?
>> >>
>> >> I was thinking about going further avoiding all duplication
with a
>> >> templated class derived with its own internal buffer storage.
>> >>
>> >> Rafael suggested a more modular approach, a derived adpater
class
>> adapter
>> >> to a *simple* buffer (or nullptr for fully-dynamic operation)
which
>> also
>> >> won't duplicate any information the buffer is dumb and has
no state.
>> >>
>> >> That solution sounds simple, efficient and safe to use. The
>> implementation
>> >> would be actually simpler then raw_svector_ostream since all
the
>> >> coordination logic is not required anymore.
>> >>
>> >>
>> >> 2015-04-20 22:17 GMT+03:00 Sean Silva <chisophugis at
gmail.com>:
>> >>>
>> >>>
>> >>>
>> >>> On Sun, Apr 19, 2015 at 7:40 AM, Yaron Keren
<yaron.keren at gmail.com>
>> >>> wrote:
>> >>>>
>> >>>> A very common code pattern in LLVM is
>> >>>>
>> >>>>  SmallString<128> S;
>> >>>>  raw_svector_ostream OS(S);
>> >>>>  OS<< ...
>> >>>>  Use OS.str()
>> >>>>
>> >>>> While raw_svector_ostream is smart to share the text
buffer itself,
>> it's
>> >>>> inefficient keeping two sets of pointers to the same
buffer:
>> >>>>
>> >>>>  In SmallString: void *BeginX, *EndX, *CapacityX
>> >>>>  In raw_ostream: char *OutBufStart, *OutBufEnd,
*OutBufCur
>> >>>>
>> >>>> Moreover, at runtime the two sets of pointers need to
be coordinated
>> >>>> between the SmallString and raw_svector_ostream using
>> >>>> raw_svector_ostream::init,
raw_svector_ostream::pwrite,
>> >>>> raw_svector_ostream::resync and
raw_svector_ostream::write_impl.
>> >>>> All these functions have non-inlined implementations
in
>> raw_ostream.cpp.
>> >>>>
>> >>>> Finally, this may cause subtle bugs if S is modified
without calling
>> >>>> OS::resync(). This is too easy to do by mistake.
>> >>>>
>> >>>> In this frequent case usage the client does not really
care about S
>> >>>> being a SmallString with its many useful string helper
function.
>> It's just
>> >>>> boilerplate code for raw_svector_ostream. But it does
cost three
>> extra
>> >>>> pointers, some runtime performance and possible bugs.
>> >>>
>> >>>
>> >>> I agree the bugs are real (Alp proposed something a while
back
>> regarding
>> >>> this?), but you will need to provide measurements to
justify the cost
>> in
>> >>> runtime performance. One technique I have used in the past
to measure
>> these
>> >>> sorts of things I call "stuffing": take the
operation that you want to
>> >>> measure, then essentially change the logic so that you pay
the cost 2
>> times,
>> >>> 3 times, etc. You can then look at the trend in
performance as N
>> varies and
>> >>> extrapolate back to the case where N = 0 (i.e. you
don't pay the
>> cost).
>> >>>
>> >>> For example, in one situation where I used this method it
was to
>> measure
>> >>> the cost of stat'ing files (sys::fs::status) across a
holistic build,
>> using
>> >>> only "time" on the command line (it was on
Windows and I didn't have
>> any
>> >>> tools like DTrace available that can directly measure
this). In order
>> to do
>> >>> this, I changed sys::fs::status to call stat N times
instead of 1, and
>> >>> measured with N=1 N=2 N=3 etc. The result was that the
difference
>> between
>> >>> the N and N+1 versions was about 1-2% across N=1..10 (or
whatever I
>> >>> measured). In order to negate caching and other
confounding effects,
>> it is
>> >>> important to try different distributions of stats; e.g.
the extra
>> stats are
>> >>> on the same file as the "real" stat vs. the
extra stats are on
>> nonexistent
>> >>> files in the same directory as the "real" file
vs. parent directories
>> of the
>> >>> "real" file; if these match up fairly well (they
did), then you have
>> some
>> >>> indication that the "stuffing" is measuring what
you want to measure.
>> >>>
>> >>> So e.g. if you think the cost of 3 extra pointers is
significant, then
>> >>> "stuff" the struct with 3, 6, 9, ... extra
pointers and measure the
>> >>> difference in performance (e.g. measure the time building
a real
>> project).
>> >>>
>> >>> -- Sean Silva
>> >>>
>> >>>>
>> >>>>
>> >>>> To solve all three issues, would it make sense to have
>> >>>> raw_ostream-derived container with a its own
SmallString like
>> templated-size
>> >>>> built-in buffer?
>> >>>>
>> >>>
>> >
>>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150501/d6e980d6/attachment.html>

Yaron Keren

2015-May-02 04:38 UTC

head link

[LLVMdev] SmallString + raw_svector_ostream combination should be more efficient

I outlined (is that the right
word?) raw_char_ostream::grow, raw_char_ostream::write (both) into
raw_ostream.cpp with less than 10% difference in performance.

Profiling reveals that the real culprit is the code line

 OS.reserve(OS.size() + 64);

called from raw_svector_ostream::write_impl called from
raw_svector_ostream::~raw_svector_ostream.
The issue is already documented:

raw_svector_ostream::~raw_svector_ostream() {
  // FIXME: Prevent resizing during this flush().
  flush();
}

And a solution is provided in raw_svector_ostream::init():

  // Set up the initial external buffer. We make sure that the buffer has at
  // least 128 bytes free; raw_ostream itself only requires 64, but we want
to
  // make sure that we don't grow the buffer unnecessarily on destruction
(when
  // the data is flushed). See the FIXME below.
  OS.reserve(OS.size() + 128);

This solution may be worse than the problem. In total:

* If the SmallString was less than 128 bytes init() will always(!) heap
allocate.
* If it's more or equal to 128 bytes but upon destruction less than 64
bytes are left unused it will heap allocate for no reason.

A careful programmer who did size the SmallString to match its use + small
reserve will find either of these heap allocations surprising, negating the
SmallString advantage and then paying memcpy cost.

I filed http://llvm.org/pr23395 about this. To temporarly avoid this issue,
in addition to the outline methods change I commented the OS.reserve line
in flush(). (This change of course requires further review.)

With reserve being out of the way, the gap now is smaller but still
significant: raw_char_ostream is about 20% faster than raw_svector_ostream in
Release=optimized configuration.


2015-05-02 4:39 GMT+03:00 Sean Silva <chisophugis at gmail.com>:
>
> Could you dig into why:
> +  raw_ostream &write(unsigned char C) override {
> +    grow(1);
> +    *OutBufCur++ = C;
> +    return *this;
> +  }
>
> Is 3 times as fast as raw_svector_ostream? I don't see a good reason
why
> that should be any faster than:
>
>   raw_ostream &operator<<(char C) {
>     if (OutBufCur >= OutBufEnd)
>       return write(C);
>     *OutBufCur++ = C;
>     return *this;
>   }
>
>
> You might just be seeing the difference between having write inlined vs.
> non-inlined, in which case your patch might be a complicated way to pull
> the likely cases of some raw_ostream methods into the header.
>
>
> -- Sean Silva
>
On Thu, Apr 30, 2015 at 8:02 PM, Yaron Keren <yaron.keren at gmail.com>
wrote:
> Yes, we should do without virtual flush. The problem was raw_char_ostream
> should not be flushed - it's always current - so I overloaded flush to
a
> no-op. A cleaner solution is attached, adding a DirectBuffer mode to the
> raw_ostream.
>
> Also added a simple performance comparison project
> between raw_char_ostream and raw_svector_ostream. On Window 7 x64
> machine, raw_char_ostream was three times faster than raw_svector_ostream
> when the provided buffer size is large enough and two times as fast with
> one dynamic allocation resize.
>
>
> 2015-04-30 22:48 GMT+03:00 Rafael Espíndola <rafael.espindola at
gmail.com>:
>
>> I don't think we should make flush virtual. Why do you need to do
it?
>> Can't you set up the base class to write to use the tail of the
memory
>> region as the buffer?
>>
>>
>>
>> On 24 April 2015 at 06:46, Yaron Keren <yaron.keren at gmail.com>
wrote:
>> > Hi,
>> >
>> > Is this what you're thinking about?
>> > The code is not tested yet, I'd like to know if the overall
direction
>> and
>> > design are correct.
>> >
>> > Yaron
>> >
>> >
>> > 2015-04-21 0:10 GMT+03:00 Yaron Keren <yaron.keren at
gmail.com>:
>> >>
>> >> Sean, thanks for reminding this, Alp did commit a class
derived from
>> >> raw_svector_ostream conatining an internal SmallString he
called
>> >> small_string_ostream. The commit was reverted after a day due
to a
>> >> disagreement about the commit approval and apparently
abandoned.
>> >>
>> >>
>> >>
>>
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223393.html
>> >>
>> >>
>>
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223557.html
>> >>
>> >>
>>
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140630/223986.html
>> >>
>> >> The class Alp comitted did solve the possible mismatch between
the
>> >> SmallString and the stream by making the SmallString private
to the
>> class.
>> >> This however did not treat the root problem, the duplication
of the
>> >> information about the buffer between SmallString and the
stream.
>> >>
>> >> I can make performance measurements but even if the
performance
>> difference
>> >> is neglible (and looking at all the code paths and
conditionals of
>> >> non-inlined functions at raw_ostream.cpp that's hard to
believe), the
>> >> current SmallString-raw_svector_ostream is simply wrong.
>> >>
>> >> Personally, after the previous "Alp" discussion I
found and fixed
>> several
>> >> such issues in my out-of-tree code only to make new similar
new error
>> >> earlier this year, which I caught only months after, when
Rafael
>> committed
>> >> the pwrite which reminded me the issue. Due ot the information
>> duplication
>> >> Rafael commit also had a bug and Mehdi reports similar
experience.
>> Back then
>> >> Alp reported similar problems he found in LLVM code which were
>> hopefully
>> >> fixed otherwise.
>> >>
>> >> In addition to the information duplication, the design is
quite
>> confusing
>> >> to use
>> >>
>> >> - Should one use the stream.str() function or the SmallString
itself?
>> >> - flush or str?
>> >> - How do you properly clear the combination for reuse (that
was my
>> >> mistake, I forgot to resync after OS.clear())?
>> >>
>> >> It's safe to say this design pattern is very easy to get
wrong one way
>> or
>> >> another, we got burned by it multiple times and it should be
replaced.
>> >>
>> >> Alp suggested a derived class containing its own SmallString.
That's a
>> >> simple and effective approach to avoid the bugs mentioned but
does not
>> fix
>> >> the memory or runtime issues. Small as they may be, why have
them at a
>> >> fundemental data structure?
>> >>
>> >> I was thinking about going further avoiding all duplication
with a
>> >> templated class derived with its own internal buffer storage.
>> >>
>> >> Rafael suggested a more modular approach, a derived adpater
class
>> adapter
>> >> to a *simple* buffer (or nullptr for fully-dynamic operation)
which
>> also
>> >> won't duplicate any information the buffer is dumb and has
no state.
>> >>
>> >> That solution sounds simple, efficient and safe to use. The
>> implementation
>> >> would be actually simpler then raw_svector_ostream since all
the
>> >> coordination logic is not required anymore.
>> >>
>> >>
>> >> 2015-04-20 22:17 GMT+03:00 Sean Silva <chisophugis at
gmail.com>:
>> >>>
>> >>>
>> >>>
>> >>> On Sun, Apr 19, 2015 at 7:40 AM, Yaron Keren
<yaron.keren at gmail.com>
>> >>> wrote:
>> >>>>
>> >>>> A very common code pattern in LLVM is
>> >>>>
>> >>>>  SmallString<128> S;
>> >>>>  raw_svector_ostream OS(S);
>> >>>>  OS<< ...
>> >>>>  Use OS.str()
>> >>>>
>> >>>> While raw_svector_ostream is smart to share the text
buffer itself,
>> it's
>> >>>> inefficient keeping two sets of pointers to the same
buffer:
>> >>>>
>> >>>>  In SmallString: void *BeginX, *EndX, *CapacityX
>> >>>>  In raw_ostream: char *OutBufStart, *OutBufEnd,
*OutBufCur
>> >>>>
>> >>>> Moreover, at runtime the two sets of pointers need to
be coordinated
>> >>>> between the SmallString and raw_svector_ostream using
>> >>>> raw_svector_ostream::init,
raw_svector_ostream::pwrite,
>> >>>> raw_svector_ostream::resync and
raw_svector_ostream::write_impl.
>> >>>> All these functions have non-inlined implementations
in
>> raw_ostream.cpp.
>> >>>>
>> >>>> Finally, this may cause subtle bugs if S is modified
without calling
>> >>>> OS::resync(). This is too easy to do by mistake.
>> >>>>
>> >>>> In this frequent case usage the client does not really
care about S
>> >>>> being a SmallString with its many useful string helper
function.
>> It's just
>> >>>> boilerplate code for raw_svector_ostream. But it does
cost three
>> extra
>> >>>> pointers, some runtime performance and possible bugs.
>> >>>
>> >>>
>> >>> I agree the bugs are real (Alp proposed something a while
back
>> regarding
>> >>> this?), but you will need to provide measurements to
justify the cost
>> in
>> >>> runtime performance. One technique I have used in the past
to measure
>> these
>> >>> sorts of things I call "stuffing": take the
operation that you want to
>> >>> measure, then essentially change the logic so that you pay
the cost 2
>> times,
>> >>> 3 times, etc. You can then look at the trend in
performance as N
>> varies and
>> >>> extrapolate back to the case where N = 0 (i.e. you
don't pay the
>> cost).
>> >>>
>> >>> For example, in one situation where I used this method it
was to
>> measure
>> >>> the cost of stat'ing files (sys::fs::status) across a
holistic build,
>> using
>> >>> only "time" on the command line (it was on
Windows and I didn't have
>> any
>> >>> tools like DTrace available that can directly measure
this). In order
>> to do
>> >>> this, I changed sys::fs::status to call stat N times
instead of 1, and
>> >>> measured with N=1 N=2 N=3 etc. The result was that the
difference
>> between
>> >>> the N and N+1 versions was about 1-2% across N=1..10 (or
whatever I
>> >>> measured). In order to negate caching and other
confounding effects,
>> it is
>> >>> important to try different distributions of stats; e.g.
the extra
>> stats are
>> >>> on the same file as the "real" stat vs. the
extra stats are on
>> nonexistent
>> >>> files in the same directory as the "real" file
vs. parent directories
>> of the
>> >>> "real" file; if these match up fairly well (they
did), then you have
>> some
>> >>> indication that the "stuffing" is measuring what
you want to measure.
>> >>>
>> >>> So e.g. if you think the cost of 3 extra pointers is
significant, then
>> >>> "stuff" the struct with 3, 6, 9, ... extra
pointers and measure the
>> >>> difference in performance (e.g. measure the time building
a real
>> project).
>> >>>
>> >>> -- Sean Silva
>> >>>
>> >>>>
>> >>>>
>> >>>> To solve all three issues, would it make sense to have
>> >>>> raw_ostream-derived container with a its own
SmallString like
>> templated-size
>> >>>> built-in buffer?
>> >>>>
>> >>>
>> >
>>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150502/0e33cc0c/attachment.html>

Yaron Keren

2015-May-02 04:41 UTC

head link

[LLVMdev] SmallString + raw_svector_ostream combination should be more efficient

+update diff


2015-05-02 7:38 GMT+03:00 Yaron Keren <yaron.keren at gmail.com>:
> I outlined (is that the right
> word?) raw_char_ostream::grow, raw_char_ostream::write (both) into
> raw_ostream.cpp with less than 10% difference in performance.
>
> Profiling reveals that the real culprit is the code line
>
>  OS.reserve(OS.size() + 64);
>
> called from raw_svector_ostream::write_impl called from
> raw_svector_ostream::~raw_svector_ostream.
> The issue is already documented:
>
> raw_svector_ostream::~raw_svector_ostream() {
>   // FIXME: Prevent resizing during this flush().
>   flush();
> }
>
> And a solution is provided in raw_svector_ostream::init():
>
>   // Set up the initial external buffer. We make sure that the buffer has
> at
>   // least 128 bytes free; raw_ostream itself only requires 64, but we
> want to
>   // make sure that we don't grow the buffer unnecessarily on
destruction
> (when
>   // the data is flushed). See the FIXME below.
>   OS.reserve(OS.size() + 128);
>
> This solution may be worse than the problem. In total:
>
> * If the SmallString was less than 128 bytes init() will always(!) heap
> allocate.
> * If it's more or equal to 128 bytes but upon destruction less than 64
> bytes are left unused it will heap allocate for no reason.
>
> A careful programmer who did size the SmallString to match its use + small
> reserve will find either of these heap allocations surprising, negating the
> SmallString advantage and then paying memcpy cost.
>
> I filed http://llvm.org/pr23395 about this. To temporarly avoid this
> issue, in addition to the outline methods change I commented the OS.reserve
> line in flush(). (This change of course requires further review.)
>
> With reserve being out of the way, the gap now is smaller but still
> significant: raw_char_ostream is about 20% faster than
> raw_svector_ostream in Release=optimized configuration.
>
>
> 2015-05-02 4:39 GMT+03:00 Sean Silva <chisophugis at gmail.com>:
>
>>
>> Could you dig into why:
>> +  raw_ostream &write(unsigned char C) override {
>> +    grow(1);
>> +    *OutBufCur++ = C;
>> +    return *this;
>> +  }
>>
>> Is 3 times as fast as raw_svector_ostream? I don't see a good
reason why
>> that should be any faster than:
>>
>>   raw_ostream &operator<<(char C) {
>>     if (OutBufCur >= OutBufEnd)
>>       return write(C);
>>     *OutBufCur++ = C;
>>     return *this;
>>   }
>>
>>
>> You might just be seeing the difference between having write inlined
vs.
>> non-inlined, in which case your patch might be a complicated way to
pull
>> the likely cases of some raw_ostream methods into the header.
>>
>>
>> -- Sean Silva
>>
>
> On Thu, Apr 30, 2015 at 8:02 PM, Yaron Keren <yaron.keren at
gmail.com>
> wrote:
>
>> Yes, we should do without virtual flush. The problem was
raw_char_ostream
>> should not be flushed - it's always current - so I overloaded flush
to a
>> no-op. A cleaner solution is attached, adding a DirectBuffer mode to
the
>> raw_ostream.
>>
>> Also added a simple performance comparison project
>> between raw_char_ostream and raw_svector_ostream. On Window 7 x64
>> machine, raw_char_ostream was three times faster than
raw_svector_ostream
>> when the provided buffer size is large enough and two times as fast
with
>> one dynamic allocation resize.
>>
>>
>> 2015-04-30 22:48 GMT+03:00 Rafael Espíndola <rafael.espindola at
gmail.com>:
>>
>>> I don't think we should make flush virtual. Why do you need to
do it?
>>> Can't you set up the base class to write to use the tail of the
memory
>>> region as the buffer?
>>>
>>>
>>>
>>> On 24 April 2015 at 06:46, Yaron Keren <yaron.keren at
gmail.com> wrote:
>>> > Hi,
>>> >
>>> > Is this what you're thinking about?
>>> > The code is not tested yet, I'd like to know if the
overall direction
>>> and
>>> > design are correct.
>>> >
>>> > Yaron
>>> >
>>> >
>>> > 2015-04-21 0:10 GMT+03:00 Yaron Keren <yaron.keren at
gmail.com>:
>>> >>
>>> >> Sean, thanks for reminding this, Alp did commit a class
derived from
>>> >> raw_svector_ostream conatining an internal SmallString he
called
>>> >> small_string_ostream. The commit was reverted after a day
due to a
>>> >> disagreement about the commit approval and apparently
abandoned.
>>> >>
>>> >>
>>> >>
>>>
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223393.html
>>> >>
>>> >>
>>>
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223557.html
>>> >>
>>> >>
>>>
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140630/223986.html
>>> >>
>>> >> The class Alp comitted did solve the possible mismatch
between the
>>> >> SmallString and the stream by making the SmallString
private to the
>>> class.
>>> >> This however did not treat the root problem, the
duplication of the
>>> >> information about the buffer between SmallString and the
stream.
>>> >>
>>> >> I can make performance measurements but even if the
performance
>>> difference
>>> >> is neglible (and looking at all the code paths and
conditionals of
>>> >> non-inlined functions at raw_ostream.cpp that's hard
to believe), the
>>> >> current SmallString-raw_svector_ostream is simply wrong.
>>> >>
>>> >> Personally, after the previous "Alp" discussion
I found and fixed
>>> several
>>> >> such issues in my out-of-tree code only to make new
similar new error
>>> >> earlier this year, which I caught only months after, when
Rafael
>>> committed
>>> >> the pwrite which reminded me the issue. Due ot the
information
>>> duplication
>>> >> Rafael commit also had a bug and Mehdi reports similar
experience.
>>> Back then
>>> >> Alp reported similar problems he found in LLVM code which
were
>>> hopefully
>>> >> fixed otherwise.
>>> >>
>>> >> In addition to the information duplication, the design is
quite
>>> confusing
>>> >> to use
>>> >>
>>> >> - Should one use the stream.str() function or the
SmallString itself?
>>> >> - flush or str?
>>> >> - How do you properly clear the combination for reuse
(that was my
>>> >> mistake, I forgot to resync after OS.clear())?
>>> >>
>>> >> It's safe to say this design pattern is very easy to
get wrong one
>>> way or
>>> >> another, we got burned by it multiple times and it should
be replaced.
>>> >>
>>> >> Alp suggested a derived class containing its own
SmallString. That's a
>>> >> simple and effective approach to avoid the bugs mentioned
but does
>>> not fix
>>> >> the memory or runtime issues. Small as they may be, why
have them at a
>>> >> fundemental data structure?
>>> >>
>>> >> I was thinking about going further avoiding all
duplication with a
>>> >> templated class derived with its own internal buffer
storage.
>>> >>
>>> >> Rafael suggested a more modular approach, a derived
adpater class
>>> adapter
>>> >> to a *simple* buffer (or nullptr for fully-dynamic
operation) which
>>> also
>>> >> won't duplicate any information the buffer is dumb and
has no state.
>>> >>
>>> >> That solution sounds simple, efficient and safe to use.
The
>>> implementation
>>> >> would be actually simpler then raw_svector_ostream since
all the
>>> >> coordination logic is not required anymore.
>>> >>
>>> >>
>>> >> 2015-04-20 22:17 GMT+03:00 Sean Silva <chisophugis at
gmail.com>:
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Sun, Apr 19, 2015 at 7:40 AM, Yaron Keren
<yaron.keren at gmail.com>
>>> >>> wrote:
>>> >>>>
>>> >>>> A very common code pattern in LLVM is
>>> >>>>
>>> >>>>  SmallString<128> S;
>>> >>>>  raw_svector_ostream OS(S);
>>> >>>>  OS<< ...
>>> >>>>  Use OS.str()
>>> >>>>
>>> >>>> While raw_svector_ostream is smart to share the
text buffer itself,
>>> it's
>>> >>>> inefficient keeping two sets of pointers to the
same buffer:
>>> >>>>
>>> >>>>  In SmallString: void *BeginX, *EndX, *CapacityX
>>> >>>>  In raw_ostream: char *OutBufStart, *OutBufEnd,
*OutBufCur
>>> >>>>
>>> >>>> Moreover, at runtime the two sets of pointers need
to be coordinated
>>> >>>> between the SmallString and raw_svector_ostream
using
>>> >>>> raw_svector_ostream::init,
raw_svector_ostream::pwrite,
>>> >>>> raw_svector_ostream::resync and
raw_svector_ostream::write_impl.
>>> >>>> All these functions have non-inlined
implementations in
>>> raw_ostream.cpp.
>>> >>>>
>>> >>>> Finally, this may cause subtle bugs if S is
modified without calling
>>> >>>> OS::resync(). This is too easy to do by mistake.
>>> >>>>
>>> >>>> In this frequent case usage the client does not
really care about S
>>> >>>> being a SmallString with its many useful string
helper function.
>>> It's just
>>> >>>> boilerplate code for raw_svector_ostream. But it
does cost three
>>> extra
>>> >>>> pointers, some runtime performance and possible
bugs.
>>> >>>
>>> >>>
>>> >>> I agree the bugs are real (Alp proposed something a
while back
>>> regarding
>>> >>> this?), but you will need to provide measurements to
justify the
>>> cost in
>>> >>> runtime performance. One technique I have used in the
past to
>>> measure these
>>> >>> sorts of things I call "stuffing": take the
operation that you want
>>> to
>>> >>> measure, then essentially change the logic so that you
pay the cost
>>> 2 times,
>>> >>> 3 times, etc. You can then look at the trend in
performance as N
>>> varies and
>>> >>> extrapolate back to the case where N = 0 (i.e. you
don't pay the
>>> cost).
>>> >>>
>>> >>> For example, in one situation where I used this method
it was to
>>> measure
>>> >>> the cost of stat'ing files (sys::fs::status)
across a holistic
>>> build, using
>>> >>> only "time" on the command line (it was on
Windows and I didn't have
>>> any
>>> >>> tools like DTrace available that can directly measure
this). In
>>> order to do
>>> >>> this, I changed sys::fs::status to call stat N times
instead of 1,
>>> and
>>> >>> measured with N=1 N=2 N=3 etc. The result was that the
difference
>>> between
>>> >>> the N and N+1 versions was about 1-2% across N=1..10
(or whatever I
>>> >>> measured). In order to negate caching and other
confounding effects,
>>> it is
>>> >>> important to try different distributions of stats;
e.g. the extra
>>> stats are
>>> >>> on the same file as the "real" stat vs. the
extra stats are on
>>> nonexistent
>>> >>> files in the same directory as the "real"
file vs. parent
>>> directories of the
>>> >>> "real" file; if these match up fairly well
(they did), then you have
>>> some
>>> >>> indication that the "stuffing" is measuring
what you want to measure.
>>> >>>
>>> >>> So e.g. if you think the cost of 3 extra pointers is
significant,
>>> then
>>> >>> "stuff" the struct with 3, 6, 9, ... extra
pointers and measure the
>>> >>> difference in performance (e.g. measure the time
building a real
>>> project).
>>> >>>
>>> >>> -- Sean Silva
>>> >>>
>>> >>>>
>>> >>>>
>>> >>>> To solve all three issues, would it make sense to
have
>>> >>>> raw_ostream-derived container with a its own
SmallString like
>>> templated-size
>>> >>>> built-in buffer?
>>> >>>>
>>> >>>
>>> >
>>>
>>
>>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150502/cf76e5ee/attachment.html>
-------------- next part --------------
Index: include/llvm/Support/raw_ostream.h
==================================================================---
include/llvm/Support/raw_ostream.h	(revision 236304)
+++ include/llvm/Support/raw_ostream.h	(working copy)
@@ -52,17 +52,19 @@
   ///               OutBufEnd - OutBufStart >= 1).
   ///
   /// If buffered, then the raw_ostream owns the buffer if (BufferMode =-  ///
InternalBuffer); otherwise the buffer has been set via SetBuffer and is
-  /// managed by the subclass.
+  /// InternalBuffer or BufferMode == DirectBuffer); otherwise the buffer has
+  /// been set via SetBuffer and is managed by the subclass.
   ///
   /// If a subclass installs an external buffer using SetBuffer then it can
wait
   /// for a \see write_impl() call to handle the data which has been put into
   /// this buffer.
+protected:
   char *OutBufStart, *OutBufEnd, *OutBufCur;
 
   enum BufferKind {
     Unbuffered = 0,
     InternalBuffer,
+    DirectBuffer,
     ExternalBuffer
   } BufferMode;
 
@@ -132,7 +134,7 @@
  
//===--------------------------------------------------------------------===//
 
   void flush() {
-    if (OutBufCur != OutBufStart)
+    if (BufferMode != DirectBuffer && OutBufCur != OutBufStart)
       flush_nonempty();
   }
 
@@ -208,8 +210,8 @@
   /// satisfy std::isprint into an escape sequence.
   raw_ostream &write_escaped(StringRef Str, bool UseHexEscapes = false);
 
-  raw_ostream &write(unsigned char C);
-  raw_ostream &write(const char *Ptr, size_t Size);
+  virtual raw_ostream &write(unsigned char C);
+  virtual raw_ostream &write(const char *Ptr, size_t Size);
 
   // Formatted output, see the format() function in Support/Format.h.
   raw_ostream &operator<<(const format_object_base &Fmt);
@@ -297,13 +299,13 @@
   /// unbuffered.
   const char *getBufferStart() const { return OutBufStart; }
 
+  /// Install the given buffer and mode.
+  void SetBufferAndMode(char *BufferStart, size_t Size, BufferKind Mode);
+
  
//===--------------------------------------------------------------------===//
   // Private Interface
  
//===--------------------------------------------------------------------===//
 private:
-  /// Install the given buffer and mode.
-  void SetBufferAndMode(char *BufferStart, size_t Size, BufferKind Mode);
-
   /// Flush the current buffer, which is known to be non-empty. This outputs
the
   /// currently buffered data and resets the buffer to empty.
   void flush_nonempty();
@@ -481,6 +483,33 @@
   }
 };
 
+/// A raw_ostream that writes to a char array. This is a simple adaptor class.
+/// This class does not encounter output errors.
+class raw_char_ostream : public raw_pwrite_stream {
+  void write_impl(const char *Ptr, size_t Size) override{};
+  void pwrite_impl(const char *Ptr, size_t Size, uint64_t Offset) override {
+    memcpy(OutBufStart + Offset, Ptr, Size);
+  }
+  uint64_t current_pos() const override { return 0; }
+  void grow(size_t Size);
+
+public:
+  explicit raw_char_ostream(char *Buffer, unsigned BufferSize) {
+    if (Buffer) {
+      SetBufferAndMode(Buffer, BufferSize, ExternalBuffer);
+    } else {
+      if (!BufferSize)
+        BufferSize = preferred_buffer_size();
+      SetBufferAndMode(new char[BufferSize], BufferSize, DirectBuffer);
+    }
+  }
+  ~raw_char_ostream() override { OutBufCur = OutBufStart; }
+  raw_ostream &write(unsigned char C) override;
+  raw_ostream &write(const char *Ptr, size_t Size) override;
+  /// Returns the buffer.
+  StringRef str() { return StringRef(OutBufStart, GetNumBytesInBuffer()); }
+};
+
 /// A raw_ostream that writes to an SmallVector or SmallString.  This is a
 /// simple adaptor class. This class does not encounter output errors.
 class raw_svector_ostream : public raw_pwrite_stream {
Index: lib/Support/raw_ostream.cpp
==================================================================---
lib/Support/raw_ostream.cpp	(revision 236304)
+++ lib/Support/raw_ostream.cpp	(working copy)
@@ -65,7 +65,7 @@
   assert(OutBufCur == OutBufStart &&
          "raw_ostream destructor called with non-empty buffer!");
 
-  if (BufferMode == InternalBuffer)
+  if (BufferMode == InternalBuffer || BufferMode == DirectBuffer)
     delete [] OutBufStart;
 }
 
@@ -95,7 +95,7 @@
   // child buffer management logic will be in write_impl).
   assert(GetNumBytesInBuffer() == 0 && "Current buffer is
non-empty!");
 
-  if (BufferMode == InternalBuffer)
+  if (BufferMode == InternalBuffer || BufferMode == DirectBuffer)
     delete [] OutBufStart;
   OutBufStart = BufferStart;
   OutBufEnd = OutBufStart+Size;
@@ -751,6 +751,29 @@
   OS.append(Ptr, Size);
 }
 
+void raw_char_ostream::grow(size_t Size) {
+  size_t CurBytesAvailable = OutBufEnd - OutBufCur;
+  if (Size > CurBytesAvailable) {
+    size_t CurPos = OutBufCur - OutBufStart;
+    OutBufCur = OutBufStart;
+    size_t NewSize = size_t(NextPowerOf2(CurPos + Size + 2));
+    SetBufferAndMode(new char[NewSize], NewSize, DirectBuffer);
+    OutBufCur = OutBufStart + CurPos;
+  }
+}
+
+raw_ostream &raw_char_ostream::write(unsigned char C) {
+  grow(1);
+  *OutBufCur++ = C;
+  return *this;
+}
+raw_ostream &raw_char_ostream::write(const char *Ptr, size_t Size) {
+  grow(Size);
+  memcpy(OutBufCur, Ptr, Size);
+  OutBufCur += Size;
+  return *this;
+}
+
 //===----------------------------------------------------------------------===//
 //  raw_svector_ostream
 //===----------------------------------------------------------------------===//
@@ -810,6 +833,7 @@
     OS.append(Ptr, Ptr + Size);
   }
 
+  // TESTME: comment out reserve call.
   OS.reserve(OS.size() + 64);
   SetBuffer(OS.end(), OS.capacity() - OS.size());
 }
Index: projects/raw_char_ostream_speed/CMakeLists.txt
==================================================================---
projects/raw_char_ostream_speed/CMakeLists.txt	(revision 0)
+++ projects/raw_char_ostream_speed/CMakeLists.txt	(working copy)
@@ -0,0 +1,7 @@
+set(LLVM_LINK_COMPONENTS
+  Support
+)
+
+add_llvm_tool(raw_char_ostream
+  raw_char_ostream_speed.cpp
+)
Index: projects/raw_char_ostream_speed/raw_char_ostream_speed.cpp
==================================================================---
projects/raw_char_ostream_speed/raw_char_ostream_speed.cpp	(revision 0)
+++ projects/raw_char_ostream_speed/raw_char_ostream_speed.cpp	(working copy)
@@ -0,0 +1,45 @@
+#include "llvm/ADT/SmallString.h"
+#include "llvm/Support/raw_ostream.h"
+
+#include <ctime>
+
+using namespace llvm;
+
+namespace {
+const StringRef FuncName +   
"thadddddddddsrttttttttttttttttttttttttjouhojkirjdjdsjoiuosiasddjisjsrof";
+
+void writeStream(raw_ostream &OS) {
+  OS << "_Z" << FuncName.size() << FuncName
<< 'v';
+}
+
+const unsigned BufferSize = 128;
+void test_raw_char_ostream() {
+  char Buffer[BufferSize];
+  raw_char_ostream OS(Buffer, sizeof(Buffer));
+  writeStream(OS);
+}
+
+void test_raw_svector_ostream() {
+  SmallString<BufferSize> Buffer;
+  raw_svector_ostream OS(Buffer);
+  writeStream(OS);
+  // Make sure Buffer was not reallocated.
+  assert(Buffer.capacity() == BufferSize);
+}
+
+void testIt(void (*F)()) {
+  std::clock_t StartTime = std::clock();
+  for (unsigned i = 1e7; i; --i)
+    F();
+  std::clock_t EndTime = std::clock();
+  unsigned ms = unsigned(double(EndTime - StartTime) * 1000.0 /
CLOCKS_PER_SEC);
+  outs() << ms << "ms\n";
+}
+}
+
+int main() {
+  testIt(test_raw_char_ostream);
+  testIt(test_raw_svector_ostream);
+  return 0;
+}

Maybe Matching Threads

Search for more maybe matching threads

llvm dev - May 2015 - [LLVMdev] SmallString + raw_svector_ostream combination should be more efficient

[LLVMdev] SmallString + raw_svector_ostream combination should be more efficient

[LLVMdev] SmallString + raw_svector_ostream combination should be more efficient

[LLVMdev] SmallString + raw_svector_ostream combination should be more efficient

Maybe Matching Threads