thr3ads.net - llvm dev - [llvm-dev] SmallString + raw_svector_ostream combination should be more efficient [Aug 2015]

If this information is useful, please help other people find it:
Share via:

Yaron Keren via llvm-dev

2015-Aug-13 08:17 UTC

[llvm-dev] SmallString + raw_svector_ostream combination should be more efficient

r244870 (with the change you requested), thanks!

I initially placed the virtual function in header since they were
one-liners.
The coding standards say that an anchor() function should be supplied,
which was indeed missing.  Other than required anchor function, why should
the virtual functions go in the .cpp?
Should I move the empty raw_svector_ostream destructor to the .cpp file too
as well?


2015-08-13 2:59 GMT+03:00 Rafael Espíndola <rafael.espindola at
gmail.com>:
> Given that it is faster, I am OK with it.
>
> I get 0.774371631s with master and 0.649787169s with your patch.
>
> Why move all virtual functions to the header? LGTM with them still in
> the .cpp file.
>
> On 12 August 2015 at 01:53, Yaron Keren <yaron.keren at gmail.com>
wrote:
> > +llvm-dev at lists.llvm.org
> >
> > The impact should be small as all the other streamers usually write
> directly
> > to the memory buffer and only when out of buffer they call write().
> >
> > OTOH, raw_svector_ostream (without a buffer) goes though write for
every
> > character or block it writes. It can work without virtual write() by
> > overriding the existing virtual write_impl() but this is a slower code
> path
> > for raw_svector_ostream.
> >
> > Even though the attached raw_svector_ostream patch without virtual
> write()
> > is still significantly faster then the current raw_svector_ostream
since
> it
> > avoids the double buffering. With this patch, on my system the example
> runs
> > at 750ms compared with about 1000ms for the current
raw_svector_ostream.
> >
> > Would you prefer to continue review on Phabricator?
> >
> > 2015-07-11 20:48 GMT+03:00 Rafael Espíndola <rafael.espindola at
gmail.com
> >:
> >>
> >> This makes write virtual. What is the impact of that on the other
> >> streamers?
> >>
> >>
> >>
> >> On 22 May 2015 at 10:17, Yaron Keren <yaron.keren at
gmail.com> wrote:
> >> > Here's a performance testcase for the raw_svector_ostream
patch. On my
> >> > WIndows x64 machine it runs in 1010ms with the current code
and in
> 440ms
> >> > with the patch applies. Is this OK to commit?
> >> >
> >> >
> >> > 2015-05-02 21:31 GMT+03:00 Yaron Keren <yaron.keren at
gmail.com>:
> >> >>
> >> >> Following a hint from Duncan in http://llvm.org/pr23395,
here is a
> >> >> revised
> >> >> patch. Rather then introduce the raw_char_ostream
adapter, this
> version
> >> >> improves raw_svector_ostream.
> >> >>
> >> >> raw_svector_ostream is now a very lightweight adapter,
running
> without
> >> >> a
> >> >> buffer and fowarding almost anything to the SmallString.
This solves
> >> >> both
> >> >> the performance issue and the information duplication
issue. The
> >> >> SmallString
> >> >> is always up-to-date and can be used at any time. flush()
does
> nothing
> >> >> and
> >> >> is not required. resync() was kept in this patch as a
no-op and will
> be
> >> >> removed with its users in a follow-up patch. I'll
also try to remove
> >> >> the
> >> >> superfluous flush calls() along the way.
> >> >>
> >> >>
> >> >>
> >> >> 2015-05-02 7:41 GMT+03:00 Yaron Keren <yaron.keren at
gmail.com>:
> >> >>>
> >> >>> +update diff
> >> >>>
> >> >>>
> >> >>> 2015-05-02 7:38 GMT+03:00 Yaron Keren <yaron.keren
at gmail.com>:
> >> >>>>
> >> >>>> I outlined (is that the right word?)
raw_char_ostream::grow,
> >> >>>> raw_char_ostream::write (both) into
raw_ostream.cpp with less than
> >> >>>> 10%
> >> >>>> difference in performance.
> >> >>>>
> >> >>>> Profiling reveals that the real culprit is the
code line
> >> >>>>
> >> >>>>  OS.reserve(OS.size() + 64);
> >> >>>>
> >> >>>> called from raw_svector_ostream::write_impl
called from
> >> >>>> raw_svector_ostream::~raw_svector_ostream.
> >> >>>> The issue is already documented:
> >> >>>>
> >> >>>> raw_svector_ostream::~raw_svector_ostream() {
> >> >>>>   // FIXME: Prevent resizing during this flush().
> >> >>>>   flush();
> >> >>>> }
> >> >>>>
> >> >>>> And a solution is provided in
raw_svector_ostream::init():
> >> >>>>
> >> >>>>   // Set up the initial external buffer. We make
sure that the
> buffer
> >> >>>> has at
> >> >>>>   // least 128 bytes free; raw_ostream itself
only requires 64, but
> >> >>>> we
> >> >>>> want to
> >> >>>>   // make sure that we don't grow the buffer
unnecessarily on
> >> >>>> destruction (when
> >> >>>>   // the data is flushed). See the FIXME below.
> >> >>>>   OS.reserve(OS.size() + 128);
> >> >>>>
> >> >>>> This solution may be worse than the problem. In
total:
> >> >>>>
> >> >>>> * If the SmallString was less than 128 bytes
init() will always(!)
> >> >>>> heap
> >> >>>> allocate.
> >> >>>> * If it's more or equal to 128 bytes but upon
destruction less than
> >> >>>> 64
> >> >>>> bytes are left unused it will heap allocate for
no reason.
> >> >>>>
> >> >>>> A careful programmer who did size the SmallString
to match its use
> +
> >> >>>> small reserve will find either of these heap
allocations
> surprising,
> >> >>>> negating the SmallString advantage and then
paying memcpy cost.
> >> >>>>
> >> >>>> I filed http://llvm.org/pr23395 about this. To
temporarly avoid
> this
> >> >>>> issue, in addition to the outline methods change
I commented the
> >> >>>> OS.reserve
> >> >>>> line in flush(). (This change of course requires
further review.)
> >> >>>>
> >> >>>> With reserve being out of the way, the gap now is
smaller but still
> >> >>>> significant: raw_char_ostream is about 20% faster
than
> >> >>>> raw_svector_ostream
> >> >>>> in Release=optimized configuration.
> >> >>>>
> >> >>>>
> >> >>>> 2015-05-02 4:39 GMT+03:00 Sean Silva
<chisophugis at gmail.com>:
> >> >>>>>
> >> >>>>>
> >> >>>>> Could you dig into why:
> >> >>>>> +  raw_ostream &write(unsigned char C)
override {
> >> >>>>> +    grow(1);
> >> >>>>> +    *OutBufCur++ = C;
> >> >>>>> +    return *this;
> >> >>>>> +  }
> >> >>>>>
> >> >>>>> Is 3 times as fast as raw_svector_ostream? I
don't see a good
> reason
> >> >>>>> why that should be any faster than:
> >> >>>>>
> >> >>>>>   raw_ostream &operator<<(char C) {
> >> >>>>>     if (OutBufCur >= OutBufEnd)
> >> >>>>>       return write(C);
> >> >>>>>     *OutBufCur++ = C;
> >> >>>>>     return *this;
> >> >>>>>   }
> >> >>>>>
> >> >>>>>
> >> >>>>> You might just be seeing the difference
between having write
> inlined
> >> >>>>> vs. non-inlined, in which case your patch
might be a complicated
> way
> >> >>>>> to pull
> >> >>>>> the likely cases of some raw_ostream methods
into the header.
> >> >>>>>
> >> >>>>>
> >> >>>>> -- Sean Silva
> >> >>>>
> >> >>>>
> >> >>>> On Thu, Apr 30, 2015 at 8:02 PM, Yaron Keren <
> yaron.keren at gmail.com>
> >> >>>> wrote:
> >> >>>>>
> >> >>>>> Yes, we should do without virtual flush. The
problem was
> >> >>>>> raw_char_ostream should not be flushed -
it's always current - so
> I
> >> >>>>> overloaded flush to a no-op. A cleaner
solution is attached,
> adding
> >> >>>>> a
> >> >>>>> DirectBuffer mode to the raw_ostream.
> >> >>>>>
> >> >>>>> Also added a simple performance comparison
project between
> >> >>>>> raw_char_ostream and raw_svector_ostream. On
Window 7 x64 machine,
> >> >>>>> raw_char_ostream was three times faster than
raw_svector_ostream
> >> >>>>> when the
> >> >>>>> provided buffer size is large enough and two
times as fast with
> one
> >> >>>>> dynamic
> >> >>>>> allocation resize.
> >> >>>>>
> >> >>>>>
> >> >>>>> 2015-04-30 22:48 GMT+03:00 Rafael Espíndola
> >> >>>>> <rafael.espindola at gmail.com>:
> >> >>>>>>
> >> >>>>>> I don't think we should make flush
virtual. Why do you need to do
> >> >>>>>> it?
> >> >>>>>> Can't you set up the base class to
write to use the tail of the
> >> >>>>>> memory
> >> >>>>>> region as the buffer?
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> On 24 April 2015 at 06:46, Yaron Keren
<yaron.keren at gmail.com>
> >> >>>>>> wrote:
> >> >>>>>> > Hi,
> >> >>>>>> >
> >> >>>>>> > Is this what you're thinking
about?
> >> >>>>>> > The code is not tested yet, I'd
like to know if the overall
> >> >>>>>> > direction and
> >> >>>>>> > design are correct.
> >> >>>>>> >
> >> >>>>>> > Yaron
> >> >>>>>> >
> >> >>>>>> >
> >> >>>>>> > 2015-04-21 0:10 GMT+03:00 Yaron
Keren <yaron.keren at gmail.com>:
> >> >>>>>> >>
> >> >>>>>> >> Sean, thanks for reminding this,
Alp did commit a class
> derived
> >> >>>>>> >> from
> >> >>>>>> >> raw_svector_ostream conatining
an internal SmallString he
> called
> >> >>>>>> >> small_string_ostream. The commit
was reverted after a day due
> to
> >> >>>>>> >> a
> >> >>>>>> >> disagreement about the commit
approval and apparently
> abandoned.
> >> >>>>>> >>
> >> >>>>>> >>
> >> >>>>>> >>
> >> >>>>>> >>
> >> >>>>>> >>
>
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223393.html
> >> >>>>>> >>
> >> >>>>>> >>
> >> >>>>>> >>
> >> >>>>>> >>
>
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223557.html
> >> >>>>>> >>
> >> >>>>>> >>
> >> >>>>>> >>
> >> >>>>>> >>
>
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140630/223986.html
> >> >>>>>> >>
> >> >>>>>> >> The class Alp comitted did solve
the possible mismatch between
> >> >>>>>> >> the
> >> >>>>>> >> SmallString and the stream by
making the SmallString private
> to
> >> >>>>>> >> the
> >> >>>>>> >> class.
> >> >>>>>> >> This however did not treat the
root problem, the duplication
> of
> >> >>>>>> >> the
> >> >>>>>> >> information about the buffer
between SmallString and the
> stream.
> >> >>>>>> >>
> >> >>>>>> >> I can make performance
measurements but even if the
> performance
> >> >>>>>> >> difference
> >> >>>>>> >> is neglible (and looking at all
the code paths and
> conditionals
> >> >>>>>> >> of
> >> >>>>>> >> non-inlined functions at
raw_ostream.cpp that's hard to
> >> >>>>>> >> believe),
> >> >>>>>> >> the
> >> >>>>>> >> current
SmallString-raw_svector_ostream is simply wrong.
> >> >>>>>> >>
> >> >>>>>> >> Personally, after the previous
"Alp" discussion I found and
> >> >>>>>> >> fixed
> >> >>>>>> >> several
> >> >>>>>> >> such issues in my out-of-tree
code only to make new similar
> new
> >> >>>>>> >> error
> >> >>>>>> >> earlier this year, which I
caught only months after, when
> Rafael
> >> >>>>>> >> committed
> >> >>>>>> >> the pwrite which reminded me the
issue. Due ot the information
> >> >>>>>> >> duplication
> >> >>>>>> >> Rafael commit also had a bug and
Mehdi reports similar
> >> >>>>>> >> experience.
> >> >>>>>> >> Back then
> >> >>>>>> >> Alp reported similar problems he
found in LLVM code which were
> >> >>>>>> >> hopefully
> >> >>>>>> >> fixed otherwise.
> >> >>>>>> >>
> >> >>>>>> >> In addition to the information
duplication, the design is
> quite
> >> >>>>>> >> confusing
> >> >>>>>> >> to use
> >> >>>>>> >>
> >> >>>>>> >> - Should one use the
stream.str() function or the SmallString
> >> >>>>>> >> itself?
> >> >>>>>> >> - flush or str?
> >> >>>>>> >> - How do you properly clear the
combination for reuse (that
> was
> >> >>>>>> >> my
> >> >>>>>> >> mistake, I forgot to resync
after OS.clear())?
> >> >>>>>> >>
> >> >>>>>> >> It's safe to say this design
pattern is very easy to get wrong
> >> >>>>>> >> one
> >> >>>>>> >> way or
> >> >>>>>> >> another, we got burned by it
multiple times and it should be
> >> >>>>>> >> replaced.
> >> >>>>>> >>
> >> >>>>>> >> Alp suggested a derived class
containing its own SmallString.
> >> >>>>>> >> That's a
> >> >>>>>> >> simple and effective approach to
avoid the bugs mentioned but
> >> >>>>>> >> does
> >> >>>>>> >> not fix
> >> >>>>>> >> the memory or runtime issues.
Small as they may be, why have
> >> >>>>>> >> them
> >> >>>>>> >> at a
> >> >>>>>> >> fundemental data structure?
> >> >>>>>> >>
> >> >>>>>> >> I was thinking about going
further avoiding all duplication
> with
> >> >>>>>> >> a
> >> >>>>>> >> templated class derived with its
own internal buffer storage.
> >> >>>>>> >>
> >> >>>>>> >> Rafael suggested a more modular
approach, a derived adpater
> >> >>>>>> >> class
> >> >>>>>> >> adapter
> >> >>>>>> >> to a *simple* buffer (or nullptr
for fully-dynamic operation)
> >> >>>>>> >> which
> >> >>>>>> >> also
> >> >>>>>> >> won't duplicate any
information the buffer is dumb and has no
> >> >>>>>> >> state.
> >> >>>>>> >>
> >> >>>>>> >> That solution sounds simple,
efficient and safe to use. The
> >> >>>>>> >> implementation
> >> >>>>>> >> would be actually simpler then
raw_svector_ostream since all
> the
> >> >>>>>> >> coordination logic is not
required anymore.
> >> >>>>>> >>
> >> >>>>>> >>
> >> >>>>>> >> 2015-04-20 22:17 GMT+03:00 Sean
Silva <chisophugis at gmail.com
> >:
> >> >>>>>> >>>
> >> >>>>>> >>>
> >> >>>>>> >>>
> >> >>>>>> >>> On Sun, Apr 19, 2015 at 7:40
AM, Yaron Keren
> >> >>>>>> >>> <yaron.keren at
gmail.com>
> >> >>>>>> >>> wrote:
> >> >>>>>> >>>>
> >> >>>>>> >>>> A very common code
pattern in LLVM is
> >> >>>>>> >>>>
> >> >>>>>> >>>>  SmallString<128>
S;
> >> >>>>>> >>>>  raw_svector_ostream
OS(S);
> >> >>>>>> >>>>  OS<< ...
> >> >>>>>> >>>>  Use OS.str()
> >> >>>>>> >>>>
> >> >>>>>> >>>> While
raw_svector_ostream is smart to share the text buffer
> >> >>>>>> >>>> itself, it's
> >> >>>>>> >>>> inefficient keeping two
sets of pointers to the same buffer:
> >> >>>>>> >>>>
> >> >>>>>> >>>>  In SmallString: void
*BeginX, *EndX, *CapacityX
> >> >>>>>> >>>>  In raw_ostream: char
*OutBufStart, *OutBufEnd, *OutBufCur
> >> >>>>>> >>>>
> >> >>>>>> >>>> Moreover, at runtime the
two sets of pointers need to be
> >> >>>>>> >>>> coordinated
> >> >>>>>> >>>> between the SmallString
and raw_svector_ostream using
> >> >>>>>> >>>>
raw_svector_ostream::init, raw_svector_ostream::pwrite,
> >> >>>>>> >>>>
raw_svector_ostream::resync and
> >> >>>>>> >>>>
raw_svector_ostream::write_impl.
> >> >>>>>> >>>> All these functions have
non-inlined implementations in
> >> >>>>>> >>>> raw_ostream.cpp.
> >> >>>>>> >>>>
> >> >>>>>> >>>> Finally, this may cause
subtle bugs if S is modified without
> >> >>>>>> >>>> calling
> >> >>>>>> >>>> OS::resync(). This is
too easy to do by mistake.
> >> >>>>>> >>>>
> >> >>>>>> >>>> In this frequent case
usage the client does not really care
> >> >>>>>> >>>> about
> >> >>>>>> >>>> S
> >> >>>>>> >>>> being a SmallString with
its many useful string helper
> >> >>>>>> >>>> function.
> >> >>>>>> >>>> It's just
> >> >>>>>> >>>> boilerplate code for
raw_svector_ostream. But it does cost
> >> >>>>>> >>>> three
> >> >>>>>> >>>> extra
> >> >>>>>> >>>> pointers, some runtime
performance and possible bugs.
> >> >>>>>> >>>
> >> >>>>>> >>>
> >> >>>>>> >>> I agree the bugs are real
(Alp proposed something a while
> back
> >> >>>>>> >>> regarding
> >> >>>>>> >>> this?), but you will need to
provide measurements to justify
> >> >>>>>> >>> the
> >> >>>>>> >>> cost in
> >> >>>>>> >>> runtime performance. One
technique I have used in the past to
> >> >>>>>> >>> measure these
> >> >>>>>> >>> sorts of things I call
"stuffing": take the operation that
> you
> >> >>>>>> >>> want to
> >> >>>>>> >>> measure, then essentially
change the logic so that you pay
> the
> >> >>>>>> >>> cost 2 times,
> >> >>>>>> >>> 3 times, etc. You can then
look at the trend in performance
> as
> >> >>>>>> >>> N
> >> >>>>>> >>> varies and
> >> >>>>>> >>> extrapolate back to the case
where N = 0 (i.e. you don't pay
> >> >>>>>> >>> the
> >> >>>>>> >>> cost).
> >> >>>>>> >>>
> >> >>>>>> >>> For example, in one
situation where I used this method it was
> >> >>>>>> >>> to
> >> >>>>>> >>> measure
> >> >>>>>> >>> the cost of stat'ing
files (sys::fs::status) across a
> holistic
> >> >>>>>> >>> build, using
> >> >>>>>> >>> only "time" on the
command line (it was on Windows and I
> didn't
> >> >>>>>> >>> have any
> >> >>>>>> >>> tools like DTrace available
that can directly measure this).
> In
> >> >>>>>> >>> order to do
> >> >>>>>> >>> this, I changed
sys::fs::status to call stat N times instead
> of
> >> >>>>>> >>> 1,
> >> >>>>>> >>> and
> >> >>>>>> >>> measured with N=1 N=2 N=3
etc. The result was that the
> >> >>>>>> >>> difference
> >> >>>>>> >>> between
> >> >>>>>> >>> the N and N+1 versions was
about 1-2% across N=1..10 (or
> >> >>>>>> >>> whatever
> >> >>>>>> >>> I
> >> >>>>>> >>> measured). In order to
negate caching and other confounding
> >> >>>>>> >>> effects, it is
> >> >>>>>> >>> important to try different
distributions of stats; e.g. the
> >> >>>>>> >>> extra
> >> >>>>>> >>> stats are
> >> >>>>>> >>> on the same file as the
"real" stat vs. the extra stats are
> on
> >> >>>>>> >>> nonexistent
> >> >>>>>> >>> files in the same directory
as the "real" file vs. parent
> >> >>>>>> >>> directories of the
> >> >>>>>> >>> "real" file; if
these match up fairly well (they did), then
> you
> >> >>>>>> >>> have some
> >> >>>>>> >>> indication that the
"stuffing" is measuring what you want to
> >> >>>>>> >>> measure.
> >> >>>>>> >>>
> >> >>>>>> >>> So e.g. if you think the
cost of 3 extra pointers is
> >> >>>>>> >>> significant,
> >> >>>>>> >>> then
> >> >>>>>> >>> "stuff" the struct
with 3, 6, 9, ... extra pointers and
> measure
> >> >>>>>> >>> the
> >> >>>>>> >>> difference in performance
(e.g. measure the time building a
> >> >>>>>> >>> real
> >> >>>>>> >>> project).
> >> >>>>>> >>>
> >> >>>>>> >>> -- Sean Silva
> >> >>>>>> >>>
> >> >>>>>> >>>>
> >> >>>>>> >>>>
> >> >>>>>> >>>> To solve all three
issues, would it make sense to have
> >> >>>>>> >>>> raw_ostream-derived
container with a its own SmallString
> like
> >> >>>>>> >>>> templated-size
> >> >>>>>> >>>> built-in buffer?
> >> >>>>>> >>>>
> >> >>>>>> >>>
> >> >>>>>> >
> >> >>>>>
> >> >>>>>
> >> >>>>
> >> >
> >
> >
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150813/fddfda3b/attachment.html>

Rafael Espíndola via llvm-dev

2015-Aug-13 13:53 UTC

head link

[llvm-dev] SmallString + raw_svector_ostream combination should be more efficient

On 13 August 2015 at 04:17, Yaron Keren <yaron.keren at gmail.com>
wrote:> r244870 (with the change you requested), thanks!
>
> I initially placed the virtual function in header since they were
> one-liners.
> The coding standards say that an anchor() function should be supplied,
which
> was indeed missing.  Other than required anchor function, why should the
> virtual functions go in the .cpp?
Just because there is no advantage for most virtual methods to be in the .h.
> Should I move the empty raw_svector_ostream destructor to the .cpp file too
> as well?
Maybe. Destructors are the one exception since base class destructors
call them non-virtually. If the class is final we can move them.

Cheers,
Rafael

Yaron Keren via llvm-dev

2015-Aug-13 14:03 UTC

head link

[llvm-dev] SmallString + raw_svector_ostream combination should be more efficient

Ok, one more issue. I've removed the no-op resync() method and now getting
rid of numerous raw_ostream::flush() calls. These were required before but
now will never do anything but will waste runtime discovering that.
To find raw_svector_ostream::flush() calls speficially I locally shadowed
raw_ostream:flush() with a deleted flush():

class raw_svector_ostream {
...
public:
...
  void flush() = delete;
};

so any direct use of raw_svector_ostream::flush() fails compilation.
Calling raw_ostream:flush() itself would never reach the deleted function
as it's not virtual, it will continue to function as before.

To explicitly prevent calling raw_svector_ostream::flush(), is that a good
solution to commit?

2015-08-13 16:53 GMT+03:00 Rafael Espíndola <rafael.espindola at
gmail.com>:
> On 13 August 2015 at 04:17, Yaron Keren <yaron.keren at gmail.com>
wrote:
> > r244870 (with the change you requested), thanks!
> >
> > I initially placed the virtual function in header since they were
> > one-liners.
> > The coding standards say that an anchor() function should be supplied,
> which
> > was indeed missing.  Other than required anchor function, why should
the
> > virtual functions go in the .cpp?
>
> Just because there is no advantage for most virtual methods to be in the
> .h.
>
> > Should I move the empty raw_svector_ostream destructor to the .cpp
file
> too
> > as well?
>
> Maybe. Destructors are the one exception since base class destructors
> call them non-virtually. If the class is final we can move them.
>
> Cheers,
> Rafael
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150813/18664a1f/attachment.html>

Apparently Analagous Threads

Search for more seemingly similar threads

llvm dev - Aug 2015 - SmallString + raw_svector_ostream combination should be more efficient

[llvm-dev] SmallString + raw_svector_ostream combination should be more efficient

[llvm-dev] SmallString + raw_svector_ostream combination should be more efficient

[llvm-dev] SmallString + raw_svector_ostream combination should be more efficient

Apparently Analagous Threads