Yaron Keren via llvm-dev
2015-Aug-13 08:17 UTC
[llvm-dev] SmallString + raw_svector_ostream combination should be more efficient
r244870 (with the change you requested), thanks! I initially placed the virtual function in header since they were one-liners. The coding standards say that an anchor() function should be supplied, which was indeed missing. Other than required anchor function, why should the virtual functions go in the .cpp? Should I move the empty raw_svector_ostream destructor to the .cpp file too as well? 2015-08-13 2:59 GMT+03:00 Rafael Espíndola <rafael.espindola at gmail.com>:> Given that it is faster, I am OK with it. > > I get 0.774371631s with master and 0.649787169s with your patch. > > Why move all virtual functions to the header? LGTM with them still in > the .cpp file. > > On 12 August 2015 at 01:53, Yaron Keren <yaron.keren at gmail.com> wrote: > > +llvm-dev at lists.llvm.org > > > > The impact should be small as all the other streamers usually write > directly > > to the memory buffer and only when out of buffer they call write(). > > > > OTOH, raw_svector_ostream (without a buffer) goes though write for every > > character or block it writes. It can work without virtual write() by > > overriding the existing virtual write_impl() but this is a slower code > path > > for raw_svector_ostream. > > > > Even though the attached raw_svector_ostream patch without virtual > write() > > is still significantly faster then the current raw_svector_ostream since > it > > avoids the double buffering. With this patch, on my system the example > runs > > at 750ms compared with about 1000ms for the current raw_svector_ostream. > > > > Would you prefer to continue review on Phabricator? > > > > 2015-07-11 20:48 GMT+03:00 Rafael Espíndola <rafael.espindola at gmail.com > >: > >> > >> This makes write virtual. What is the impact of that on the other > >> streamers? > >> > >> > >> > >> On 22 May 2015 at 10:17, Yaron Keren <yaron.keren at gmail.com> wrote: > >> > Here's a performance testcase for the raw_svector_ostream patch. On my > >> > WIndows x64 machine it runs in 1010ms with the current code and in > 440ms > >> > with the patch applies. Is this OK to commit? > >> > > >> > > >> > 2015-05-02 21:31 GMT+03:00 Yaron Keren <yaron.keren at gmail.com>: > >> >> > >> >> Following a hint from Duncan in http://llvm.org/pr23395, here is a > >> >> revised > >> >> patch. Rather then introduce the raw_char_ostream adapter, this > version > >> >> improves raw_svector_ostream. > >> >> > >> >> raw_svector_ostream is now a very lightweight adapter, running > without > >> >> a > >> >> buffer and fowarding almost anything to the SmallString. This solves > >> >> both > >> >> the performance issue and the information duplication issue. The > >> >> SmallString > >> >> is always up-to-date and can be used at any time. flush() does > nothing > >> >> and > >> >> is not required. resync() was kept in this patch as a no-op and will > be > >> >> removed with its users in a follow-up patch. I'll also try to remove > >> >> the > >> >> superfluous flush calls() along the way. > >> >> > >> >> > >> >> > >> >> 2015-05-02 7:41 GMT+03:00 Yaron Keren <yaron.keren at gmail.com>: > >> >>> > >> >>> +update diff > >> >>> > >> >>> > >> >>> 2015-05-02 7:38 GMT+03:00 Yaron Keren <yaron.keren at gmail.com>: > >> >>>> > >> >>>> I outlined (is that the right word?) raw_char_ostream::grow, > >> >>>> raw_char_ostream::write (both) into raw_ostream.cpp with less than > >> >>>> 10% > >> >>>> difference in performance. > >> >>>> > >> >>>> Profiling reveals that the real culprit is the code line > >> >>>> > >> >>>> OS.reserve(OS.size() + 64); > >> >>>> > >> >>>> called from raw_svector_ostream::write_impl called from > >> >>>> raw_svector_ostream::~raw_svector_ostream. > >> >>>> The issue is already documented: > >> >>>> > >> >>>> raw_svector_ostream::~raw_svector_ostream() { > >> >>>> // FIXME: Prevent resizing during this flush(). > >> >>>> flush(); > >> >>>> } > >> >>>> > >> >>>> And a solution is provided in raw_svector_ostream::init(): > >> >>>> > >> >>>> // Set up the initial external buffer. We make sure that the > buffer > >> >>>> has at > >> >>>> // least 128 bytes free; raw_ostream itself only requires 64, but > >> >>>> we > >> >>>> want to > >> >>>> // make sure that we don't grow the buffer unnecessarily on > >> >>>> destruction (when > >> >>>> // the data is flushed). See the FIXME below. > >> >>>> OS.reserve(OS.size() + 128); > >> >>>> > >> >>>> This solution may be worse than the problem. In total: > >> >>>> > >> >>>> * If the SmallString was less than 128 bytes init() will always(!) > >> >>>> heap > >> >>>> allocate. > >> >>>> * If it's more or equal to 128 bytes but upon destruction less than > >> >>>> 64 > >> >>>> bytes are left unused it will heap allocate for no reason. > >> >>>> > >> >>>> A careful programmer who did size the SmallString to match its use > + > >> >>>> small reserve will find either of these heap allocations > surprising, > >> >>>> negating the SmallString advantage and then paying memcpy cost. > >> >>>> > >> >>>> I filed http://llvm.org/pr23395 about this. To temporarly avoid > this > >> >>>> issue, in addition to the outline methods change I commented the > >> >>>> OS.reserve > >> >>>> line in flush(). (This change of course requires further review.) > >> >>>> > >> >>>> With reserve being out of the way, the gap now is smaller but still > >> >>>> significant: raw_char_ostream is about 20% faster than > >> >>>> raw_svector_ostream > >> >>>> in Release=optimized configuration. > >> >>>> > >> >>>> > >> >>>> 2015-05-02 4:39 GMT+03:00 Sean Silva <chisophugis at gmail.com>: > >> >>>>> > >> >>>>> > >> >>>>> Could you dig into why: > >> >>>>> + raw_ostream &write(unsigned char C) override { > >> >>>>> + grow(1); > >> >>>>> + *OutBufCur++ = C; > >> >>>>> + return *this; > >> >>>>> + } > >> >>>>> > >> >>>>> Is 3 times as fast as raw_svector_ostream? I don't see a good > reason > >> >>>>> why that should be any faster than: > >> >>>>> > >> >>>>> raw_ostream &operator<<(char C) { > >> >>>>> if (OutBufCur >= OutBufEnd) > >> >>>>> return write(C); > >> >>>>> *OutBufCur++ = C; > >> >>>>> return *this; > >> >>>>> } > >> >>>>> > >> >>>>> > >> >>>>> You might just be seeing the difference between having write > inlined > >> >>>>> vs. non-inlined, in which case your patch might be a complicated > way > >> >>>>> to pull > >> >>>>> the likely cases of some raw_ostream methods into the header. > >> >>>>> > >> >>>>> > >> >>>>> -- Sean Silva > >> >>>> > >> >>>> > >> >>>> On Thu, Apr 30, 2015 at 8:02 PM, Yaron Keren < > yaron.keren at gmail.com> > >> >>>> wrote: > >> >>>>> > >> >>>>> Yes, we should do without virtual flush. The problem was > >> >>>>> raw_char_ostream should not be flushed - it's always current - so > I > >> >>>>> overloaded flush to a no-op. A cleaner solution is attached, > adding > >> >>>>> a > >> >>>>> DirectBuffer mode to the raw_ostream. > >> >>>>> > >> >>>>> Also added a simple performance comparison project between > >> >>>>> raw_char_ostream and raw_svector_ostream. On Window 7 x64 machine, > >> >>>>> raw_char_ostream was three times faster than raw_svector_ostream > >> >>>>> when the > >> >>>>> provided buffer size is large enough and two times as fast with > one > >> >>>>> dynamic > >> >>>>> allocation resize. > >> >>>>> > >> >>>>> > >> >>>>> 2015-04-30 22:48 GMT+03:00 Rafael Espíndola > >> >>>>> <rafael.espindola at gmail.com>: > >> >>>>>> > >> >>>>>> I don't think we should make flush virtual. Why do you need to do > >> >>>>>> it? > >> >>>>>> Can't you set up the base class to write to use the tail of the > >> >>>>>> memory > >> >>>>>> region as the buffer? > >> >>>>>> > >> >>>>>> > >> >>>>>> > >> >>>>>> On 24 April 2015 at 06:46, Yaron Keren <yaron.keren at gmail.com> > >> >>>>>> wrote: > >> >>>>>> > Hi, > >> >>>>>> > > >> >>>>>> > Is this what you're thinking about? > >> >>>>>> > The code is not tested yet, I'd like to know if the overall > >> >>>>>> > direction and > >> >>>>>> > design are correct. > >> >>>>>> > > >> >>>>>> > Yaron > >> >>>>>> > > >> >>>>>> > > >> >>>>>> > 2015-04-21 0:10 GMT+03:00 Yaron Keren <yaron.keren at gmail.com>: > >> >>>>>> >> > >> >>>>>> >> Sean, thanks for reminding this, Alp did commit a class > derived > >> >>>>>> >> from > >> >>>>>> >> raw_svector_ostream conatining an internal SmallString he > called > >> >>>>>> >> small_string_ostream. The commit was reverted after a day due > to > >> >>>>>> >> a > >> >>>>>> >> disagreement about the commit approval and apparently > abandoned. > >> >>>>>> >> > >> >>>>>> >> > >> >>>>>> >> > >> >>>>>> >> > >> >>>>>> >> > http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223393.html > >> >>>>>> >> > >> >>>>>> >> > >> >>>>>> >> > >> >>>>>> >> > http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223557.html > >> >>>>>> >> > >> >>>>>> >> > >> >>>>>> >> > >> >>>>>> >> > http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140630/223986.html > >> >>>>>> >> > >> >>>>>> >> The class Alp comitted did solve the possible mismatch between > >> >>>>>> >> the > >> >>>>>> >> SmallString and the stream by making the SmallString private > to > >> >>>>>> >> the > >> >>>>>> >> class. > >> >>>>>> >> This however did not treat the root problem, the duplication > of > >> >>>>>> >> the > >> >>>>>> >> information about the buffer between SmallString and the > stream. > >> >>>>>> >> > >> >>>>>> >> I can make performance measurements but even if the > performance > >> >>>>>> >> difference > >> >>>>>> >> is neglible (and looking at all the code paths and > conditionals > >> >>>>>> >> of > >> >>>>>> >> non-inlined functions at raw_ostream.cpp that's hard to > >> >>>>>> >> believe), > >> >>>>>> >> the > >> >>>>>> >> current SmallString-raw_svector_ostream is simply wrong. > >> >>>>>> >> > >> >>>>>> >> Personally, after the previous "Alp" discussion I found and > >> >>>>>> >> fixed > >> >>>>>> >> several > >> >>>>>> >> such issues in my out-of-tree code only to make new similar > new > >> >>>>>> >> error > >> >>>>>> >> earlier this year, which I caught only months after, when > Rafael > >> >>>>>> >> committed > >> >>>>>> >> the pwrite which reminded me the issue. Due ot the information > >> >>>>>> >> duplication > >> >>>>>> >> Rafael commit also had a bug and Mehdi reports similar > >> >>>>>> >> experience. > >> >>>>>> >> Back then > >> >>>>>> >> Alp reported similar problems he found in LLVM code which were > >> >>>>>> >> hopefully > >> >>>>>> >> fixed otherwise. > >> >>>>>> >> > >> >>>>>> >> In addition to the information duplication, the design is > quite > >> >>>>>> >> confusing > >> >>>>>> >> to use > >> >>>>>> >> > >> >>>>>> >> - Should one use the stream.str() function or the SmallString > >> >>>>>> >> itself? > >> >>>>>> >> - flush or str? > >> >>>>>> >> - How do you properly clear the combination for reuse (that > was > >> >>>>>> >> my > >> >>>>>> >> mistake, I forgot to resync after OS.clear())? > >> >>>>>> >> > >> >>>>>> >> It's safe to say this design pattern is very easy to get wrong > >> >>>>>> >> one > >> >>>>>> >> way or > >> >>>>>> >> another, we got burned by it multiple times and it should be > >> >>>>>> >> replaced. > >> >>>>>> >> > >> >>>>>> >> Alp suggested a derived class containing its own SmallString. > >> >>>>>> >> That's a > >> >>>>>> >> simple and effective approach to avoid the bugs mentioned but > >> >>>>>> >> does > >> >>>>>> >> not fix > >> >>>>>> >> the memory or runtime issues. Small as they may be, why have > >> >>>>>> >> them > >> >>>>>> >> at a > >> >>>>>> >> fundemental data structure? > >> >>>>>> >> > >> >>>>>> >> I was thinking about going further avoiding all duplication > with > >> >>>>>> >> a > >> >>>>>> >> templated class derived with its own internal buffer storage. > >> >>>>>> >> > >> >>>>>> >> Rafael suggested a more modular approach, a derived adpater > >> >>>>>> >> class > >> >>>>>> >> adapter > >> >>>>>> >> to a *simple* buffer (or nullptr for fully-dynamic operation) > >> >>>>>> >> which > >> >>>>>> >> also > >> >>>>>> >> won't duplicate any information the buffer is dumb and has no > >> >>>>>> >> state. > >> >>>>>> >> > >> >>>>>> >> That solution sounds simple, efficient and safe to use. The > >> >>>>>> >> implementation > >> >>>>>> >> would be actually simpler then raw_svector_ostream since all > the > >> >>>>>> >> coordination logic is not required anymore. > >> >>>>>> >> > >> >>>>>> >> > >> >>>>>> >> 2015-04-20 22:17 GMT+03:00 Sean Silva <chisophugis at gmail.com > >: > >> >>>>>> >>> > >> >>>>>> >>> > >> >>>>>> >>> > >> >>>>>> >>> On Sun, Apr 19, 2015 at 7:40 AM, Yaron Keren > >> >>>>>> >>> <yaron.keren at gmail.com> > >> >>>>>> >>> wrote: > >> >>>>>> >>>> > >> >>>>>> >>>> A very common code pattern in LLVM is > >> >>>>>> >>>> > >> >>>>>> >>>> SmallString<128> S; > >> >>>>>> >>>> raw_svector_ostream OS(S); > >> >>>>>> >>>> OS<< ... > >> >>>>>> >>>> Use OS.str() > >> >>>>>> >>>> > >> >>>>>> >>>> While raw_svector_ostream is smart to share the text buffer > >> >>>>>> >>>> itself, it's > >> >>>>>> >>>> inefficient keeping two sets of pointers to the same buffer: > >> >>>>>> >>>> > >> >>>>>> >>>> In SmallString: void *BeginX, *EndX, *CapacityX > >> >>>>>> >>>> In raw_ostream: char *OutBufStart, *OutBufEnd, *OutBufCur > >> >>>>>> >>>> > >> >>>>>> >>>> Moreover, at runtime the two sets of pointers need to be > >> >>>>>> >>>> coordinated > >> >>>>>> >>>> between the SmallString and raw_svector_ostream using > >> >>>>>> >>>> raw_svector_ostream::init, raw_svector_ostream::pwrite, > >> >>>>>> >>>> raw_svector_ostream::resync and > >> >>>>>> >>>> raw_svector_ostream::write_impl. > >> >>>>>> >>>> All these functions have non-inlined implementations in > >> >>>>>> >>>> raw_ostream.cpp. > >> >>>>>> >>>> > >> >>>>>> >>>> Finally, this may cause subtle bugs if S is modified without > >> >>>>>> >>>> calling > >> >>>>>> >>>> OS::resync(). This is too easy to do by mistake. > >> >>>>>> >>>> > >> >>>>>> >>>> In this frequent case usage the client does not really care > >> >>>>>> >>>> about > >> >>>>>> >>>> S > >> >>>>>> >>>> being a SmallString with its many useful string helper > >> >>>>>> >>>> function. > >> >>>>>> >>>> It's just > >> >>>>>> >>>> boilerplate code for raw_svector_ostream. But it does cost > >> >>>>>> >>>> three > >> >>>>>> >>>> extra > >> >>>>>> >>>> pointers, some runtime performance and possible bugs. > >> >>>>>> >>> > >> >>>>>> >>> > >> >>>>>> >>> I agree the bugs are real (Alp proposed something a while > back > >> >>>>>> >>> regarding > >> >>>>>> >>> this?), but you will need to provide measurements to justify > >> >>>>>> >>> the > >> >>>>>> >>> cost in > >> >>>>>> >>> runtime performance. One technique I have used in the past to > >> >>>>>> >>> measure these > >> >>>>>> >>> sorts of things I call "stuffing": take the operation that > you > >> >>>>>> >>> want to > >> >>>>>> >>> measure, then essentially change the logic so that you pay > the > >> >>>>>> >>> cost 2 times, > >> >>>>>> >>> 3 times, etc. You can then look at the trend in performance > as > >> >>>>>> >>> N > >> >>>>>> >>> varies and > >> >>>>>> >>> extrapolate back to the case where N = 0 (i.e. you don't pay > >> >>>>>> >>> the > >> >>>>>> >>> cost). > >> >>>>>> >>> > >> >>>>>> >>> For example, in one situation where I used this method it was > >> >>>>>> >>> to > >> >>>>>> >>> measure > >> >>>>>> >>> the cost of stat'ing files (sys::fs::status) across a > holistic > >> >>>>>> >>> build, using > >> >>>>>> >>> only "time" on the command line (it was on Windows and I > didn't > >> >>>>>> >>> have any > >> >>>>>> >>> tools like DTrace available that can directly measure this). > In > >> >>>>>> >>> order to do > >> >>>>>> >>> this, I changed sys::fs::status to call stat N times instead > of > >> >>>>>> >>> 1, > >> >>>>>> >>> and > >> >>>>>> >>> measured with N=1 N=2 N=3 etc. The result was that the > >> >>>>>> >>> difference > >> >>>>>> >>> between > >> >>>>>> >>> the N and N+1 versions was about 1-2% across N=1..10 (or > >> >>>>>> >>> whatever > >> >>>>>> >>> I > >> >>>>>> >>> measured). In order to negate caching and other confounding > >> >>>>>> >>> effects, it is > >> >>>>>> >>> important to try different distributions of stats; e.g. the > >> >>>>>> >>> extra > >> >>>>>> >>> stats are > >> >>>>>> >>> on the same file as the "real" stat vs. the extra stats are > on > >> >>>>>> >>> nonexistent > >> >>>>>> >>> files in the same directory as the "real" file vs. parent > >> >>>>>> >>> directories of the > >> >>>>>> >>> "real" file; if these match up fairly well (they did), then > you > >> >>>>>> >>> have some > >> >>>>>> >>> indication that the "stuffing" is measuring what you want to > >> >>>>>> >>> measure. > >> >>>>>> >>> > >> >>>>>> >>> So e.g. if you think the cost of 3 extra pointers is > >> >>>>>> >>> significant, > >> >>>>>> >>> then > >> >>>>>> >>> "stuff" the struct with 3, 6, 9, ... extra pointers and > measure > >> >>>>>> >>> the > >> >>>>>> >>> difference in performance (e.g. measure the time building a > >> >>>>>> >>> real > >> >>>>>> >>> project). > >> >>>>>> >>> > >> >>>>>> >>> -- Sean Silva > >> >>>>>> >>> > >> >>>>>> >>>> > >> >>>>>> >>>> > >> >>>>>> >>>> To solve all three issues, would it make sense to have > >> >>>>>> >>>> raw_ostream-derived container with a its own SmallString > like > >> >>>>>> >>>> templated-size > >> >>>>>> >>>> built-in buffer? > >> >>>>>> >>>> > >> >>>>>> >>> > >> >>>>>> > > >> >>>>> > >> >>>>> > >> >>>> > >> > > > > > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150813/fddfda3b/attachment.html>
Rafael Espíndola via llvm-dev
2015-Aug-13 13:53 UTC
[llvm-dev] SmallString + raw_svector_ostream combination should be more efficient
On 13 August 2015 at 04:17, Yaron Keren <yaron.keren at gmail.com> wrote:> r244870 (with the change you requested), thanks! > > I initially placed the virtual function in header since they were > one-liners. > The coding standards say that an anchor() function should be supplied, which > was indeed missing. Other than required anchor function, why should the > virtual functions go in the .cpp?Just because there is no advantage for most virtual methods to be in the .h.> Should I move the empty raw_svector_ostream destructor to the .cpp file too > as well?Maybe. Destructors are the one exception since base class destructors call them non-virtually. If the class is final we can move them. Cheers, Rafael
Yaron Keren via llvm-dev
2015-Aug-13 14:03 UTC
[llvm-dev] SmallString + raw_svector_ostream combination should be more efficient
Ok, one more issue. I've removed the no-op resync() method and now getting rid of numerous raw_ostream::flush() calls. These were required before but now will never do anything but will waste runtime discovering that. To find raw_svector_ostream::flush() calls speficially I locally shadowed raw_ostream:flush() with a deleted flush(): class raw_svector_ostream { ... public: ... void flush() = delete; }; so any direct use of raw_svector_ostream::flush() fails compilation. Calling raw_ostream:flush() itself would never reach the deleted function as it's not virtual, it will continue to function as before. To explicitly prevent calling raw_svector_ostream::flush(), is that a good solution to commit? 2015-08-13 16:53 GMT+03:00 Rafael Espíndola <rafael.espindola at gmail.com>:> On 13 August 2015 at 04:17, Yaron Keren <yaron.keren at gmail.com> wrote: > > r244870 (with the change you requested), thanks! > > > > I initially placed the virtual function in header since they were > > one-liners. > > The coding standards say that an anchor() function should be supplied, > which > > was indeed missing. Other than required anchor function, why should the > > virtual functions go in the .cpp? > > Just because there is no advantage for most virtual methods to be in the > .h. > > > Should I move the empty raw_svector_ostream destructor to the .cpp file > too > > as well? > > Maybe. Destructors are the one exception since base class destructors > call them non-virtually. If the class is final we can move them. > > Cheers, > Rafael >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150813/18664a1f/attachment.html>
Apparently Analagous Threads
- SmallString + raw_svector_ostream combination should be more efficient
- SmallString + raw_svector_ostream combination should be more efficient
- [LLVMdev] SmallString + raw_svector_ostream combination should be more efficient
- [LLVMdev] SmallString + raw_svector_ostream combination should be more efficient
- [LLVMdev] SmallString + raw_svector_ostream combination should be more efficient