I was writing something using MemoryBuffer, and while looking through its code I came across line 59:

  assert(BufEnd[0] == 0 && "Buffer is not null terminated!");

Does MemoryBuffer support binary files with embedded nul bytes, or only text? Nothing else in the code seems to assume text input, so I am curious about the purpose of that assert.

Also, is the BufferEnd parameter supposed to point to the last byte of the input, or one past the end as in the STL? It looks like it must point at the last byte, since the assert could touch invalid memory if BufferEnd were one past the end; but pointing at the last byte would be inconsistent with the STL convention, so which is it? The comments in the header say the end pointer must be dereferenceable for getMemBuffer, but not for getMemBufferCopy and the other factory functions, and I do not see why getMemBuffer needs that requirement other than to satisfy this assert. Is there a design rationale for MemoryBuffer written down anywhere?

Although the public member function getBufferEnd has no comments, judging from how getBufferSize is implemented it returns one past the end. Out of curiosity, since const char* satisfies the random access iterator concept, would there be any aversion to adding STL-style begin()/end() members that return the same pointers? I have some template code I use (and do not control) that accepts any STL-style container, in the sense that it only calls begin/end on it, and I see no reason MemoryBuffer could not provide that interface simply by returning const char*. That would let it satisfy a few other concepts as well; it would not change functionality, only add capability. Would anyone be opposed to a patch?
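For concreteness, the kind of addition being proposed would be roughly the following. This is only a sketch of hypothetical wrappers (nothing here is in MemoryBuffer today); only getBufferStart() and getBufferEnd() are existing API:

  #include "llvm/Support/MemoryBuffer.h"
  #include <algorithm>
  #include <cstddef>

  // Hypothetical non-member helpers: const char* already models a random
  // access iterator, so generic begin/end-based code works on the raw pointers.
  namespace sketch {
    inline const char *begin(const llvm::MemoryBuffer &MB) { return MB.getBufferStart(); }
    inline const char *end(const llvm::MemoryBuffer &MB)   { return MB.getBufferEnd(); }
  }

  // Generic code that only needs begin()/end() can then run over the buffer:
  static std::size_t countZeroBytes(const llvm::MemoryBuffer &MB) {
    return std::count(sketch::begin(MB), sketch::end(MB), '\0');
  }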
On Sep 24, 2009, at 1:23 PM, OvermindDL1 wrote:

> I was writing something using MemoryBuffer, and while looking through
> its code I came across line 59:
>   assert(BufEnd[0] == 0 && "Buffer is not null terminated!");
> Does MemoryBuffer support binary files with embedded nul bytes, or only
> text? Nothing else in the code seems to assume text input, so I am
> curious about the purpose of that assert.

Your question is difficult to understand. MemoryBuffer doesn't care what is in the file. It works fine if you have embedded nuls in the file. It just requires the mapped image to have a nul at the end.

-Chris
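To make the invariant Chris describes concrete, here is a small plain-C++ illustration (no LLVM calls, since the factory signatures have changed over the years): embedded nuls inside the data are fine; the only requirement is that the byte at the end pointer itself is a nul.

  #include <cassert>
  #include <cstddef>

  int main() {
    // Six bytes of "binary" data containing embedded nuls, plus one extra
    // trailing nul that is not part of the data itself.
    static const char storage[] = { 'a', '\0', 'b', '\0', 'c', 'd', '\0' };

    const char *BufStart = storage;      // first byte of the data
    const char *BufEnd   = storage + 6;  // one past the last data byte

    assert(BufEnd[0] == 0 && "Buffer is not null terminated!"); // the line 59 check
    assert(static_cast<std::size_t>(BufEnd - BufStart) == 6);   // size excludes the terminator
    return 0;
  }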
On Thu, Sep 24, 2009 at 2:32 PM, Chris Lattner <clattner at apple.com> wrote:

> MemoryBuffer doesn't care what is in the file. It works fine if you have
> embedded nuls in the file. It just requires the mapped image to have a
> nul at the end.

I am mostly curious why it requires a nul at the end; that seems odd for an interface that takes two pointers in begin/end style. My concern is that a library I use returns iterators that would otherwise work with MemoryBuffer (they are const char* as well), but the end pointer is one past the end and may point into invalid memory (say, if the data ends exactly at the end of a virtual page), so dereferencing it could cause an access violation. To avoid that I would need to copy the data, and for a sizable chunk of memory that copy can stall the program for a second. Dropping the requirement would also make it possible to create a MemoryBuffer that references some *part* of an existing MemoryBuffer, where the dereferenced end pointer is not a nul. Mostly I just do not understand why this restriction exists, so I am looking for comprehension.
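The workaround being described amounts to something like the following sketch (plain C++; a hypothetical helper, not an LLVM API): to hand a (begin, one-past-end) pair from another library to code that insists on a dereferenceable nul at the end, the whole range has to be copied into storage that owns one extra byte.

  #include <cstring>
  #include <vector>

  // 'first'/'last' are the iterators the other library hands back; 'last' is
  // one past the end and may not be dereferenceable.
  std::vector<char> makeNulTerminatedCopy(const char *first, const char *last) {
    std::vector<char> copy(last - first + 1);    // room for the extra nul
    std::memcpy(&copy[0], first, last - first);  // O(n) copy of the data
    copy.back() = '\0';                          // the terminator the assert wants
    return copy;  // the data is the first size() - 1 bytes; the last byte is the nul
  }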
NUL-termination is an optimization, in particular for clang's parser, where it's cheaper to check for '\0' to signal EOF than it is to compare pointers. Its presence doesn't move the end iterator (it just makes it dereferenceable) and in no way impacts your ability to use iterator pairs to delimit the binary content of the file. This feature does require file contents be copied into local memory (rather than mapped in) if the file is exactly a multiple of the system page size.

On 2009-09-24, at 16:44, OvermindDL1 wrote:

> I am mostly curious why it requires a nul at the end; that seems odd for
> an interface that takes two pointers in begin/end style. [...] Mostly I
> just do not understand why this restriction exists, so I am looking for
> comprehension.
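A rough sketch of the page-size point (hypothetical logic, not the actual MemoryBuffer implementation): mapping a file gives zero-filled padding up to the next page boundary, so a mapped file already ends with a dereferenceable nul unless its size is an exact multiple of the page size, in which case the contents must instead be read into an allocated buffer with one extra terminating byte.

  #include <unistd.h>
  #include <cstddef>

  // Decide whether a file of a given size can simply be mapped in and still
  // satisfy the "nul at the end" requirement, or must be copied instead.
  bool canMapWithNulTerminator(std::size_t FileSize) {
    std::size_t PageSize = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    // If the file ends exactly on a page boundary, there is no zero-fill
    // padding byte after it to serve as the terminator, so fall back to a copy.
    return (FileSize % PageSize) != 0;
  }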
On Thu, Sep 24, 2009 at 4:26 PM, Gordon Henriksen <gordonhenriksen at me.com> wrote:

> NUL-termination is an optimization, in particular for clang's parser,
> where it's cheaper to check for '\0' to signal EOF than it is to compare
> pointers. Its presence doesn't move the end iterator (it just makes it
> dereferenceable) and in no way impacts your ability to use iterator pairs
> to delimit the binary content of the file. This feature does require file
> contents be copied into local memory (rather than mapped in) if the file
> is exactly a multiple of the system page size.

Out of curiosity, what code in Clang is optimized by doing a pointer dereference and then a compare against 0, rather than just comparing two pointers directly? Laid out like that it does not seem any more efficient, which is why I am curious what code is actually helped by that pattern.
On 2009-09-24, at 18:56, OvermindDL1 wrote:

> Out of curiosity, what code in Clang is optimized by doing a pointer
> dereference and then a compare against 0, rather than just comparing two
> pointers directly?

Consider parsing an integer:

  // With NUL termination.
  while ('0' <= *p && *p <= '9')
    n = n * 10 + (*p++ - '0');

  // Without.
  while (p != e && '0' <= *p && *p <= '9')
    n = n * 10 + (*p++ - '0');

— Gordon
On Sep 24, 2009, at 3:57 PM, OvermindDL1 wrote:

> Out of curiosity, what code in Clang is optimized by doing a pointer
> dereference and then a compare against 0, rather than just comparing two
> pointers directly? Laid out like that it does not seem any more efficient,
> which is why I am curious what code is actually helped by that pattern.

The code is out there, go read it :)

-Chris
On Thu, Sep 24, 2009 at 5:38 PM, Gordon Henriksen <gordonhenriksen at me.com> wrote:

> Consider parsing an integer:
>
>   // With NUL termination.
>   while ('0' <= *p && *p <= '9')
>     n = n * 10 + (*p++ - '0');
>
>   // Without.
>   while (p != e && '0' <= *p && *p <= '9')
>     n = n * 10 + (*p++ - '0');

That does not help with binary files, though (as in my case). Why not use a templated range interface? I could write one if you wish. You could specify the termination style as a policy of the template and let the range test for the end internally: comparing pointers when the buffer is not nul terminated, and comparing against nul when it is, as an optimization. With a policy-based design you could even let it terminate on any given terminator value, fully optimized for every situation. Policy-based code beats hand-rolled code, which beats generic code, in terms of speed across all situations; I write code like this all the time, it would be quite simple, and I am asking whether you would accept such code.

From a quick sketch, the interface would change from getBufferStart()/getBufferEnd()/getBufferSize() to something like getRange(), which returns a random_access_range_type<no_policy>. Its use would look roughly like this (monospace to line things up):

  typedef const char cchar;
  MemoryBuffer mb;
  MemoryBuffer::range_type<policy> rt(mb.getRange());

  Current MemoryBuffer                          Range-based MemoryBuffer
  -----------------------------------------------------------------------------
  cchar *c  = mb.getBufferStart();              cchar *c  = rt.begin();
  cchar *ce = mb.getBufferEnd();                cchar *ce = rt.end();
  size_t s  = mb.getBufferSize();               size_t s  = rt.size();
  cchar ch  = *c;                               cchar ch  = *rt;
  c++;                                          rt++;
  ++c;                                          ++rt;
  c--;                                          rt--;
  --c;                                          --rt;
  c[n];                                         rt[n];      // unchecked
  (c+n<ce && c+n>=mb.getBufferStart())?c[n]:0;  rt.at(n);   // checked
  c != ce;                                      rt.empty(); // binary-style end test
  *c == 0;                                      rt.empty(); // text-style nul end test

And so forth. We could also use bcp to bring in just the needed parts of the Boost.Range library and build on that (although Boost.Range uses free-standing functions for full generality rather than policies); a hand-grown range like the one above (similar to one I have made and use) has a simpler interface. The rt.empty() calls could also become just !rt via the safe bool idiom, so I could do that as well. There appear to be ~34 places in the LLVM sources that would need changes, and I could make them. It should compile to the same assembly, so there are no issues there. It is also very easy to create new policy types should the interface ever need to be extended, and such range types also let you handle non-contiguous memory transparently.
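A minimal sketch of the kind of policy-based range being proposed (hypothetical code; none of these names exist in LLVM, and CharRange, PointerEndPolicy, NulEndPolicy, and parseInt are placeholders): the policy decides how "end of data" is detected, so the same loop can be written once and still compile down to the nul test for text buffers or the pointer compare for binary ranges.

  #include <cstddef>

  struct PointerEndPolicy {  // binary data: compare against the end pointer
    static bool atEnd(const char *Cur, const char *End) { return Cur == End; }
  };

  struct NulEndPolicy {      // nul-terminated text: dereference and test for '\0'
    static bool atEnd(const char *Cur, const char *) { return *Cur == '\0'; }
  };

  template <typename EndPolicy>
  class CharRange {
    const char *Cur, *End;
  public:
    CharRange(const char *B, const char *E) : Cur(B), End(E) {}
    bool empty() const { return EndPolicy::atEnd(Cur, End); }
    char operator*() const { return *Cur; }
    CharRange &operator++() { ++Cur; return *this; }
    const char *begin() const { return Cur; }
    const char *end() const { return End; }
    std::size_t size() const { return End - Cur; }
  };

  // The integer-parsing loop from earlier, written once for both policies.
  template <typename EndPolicy>
  unsigned parseInt(CharRange<EndPolicy> R) {
    unsigned N = 0;
    while (!R.empty() && '0' <= *R && *R <= '9') {
      N = N * 10 + (*R - '0');
      ++R;
    }
    return N;
  }

Usage would be, for example, parseInt(CharRange<NulEndPolicy>(mb.getBufferStart(), mb.getBufferEnd())) for a nul-terminated MemoryBuffer, or CharRange<PointerEndPolicy> over an arbitrary binary iterator pair.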