I was writing something using MemoryBuffer, and while looking through its code I came across line 59:

  assert(BufEnd[0] == 0 && "Buffer is not null terminated!");

Does MemoryBuffer support binary files with embedded nul bytes, or only text? Nothing else in the code seems to assume text input, so I am curious about the purpose of that assert.

Also, is the BufferEnd parameter supposed to point to the last byte of the input, or one past the end as in the STL? It looks like it must point at the last byte, since the assert could touch invalid memory if BufferEnd were one past the end; but pointing at the last byte would be inconsistent with the STL convention, so which is it? The comments in the header say the end pointer must be dereferenceable for getMemBuffer, but not for getMemBufferCopy and the other factory functions, and I do not see why getMemBuffer needs that requirement other than to satisfy this assert. Is there a design rationale for MemoryBuffer written down anywhere?

Although the public member function getBufferEnd has no comments, judging from how getBufferSize is implemented it returns one past the end. Out of curiosity, since const char* satisfies the random access iterator concept, would there be any aversion to adding STL-style begin()/end() members that return the same pointers? I have some template code I use (and do not control) that accepts any STL-style container, in the sense that it only calls begin/end on it, and I see no reason MemoryBuffer could not provide that interface simply by returning const char*. That would let it satisfy a few other concepts as well; it would not change functionality, only add capability. Would anyone be opposed to a patch?
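For concreteness, the kind of addition being proposed would be roughly the following. This is only a sketch of hypothetical wrappers (nothing here is in MemoryBuffer today); only getBufferStart() and getBufferEnd() are existing API:

  #include "llvm/Support/MemoryBuffer.h"
  #include <algorithm>
  #include <cstddef>

  // Hypothetical non-member helpers: const char* already models a random
  // access iterator, so generic begin/end-based code works on the raw pointers.
  namespace sketch {
    inline const char *begin(const llvm::MemoryBuffer &MB) { return MB.getBufferStart(); }
    inline const char *end(const llvm::MemoryBuffer &MB)   { return MB.getBufferEnd(); }
  }

  // Generic code that only needs begin()/end() can then run over the buffer:
  static std::size_t countZeroBytes(const llvm::MemoryBuffer &MB) {
    return std::count(sketch::begin(MB), sketch::end(MB), '\0');
  }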
On Sep 24, 2009, at 1:23 PM, OvermindDL1 wrote:

> I was writing something using MemoryBuffer, and while looking through
> its code I came across line 59:
>   assert(BufEnd[0] == 0 && "Buffer is not null terminated!");
> Does MemoryBuffer support binary files with embedded nul bytes, or only
> text? Nothing else in the code seems to assume text input, so I am
> curious about the purpose of that assert.

Your question is difficult to understand. MemoryBuffer doesn't care what is in the file. It works fine if you have embedded nuls in the file. It just requires the mapped image to have a nul at the end.

-Chris
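To make the invariant Chris describes concrete, here is a small plain-C++ illustration (no LLVM calls, since the factory signatures have changed over the years): embedded nuls inside the data are fine; the only requirement is that the byte at the end pointer itself is a nul.

  #include <cassert>
  #include <cstddef>

  int main() {
    // Six bytes of "binary" data containing embedded nuls, plus one extra
    // trailing nul that is not part of the data itself.
    static const char storage[] = { 'a', '\0', 'b', '\0', 'c', 'd', '\0' };

    const char *BufStart = storage;      // first byte of the data
    const char *BufEnd   = storage + 6;  // one past the last data byte

    assert(BufEnd[0] == 0 && "Buffer is not null terminated!"); // the line 59 check
    assert(static_cast<std::size_t>(BufEnd - BufStart) == 6);   // size excludes the terminator
    return 0;
  }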
On Thu, Sep 24, 2009 at 2:32 PM, Chris Lattner <clattner at apple.com> wrote:

> MemoryBuffer doesn't care what is in the file. It works fine if you have
> embedded nuls in the file. It just requires the mapped image to have a
> nul at the end.

I am mostly curious why it requires a nul at the end; that seems odd for an interface that takes two pointers in begin/end style. My concern is that a library I use returns iterators that would otherwise work with MemoryBuffer (they are const char* as well), but the end pointer is one past the end and may point into invalid memory (say, if the data ends exactly at the end of a virtual page), so dereferencing it could cause an access violation. To avoid that I would need to copy the data, and for a sizable chunk of memory that copy can stall the program for a second. Dropping the requirement would also make it possible to create a MemoryBuffer that references some *part* of an existing MemoryBuffer, where the dereferenced end pointer is not a nul. Mostly I just do not understand why this restriction exists, so I am looking for comprehension.
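The workaround being described amounts to something like the following sketch (plain C++; a hypothetical helper, not an LLVM API): to hand a (begin, one-past-end) pair from another library to code that insists on a dereferenceable nul at the end, the whole range has to be copied into storage that owns one extra byte.

  #include <cstring>
  #include <vector>

  // 'first'/'last' are the iterators the other library hands back; 'last' is
  // one past the end and may not be dereferenceable.
  std::vector<char> makeNulTerminatedCopy(const char *first, const char *last) {
    std::vector<char> copy(last - first + 1);    // room for the extra nul
    std::memcpy(&copy[0], first, last - first);  // O(n) copy of the data
    copy.back() = '\0';                          // the terminator the assert wants
    return copy;  // the data is the first size() - 1 bytes; the last byte is the nul
  }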
NUL-termination is an optimization, in particular for clang's parser, where it's cheaper to check for '\0' to signal EOF than it is to compare pointers. Its presence doesn't move the end iterator (it just makes it dereferenceable) and in no way impacts your ability to use iterator pairs to delimit the binary content of the file. This feature does require file contents be copied into local memory (rather than mapped in) if the file is exactly a multiple of the system page size.

On 2009-09-24, at 16:44, OvermindDL1 wrote:

> I am mostly curious why it requires a nul at the end; that seems odd for
> an interface that takes two pointers in begin/end style. [...] Mostly I
> just do not understand why this restriction exists, so I am looking for
> comprehension.
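A rough sketch of the page-size point (hypothetical logic, not the actual MemoryBuffer implementation): mapping a file gives zero-filled padding up to the next page boundary, so a mapped file already ends with a dereferenceable nul unless its size is an exact multiple of the page size, in which case the contents must instead be read into an allocated buffer with one extra terminating byte.

  #include <unistd.h>
  #include <cstddef>

  // Decide whether a file of a given size can simply be mapped in and still
  // satisfy the "nul at the end" requirement, or must be copied instead.
  bool canMapWithNulTerminator(std::size_t FileSize) {
    std::size_t PageSize = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    // If the file ends exactly on a page boundary, there is no zero-fill
    // padding byte after it to serve as the terminator, so fall back to a copy.
    return (FileSize % PageSize) != 0;
  }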
On Thu, Sep 24, 2009 at 4:26 PM, Gordon Henriksen <gordonhenriksen at me.com> wrote:

> NUL-termination is an optimization, in particular for clang's parser,
> where it's cheaper to check for '\0' to signal EOF than it is to compare
> pointers. Its presence doesn't move the end iterator (it just makes it
> dereferenceable) and in no way impacts your ability to use iterator pairs
> to delimit the binary content of the file. This feature does require file
> contents be copied into local memory (rather than mapped in) if the file
> is exactly a multiple of the system page size.

Out of curiosity, what code in Clang is optimized by doing a pointer dereference and then a compare against 0, rather than just comparing two pointers directly? Laid out like that it does not seem any more efficient, which is why I am curious what code is actually helped by that pattern.
On 2009-09-24, at 18:56, OvermindDL1 wrote:

> Out of curiosity, what code in Clang is optimized by doing a pointer
> dereference and then a compare against 0, rather than just comparing two
> pointers directly?

Consider parsing an integer:

  // With NUL termination.
  while ('0' <= *p && *p <= '9')
    n = n * 10 + (*p++ - '0');

  // Without.
  while (p != e && '0' <= *p && *p <= '9')
    n = n * 10 + (*p++ - '0');

— Gordon
On Sep 24, 2009, at 3:57 PM, OvermindDL1 wrote:

> Out of curiosity, what code in Clang is optimized by doing a pointer
> dereference and then a compare against 0, rather than just comparing two
> pointers directly? Laid out like that it does not seem any more efficient,
> which is why I am curious what code is actually helped by that pattern.

The code is out there, go read it :)

-Chris
On Thu, Sep 24, 2009 at 5:38 PM, Gordon Henriksen <gordonhenriksen at me.com> wrote:

> Consider parsing an integer:
>
>   // With NUL termination.
>   while ('0' <= *p && *p <= '9')
>     n = n * 10 + (*p++ - '0');
>
>   // Without.
>   while (p != e && '0' <= *p && *p <= '9')
>     n = n * 10 + (*p++ - '0');

That does not help with binary files, though (as in my case). Why not use a templated range interface? I could write one if you wish. You could specify the termination style as a policy of the template and let the range test for the end internally: comparing pointers when the buffer is not nul terminated, and comparing against nul when it is, as an optimization. With a policy-based design you could even let it terminate on any given terminator value, fully optimized for every situation. Policy-based code beats hand-rolled code, which beats generic code, in terms of speed across all situations; I write code like this all the time, it would be quite simple, and I am asking whether you would accept such code.

From a quick sketch, the interface would change from getBufferStart()/getBufferEnd()/getBufferSize() to something like getRange(), which returns a random_access_range_type<no_policy>. Its use would look roughly like this (monospace to line things up):

  typedef const char cchar;
  MemoryBuffer mb;
  MemoryBuffer::range_type<policy> rt(mb.getRange());

  Current MemoryBuffer                          Range-based MemoryBuffer
  -----------------------------------------------------------------------------
  cchar *c  = mb.getBufferStart();              cchar *c  = rt.begin();
  cchar *ce = mb.getBufferEnd();                cchar *ce = rt.end();
  size_t s  = mb.getBufferSize();               size_t s  = rt.size();
  cchar ch  = *c;                               cchar ch  = *rt;
  c++;                                          rt++;
  ++c;                                          ++rt;
  c--;                                          rt--;
  --c;                                          --rt;
  c[n];                                         rt[n];      // unchecked
  (c+n<ce && c+n>=mb.getBufferStart())?c[n]:0;  rt.at(n);   // checked
  c != ce;                                      rt.empty(); // binary-style end test
  *c == 0;                                      rt.empty(); // text-style nul end test

And so forth. We could also use bcp to bring in just the needed parts of the Boost.Range library and build on that (although Boost.Range uses free-standing functions for full generality rather than policies); a hand-grown range like the one above (similar to one I have made and use) has a simpler interface. The rt.empty() calls could also become just !rt via the safe bool idiom, so I could do that as well. There appear to be ~34 places in the LLVM sources that would need changes, and I could make them. It should compile to the same assembly, so there are no issues there. It is also very easy to create new policy types should the interface ever need to be extended, and such range types also let you handle non-contiguous memory transparently.
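A minimal sketch of the kind of policy-based range being proposed (hypothetical code; none of these names exist in LLVM, and CharRange, PointerEndPolicy, NulEndPolicy, and parseInt are placeholders): the policy decides how "end of data" is detected, so the same loop can be written once and still compile down to the nul test for text buffers or the pointer compare for binary ranges.

  #include <cstddef>

  struct PointerEndPolicy {  // binary data: compare against the end pointer
    static bool atEnd(const char *Cur, const char *End) { return Cur == End; }
  };

  struct NulEndPolicy {      // nul-terminated text: dereference and test for '\0'
    static bool atEnd(const char *Cur, const char *) { return *Cur == '\0'; }
  };

  template <typename EndPolicy>
  class CharRange {
    const char *Cur, *End;
  public:
    CharRange(const char *B, const char *E) : Cur(B), End(E) {}
    bool empty() const { return EndPolicy::atEnd(Cur, End); }
    char operator*() const { return *Cur; }
    CharRange &operator++() { ++Cur; return *this; }
    const char *begin() const { return Cur; }
    const char *end() const { return End; }
    std::size_t size() const { return End - Cur; }
  };

  // The integer-parsing loop from earlier, written once for both policies.
  template <typename EndPolicy>
  unsigned parseInt(CharRange<EndPolicy> R) {
    unsigned N = 0;
    while (!R.empty() && '0' <= *R && *R <= '9') {
      N = N * 10 + (*R - '0');
      ++R;
    }
    return N;
  }

Usage would be, for example, parseInt(CharRange<NulEndPolicy>(mb.getBufferStart(), mb.getBufferEnd())) for a nul-terminated MemoryBuffer, or CharRange<PointerEndPolicy> over an arbitrary binary iterator pair.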