In MemoryBuffer::init, we have an assert that reads the memory at `BufEnd`, which is one past the end of some memory region: from lib/Support/MemoryBuffer.cpp:45:> void MemoryBuffer::init(const char *BufStart, const char *BufEnd, > bool RequiresNullTerminator) { > assert((!RequiresNullTerminator || BufEnd[0] == 0) && > "Buffer is not null terminated!"); > BufferStart = BufStart; > BufferEnd = BufEnd; > }However, this can be, and often is, one past an allocated region, since in MemoryBufferMMapFile we mmap `Len` bytes and then call this with `BufEnd = Start + Len`: from lib/Support/MemoryBuffer.cpp:210:> MemoryBufferMMapFile(bool RequiresNullTerminator, int FD, uint64_t Len, > uint64_t Offset, std::error_code &EC) > : MFR(FD, sys::fs::mapped_file_region::readonly, > getLegalMapSize(Len, Offset), getLegalMapOffset(Offset), EC) { > if (!EC) { > const char *Start = getStart(Len, Offset); > init(Start, Start + Len, RequiresNullTerminator); > } > }This is UB, since we're reading one past an allocated region, and I have an internal test case where I'm seeing it crash because that memory happens to be unreadable. There are three options here: 1. Remove the assert - it isn't possible to write correctly. 2. Remove the RequiresNullTerminator argument from MemoryBufferMMapFile and hard code it to false. 3. Allocate an extra byte here and zero it. I'm not sure which is best. Thoughts?
> On Nov 16, 2016, at 9:46 PM, Justin Bogner via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > In MemoryBuffer::init, we have an assert that reads the memory at > `BufEnd`, which is one past the end of some memory region: > > from lib/Support/MemoryBuffer.cpp:45: >> void MemoryBuffer::init(const char *BufStart, const char *BufEnd, >> bool RequiresNullTerminator) { >> assert((!RequiresNullTerminator || BufEnd[0] == 0) &&Why isn’t it BufEnd[-1] == 0 ? What is the use-case for this RequiresNullTerminator after the end of the buffer? I feel I’m missing some piece here! — Mehdi>> "Buffer is not null terminated!"); >> BufferStart = BufStart; >> BufferEnd = BufEnd; >> } > > However, this can be, and often is, one past an allocated region, since > in MemoryBufferMMapFile we mmap `Len` bytes and then call this with > `BufEnd = Start + Len`: > > from lib/Support/MemoryBuffer.cpp:210: >> MemoryBufferMMapFile(bool RequiresNullTerminator, int FD, uint64_t Len, >> uint64_t Offset, std::error_code &EC) >> : MFR(FD, sys::fs::mapped_file_region::readonly, >> getLegalMapSize(Len, Offset), getLegalMapOffset(Offset), EC) { >> if (!EC) { >> const char *Start = getStart(Len, Offset); >> init(Start, Start + Len, RequiresNullTerminator); >> } >> } > > This is UB, since we're reading one past an allocated region, and I have > an internal test case where I'm seeing it crash because that memory > happens to be unreadable. > > There are three options here: > > 1. Remove the assert - it isn't possible to write correctly. > 2. Remove the RequiresNullTerminator argument from MemoryBufferMMapFile > and hard code it to false. > 3. Allocate an extra byte here and zero it. > > I'm not sure which is best. Thoughts? > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
On Nov 16, 2016, at 9:46 PM, Justin Bogner via llvm-dev <llvm-dev at lists.llvm.org> wrote:> In MemoryBuffer::init, we have an assert that reads the memory at > `BufEnd`, which is one past the end of some memory region: > > from lib/Support/MemoryBuffer.cpp:45: >> void MemoryBuffer::init(const char *BufStart, const char *BufEnd, >> bool RequiresNullTerminator) { >> assert((!RequiresNullTerminator || BufEnd[0] == 0) && >> "Buffer is not null terminated!"); >> BufferStart = BufStart; >> BufferEnd = BufEnd; >> }Commenting from a historical perspective, this is as intended. In the case of an mmapped buffer, you usually mmap some file that is not a multiple of a page size, and the OS zero fills the rest of the page. This makes this behavior free. In the case where you’re not mmaping a file (e.g. for clang, if the file is small) it is easy enough to malloc one byte more than you need to ensure the zero byte is present.> However, this can be, and often is, one past an allocated region, since > in MemoryBufferMMapFile we mmap `Len` bytes and then call this with > `BufEnd = Start + Len`: > > from lib/Support/MemoryBuffer.cpp:210: >> MemoryBufferMMapFile(bool RequiresNullTerminator, int FD, uint64_t Len, >> uint64_t Offset, std::error_code &EC) >> : MFR(FD, sys::fs::mapped_file_region::readonly, >> getLegalMapSize(Len, Offset), getLegalMapOffset(Offset), EC) { >> if (!EC) { >> const char *Start = getStart(Len, Offset); >> init(Start, Start + Len, RequiresNullTerminator); >> } >> } > > This is UB, since we're reading one past an allocated region, and I have > an internal test case where I'm seeing it crash because that memory > happens to be unreadable.Well that is wrong. It should mmap the file, and if it knows that the file is an exact multiple of a page size, it should copy it into a malloc’d buffer that is one byte larger, and put a zero there.> There are three options here: > > 1. Remove the assert - it isn't possible to write correctly.This will break clang. Clang “knows” that files end with a nul byte.> 2. Remove the RequiresNullTerminator argument from MemoryBufferMMapFile > and hard code it to false. > 3. Allocate an extra byte here and zero it.#3 seems right to me assuming the problematic client is clang. If the problematic client isn’t clang, then it should mirror whatever clang is doing with this API. If Clang isn’t using this (which seems to be the case on a cursory look?), then we should change this implementation to respect memorybuffer’s invariants IMO. -Chris
Chris Lattner <clattner at apple.com> writes:> On Nov 16, 2016, at 9:46 PM, Justin Bogner via llvm-dev > <llvm-dev at lists.llvm.org> wrote: >> In MemoryBuffer::init, we have an assert that reads the memory at >> `BufEnd`, which is one past the end of some memory region: >> >> from lib/Support/MemoryBuffer.cpp:45: >>> void MemoryBuffer::init(const char *BufStart, const char *BufEnd, >>> bool RequiresNullTerminator) { >>> assert((!RequiresNullTerminator || BufEnd[0] == 0) && >>> "Buffer is not null terminated!"); >>> BufferStart = BufStart; >>> BufferEnd = BufEnd; >>> } > > Commenting from a historical perspective, this is as intended. In the > case of an mmapped buffer, you usually mmap some file that is not a > multiple of a page size, and the OS zero fills the rest of the page. > This makes this behavior free. In the case where you’re not mmaping a > file (e.g. for clang, if the file is small) it is easy enough to > malloc one byte more than you need to ensure the zero byte is present. > >> However, this can be, and often is, one past an allocated region, since >> in MemoryBufferMMapFile we mmap `Len` bytes and then call this with >> `BufEnd = Start + Len`: >> >> from lib/Support/MemoryBuffer.cpp:210: >>> MemoryBufferMMapFile(bool RequiresNullTerminator, int FD, uint64_t Len, >>> uint64_t Offset, std::error_code &EC) >>> : MFR(FD, sys::fs::mapped_file_region::readonly, >>> getLegalMapSize(Len, Offset), getLegalMapOffset(Offset), EC) { >>> if (!EC) { >>> const char *Start = getStart(Len, Offset); >>> init(Start, Start + Len, RequiresNullTerminator); >>> } >>> } >> >> This is UB, since we're reading one past an allocated region, and I have >> an internal test case where I'm seeing it crash because that memory >> happens to be unreadable. > > Well that is wrong. It should mmap the file, and if it knows that the > file is an exact multiple of a page size, it should copy it into a > malloc’d buffer that is one byte larger, and put a zero there.This doesn't seem to be working correctly, unless sys::Process::getPageSize() is returning the wrong thing for me. I'm getting here from getOpenFileImpl() in MemoryBuffer.cpp, which is deciding it's safe to mmap (by calling shouldUseMMap, where the page size check you mention is). The file is 561152 bytes, and the page size reports 16384, so this is not a multiple of page size (0x89000 & 0x3fff == 0x1000). Yet, the memory one byte past the mmap is not readable.>> There are three options here: >> >> 1. Remove the assert - it isn't possible to write correctly. > > This will break clang. Clang “knows” that files end with a nul byte.Strictly speaking, it won't "break" clang, it will just make a certain class of bugs harder to track down.>> 2. Remove the RequiresNullTerminator argument from MemoryBufferMMapFile >> and hard code it to false. >> 3. Allocate an extra byte here and zero it. > > #3 seems right to me assuming the problematic client is clang. If the > problematic client isn’t clang, then it should mirror whatever clang > is doing with this API. If Clang isn’t using this (which seems to be > the case on a cursory look?), then we should change this > implementation to respect memorybuffer’s invariants IMO.This isn't clang, but the problem is happening entirely within LLVM - I'm getting here from a call to MemoryBuffer::getFileOrSTDIN(SomeFile). I can work around the problem by updating my client to pass RequiresNullTerminator=false, but that doesn't actually fix the problem in LLVM itself.