On Thu, Jun 09, 2022 at 04:24:12PM +0100, Richard W.M. Jones wrote:> On Thu, Jun 09, 2022 at 03:20:02PM +0100, Daniel P. Berrang? wrote: > > > + go test -count=1 -v > > > === RUN Test010Load > > > --- PASS: Test010Load (0.00s) > > > === RUN TestAioBuffer > > > --- PASS: TestAioBuffer (0.00s) > > > === RUN TestAioBufferFree > > > --- PASS: TestAioBufferFree (0.00s) > > > === RUN TestAioBufferBytesAfterFree > > > SIGABRT: abort > > > PC=0x3fdf6f9bac m=0 sigcode=18446744073709551610 > > > > So suggesting TestAioBufferBytesAfterFree is as fault, but quite > > odd as that test case is trivial and whle it allocates some > > native memory it doesn't seem to write to it. Unless the problem > > happened in an earlier test case and we have delayed detection ? > > > > I guess I'd try throwing darts at the wall by chopping out bits > > of test code to see what makes it disappear. > > > > Perhaps also try swapping MakeAioBuffer with MakeAioBufferZero > > in case pre-existing data into the C.malloc()d block is confusing > > Go ? > > Interestingly if I remove libnbd_020_aio_buffer_test.go completely, > and disable GODEBUG, then the tests pass. (Reproducer commands at end > of email). So I guess at least one of the problems is confined to > this test and/or functions it calls in the main library. > Unfortunately this test is huge. > > At your suggestion, replacing every MakeAioBuffer with > MakeAioBufferZero in that test, but it didn't help. Also tried > replacing malloc -> calloc in the aio_buffer.go implementation which > didn't help. > > I'll try some more random things ...Adding a few printf's shows something interesting: === RUN TestAioBufferBytesAfterFree calling Free on 0x3fbc1882b0 calling C.GoBytes on 0x3fbc1882b0 SIGABRT: abort PC=0x3fe6aaebac m=0 sigcode=18446744073709551610 goroutine 21 [running]: gsignal :0 abort :0 runtime.throwException ../../../libgo/runtime/go-unwind.c:128 runtime.unwindStack ../../../libgo/go/runtime/panic.go:535 panic ../../../libgo/go/runtime/panic.go:750 runtime.panicmem ../../../libgo/go/runtime/panic.go:210 runtime.sigpanic ../../../libgo/go/runtime/signal_unix.go:634 _wordcopy_fwd_aligned :0 __GI_memmove :0 runtime.stringtoslicebyte ../../../libgo/go/runtime/string.go:155 __go_string_to_byte_array ../../../libgo/go/runtime/string.go:509 _cgo_23192bdcbd72_Cfunc_GoBytes ./cgo-c-prolog-gccgo:46 This is a simple use after free because the Free function in aio_buffer.go frees the array and then the Bytes function attempts to copy b.Size bytes from the NULL pointer. I didn't write this test so I'm not quite sure what it's trying to achieve. It seems to be deliberately trying to cause a panic, but causes a segfault instead? (And why only on RISC-V?) func TestAioBufferBytesAfterFree(t *testing.T) { buf := MakeAioBuffer(uint(32)) buf.Free() defer func() { if r := recover(); r == nil { t.Fatal("Did not recover from panic calling Bytes() after Free()") } }() buf.Bytes() } Since this only happens on RISC-V I guess it might be something to do with the golang implementation on this architecture being unable to turn segfaults into panics. Removing all three *AfterFree tests fixes the tests. It seems a bit of an odd function however. Wouldn't it be better to changes the Bytes function so that it tests if the pointer is NULL and panics? NB: this _does not_ address the other problem where GODEBUG=cgocheck=2 complains about "fatal error: Go pointer stored into non-Go memory". Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://libguestfs.org
On Thu, Jun 09, 2022 at 04:48:34PM +0100, Richard W.M. Jones wrote:> On Thu, Jun 09, 2022 at 04:24:12PM +0100, Richard W.M. Jones wrote: > > On Thu, Jun 09, 2022 at 03:20:02PM +0100, Daniel P. Berrang? wrote: > > > > + go test -count=1 -v > > > > === RUN Test010Load > > > > --- PASS: Test010Load (0.00s) > > > > === RUN TestAioBuffer > > > > --- PASS: TestAioBuffer (0.00s) > > > > === RUN TestAioBufferFree > > > > --- PASS: TestAioBufferFree (0.00s) > > > > === RUN TestAioBufferBytesAfterFree > > > > SIGABRT: abort > > > > PC=0x3fdf6f9bac m=0 sigcode=18446744073709551610 > > > > > > So suggesting TestAioBufferBytesAfterFree is as fault, but quite > > > odd as that test case is trivial and whle it allocates some > > > native memory it doesn't seem to write to it. Unless the problem > > > happened in an earlier test case and we have delayed detection ? > > > > > > I guess I'd try throwing darts at the wall by chopping out bits > > > of test code to see what makes it disappear. > > > > > > Perhaps also try swapping MakeAioBuffer with MakeAioBufferZero > > > in case pre-existing data into the C.malloc()d block is confusing > > > Go ? > > > > Interestingly if I remove libnbd_020_aio_buffer_test.go completely, > > and disable GODEBUG, then the tests pass. (Reproducer commands at end > > of email). So I guess at least one of the problems is confined to > > this test and/or functions it calls in the main library. > > Unfortunately this test is huge. > > > > At your suggestion, replacing every MakeAioBuffer with > > MakeAioBufferZero in that test, but it didn't help. Also tried > > replacing malloc -> calloc in the aio_buffer.go implementation which > > didn't help. > > > > I'll try some more random things ... > > Adding a few printf's shows something interesting: > > === RUN TestAioBufferBytesAfterFree > calling Free on 0x3fbc1882b0 > calling C.GoBytes on 0x3fbc1882b0 > SIGABRT: abort > PC=0x3fe6aaebac m=0 sigcode=18446744073709551610 > > goroutine 21 [running]: > gsignal > :0 > abort > :0 > runtime.throwException > ../../../libgo/runtime/go-unwind.c:128 > runtime.unwindStack > ../../../libgo/go/runtime/panic.go:535 > panic > ../../../libgo/go/runtime/panic.go:750 > runtime.panicmem > ../../../libgo/go/runtime/panic.go:210 > runtime.sigpanic > ../../../libgo/go/runtime/signal_unix.go:634 > _wordcopy_fwd_aligned > :0 > __GI_memmove > :0 > runtime.stringtoslicebyte > ../../../libgo/go/runtime/string.go:155 > __go_string_to_byte_array > ../../../libgo/go/runtime/string.go:509 > _cgo_23192bdcbd72_Cfunc_GoBytes > ./cgo-c-prolog-gccgo:46 > > This is a simple use after free because the Free function in > aio_buffer.go frees the array and then the Bytes function attempts to > copy b.Size bytes from the NULL pointer.Well it isn't use-after-free, because we've cleared the pointer we freed. Rather we're simply trying to reference the NULL pointer,> I didn't write this test so I'm not quite sure what it's trying to > achieve. It seems to be deliberately trying to cause a panic, but > causes a segfault instead? (And why only on RISC-V?)IIUC, the kernel will map a page without read/write perms at address 0x0 in userspace. Thus a NULL pointer reference causes a SEGV to userspace. Golang tries to catch this SEGV and turn it into a panic I assume. Assuming the kernel isn't doing something wierd on Risc-V with the userspace mapping at 0x0, then points to the Golang SEGV handler on RISCV.> > func TestAioBufferBytesAfterFree(t *testing.T) { > buf := MakeAioBuffer(uint(32)) > buf.Free() > > defer func() { > if r := recover(); r == nil { > t.Fatal("Did not recover from panic calling Bytes() after Free()") > } > }() > > buf.Bytes() > }> > Since this only happens on RISC-V I guess it might be something to do > with the golang implementation on this architecture being unable to > turn segfaults into panics. > > Removing all three *AfterFree tests fixes the tests. > > It seems a bit of an odd function however. Wouldn't it be better to > changes the Bytes function so that it tests if the pointer is NULL and > panics?In theory I guess both should be equivalent in terms of semantics for the caller. Also I feel like 'Free' ought to set 'b.Size = 0' after it set 'b.P = nul'. That should solve the problem for the Bytes & Slice method tests at least, but probably not the Get method test.> NB: this _does not_ address the other problem where GODEBUG=cgocheck=2 > complains about "fatal error: Go pointer stored into non-Go memory".Maybe that message across comes from the Go signal handler that's trying to cope with the SEGV from the NULL reference, causing it to trip over itself & thus not turn the problem into a pnaic. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On Thu, Jun 9, 2022 at 6:48 PM Richard W.M. Jones <rjones at redhat.com> wrote:> > On Thu, Jun 09, 2022 at 04:24:12PM +0100, Richard W.M. Jones wrote: > > On Thu, Jun 09, 2022 at 03:20:02PM +0100, Daniel P. Berrang? wrote: > > > > + go test -count=1 -v > > > > === RUN Test010Load > > > > --- PASS: Test010Load (0.00s) > > > > === RUN TestAioBuffer > > > > --- PASS: TestAioBuffer (0.00s) > > > > === RUN TestAioBufferFree > > > > --- PASS: TestAioBufferFree (0.00s) > > > > === RUN TestAioBufferBytesAfterFree > > > > SIGABRT: abort > > > > PC=0x3fdf6f9bac m=0 sigcode=18446744073709551610 > > > > > > So suggesting TestAioBufferBytesAfterFree is as fault, but quite > > > odd as that test case is trivial and whle it allocates some > > > native memory it doesn't seem to write to it. Unless the problem > > > happened in an earlier test case and we have delayed detection ? > > > > > > I guess I'd try throwing darts at the wall by chopping out bits > > > of test code to see what makes it disappear. > > > > > > Perhaps also try swapping MakeAioBuffer with MakeAioBufferZero > > > in case pre-existing data into the C.malloc()d block is confusing > > > Go ? > > > > Interestingly if I remove libnbd_020_aio_buffer_test.go completely, > > and disable GODEBUG, then the tests pass. (Reproducer commands at end > > of email). So I guess at least one of the problems is confined to > > this test and/or functions it calls in the main library. > > Unfortunately this test is huge. > > > > At your suggestion, replacing every MakeAioBuffer with > > MakeAioBufferZero in that test, but it didn't help. Also tried > > replacing malloc -> calloc in the aio_buffer.go implementation which > > didn't help. > > > > I'll try some more random things ... > > Adding a few printf's shows something interesting: > > === RUN TestAioBufferBytesAfterFree > calling Free on 0x3fbc1882b0 > calling C.GoBytes on 0x3fbc1882b0 > SIGABRT: abort > PC=0x3fe6aaebac m=0 sigcode=18446744073709551610 > > goroutine 21 [running]: > gsignal > :0 > abort > :0 > runtime.throwException > ../../../libgo/runtime/go-unwind.c:128 > runtime.unwindStack > ../../../libgo/go/runtime/panic.go:535 > panic > ../../../libgo/go/runtime/panic.go:750 > runtime.panicmem > ../../../libgo/go/runtime/panic.go:210 > runtime.sigpanic > ../../../libgo/go/runtime/signal_unix.go:634 > _wordcopy_fwd_aligned > :0 > __GI_memmove > :0 > runtime.stringtoslicebyte > ../../../libgo/go/runtime/string.go:155 > __go_string_to_byte_array > ../../../libgo/go/runtime/string.go:509 > _cgo_23192bdcbd72_Cfunc_GoBytes > ./cgo-c-prolog-gccgo:46 > > This is a simple use after free because the Free function in > aio_buffer.go frees the array and then the Bytes function attempts to > copy b.Size bytes from the NULL pointer. > > I didn't write this test so I'm not quite sure what it's trying to > achieve.The test verifies that using the buffer in the wrong way fails in a clean way (panic) and not silent double free like it was before https://gitlab.com/nbdkit/libnbd/-/commit/3394f47556cac009fa7d39c9e2f7e5f2468bd65d> It seems to be deliberately trying to cause a panic, but > causes a segfault instead? (And why only on RISC-V?) > > func TestAioBufferBytesAfterFree(t *testing.T) { > buf := MakeAioBuffer(uint(32)) > buf.Free() > > defer func() { > if r := recover(); r == nil { > t.Fatal("Did not recover from panic calling Bytes() after Free()") > } > }() > > buf.Bytes() > } > > Since this only happens on RISC-V I guess it might be something to do > with the golang implementation on this architecture being unable to > turn segfaults into panics. > > Removing all three *AfterFree tests fixes the tests.But this hides the real issue - if users use Bytes() in the wrong way, we want the panic, not the segfault - the tests are good!> It seems a bit of an odd function however. Wouldn't it be better to > changes the Bytes function so that it tests if the pointer is NULL and > panics?I cannot find now any docs for GoBytes, maybe I tested that it panics in this case, but this does not work this arch (bug?). Panic with a good error message about the wrong usage will be much better.> > NB: this _does not_ address the other problem where GODEBUG=cgocheck=2 > complains about "fatal error: Go pointer stored into non-Go memory".Do we keep go pointers in buffers allocated in C? Nir