On Thursday, September 15, 2016 05:41:03 PM Slawa Olhovchenkov wrote:
> On Wed, Sep 07, 2016 at 10:13:48PM +0300, Slawa Olhovchenkov wrote:
> 
> > I have a strange issue with nginx on FreeBSD 11.
> > I have FreeBSD 11 installed over STABLE-10.
> > nginx built for FreeBSD 10 and run without recompiling works fine.
> > nginx built for FreeBSD 11 crashes inside rbtree lookups: the next
> > node is totally corrupted.
> > 
> > I see the following potential causes:
> > 
> > 1) clang 3.8 code generation issue
> > 2) system library issue
> > 
> > Maybe I am missing something?
> > 
> > How can I find the real cause?
> 
> I have found the real cause, and it looks like a show-stopper for the
> RELEASE. I use nginx with AIO, and AIO from one nginx process corrupts
> memory of another nginx process. Yes, this is cross-process memory
> corruption.
> 
> In the latest case, the process with pid 1060 dumped core at 15:45:14.
> The corrupted memory is at 0x860697000.
> I know of good memory at 0x86067f800.
> Dumping (from the core) this region to a file and analyzing it with
> hexdump, I found the start of the corrupted region -- offset 0000c8c0
> from 0x86067f800:
> 0x86067f800 + 0xc8c0 = 0x86068c0c0
> 
> I had previously enabled debug logging of every started AIO operation
> to the nginx error log (memory address, file name, offset and size of
> the transfer).
> 
> grep -i 86068c0c0 error.log near 15:45:14 gives the target file.
> grep ce949665cbcd.hls error.log near 15:45:14 gives the following:
> 
> 2016/09/15 15:45:13 [notice] 1055#0: *11659936 AIO_RD 000000082065DB60 start 000000086068C0C0 561b0 2646736 ce949665cbcd.hls
> 2016/09/15 15:45:14 [notice] 1060#0: *10998125 AIO_RD 000000081F1FFB60 start 000000086FF2C0C0 6cdf0 140016832 ce949665cbcd.hls
> 2016/09/15 15:45:14 [notice] 1055#0: *11659936 AIO_RD 00000008216B6B60 start 000000086472B7C0 7ff70 2999424 ce949665cbcd.hls

Does nginx only use AIO for regular files, or does it also use it with
sockets?

You can try using this patch as a diagnostic (you will need to run with
INVARIANTS enabled, or at least enabled for vfs_aio.c):

Index: vfs_aio.c
===================================================================
--- vfs_aio.c	(revision 305811)
+++ vfs_aio.c	(working copy)
@@ -787,6 +787,8 @@ aio_process_rw(struct kaiocb *job)
 	 * aio_aqueue() acquires a reference to the file that is
 	 * released in aio_free_entry().
 	 */
+	KASSERT(curproc->p_vmspace == job->userproc->p_vmspace,
+	    ("%s: vmspace mismatch", __func__));
 	if (cb->aio_lio_opcode == LIO_READ) {
 		auio.uio_rw = UIO_READ;
 		if (auio.uio_resid == 0)
@@ -1054,6 +1056,8 @@ aio_switch_vmspace(struct kaiocb *job)
 {
 
 	vmspace_switch_aio(job->userproc->p_vmspace);
+	KASSERT(curproc->p_vmspace == job->userproc->p_vmspace,
+	    ("%s: vmspace mismatch", __func__));
 }

If this panics, then vmspace_switch_aio() is not working for some
reason.

-- 
John Baldwin
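For context, the interface under discussion is POSIX AIO: the process
queues a read with aio_read() and a kernel AIO daemon later performs the
I/O and copies the data into the caller's buffer, which requires the
daemon to first switch into the requesting process's vmspace -- the step
the patch above checks. A minimal sketch of the userland pattern follows;
the file name and offset are borrowed from the log lines above purely for
illustration, and nginx itself collects completions through kqueue rather
than polling as done here.

#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	static char buf[65536];
	struct aiocb cb;
	ssize_t n;
	int fd;

	if ((fd = open("ce949665cbcd.hls", O_RDONLY)) < 0) {
		perror("open");
		return (1);
	}

	memset(&cb, 0, sizeof(cb));
	cb.aio_fildes = fd;
	cb.aio_buf = buf;		/* buffer in *this* process's vmspace */
	cb.aio_nbytes = sizeof(buf);
	cb.aio_offset = 2646736;	/* file offset, as in the log above */

	if (aio_read(&cb) != 0) {	/* queue the request; the kernel completes it */
		perror("aio_read");
		return (1);
	}
	while (aio_error(&cb) == EINPROGRESS)	/* poll until completion */
		usleep(1000);
	n = aio_return(&cb);
	printf("read %zd bytes\n", n);
	close(fd);
	return (0);
}

If the daemon copies into the wrong vmspace, the data lands at the same
virtual address in another process -- which matches the symptom reported
above: two workers reading the same file into nearby addresses.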
On Thu, Sep 15, 2016 at 10:28:11AM -0700, John Baldwin wrote:
> On Thursday, September 15, 2016 05:41:03 PM Slawa Olhovchenkov wrote:
> > [...]
> 
> Does nginx only use AIO for regular files, or does it also use it with
> sockets?

Only for regular files.

> You can try using this patch as a diagnostic (you will need to run with
> INVARIANTS enabled,

How much debug output will this produce? I have about 5-10K AIOs per
second.

> or at least enabled for vfs_aio.c):

How can I do that (enable INVARIANTS for vfs_aio.c only)?

> [patch snipped; quoted in full above]
> 
> If this panics, then vmspace_switch_aio() is not working for some
> reason.

This issue occurs rarely. Will the panic fire only together with the
issue, or on any AIO request? (This is a production server.)
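For reference, a sketch of the usual way to get those assertions,
assuming a custom kernel config named DEBUG (the name is arbitrary):
"options INVARIANTS" turns on KASSERT() checks kernel-wide, and
"options INVARIANT_SUPPORT" compiles in the supporting code, which is
also what allows an individual file such as vfs_aio.c to be built with
-DINVARIANTS on its own.

# /usr/src/sys/amd64/conf/DEBUG (hypothetical config name)
include GENERIC
ident   DEBUG
options INVARIANTS          # enable KASSERT() checks
options INVARIANT_SUPPORT   # support code; also allows per-file -DINVARIANTS

# then build and install:
# cd /usr/src && make buildkernel KERNCONF=DEBUG && make installkernel KERNCONF=DEBUG

Note that KASSERT() is silent unless its condition fails, so at 5-10K
AIOs per second the patch adds only two pointer comparisons per request;
when a check does fail the machine panics, so on a production box a
crash-dump device (dumpdev in rc.conf) is worth configuring first.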
On Thu, Sep 15, 2016 at 10:28:11AM -0700, John Baldwin wrote:
> On Thursday, September 15, 2016 05:41:03 PM Slawa Olhovchenkov wrote:
> > [...]
> 
> Does nginx only use AIO for regular files, or does it also use it with
> sockets?
> 
> You can try using this patch as a diagnostic (you will need to run with
> INVARIANTS enabled, or at least enabled for vfs_aio.c):
> 
> [patch snipped; quoted in full above]
> 
> If this panics, then vmspace_switch_aio() is not working for some
> reason.

I tried using the following DTrace script:

===
#pragma D option dynvarsize=64m

/* remember which pid queued an aio_read() into which (vmspace, buffer) pair */
int req[struct vmspace *, void *];
self int trace;

syscall:freebsd:aio_read:entry
{
	this->aio = *(struct aiocb *)copyin(arg0, sizeof(struct aiocb));
	req[curthread->td_proc->p_vmspace, this->aio.aio_buf] =
	    curthread->td_proc->p_pid;
}

/* mark the AIO daemon thread while it services a job */
fbt:kernel:aio_process_rw:entry
{
	self->job = args[0];
	self->trace = 1;
}

/* on completion, forget the mapping */
fbt:kernel:aio_process_rw:return
/self->trace/
{
	req[self->job->userproc->p_vmspace, self->job->uaiocb.aio_buf] = 0;
	self->job = 0;
	self->trace = 0;
}

/* flag I/O into a (vmspace, buffer) pair no pending aio_read() registered */
fbt:kernel:vn_io_fault:entry
/self->trace && !req[curthread->td_proc->p_vmspace, args[1]->uio_iov[0].iov_base]/
{
	this->buf = args[1]->uio_iov[0].iov_base;
	printf("%Y vn_io_fault %p:%p pid %d\n", walltimestamp,
	    curthread->td_proc->p_vmspace, this->buf,
	    req[curthread->td_proc->p_vmspace, this->buf]);
}
===

And I did not get any messages near the nginx core dump.
What can I check next? Maybe check the context/address-space switch for
the kernel process?
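One userland way to probe for a bad address-space switch is a stress
test along the following lines; this is a hypothetical sketch (NPROC,
NITER and worker() are invented names, not anything from the thread):
fork several processes that each aio_read() the same file into private
buffers and compare every completion against a synchronous pread() of
the identical range. A mismatch would confirm data landing in the wrong
vmspace without waiting for an nginx core dump.

#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define	NPROC	8	/* concurrent processes, like nginx workers */
#define	NITER	100000	/* requests per process */
#define	BUFSZ	65536

static int
worker(const char *path)
{
	char abuf[BUFSZ], pbuf[BUFSZ];
	struct aiocb cb;
	ssize_t n;
	off_t off;
	int fd, i;

	if ((fd = open(path, O_RDONLY)) < 0) {
		perror("open");
		return (1);
	}
	srandom(getpid());
	for (i = 0; i < NITER; i++) {
		off = (off_t)(random() % 1024) * BUFSZ;
		memset(&cb, 0, sizeof(cb));
		cb.aio_fildes = fd;
		cb.aio_buf = abuf;
		cb.aio_nbytes = BUFSZ;
		cb.aio_offset = off;
		if (aio_read(&cb) != 0) {
			perror("aio_read");
			return (1);
		}
		while (aio_error(&cb) == EINPROGRESS)
			usleep(100);
		if ((n = aio_return(&cb)) < 0) {
			perror("aio_return");
			return (1);
		}
		/* Re-read the same range synchronously and compare. */
		if (pread(fd, pbuf, n, off) != n) {
			perror("pread");
			return (1);
		}
		if (memcmp(abuf, pbuf, n) != 0) {
			fprintf(stderr, "pid %d: mismatch at offset %jd\n",
			    getpid(), (intmax_t)off);
			return (1);
		}
	}
	close(fd);
	return (0);
}

int
main(int argc, char **argv)
{
	int i;

	if (argc != 2) {
		fprintf(stderr, "usage: %s file\n", argv[0]);
		return (1);
	}
	for (i = 0; i < NPROC; i++)
		if (fork() == 0)
			_exit(worker(argv[1]));
	for (i = 0; i < NPROC; i++)
		wait(NULL);
	return (0);
}

This may well not hit the race -- the trigger seems to need many
in-flight requests across processes -- but it is cheap to leave running
alongside the DTrace script.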