On Thursday, September 15, 2016 08:49:48 PM Slawa Olhovchenkov wrote:
> On Thu, Sep 15, 2016 at 10:28:11AM -0700, John Baldwin wrote:
> > On Thursday, September 15, 2016 05:41:03 PM Slawa Olhovchenkov wrote:
> > > On Wed, Sep 07, 2016 at 10:13:48PM +0300, Slawa Olhovchenkov wrote:
> > > >
> > > > I have a strange issue with nginx on FreeBSD 11 (FreeBSD 11 installed
> > > > over STABLE-10). nginx built for FreeBSD 10 and run without
> > > > recompiling works fine; nginx built for FreeBSD 11 crashes inside
> > > > rbtree lookups: the next node is totally corrupted.
> > > >
> > > > I see the following potential causes:
> > > >
> > > > 1) clang 3.8 code generation issue
> > > > 2) system library issue
> > > >
> > > > Maybe I am missing something? How can I find the real cause?
> > >
> > > I have found the real cause, and it looks like a show-stopper for the
> > > RELEASE. I use nginx with AIO, and AIO from one nginx process corrupts
> > > memory in another nginx process. Yes, this is cross-process memory
> > > corruption.
> > >
> > > Last case: the process with pid 1060 dumped core at 15:45:14, with
> > > corrupted memory at 0x860697000. I know the memory at 0x86067f800 is
> > > good. Dumping this region from the core to a file and analyzing it with
> > > hexdump, I found the start of the corrupted region at offset 0000c8c0
> > > from 0x86067f800: 0x86067f800 + 0xc8c0 = 0x86068c0c0.
> > >
> > > I had previously enabled debug logging of started AIO operations to the
> > > nginx error log (memory address, file name, offset and size of transfer).
> > >
> > > grep -i 86068c0c0 error.log near 15:45:14 gives the target file.
> > > grep ce949665cbcd.hls error.log near 15:45:14 gives the following:
> > >
> > > 2016/09/15 15:45:13 [notice] 1055#0: *11659936 AIO_RD 000000082065DB60 start 000000086068C0C0 561b0 2646736 ce949665cbcd.hls
> > > 2016/09/15 15:45:14 [notice] 1060#0: *10998125 AIO_RD 000000081F1FFB60 start 000000086FF2C0C0 6cdf0 140016832 ce949665cbcd.hls
> > > 2016/09/15 15:45:14 [notice] 1055#0: *11659936 AIO_RD 00000008216B6B60 start 000000086472B7C0 7ff70 2999424 ce949665cbcd.hls
> >
> > Does nginx only use AIO for regular files or does it also use it with sockets?
>
> Only for regular files.
>
> > You can try using this patch as a diagnostic (you will need to
> > run with INVARIANTS enabled,
>
> How much debug output is produced? I have about 5-10K AIOs per second.
>
> > or at least enabled for vfs_aio.c):
>
> How can I do this (enable INVARIANTS for vfs_aio.c)?

Include INVARIANT_SUPPORT in your kernel and add a line with:

#define INVARIANTS

at the top of sys/kern/vfs_aio.c.

> > Index: vfs_aio.c
> > ===================================================================
> > --- vfs_aio.c	(revision 305811)
> > +++ vfs_aio.c	(working copy)
> > @@ -787,6 +787,8 @@ aio_process_rw(struct kaiocb *job)
> >  	 * aio_aqueue() acquires a reference to the file that is
> >  	 * released in aio_free_entry().
> >  	 */
> > +	KASSERT(curproc->p_vmspace == job->userproc->p_vmspace,
> > +	    ("%s: vmspace mismatch", __func__));
> >  	if (cb->aio_lio_opcode == LIO_READ) {
> >  		auio.uio_rw = UIO_READ;
> >  		if (auio.uio_resid == 0)
> > @@ -1054,6 +1056,8 @@ aio_switch_vmspace(struct kaiocb *job)
> >  {
> > 
> >  	vmspace_switch_aio(job->userproc->p_vmspace);
> > +	KASSERT(curproc->p_vmspace == job->userproc->p_vmspace,
> > +	    ("%s: vmspace mismatch", __func__));
> >  }
> >
> > If this panics, then vmspace_switch_aio() is not working for
> > some reason.
>
> This issue occurs rarely; would this panic be produced only when the issue
> occurs, or on any AIO request?
> (This is a production server.)

It would panic in the case that we are going to write into the wrong
process (so about as rare as your issue).

-- 
John Baldwin
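For context on why INVARIANTS matters for the diagnostic above: KASSERT()
only expands to a real check when INVARIANTS is defined at the point
<sys/systm.h> is processed; otherwise the macro, and with it the
"vmspace mismatch" string, is compiled away entirely. A simplified sketch,
paraphrased from sys/sys/systm.h (recent branches route the failure through
kassert_panic() rather than panic(), but the effect for this discussion is
the same):

/*
 * Paraphrased sketch of KASSERT(): with INVARIANTS the assertion panics
 * with the supplied message; without it the macro is a no-op, so the
 * message string never even ends up in the compiled kernel.
 */
#ifdef INVARIANTS
#define	KASSERT(exp, msg) do {						\
	if (__predict_false(!(exp)))					\
		panic msg;						\
} while (0)
#else
#define	KASSERT(exp, msg) do {} while (0)
#endif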
On Thu, Sep 15, 2016 at 11:54:12AM -0700, John Baldwin wrote:
> > > If this panics, then vmspace_switch_aio() is not working for
> > > some reason.
> >
> > This issue occurs rarely; would this panic be produced only when the
> > issue occurs, or on any AIO request? (This is a production server.)
>
> It would panic in the case that we are going to write into the wrong
> process (so about as rare as your issue).

Can I configure an automatic reboot (not a halt) in this case?
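Regarding the automatic-reboot question: a KASSERT failure ends in panic(),
and a kernel without KDB/DDB compiled in reboots on its own after a panic,
while a debug kernel drops into the debugger unless told otherwise. A hedged
sketch of the usual knobs (the sysctl names are the stock FreeBSD ones; the
wait time is only an example):

# /etc/sysctl.conf -- reboot on panic instead of sitting in the debugger
# (only relevant if the kernel was built with KDB/DDB)
debug.debugger_on_panic=0
kern.panic_reboot_wait_time=15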
On Thu, Sep 15, 2016 at 11:54:12AM -0700, John Baldwin wrote:
> > > or at least enabled for vfs_aio.c):
> >
> > How can I do this (enable INVARIANTS for vfs_aio.c)?
>
> Include INVARIANT_SUPPORT in your kernel and add a line with:
>
> #define INVARIANTS
>
> at the top of sys/kern/vfs_aio.c.

# sysctl -a | grep -i inva
kern.timecounter.invariant_tsc: 1
kern.features.invariant_support: 1

options         INVARIANT_SUPPORT

but there is no string `vmspace mismatch' in the kernel. Maybe I missed
something?

#define INVARIANTS
#include <sys/cdefs.h>
__FBSDID("$FreeBSD: stable/11/sys/kern/vfs_aio.c 304738 2016-08-24 09:18:38Z kib $");

#include "opt_compat.h"

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/malloc.h>
#include <sys/bio.h>
#include <sys/buf.h>
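A quick way to check whether the per-file INVARIANTS define actually took
effect is to look for the assertion's format string in the kernel that was
installed and booted; a sketch, assuming the stock /boot/kernel path and
that the rebuilt kernel was installed there:

# the format string is embedded only if the KASSERTs were compiled in
strings /boot/kernel/kernel | grep "vmspace mismatch"
# confirm the running kernel really is the rebuilt one
uname -v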
On Thu, Sep 15, 2016 at 11:54:12AM -0700, John Baldwin wrote:
> > > If this panics, then vmspace_switch_aio() is not working for
> > > some reason.
> >
> > This issue occurs rarely; would this panic be produced only when the
> > issue occurs, or on any AIO request? (This is a production server.)
>
> It would panic in the case that we are going to write into the wrong
> process (so about as rare as your issue).

vmspace_switch_aio() allows context switching with the old curpmap and the
new proc->p_vmspace. This is a weird condition, where
curproc->p_vmspace->vm_pmap is not equal to curcpu->pc_curpmap. I do not
see an obvious place which would immediately break; e.g., even for a
context switch between the assignment of newvm to p_vmspace and
pmap_activate(), the context-switch call to pmap_activate_sw() seems to do
the right thing.

Still, just in case, try this:

diff --git a/sys/vm/vm_map.c b/sys/vm/vm_map.c
index a23468e..fbaa6c1 100644
--- a/sys/vm/vm_map.c
+++ b/sys/vm/vm_map.c
@@ -481,6 +481,7 @@ vmspace_switch_aio(struct vmspace *newvm)
 	if (oldvm == newvm)
 		return;
 
+	critical_enter();
 	/*
 	 * Point to the new address space and refer to it.
 	 */
@@ -489,6 +490,7 @@ vmspace_switch_aio(struct vmspace *newvm)
 
 	/* Activate the new mapping. */
 	pmap_activate(curthread);
+	critical_exit();
 
 	/* Remove the daemon's reference to the old address space. */
 	KASSERT(oldvm->vm_refcnt > 1,
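To make the effect of the patch easier to see, here is a sketch of how
vmspace_switch_aio() in sys/vm/vm_map.c reads with the critical section
applied. It is reconstructed from the diff context above plus memory of the
stable/11 sources, so the reference-count handling and the KASSERT message
text may differ slightly from the actual tree:

void
vmspace_switch_aio(struct vmspace *newvm)
{
	struct vmspace *oldvm;

	oldvm = curproc->p_vmspace;
	if (oldvm == newvm)
		return;

	/*
	 * Disable preemption so that no context switch can observe the
	 * window where curproc->p_vmspace already points at newvm but
	 * the CPU is still running on the old pmap.
	 */
	critical_enter();

	/* Point to the new address space and refer to it. */
	curproc->p_vmspace = newvm;
	atomic_add_int(&newvm->vm_refcnt, 1);

	/* Activate the new mapping. */
	pmap_activate(curthread);
	critical_exit();

	/* Remove the daemon's reference to the old address space. */
	KASSERT(oldvm->vm_refcnt > 1,
	    ("vmspace_switch_aio: oldvm dropping last reference"));
	vmspace_free(oldvm);
}

The point of the critical section is that the p_vmspace update and
pmap_activate() become atomic with respect to preemption on that CPU, so a
context switch can no longer land in the middle of the switch and leave
curpmap and curproc->p_vmspace inconsistent.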