On Tue, Sep 20, 2016 at 11:38:54PM +0300, Slawa Olhovchenkov wrote:
> On Tue, Sep 20, 2016 at 11:19:25PM +0300, Konstantin Belousov wrote:
>
> > On Tue, Sep 20, 2016 at 10:20:53PM +0300, Slawa Olhovchenkov wrote:
> > > On Tue, Sep 20, 2016 at 09:52:44AM +0300, Slawa Olhovchenkov wrote:
> > >
> > > > On Mon, Sep 19, 2016 at 06:05:46PM -0700, John Baldwin wrote:
> > > >
> > > > > > > If this panics, then vmspace_switch_aio() is not working for
> > > > > > > some reason.
> > > > > >
> > > > > > I tried the following DTrace script:
> > > > > > ===
> > > > > > #pragma D option dynvarsize=64m
> > > > > >
> > > > > > int req[struct vmspace *, void *];
> > > > > > self int trace;
> > > > > >
> > > > > > syscall:freebsd:aio_read:entry
> > > > > > {
> > > > > > 	this->aio = *(struct aiocb *)copyin(arg0, sizeof(struct aiocb));
> > > > > > 	req[curthread->td_proc->p_vmspace, this->aio.aio_buf] = curthread->td_proc->p_pid;
> > > > > > }
> > > > > >
> > > > > > fbt:kernel:aio_process_rw:entry
> > > > > > {
> > > > > > 	self->job = args[0];
> > > > > > 	self->trace = 1;
> > > > > > }
> > > > > >
> > > > > > fbt:kernel:aio_process_rw:return
> > > > > > /self->trace/
> > > > > > {
> > > > > > 	req[self->job->userproc->p_vmspace, self->job->uaiocb.aio_buf] = 0;
> > > > > > 	self->job = 0;
> > > > > > 	self->trace = 0;
> > > > > > }
> > > > > >
> > > > > > fbt:kernel:vn_io_fault:entry
> > > > > > /self->trace && !req[curthread->td_proc->p_vmspace, args[1]->uio_iov[0].iov_base]/
> > > > > > {
> > > > > > 	this->buf = args[1]->uio_iov[0].iov_base;
> > > > > > 	printf("%Y vn_io_fault %p:%p pid %d\n", walltimestamp,
> > > > > > 	    curthread->td_proc->p_vmspace, this->buf,
> > > > > > 	    req[curthread->td_proc->p_vmspace, this->buf]);
> > > > > > }
> > > > > > ===
> > > > > >
> > > > > > And I didn't get any messages around the time of the nginx core dump.
> > > > > > What can I check next?
> > > > > > Maybe check the context/address space switch for the kernel process?
> > > > >
> > > > > Which CPU are you using?
> > > >
> > > > CPU: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz (2000.04-MHz K8-class CPU)
> > Is this Sandy Bridge?
>
> Sandy Bridge EP
>
> > Show me the first 100 lines of the verbose dmesg.
>
> In a day or two, once this test run ends: I need to enable verbose boot first.
>
> > I want to see the CPU features lines. In particular, does your CPU support
> > the INVPCID feature?
>
> CPU: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz (2000.05-MHz K8-class CPU)
> Origin="GenuineIntel" Id=0x206d7 Family=0x6 Model=0x2d Stepping=7
> Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
> Features2=0x1fbee3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX>
> AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
> AMD Features2=0x1<LAHF>
> XSAVE Features=0x1<XSAVEOPT>
> VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
> TSC: P-state invariant, performance statistics
>
> I don't see this feature before E5 v3:
>
> CPU: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz (2600.06-MHz K8-class CPU)
> Origin="GenuineIntel" Id=0x306e4 Family=0x6 Model=0x3e Stepping=4
> Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
> Features2=0x7fbee3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
> AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
> AMD Features2=0x1<LAHF>
> Structured Extended Features=0x281<FSGSBASE,SMEP,ERMS>
> XSAVE Features=0x1<XSAVEOPT>
> VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
> TSC: P-state invariant, performance statistics
>
> (I don't run 11.0 on this CPU)
Ok.
>
> CPU: Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz (2600.05-MHz K8-class CPU)
> Origin="GenuineIntel" Id=0x306f2 Family=0x6 Model=0x3f Stepping=2
> Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
> Features2=0x7ffefbff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
> AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
> AMD Features2=0x21<LAHF,ABM>
> Structured Extended Features=0x37ab<FSGSBASE,TSCADJ,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,NFPUSG>
> XSAVE Features=0x1<XSAVEOPT>
> VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
> TSC: P-state invariant, performance statistics
>
> (11.0 runs w/o this issue)
Do you mean that a similarly configured nginx+aio does not demonstrate the
corruption on this machine?
>
> > Also you may show me the 'sysctl vm.pmap' output.
>
> # sysctl vm.pmap
> vm.pmap.pdpe.demotions: 3
> vm.pmap.pde.promotions: 172495
> vm.pmap.pde.p_failures: 2119294
> vm.pmap.pde.mappings: 1927
> vm.pmap.pde.demotions: 126192
> vm.pmap.pcid_save_cnt: 0
> vm.pmap.invpcid_works: 0
> vm.pmap.pcid_enabled: 0
> vm.pmap.pg_ps_enabled: 1
> vm.pmap.pat_works: 1
>
> This is after setting vm.pmap.pcid_enabled=0 in loader.conf
>
> > > >
> > > > > Perhaps try disabling PCID support (I think vm.pmap.pcid_enabled=0 from
> > > > > loader prompt or loader.conf)? (Wondering if pmap_activate() is somehow
> > > > > not switching)
> > >
> > > I need some more time to test (a day or two), but for now this looks
> > > like a workaround/solution: 12h of runtime, including peak hour, w/o an
> > > nginx crash (vm.pmap.pcid_enabled=0 in loader.conf).
> >
> > Please try this variation of the previous patch.
>
> and remove vm.pmap.pcid_enabled=0?
Definitely.
>
> > diff --git a/sys/vm/vm_map.c b/sys/vm/vm_map.c
> > index a23468e..f754652 100644
> > --- a/sys/vm/vm_map.c
> > +++ b/sys/vm/vm_map.c
> > @@ -481,6 +481,7 @@ vmspace_switch_aio(struct vmspace *newvm)
> >  	if (oldvm == newvm)
> >  		return;
> >  
> > +	spinlock_enter();
> >  	/*
> >  	 * Point to the new address space and refer to it.
> >  	 */
> > @@ -489,6 +490,7 @@ vmspace_switch_aio(struct vmspace *newvm)
> >  
> >  	/* Activate the new mapping. */
> >  	pmap_activate(curthread);
> > +	spinlock_exit();
> >  
> >  	/* Remove the daemon's reference to the old address space. */
> >  	KASSERT(oldvm->vm_refcnt > 1,