On Tue, Sep 20, 2016 at 3:47 PM, Slawa Olhovchenkov <slw at zxy.spb.ru> wrote:
> On Wed, Sep 21, 2016 at 12:15:17AM +0300, Konstantin Belousov wrote:
>
>> On Tue, Sep 20, 2016 at 11:38:54PM +0300, Slawa Olhovchenkov wrote:
>> > On Tue, Sep 20, 2016 at 11:19:25PM +0300, Konstantin Belousov wrote:
>> >
>> > > On Tue, Sep 20, 2016 at 10:20:53PM +0300, Slawa Olhovchenkov wrote:
>> > > > On Tue, Sep 20, 2016 at 09:52:44AM +0300, Slawa Olhovchenkov wrote:
>> > > >
>> > > > > On Mon, Sep 19, 2016 at 06:05:46PM -0700, John Baldwin wrote:
>> > > > >
>> > > > > > > > If this panics, then vmspace_switch_aio() is not working for
>> > > > > > > > some reason.
>> > > > > > >
>> > > > > > > I tried the following DTrace script:
>> > > > > > > ===
>> > > > > > > #pragma D option dynvarsize=64m
>> > > > > > >
>> > > > > > > int req[struct vmspace *, void *];
>> > > > > > > self int trace;
>> > > > > > >
>> > > > > > > syscall:freebsd:aio_read:entry
>> > > > > > > {
>> > > > > > >     this->aio = *(struct aiocb *)copyin(arg0, sizeof(struct aiocb));
>> > > > > > >     req[curthread->td_proc->p_vmspace, this->aio.aio_buf] = curthread->td_proc->p_pid;
>> > > > > > > }
>> > > > > > >
>> > > > > > > fbt:kernel:aio_process_rw:entry
>> > > > > > > {
>> > > > > > >     self->job = args[0];
>> > > > > > >     self->trace = 1;
>> > > > > > > }
>> > > > > > >
>> > > > > > > fbt:kernel:aio_process_rw:return
>> > > > > > > /self->trace/
>> > > > > > > {
>> > > > > > >     req[self->job->userproc->p_vmspace, self->job->uaiocb.aio_buf] = 0;
>> > > > > > >     self->job = 0;
>> > > > > > >     self->trace = 0;
>> > > > > > > }
>> > > > > > >
>> > > > > > > fbt:kernel:vn_io_fault:entry
>> > > > > > > /self->trace && !req[curthread->td_proc->p_vmspace, args[1]->uio_iov[0].iov_base]/
>> > > > > > > {
>> > > > > > >     this->buf = args[1]->uio_iov[0].iov_base;
>> > > > > > >     printf("%Y vn_io_fault %p:%p pid %d\n", walltimestamp, curthread->td_proc->p_vmspace, this->buf, req[curthread->td_proc->p_vmspace, this->buf]);
>> > > > > > > }
>> > > > > > > ===
>> > > > > > > I didn't get any messages around the time of the nginx core dump.
>> > > > > > > What can I check next?
>> > > > > > > Maybe check the context/address space switch for the kernel process?
>> > > > > >
>> > > > > > Which CPU are you using?
>> > > > >
>> > > > > CPU: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz (2000.04-MHz K8-class CPU)
>> > > Is this Sandy Bridge?
>> >
>> > Sandy Bridge EP
>> >
>> > > Show me first 100 lines of the verbose dmesg,
>> >
>> > In a day or two, after this test run ends; I need to enable verbose.
>> >
>> > > I want to see the CPU feature lines. In particular, does your CPU support
>> > > the INVPCID feature?
>> >
>> > CPU: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz (2000.05-MHz K8-class CPU)
>> > Origin="GenuineIntel" Id=0x206d7 Family=0x6 Model=0x2d Stepping=7
>> > Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
>> > Features2=0x1fbee3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX>
>> > AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
>> > AMD Features2=0x1<LAHF>
>> > XSAVE Features=0x1<XSAVEOPT>
>> > VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
>> > TSC: P-state invariant, performance statistics
>> >
>> > I don't see this feature before E5 v3:
>> >
>> > CPU: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz (2600.06-MHz K8-class CPU)
>> > Origin="GenuineIntel" Id=0x306e4 Family=0x6 Model=0x3e Stepping=4
>> > Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
>> > Features2=0x7fbee3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
>> > AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
>> > AMD Features2=0x1<LAHF>
>> > Structured Extended Features=0x281<FSGSBASE,SMEP,ERMS>
>> > XSAVE Features=0x1<XSAVEOPT>
>> > VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
>> > TSC: P-state invariant, performance statistics
>> >
>> > (I am not running 11.0 on this CPU)
>> Ok.
>>
>> >
>> > CPU: Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz (2600.05-MHz K8-class CPU)
>> > Origin="GenuineIntel" Id=0x306f2 Family=0x6 Model=0x3f Stepping=2
>> > Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
>> > Features2=0x7ffefbff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
>> > AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
>> > AMD Features2=0x21<LAHF,ABM>
>> > Structured Extended Features=0x37ab<FSGSBASE,TSCADJ,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,NFPUSG>
>> > XSAVE Features=0x1<XSAVEOPT>
>> > VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
>> > TSC: P-state invariant, performance statistics
>> >
>> > (11.0 runs w/o this issue)
>> Do you mean that a similarly configured nginx+aio does not demonstrate the
>> corruption on this machine?
>
> Yes.
> But with a different storage configuration and a different load pattern.
>
> Also, 11.0 runs w/o this issue on
>
> CPU: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz (2200.04-MHz K8-class CPU)
> Origin="GenuineIntel" Id=0x406f1 Family=0x6 Model=0x4f Stepping=1
> Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
> Features2=0x7ffefbff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
> AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
> AMD Features2=0x121<LAHF,ABM,Prefetch>
> Structured Extended Features=0x21cbfbb<FSGSBASE,TSCADJ,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,PQM,NFPUSG,PQE,RDSEED,ADX,SMAP,PROCTRACE>
> XSAVE Features=0x1<XSAVEOPT>
> VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
> TSC: P-state invariant, performance statistics
>
> PS: all systems are dual-CPU.
Does this mean 2 cores or two sockets? We've seen a similar hang with
the following CPU:
CPU: Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (2700.06-MHz K8-class CPU)
Origin="GenuineIntel" Id=0x306e4 Family=0x6 Model=0x3e Stepping=4
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
Features2=0x7fbee3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
AMD Features2=0x1<LAHF>
Structured Extended Features=0x281<FSGSBASE,SMEP,ERMS>
XSAVE Features=0x1<XSAVEOPT>
VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
TSC: P-state invariant, performance statistics
real memory = 274877906944 (262144 MB)
avail memory = 267146330112 (254770 MB)
12 cores x 2 SMT x 1 socket
Warner
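
For reference, the PCID/INVPCID question raised in this thread can be checked directly from the hex masks in the dmesg lines quoted above, without reading the feature names. The following is only a rough Python sketch: the helper name has_bit() and the table layout are made up for illustration, and a machine whose dmesg shows no "Structured Extended Features" line is assumed here to have a mask of 0. The bit positions are the CPUID ones: PCID is bit 17 of CPUID.1:ECX (printed as Features2), INVPCID is bit 10 of CPUID.7.0:EBX (printed as Structured Extended Features).

```python
# Bit positions as defined by the CPUID instruction:
#   PCID    -> CPUID.01H:ECX bit 17         (dmesg "Features2")
#   INVPCID -> CPUID.(EAX=07H,ECX=0):EBX bit 10
#              (dmesg "Structured Extended Features")
PCID_BIT = 17
INVPCID_BIT = 10


def has_bit(mask, bit):
    """Return True if the given bit is set in the feature mask."""
    return bool((mask >> bit) & 1)


# (Features2, Structured Extended Features) copied from the dmesg
# output quoted in this thread. ASSUMPTION: when dmesg shows no
# "Structured Extended Features" line, the mask is treated as 0.
cpus = {
    "E5-2620 (Sandy Bridge EP)": (0x1fbee3ff, 0x0),
    "E5-2650 v2": (0x7fbee3ff, 0x281),
    "E5-2640 v3": (0x7ffefbff, 0x37ab),
    "E5-2650 v4": (0x7ffefbff, 0x21cbfbb),
    "E5-2697 v2": (0x7fbee3ff, 0x281),
}

for name, (features2, stdext) in cpus.items():
    print("%-28s PCID=%-5s INVPCID=%s" % (
        name,
        has_bit(features2, PCID_BIT),
        has_bit(stdext, INVPCID_BIT),
    ))
```

The masks agree with the feature names printed next to them: every CPU listed has PCID, but only the v3 and v4 parts have INVPCID.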