On Thu, 20 Jul 2017, Kevin Stange wrote:

> On 07/20/2017 05:31 AM, Piotr Gackiewicz wrote:
>> On Wed, 19 Jul 2017, Johnny Hughes wrote:
>>
>>> On 07/19/2017 09:23 AM, Johnny Hughes wrote:
>>>> On 07/19/2017 04:27 AM, Piotr Gackiewicz wrote:
>>>>> On Mon, 17 Jul 2017, Johnny Hughes wrote:
>>>>>
>>>>>> Are the testing kernels (kernel-4.9.37-29.el7 and
>>>>>> kernel-4.9.37-29.el6, with the one config file change)
>>>>>> working for everyone:
>>>>>>
>>>>>> (turn off: CONFIG_IO_STRICT_DEVMEM)
>>>>>
>>>>> Hello.
>>>>> Maybe it's not the most appropriate thread or time, but I have
>>>>> been signalling it before:
>>>>>
>>>>> 4.9.* kernels do not work well for me any more (nor for other
>>>>> people, as far as I know). Last stable kernel was 4.9.13-22.
>>
>> I think I have nailed down the faulty combo.
>> My tests showed that the SLUB allocator does not work well in Xen
>> Dom0, on top of the Xen hypervisor.
>> It does not work at least on one of my testing servers (old AMD K8
>> (1 proc, 1 core), only 1 paravirt guest).
>> If the kernel with SLUB is booted as main (w/o Xen hypervisor), it
>> works well.
>> If booted as a Xen hypervisor module, it almost instantly gets a
>> page allocation failure.
>>
>> SLAB=>SLUB was changed in the kernel config, starting from 4.9.25.
>> Then problems started to explode in my production environment, and
>> on the testing server mentioned above.
>>
>> After recompiling recent 4.9.34 with SLAB, everything works well on
>> that testing machine.
>> I will try to test 4.9.38 with the same config on my production
>> servers.
>
> I was having page allocation failures on 4.9.25 with SLUB, but these
> problems seem to be gone with 4.9.34 (still with SLUB). Have you
> checked this build? It was moved to the stable repo on July 4th.

Yes, 4.9.34 was failing too.
And this was actually the worst case, with I/O errors on the guest:

Jul 16 06:01:03 dom0 kernel: [452360.743312] CPU: 0 PID: 28450 Comm: 12.xvda3-0 Tainted: G O 4.9.34-29.el6.x86_64 #1
Jul 16 06:01:03 guest kernel: end_request: I/O error, dev xvda3, sector 9200640
Jul 16 06:01:03 dom0 kernel: [452360.758931] SLUB: Unable to allocate memory on node -1, gfp=0x2000000(GFP_NOWAIT)
Jul 16 06:01:03 guest kernel: Buffer I/O error on device xvda3, logical block 1150080
Jul 16 06:01:03 guest kernel: lost page write due to I/O error on xvda3
Jul 16 06:01:03 guest kernel: Buffer I/O error on device xvda3, logical block 1150081
Jul 16 06:01:03 guest kernel: lost page write due to I/O error on xvda3
Jul 16 06:01:03 guest kernel: Buffer I/O error on device xvda3, logical block 1150082
Jul 16 06:01:03 guest kernel: lost page write due to I/O error on xvda3
Jul 16 06:01:03 guest kernel: Buffer I/O error on device xvda3, logical block 1150083
Jul 16 06:01:03 guest kernel: lost page write due to I/O error on xvda3
Jul 16 06:01:03 guest kernel: Buffer I/O error on device xvda3, logical block 1150084
Jul 16 06:01:03 guest kernel: lost page write due to I/O error on xvda3
Jul 16 06:01:03 dom0 kernel: [452361.449389] 12.xvda3-0: page allocation failure: order:0, mode:0x2200000(GFP_NOWAIT|__GFP_NOTRACK)
Jul 16 06:01:03 dom0 kernel: [452361.449685] CPU: 1 PID: 28450 Comm: 12.xvda3-0 Tainted: G O 4.9.34-29.el6.x86_64 #1
Jul 16 06:01:03 dom0 kernel: [452361.449934] Hardware name: Supermicro X8SIL/X8SIL, BIOS 1.0c 02/25/2010
Jul 16 06:01:03 guest kernel: end_request: I/O error, dev xvda3, sector 6102784
Jul 16 06:01:03 dom0 kernel: [452361.462103] SLUB: Unable to allocate memory on node -1, gfp=0x2000000(GFP_NOWAIT)
Jul 16 06:01:03 dom0 kernel: [452361.676257] 12.xvda3-0: page allocation failure: order:0, mode:0x2200000(GFP_NOWAIT|__GFP_NOTRACK)
Jul 16 06:01:03 dom0 kernel: [452361.676531] CPU: 0 PID: 28450 Comm: 12.xvda3-0 Tainted: G O 4.9.34-29.el6.x86_64 #1
Jul 16 06:01:03 guest kernel: end_request: I/O error, dev xvda3, sector 6127872
Jul 16 06:01:03 dom0 kernel: [452361.692171] SLUB: Unable to allocate memory on node -1, gfp=0x2000000(GFP_NOWAIT)
Jul 16 06:01:07 dom0 kernel: [452365.438565] 12.xvda3-0: page allocation failure: order:0, mode:0x2200000(GFP_NOWAIT|__GFP_NOTRACK)
Jul 16 06:01:07 dom0 kernel: [452365.438870] CPU: 0 PID: 28450 Comm: 12.xvda3-0 Tainted: G O 4.9.34-29.el6.x86_64 #1
Jul 16 06:01:07 dom0 kernel: [452365.454213] SLUB: Unable to allocate memory on node -1, gfp=0x2000000(GFP_NOWAIT)
Jul 16 06:01:07 guest kernel: end_request: I/O error, dev xvda3, sector 6477112
Jul 16 06:01:09 dom0 kernel: [452366.732994] 12.xvda3-0: page allocation failure: order:0, mode:0x2200000(GFP_NOWAIT|__GFP_NOTRACK)
Jul 16 06:01:09 dom0 kernel: [452366.733306] CPU: 0 PID: 28450 Comm: 12.xvda3-0 Tainted: G O 4.9.34-29.el6.x86_64 #1
Jul 16 06:01:09 dom0 kernel: [452366.746362] SLUB: Unable to allocate memory on node -1, gfp=0x2000000(GFP_NOWAIT)
Jul 16 06:01:09 guest kernel: end_request: I/O error, dev xvda3, sector 6546488
Jul 16 06:01:09 guest kernel: Buffer I/O error on device xvda3, logical block 818311
Jul 16 06:01:09 guest kernel: lost page write due to I/O error on xvda3
Jul 16 06:01:09 guest kernel: Buffer I/O error on device xvda3, logical block 818312
Jul 16 06:01:09 guest kernel: lost page write due to I/O error on xvda3
Jul 16 06:01:09 guest kernel: Buffer I/O error on device xvda3, logical block 818313
Jul 16 06:01:09 guest kernel: lost page write due to I/O error on xvda3
Jul 16 06:01:09 guest kernel: Buffer I/O error on device xvda3, logical block 818314
Jul 16 06:01:09 guest kernel: lost page write due to I/O error on xvda3
Jul 16 06:01:09 guest kernel: Buffer I/O error on device xvda3, logical block 818315
Jul 16 06:01:09 dom0 kernel: [452366.913734] 12.xvda3-0: page allocation failure: order:0, mode:0x2200000(GFP_NOWAIT|__GFP_NOTRACK)
Jul 16 06:01:09 dom0 kernel: [452366.914002] CPU: 1 PID: 28450 Comm: 12.xvda3-0 Tainted: G O 4.9.34-29.el6.x86_64 #1
Jul 16 06:01:09 guest kernel: end_request: I/O error, dev xvda3, sector 6366208
Jul 16 06:01:09 dom0 kernel: [452366.929809] SLUB: Unable to allocate memory on node -1, gfp=0x2000000(GFP_NOWAIT)
Jul 16 06:01:09 dom0 kernel: [452367.288193] 12.xvda3-0: page allocation failure: order:0, mode:0x2200000(GFP_NOWAIT|__GFP_NOTRACK)
Jul 16 06:01:09 dom0 kernel: [452367.288455] CPU: 1 PID: 28450 Comm: 12.xvda3-0 Tainted: G O 4.9.34-29.el6.x86_64 #1
Jul 16 06:01:09 dom0 kernel: [452367.301690] SLUB: Unable to allocate memory on node -1, gfp=0x2000000(GFP_NOWAIT)
Jul 16 06:01:09 guest kernel: end_request: I/O error, dev xvda3, sector 6630656
Jul 16 06:01:10 dom0 kernel: [452368.253435] 12.xvda3-0: page allocation failure: order:0, mode:0x2200000(GFP_NOWAIT|__GFP_NOTRACK)
Jul 16 06:01:10 dom0 kernel: [452368.253701] CPU: 0 PID: 28450 Comm: 12.xvda3-0 Tainted: G O 4.9.34-29.el6.x86_64 #1
Jul 16 06:01:10 guest kernel: end_request: I/O error, dev xvda3, sector 6708224

Regards,
-- 
Piotr Gackiewicz
Intertele S.A. - operator of the ITL.PL and DOMENY.ITL.PL systems
al. T. Rejtana 10, 35-310 Rzeszów
TEL: +48 17 8507580, FAX: +48 17 8520275
http://www.itl.pl - reliable hosting services
http://domeny.itl.pl - cheap internet domains
http://www.intertele.pl
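For anyone who wants to check their own logs for the same pattern, here is a rough sketch (not something used in this thread) that tallies dom0 allocation failures and guest I/O errors per timestamp. The `tally` function and the `SAMPLE` excerpt are illustrative names; the sample lines are copied from the log above, and the syslog line layout is assumed to match it.

```python
import re
from collections import Counter

# A few lines copied from the log above. On a real system you would read
# your syslog file instead (its path is an assumption that varies by setup).
SAMPLE = """\
Jul 16 06:01:03 dom0 kernel: [452360.758931] SLUB: Unable to allocate memory on node -1, gfp=0x2000000(GFP_NOWAIT)
Jul 16 06:01:03 guest kernel: end_request: I/O error, dev xvda3, sector 9200640
Jul 16 06:01:03 dom0 kernel: [452361.449389] 12.xvda3-0: page allocation failure: order:0, mode:0x2200000(GFP_NOWAIT|__GFP_NOTRACK)
Jul 16 06:01:07 guest kernel: end_request: I/O error, dev xvda3, sector 6477112
"""

# "<Mon> <day> <hh:mm:ss> <host> kernel: <message>"
LINE_RE = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) (\S+) kernel: (.*)$")


def tally(text):
    """Count allocation failures and I/O errors per (timestamp, host, kind)."""
    counts = Counter()
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip anything that is not a kernel syslog line
        stamp, host, msg = match.groups()
        if "page allocation failure" in msg or "SLUB: Unable to allocate" in msg:
            counts[(stamp, host, "alloc_failure")] += 1
        elif "end_request: I/O error" in msg:
            counts[(stamp, host, "io_error")] += 1
    return counts


if __name__ == "__main__":
    for key, n in sorted(tally(SAMPLE).items()):
        print(key, n)
```

If guest I/O errors cluster in the same seconds as dom0 allocation failures, as they do in the excerpt above, that is at least consistent with the blkback thread (12.xvda3-0) failing its allocations.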
On 07/20/2017 03:14 PM, Piotr Gackiewicz wrote:
> On Thu, 20 Jul 2017, Kevin Stange wrote:
>
>> I was having page allocation failures on 4.9.25 with SLUB, but these
>> problems seem to be gone with 4.9.34 (still with SLUB). Have you
>> checked this build? It was moved to the stable repo on July 4th.
>
> Yes, 4.9.34 was failing too. And this was actually the worst case, with
> I/O error on guest:

I will happily create a test kernel with SLAB. What is your config file
diff?
On Fri, 21 Jul 2017, Johnny Hughes wrote:

> I will happily create a test kernel with SLAB .. what is your config
> file diff?

I just chose the SLAB allocator in menuconfig. That implied several
other internal configuration changes. The overall difference (config
file patch) is attached.

But my remarks about PATA etc. being compiled in, instead of built as
modules, still stand ;-).

Regards,
-- 
Piotr Gackiewicz

-------------- next part --------------
--- SOURCES/config-x86_64	2017-07-21 12:54:16.550735676 +0200
+++ /home/gacek/rpm/SOURCES/config-x86_64	2017-07-19 23:50:35.000000000 +0200
@@ -1,6 +1,6 @@
 #
 # Automatically generated file; DO NOT EDIT.
-# Linux/x86 4.9.20 Kernel Configuration
+# Linux/x86 4.9.38 Kernel Configuration
 #
 CONFIG_64BIT=y
 CONFIG_X86_64=y
@@ -243,13 +243,11 @@
 CONFIG_PERF_EVENTS=y
 # CONFIG_DEBUG_PERF_USE_VMALLOC is not set
 CONFIG_VM_EVENT_COUNTERS=y
-CONFIG_SLUB_DEBUG=y
 # CONFIG_COMPAT_BRK is not set
-# CONFIG_SLAB is not set
-CONFIG_SLUB=y
+CONFIG_SLAB=y
+# CONFIG_SLUB is not set
 # CONFIG_SLOB is not set
 CONFIG_SLAB_FREELIST_RANDOM=y
-CONFIG_SLUB_CPU_PARTIAL=y
 # CONFIG_SYSTEM_DATA_VERIFICATION is not set
 CONFIG_PROFILING=y
 CONFIG_TRACEPOINTS=y
@@ -290,7 +288,6 @@
 CONFIG_HAVE_PERF_USER_STACK_DUMP=y
 CONFIG_HAVE_ARCH_JUMP_LABEL=y
 CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y
-CONFIG_HAVE_ALIGNED_STRUCT_PAGE=y
 CONFIG_HAVE_CMPXCHG_LOCAL=y
 CONFIG_HAVE_CMPXCHG_DOUBLE=y
 CONFIG_ARCH_WANT_COMPAT_IPC_PARSE_VERSION=y
@@ -7330,8 +7327,7 @@
 # CONFIG_PAGE_POISONING is not set
 # CONFIG_DEBUG_PAGE_REF is not set
 # CONFIG_DEBUG_OBJECTS is not set
-# CONFIG_SLUB_DEBUG_ON is not set
-# CONFIG_SLUB_STATS is not set
+# CONFIG_DEBUG_SLAB is not set
 CONFIG_HAVE_DEBUG_KMEMLEAK=y
 # CONFIG_DEBUG_KMEMLEAK is not set
 # CONFIG_DEBUG_STACK_USAGE is not set
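To double-check which allocator a given kernel config actually selects, a small sketch along these lines may help. The helper name `selected_allocator` is made up for illustration, and the `PATCHED` fragment mirrors the `+` side of the diff above.

```python
import re

# Allocator choices that appear in the diff above.
ALLOCATORS = ("SLAB", "SLUB", "SLOB")


def selected_allocator(config_text):
    """Return the allocator whose CONFIG_<NAME>=y line is present, else None."""
    for name in ALLOCATORS:
        # Anchor both ends so CONFIG_SLAB_FREELIST_RANDOM=y does not count.
        if re.search(rf"^CONFIG_{name}=y$", config_text, re.MULTILINE):
            return name
    return None


# Fragment mirroring the patched side of the diff.
PATCHED = """\
# CONFIG_COMPAT_BRK is not set
CONFIG_SLAB=y
# CONFIG_SLUB is not set
# CONFIG_SLOB is not set
CONFIG_SLAB_FREELIST_RANDOM=y
"""

if __name__ == "__main__":
    print(selected_allocator(PATCHED))
```

On an installed kernel the config is commonly found at /boot/config-<kernel version>. That location is an assumption about the packaging and is not stated in this thread, so adjust the path as needed before feeding the file contents to the function.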
On 07/20/2017 03:14 PM, Piotr Gackiewicz wrote:
> On Thu, 20 Jul 2017, Kevin Stange wrote:
>
>> I was having page allocation failures on 4.9.25 with SLUB, but these
>> problems seem to be gone with 4.9.34 (still with SLUB). Have you
>> checked this build? It was moved to the stable repo on July 4th.
>
> Yes, 4.9.34 was failing too. And this was actually the worst case, with
> I/O error on guest:

I did find one server running 4.9.34 that was still throwing SLUB page
allocation errors, but oddly, the only servers ever to have this issue
for me are spares that are running no domains. I've just tried booting
that box up on 4.9.39, but I may not know if the switch back to SLAB
fixes anything for several weeks.

Otherwise, the other server I've been running 4.9.39 on for the past 72
hours has been stable with running domains.

-- 
Kevin Stange
Chief Technology Officer
Steadfast | Managed Infrastructure, Datacenter and Cloud Services
800 S Wells, Suite 190 | Chicago, IL 60607
312.602.2689 X203 | Fax: 312.602.2688
kevin at steadfast.net | www.steadfast.net
On 07/24/2017 03:05 PM, Kevin Stange wrote:
> I did find one server running 4.9.34 that was still throwing SLUB page
> allocation errors, but oddly, the only servers ever to have this issue
> for me are spares that are running no domains. I've just tried booting
> that box up on 4.9.39, but I may not know if the switch back to SLAB
> fixes anything for several weeks.
>
> Otherwise, the other server I'm running 4.9.39 on for the past 72 hours
> has been stable with running domains.

Cool. We have several good reports. I'll wait until Wednesday and push
this kernel to "release" if we don't get any bad reports.