Current kernel, and I just booted, and dmesg shows that, of the 32 cores, 0, 2, 4 and 6 are ok, and *all* the others show "is now offline".

What's happening here?

mark
m.roth at 5-cent.us
2018-Jun-13 14:00 UTC
[CentOS] C 7: smpboot: CPU 16 is now offline, and slabs...
m.roth at 5-cent.us wrote:
> Current kernel, and I just booted, and dmesg shows, of the 32 cores, 0, 2,
> 4 and 6 ok, and *all* other show "is now offline.
>
> What's happening here?
>
A followup: I also find a core in /var/spool/abrt, and "reason" is

    kernel BUG at mm/slub.c:3601!

In googling, I see threads about incorrect calculation of slabs. Following one thread, I find

    cat /sys/kernel/slab/:t-0000048/cpu_slabs

gives me

    4 N0=4

Meanwhile, slabtop shows

    Active / Total Slabs (% used) : 25927 / 25927 (100.0%)

Which changes, but just varying around that number, and still 100%.

So: should I increase the number of slabs, using the kernel parm of swiotlb, and if so, for what I show above, should I set it to, say, 32000?

mark
m.roth at 5-cent.us
2018-Jun-13 15:10 UTC
[CentOS] C 7: smpboot: CPU 16 is now offline, and slabs...
m.roth at 5-cent.us wrote:
> m.roth at 5-cent.us wrote:
>> Current kernel, and I just booted, and dmesg shows, of the 32 cores, 0,
>> 2, 4 and 6 ok, and *all* other show "is now offline.
>>
>> What's happening here?
<snip>
Ok, more info. I found how to online a CPU:

    echo 1 > /sys/devices/system/cpu/cpu23/online

Perhaps I should have started with 1, 3, etc., but I was doing the 20's instead. Got to CPU27... and the system rebooted.

Now I'm wondering if the offline'd CPUs have something to do with the fact that this (and an identical one, in the datacenter) are rebooting around 04:00 every day.

Btw, they're Dell PE R530's from 2016....

mark
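
For anyone following along: before onlining cores one at a time, it may help to see the whole offline set at once. The following is only a minimal read-only sketch, not something from the messages above. It assumes the kernel exposes /sys/devices/system/cpu/offline (a range list such as "1,3,5,7-31"), and expand_cpu_list is a hypothetical helper name made up for this example. It writes nothing to sysfs, so it should not change any CPU's state.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): expand a kernel CPU range
# list such as "1,3,5,7-31" into one CPU number per line.
expand_cpu_list() {
    echo "$1" | tr ',' '\n' | while IFS=- read -r first last; do
        # Skip empty entries (e.g. when no CPU is offline).
        [ -n "$first" ] || continue
        # A single CPU has no "-last" part, so fall back to $first.
        seq "$first" "${last:-$first}"
    done
}

# Assumption: this file exists on kernels built with CPU hotplug support.
offline_file=/sys/devices/system/cpu/offline
if [ -r "$offline_file" ]; then
    # Read-only: prints the offline CPU numbers, one per line.
    expand_cpu_list "$(cat "$offline_file")"
fi
```

The output could then be compared against the "is now offline" lines in dmesg to check that both report the same set of cores.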