Hi List,

We're currently seeing a kernel BUG in conjunction with CTDB.

This is on hosts running latest CentOS 7.2 with Samba/CTDB from standard
repos. Underlying FS is MooseFS backed by ZFS. Every so often, especially
when CIFS activity is high (e.g. 100+ users loading or saving Windows
profiles), we see a BUG in our logs:

Jun 20 16:31:13 metamora kernel: BUG: Bad page state in process ctdb_recovered pfn:8c8e00
Jun 20 16:31:13 metamora kernel: page:ffffea0023238000 count:0 mapcount:0 mapping: (null) index:0x7ff509800
Jun 20 16:31:13 metamora kernel: page flags: 0x2fffff00184008(uptodate|head|swapbacked|unevictable)
Jun 20 16:31:13 metamora kernel: page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
Jun 20 16:31:13 metamora kernel: bad because of flags:
Jun 20 16:31:13 metamora kernel: page flags: 0x100000(unevictable)
Jun 20 16:31:13 metamora kernel: Modules linked in: iptable_filter binfmt_misc fuse bonding iTCO_wdt iTCO_vendor_support mxm_wmi intel_powerclamp coretemp intel_rapl kvm_intel kvm crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr sb_edac edac_core i2c_i801 lpc_ich mfd_core mei_me mei ses enclosure sg ioatdma shpchp ipmi_ssif ipmi_si ipmi_msghandler wmi acpi_pad acpi_power_meter ip_tables xfs libcrc32c raid1 sd_mod crc_t10dif crct10dif_generic ast syscopyarea crct10dif_pclmul sysfillrect crct10dif_common sysimgblt crc32c_intel drm_kms_helper ttm drm ixgbe ahci igb libahci mpt3sas mdio libata ptp i2c_algo_bit raid_class pps_core i2c_core scsi_transport_sas dca dm_mirror dm_region_hash dm_log dm_mod zfs(POE) zunicode(POE) zavl(POE) zcommon(POE) znvpair(POE) spl(OE) zlib_deflate
Jun 20 16:31:13 metamora kernel:
Jun 20 16:31:13 metamora kernel: CPU: 23 PID: 13830 Comm: ctdb_recovered Tainted: P B OE ------------ 3.10.0-327.13.1.el7.x86_64 #1
Jun 20 16:31:13 metamora kernel: Hardware name: Supermicro X10DRi/X10DRi, BIOS 2.0 12/28/2015
Jun 20 16:31:13 metamora kernel: ffffea0023238000 00000000888bd1fd ffff883d6a677c50 ffffffff8163571c
Jun 20 16:31:13 metamora kernel: ffff883d6a677c78 ffffffff81630935 ffffea0023238000 0000000000000000
Jun 20 16:31:13 metamora kernel: 000fffff00000000 ffff883d6a677cc0 ffffffff811711ad fff00000fe000000
Jun 20 16:31:13 metamora kernel: Call Trace:
Jun 20 16:31:13 metamora kernel: [<ffffffff8163571c>] dump_stack+0x19/0x1b
Jun 20 16:31:13 metamora kernel: [<ffffffff81630935>] bad_page.part.59+0xdf/0xfc
Jun 20 16:31:13 metamora kernel: [<ffffffff811711ad>] free_pages_prepare+0x16d/0x190
Jun 20 16:31:13 metamora kernel: [<ffffffff81171589>] free_compound_page+0x29/0x40
Jun 20 16:31:13 metamora kernel: [<ffffffff81176ddf>] __put_compound_page+0x1f/0x30
Jun 20 16:31:13 metamora kernel: [<ffffffff81176e51>] put_compound_page+0x31/0x170
Jun 20 16:31:13 metamora kernel: [<ffffffff81176fbc>] put_page+0x2c/0x40
Jun 20 16:31:13 metamora kernel: [<ffffffff811c8737>] migrate_misplaced_transhuge_page+0x5a7/0x6c0
Jun 20 16:31:13 metamora kernel: [<ffffffff811cadc2>] do_huge_pmd_numa_page+0x1d2/0x340
Jun 20 16:31:13 metamora kernel: [<ffffffff811971b4>] handle_mm_fault+0x5e4/0xf50
Jun 20 16:31:13 metamora kernel: [<ffffffff81230b84>] ? locks_free_lock+0x64/0x70
Jun 20 16:31:13 metamora kernel: [<ffffffff81233636>] ? fcntl_setlk+0x66/0x310
Jun 20 16:31:13 metamora kernel: [<ffffffff816413c0>] __do_page_fault+0x150/0x450
Jun 20 16:31:13 metamora kernel: [<ffffffff816416e3>] do_page_fault+0x23/0x80
Jun 20 16:31:13 metamora kernel: [<ffffffff8163d948>] page_fault+0x28/0x30

I am not sure if this is a real problem or is simply cosmetic.

Any help or suggestions would be much appreciated.

Software versions:

Linux metamora.ifa.net 3.10.0-327.18.2.el7.x86_64 #1 SMP Thu May 12 11:03:55 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Installed Packages
Name    : ctdb
Arch    : x86_64
Version : 4.2.10
Release : 6.el7_2
Size    : 1.2 M

Installed Packages
Name    : samba
Arch    : x86_64
Version : 4.2.10
Release : 6.el7_2
Size    : 1.8 M

Installed Packages
Name    : zfs
Arch    : x86_64
Version : 0.6.5.7
Release : 1.el7.centos
Size    : 800 k

Installed Packages
Name    : moosefs-pro-client
Arch    : x86_64
Version : 3.0.78
Release : 1.rhsystemd
Size    : 595 k

Thanks,

Alex
--
This message is intended only for the addressee and may contain
confidential information. Unless you are that person, you may not
disclose its contents or use it in any way and are requested to delete
the message along with any attachments and notify us immediately.
This email is not intended to, nor should it be taken to, constitute advice.
The information provided is correct to our knowledge & belief and must not
be used as a substitute for obtaining tax, regulatory, investment, legal or
any other appropriate advice.

"Transact" is operated by Integrated Financial Arrangements Ltd.
29 Clement's Lane, London EC4N 7AE. Tel: (020) 7608 4900 Fax: (020) 7608 5300.
(Registered office: as above; Registered in England and Wales under
number: 3727592). Authorised and regulated by the Financial Conduct
Authority (entered on the Financial Services Register; no. 190856).

Hello!

Sorry to say that, but this list lives in user space. If your kernel
crashes, you should contact your CentOS support for help.

Regards, Volker

On Wed, Jun 22, 2016 at 09:57:43AM +0100, Alex Crow wrote:
> Hi List,
>
> We're currently seeing a kernel BUG in conjunction with CTDB.
> [...]

On 22/06/16 10:48, Volker Lendecke wrote:
> Hello!
>
> Sorry to say that, but this list lives in user space. If your kernel
> crashes, you should contact your CentOS support for help.
>
> Regards, Volker

Hi,

Thanks, just found it odd that it was only that process having problems.

Alex