stephen.rankin at stfc.ac.uk
2014-Mar-10 17:39 UTC
[CentOS] gfs2 and quotas - system crash
I have tried sending this before, but it did not appear to get through.

Hello,

When using gfs2 with quotas on a SAN that provides storage to two clustered systems running CentOS 6.5, one of the systems can crash. The crash appears to occur when a user tries to write to a SAN disk after exceeding their quota on that disk. Sometimes a stack trace is produced in /var/log/messages, which appears to indicate that gfs2 caused the problem. Alongside the gfs2 stack trace, the log also shows messages about a user exceeding their quota.

The stack trace is below.

Has anyone got a solution to this, other than switching off quotas? I have switched off quotas, which appears to have stabilised the system so far, but I do need quotas on.

Your help is appreciated.

Stephen Rankin
STFC, RAL, ISIS

Mar 5 11:40:50 chadwick kernel: GFS2: fsid=analysis:lvol0.1: quota exceeded for user 101355
Mar 5 11:40:50 chadwick nslcd[11420]: [767df3] ldap_explode_dn(usi660) returned NULL: Success
Mar 5 11:40:50 chadwick nslcd[11420]: [767df3] ldap_result() failed: Invalid DN syntax
Mar 5 11:40:50 chadwick nslcd[11420]: [767df3] lookup of user usi660 failed: Invalid DN syntax
Mar 5 11:41:46 chadwick kernel: ------------[ cut here ]------------
Mar 5 11:41:46 chadwick kernel: WARNING: at lib/list_debug.c:26 __list_add+0x6d/0xa0() (Not tainted)
Mar 5 11:41:46 chadwick kernel: Hardware name: PowerEdge R910
Mar 5 11:41:46 chadwick kernel: list_add corruption. next->prev should be prev (ffff8820531518d0), but was ffff884d4c4594d0. (next=ffff884d4c4594d0).
Mar 5 11:41:46 chadwick kernel: Modules linked in: gfs2 dlm configfs bridge autofs4 des_generic ecb md4 nls_utf8 cifs bnx2fc cnic uio fcoe libfcoe libfc 8021q garp stp llc ipv6 microcode power_meter iTCO_wdt iTCO_vendor_support dcdbas serio_raw ixgbe dca ptp pps_core mdio lpc_ich mfd_core sg ses enclosure i7core_edac edac_core bnx2 ext4 jbd2 mbcache dm_round_robin sr_mod cdrom sd_mod crc_t10dif qla2xxx scsi_transport_fc scsi_tgt pata_acpi ata_generic ata_piix megaraid_sas dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
Mar 5 11:41:46 chadwick kernel: Pid: 74823, comm: vncserver Not tainted 2.6.32-431.3.1.el6.x86_64 #1
Mar 5 11:41:46 chadwick kernel: Call Trace:
Mar 5 11:41:46 chadwick kernel: [<ffffffff81071e27>] ? warn_slowpath_common+0x87/0xc0
Mar 5 11:41:46 chadwick kernel: [<ffffffff81071f16>] ? warn_slowpath_fmt+0x46/0x50
Mar 5 11:41:46 chadwick kernel: [<ffffffff812944ed>] ? __list_add+0x6d/0xa0
Mar 5 11:41:46 chadwick kernel: [<ffffffff811a6c02>] ? new_inode+0x72/0xb0
Mar 5 11:41:46 chadwick kernel: [<ffffffffa03f45d5>] ? gfs2_create_inode+0x1b5/0x1150 [gfs2]
Mar 5 11:41:46 chadwick kernel: [<ffffffffa03f3986>] ? gfs2_glock_nq_init+0x16/0x40 [gfs2]
Mar 5 11:41:46 chadwick kernel: [<ffffffffa03ffc74>] ? gfs2_mkdir+0x24/0x30 [gfs2]
Mar 5 11:41:46 chadwick kernel: [<ffffffff8122766f>] ? security_inode_mkdir+0x1f/0x30
Mar 5 11:41:46 chadwick kernel: [<ffffffff81198149>] ? vfs_mkdir+0xd9/0x140
Mar 5 11:41:46 chadwick kernel: [<ffffffff8119ab67>] ? sys_mkdirat+0xc7/0x1b0
Mar 5 11:41:46 chadwick kernel: [<ffffffff8119ac68>] ? sys_mkdir+0x18/0x20
Mar 5 11:41:46 chadwick kernel: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
Mar 5 11:41:46 chadwick kernel: ---[ end trace e51734a39976a028 ]---
Mar 5 11:41:46 chadwick kernel: GFS2: fsid=analysis:lvol0.1: quota exceeded for user 101355
Mar 5 11:41:47 chadwick abrtd: Directory 'oops-2014-03-05-11:41:47-12194-1' creation detected
Mar 5 11:41:47 chadwick abrt-dump-oops: Reported 1 kernel oopses to Abrt
Mar 5 11:41:47 chadwick abrtd: Can't open file '/var/spool/abrt/oops-2014-03-05-11:41:47-12194-1/uid': No such file or directory
Mar 5 11:41:54 chadwick kernel: GFS2: fsid=analysis:lvol0.1: quota exceeded for user 101355
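
To check whether the kernel warnings really coincide with the quota messages, the log could be scanned with something like the sketch below. It is only an illustration: it assumes the syslog line format seen in this excerpt (month, day, time, host, message, with no year), and the function names, the `year` default and the 60-second window are hypothetical choices, not anything provided by gfs2 or CentOS.

```python
import re
from datetime import datetime, timedelta

# Patterns taken from the message shapes in the excerpt above.
STAMP_RE = re.compile(r"^(\w{3})\s+(\d+)\s+(\d\d:\d\d:\d\d)\s")
QUOTA_RE = re.compile(r"GFS2: fsid=(\S+): quota exceeded for user (\d+)")
WARN_RE = re.compile(r"kernel: WARNING: at (\S+)")


def parse_stamp(line, year=2014):
    """Return the timestamp of a syslog line, or None if it has none.

    Syslog omits the year, so it has to be supplied (assumption: 2014).
    """
    m = STAMP_RE.match(line)
    if not m:
        return None
    text = f"{year} {m.group(1)} {m.group(2)} {m.group(3)}"
    return datetime.strptime(text, "%Y %b %d %H:%M:%S")


def warnings_near_quota(lines, window=timedelta(seconds=60)):
    """Pair each kernel WARNING with the user ids that hit their quota
    within `window` of it (before or after)."""
    quota_events = []  # (timestamp, uid)
    warnings = []      # (timestamp, source location)
    for line in lines:
        ts = parse_stamp(line)
        if ts is None:
            continue
        q = QUOTA_RE.search(line)
        if q:
            quota_events.append((ts, q.group(2)))
            continue
        w = WARN_RE.search(line)
        if w:
            warnings.append((ts, w.group(1)))
    result = []
    for wts, where in warnings:
        uids = sorted({uid for qts, uid in quota_events
                       if abs(wts - qts) <= window})
        result.append((wts, where, uids))
    return result


if __name__ == "__main__":
    with open("/var/log/messages") as fh:
        for ts, where, uids in warnings_near_quota(fh):
            print(ts, where, "quota hits nearby:", ", ".join(uids) or "none")
```

A warning that repeatedly shows up with an empty list of nearby quota hits would suggest the two are not as tightly linked as the excerpt implies.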
I've not seen this before, but I am sure the cluster folks would be interested to see it. Can you repost/cross-post this to the linux-cluster mailing list?

https://www.redhat.com/mailman/listinfo/linux-cluster

digimer

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?