Alexandru Cardaniuc
2014-Jul-01  08:57 UTC
[CentOS] corruption of in-memory data detected (xfs)
Hi All,

I am having an issue with an XFS filesystem shutting down under high load with very many small files. Basically, I have around 3.5 - 4 million files on this filesystem. New files are being written to the FS all the time, until I get to 9-11 million small files (35k on average).

At some point I get the following in dmesg:

[2870477.695512] Filesystem "sda5": XFS internal error xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c.  Caller 0xffffffff8826bb7d
[2870477.695558]
[2870477.695559] Call Trace:
[2870477.695611]  [<ffffffff88262c28>] :xfs:xfs_trans_cancel+0x5b/0xfe
[2870477.695643]  [<ffffffff8826bb7d>] :xfs:xfs_mkdir+0x57c/0x5d7
[2870477.695673]  [<ffffffff8822f3f8>] :xfs:xfs_attr_get+0xbf/0xd2
[2870477.695707]  [<ffffffff88273326>] :xfs:xfs_vn_mknod+0x1e1/0x3bb
[2870477.695726]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
[2870477.695736]  [<ffffffff802230e6>] __up_read+0x19/0x7f
[2870477.695764]  [<ffffffff8824f8f4>] :xfs:xfs_iunlock+0x57/0x79
[2870477.695776]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
[2870477.695784]  [<ffffffff802230e6>] __up_read+0x19/0x7f
[2870477.695791]  [<ffffffff80209f4c>] __d_lookup+0xb0/0xff
[2870477.695803]  [<ffffffff8020cd4a>] _atomic_dec_and_lock+0x39/0x57
[2870477.695814]  [<ffffffff8022d6db>] mntput_no_expire+0x19/0x89
[2870477.695829]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
[2870477.695837]  [<ffffffff802230e6>] __up_read+0x19/0x7f
[2870477.695861]  [<ffffffff8824f8f4>] :xfs:xfs_iunlock+0x57/0x79
[2870477.695887]  [<ffffffff882680af>] :xfs:xfs_access+0x3d/0x46
[2870477.695899]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
[2870477.695923]  [<ffffffff802df4a3>] vfs_mkdir+0xe3/0x152
[2870477.695933]  [<ffffffff802dfa79>] sys_mkdirat+0xa3/0xe4
[2870477.695953]  [<ffffffff80260295>] tracesys+0x47/0xb6
[2870477.695963]  [<ffffffff802602f9>] tracesys+0xab/0xb6
[2870477.695977]
[2870477.695985] xfs_force_shutdown(sda5,0x8) called from line 1139 of file fs/xfs/xfs_trans.c.  Return address 0xffffffff88262c46
[2870477.696452] Filesystem "sda5": Corruption of in-memory data detected.  Shutting down filesystem: sda5
[2870477.696464] Please umount the filesystem, and rectify the problem(s)

# ls -l /store
ls: /store: Input/output error
?--------- 0 root root 0 Jan  1  1970 /store

The filesystem is ~1T in size:

# df -hT /store
Filesystem    Type    Size  Used Avail Use% Mounted on
/dev/sda5      xfs    910G  142G  769G  16% /store

Using CentOS 5.9 with kernel 2.6.18-348.el5xen.

The filesystem is in a virtual machine (Xen) and on top of LVM.

The filesystem was created using mkfs.xfs defaults with xfsprogs-2.9.4-1.el5.centos (that's the one that comes with CentOS 5.x by default).

These are the defaults with which the filesystem was created:

# xfs_info /store
meta-data=/dev/sda5              isize=256    agcount=32, agsize=7454720 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=238551040, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0

The problem is reproducible and I don't think it's hardware related. The problem was reproduced on multiple servers of the same type, so I doubt it's a memory issue or something like that.

Is this a known issue? If it is, then what's the fix? I went through the kernel updates for CentOS 5.10 (newer kernel), but didn't see any XFS-related fixes since CentOS 5.9.

Any help will be greatly appreciated...

--
"If we really understand the problem, the answer will come out of it, because the answer is not separate from the problem." - Krishnamurti
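The numbers in the xfs_info and df output above can be cross-checked with a little arithmetic. This is a minimal sketch that only uses values copied from the post; the function name geometry_summary is made up for illustration.

```python
# Values copied from the xfs_info output in the original post.
BSIZE = 4096          # data block size in bytes (bsize)
BLOCKS = 238551040    # number of data blocks (blocks)
AGCOUNT = 32          # number of allocation groups (agcount)
AGSIZE = 7454720      # blocks per allocation group (agsize)
LOG_BLOCKS = 32768    # blocks in the internal log


def geometry_summary():
    """Derive sizes from the reported geometry (hypothetical helper)."""
    total_bytes = BLOCKS * BSIZE
    return {
        "total_bytes": total_bytes,
        "total_gib": total_bytes / 2**30,      # compare with "910G" in df
        "ag_blocks": AGCOUNT * AGSIZE,         # should match BLOCKS
        "log_mib": LOG_BLOCKS * BSIZE / 2**20, # size of the internal log
    }


for name, value in geometry_summary().items():
    print(name, value)
```

The block count times the block size matches the 910G that df reports, and agcount times agsize matches the block count, so the reported geometry is at least self-consistent.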
James A. Peltier
2014-Jul-01  18:28 UTC
[CentOS] corruption of in-memory data detected (xfs)
----- Original Message -----
| I am having an issue with an XFS filesystem shutting down under high
| load with very many small files.
| [...]
| [2870477.695512] Filesystem "sda5": XFS internal error
| xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c.
| [...]

Is this filesystem mounted with the inode64 option?

--
James A. Peltier
Manager, IT Services - Research Computing Group
Simon Fraser University - Burnaby Campus
Phone : 778-782-6573
Fax : 778-782-3045
E-Mail : jpeltier at sfu.ca
Website : http://www.sfu.ca/itservices

To be original seek your inspiration from unexpected sources.
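One way to relate the inode64 question to this particular filesystem is to estimate the largest inode number its geometry could produce. The sketch below assumes the usual XFS inode number layout (allocation group number in the high bits, then the block within the group, then the inode slot within the block). That layout is an assumption, not something stated in this thread, and the function name is made up for illustration.

```python
def max_inode_number(agcount, agsize_blocks, block_size, inode_size):
    """Upper bound on inode numbers for a given geometry (hypothetical helper).

    Assumes inode number = (AG number, block in AG, inode slot in block),
    packed from high bits to low bits.
    """
    inodes_per_block = block_size // inode_size
    # Bits for the slot within a block; assumes a power-of-two count.
    slot_bits = inodes_per_block.bit_length() - 1
    # Bits needed to address any block inside one allocation group.
    block_bits = (agsize_blocks - 1).bit_length()

    last_ag = agcount - 1
    last_block = agsize_blocks - 1
    last_slot = inodes_per_block - 1
    return (last_ag << (block_bits + slot_bits)) | (last_block << slot_bits) | last_slot


# Geometry from the xfs_info output in the original post.
bound = max_inode_number(agcount=32, agsize_blocks=7454720,
                         block_size=4096, inode_size=256)
print(bound, bound < 2**32)
```

If that layout holds, the bound for the posted geometry stays below 2**32, which would suggest that inode64 makes little difference on a filesystem of this size. That is only an estimate, so it is still worth checking the actual mount options.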
James A. Peltier
2014-Jul-01  18:32 UTC
[CentOS] corruption of in-memory data detected (xfs)
----- Original Message -----
| Is that a known issue? If it is then what's the fix? I went through
| the kernel updates for CentOS 5.10 (newer
| kernel), but didn't see any xfs related fixes since CentOS 5.9
| [...]

Sorry, further to this: most bugs seen with XFS turn out to be kernel bugs. I can see that you're running an older kernel, and just because you don't see the bugs listed in the errata doesn't mean they haven't been found as part of the backport process.

--
James A. Peltier
Eliezer Croitoru
2014-Jul-02  02:22 UTC
[CentOS] corruption of in-memory data detected (xfs)
I had a similar issue: an NFS server with XFS as the filesystem, used for backups of a very large system. I have a 2TB RAID-1 volume; I started to rsync the backup onto it and somewhere along the way I got this issue. There were lots of files there, and the system has 8GB of RAM and runs CentOS 6.5 64-bit. I didn't bother to look into the issue, because ReiserFS handled the same job without any problems.

I never knew about the inode64 option. Is it only a mount option, or also an option to the mkfs.xfs command? Also, in case I want to test it again, what would you recommend to avoid crashing the system when a lot of memory is in use?

Thanks,
Eliezer

On 07/01/2014 11:57 AM, Alexandru Cardaniuc wrote:
> I am having an issue with an XFS filesystem shutting down under high load with very many small files.
> [...]
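On the question of where inode64 is set: as far as I know it is given at mount time (for example with mount -o inode64, or in the options column of /etc/fstab) rather than to mkfs.xfs. A minimal sketch for checking whether a mount point currently carries a given option, assuming the kernel lists that option in /proc/mounts; the function name and the sample line are made up for illustration.

```python
def has_mount_option(mounts_text, mount_point, option):
    """Return True if the first entry for mount_point lists option.

    mounts_text is text in /proc/mounts format:
    device mount_point fstype options dump pass
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            return option in fields[3].split(",")
    return False


# Made-up sample in /proc/mounts format, for illustration only.
sample = "/dev/sda5 /store xfs rw,noatime,inode64 0 0\n"
print(has_mount_option(sample, "/store", "inode64"))
```

On a live system the same function could be fed the contents of /proc/mounts instead of the sample string.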