Hi all -

After several months of worry-free operation, we received the following kernel messages about an xfs filesystem running under CentOS 6.6. The proximate causes appear to be "Internal error xfs_trans_cancel" and "Corruption of in-memory data detected. Shutting down filesystem". The filesystem is back up, mounted, appears to be working OK underlying a Splunk datastore. Does anyone have a suggestion on diagnosis or known problems?

Many thanks.....Nick Geo

Sep 18 20:35:15 gries kernel: XFS (dm-2): Internal error xfs_trans_cancel at line 1948 of file fs/xfs/xfs_trans.c. Caller 0xffffffffa01f1388
Sep 18 20:35:15 gries kernel:
Sep 18 20:35:15 gries kernel: Pid: 24005, comm: splunkd Not tainted 2.6.32-504.8.1.el6.x86_64 #1
Sep 18 20:35:15 gries kernel: Call Trace:
Sep 18 20:35:15 gries kernel: [<ffffffffa01d57bf>] ? xfs_error_report+0x3f/0x50 [xfs]
Sep 18 20:35:15 gries kernel: [<ffffffffa01f1388>] ? xfs_rename+0x2d8/0x720 [xfs]
Sep 18 20:35:15 gries kernel: [<ffffffffa01f2e55>] ? xfs_trans_cancel+0xf5/0x120 [xfs]
Sep 18 20:35:15 gries kernel: [<ffffffffa01f1388>] ? xfs_rename+0x2d8/0x720 [xfs]
Sep 18 20:35:15 gries kernel: [<ffffffff8114eef9>] ? __do_fault+0x469/0x530
Sep 18 20:35:15 gries kernel: [<ffffffffa02050d6>] ? xfs_vn_rename+0x66/0x70 [xfs]
Sep 18 20:35:15 gries kernel: [<ffffffff8119d149>] ? vfs_rename+0x419/0x480
Sep 18 20:35:15 gries kernel: [<ffffffff8119fab9>] ? sys_renameat+0x309/0x3a0
Sep 18 20:35:15 gries kernel: [<ffffffff8128c295>] ? _atomic_dec_and_lock+0x55/0x80
Sep 18 20:35:15 gries kernel: [<ffffffff811b07e0>] ? mntput_no_expire+0x30/0x110
Sep 18 20:35:15 gries kernel: [<ffffffff810e5c87>] ? audit_syscall_entry+0x1d7/0x200
Sep 18 20:35:15 gries kernel: [<ffffffff8119fb6b>] ? sys_rename+0x1b/0x20
Sep 18 20:35:15 gries kernel: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
Sep 18 20:35:15 gries kernel: XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 1949 of file fs/xfs/xfs_trans.c. Return address = 0xffffffffa01f2e6e
Sep 18 20:35:15 gries kernel: XFS (dm-2): Corruption of in-memory data detected. Shutting down filesystem
Sep 18 20:35:15 gries kernel: XFS (dm-2): Please umount the filesystem and rectify the problem(s)
Sep 18 20:35:27 gries kernel: XFS (dm-2): xfs_log_force: error 5 returned.
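For anyone triaging a similar report, here is a minimal Python sketch that pulls the XFS messages and the call-trace symbols out of syslog text like the above. The sample lines are copied from the log in this post (abbreviated); the function name summarize and the two regular expressions are illustrative assumptions, not part of any existing tool.

```python
import re

# Sample lines copied from the kernel log in this post (abbreviated).
LOG = """\
Sep 18 20:35:15 gries kernel: XFS (dm-2): Internal error xfs_trans_cancel at line 1948 of file fs/xfs/xfs_trans.c. Caller 0xffffffffa01f1388
Sep 18 20:35:15 gries kernel: [<ffffffffa01f1388>] ? xfs_rename+0x2d8/0x720 [xfs]
Sep 18 20:35:15 gries kernel: XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 1949 of file fs/xfs/xfs_trans.c. Return address = 0xffffffffa01f2e6e
Sep 18 20:35:15 gries kernel: XFS (dm-2): Corruption of in-memory data detected. Shutting down filesystem
Sep 18 20:35:27 gries kernel: XFS (dm-2): xfs_log_force: error 5 returned.
"""

# Hypothetical patterns for the two line shapes seen in this log.
XFS_MSG = re.compile(r"kernel: XFS \((?P<dev>[^)]+)\): (?P<msg>.*)$")
FRAME = re.compile(r"kernel: \[<(?P<addr>[0-9a-f]+)>\] \? (?P<sym>\S+)")


def summarize(log_text):
    """Return (xfs_messages, call_trace_symbols) found in syslog text."""
    messages, frames = [], []
    for line in log_text.splitlines():
        m = XFS_MSG.search(line)
        if m:
            # Keep the device name so multiple filesystems can be told apart.
            messages.append((m.group("dev"), m.group("msg")))
            continue
        f = FRAME.search(line)
        if f:
            frames.append(f.group("sym"))
    return messages, frames


if __name__ == "__main__":
    msgs, syms = summarize(LOG)
    for dev, msg in msgs:
        print(dev, "|", msg)
    print("frames:", syms)
```

Feeding it the full /var/log/messages excerpt instead of the LOG sample should list every frame of the trace in order, which makes it easier to see that the failing path is a rename issued by splunkd.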
I think you need to read this from the bottom up:

"Corruption of in-memory data detected. Shutting down filesystem", so XFS calls xfs_do_force_shutdown to shut down the filesystem. The call comes from fs/xfs/xfs_trans.c, which fails and so reports "Internal error xfs_trans_cancel".

In other words, I would look at the memory corruption first. This _could_ be a kernel problem, but I would suggest starting with an extended memory check; it smells to me of a failing chip.

Just my 2d worth!

Martin

On 21/09/15 21:41, Nicholas Geovanis wrote:
> Hi all - After several months of worry-free operation, we received
> the following kernel messages about an xfs filesystem running under
> CentOS 6.6. [...]
> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> https://lists.centos.org/mailman/listinfo/centos
----- Original Message -----
| I think you need to read this from the bottom up:
| [...]
| In other words, I would look at the memory corruption first.
| [...]

Do you have any XFS optimizations enabled in /etc/fstab, such as logbsize, nobarrier, etc.? Is the filesystem full? What percentage of the file system is available?

Some optimizations will cause a similar type of error when there is insufficient space for the extent allocations to take place or for file system rebalances to happen.

--
James A. Peltier
IT Services - Research Computing Group
Simon Fraser University - Burnaby Campus
Phone   : 604-365-6432
Fax     : 778-782-3045
E-Mail  : jpeltier at sfu.ca
Website : http://www.sfu.ca/itservices
Twitter : @sfu_rcg
Powering Engagement Through Technology
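One way to gather the answers to the questions above is a small Python sketch that lists XFS mounts, shows any log or barrier tuning options, and reports free space. It reads /proc/mounts (the options actually in effect) rather than /etc/fstab, which is an assumption on my part; the helper names and the TUNING_OPTS list are illustrative only and not exhaustive.

```python
import os
import re

# Options named in the question plus logbufs; an illustrative list, not exhaustive.
TUNING_OPTS = ("logbsize", "logbufs", "nobarrier")


def unescape(path):
    """Decode the octal escapes (e.g. \\040 for space) used in /proc/mounts."""
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), path)


def xfs_mounts(mounts_text):
    """Yield (device, mountpoint, options) for each xfs entry in mounts text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] == "xfs":
            yield fields[0], unescape(fields[1]), fields[3].split(",")


def tuning_options(options):
    """Return only the options whose name appears in TUNING_OPTS."""
    return [o for o in options if o.split("=")[0] in TUNING_OPTS]


def percent_free(mountpoint):
    """Percentage of blocks available to unprivileged users on a mount."""
    st = os.statvfs(mountpoint)
    return 100.0 * st.f_bavail / st.f_blocks if st.f_blocks else 0.0


if __name__ == "__main__":
    with open("/proc/mounts") as fh:
        for dev, mnt, opts in xfs_mounts(fh.read()):
            print(dev, mnt, tuning_options(opts), "%.1f%% free" % percent_free(mnt))
```

An empty option list together with plenty of free space would make the tuning and full-filesystem explanations less likely, though it would not rule out other causes.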