thr3ads.net - CentOS - [CentOS] corruption of in-memory data detected (xfs) [Jul 2014]

Alexandru Cardaniuc

2014-Jul-01 08:57 UTC

[CentOS] corruption of in-memory data detected (xfs)

Hi All,

I am having an issue with an XFS filesystem shutting down under high load with
very many small files.
Basically, I have around 3.5 - 4 million files on this filesystem. New files are
being written to the FS all the time, until I get to 9-11 million small files
(35k on average).

At some point I get the following in dmesg:

[2870477.695512] Filesystem "sda5": XFS internal error
xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c.
Caller 0xffffffff8826bb7d
[2870477.695558]
[2870477.695559] Call Trace:
[2870477.695611]  [<ffffffff88262c28>] :xfs:xfs_trans_cancel+0x5b/0xfe
[2870477.695643]  [<ffffffff8826bb7d>] :xfs:xfs_mkdir+0x57c/0x5d7
[2870477.695673]  [<ffffffff8822f3f8>] :xfs:xfs_attr_get+0xbf/0xd2
[2870477.695707]  [<ffffffff88273326>] :xfs:xfs_vn_mknod+0x1e1/0x3bb
[2870477.695726]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
[2870477.695736]  [<ffffffff802230e6>] __up_read+0x19/0x7f
[2870477.695764]  [<ffffffff8824f8f4>] :xfs:xfs_iunlock+0x57/0x79
[2870477.695776]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
[2870477.695784]  [<ffffffff802230e6>] __up_read+0x19/0x7f
[2870477.695791]  [<ffffffff80209f4c>] __d_lookup+0xb0/0xff
[2870477.695803]  [<ffffffff8020cd4a>] _atomic_dec_and_lock+0x39/0x57
[2870477.695814]  [<ffffffff8022d6db>] mntput_no_expire+0x19/0x89
[2870477.695829]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
[2870477.695837]  [<ffffffff802230e6>] __up_read+0x19/0x7f
[2870477.695861]  [<ffffffff8824f8f4>] :xfs:xfs_iunlock+0x57/0x79
[2870477.695887]  [<ffffffff882680af>] :xfs:xfs_access+0x3d/0x46
[2870477.695899]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
[2870477.695923]  [<ffffffff802df4a3>] vfs_mkdir+0xe3/0x152
[2870477.695933]  [<ffffffff802dfa79>] sys_mkdirat+0xa3/0xe4
[2870477.695953]  [<ffffffff80260295>] tracesys+0x47/0xb6
[2870477.695963]  [<ffffffff802602f9>] tracesys+0xab/0xb6
[2870477.695977]
[2870477.695985] xfs_force_shutdown(sda5,0x8) called from line 1139 of file
fs/xfs/xfs_trans.c.  Return address 0xffffffff88262c46
[2870477.696452] Filesystem "sda5": Corruption of in-memory data
detected.  Shutting down filesystem: sda5
[2870477.696464] Please umount the filesystem, and rectify the problem(s)
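
For anyone who wants to pull the code path out of a trace like the one above, here is a minimal sketch in Python. It assumes the frame format shown in this dmesg output (an address in `[<...>]`, an optional `:module:` prefix, then `symbol+0xoff/0xlen`); the function names `parse_frames` and `xfs_frames` are made up for illustration, and the sample is a four-line excerpt of the trace.

```python
import re

# Matches frames such as "[<ffffffff8826bb7d>] :xfs:xfs_mkdir+0x57c/0x5d7"
# and "[<ffffffff802df4a3>] vfs_mkdir+0xe3/0x152" (no module prefix).
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?::(?P<module>\w+):)?"
    r"(?P<symbol>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

def parse_frames(dmesg_text):
    """Return (module, symbol) pairs in the order they appear in the trace.

    Frames without a ":module:" prefix are reported under "kernel".
    """
    return [
        (m.group("module") or "kernel", m.group("symbol"))
        for m in FRAME_RE.finditer(dmesg_text)
    ]

def xfs_frames(dmesg_text):
    """Return only the symbols that belong to the xfs module."""
    return [sym for mod, sym in parse_frames(dmesg_text) if mod == "xfs"]

# Excerpt of the trace from this post.
sample = """\
[2870477.695611]  [<ffffffff88262c28>] :xfs:xfs_trans_cancel+0x5b/0xfe
[2870477.695643]  [<ffffffff8826bb7d>] :xfs:xfs_mkdir+0x57c/0x5d7
[2870477.695923]  [<ffffffff802df4a3>] vfs_mkdir+0xe3/0x152
[2870477.695933]  [<ffffffff802dfa79>] sys_mkdirat+0xa3/0xe4
"""

print(xfs_frames(sample))
```

Read bottom-up, the excerpt shows a mkdir system call entering XFS through xfs_mkdir, which then cancels its transaction; that cancel is where the shutdown is triggered.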

# ls -l /store
ls: /store: Input/output error
?--------- 0 root root 0 Jan  1  1970 /store

The filesystem is ~1T in size:
# df -hT /store
Filesystem    Type    Size  Used Avail Use% Mounted on
/dev/sda5      xfs    910G  142G  769G  16% /store


Using CentOS 5.9 with kernel 2.6.18-348.el5xen


The filesystem is in a virtual machine (Xen) and on top of LVM.

The filesystem was created using mkfs.xfs defaults with
xfsprogs-2.9.4-1.el5.centos (that's the one that comes with CentOS 5.x by
default).

These are the defaults with which the filesystem was created:
# xfs_info /store
meta-data=/dev/sda5              isize=256    agcount=32, agsize=7454720 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=238551040, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0
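
As a sanity check on the geometry above, the numbers can be worked through in a few lines of Python. This is only a sketch: the constants are copied from the xfs_info output, the helper names are hypothetical, and the inode-number calculation assumes the usual XFS layout (allocation group number in the high bits, followed by log2 of the AG size in blocks rounded up, followed by log2 of inodes per block). Treat the result as a hint, not a diagnosis.

```python
# Values copied from the xfs_info output in this post.
BLOCK_SIZE = 4096          # data bsize
DATA_BLOCKS = 238551040    # data blocks
AG_COUNT = 32              # agcount
AG_SIZE_BLOCKS = 7454720   # agsize
INODE_SIZE = 256           # isize

def size_gib(blocks, block_size):
    """Filesystem size in GiB."""
    return blocks * block_size / 2**30

def ceil_log2(n):
    """Smallest k such that 2**k >= n, for n >= 1."""
    return (n - 1).bit_length()

def inode_number_bits(ag_count, ag_size_blocks, block_size, inode_size):
    """Bits needed for the largest inode number, under the assumed layout:
    AG number | block within AG | inode within block."""
    ag_bits = ceil_log2(ag_count)
    agblklog = ceil_log2(ag_size_blocks)
    inopblog = ceil_log2(block_size // inode_size)
    return ag_bits + agblklog + inopblog

print(AG_COUNT * AG_SIZE_BLOCKS == DATA_BLOCKS)
print(size_gib(DATA_BLOCKS, BLOCK_SIZE))
print(inode_number_bits(AG_COUNT, AG_SIZE_BLOCKS, BLOCK_SIZE, INODE_SIZE))
```

The allocation groups add up to the data block count, and the size works out to the 910G that df reports. Under the stated layout assumption, inode numbers on this geometry need 32 bits.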

The problem is reproducible and I don't think it's hardware related. It was
reproduced on multiple servers of the same type, so I doubt it's a memory issue
or something like that.

Is this a known issue? If it is, what's the fix? I went through the kernel
updates for CentOS 5.10 (newer kernel), but didn't see any XFS-related fixes
since CentOS 5.9.

Any help will be greatly appreciated...


-- 
"If we really understand the problem, the answer will come out of it,
because the answer is not separate from the problem."  
- Krishnamurti

James A. Peltier

2014-Jul-01 18:28 UTC



Is this filesystem mounted with the inode64 option?
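
One way to check is to look at the options field for the mount point in /proc/mounts. A minimal sketch in Python, assuming the standard six-field /proc/mounts line format; the helper names are made up for illustration and the sample lines are hypothetical, not taken from the affected machine:

```python
def mount_options(mounts_text, mount_point):
    """Return the option list for mount_point from /proc/mounts-style text.

    Returns None if the mount point is not listed. If it appears more than
    once, the last entry wins, since later mounts shadow earlier ones.
    """
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 4 and fields[1] == mount_point:
            options = fields[3].split(",")
    return options

def has_inode64(mounts_text, mount_point):
    """True if mount_point is listed and its options include inode64."""
    options = mount_options(mounts_text, mount_point)
    return options is not None and "inode64" in options

# Hypothetical /proc/mounts content for illustration only.
sample_mounts = """\
/dev/sda1 / ext3 rw,data=ordered 0 0
/dev/sda5 /store xfs rw,inode64 0 0
"""

print(has_inode64(sample_mounts, "/store"))
```

On a live system the text would come from reading /proc/mounts and passing its contents in place of `sample_mounts`.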

-- 
James A. Peltier
Manager, IT Services - Research Computing Group
Simon Fraser University - Burnaby Campus
Phone   : 778-782-6573
Fax     : 778-782-3045
E-Mail  : jpeltier at sfu.ca
Website : http://www.sfu.ca/itservices

To be original seek your inspiration from unexpected sources.

James A. Peltier

2014-Jul-01 18:32 UTC



Sorry, further to this: most XFS problems turn out to be kernel bugs. I can
see that you're running an older kernel, and just because you don't see the
bugs listed in the errata doesn't mean they haven't been found as part of the
backport process.

-- 
James A. Peltier
Manager, IT Services - Research Computing Group
Simon Fraser University - Burnaby Campus
Phone   : 778-782-6573
Fax     : 778-782-3045
E-Mail  : jpeltier at sfu.ca
Website : http://www.sfu.ca/itservices

To be original seek your inspiration from unexpected sources.

Eliezer Croitoru

2014-Jul-02 02:22 UTC


I had a similar issue:
An NFS server with XFS as the filesystem, used for backups of a very large
system.
I have a 2TB RAID-1 volume; I started an rsync of the backup and somewhere
along the way I hit this issue.
There were lots of files there, and the system has 8GB of RAM and runs CentOS
6.5 64-bit.
I didn't bother to look into the issue because ReiserFS handled the same
workload without any problems.

I never knew about the inode64 option. Is it only a mount option, or is it
also set on the mkfs.xfs command?

Also, in case I want to test it again, what would you recommend so that the
system doesn't crash when lots of memory is in use?

Thanks,
Eliezer

