Hello all,

I have a CentOS 5.7 machine hosting a 16 TB XFS partition used to house backups. The backups are run via rsync/rsnapshot and are large in terms of the number of files: over 10 million each.

Now, the machine is not particularly powerful: a 64-bit machine, dual-core CPU, 3 GB RAM. So perhaps this is a factor in the following problem: once in a while that XFS partition starts generating multiple I/O errors, files that had content become 0 bytes, directories disappear, etc. Every time, a reboot fixes it. So far I've looked at the logs but could not find a cause or precipitating event.

Hence the question: has anyone experienced anything along those lines? What could be the cause of this?

Thanks.

Boris.
On Sun, Jan 22, 2012 at 9:06 AM, Boris Epstein <borepstein at gmail.com> wrote:

> Hello all,
>
> I have a CentOS 5.7 machine hosting a 16 TB XFS partition used to house
> backups. The backups are run via rsync/rsnapshot and are large in terms of
> the number of files: over 10 million each.
>
> Now the machine is not particularly powerful: it is a 64-bit machine, dual
> core CPU, 3 GB RAM. So perhaps this is a factor in why I am having the
> following problem: once in a while that XFS partition starts generating
> multiple I/O errors, files that had content become 0 byte, directories
> disappear, etc. Every time a reboot fixes that, however. So far I've looked
> at logs but could not find a cause or precipitating event.
>
> Hence the question: has anyone experienced anything along those lines?
> What could be the cause of this?
>
> Thanks.
>
> Boris.

A correction to the above: the XFS partition is 26 TB, not 16 TB (not that it should matter in the context of this particular situation).

Also, here's something else I have discovered. Apparently there is potential intermittent RAID disk trouble. At least I found the following in the system log:

Jan 22 09:17:53 nrims-bs kernel: 3w-9xxx: scsi6: AEN: ERROR (0x04:0x0026): Drive ECC error reported:port=4, unit=0.
Jan 22 09:17:53 nrims-bs kernel: 3w-9xxx: scsi6: AEN: ERROR (0x04:0x002D): Source drive error occurred:port=4, unit=0.
Jan 22 09:17:53 nrims-bs kernel: 3w-9xxx: scsi6: AEN: ERROR (0x04:0x0004): Rebuild failed:unit=0.
Jan 22 09:17:53 nrims-bs kernel: 3w-9xxx: scsi6: AEN: INFO (0x04:0x003B): Rebuild paused:unit=0.
...
Jan 22 09:55:23 nrims-bs kernel: 3w-9xxx: scsi6: AEN: WARNING (0x04:0x000F): SMART threshold exceeded:port=9.
Jan 22 09:55:23 nrims-bs kernel: 3w-9xxx: scsi6: AEN: WARNING (0x04:0x000F): SMART threshold exceeded:port=9.
Jan 22 09:56:17 nrims-bs kernel: 3w-9xxx: scsi6: AEN: INFO (0x04:0x000B): Rebuild started:unit=0.

Even if a disk is misbehaving, in a RAID6 that should not be causing I/O errors.
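For anyone wanting to sift these 3ware controller events out of the log themselves, a small grep/sed pipeline along these lines would do it. This is just a sketch: on CentOS 5 the real input would be /var/log/messages; a sample excerpt is written to /tmp here so the pipeline is self-contained.

```shell
# Extract 3ware (3w-9xxx) AEN events from a log and count them by
# severity.  A sample excerpt stands in for /var/log/messages.
cat > /tmp/sample_messages <<'EOF'
Jan 22 09:17:53 nrims-bs kernel: 3w-9xxx: scsi6: AEN: ERROR (0x04:0x0026): Drive ECC error reported:port=4, unit=0.
Jan 22 09:17:53 nrims-bs kernel: 3w-9xxx: scsi6: AEN: ERROR (0x04:0x0004): Rebuild failed:unit=0.
Jan 22 09:55:23 nrims-bs kernel: 3w-9xxx: scsi6: AEN: WARNING (0x04:0x000F): SMART threshold exceeded:port=9.
EOF

grep '3w-9xxx' /tmp/sample_messages \
  | sed 's/.*AEN: \([A-Z]*\) .*/\1/' \
  | sort | uniq -c
```

On the sample above this prints a count of 2 ERROR and 1 WARNING lines; pointed at the real log it gives a quick picture of how often the controller is complaining.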
Plus, why does the problem never appear straight after a reboot, and why is it always fixed by a reboot? Be that as it may, I am still puzzled.

Boris.
> Now the machine is not particularly powerful: it is a 64-bit machine, dual
> core CPU, 3 GB RAM. So perhaps this is a factor in why I am having the
> following problem: once in a while that XFS partition starts generating
> multiple I/O errors, files that had content become 0 byte, directories
> disappear, etc. Every time a reboot fixes that, however. So far I've looked
> at logs but could not find a cause or precipitating event.

Is the CentOS you are running a 64-bit one? The reason I ask is that the use of XFS under a 32-bit OS is NOT recommended. If you search this list's archives you will find some discussion of this subject.
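The distinction matters because a 64-bit CPU can happily run a 32-bit kernel. A quick way to check which one is actually running, as a sketch:

```shell
# `uname -m` reports the architecture of the *running kernel*, which is
# what matters for XFS on a volume this large; the CPU being 64-bit
# capable is not enough.
arch=$(uname -m)
if [ "$arch" = "x86_64" ]; then
    echo "64-bit kernel ($arch)"
else
    echo "non-64-bit kernel ($arch): XFS on a volume this size is not recommended"
fi
```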
> I have a CentOS 5.7 machine hosting a 16 TB XFS partition used to house
> backups. The backups are run via rsync/rsnapshot and are large in terms of
> the number of files: over 10 million each.
>
> Now the machine is not particularly powerful: it is a 64-bit machine, dual
> core CPU, 3 GB RAM. So perhaps this is a factor in why I am having the
> following problem: once in a while that XFS partition starts generating
> multiple I/O errors, files that had content become 0 byte, directories
> disappear, etc. Every time a reboot fixes that, however. So far I've looked
> at logs but could not find a cause or precipitating event.
>
> Hence the question: has anyone experienced anything along those lines? What
> could be the cause of this?

In every situation like this that I have seen, it was hardware that never had adequate memory provisioned. Another consideration: you almost certainly won't be able to run a repair on that fs with so little RAM. Finally, it would be interesting to know how you architected the storage hardware. Hardware RAID, BBC, drive cache status, barrier status, etc.
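On the memory point: the two numbers worth gathering before attempting an xfs_repair are how many inodes are actually in use and how much RAM the box has. A minimal sketch for collecting both (the mount point "/" below is a stand-in; the poster would substitute the actual XFS backup volume):

```shell
# Print inodes in use on a filesystem and total system RAM -- the two
# inputs to any xfs_repair memory estimate.  "/" is an assumed
# placeholder; substitute the real XFS mount point.
fs=/
df -i "$fs" | awk 'NR==2 {print "inodes in use:", $3}'
awk '/^MemTotal/ {print "total RAM:", $2, "kB"}' /proc/meminfo
```

With over 10 million files per backup run and only 3 GB of RAM, those two figures are likely to be badly out of proportion on this machine.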