thr3ads.net - CentOS - [CentOS] reboot - is there a timeout on filesystem flush? [Jan 2015]

Keith Keller

2015-Jan-07 06:10 UTC

[CentOS] reboot - is there a timeout on filesystem flush?

On 2015-01-07, Gordon Messmer <gordon.messmer at gmail.com> wrote:
>
> Of course, the other possibility is simply that you've formatted your 
> own filesystems, and they have a maximum mount count or a check 
> interval.
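Gordon's suggestion can be checked directly: on ext2/3/4, `tune2fs -l <device>` prints the mount count, maximum mount count, last-check time and check interval. The Python sketch below is a minimal illustration that parses that output (passed in as text, so it can be tried without root) and approximates the rule e2fsck uses to force a check. The helper names `parse_tune2fs` and `check_due` are hypothetical, and the date format assumes the C/English locale.

```python
from datetime import datetime, timedelta


def parse_tune2fs(text):
    """Turn 'Key:   value' lines from `tune2fs -l` output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def check_due(fields, now):
    """Return reasons a forced boot-time check looks due (approximate rule)."""
    reasons = []
    mounts = int(fields.get("Mount count", "0"))
    max_mounts = int(fields.get("Maximum mount count", "-1"))  # -1 or 0 = disabled
    if max_mounts > 0 and mounts >= max_mounts:
        reasons.append(f"mount count {mounts} >= maximum {max_mounts}")
    # "Check interval" reads like "15552000 (6 months)"; the number is seconds.
    interval = int(fields.get("Check interval", "0").split()[0])
    last = fields.get("Last checked")
    if interval > 0 and last:
        last_checked = datetime.strptime(last, "%a %b %d %H:%M:%S %Y")
        if now - last_checked >= timedelta(seconds=interval):
            reasons.append("check interval elapsed")
    return reasons
```

Feed it the text printed by `tune2fs -l /dev/sdXN` (a placeholder device). An empty list means neither limit is the reason a check ran, which fits Keith's point that a manual fsck implies real errors.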
If Les is having to run fsck manually, as he wrote in his OP, then this
is unlikely to be the cause of the issues he described in that post.
There must be some sort of errors on the filesystem that caused the
unattended fsck to exit nonzero.

--keith


-- 
kkeller at wombat.san-francisco.ca.us
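The nonzero exit Keith mentions is a bit mask. fsck(8) documents 1 = errors corrected, 2 = reboot needed, 4 = errors left uncorrected, 8 = operational error, 16 = usage error, 32 = canceled by user, and 128 = shared-library error, OR-ed together when several filesystems are checked. A minimal Python sketch for decoding it follows; the function names are illustrative only.

```python
# Exit-status bits documented in fsck(8). When several filesystems are
# checked, the overall status is the bitwise OR of the per-filesystem codes.
FSCK_BITS = {
    1: "filesystem errors corrected",
    2: "system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}


def decode_fsck_status(code):
    """Expand an fsck exit status into the conditions it encodes."""
    if code == 0:
        return ["no errors"]
    return [text for bit, text in sorted(FSCK_BITS.items()) if code & bit]


def needs_manual_fsck(code):
    """Bit 4 (errors left uncorrected) is the case that calls for a manual run."""
    return bool(code & 4)
```

For example, a status of 5 decodes to both "errors corrected" and "errors left uncorrected", meaning one filesystem was fixed and another still needs attention.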

Les Mikesell

2015-Jan-07 13:53 UTC

[CentOS] reboot - is there a timeout on filesystem flush?

On Wed, Jan 7, 2015 at 12:10 AM, Keith Keller
<kkeller at wombat.san-francisco.ca.us> wrote:
> On 2015-01-07, Gordon Messmer <gordon.messmer at gmail.com> wrote:
>>
>> Of course, the other possibility is simply that you've formatted your
>> own filesystems, and they have a maximum mount count or a check
>> interval.
>
> If Les is having to run fsck manually, as he wrote in his OP, then this
> is unlikely to be the cause of the issues he described in that post.
> There must be some sort of errors on the filesystem that caused the
> unattended fsck to exit nonzero.
>
Yes - the unattended fsck fails.   Personally, I'd prefer for the
default run to use '-y' in the first place.  It's not like I'm more
likely than fsck to know how to fix it and it is very inconvenient on
remote machines.   The recent case was an opennms system updating a
lot of rrd files, but I've also seen it on backuppc archives with lots
of files and lots of hard links.  Some of these have been on VMware
ESXi hosts where the physical host wasn't rebooted and the
controller/power not involved at all.  Eventually these will be
replaced with CentOS7 systems, probably using XFS but I don't know if
that will be better or worse.   It is mostly on aging hardware, so it
is possible that there are underlying controller issues.  I also see
some rare cases on similar machines where a filesystem will go
read-only with some scsi errors logged, but didn't look for that yet
in this case.

-- 
   Les Mikesell
     lesmikesell at gmail.com
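Les's preference can be expressed as a fallback policy: run the normal automatic ("preen", `e2fsck -p`) repair first and, only if it leaves errors uncorrected (exit bit 4), rerun with `-y` so every question is answered yes. The sketch below is a hypothetical wrapper, not how CentOS boot scripts behave. It assumes an ext2/3/4 filesystem on an unmounted device, and keep in mind that `-y` can discard data a human might have saved. The `run` parameter exists only so the policy can be exercised with a stand-in instead of a real disk.

```python
import subprocess


def preen_argv(device):
    # -p: "preen" mode, fix only problems that are safe to repair unattended.
    return ["e2fsck", "-p", device]


def fallback_argv(device, preen_status):
    """Hypothetical policy: after preen leaves errors uncorrected (bit 4),
    retry with -y, which answers "yes" to every repair question."""
    if preen_status & 4:
        return ["e2fsck", "-y", device]
    return None


def check_unattended(device, run=subprocess.run):
    """Run preen, then the -y fallback if needed; return the last exit status.

    `device` must be an unmounted ext2/3/4 block device. `run` can be
    replaced by a stand-in for testing.
    """
    status = run(preen_argv(device)).returncode
    retry = fallback_argv(device, status)
    if retry is not None:
        status = run(retry).returncode
    return status
```

Keeping preen as the first step means the aggressive `-y` pass only happens on the rare boots where preen has already given up.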

Gordon Messmer

2015-Jan-07 15:52 UTC

[CentOS] reboot - is there a timeout on filesystem flush?

On 01/07/2015 05:53 AM, Les Mikesell wrote:
>
> Yes - the unattended fsck fails.
In that case, there should be logs indicating the cause of the error 
when it was detected by the kernel.  There's probably something wrong 
with your controller or other hardware.
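A rough way to follow Gordon's advice is to scan the kernel log for the messages ext3/ext4 and the block layer emit when they detect trouble. On CentOS 6 these normally land in /var/log/messages and in `dmesg`. The patterns below are assumed heuristics covering common message fragments, not an exhaustive list.

```python
import re

# Assumed heuristics: message fragments that ext3/ext4 and the block/SCSI
# layers commonly log on older kernels. Adjust for your own systems.
SUSPICIOUS = [
    re.compile(r"EXT[34]-fs error"),
    re.compile(r"I/O error, dev \w+"),
    re.compile(r"Buffer I/O error on device"),
    re.compile(r"[Rr]emounting filesystem read-only"),
]


def suspicious_lines(log_text):
    """Return the log lines that match any of the patterns above."""
    return [
        line
        for line in log_text.splitlines()
        if any(p.search(line) for p in SUSPICIOUS)
    ]
```

Running it over /var/log/messages (which usually requires root) from the days before the failed boot would show whether the kernel had already complained about the device.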
> Personally, I'd prefer for the
> default run to use '-y' in the first place.  It's not like I'm more
> likely than fsck to know how to fix it and it is very inconvenient on
> remote machines.   The recent case was an opennms system updating a
> lot of rrd files, but I've also seen it on backuppc archives with lots
> of files and lots of hard links.
Every regular file's directory entry on your system is a hard link. 
There's nothing particular about links (files) that make a filesystem 
fragile.
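Gordon's point is easy to see from the link count. A newly created file already has `st_nlink == 1`, and `os.link` (the equivalent of `ln`) just adds another directory entry for the same inode. A small self-contained demonstration in Python follows, using a temporary directory and made-up file names.

```python
import os
import tempfile


def link_counts():
    """Create a file, add a hard link, and report what the inode looks like."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "data.rrd")
        second = os.path.join(d, "backup.rrd")
        with open(original, "w") as f:
            f.write("x")
        before = os.stat(original).st_nlink  # the directory entry itself
        os.link(original, second)            # same as `ln original second`
        after = os.stat(original).st_nlink
        same_inode = os.stat(original).st_ino == os.stat(second).st_ino
    return before, after, same_inode


print(link_counts())  # (1, 2, True) on a typical Linux filesystem
```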
> It is mostly on aging hardware, so it
> is possible that there are underlying controller issues.  I also see
> some rare cases on similar machines where a filesystem will go
> read-only with some scsi errors logged, but didn't look for that yet
> in this case.
It's probably a similar cause in all cases.  I don't know how many times
I've seen you on this list defending running old hardware / obsolete 
hardware.  Corruption and failure are more or less what I'd expect if 
your hardware is junk.

Steve Clark

2015-Jan-07 16:32 UTC

[CentOS] reboot - is there a timeout on filesystem flush?

On 01/07/2015 08:53 AM, Les Mikesell wrote:
> On Wed, Jan 7, 2015 at 12:10 AM, Keith Keller
> <kkeller at wombat.san-francisco.ca.us> wrote:
>> On 2015-01-07, Gordon Messmer <gordon.messmer at gmail.com> wrote:
>>> Of course, the other possibility is simply that you've formatted your
>>> own filesystems, and they have a maximum mount count or a check
>>> interval.
>> If Les is having to run fsck manually, as he wrote in his OP, then this
>> is unlikely to be the cause of the issues he described in that post.
>> There must be some sort of errors on the filesystem that caused the
>> unattended fsck to exit nonzero.
>>
> Yes - the unattended fsck fails.   Personally, I'd prefer for the
> default run to use '-y' in the first place.  It's not like I'm more
> likely than fsck to know how to fix it and it is very inconvenient on
> remote machines.   The recent case was an opennms system updating a
> lot of rrd files, but I've also seen it on backuppc archives with lots
> of files and lots of hard links.  Some of these have been on VMware
> ESXi hosts where the physical host wasn't rebooted and the
> controller/power not involved at all.  Eventually these will be
> replaced with CentOS7 systems, probably using XFS but I don't know if
> that will be better or worse.   It is mostly on aging hardware, so it
> is possible that there are underlying controller issues.  I also see
> some rare cases on similar machines where a filesystem will go
> read-only with some scsi errors logged, but didn't look for that yet
> in this case.

I know that I have seen it take 10 to 15 minutes to sync a 7200 rpm 3 TB
WD drive that had over 2 million rrd files being updated by ntopng when
the system had 32GB of RAM. The system is an Intel(R) Core(TM) i7-3770
CPU @ 3.40GHz, but one CPU will be in a constant IO wait state until the
sync finishes. I have never tried shutting it down while it was syncing,
though.

-- 
Stephen Clark
*NetWolves Managed Services, LLC.*
Director of Technology
Phone: 813-579-3200
Fax: 813-882-0209
Email: steve.clark at netwolves.com
http://www.netwolves.com
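Steve's 10 to 15 minute sync is the kernel working through a backlog of data that has not yet been flushed. That backlog can be watched in /proc/meminfo: `Dirty` is data not yet written and `Writeback` is data currently being written, both in kB. How large the dirty backlog may grow is governed by the `vm.dirty_ratio` / `vm.dirty_background_ratio` (or `vm.dirty_bytes` / `vm.dirty_background_bytes`) sysctls. A minimal parsing sketch follows; the function names are hypothetical.

```python
def pending_writeback_kb(meminfo_text):
    """Return (Dirty, Writeback) in kB from /proc/meminfo-formatted text."""
    values = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key in ("Dirty", "Writeback"):
            values[key] = int(rest.split()[0])  # "  524288 kB" -> 524288
    return values.get("Dirty", 0), values.get("Writeback", 0)


def read_pending_writeback(path="/proc/meminfo"):
    """Read the live values on a Linux host."""
    with open(path) as f:
        return pending_writeback_kb(f.read())
```

Polling `read_pending_writeback()` every few seconds before a reboot shows whether the backlog is actually shrinking.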

Les Mikesell

2015-Jan-07 17:03 UTC

[CentOS] reboot - is there a timeout on filesystem flush?

On Wed, Jan 7, 2015 at 10:15 AM, <m.roth at 5-cent.us> wrote:
>> Yes - the unattended fsck fails.   Personally, I'd prefer for the
>> default run to use '-y' in the first place.  It's not like I'm more
>> likely than fsck to know how to fix it and it is very inconvenient on
>> remote machines.   The recent case was an opennms system updating a
> <snip>
>
> In some ways, I prefer the fsck run by reboot to fail - that way, I see
> it, and it most probably tells me that it's time to look at replacing the
> disk.
Seems random to me - not repeating on the same box, and rare enough
that it is hard to make any generalization except that it is painful
to talk some remote helper through the recovery process - usually
involving emailing some cell phone photos of the console to figure out
which partition has the problem.

-- 
   Les Mikesell
     lesmikesell at gmail.com
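The filesystems that boot-time fsck looks at are listed in the sixth field (fs_passno) of /etc/fstab. A value of 0 (or a missing field) means the filesystem is never checked, 1 is the root filesystem, and 2 is checked afterwards. Listing those entries in advance gives a remote helper a short list of candidate partitions to read off the console. A sketch follows; the function name and the fstab entries in the test are made up.

```python
def boot_checked_filesystems(fstab_text):
    """Return (device, mountpoint, passno) for entries fsck checks at boot,
    ordered by pass number (root first)."""
    entries = []
    for raw in fstab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # A missing or zero sixth field (fs_passno) means "not checked".
        if len(fields) < 6 or int(fields[5]) == 0:
            continue
        entries.append((fields[0], fields[1], int(fields[5])))
    return sorted(entries, key=lambda entry: entry[2])  # stable within a pass
```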
