thr3ads.net - CentOS - [CentOS] reboot - is there a timeout on filesystem flush? [Jan 2015]

Gary Greene

2015-Jan-07 00:37 UTC

[CentOS] reboot - is there a timeout on filesystem flush?

On Jan 6, 2015, at 4:28 PM, Fran Garcia <franchu.garcia at gmail.com> wrote:
>
> On Tue, Jan 6, 2015 at 6:12 PM, Les Mikesell <> wrote:
>> I've had a few systems with a lot of RAM and very busy filesystems
>> come up with filesystem errors that took a manual 'fsck -y' after what
>> should have been a clean reboot.  This is particularly annoying on
>> remote systems where I have to talk someone else through the recovery.
>> 
>> Is there some time limit on the cache write with a 'reboot' (no
>> options) command or is ext4 that fragile?
> 
> I'd say there's no limit on the amount of time the kernel waits until
> the blocks have been written to disk; the flushing is driven by these
> parameters:
> 
> vm.dirty_background_bytes = 0
> vm.dirty_background_ratio = 10
> vm.dirty_bytes = 0
> vm.dirty_expire_centisecs = 3000
> vm.dirty_ratio = 20
> vm.dirty_writeback_centisecs = 500
> 
> i.e., if the data cached in RAM is older than 30s or larger than 10% of
> available RAM, the kernel will try to flush it to disk. Depending on how
> much data needs to be flushed at poweroff/reboot time, this could have
> a significant effect on the time taken.
> 
> Regarding systems with lots of RAM, I've never seen such a behaviour
> on a few 192 GB RAM servers I administer. Granted, your system could
> be tuned in a different way or have some other configuration.
> 
> TBH I'm not confident enough to give a definitive answer re the data not
> being totally flushed before reboot. I'd investigate:
>
> - Whether this happens on every reboot or just on some.
> - Whether your RAM is OK (the FS errors could come from that!).
> - Whether your disks/SAN are caching writes.  (Maybe they are and the
>   OS thinks the data has been flushed to disk, but it hasn't.)
> - Filesystem mount options that might interfere (nobarrier, commit,
>   data...)

This has been discussed to death on various lists, including the LKML...

Almost every controller and drive out there now lies about what is and isn't
flushed to disk, making it nigh on impossible for the Kernel to reliably know
100% of the time that the data HAS been flushed to disk. This is part of the
reason why it is always a Good Idea™ to have some sort of pause in the shut down
to ensure that it IS flushed.

This is also why server grade gear uses battery backed buffers, etc. which are
supposed to allow drives to properly flush the data to disk. There is still a
slim chance in these cases that the data still will not reach the platter before
power off or reboot, especially in catastrophic cases.
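
For the kernel's side of such a pause, one option is to call sync and then wait until the kernel reports almost no dirty or in-flight pages. What follows is only a minimal Python sketch, assuming a Linux /proc/meminfo that exposes the Dirty and Writeback fields; the function names and the 64 kB threshold are made up for illustration. It says nothing about data still sitting in a controller or drive cache, which is the layer described above.

```python
import os
import time


def pending_writeback_kb(meminfo_text):
    """Return Dirty + Writeback, in kB, from /proc/meminfo-style text."""
    wanted = {"Dirty", "Writeback"}
    total = 0
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() in wanted:
            # Field format is "<Name>:   <number> kB"
            total += int(rest.split()[0])
    return total


def wait_for_flush(timeout_s=60.0, poll_s=0.5, threshold_kb=64):
    """Ask the kernel to flush, then poll until little is pending.

    Returns True if the pending amount dropped to the (illustrative)
    threshold, False if the timeout expired first.
    """
    os.sync()  # asks the kernel to write out dirty pages
    deadline = time.monotonic() + timeout_s
    while True:
        with open("/proc/meminfo") as f:
            pending = pending_writeback_kb(f.read())
        if pending <= threshold_kb:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

Calling wait_for_flush() from a pre-reboot script and logging a False result would at least show whether the page cache was still draining when the reboot was issued.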

--
Gary L. Greene, Jr.
Sr. Systems Administrator
IT Operations
Minerva Networks, Inc.
Cell: +1 (650) 704-6633

Les Mikesell

2015-Jan-07 01:50 UTC

On Tue, Jan 6, 2015 at 6:37 PM, Gary Greene <ggreene at minervanetworks.com> wrote:
>
> Almost every controller and drive out there now lies about what is and
> isn't flushed to disk, making it nigh on impossible for the Kernel to
> reliably know 100% of the time that the data HAS been flushed to disk.
> This is part of the reason why it is always a Good Idea™ to have some
> sort of pause in the shut down to ensure that it IS flushed.
>
> This is also why server grade gear uses battery backed buffers, etc. which
> are supposed to allow drives to properly flush the data to disk. There is
> still a slim chance in these cases that the data still will not reach the
> platter before power off or reboot, especially in catastrophic cases.
>
This was a reboot from software, not a power drop.  Does that do
something to kill the disk cache if anything happened to still be
there?

-- 
   Les Mikesell
      lesmikesell at gmail.com

Gordon Messmer

2015-Jan-07 05:23 UTC

On 01/06/2015 04:37 PM, Gary Greene wrote:
> This has been discussed to death on various lists, including the
> LKML...
>
> Almost every controller and drive out there now lies about what is
> and isn't flushed to disk, making it nigh on impossible for the
> Kernel to reliably know 100% of the time that the data HAS been
> flushed to disk. This is part of the reason why it is always a Good
> Idea™ to have some sort of pause in the shut down to ensure that it
> IS flushed.

That's pretty much entirely irrelevant to the original question.

(Feel free to correct me if I'm wrong in the following)

A filesystem has three states: Clean, Dirty, and Dirty with errors.

When a filesystem is unmounted, the cache is flushed and it is marked 
clean last.  This is the expected state when a filesystem is mounted.

Once a filesystem is mounted read/write, then it is marked dirty.  If a 
filesystem is dirty when it is mounted, then it wasn't unmounted 
properly.  In the case of a journaled filesystem, typically the journal 
will be replayed and the filesystem will then be mounted.

The last case, dirty with errors, indicates that the kernel found invalid 
data while the filesystem was mounted, and recorded that fact in the 
filesystem metadata.  This will normally be the only condition that will 
force an fsck on boot.  It will also normally result in logs being 
generated when the errors are encountered.  If your filesystems are 
force-checked on boot, then the logs should usually tell you why.  It's 
not a matter of a timeout or some device not flushing its cache.

Of course, the other possibility is simply that you've formatted your 
own filesystems, and they have a maximum mount count or a check 
interval.  Use 'tune2fs -l' to check those two values.  If either of 
them is set, then there is no problem with your system.  It is behaving 
as designed, and forcing a periodic check because that is the default 
behavior.
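
To make the 'tune2fs -l' check concrete, here is a rough Python sketch that picks the relevant fields out of that command's output. It assumes the usual "Key: value" layout with the labels "Filesystem state", "Maximum mount count" and "Check interval"; the function names are only illustrative.

```python
def parse_tune2fs(text):
    """Turn the 'Key: value' lines printed by `tune2fs -l` into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon, e.g. the version banner
            fields[key.strip()] = value.strip()
    return fields


def periodic_check_settings(fields):
    """Report the recorded state and whether a periodic check is configured."""
    max_mounts = int(fields.get("Maximum mount count", "-1"))
    # "Check interval" looks like "0 (<none>)" or "15552000 (6 months)"
    interval_s = int(fields.get("Check interval", "0").split()[0])
    return {
        "state": fields.get("Filesystem state", "unknown"),
        "mount_count_check": max_mounts > 0,
        "interval_check": interval_s > 0,
    }
```

The text would come from running tune2fs -l against the device in question (for instance via subprocess.run with capture_output=True). If both flags come back False and the state is not plain "clean", a periodic check does not explain the forced fsck.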

Keith Keller

2015-Jan-07 06:10 UTC

On 2015-01-07, Gordon Messmer <gordon.messmer at gmail.com> wrote:
>
> Of course, the other possibility is simply that you've formatted your 
> own filesystems, and they have a maximum mount count or a check 
> interval.

If Les is having to run fsck manually, as he wrote in his OP, then this
is unlikely to be the cause of the issues he described in that post.
There must be some sort of errors on the filesystem that caused the
unattended fsck to exit nonzero.
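
For reference, the fsck exit status is a bitmask, so "nonzero" covers several different outcomes. The sketch below decodes it using the bit meanings listed in the fsck(8) man page; the helper names are illustrative, and treating bit 4 as "needs a manual run" is my reading of the situation, not something taken from the boot scripts.

```python
# Bit meanings as documented in fsck(8)
FSCK_EXIT_BITS = {
    1: "filesystem errors corrected",
    2: "system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}


def decode_fsck_status(status):
    """Return the meanings of all bits set in an fsck exit status."""
    return [
        meaning
        for bit, meaning in sorted(FSCK_EXIT_BITS.items())
        if status & bit
    ]


def needs_manual_fsck(status):
    """True if fsck reported errors it left uncorrected (bit 4)."""
    return bool(status & 4)
```

A status of 1 means errors were found and fixed automatically, whereas a status with bit 4 set matches the case where someone has to run 'fsck -y' by hand.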

--keith


-- 
kkeller at wombat.san-francisco.ca.us

Gary Greene

2015-Jan-07 19:30 UTC

> On Jan 6, 2015, at 5:50 PM, Les Mikesell <lesmikesell at gmail.com> wrote:
>
> On Tue, Jan 6, 2015 at 6:37 PM, Gary Greene <ggreene at minervanetworks.com> wrote:
>>
>> Almost every controller and drive out there now lies about what is and
>> isn't flushed to disk, making it nigh on impossible for the Kernel to
>> reliably know 100% of the time that the data HAS been flushed to disk.
>> This is part of the reason why it is always a Good Idea™ to have some
>> sort of pause in the shut down to ensure that it IS flushed.
>>
>> This is also why server grade gear uses battery backed buffers, etc.
>> which are supposed to allow drives to properly flush the data to disk.
>> There is still a slim chance in these cases that the data still will not
>> reach the platter before power off or reboot, especially in catastrophic
>> cases.
>>
>
> This was a reboot from software, not a power drop.  Does that do
> something to kill the disk cache if anything happened to still be
> there?

In most cases intentional reboots _shouldn't_ trigger this, but I cannot say
that with 100% certainty since, again, controllers CAN and DO lie. If the
controller is not battery backed, the certainty is even shakier, since the
card's firmware can be in the process of lazily writing the content to disk
when the main board drops power to the card's slot during the reboot, which
without the extra battery would cause the data to be lost.

During the reboot, most cards' drivers will, on init, invalidate the cache on
the card to ensure dirty pages of data don't get flushed to disk, to prevent
scribbling junk data to the platters. From what I recall, this is true of both
the megaraid and adaptec based cards.


Gary Greene

2015-Jan-07 19:37 UTC

> On Jan 6, 2015, at 9:23 PM, Gordon Messmer <gordon.messmer at gmail.com> wrote:
>
> On 01/06/2015 04:37 PM, Gary Greene wrote:
>> This has been discussed to death on various lists, including the
>> LKML...
>>
>> Almost every controller and drive out there now lies about what is
>> and isn't flushed to disk, making it nigh on impossible for the
>> Kernel to reliably know 100% of the time that the data HAS been
>> flushed to disk. This is part of the reason why it is always a Good
>> Idea™ to have some sort of pause in the shut down to ensure that it
>> IS flushed.
> 
> That's pretty much entirely irrelevant to the original question.
> 
> (Feel free to correct me if I'm wrong in the following)
> 
> A filesystem has three states: Clean, Dirty, and Dirty with errors.
> 
> When a filesystem is unmounted, the cache is flushed and it is marked clean
> last.  This is the expected state when a filesystem is mounted.
>
> Once a filesystem is mounted read/write, then it is marked dirty.  If a
> filesystem is dirty when it is mounted, then it wasn't unmounted properly.
> In the case of a journaled filesystem, typically the journal will be
> replayed and the filesystem will then be mounted.
>
> The last case, dirty with errors, indicates that the kernel found invalid
> data while the filesystem was mounted, and recorded that fact in the
> filesystem metadata.  This will normally be the only condition that will
> force an fsck on boot.  It will also normally result in logs being generated
> when the errors are encountered.  If your filesystems are force-checked on
> boot, then the logs should usually tell you why.  It's not a matter of a
> timeout or some device not flushing its cache.
>
> Of course, the other possibility is simply that you've formatted your
> own filesystems, and they have a maximum mount count or a check interval.
> Use 'tune2fs -l' to check those two values.  If either of them is set, then
> there is no problem with your system.  It is behaving as designed, and
> forcing a periodic check because that is the default behavior.

Problem is, Gordon, the layer I'm talking about is _below_ the logical layer
that filesystems live at, in the block layer, at the mercy of drivers and
firmware that the kernel has zero control over. While in a perfect world the
controller would do strictly only what the Kernel tells it, that hasn't been
true for a while, given the large caches that drives and controllers now have.

In most cases, this should never trigger; however, with some buggy drivers, or
controllers that have buggy firmware, the writes to disk can be seriously
delayed, which can cause data to never make it to the platter.

