Gary Greene
2015-Jan-07 19:30 UTC
[CentOS] reboot - is there a timeout on filesystem flush?
> On Jan 6, 2015, at 5:50 PM, Les Mikesell <lesmikesell at gmail.com> wrote:
>
> On Tue, Jan 6, 2015 at 6:37 PM, Gary Greene <ggreene at minervanetworks.com> wrote:
>>
>> Almost every controller and drive out there now lies about what is and isn't flushed to disk, making it nigh on impossible for the Kernel to reliably know 100% of the time that the data HAS been flushed to disk. This is part of the reason why it is always a Good Idea™ to have some sort of pause in the shut down to ensure that it IS flushed.
>>
>> This is also why server grade gear uses battery backed buffers, etc. which are supposed to allow drives to properly flush the data to disk. There is still a slim chance in these cases that the data still will not reach the platter before power off or reboot, especially in catastrophic cases.
>
> This was a reboot from software, not a power drop. Does that do
> something to kill the disk cache if anything happened to still be
> there?

In most cases intentional reboots _shouldn't_ trigger this, but I cannot say that with 100% certainty since, again, controllers CAN and DO lie. If the controller is not battery backed, the certainty is even shakier: the card's firmware can be in the middle of lazily writing the content to disk when the main board drops power to the card's slot during the reboot, and without the extra battery that data is lost.

During the reboot, most cards' drivers will, on init, invalidate the cache on the card so that dirty pages of data don't get flushed to disk, to prevent scribbling junk data onto the platters. From what I recall, this is true of both the megaraid and adaptec based cards.

--
Gary L. Greene, Jr.
Sr. Systems Administrator
IT Operations
Minerva Networks, Inc.
Cell: +1 (650) 704-6633
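
For readers wondering what an application can do about the flush question discussed here, the following is a minimal Python sketch of an explicit flush before a planned reboot. It assumes a POSIX system; the helper name `durable_write` is hypothetical and not from the thread. `os.fsync` only asks the kernel to push the data to the device. Whether the controller or drive then really commits it to the platter is exactly the uncertainty raised above, so treat this as a necessary step rather than a guarantee.

```python
import os


def durable_write(path: str, data: bytes) -> None:
    """Write data to path and ask the kernel to flush it to the device.

    Hypothetical helper for illustration. It cannot detect a controller
    or drive that acknowledges the flush without performing it.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while len(view) > 0:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Flush file contents and metadata from the page cache.
        os.fsync(fd)
    finally:
        os.close(fd)

    # Also flush the containing directory so the new directory entry
    # itself is on stable storage (assumes a POSIX system).
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Calling `durable_write("/some/path", b"payload")` before issuing the reboot narrows the window in which data sits only in the kernel's page cache; it does nothing about a write-back cache on the card or drive that is not battery or flash backed.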
John R Pierce
2015-Jan-07 20:08 UTC
[CentOS] reboot - is there a timeout on filesystem flush?
On 1/7/2015 11:30 AM, Gary Greene wrote:
> During the reboot, most card's drivers on init, will invalidate the cache on the card to ensure dirty pages of data don't get flushed to disk, to prevent scribbling junk data to the platters. From what I recall, this is true of both the megaraid and adaptec based cards.

Presumably, this cache invalidation happens only on cards that don't have battery (or flash) backed write cache? Doing that on a BB/FBWC system would negate the usefulness of said battery backed cache entirely.

IMHO, an even bigger problem is using cheap desktop class SATA drives for server storage. These FREQUENTLY lie about write commits. This sort of behavior is a VERY good reason to stick with vendor qualified and branded server drives that have been tested to work with the specific controller + backplane configurations they are sold with. And yes, those drives cost 2-3X more than your Newegg/Amazon el cheapo desktop stuff.

All of this controller and drive behavior is a VERY good argument for the use of end to end checksumming like ZFS does... a ZFS 'scrub' operation WILL detect any data corruption on the file system and raid, whatever the source, and many inconsistencies can be corrected, such as one disk of a mirror having a stale block.

--
john r pierce 37N 122W
somewhere on the middle of the left coast
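
To make the stale mirror block case concrete, here is a small Python sketch of the general idea behind checksum-driven repair. It is an illustration only and not how ZFS is implemented: the function names, the in-memory `bytearray` "disks", and the choice of SHA-256 are all assumptions made for the example. The point it shows is that a checksum stored separately from the data lets a scrub decide which mirror copy is correct, which a plain mirror cannot do on its own.

```python
import hashlib
from typing import List, Optional


def checksum(block: bytes) -> str:
    """Checksum of one block (SHA-256 chosen here purely for illustration)."""
    return hashlib.sha256(block).hexdigest()


def scrub_mirror(copies: List[bytearray], expected: str) -> Optional[bytes]:
    """Verify every mirror copy against the stored checksum.

    Returns the good data and overwrites any bad copy in place, or
    returns None if no copy matches (detected but unrepairable).
    """
    good = next(
        (bytes(c) for c in copies if checksum(bytes(c)) == expected), None
    )
    if good is None:
        return None
    for copy in copies:
        if checksum(bytes(copy)) != expected:
            copy[:] = good  # repair the stale or corrupt copy
    return good


if __name__ == "__main__":
    new_data = b"new data"
    stored_checksum = checksum(new_data)   # recorded when the block was written
    disk_a = bytearray(new_data)           # this side got the write
    disk_b = bytearray(b"old data")        # this side lost it (stale block)
    scrub_mirror([disk_a, disk_b], stored_checksum)
    print(bytes(disk_b) == new_data)
```

If both copies fail the check, the sketch returns `None`: the corruption is still detected, it just cannot be corrected from the mirror alone.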
John R Pierce
2015-Jan-07 20:24 UTC
[CentOS] reboot - is there a timeout on filesystem flush?
On 1/7/2015 12:15 PM, m.roth at 5-cent.us wrote:
> Actually, the WD Reds and similar are just fine.

Those are specifically sold for use in small NAS (raid) environments, so yeah, they are configured 'correctly'.

--
john r pierce 37N 122W
somewhere on the middle of the left coast
John R Pierce
2015-Jan-07 21:30 UTC
[CentOS] reboot - is there a timeout on filesystem flush?
On 1/7/2015 12:50 PM, m.roth at 5-cent.us wrote:
> Right... but only cost 133% (about) more than consumer drives, as opposed
> to the 300% that the "server/enterprise" grade drives' cost.

Well, those $$$ drives are likely SAS rather than SATA, and that has other advantages:

- 10k or 15k RPM gives you up to double the IOPS per spindle of a 7200rpm SATA drive (and WD Reds are only 5900 RPM, I believe?)
- 2.5" enterprise disks let you have more, smaller spindles in the same space (24-25 per 2U vs 12 for 3.5") for higher IO concurrency
- SAS supports multipathing (dual porting) for higher IO bandwidth
- SAS has tagged command queueing, which often performs better than SATA NCQ under high IO concurrency workloads, like database servers

--
john r pierce 37N 122W
somewhere on the middle of the left coast
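
The "up to double the IOPS per spindle" figure can be sanity-checked with a back-of-envelope model: one random I/O costs roughly an average seek plus half a revolution. The Python sketch below does that arithmetic. The rotational latency follows directly from the RPM, but the average seek times are illustrative assumptions, not measurements or numbers from this thread, and the model ignores queueing, caching, and transfer time.

```python
def rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: half a revolution, in milliseconds."""
    return 0.5 * 60_000.0 / rpm


def estimated_iops(rpm: float, avg_seek_ms: float) -> float:
    """Rough random-I/O ceiling for one spindle (seek + half a rotation)."""
    return 1000.0 / (avg_seek_ms + rotational_latency_ms(rpm))


# name -> (rpm, ASSUMED average seek time in ms; illustrative values only)
DRIVES = {
    "7200 rpm SATA": (7200, 8.5),
    "10k rpm SAS": (10_000, 4.5),
    "15k rpm SAS": (15_000, 3.5),
}

if __name__ == "__main__":
    for name, (rpm, seek_ms) in DRIVES.items():
        print(f"{name}: ~{estimated_iops(rpm, seek_ms):.0f} IOPS")
```

With these assumed seek times the 15k drive lands at a little over twice the estimate for the 7200 rpm drive, which is in line with the rule of thumb quoted above. Different seek assumptions will move the ratio.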
Les Mikesell
2015-Jan-07 21:52 UTC
[CentOS] reboot - is there a timeout on filesystem flush?
On Wed, Jan 7, 2015 at 3:30 PM, John R Pierce <pierce at hogranch.com> wrote:
>>
>> Right... but only cost 133% (about) more than consumer drives, as opposed
>> to the 300% that the "server/enterprise" grade drives' cost.
>
> well, those $$$ drives are likely SAS rather than SATA, and that has other
> advantages... 10k or 15k RPM gives you up to double the IOPS per spindle of
> a 7200rpm SATA drive (and WD Reds are only 5900 RPM, I believe?)... 2.5"
> enterprise disks let you have more smaller spindles in the same space (24-25
> per 2U vs 12 for 3.5") for higher IO concurrency, and SAS supports
> multipathing (dual porting) for higher IO bandwidth, also SAS has tagged
> command queueing which often performs better than SATA NCQ under high IO
> concurrency workloads, like database servers.

These particular drives are enterprise SAS versions, but about as old as they made them.

--
Les Mikesell
lesmikesell at gmail.com
Gary Greene
2015-Jan-07 21:53 UTC
[CentOS] reboot - is there a timeout on filesystem flush?
> On Jan 7, 2015, at 12:08 PM, John R Pierce <pierce at hogranch.com> wrote:
>
> On 1/7/2015 11:30 AM, Gary Greene wrote:
>> During the reboot, most card's drivers on init, will invalidate the cache on the card to ensure dirty pages of data don't get flushed to disk, to prevent scribbling junk data to the platters. From what I recall, this is true of both the megaraid and adaptec based cards.
>
> Presumably, this cache invalidation is only on cards that don't have battery (or flash) backed write cache? Doing that on a BB/FBWC system would negate the usefulness of said battery backed cache entirely.
>

The ones with batteries will try to properly write the content of the cache to the disk right before the cache invalidate occurs. This is one of the few times when they aren't lazy in their write patterns.

Regarding cheap vs. enterprise drives, agreed. You should absolutely never trust the disks to do the "right" thing with cheap models.

--
Gary L. Greene, Jr.
Sr. Systems Administrator
IT Operations
Minerva Networks, Inc.
Cell: +1 (650) 704-6633