Hello,

I am at the point of picking a filesystem for new brick nodes. I have liked and used ext4 until now, but I recently read about an issue introduced by an ext4 patch that breaks the distributed translator. At the same time, it looks like the recommended filesystem for a brick is no longer ext4 but XFS, which will apparently also be the default filesystem in the upcoming Red Hat 7. On the other hand, XFS is known as a filesystem that can easily corrupt (zero out) files in the event of a power failure. Supporters of the filesystem claim this should never happen if the application has been properly coded (properly committing/fsync-ing data to storage) and the storage itself has been properly configured (disk cache disabled on the individual disks and battery-backed cache used on the controllers).

My questions are: should I be worried about losing data in a power failure or similar scenarios using GlusterFS on XFS? Are there best practices for setting up a Gluster brick on XFS? And has the ext4 issue been reliably fixed? (My understanding is that this is impossible unless ext4 itself is modified to work properly with Gluster.)

Best regards
Robert Hajime Lanning
2013-Dec-06 20:11 UTC
[Gluster-users] How reliable is XFS under Gluster?
On 12/06/13 10:57, Kal Black wrote:
> [...]
> My question is, should I be worried about losing data in a power failure
> or similar scenarios (or any) using GlusterFS and XFS? Are there best
> practices for setting up a Gluster brick + XFS? Has the ext4 issue been
> reliably fixed?

The ext4<>Gluster issue has been fixed in the latest Gluster release.

As for XFS, I have never run into truncated files in my roughly eight years of using it in production and at home.

With both ext3/4 and XFS, metadata is committed to a transaction log, while data is written directly. After a power outage the filesystem itself is intact thanks to the transaction log, but the file data may not be.

You will always have the issue of file data corruption when a non-battery-backed write cache is in use (the kernel VFS cache is such a beast). So you can end up with a file that is the correct length (not truncated) but full of NULLs. I have had that happen: the metadata was transacted to the log, but the file data (contents) was still sitting in the VFS write cache when I hit a kernel deadlock.

Ext3/4 have options to write the file data to the transaction log along with the metadata. You end up with twice the I/O, since the data has to be rewritten to its proper place, but this narrows the window for file data corruption at a power outage. You also lose the speed gain of the VFS write cache.

So an application needs to use fsync() for any data it requires to be known written to stable storage. Applications like sendmail, by default, fsync each mail queue entry before returning an OK result to the sender. You can turn that off, but doing so loses the transactional guarantee of SMTP.

--
Mr. Flibble
King of the Potato People
http://www.linkedin.com/in/RobertLanning
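To make the fsync() point above concrete, here is a minimal C sketch of the write-then-fsync pattern. It is only an illustration: the helper name, path, and payload are invented placeholders, and error handling is kept to the bare minimum.

/* Minimal sketch of "write, then fsync": data is only guaranteed to be
 * on stable storage once fsync() has returned successfully.  The helper
 * name and path below are made up for the example. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int write_durably(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* write() only places the data in the kernel's VFS write cache... */
    if (write(fd, buf, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }

    /* ...fsync() is what forces it out to stable storage. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}

int main(void)
{
    const char *msg = "queued message\n";
    if (write_durably("/tmp/example-queue-entry", msg, strlen(msg)) < 0) {
        perror("write_durably");
        return 1;
    }
    return 0;
}

Until fsync() returns, the data may exist only in the kernel's write cache, which is exactly the window in which a power failure or kernel crash can lose file contents while the metadata survives in the journal.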
On 12/06/2013 01:57 PM, Kal Black wrote:
> [...]
> My question is, should I be worried about losing data in a power failure
> or similar scenarios (or any) using GlusterFS and XFS? Are there best
> practices for setting up a Gluster brick + XFS? Has the ext4 issue been
> reliably fixed?

Hi Kal,

You are correct in that Red Hat recommends using XFS for Gluster bricks. I'm sure there are plenty of ext4 (and other fs) users as well, so other users should chime in with their real-world experiences with various brick filesystems. Also, I believe the dht/ext4 issue has been resolved for some time now.

With regard to "XFS zeroing files on power failure," I'd suggest you check out the following blog post:

http://sandeen.net/wordpress/computers/xfs-does-not-null-files-and-requires-no-flux/

My cursory understanding is that there were apparently situations where the inode size of a recently extended file would be written to the log before the actual extending data was written to disk, creating a crash window in which the updated size would be seen but not the actual data. In other words, this isn't "zeroing files" behavior so much as an ordering issue in logging the inode size. This is probably why you've encountered references to fsync(): even with the fix your data is still likely lost unless/until you've run an fsync() to flush it to disk; you just shouldn't see the extended inode size unless the actual data made it to disk. Also note that this was fixed in 2007. ;)

Brian
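As a second illustration, here is a sketch of the write-to-temp, fsync, rename pattern that the "properly coded application" argument generally refers to when a file's contents are replaced. Again, the helper name and file names are invented for the example; this is not code from the thread.

/* Sketch of atomic file replacement: write the new contents to a
 * temporary file, fsync() it, then rename() it over the original.
 * After a crash a reader sees either the complete old contents or the
 * complete new contents, never a zeroed or partial file. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int replace_file(const char *path, const char *tmp_path,
                        const char *buf, size_t len)
{
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* The new contents must be fully on disk before the rename. */
    if (write(fd, buf, len) != (ssize_t)len || fsync(fd) < 0) {
        close(fd);
        unlink(tmp_path);
        return -1;
    }
    close(fd);

    /* rename() atomically swaps in the already-fsync()ed new file. */
    if (rename(tmp_path, path) < 0) {
        unlink(tmp_path);
        return -1;
    }
    return 0;
}

int main(void)
{
    const char *data = "new configuration contents\n";
    if (replace_file("/tmp/app.conf", "/tmp/app.conf.tmp",
                     data, strlen(data)) < 0) {
        perror("replace_file");
        return 1;
    }
    return 0;
}

For full durability of the rename itself, an application can additionally open and fsync() the containing directory; many applications skip that step and accept a small window in which the rename may not yet have been committed to disk.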