Hello all,

I just wrote a post about this problem: http://www.posix.brte.com.br/blog/?p=103

Does anybody know the Linux implementation of XFS, EXT3, or another filesystem well enough to confirm whether the file server ACKs without logging the transaction (as the ZIL does), or without committing it to stable storage? In other words, can you confirm that a Solaris ZFS/NFS service with zil_disable set behaves like a standard Linux XFS or EXT3 NFS service (considering only the NFS service provided)?

This message posted from opensolaris.org
Bob Friesenhahn
2008-Feb-25 23:27 UTC
[zfs-discuss] The old problem with tar, zfs, nfs and zil
On Mon, 25 Feb 2008, msl wrote:
> I mean, can you confirm that the zil_disable/zfs solaris nfs
> service, is a similar service like a standard xfs or ext3 linux/nfs
> solution (take into account the NFS service provided)?

From what I have heard:

* Linux does not implement NFS writes correctly, in that data is not flushed to disk before returning. Don't turn your Linux system off during application writes, since user data will likely be lost when the system returns. Besides the applications losing data, running applications are likely to become confused.

* ZFS has had an issue in that requesting an fsync() of one file causes a sync of the entire filesystem. This is a huge performance glitch. Wikipedia says that it is fixed in Solaris Nevada. Someone should update this Wikipedia section: http://en.wikipedia.org/wiki/ZFS#Solaris_implementation_issues

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Anton B. Rang
2008-Feb-26 04:58 UTC
[zfs-discuss] The old problem with tar, zfs, nfs and zil
For Linux NFS service, it's an option in /etc/exports.

The default for "modern" (post-1.0.1) NFS utilities is "sync", which means that data and metadata will be written to the disk whenever NFS requires it (generally upon an NFS COMMIT operation). This is the same as Solaris with UFS, or with ZFS+ZIL. This works with XFS, EXT3, and any other file system with a working fsync().

It's possible to switch this off on Linux, but it is not recommended, as there is a chance that data could be lost if the server crashes. (For the same reason, the ZIL should not be disabled on a Solaris NFS server.)
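To make the "sync" behaviour described above concrete, here is a minimal Python sketch (not from the original thread) of what "commit before acknowledging" means at the file level: the function returns only after fsync() has returned. The function name `durable_write` and the file name are hypothetical, and whether the data truly reaches the platters still depends on the filesystem and on the drive honouring cache flushes.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> int:
    """Write data to path and return only after fsync() has completed.

    Hypothetical helper for illustration: it mimics a server that does not
    acknowledge a write until the OS reports the data as flushed.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        # os.write may write fewer bytes than requested, so loop until done.
        while written < len(data):
            written += os.write(fd, data[written:])
        # Block until the kernel reports the file data as flushed to the device.
        os.fsync(fd)
    finally:
        os.close(fd)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        target = os.path.join(tmpdir, "example.dat")
        count = durable_write(target, b"payload")
        print(f"acknowledged {count} bytes after fsync")
```

An "async" style server would correspond to skipping the os.fsync() call and acknowledging immediately, which is faster but can lose data if the machine crashes before the cache is written out.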
> For Linux NFS service, it's an option in
> /etc/exports.
>
> The default for "modern" (post-1.0.1) NFS utilities
> is "sync", which means that data and metadata will be
> written to the disk whenever NFS requires it
> (generally upon an NFS COMMIT operation). This is
> the same as Solaris with UFS, or with ZFS+ZIL. This
> works with XFS, EXT3, and any other file system with
> a working fsync().

OK, I knew that; I forgot to mention in my question that my doubt was whether Linux would "really" honour the sync. Do you see what I mean? I read that Linux does not (even with sync in exports). In NFSv2, for example, it does not matter whether you set sync or async: the server will ACK as soon as it receives the request (NOP). But if you are saying that Linux *now* really syncs the disks before ACKing the client, then there is a huge difference between zfs/nfs and xfs/nfs, because the numbers I posted were taken with "sync" on Linux.

> It's possible to switch this off on Linux, but not
> recommended, as there is a chance that data could be
> lost if the server crashed. (For the same reason, the
> ZIL should not be disabled on a Solaris NFS server.)

I understand that, so I have not even tried to disable the ZIL so far. All the tests I have run respected a semantically correct NFS service. If only the ZIL could be configured per filesystem, or per pool... The difference is 7.5s versus 1.0s, and theoretically ZFS is more efficient than XFS.
Roch Bourbonnais
2008-Feb-26 12:53 UTC
[zfs-discuss] The old problem with tar, zfs, nfs and zil
I would imagine Linux to behave more like a ZFS that does not flush caches (google "Evil zfs_nocacheflush"). If you can tar-extract files over NFS on Linux faster than one file per rotation latency, that is suspicious.

-r

On 26 Feb 08, at 13:16, msl wrote:
>> For Linux NFS service, it's an option in
>> /etc/exports.
>>
>> The default for "modern" (post-1.0.1) NFS utilities
>> is "sync", which means that data and metadata will be
>> written to the disk whenever NFS requires it
>> (generally upon an NFS COMMIT operation). This is
>> the same as Solaris with UFS, or with ZFS+ZIL. This
>> works with XFS, EXT3, and any other file system with
>> a working fsync().
> Ok, i did know that, i have forgot to mention in my question that my
> doubt was if Linux would "really" honour the sync. Do you
> understand? I did read that Linux "does not" (even with sync in
> exports). In nfsv2 for example, does not matter if you put sync or
> async, the server will ACK as soon as it receives the request (NOP).
> But if you are telling that *now* Linux is really syncing discs
> before ACK the client, well... so there is a huge diff on zfs/nfs
> and xfs/nfs, because the numbers that i have posted is with "sync"
> on Linux.
>>
>> It's possible to switch this off on Linux, but not
>> recommended, as there is a chance that data could be
>> lost if the server crashed. (For the same reason, the
>> ZIL should not be disabled on a Solaris NFS server.)
> I understand that, so i did not even try to disable ZIL until now.
> All the tests that i have made was respecting a semantically correct
> NFS service. If the ZIL could be configured per filesystem, or pool...
> The diff is 7.5s to 1.0s, and theoretically zfs is more efficient
> than xfs.
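Roch's rotation-latency sanity check can be sketched as simple arithmetic. The Python below is an illustration, not something posted in the thread. It assumes a single spinning disk with no NVRAM or separate log device, and it assumes each synchronously committed file costs at least one full platter revolution. The function names, the 7200 RPM figure, and the count of 1000 files are all hypothetical; the thread does not say how many files the tar archive contained.

```python
def rotation_latency_s(rpm: float) -> float:
    """Time for one full platter revolution, in seconds."""
    return 60.0 / rpm


def min_sync_extract_time_s(num_files: int, rpm: float) -> float:
    """Lower bound on extract time if every file waits one revolution.

    Assumption: one synchronous commit per file, each costing at least
    one full rotation on a single disk without a non-volatile cache.
    """
    return num_files * rotation_latency_s(rpm)


def looks_suspicious(measured_s: float, num_files: int, rpm: float) -> bool:
    """True if the measured time beats the rotation-latency lower bound."""
    return measured_s < min_sync_extract_time_s(num_files, rpm)


if __name__ == "__main__":
    rpm = 7200          # hypothetical drive speed
    num_files = 1000    # hypothetical archive size
    bound = min_sync_extract_time_s(num_files, rpm)
    print(f"lower bound: {bound:.2f} s")
    for measured in (1.0, 10.0):
        verdict = "suspicious" if looks_suspicious(measured, num_files, rpm) else "plausible"
        print(f"measured {measured:.1f} s: {verdict}")
```

Under these assumptions, a result below the bound suggests that writes are being acknowledged from a volatile cache rather than from stable storage. A server with battery-backed cache or a fast log device can legitimately beat it.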
Actually, I have some corrections to make. When I saw the numbers, I was stunned, and that kept me from thinking clearly. Here you can see the right numbers: http://www.posix.brte.com.br/blog/?p=104

The problem was the disks on which I ran the tests.

Thanks for your time.