Thomas Maier-Komor
2009-Mar-11 08:58 UTC
[zfs-discuss] ext4 bug & zfs handling of the very same situation
Hi,

there was recently a bug reported against EXT4 that gets triggered by KDE:
https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781

Now I'd like to verify that my understanding of ZFS behavior and implementation is correct, and that ZFS is unaffected by this kind of issue. Maybe somebody would like to comment on this.

The underlying problem with ext4 is that some KDE executables do something like this:

1a) open and read data from file x, close file x
1b) open and truncate file x
1c) write data to file x
1d) close file x

or

2a) open and read data from file x, close file x
2b) open and truncate file x.new
2c) write data to file x.new
2d) close file x.new
2e) rename file x.new to file x

Concerning case 1) I think ZFS may lose data if power is lost right after 1b) and open(xxx,O_WRONLY|O_TRUNC|O_CREAT) is issued in a transaction group separate from the one containing 1c/1d.

Concerning case 2) I cannot see ZFS losing any data, because of copy-on-write and transaction grouping.

Theodore Ts'o (ext4 developer) commented that both cases are flawed and cannot be supported correctly, because of a missing fsync() before close. Is this correct? His comment is over here:
https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/54

Any thoughts or comments?

TIA,
Thomas
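Sequence 2 from the message above can be sketched in C as follows. This is only an illustration under stated assumptions, not KDE's actual code: the helper name replace_file, the 0644 mode and the fixed-size path buffer are made up for the example. It also adds the fsync() before close that, according to the summary of Ts'o's comment, the original sequence is missing.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: replace `path` with `len` bytes from `data`
 * using the write-new-file-then-rename sequence (steps 2b to 2e).
 * Returns 0 on success, -1 on failure. */
int replace_file(const char *path, const char *data, size_t len)
{
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.new", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;                      /* path too long for the buffer */

    /* 2b) open and truncate x.new */
    int fd = open(tmp, O_WRONLY | O_TRUNC | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    /* 2c) write the data, allowing for short writes */
    size_t off = 0;
    while (off < len) {
        ssize_t w = write(fd, data + off, len - off);
        if (w < 0) {
            close(fd);
            unlink(tmp);
            return -1;
        }
        off += (size_t)w;
    }

    /* Added step: ask for the data to reach stable storage before the
     * rename makes the new file visible under the old name. */
    if (fsync(fd) < 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }

    /* 2d) close x.new */
    if (close(fd) < 0) {
        unlink(tmp);
        return -1;
    }

    /* 2e) rename x.new to x; rename() replaces an existing x atomically */
    if (rename(tmp, path) < 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

With this ordering, a reader of x sees either the complete old content or the complete new content, which is the property case 2 relies on.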
Casper.Dik at Sun.COM
2009-Mar-11 09:22 UTC
[zfs-discuss] ext4 bug & zfs handling of the very same situation
>The underlying problem with ext4 is that some kde executables do
>something like this:
>1a) open and read data from file x, close file x
>1b) open and truncate file x
>1c) write data to file x
>1d) close file x
>
>or
>
>2a) open and read data from file x, close file x
>2b) open and truncate file x.new
>2c) write data to file x.new
>2d) close file x.new
>2e) rename file x.new to file x
>
>Concerning case 1) I think ZFS may lose data if power is lost right
>after 1b) and open(xxx,O_WRONLY|O_TRUNC|O_CREAT) is issued in a
>transaction group separately from the one containing 1c/1d.

Yes, I would assume that is possible, but the chance of it happening is small. Other filesystems prefer to write the metadata promptly, but ZFS will easily wait until the file is completely written.

And UFS has the extra problem that it can change the file size, and reading will show garbage in the file. (It changes the inode, possibly because it's in the log, but it hasn't written the data.)

We've seen that problem with UFS and the /etc/*_* driver files, precisely because we didn't flush/fsync. (And in some cases we used fsync(fileno(file)), but the new content was still in the stdio buffer.)

Only "versioned" filesystems can make the first sequence work.

>Concerning case 2) I cannot see ZFS losing any data, because of
>copy-on-write and transaction grouping.
>
>Theodore Ts'o (ext4 developer) commented that both cases are flawed and
>cannot be supported correctly, because of a lacking fsync() before
>close. Is this correct? His comment is over here:
>https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/54

Perhaps we should tell Theodore Ts'o that ZFS gets this right :-)

I'm assuming that all transactions in group N happened before those in group N+1, at least when it comes to the partial order in which the transactions happen.

Casper
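The stdio pitfall mentioned above (calling fsync(fileno(file)) while the new content is still in the stdio buffer) can be avoided by flushing the stream first. A minimal sketch, assuming a POSIX system; the helper name flush_and_sync is hypothetical and is not taken from any Solaris source:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: push the contents of a stdio stream towards
 * stable storage. Returns 0 on success, -1 if either step fails. */
int flush_and_sync(FILE *fp)
{
    /* fflush() hands the user-space stdio buffer over to the kernel.
     * Without it, fsync() cannot see data that is still buffered. */
    if (fflush(fp) != 0)
        return -1;

    /* fsync() then asks the kernel to write the file's data out. */
    if (fsync(fileno(fp)) != 0)
        return -1;

    return 0;
}
```

A caller would invoke flush_and_sync(fp) after the last fputs()/fprintf() and before fclose(fp), and treat a non-zero result as a failed update.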
Joerg Schilling
2009-Mar-11 11:46 UTC
[zfs-discuss] [osol-discuss] ext4 bug & zfs handling of the very same situation
Thomas Maier-Komor <thomas at maier-komor.de> wrote:

> there was recently a bug reported against EXT4 that gets triggered by
> KDE: https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781
>
> Now I'd like to verify that my understanding of ZFS behavior and
> implementations is correct, and ZFS is unaffected from this kind of
> issue. Maybe somebody would like to comment on this.
>
> The underlying problem with ext4 is that some kde executables do
> something like this:
> 1a) open and read data from file x, close file x
> 1b) open and truncate file x
> 1c) write data to file x
> 1d) close file x
>
> or
>
> 2a) open and read data from file x, close file x
> 2b) open and truncate file x.new
> 2c) write data to file x.new
> 2d) close file x.new
> 2e) rename file x.new to file x
>
> Concerning case 1) I think ZFS may lose data if power is lost right
> after 1b) and open(xxx,O_WRONLY|O_TRUNC|O_CREAT) is issued in a
> transaction group separately from the one containing 1c/1d.

It depends on what you call "loses data". If you truncate an existing file, the content of the file is expected to be lost if the file system behaves correctly.

> Concerning case 2) I cannot see ZFS losing any data, because of
> copy-on-write and transaction grouping.
>
> Theodore Ts'o (ext4 developer) commented that both cases are flawed and
> cannot be supported correctly, because of a lacking fsync() before
> close. Is this correct? His comment is over here:
> https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/54

An application cannot be sure whether the data has been written to disk if there was no fsync() call.

For this reason, star by default calls fsync() before closing a file, in order to be able to include information about problems in the star exit code.
Jörg

--
 EMail:joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js at cs.tu-berlin.de (uni)
       joerg.schilling at fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
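The idea of calling fsync() before close() so that problems can be reported in the exit code can be sketched as below. This is not star's actual code; the helper names sync_and_close and write_checked, and the 0644 mode, are assumptions made for the illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: fsync() and then close() a descriptor.
 * Returns 0 only if both calls succeeded; close() is attempted
 * even when fsync() has already failed. */
int sync_and_close(int fd)
{
    int failed = 0;

    if (fsync(fd) != 0)
        failed = 1;
    if (close(fd) != 0)
        failed = 1;

    return failed ? -1 : 0;
}

/* Hypothetical example: write `len` bytes to `path` and return the
 * number of problems seen (0 or 1), suitable for adding to an
 * error counter that later becomes the exit code. */
int write_checked(const char *path, const char *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return 1;

    /* A short or failed write counts as a problem. */
    int bad = (write(fd, data, len) != (ssize_t)len);

    /* Errors that only surface at fsync() or close() count as well. */
    if (sync_and_close(fd) != 0)
        bad = 1;

    return bad;
}
```

A program could sum the results of write_checked() over all files it writes and exit with a non-zero status if the sum is greater than zero.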