Joel Becker
2009-Aug-08 00:10 UTC
[Ocfs2-devel] New update for sync in reflink. [Re: some thoughts about data integrity test for reflink.]
[I've added ocfs2-devel to the Cc list. Please copy the list, otherwise we can't get Mark's thoughts, and he knows some of the code better than I do. ] On Fri, Aug 07, 2009 at 11:23:52PM +0800, Tao Ma wrote:> Joel Becker wrote: > >>I called ocfs2_sync_inode before reflink to flush the page > >>cache. So I think we should get all B's and C's are missed. > >>The reason I call ocfs2_sync_inode is the following test case: > >>1. Write a file with all A's. > >>2. reflink A to B. > >>3. read B. > >>If the page cache of A isn't synced to the disk, we may get the > >>garbage from reading of B. > > > > We won't get garbage. We get 'A's, which is a valid state of > >the file. Whether it's valid in the face of relink is what we're trying > >to decide here. > We don't get 'A' here actually and we would get garbage. Strange, > uh? I just added some printk and it seems that while reading file B, > the buffer_head is created > and read from the disk. I used to think that while we write file A, > we will create buffer_head. And reading file B will use this > buffer_head. > But that is not the case actually. The only right sequence is that > we add a sync between step 1 and 2.Um, what? Oh, we do the check for PageDirty() in CoW, but not in straight read...> > Now, whether we want a reflink to force a sync or not in the > >writeback case...on the one hand, people will probably expect write(); > >reflink(); read() to work. On the other hand, they really should > >write(); fsync(); reflink(); if they care. We're already dealing > >with this regarding rename(), and its ugly. Force the fsync() for them, > >and we help the naive caller but penalize (performance-wise) the > >intelligent caller. > I have checked my mount mode, it is data=ordered, so we also meet > with this issue in ordered mode.Yes, the original discussion was concerning the page cache contents during CoW. You've coded that up correctly. Here we're talking about simple read.> > I really lean towards just requiring the caller to deal with it. > >It's what unix-like systems do for all other operations. I poked Chris, > >and he agrees. So a caller if they knew they had such a situation, does > >'fsync(); reflink();'. > So a user always need to do fsync before he do reflink. Is it OK? If > yes, I will ask tristan to update his test case.Well... I would normally say yes, because stale vs not-stale is one thing. But complete garbage from before the first write is not OK.> >>I think it is OK if I call ocfs2_sync_inode before reflink?(See > >>my git tree for details). > > > > Calling it unconditionally is just a performance disaster > >waiting to happen. Even if we want to require reflink() to happen with > >data flushed, you don't want to call ocfs2_sync_inode() in the ordered > >case. You want the ordered journal to handle it. > Since it doesn't work in order mode which is more common for ocfs2 > usage, do you change your mind here?I do not. Having to call ocfs2_sync_inode() for every reflink is a disaster. We need to think about this. Basically, how can the reflink target notice it has to either sync the source or perhaps copy the data over? Joel -- Life's Little Instruction Book #43 "Never give up on somebody. Miracles happen every day." Joel Becker Principal Software Developer Oracle E-mail: joel.becker at oracle.com Phone: (650) 506-8127
Tao Ma
2009-Aug-08 14:28 UTC
[Ocfs2-devel] New update for sync in reflink. [Re: some thoughts about data integrity test for reflink.]
Joel Becker wrote:> [I've added ocfs2-devel to the Cc list. Please copy the list, otherwise > we can't get Mark's thoughts, and he knows some of the code better > than I do. ] > > On Fri, Aug 07, 2009 at 11:23:52PM +0800, Tao Ma wrote: > >> Joel Becker wrote: >> >>>> I called ocfs2_sync_inode before reflink to flush the page >>>> cache. So I think we should get all B's and C's are missed. >>>> The reason I call ocfs2_sync_inode is the following test case: >>>> 1. Write a file with all A's. >>>> 2. reflink A to B. >>>> 3. read B. >>>> If the page cache of A isn't synced to the disk, we may get the >>>> garbage from reading of B. >>>> >>> We won't get garbage. We get 'A's, which is a valid state of >>> the file. Whether it's valid in the face of relink is what we're trying >>> to decide here. >>> >> We don't get 'A' here actually and we would get garbage. Strange, >> uh? I just added some printk and it seems that while reading file B, >> the buffer_head is created >> and read from the disk. I used to think that while we write file A, >> we will create buffer_head. And reading file B will use this >> buffer_head. >> But that is not the case actually. The only right sequence is that >> we add a sync between step 1 and 2. >> > > Um, what? > Oh, we do the check for PageDirty() in CoW, but not in straight > read... >So what do you mean here? I need to add another patch for straight read check because of reflink? I am lack of this type of technique. So please give me some tips. Thanks.> >>> Now, whether we want a reflink to force a sync or not in the >>> writeback case...on the one hand, people will probably expect write(); >>> reflink(); read() to work. On the other hand, they really should >>> write(); fsync(); reflink(); if they care. We're already dealing >>> with this regarding rename(), and its ugly. Force the fsync() for them, >>> and we help the naive caller but penalize (performance-wise) the >>> intelligent caller. >>> >> I have checked my mount mode, it is data=ordered, so we also meet >> with this issue in ordered mode. >> > > Yes, the original discussion was concerning the page cache > contents during CoW. You've coded that up correctly. Here we're > talking about simple read. > > >>> I really lean towards just requiring the caller to deal with it. >>> It's what unix-like systems do for all other operations. I poked Chris, >>> and he agrees. So a caller if they knew they had such a situation, does >>> 'fsync(); reflink();'. >>> >> So a user always need to do fsync before he do reflink. Is it OK? If >> yes, I will ask tristan to update his test case. >> > > Well... I would normally say yes, because stale vs not-stale is > one thing. But complete garbage from before the first write is not OK. >No problem, I will ask tristan to do it. And we may also need to add some explanations about it when we finally release reflink.> >>>> I think it is OK if I call ocfs2_sync_inode before reflink?(See >>>> my git tree for details). >>>> >>> Calling it unconditionally is just a performance disaster >>> waiting to happen. Even if we want to require reflink() to happen with >>> data flushed, you don't want to call ocfs2_sync_inode() in the ordered >>> case. You want the ordered journal to handle it. >>> >> Since it doesn't work in order mode which is more common for ocfs2 >> usage, do you change your mind here? >> > > I do not. Having to call ocfs2_sync_inode() for every reflink > is a disaster. We need to think about this. Basically, how can the > reflink target notice it has to either sync the source or perhaps copy > the data over? >OK, I agree with you for ocfs2_sync_inode part. I am just a little worried about the above case I mentioned, so if we can resolve it gracefully, I would be very glad about it. Regards, Tao