Wang Shilong
2013-Oct-19 04:17 UTC
[PATCH] Btrfs: fix race condition between writting and scrubing supers
From: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Scrubing supers is not in a transaction context, when trying to write supers to disk, we should check if we are trying to scrub supers.Fix it. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> --- fs/btrfs/disk-io.c | 2 ++ fs/btrfs/transaction.c | 2 ++ 2 files changed, 4 insertions(+) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 419968e..0debb19 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3582,7 +3582,9 @@ int btrfs_commit_super(struct btrfs_root *root) return ret; } + btrfs_scrub_pause_super(root); ret = write_ctree_super(NULL, root, 0); + btrfs_scrub_continue_super(root); return ret; } diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 277fe81..3ebcbbd 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -1892,7 +1892,9 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans, goto cleanup_transaction; } + btrfs_scrub_pause_super(root); ret = write_ctree_super(trans, root, 0); + btrfs_scrub_continue_super(root); if (ret) { mutex_unlock(&root->fs_info->tree_log_mutex); goto cleanup_transaction; -- 1.7.11.7 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Stefan Behrens
2013-Oct-19 08:50 UTC
Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers
On 10/19/2013 06:17, Wang Shilong wrote:> From: Wang Shilong <wangsl.fnst@cn.fujitsu.com> > > Scrubing supers is not in a transaction context, when trying to > write supers to disk, we should check if we are trying to > scrub supers.Fix it. > > Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> > --- > fs/btrfs/disk-io.c | 2 ++ > fs/btrfs/transaction.c | 2 ++ > 2 files changed, 4 insertions(+) > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index 419968e..0debb19 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -3582,7 +3582,9 @@ int btrfs_commit_super(struct btrfs_root *root) > return ret; > } > > + btrfs_scrub_pause_super(root); > ret = write_ctree_super(NULL, root, 0); > + btrfs_scrub_continue_super(root); > return ret; > } > > diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c > index 277fe81..3ebcbbd 100644 > --- a/fs/btrfs/transaction.c > +++ b/fs/btrfs/transaction.c > @@ -1892,7 +1892,9 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans, > goto cleanup_transaction; > } > > + btrfs_scrub_pause_super(root); > ret = write_ctree_super(trans, root, 0); > + btrfs_scrub_continue_super(root); > if (ret) { > mutex_unlock(&root->fs_info->tree_log_mutex); > goto cleanup_transaction; >What kind of race do you see between writing the 4K superblock and scrub checking its checksum? Or in other words, what could happen? -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Shilong Wang
2013-Oct-19 10:32 UTC
Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers
Yeah, it did not hurt. but it may output checksum mismatch. For example: Writing 4k superblock is not totally finished, but we are trying to scrub it. Thanks, Wang 2013/10/19, Stefan Behrens <sbehrens@giantdisaster.de>:> On 10/19/2013 06:17, Wang Shilong wrote: >> From: Wang Shilong <wangsl.fnst@cn.fujitsu.com> >> >> Scrubing supers is not in a transaction context, when trying to >> write supers to disk, we should check if we are trying to >> scrub supers.Fix it. >> >> Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> >> --- >> fs/btrfs/disk-io.c | 2 ++ >> fs/btrfs/transaction.c | 2 ++ >> 2 files changed, 4 insertions(+) >> >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c >> index 419968e..0debb19 100644 >> --- a/fs/btrfs/disk-io.c >> +++ b/fs/btrfs/disk-io.c >> @@ -3582,7 +3582,9 @@ int btrfs_commit_super(struct btrfs_root *root) >> return ret; >> } >> >> + btrfs_scrub_pause_super(root); >> ret = write_ctree_super(NULL, root, 0); >> + btrfs_scrub_continue_super(root); >> return ret; >> } >> >> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c >> index 277fe81..3ebcbbd 100644 >> --- a/fs/btrfs/transaction.c >> +++ b/fs/btrfs/transaction.c >> @@ -1892,7 +1892,9 @@ int btrfs_commit_transaction(struct >> btrfs_trans_handle *trans, >> goto cleanup_transaction; >> } >> >> + btrfs_scrub_pause_super(root); >> ret = write_ctree_super(trans, root, 0); >> + btrfs_scrub_continue_super(root); >> if (ret) { >> mutex_unlock(&root->fs_info->tree_log_mutex); >> goto cleanup_transaction; >> > > What kind of race do you see between writing the 4K superblock and scrub > checking its checksum? Or in other words, what could happen? > > >-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Stefan Behrens
2013-Oct-19 14:03 UTC
Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers
On 10/19/2013 12:32, Shilong Wang wrote:> 2013/10/19, Stefan Behrens <sbehrens@giantdisaster.de>: >> On 10/19/2013 06:17, Wang Shilong wrote: >>> From: Wang Shilong <wangsl.fnst@cn.fujitsu.com> >>> >>> Scrubing supers is not in a transaction context, when trying to >>> write supers to disk, we should check if we are trying to >>> scrub supers.Fix it. >>> >>> Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> >>> --- >>> fs/btrfs/disk-io.c | 2 ++ >>> fs/btrfs/transaction.c | 2 ++ >>> 2 files changed, 4 insertions(+) >>> >>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c >>> index 419968e..0debb19 100644 >>> --- a/fs/btrfs/disk-io.c >>> +++ b/fs/btrfs/disk-io.c >>> @@ -3582,7 +3582,9 @@ int btrfs_commit_super(struct btrfs_root *root) >>> return ret; >>> } >>> >>> + btrfs_scrub_pause_super(root); >>> ret = write_ctree_super(NULL, root, 0); >>> + btrfs_scrub_continue_super(root); >>> return ret; >>> } >>> >>> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c >>> index 277fe81..3ebcbbd 100644 >>> --- a/fs/btrfs/transaction.c >>> +++ b/fs/btrfs/transaction.c >>> @@ -1892,7 +1892,9 @@ int btrfs_commit_transaction(struct >>> btrfs_trans_handle *trans, >>> goto cleanup_transaction; >>> } >>> >>> + btrfs_scrub_pause_super(root); >>> ret = write_ctree_super(trans, root, 0); >>> + btrfs_scrub_continue_super(root); >>> if (ret) { >>> mutex_unlock(&root->fs_info->tree_log_mutex); >>> goto cleanup_transaction; >>> >> >> What kind of race do you see between writing the 4K superblock and scrub >> checking its checksum? Or in other words, what could happen?> Yeah, it did not hurt. but it may output checksum mismatch. For example: > Writing 4k superblock is not totally finished, but we are trying to scrub it. Have you ever seen this issue? If yes, let''s find a different solution. You scrub, let''s say, once a week. Scrubbing the superblock takes, let''s say, 100ms, then it''s finished. This short race doesn''t justify to add such code to btrfs_commit_transaction and btrfs_commit_super IMHO. And commiting a transaction is synchronized to scrub already when the commit root is updated. If this is really an issue and these 4K disk writes and reads interfere, let''s find a better solution please. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Wang Shilong
2013-Oct-19 14:34 UTC
Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers
> On 10/19/2013 12:32, Shilong Wang wrote: >> 2013/10/19, Stefan Behrens <sbehrens@giantdisaster.de>: >>> On 10/19/2013 06:17, Wang Shilong wrote: >>>> From: Wang Shilong <wangsl.fnst@cn.fujitsu.com> >>>> >>>> Scrubing supers is not in a transaction context, when trying to >>>> write supers to disk, we should check if we are trying to >>>> scrub supers.Fix it. >>>> >>>> Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> >>>> --- >>>> fs/btrfs/disk-io.c | 2 ++ >>>> fs/btrfs/transaction.c | 2 ++ >>>> 2 files changed, 4 insertions(+) >>>> >>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c >>>> index 419968e..0debb19 100644 >>>> --- a/fs/btrfs/disk-io.c >>>> +++ b/fs/btrfs/disk-io.c >>>> @@ -3582,7 +3582,9 @@ int btrfs_commit_super(struct btrfs_root *root) >>>> return ret; >>>> } >>>> >>>> + btrfs_scrub_pause_super(root); >>>> ret = write_ctree_super(NULL, root, 0); >>>> + btrfs_scrub_continue_super(root); >>>> return ret; >>>> } >>>> >>>> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c >>>> index 277fe81..3ebcbbd 100644 >>>> --- a/fs/btrfs/transaction.c >>>> +++ b/fs/btrfs/transaction.c >>>> @@ -1892,7 +1892,9 @@ int btrfs_commit_transaction(struct >>>> btrfs_trans_handle *trans, >>>> goto cleanup_transaction; >>>> } >>>> >>>> + btrfs_scrub_pause_super(root); >>>> ret = write_ctree_super(trans, root, 0); >>>> + btrfs_scrub_continue_super(root); >>>> if (ret) { >>>> mutex_unlock(&root->fs_info->tree_log_mutex); >>>> goto cleanup_transaction; >>>> >>> >>> What kind of race do you see between writing the 4K superblock and scrub >>> checking its checksum? Or in other words, what could happen? > > > Yeah, it did not hurt. but it may output checksum mismatch. For example: > > Writing 4k superblock is not totally finished, but we are trying to scrub it. > > Have you ever seen this issue?No, just noticing it by accident.> > If yes, let''s find a different solution. You scrub, let''s say, once a week. Scrubbing the superblock takes, let''s say, 100ms, then it''s finished. This short race doesn''t justify to add such code to btrfs_commit_transaction and btrfs_commit_super IMHO. And commiting a transaction is synchronized to scrub already when the commit root is updated.Agree with this point.> > If this is really an issue and these 4K disk writes and reads interfere, let''s find a better solution please.From now, scrubing supers don''t involve writing, so it doesn''t hurt even if this short race really happens. Anyway, thanks for comments . Thanks, Wang> >-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Wang Shilong
2013-Oct-20 04:03 UTC
Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers
> On 10/19/2013 12:32, Shilong Wang wrote: >> 2013/10/19, Stefan Behrens <sbehrens@giantdisaster.de>: >>> On 10/19/2013 06:17, Wang Shilong wrote: >>>> From: Wang Shilong <wangsl.fnst@cn.fujitsu.com> >>>> >>>> Scrubing supers is not in a transaction context, when trying to >>>> write supers to disk, we should check if we are trying to >>>> scrub supers.Fix it. >>>> >>>> Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> >>>> --- >>>> fs/btrfs/disk-io.c | 2 ++ >>>> fs/btrfs/transaction.c | 2 ++ >>>> 2 files changed, 4 insertions(+) >>>> >>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c >>>> index 419968e..0debb19 100644 >>>> --- a/fs/btrfs/disk-io.c >>>> +++ b/fs/btrfs/disk-io.c >>>> @@ -3582,7 +3582,9 @@ int btrfs_commit_super(struct btrfs_root *root) >>>> return ret; >>>> } >>>> >>>> + btrfs_scrub_pause_super(root); >>>> ret = write_ctree_super(NULL, root, 0); >>>> + btrfs_scrub_continue_super(root); >>>> return ret; >>>> } >>>> >>>> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c >>>> index 277fe81..3ebcbbd 100644 >>>> --- a/fs/btrfs/transaction.c >>>> +++ b/fs/btrfs/transaction.c >>>> @@ -1892,7 +1892,9 @@ int btrfs_commit_transaction(struct >>>> btrfs_trans_handle *trans, >>>> goto cleanup_transaction; >>>> } >>>> >>>> + btrfs_scrub_pause_super(root); >>>> ret = write_ctree_super(trans, root, 0); >>>> + btrfs_scrub_continue_super(root); >>>> if (ret) { >>>> mutex_unlock(&root->fs_info->tree_log_mutex); >>>> goto cleanup_transaction; >>>> >>> >>> What kind of race do you see between writing the 4K superblock and scrub >>> checking its checksum? Or in other words, what could happen? > > > Yeah, it did not hurt. but it may output checksum mismatch. For example: > > Writing 4k superblock is not totally finished, but we are trying to scrub it. > > Have you ever seen this issue? > > If yes, let''s find a different solution. You scrub, let''s say, once a week. Scrubbing the superblock takes, let''s say, 100ms, then it''s finished. This short race doesn''t justify to add such code to btrfs_commit_transaction and btrfs_commit_super IMHO. And commiting a transaction is synchronized to scrub already when the commit root is updated. > > If this is really an issue and these 4K disk writes and reads interfere, let''s find a better solution please.How about this approach? We let scrub_supers in a transaction context. btrfs_join_transaction() scrub_supers btrfs_commit_transaction(). This is not elegant, but we can remove scrub_lock with supers(Notice, there is another place that have used this lock). Thanks, Wang> >-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Bob Marley
2013-Oct-20 07:28 UTC
Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers
On 19/10/2013 16:03, Stefan Behrens wrote:> On 10/19/2013 12:32, Shilong Wang wrote: > > > Yeah, it did not hurt. but it may output checksum mismatch. For > example: > > Writing 4k superblock is not totally finished, but we are trying to > scrub it. > > Have you ever seen this issue? >...> If this is really an issue and these 4K disk writes and reads > interfere, let''s find a better solution please. >Why don''t you scrub optimistically as is now, and then just in case of checksum mismatch you re-scrub in transaction context? -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Stefan Behrens
2013-Oct-22 08:37 UTC
Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers
On Sun, 20 Oct 2013 12:03:01 +0800, Wang Shilong wrote:>> On 10/19/2013 12:32, Shilong Wang wrote: >>> 2013/10/19, Stefan Behrens <sbehrens@giantdisaster.de>: >>>> On 10/19/2013 06:17, Wang Shilong wrote: >>>>> From: Wang Shilong <wangsl.fnst@cn.fujitsu.com> >>>>> >>>>> Scrubing supers is not in a transaction context, when trying to >>>>> write supers to disk, we should check if we are trying to >>>>> scrub supers.Fix it. >>>>> >>>>> Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>[...]>>>> What kind of race do you see between writing the 4K superblock and scrub >>>> checking its checksum? Or in other words, what could happen? >> >>> Yeah, it did not hurt. but it may output checksum mismatch. For example: >>> Writing 4k superblock is not totally finished, but we are trying to scrub it. >> >> Have you ever seen this issue?You replied with "No, just noticing it by accident" in the mail before. I don''t believe that this issue can ever happen. I don''t believe that somewhere on the path to the flash memory, to the magnetic disc or to the drive''s cache memory, someone interrupts a 4KB write in the middle of operation to read from this 4KB area. This is not an issue IMHO.>> If yes, let''s find a different solution. You scrub, let''s say, once a week. Scrubbing the superblock takes, let''s say, 100ms, then it''s finished. This short race doesn''t justify to add such code to btrfs_commit_transaction and btrfs_commit_super IMHO. And commiting a transaction is synchronized to scrub already when the commit root is updated. >> >> If this is really an issue and these 4K disk writes and reads interfere, let''s find a better solution please. > > How about this approach? > > We let scrub_supers in a transaction context. > > btrfs_join_transaction() > > scrub_supers > > btrfs_commit_transaction(). > > This is not elegant, but we can remove scrub_lock with supers(Notice, there is another place that have used > this lock).-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Bob Marley
2013-Oct-22 16:55 UTC
Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers
On 22/10/2013 10:37, Stefan Behrens wrote:> I don''t believe that this issue can ever happen. I don''t believe that > somewhere on the path to the flash memory, to the magnetic disc or to > the drive''s cache memory, someone interrupts a 4KB write in the middle > of operation to read from this 4KB area. This is not an issue IMHO.I think I have read that unfortunately it can happen. SAS and SATA specs for disks do not mandate that if a write is in-flight but still not completed, reads from the same sector should return the value it is being written; they can return the old value. I also think that Linux does not check either. Much worse, I think I have even read that two simultaneous in-flight writes to the same sector can be completed in any order by the disk, and since the write which wins is the latter being completed, this results in an indeterminate value persisting on that sector at the end. One needs to synchronize cache between the two writes to guarantee the outcome. Way worse is when the drives also cheat on synchronize cache, and that one is impossible to fix I believe. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Stefan Behrens
2013-Oct-23 17:21 UTC
Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers
On Tue, 22 Oct 2013 18:55:59 +0200, Bob Marley wrote:> On 22/10/2013 10:37, Stefan Behrens wrote: >> I don''t believe that this issue can ever happen. I don''t believe that >> somewhere on the path to the flash memory, to the magnetic disc or to >> the drive''s cache memory, someone interrupts a 4KB write in the middle >> of operation to read from this 4KB area. This is not an issue IMHO. > > I think I have read that unfortunately it can happen. > SAS and SATA specs for disks do not mandate that if a write is in-flight > but still not completed, reads from the same sector should return the > value it is being written; they can return the old value. > I also think that Linux does not check either.If the _old_ 4KB block is returned, that''s fine and won''t cause a checksum error. The patch in question addresses the case that Btrfs submits a write request for a 4KB block, and a concurrent read request for that 4KB block reads partially the old block and partially the new block, resulting in a checksum error reported in the scrub statistic counters.> Much worse, I think I have even read that two simultaneous in-flight > writes to the same sector can be completed in any order by the disk, and > since the write which wins is the latter being completed, this results > in an indeterminate value persisting on that sector at the end. One > needs to synchronize cache between the two writes to guarantee the > outcome. Way worse is when the drives also cheat on synchronize cache, > and that one is impossible to fix I believe.Two simultaneous in-flight writes to the same superblock cannot happen in Btrfs. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Chris Mason
2013-Oct-24 10:08 UTC
Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers
Quoting Stefan Behrens (2013-10-23 13:21:34)> On Tue, 22 Oct 2013 18:55:59 +0200, Bob Marley wrote: > > On 22/10/2013 10:37, Stefan Behrens wrote: > >> I don''t believe that this issue can ever happen. I don''t believe that > >> somewhere on the path to the flash memory, to the magnetic disc or to > >> the drive''s cache memory, someone interrupts a 4KB write in the middle > >> of operation to read from this 4KB area. This is not an issue IMHO. > > > > I think I have read that unfortunately it can happen. > > SAS and SATA specs for disks do not mandate that if a write is in-flight > > but still not completed, reads from the same sector should return the > > value it is being written; they can return the old value. > > I also think that Linux does not check either. > > If the _old_ 4KB block is returned, that''s fine and won''t cause a > checksum error. > > The patch in question addresses the case that Btrfs submits a write > request for a 4KB block, and a concurrent read request for that 4KB > block reads partially the old block and partially the new block, > resulting in a checksum error reported in the scrub statistic counters.Concurrent reads and writes to the device are completely undefined, and Any combination of old, new, random memory corruption wouldn''t surprise me...I''d rather avoid them ;) Doing the transaction join during the super read is probably the least complex choice. -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Miao Xie
2013-Oct-24 10:42 UTC
Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers
On thu, 24 Oct 2013 06:08:42 -0400, Chris Mason wrote:> Quoting Stefan Behrens (2013-10-23 13:21:34) >> On Tue, 22 Oct 2013 18:55:59 +0200, Bob Marley wrote: >>> On 22/10/2013 10:37, Stefan Behrens wrote: >>>> I don''t believe that this issue can ever happen. I don''t believe that >>>> somewhere on the path to the flash memory, to the magnetic disc or to >>>> the drive''s cache memory, someone interrupts a 4KB write in the middle >>>> of operation to read from this 4KB area. This is not an issue IMHO. >>> >>> I think I have read that unfortunately it can happen. >>> SAS and SATA specs for disks do not mandate that if a write is in-flight >>> but still not completed, reads from the same sector should return the >>> value it is being written; they can return the old value. >>> I also think that Linux does not check either. >> >> If the _old_ 4KB block is returned, that''s fine and won''t cause a >> checksum error. >> >> The patch in question addresses the case that Btrfs submits a write >> request for a 4KB block, and a concurrent read request for that 4KB >> block reads partially the old block and partially the new block, >> resulting in a checksum error reported in the scrub statistic counters. > > Concurrent reads and writes to the device are completely undefined, and > Any combination of old, new, random memory corruption wouldn''t > surprise me...I''d rather avoid them ;) > > Doing the transaction join during the super read is probably the least > complex choice.But it can not block the log tree sync, I think using device_list_mutex is better since we should acquire this mutex when writing the super blocks and we are sure that the super blocks are on non-volatile media on completion after we unlock the mutex. Thanks Miao> > -chris > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Wang Shilong
2013-Oct-24 11:32 UTC
Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers
On 10/24/2013 06:08 PM, Chris Mason wrote:> Quoting Stefan Behrens (2013-10-23 13:21:34) >> On Tue, 22 Oct 2013 18:55:59 +0200, Bob Marley wrote: >>> On 22/10/2013 10:37, Stefan Behrens wrote: >>>> I don''t believe that this issue can ever happen. I don''t believe that >>>> somewhere on the path to the flash memory, to the magnetic disc or to >>>> the drive''s cache memory, someone interrupts a 4KB write in the middle >>>> of operation to read from this 4KB area. This is not an issue IMHO. >>> I think I have read that unfortunately it can happen. >>> SAS and SATA specs for disks do not mandate that if a write is in-flight >>> but still not completed, reads from the same sector should return the >>> value it is being written; they can return the old value. >>> I also think that Linux does not check either. >> If the _old_ 4KB block is returned, that''s fine and won''t cause a >> checksum error. >> >> The patch in question addresses the case that Btrfs submits a write >> request for a 4KB block, and a concurrent read request for that 4KB >> block reads partially the old block and partially the new block, >> resulting in a checksum error reported in the scrub statistic counters. > Concurrent reads and writes to the device are completely undefined, and > Any combination of old, new, random memory corruption wouldn''t > surprise me...I''d rather avoid them ;) > > Doing the transaction join during the super read is probably the least > complex choice.Yeah, by joining transaction we can solve this problem, but it is a little confused, because we don''t involve writting in scrubing supers. And the only race condition happens in commiting transaction, Miao also pointed out that maybe the best way is to move btrfs_scrub_continue after write_ctree_super(). Thanks, Wang> -chris > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Miao Xie
2013-Oct-25 02:14 UTC
Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers
On thu, 24 Oct 2013 19:32:15 +0800, Wang Shilong wrote:> On 10/24/2013 06:08 PM, Chris Mason wrote: >> Quoting Stefan Behrens (2013-10-23 13:21:34) >>> On Tue, 22 Oct 2013 18:55:59 +0200, Bob Marley wrote: >>>> On 22/10/2013 10:37, Stefan Behrens wrote: >>>>> I don''t believe that this issue can ever happen. I don''t believe that >>>>> somewhere on the path to the flash memory, to the magnetic disc or to >>>>> the drive''s cache memory, someone interrupts a 4KB write in the middle >>>>> of operation to read from this 4KB area. This is not an issue IMHO. >>>> I think I have read that unfortunately it can happen. >>>> SAS and SATA specs for disks do not mandate that if a write is in-flight >>>> but still not completed, reads from the same sector should return the >>>> value it is being written; they can return the old value. >>>> I also think that Linux does not check either. >>> If the _old_ 4KB block is returned, that''s fine and won''t cause a >>> checksum error. >>> >>> The patch in question addresses the case that Btrfs submits a write >>> request for a 4KB block, and a concurrent read request for that 4KB >>> block reads partially the old block and partially the new block, >>> resulting in a checksum error reported in the scrub statistic counters. >> Concurrent reads and writes to the device are completely undefined, and >> Any combination of old, new, random memory corruption wouldn''t >> surprise me...I''d rather avoid them ;) >> >> Doing the transaction join during the super read is probably the least >> complex choice. > Yeah, by joining transaction we can solve this problem, but it is a little confused, > because we don''t involve writting in scrubing supers. > > And the only race condition happens in commiting transaction, Miao also pointed out that > maybe the best way is to move btrfs_scrub_continue after write_ctree_super().Sorry, My miss. btrfs_scrub_continue() is behind write_ctree_super() all the while, so the above problem doesn''t exist. Thanks Miao> > Thanks, > Wang >> -chris >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Seemingly Similar Threads
- [RFC PATCH 4/4 v2] Btrfs: deal with filesystem state at mount, umount
- btrfs qgroup assign -> "ERROR: bad relation requested"
- [PATCH] Btrfs-progs: receive: fix the case that we can not find subvolume
- [PATCH v4] btrfs-progs: Fix a segfault when using btrfs-corrupt-block with "-U"
- [PATCH] Btrfs: fix passing wrong arg gfp_t to decide the correct allocation mode