thr3ads.net - Btrfs devel - [PATCH] Btrfs: fix race condition between writting and scrubing supers [Oct 2013]

If this information is useful, please help other people find it:
Share via:

Wang Shilong

2013-Oct-19 04:17 UTC

[PATCH] Btrfs: fix race condition between writting and scrubing supers

From: Wang Shilong <wangsl.fnst@cn.fujitsu.com>

Scrubing supers is not in a transaction context, when trying to
write supers to disk, we should check if we are trying to
scrub supers.Fix it.

Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
---
 fs/btrfs/disk-io.c     | 2 ++
 fs/btrfs/transaction.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 419968e..0debb19 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3582,7 +3582,9 @@ int btrfs_commit_super(struct btrfs_root *root)
 		return ret;
 	}
 
+	btrfs_scrub_pause_super(root);
 	ret = write_ctree_super(NULL, root, 0);
+	btrfs_scrub_continue_super(root);
 	return ret;
 }
 
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 277fe81..3ebcbbd 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1892,7 +1892,9 @@ int btrfs_commit_transaction(struct btrfs_trans_handle
*trans,
 		goto cleanup_transaction;
 	}
 
+	btrfs_scrub_pause_super(root);
 	ret = write_ctree_super(trans, root, 0);
+	btrfs_scrub_continue_super(root);
 	if (ret) {
 		mutex_unlock(&root->fs_info->tree_log_mutex);
 		goto cleanup_transaction;
-- 
1.7.11.7

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Stefan Behrens

2013-Oct-19 08:50 UTC

head link

Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers

On 10/19/2013 06:17, Wang Shilong wrote:> From: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
>
> Scrubing supers is not in a transaction context, when trying to
> write supers to disk, we should check if we are trying to
> scrub supers.Fix it.
>
> Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
> ---
>   fs/btrfs/disk-io.c     | 2 ++
>   fs/btrfs/transaction.c | 2 ++
>   2 files changed, 4 insertions(+)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 419968e..0debb19 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3582,7 +3582,9 @@ int btrfs_commit_super(struct btrfs_root *root)
>   		return ret;
>   	}
>
> +	btrfs_scrub_pause_super(root);
>   	ret = write_ctree_super(NULL, root, 0);
> +	btrfs_scrub_continue_super(root);
>   	return ret;
>   }
>
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 277fe81..3ebcbbd 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -1892,7 +1892,9 @@ int btrfs_commit_transaction(struct
btrfs_trans_handle *trans,
>   		goto cleanup_transaction;
>   	}
>
> +	btrfs_scrub_pause_super(root);
>   	ret = write_ctree_super(trans, root, 0);
> +	btrfs_scrub_continue_super(root);
>   	if (ret) {
>   		mutex_unlock(&root->fs_info->tree_log_mutex);
>   		goto cleanup_transaction;
>
What kind of race do you see between writing the 4K superblock and scrub 
checking its checksum? Or in other words, what could happen?


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Shilong Wang

2013-Oct-19 10:32 UTC

head link

Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers

Yeah, it did not hurt. but it may  output checksum mismatch.  For example:
Writing 4k superblock is not totally finished, but we are trying to scrub it.

Thanks,
Wang

2013/10/19, Stefan Behrens
<sbehrens@giantdisaster.de>:> On 10/19/2013 06:17, Wang Shilong wrote:
>> From: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
>>
>> Scrubing supers is not in a transaction context, when trying to
>> write supers to disk, we should check if we are trying to
>> scrub supers.Fix it.
>>
>> Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
>> ---
>>   fs/btrfs/disk-io.c     | 2 ++
>>   fs/btrfs/transaction.c | 2 ++
>>   2 files changed, 4 insertions(+)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 419968e..0debb19 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -3582,7 +3582,9 @@ int btrfs_commit_super(struct btrfs_root *root)
>>   		return ret;
>>   	}
>>
>> +	btrfs_scrub_pause_super(root);
>>   	ret = write_ctree_super(NULL, root, 0);
>> +	btrfs_scrub_continue_super(root);
>>   	return ret;
>>   }
>>
>> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
>> index 277fe81..3ebcbbd 100644
>> --- a/fs/btrfs/transaction.c
>> +++ b/fs/btrfs/transaction.c
>> @@ -1892,7 +1892,9 @@ int btrfs_commit_transaction(struct
>> btrfs_trans_handle *trans,
>>   		goto cleanup_transaction;
>>   	}
>>
>> +	btrfs_scrub_pause_super(root);
>>   	ret = write_ctree_super(trans, root, 0);
>> +	btrfs_scrub_continue_super(root);
>>   	if (ret) {
>>   		mutex_unlock(&root->fs_info->tree_log_mutex);
>>   		goto cleanup_transaction;
>>
>
> What kind of race do you see between writing the 4K superblock and scrub
> checking its checksum? Or in other words, what could happen?
>
>
>--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Stefan Behrens

2013-Oct-19 14:03 UTC

head link

Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers

On 10/19/2013 12:32, Shilong Wang wrote:> 2013/10/19, Stefan Behrens <sbehrens@giantdisaster.de>:
>> On 10/19/2013 06:17, Wang Shilong wrote:
>>> From: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
>>>
>>> Scrubing supers is not in a transaction context, when trying to
>>> write supers to disk, we should check if we are trying to
>>> scrub supers.Fix it.
>>>
>>> Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
>>> ---
>>>    fs/btrfs/disk-io.c     | 2 ++
>>>    fs/btrfs/transaction.c | 2 ++
>>>    2 files changed, 4 insertions(+)
>>>
>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>> index 419968e..0debb19 100644
>>> --- a/fs/btrfs/disk-io.c
>>> +++ b/fs/btrfs/disk-io.c
>>> @@ -3582,7 +3582,9 @@ int btrfs_commit_super(struct btrfs_root
*root)
>>>    		return ret;
>>>    	}
>>>
>>> +	btrfs_scrub_pause_super(root);
>>>    	ret = write_ctree_super(NULL, root, 0);
>>> +	btrfs_scrub_continue_super(root);
>>>    	return ret;
>>>    }
>>>
>>> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
>>> index 277fe81..3ebcbbd 100644
>>> --- a/fs/btrfs/transaction.c
>>> +++ b/fs/btrfs/transaction.c
>>> @@ -1892,7 +1892,9 @@ int btrfs_commit_transaction(struct
>>> btrfs_trans_handle *trans,
>>>    		goto cleanup_transaction;
>>>    	}
>>>
>>> +	btrfs_scrub_pause_super(root);
>>>    	ret = write_ctree_super(trans, root, 0);
>>> +	btrfs_scrub_continue_super(root);
>>>    	if (ret) {
>>>    		mutex_unlock(&root->fs_info->tree_log_mutex);
>>>    		goto cleanup_transaction;
>>>
>>
>> What kind of race do you see between writing the 4K superblock and
scrub
>> checking its checksum? Or in other words, what could happen?
 > Yeah, it did not hurt. but it may  output checksum mismatch.  For 
example:
 > Writing 4k superblock is not totally finished, but we are trying to 
scrub it.

Have you ever seen this issue?

If yes, let''s find a different solution. You scrub, let''s say,
once a
week. Scrubbing the superblock takes, let''s say, 100ms, then
it''s
finished. This short race doesn''t justify to add such code to 
btrfs_commit_transaction and btrfs_commit_super IMHO. And commiting a 
transaction is synchronized to scrub already when the commit root is 
updated.

If this is really an issue and these 4K disk writes and reads interfere, 
let''s find a better solution please.


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Wang Shilong

2013-Oct-19 14:34 UTC

head link

Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers

> On 10/19/2013 12:32, Shilong Wang wrote:
>> 2013/10/19, Stefan Behrens <sbehrens@giantdisaster.de>:
>>> On 10/19/2013 06:17, Wang Shilong wrote:
>>>> From: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
>>>> 
>>>> Scrubing supers is not in a transaction context, when trying to
>>>> write supers to disk, we should check if we are trying to
>>>> scrub supers.Fix it.
>>>> 
>>>> Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
>>>> ---
>>>>   fs/btrfs/disk-io.c     | 2 ++
>>>>   fs/btrfs/transaction.c | 2 ++
>>>>   2 files changed, 4 insertions(+)
>>>> 
>>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>>> index 419968e..0debb19 100644
>>>> --- a/fs/btrfs/disk-io.c
>>>> +++ b/fs/btrfs/disk-io.c
>>>> @@ -3582,7 +3582,9 @@ int btrfs_commit_super(struct btrfs_root
*root)
>>>>   		return ret;
>>>>   	}
>>>> 
>>>> +	btrfs_scrub_pause_super(root);
>>>>   	ret = write_ctree_super(NULL, root, 0);
>>>> +	btrfs_scrub_continue_super(root);
>>>>   	return ret;
>>>>   }
>>>> 
>>>> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
>>>> index 277fe81..3ebcbbd 100644
>>>> --- a/fs/btrfs/transaction.c
>>>> +++ b/fs/btrfs/transaction.c
>>>> @@ -1892,7 +1892,9 @@ int btrfs_commit_transaction(struct
>>>> btrfs_trans_handle *trans,
>>>>   		goto cleanup_transaction;
>>>>   	}
>>>> 
>>>> +	btrfs_scrub_pause_super(root);
>>>>   	ret = write_ctree_super(trans, root, 0);
>>>> +	btrfs_scrub_continue_super(root);
>>>>   	if (ret) {
>>>>   		mutex_unlock(&root->fs_info->tree_log_mutex);
>>>>   		goto cleanup_transaction;
>>>> 
>>> 
>>> What kind of race do you see between writing the 4K superblock and
scrub
>>> checking its checksum? Or in other words, what could happen?
> 
> > Yeah, it did not hurt. but it may  output checksum mismatch.  For
example:
> > Writing 4k superblock is not totally finished, but we are trying to
scrub it.
> 
> Have you ever seen this issue?
No, just noticing it by accident.
> 
> If yes, let''s find a different solution. You scrub, let''s
say, once a week. Scrubbing the superblock takes, let''s say, 100ms,
then it''s finished. This short race doesn''t justify to add
such code to btrfs_commit_transaction and btrfs_commit_super IMHO. And commiting
a transaction is synchronized to scrub already when the commit root is updated.
Agree with this point.
> 
> If this is really an issue and these 4K disk writes and reads interfere,
let''s find a better solution please.
From now, scrubing supers don''t involve writing,  so it
doesn''t hurt even if this short race really happens.
Anyway, thanks for comments .

Thanks,
Wang> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Wang Shilong

2013-Oct-20 04:03 UTC

head link

Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers

> On 10/19/2013 12:32, Shilong Wang wrote:
>> 2013/10/19, Stefan Behrens <sbehrens@giantdisaster.de>:
>>> On 10/19/2013 06:17, Wang Shilong wrote:
>>>> From: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
>>>> 
>>>> Scrubing supers is not in a transaction context, when trying to
>>>> write supers to disk, we should check if we are trying to
>>>> scrub supers.Fix it.
>>>> 
>>>> Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
>>>> ---
>>>>   fs/btrfs/disk-io.c     | 2 ++
>>>>   fs/btrfs/transaction.c | 2 ++
>>>>   2 files changed, 4 insertions(+)
>>>> 
>>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>>> index 419968e..0debb19 100644
>>>> --- a/fs/btrfs/disk-io.c
>>>> +++ b/fs/btrfs/disk-io.c
>>>> @@ -3582,7 +3582,9 @@ int btrfs_commit_super(struct btrfs_root
*root)
>>>>   		return ret;
>>>>   	}
>>>> 
>>>> +	btrfs_scrub_pause_super(root);
>>>>   	ret = write_ctree_super(NULL, root, 0);
>>>> +	btrfs_scrub_continue_super(root);
>>>>   	return ret;
>>>>   }
>>>> 
>>>> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
>>>> index 277fe81..3ebcbbd 100644
>>>> --- a/fs/btrfs/transaction.c
>>>> +++ b/fs/btrfs/transaction.c
>>>> @@ -1892,7 +1892,9 @@ int btrfs_commit_transaction(struct
>>>> btrfs_trans_handle *trans,
>>>>   		goto cleanup_transaction;
>>>>   	}
>>>> 
>>>> +	btrfs_scrub_pause_super(root);
>>>>   	ret = write_ctree_super(trans, root, 0);
>>>> +	btrfs_scrub_continue_super(root);
>>>>   	if (ret) {
>>>>   		mutex_unlock(&root->fs_info->tree_log_mutex);
>>>>   		goto cleanup_transaction;
>>>> 
>>> 
>>> What kind of race do you see between writing the 4K superblock and
scrub
>>> checking its checksum? Or in other words, what could happen?
> 
> > Yeah, it did not hurt. but it may  output checksum mismatch.  For
example:
> > Writing 4k superblock is not totally finished, but we are trying to
scrub it.
> 
> Have you ever seen this issue?
> 
> If yes, let''s find a different solution. You scrub, let''s
say, once a week. Scrubbing the superblock takes, let''s say, 100ms,
then it''s finished. This short race doesn''t justify to add
such code to btrfs_commit_transaction and btrfs_commit_super IMHO. And commiting
a transaction is synchronized to scrub already when the commit root is updated.
> 
> If this is really an issue and these 4K disk writes and reads interfere,
let''s find a better solution please.
How about this approach?

We let scrub_supers in a transaction context.

btrfs_join_transaction()

scrub_supers

btrfs_commit_transaction().

This is not elegant, but we can remove scrub_lock with supers(Notice, there is
another place that have used
this lock).

Thanks,
Wang> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Bob Marley

2013-Oct-20 07:28 UTC

head link

Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers

On 19/10/2013 16:03, Stefan Behrens wrote:> On 10/19/2013 12:32, Shilong Wang wrote:
>
> > Yeah, it did not hurt. but it may  output checksum mismatch. For 
> example:
> > Writing 4k superblock is not totally finished, but we are trying to 
> scrub it.
>
> Have you ever seen this issue?
>
...
> If this is really an issue and these 4K disk writes and reads 
> interfere, let''s find a better solution please.
>

Why don''t you scrub optimistically as is now, and then just in case of 
checksum mismatch you re-scrub in transaction context?

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Stefan Behrens

2013-Oct-22 08:37 UTC

head link

Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers

On Sun, 20 Oct 2013 12:03:01 +0800, Wang Shilong wrote:>> On 10/19/2013 12:32, Shilong Wang wrote:
>>> 2013/10/19, Stefan Behrens <sbehrens@giantdisaster.de>:
>>>> On 10/19/2013 06:17, Wang Shilong wrote:
>>>>> From: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
>>>>>
>>>>> Scrubing supers is not in a transaction context, when
trying to
>>>>> write supers to disk, we should check if we are trying to
>>>>> scrub supers.Fix it.
>>>>>
>>>>> Signed-off-by: Wang Shilong
<wangsl.fnst@cn.fujitsu.com>
[...]>>>> What kind of race do you see between writing the 4K superblock
and scrub
>>>> checking its checksum? Or in other words, what could happen?
>>
>>> Yeah, it did not hurt. but it may  output checksum mismatch.  For
example:
>>> Writing 4k superblock is not totally finished, but we are trying to
scrub it.
>>
>> Have you ever seen this issue?
You replied with "No, just noticing it by accident" in the mail
before.

I don''t believe that this issue can ever happen. I don''t
believe that
somewhere on the path to the flash memory, to the magnetic disc or to
the drive''s cache memory, someone interrupts a 4KB write in the middle
of operation to read from this 4KB area. This is not an issue IMHO.

>> If yes, let''s find a different solution. You scrub,
let''s say, once a week. Scrubbing the superblock takes, let''s
say, 100ms, then it''s finished. This short race doesn''t
justify to add such code to btrfs_commit_transaction and btrfs_commit_super
IMHO. And commiting a transaction is synchronized to scrub already when the
commit root is updated.
>>
>> If this is really an issue and these 4K disk writes and reads
interfere, let''s find a better solution please.
> 
> How about this approach?
> 
> We let scrub_supers in a transaction context.
> 
> btrfs_join_transaction()
> 
> scrub_supers
> 
> btrfs_commit_transaction().
> 
> This is not elegant, but we can remove scrub_lock with supers(Notice, there
is another place that have used
> this lock).

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Bob Marley

2013-Oct-22 16:55 UTC

head link

Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers

On 22/10/2013 10:37, Stefan Behrens wrote:> I don''t believe that this issue can ever happen. I don''t
believe that
> somewhere on the path to the flash memory, to the magnetic disc or to
> the drive''s cache memory, someone interrupts a 4KB write in the
middle
> of operation to read from this 4KB area. This is not an issue IMHO.
I think I have read that unfortunately it can happen.
SAS and SATA specs for disks do not mandate that if a write is in-flight 
but still not completed, reads from the same sector should return the 
value it is being written; they can return the old value.
I also think that Linux does not check either.
Much worse, I think I have even read that two simultaneous in-flight 
writes to the same sector can be completed in any order by the disk, and 
since the write which wins is the latter being completed, this results 
in an indeterminate value persisting on that sector at the end. One 
needs to synchronize cache between the two writes to guarantee the 
outcome. Way worse is when the drives also cheat on synchronize cache, 
and that one is impossible to fix I believe.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Stefan Behrens

2013-Oct-23 17:21 UTC

head link

Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers

On Tue, 22 Oct 2013 18:55:59 +0200, Bob Marley wrote:> On 22/10/2013 10:37, Stefan Behrens wrote:
>> I don''t believe that this issue can ever happen. I
don''t believe that
>> somewhere on the path to the flash memory, to the magnetic disc or to
>> the drive''s cache memory, someone interrupts a 4KB write in
the middle
>> of operation to read from this 4KB area. This is not an issue IMHO.
> 
> I think I have read that unfortunately it can happen.
> SAS and SATA specs for disks do not mandate that if a write is in-flight
> but still not completed, reads from the same sector should return the
> value it is being written; they can return the old value.
> I also think that Linux does not check either.
If the _old_ 4KB block is returned, that''s fine and won''t
cause a
checksum error.

The patch in question addresses the case that Btrfs submits a write
request for a 4KB block, and a concurrent read request for that 4KB
block reads partially the old block and partially the new block,
resulting in a checksum error reported in the scrub statistic counters.

> Much worse, I think I have even read that two simultaneous in-flight
> writes to the same sector can be completed in any order by the disk, and
> since the write which wins is the latter being completed, this results
> in an indeterminate value persisting on that sector at the end. One
> needs to synchronize cache between the two writes to guarantee the
> outcome. Way worse is when the drives also cheat on synchronize cache,
> and that one is impossible to fix I believe.
Two simultaneous in-flight writes to the same superblock cannot happen
in Btrfs.


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Chris Mason

2013-Oct-24 10:08 UTC

head link

Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers

Quoting Stefan Behrens (2013-10-23 13:21:34)> On Tue, 22 Oct 2013 18:55:59 +0200, Bob Marley wrote:
> > On 22/10/2013 10:37, Stefan Behrens wrote:
> >> I don''t believe that this issue can ever happen. I
don''t believe that
> >> somewhere on the path to the flash memory, to the magnetic disc or
to
> >> the drive''s cache memory, someone interrupts a 4KB write
in the middle
> >> of operation to read from this 4KB area. This is not an issue
IMHO.
> > 
> > I think I have read that unfortunately it can happen.
> > SAS and SATA specs for disks do not mandate that if a write is
in-flight
> > but still not completed, reads from the same sector should return the
> > value it is being written; they can return the old value.
> > I also think that Linux does not check either.
> 
> If the _old_ 4KB block is returned, that''s fine and won''t
cause a
> checksum error.
> 
> The patch in question addresses the case that Btrfs submits a write
> request for a 4KB block, and a concurrent read request for that 4KB
> block reads partially the old block and partially the new block,
> resulting in a checksum error reported in the scrub statistic counters.
Concurrent reads and writes to the device are completely undefined, and
Any combination of old, new, random memory corruption wouldn''t
surprise me...I''d rather avoid them ;)

Doing the transaction join during the super read is probably the least
complex choice.

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Miao Xie

2013-Oct-24 10:42 UTC

head link

Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers

On thu, 24 Oct 2013 06:08:42 -0400, Chris Mason wrote:> Quoting Stefan Behrens (2013-10-23 13:21:34)
>> On Tue, 22 Oct 2013 18:55:59 +0200, Bob Marley wrote:
>>> On 22/10/2013 10:37, Stefan Behrens wrote:
>>>> I don''t believe that this issue can ever happen. I
don''t believe that
>>>> somewhere on the path to the flash memory, to the magnetic disc
or to
>>>> the drive''s cache memory, someone interrupts a 4KB
write in the middle
>>>> of operation to read from this 4KB area. This is not an issue
IMHO.
>>>
>>> I think I have read that unfortunately it can happen.
>>> SAS and SATA specs for disks do not mandate that if a write is
in-flight
>>> but still not completed, reads from the same sector should return
the
>>> value it is being written; they can return the old value.
>>> I also think that Linux does not check either.
>>
>> If the _old_ 4KB block is returned, that''s fine and
won''t cause a
>> checksum error.
>>
>> The patch in question addresses the case that Btrfs submits a write
>> request for a 4KB block, and a concurrent read request for that 4KB
>> block reads partially the old block and partially the new block,
>> resulting in a checksum error reported in the scrub statistic counters.
> 
> Concurrent reads and writes to the device are completely undefined, and
> Any combination of old, new, random memory corruption wouldn''t
> surprise me...I''d rather avoid them ;)
> 
> Doing the transaction join during the super read is probably the least
> complex choice.
But it can not block the log tree sync, I think using device_list_mutex is
better since we should acquire this mutex when writing the super blocks and
we are sure that the super blocks are on non-volatile media on completion
after we unlock the mutex.

Thanks
Miao
> 
> -chris
> --
> To unsubscribe from this list: send the line "unsubscribe
linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Wang Shilong

2013-Oct-24 11:32 UTC

head link

Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers

On 10/24/2013 06:08 PM, Chris Mason wrote:> Quoting Stefan Behrens (2013-10-23 13:21:34)
>> On Tue, 22 Oct 2013 18:55:59 +0200, Bob Marley wrote:
>>> On 22/10/2013 10:37, Stefan Behrens wrote:
>>>> I don''t believe that this issue can ever happen. I
don''t believe that
>>>> somewhere on the path to the flash memory, to the magnetic disc
or to
>>>> the drive''s cache memory, someone interrupts a 4KB
write in the middle
>>>> of operation to read from this 4KB area. This is not an issue
IMHO.
>>> I think I have read that unfortunately it can happen.
>>> SAS and SATA specs for disks do not mandate that if a write is
in-flight
>>> but still not completed, reads from the same sector should return
the
>>> value it is being written; they can return the old value.
>>> I also think that Linux does not check either.
>> If the _old_ 4KB block is returned, that''s fine and
won''t cause a
>> checksum error.
>>
>> The patch in question addresses the case that Btrfs submits a write
>> request for a 4KB block, and a concurrent read request for that 4KB
>> block reads partially the old block and partially the new block,
>> resulting in a checksum error reported in the scrub statistic counters.
> Concurrent reads and writes to the device are completely undefined, and
> Any combination of old, new, random memory corruption wouldn''t
> surprise me...I''d rather avoid them ;)
>
> Doing the transaction join during the super read is probably the least
> complex choice.Yeah, by joining transaction we can solve this problem, but it is a 
little confused,
because we don''t involve writting in scrubing supers.

And the only race condition happens in commiting transaction, Miao also 
pointed out that
maybe the best way is to move btrfs_scrub_continue after 
write_ctree_super().

Thanks,
Wang> -chris
> --
> To unsubscribe from this list: send the line "unsubscribe
linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Miao Xie

2013-Oct-25 02:14 UTC

head link

Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers

On 	thu, 24 Oct 2013 19:32:15 +0800, Wang Shilong wrote:> On 10/24/2013 06:08 PM, Chris Mason wrote:
>> Quoting Stefan Behrens (2013-10-23 13:21:34)
>>> On Tue, 22 Oct 2013 18:55:59 +0200, Bob Marley wrote:
>>>> On 22/10/2013 10:37, Stefan Behrens wrote:
>>>>> I don''t believe that this issue can ever happen. I
don''t believe that
>>>>> somewhere on the path to the flash memory, to the magnetic
disc or to
>>>>> the drive''s cache memory, someone interrupts a 4KB
write in the middle
>>>>> of operation to read from this 4KB area. This is not an
issue IMHO.
>>>> I think I have read that unfortunately it can happen.
>>>> SAS and SATA specs for disks do not mandate that if a write is
in-flight
>>>> but still not completed, reads from the same sector should
return the
>>>> value it is being written; they can return the old value.
>>>> I also think that Linux does not check either.
>>> If the _old_ 4KB block is returned, that''s fine and
won''t cause a
>>> checksum error.
>>>
>>> The patch in question addresses the case that Btrfs submits a write
>>> request for a 4KB block, and a concurrent read request for that 4KB
>>> block reads partially the old block and partially the new block,
>>> resulting in a checksum error reported in the scrub statistic
counters.
>> Concurrent reads and writes to the device are completely undefined, and
>> Any combination of old, new, random memory corruption wouldn''t
>> surprise me...I''d rather avoid them ;)
>>
>> Doing the transaction join during the super read is probably the least
>> complex choice.
> Yeah, by joining transaction we can solve this problem, but it is a little
confused,
> because we don''t involve writting in scrubing supers.
> 
> And the only race condition happens in commiting transaction, Miao also
pointed out that
> maybe the best way is to move btrfs_scrub_continue after
write_ctree_super().
Sorry, My miss.

btrfs_scrub_continue() is behind write_ctree_super() all the while, so the above
problem
doesn''t exist.

Thanks
Miao
> 
> Thanks,
> Wang
>> -chris
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe
linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> 
> -- 
> To unsubscribe from this list: send the line "unsubscribe
linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Maybe Matching Threads

Search for more seemingly similar threads

Btrfs devel - Oct 2013 - [PATCH] Btrfs: fix race condition between writting and scrubing supers

[PATCH] Btrfs: fix race condition between writting and scrubing supers

Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers

Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers

Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers

Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers

Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers

Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers

Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers

Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers

Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers

Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers

Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers

Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers

Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers

Maybe Matching Threads