thr3ads.net - Btrfs devel - [PATCH] rwsem: add rwsem_is

If this information is useful, please help other people find it:
Share via:

Josef Bacik

2013-Aug-30 14:14 UTC

[PATCH] rwsem: add rwsem_is_contended

Btrfs uses an rwsem to control access to its extent tree.  Threads will hold a
read lock on this rwsem while they scan the extent tree, and if need_resched()
they will drop the lock and schedule.  The transaction commit needs to take a
write lock for this rwsem for a very short period to switch out the commit
roots.  If there are a lot of threads doing this caching operation we can starve
out the committers which slows everybody out.  To address this we want to add
this functionality to see if our rwsem has anybody waiting to take a write lock
so we can drop it and schedule for a bit to allow the commit to continue.
Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
---
I''ve cc''ed people who seemed like they may be in
charge/familiar with this code,
hopefully I got the right people.

 include/linux/rwsem.h |    1 +
 lib/rwsem.c           |   17 +++++++++++++++++
 2 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index 0616ffe..61a8802 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -35,6 +35,7 @@ extern struct rw_semaphore *rwsem_down_read_failed(struct
rw_semaphore *sem);
 extern struct rw_semaphore *rwsem_down_write_failed(struct rw_semaphore *sem);
 extern struct rw_semaphore *rwsem_wake(struct rw_semaphore *);
 extern struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem);
+extern int rwsem_is_contended(struct rw_semaphore *sem);
 
 /* Include the arch specific part */
 #include <asm/rwsem.h>
diff --git a/lib/rwsem.c b/lib/rwsem.c
index 19c5fa9..20858cd 100644
--- a/lib/rwsem.c
+++ b/lib/rwsem.c
@@ -287,6 +287,23 @@ struct rw_semaphore *rwsem_downgrade_wake(struct
rw_semaphore *sem)
 	return sem;
 }
 
+/*
+ * check to see if the rwsem we''re holding has anybody waiting to
acquire it.
+ */
+int rwsem_is_contended(struct rw_semaphore *sem)
+{
+	int ret = 0;
+	unsigned long flags;
+
+	if (!raw_spin_trylock_irqsave(&sem->wait_lock, flags))
+		return 1;
+	if (!list_empty(&sem->wait_list))
+		ret = 1;
+	raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
+	return ret;
+}
+
+EXPORT_SYMBOL(rwsem_is_contended);
 EXPORT_SYMBOL(rwsem_down_read_failed);
 EXPORT_SYMBOL(rwsem_down_write_failed);
 EXPORT_SYMBOL(rwsem_wake);
-- 
1.7.7.6

Peter Zijlstra

2013-Aug-31 14:51 UTC

head link

Re: [PATCH] rwsem: add rwsem_is_contended

On Fri, Aug 30, 2013 at 10:14:01AM -0400, Josef Bacik
wrote:> Btrfs uses an rwsem to control access to its extent tree.  Threads will
hold a
> read lock on this rwsem while they scan the extent tree, and if
need_resched()
> they will drop the lock and schedule.  The transaction commit needs to take
a
> write lock for this rwsem for a very short period to switch out the commit
> roots.  If there are a lot of threads doing this caching operation we can
starve
> out the committers which slows everybody out.  To address this we want to
add
> this functionality to see if our rwsem has anybody waiting to take a write
lock
> so we can drop it and schedule for a bit to allow the commit to continue.
> +/*
> + * check to see if the rwsem we''re holding has anybody waiting to
acquire it.
> + */
> +int rwsem_is_contended(struct rw_semaphore *sem)
> +{
> +	int ret = 0;
> +	unsigned long flags;
> +
> +	if (!raw_spin_trylock_irqsave(&sem->wait_lock, flags))
> +		return 1;
> +	if (!list_empty(&sem->wait_list))
> +		ret = 1;
> +	raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
> +	return ret;
> +}
> +
> +EXPORT_SYMBOL(rwsem_is_contended);
Modeled after spin_is_contended(), so no problem with that. One thing I
was wondering about is if it wants to be called
rwsem_is_write_contended() or similar, since it explicitly only tests
for pending writers.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Michel Lespinasse

2013-Sep-01 08:32 UTC

head link

Re: [PATCH] rwsem: add rwsem_is_contended

Hi Josef,

On Fri, Aug 30, 2013 at 7:14 AM, Josef Bacik <jbacik@fusionio.com>
wrote:> Btrfs uses an rwsem to control access to its extent tree.  Threads will
hold a
> read lock on this rwsem while they scan the extent tree, and if
need_resched()
> they will drop the lock and schedule.  The transaction commit needs to take
a
> write lock for this rwsem for a very short period to switch out the commit
> roots.  If there are a lot of threads doing this caching operation we can
starve
> out the committers which slows everybody out.  To address this we want to
add
> this functionality to see if our rwsem has anybody waiting to take a write
lock
> so we can drop it and schedule for a bit to allow the commit to continue.
> Thanks,
>
> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
FYI, I once tried to introduce something like this before, but my use
case was pretty weak so it was not accepted at the time. I don''t think
there were any objections to the API itself though, and I think it''s
potentially a good idea if you use case justifies it.

Two comments:

- Note that there are two rwsem implementations - if you are going to
add functionality to rwsem.h you want to add the same functionality in
rwsem-spinlock.h as well.

- I would prefer if you could avoid taking the wait_lock in your
rwsem.h implementation. In your use case (read lock is known to be
held), checking for sem->count < 0 would be sufficient to indicate a
writer is queued (or getting onto the queue). In the general case,
some architectures have the various values set up so that
RWSEM_WAITING_BIAS != RWSEM_ACTIVE_WRITE_BIAS - for these
architectures at least, you can check for waiters by looking if the
lowest bit of RWSEM_WAITING_BIAS is set in sem->count.

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

Peter Hurley

2013-Sep-02 17:18 UTC

head link

Re: [PATCH] rwsem: add rwsem_is_contended

On 09/01/2013 04:32 AM, Michel Lespinasse wrote:> Hi Josef,
>
> On Fri, Aug 30, 2013 at 7:14 AM, Josef Bacik <jbacik@fusionio.com>
wrote:
>> Btrfs uses an rwsem to control access to its extent tree.  Threads will
hold a
>> read lock on this rwsem while they scan the extent tree, and if
need_resched()
>> they will drop the lock and schedule.  The transaction commit needs to
take a
>> write lock for this rwsem for a very short period to switch out the
commit
>> roots.  If there are a lot of threads doing this caching operation we
can starve
>> out the committers which slows everybody out.  To address this we want
to add
>> this functionality to see if our rwsem has anybody waiting to take a
write lock
>> so we can drop it and schedule for a bit to allow the commit to
continue.
>> Thanks,
>>
>> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
>
> FYI, I once tried to introduce something like this before, but my use
> case was pretty weak so it was not accepted at the time. I don''t
think
> there were any objections to the API itself though, and I think
it''s
> potentially a good idea if you use case justifies it.
Exactly, I''m concerned about the use case: readers can''t
starve writers.
Of course, lots of existing readers can temporarily prevent a writer from
acquiring, but those readers would already have the lock. Any new readers
wouldn''t be able to prevent a waiting writer from obtaining the lock.

Josef,
Could you be more explicit, maybe with some detailed numbers about the
condition you report?

I say that because a subtle bug that could mistakenly wait a reader
existed in the rwsem implementation until relatively recently. Is there
some other lurking problem?
> Two comments:
>
> - Note that there are two rwsem implementations - if you are going to
> add functionality to rwsem.h you want to add the same functionality in
> rwsem-spinlock.h as well.
>
> - I would prefer if you could avoid taking the wait_lock in your
> rwsem.h implementation. In your use case (read lock is known to be
> held), checking for sem->count < 0 would be sufficient to indicate a
> writer is queued (or getting onto the queue). In the general case,
> some architectures have the various values set up so that
> RWSEM_WAITING_BIAS != RWSEM_ACTIVE_WRITE_BIAS - for these
> architectures at least, you can check for waiters by looking if the
> lowest bit of RWSEM_WAITING_BIAS is set in sem->count.
Michel,

I''m glad you point out a much better approach --- but why are we
considering open-coding down_read_trylock()/down_write_trylock?

Regards,
Peter Hurley

Josef Bacik

2013-Sep-03 13:18 UTC

head link

Re: [PATCH] rwsem: add rwsem_is_contended

On Mon, Sep 02, 2013 at 01:18:08PM -0400, Peter Hurley
wrote:> On 09/01/2013 04:32 AM, Michel Lespinasse wrote:
> >Hi Josef,
> >
> >On Fri, Aug 30, 2013 at 7:14 AM, Josef Bacik
<jbacik@fusionio.com> wrote:
> >>Btrfs uses an rwsem to control access to its extent tree.  Threads
will hold a
> >>read lock on this rwsem while they scan the extent tree, and if
need_resched()
> >>they will drop the lock and schedule.  The transaction commit needs
to take a
> >>write lock for this rwsem for a very short period to switch out the
commit
> >>roots.  If there are a lot of threads doing this caching operation
we can starve
> >>out the committers which slows everybody out.  To address this we
want to add
> >>this functionality to see if our rwsem has anybody waiting to take
a write lock
> >>so we can drop it and schedule for a bit to allow the commit to
continue.
> >>Thanks,
> >>
> >>Signed-off-by: Josef Bacik <jbacik@fusionio.com>
> >
> >FYI, I once tried to introduce something like this before, but my use
> >case was pretty weak so it was not accepted at the time. I
don''t think
> >there were any objections to the API itself though, and I think
it''s
> >potentially a good idea if you use case justifies it.
> 
> Exactly, I''m concerned about the use case: readers can''t
starve writers.
> Of course, lots of existing readers can temporarily prevent a writer from
> acquiring, but those readers would already have the lock. Any new readers
> wouldn''t be able to prevent a waiting writer from obtaining the
lock.
> 
> Josef,
> Could you be more explicit, maybe with some detailed numbers about the
> condition you report?
> 
Sure, this came from a community member

http://article.gmane.org/gmane.comp.file-systems.btrfs/28081

With the old approach we could block between 1-2 seconds waiting for this rwsem,
and with the new approach where we allow many more of these caching threads we
were staving out the writer for 80 seconds.

So what happens is these threads will scan our extent tree to put together the
free space cache, and they''ll hold this lock while they are doing the
scanning.
The only way they will drop this lock is if we hit need_resched(), but because
these threads are going to do quite a bit of IO I imagine we''re not
ever being
flagged with need_resched() because we schedule while waiting for IO.  So these
threads will hold onto this lock for bloody ever without giving it up so the
committer can take the write lock.  His patch to "fix" the problem was
to have
an atomic that let us know somebody was waiting for a write lock and then
we''d
drop the reader lock and schedule.

So really we''re just using a rwsem in a really mean way for writers. 
I''m open
to other suggestions but I think this probably the cleanest way.  Thanks,

Josef

Josef Bacik

2013-Sep-03 15:47 UTC

head link

Re: [PATCH] rwsem: add rwsem_is_contended

On Sun, Sep 01, 2013 at 01:32:36AM -0700, Michel Lespinasse
wrote:> Hi Josef,
> 
> On Fri, Aug 30, 2013 at 7:14 AM, Josef Bacik <jbacik@fusionio.com>
wrote:
> > Btrfs uses an rwsem to control access to its extent tree.  Threads
will hold a
> > read lock on this rwsem while they scan the extent tree, and if
need_resched()
> > they will drop the lock and schedule.  The transaction commit needs to
take a
> > write lock for this rwsem for a very short period to switch out the
commit
> > roots.  If there are a lot of threads doing this caching operation we
can starve
> > out the committers which slows everybody out.  To address this we want
to add
> > this functionality to see if our rwsem has anybody waiting to take a
write lock
> > so we can drop it and schedule for a bit to allow the commit to
continue.
> > Thanks,
> >
> > Signed-off-by: Josef Bacik <jbacik@fusionio.com>
> 
> FYI, I once tried to introduce something like this before, but my use
> case was pretty weak so it was not accepted at the time. I don''t
think
> there were any objections to the API itself though, and I think
it''s
> potentially a good idea if you use case justifies it.
> 
> Two comments:
> 
> - Note that there are two rwsem implementations - if you are going to
> add functionality to rwsem.h you want to add the same functionality in
> rwsem-spinlock.h as well.
> 
Sure thing.
> - I would prefer if you could avoid taking the wait_lock in your
> rwsem.h implementation. In your use case (read lock is known to be
> held), checking for sem->count < 0 would be sufficient to indicate a
> writer is queued (or getting onto the queue). In the general case,
> some architectures have the various values set up so that
> RWSEM_WAITING_BIAS != RWSEM_ACTIVE_WRITE_BIAS - for these
> architectures at least, you can check for waiters by looking if the
> lowest bit of RWSEM_WAITING_BIAS is set in sem->count.
Question about this one, I can''t just do

if (sem->count < 0)

since each arch has their own atomic way of looking at count, so I''d
have to add
something to do just a normal read of count for each arch and call that
wouldn''t
I?  If that''s what you want me to do then I''m fine with that
(though I''ll need a
really thorough review), just want to double check before I make a bunch of
extra work for myself.  Thanks,

Josef

Josef Bacik

2013-Sep-03 15:49 UTC

head link

Re: [PATCH] rwsem: add rwsem_is_contended

On Sat, Aug 31, 2013 at 04:51:36PM +0200, Peter Zijlstra
wrote:> On Fri, Aug 30, 2013 at 10:14:01AM -0400, Josef Bacik wrote:
> > Btrfs uses an rwsem to control access to its extent tree.  Threads
will hold a
> > read lock on this rwsem while they scan the extent tree, and if
need_resched()
> > they will drop the lock and schedule.  The transaction commit needs to
take a
> > write lock for this rwsem for a very short period to switch out the
commit
> > roots.  If there are a lot of threads doing this caching operation we
can starve
> > out the committers which slows everybody out.  To address this we want
to add
> > this functionality to see if our rwsem has anybody waiting to take a
write lock
> > so we can drop it and schedule for a bit to allow the commit to
continue.
> 
> 
> > +/*
> > + * check to see if the rwsem we''re holding has anybody
waiting to acquire it.
> > + */
> > +int rwsem_is_contended(struct rw_semaphore *sem)
> > +{
> > +	int ret = 0;
> > +	unsigned long flags;
> > +
> > +	if (!raw_spin_trylock_irqsave(&sem->wait_lock, flags))
> > +		return 1;
> > +	if (!list_empty(&sem->wait_list))
> > +		ret = 1;
> > +	raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
> > +	return ret;
> > +}
> > +
> > +EXPORT_SYMBOL(rwsem_is_contended);
> 
> Modeled after spin_is_contended(), so no problem with that. One thing I
> was wondering about is if it wants to be called
> rwsem_is_write_contended() or similar, since it explicitly only tests
> for pending writers.
> 
Well it checks all pending waiters, the waiters list isn''t split
between
readers waiters and write waiters, so people holding the write lock could call
this to see if readers are waiting on the lock.  Thanks,

Josef

Peter Hurley

2013-Sep-04 11:46 UTC

head link

Re: [PATCH] rwsem: add rwsem_is_contended

On 09/03/2013 09:18 AM, Josef Bacik wrote:> On Mon, Sep 02, 2013 at 01:18:08PM -0400, Peter Hurley wrote:
>> On 09/01/2013 04:32 AM, Michel Lespinasse wrote:
>>> Hi Josef,
>>>
>>> On Fri, Aug 30, 2013 at 7:14 AM, Josef Bacik
<jbacik@fusionio.com> wrote:
>>>> Btrfs uses an rwsem to control access to its extent tree. 
Threads will hold a
>>>> read lock on this rwsem while they scan the extent tree, and if
need_resched()
>>>> they will drop the lock and schedule.  The transaction commit
needs to take a
>>>> write lock for this rwsem for a very short period to switch out
the commit
>>>> roots.  If there are a lot of threads doing this caching
operation we can starve
>>>> out the committers which slows everybody out.  To address this
we want to add
>>>> this functionality to see if our rwsem has anybody waiting to
take a write lock
>>>> so we can drop it and schedule for a bit to allow the commit to
continue.
>>>> Thanks,
>>>>
>>>> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
>>>
>>> FYI, I once tried to introduce something like this before, but my
use
>>> case was pretty weak so it was not accepted at the time. I
don''t think
>>> there were any objections to the API itself though, and I think
it''s
>>> potentially a good idea if you use case justifies it.
>>
>> Exactly, I''m concerned about the use case: readers
can''t starve writers.
>> Of course, lots of existing readers can temporarily prevent a writer
from
>> acquiring, but those readers would already have the lock. Any new
readers
>> wouldn''t be able to prevent a waiting writer from obtaining
the lock.
>>
>> Josef,
>> Could you be more explicit, maybe with some detailed numbers about the
>> condition you report?
>>
>
> Sure, this came from a community member
>
> http://article.gmane.org/gmane.comp.file-systems.btrfs/28081
>
> With the old approach we could block between 1-2 seconds waiting for this
rwsem,
> and with the new approach where we allow many more of these caching threads
we
> were staving out the writer for 80 seconds.
>
> So what happens is these threads will scan our extent tree to put together
the
> free space cache, and they''ll hold this lock while they are doing
the scanning.
> The only way they will drop this lock is if we hit need_resched(), but
because
> these threads are going to do quite a bit of IO I imagine we''re
not ever being
> flagged with need_resched() because we schedule while waiting for IO.  So
these
> threads will hold onto this lock for bloody ever without giving it up so
the
> committer can take the write lock.  His patch to "fix" the
problem was to have
> an atomic that let us know somebody was waiting for a write lock and then
we''d
> drop the reader lock and schedule.
Thanks for the additional clarification.
> So really we''re just using a rwsem in a really mean way for
writers.  I''m open
> to other suggestions but I think this probably the cleanest way.
Is there substantial saved state at the point where the caching thread is
checking need_resched() that precludes dropping and reacquiring the
extent_commit_sem (or before find_next_key())?  Not that it''s a cleaner
solution; just want to understand better the situation.

Regards,
Peter Hurley

Peter Hurley

2013-Sep-04 12:11 UTC

head link

Re: [PATCH] rwsem: add rwsem_is_contended

On 09/03/2013 11:47 AM, Josef Bacik wrote:> On Sun, Sep 01, 2013 at 01:32:36AM -0700, Michel Lespinasse wrote:
>> Hi Josef,
>>
>> On Fri, Aug 30, 2013 at 7:14 AM, Josef Bacik
<jbacik@fusionio.com> wrote:
>>> Btrfs uses an rwsem to control access to its extent tree.  Threads
will hold a
>>> read lock on this rwsem while they scan the extent tree, and if
need_resched()
>>> they will drop the lock and schedule.  The transaction commit needs
to take a
>>> write lock for this rwsem for a very short period to switch out the
commit
>>> roots.  If there are a lot of threads doing this caching operation
we can starve
>>> out the committers which slows everybody out.  To address this we
want to add
>>> this functionality to see if our rwsem has anybody waiting to take
a write lock
>>> so we can drop it and schedule for a bit to allow the commit to
continue.
>>> Thanks,
>>>
>>> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
>>
>> FYI, I once tried to introduce something like this before, but my use
>> case was pretty weak so it was not accepted at the time. I
don''t think
>> there were any objections to the API itself though, and I think
it''s
>> potentially a good idea if you use case justifies it.
>>
>> Two comments:
>>
>> - Note that there are two rwsem implementations - if you are going to
>> add functionality to rwsem.h you want to add the same functionality in
>> rwsem-spinlock.h as well.
>>
>
> Sure thing.
>
>> - I would prefer if you could avoid taking the wait_lock in your
>> rwsem.h implementation. In your use case (read lock is known to be
>> held), checking for sem->count < 0 would be sufficient to
indicate a
>> writer is queued (or getting onto the queue). In the general case,
>> some architectures have the various values set up so that
>> RWSEM_WAITING_BIAS != RWSEM_ACTIVE_WRITE_BIAS - for these
>> architectures at least, you can check for waiters by looking if the
>> lowest bit of RWSEM_WAITING_BIAS is set in sem->count.
>
> Question about this one, I can''t just do
>
> if (sem->count < 0)
>
> since each arch has their own atomic way of looking at count, so
I''d have to add
> something to do just a normal read of count for each arch and call that
wouldn''t
> I?
Reading sem->count is atomic.

For that matter, in your particular use case (which is more heuristic), you
could perform the list_empty() check without acquiring the wait_lock.

Regards,
Peter Hurley



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Josef Bacik

2013-Sep-04 12:13 UTC

head link

Re: [PATCH] rwsem: add rwsem_is_contended

On Wed, Sep 04, 2013 at 07:46:56AM -0400, Peter Hurley
wrote:> On 09/03/2013 09:18 AM, Josef Bacik wrote:
> >On Mon, Sep 02, 2013 at 01:18:08PM -0400, Peter Hurley wrote:
> >>On 09/01/2013 04:32 AM, Michel Lespinasse wrote:
> >>>Hi Josef,
> >>>
> >>>On Fri, Aug 30, 2013 at 7:14 AM, Josef Bacik
<jbacik@fusionio.com> wrote:
> >>>>Btrfs uses an rwsem to control access to its extent tree. 
Threads will hold a
> >>>>read lock on this rwsem while they scan the extent tree,
and if need_resched()
> >>>>they will drop the lock and schedule.  The transaction
commit needs to take a
> >>>>write lock for this rwsem for a very short period to switch
out the commit
> >>>>roots.  If there are a lot of threads doing this caching
operation we can starve
> >>>>out the committers which slows everybody out.  To address
this we want to add
> >>>>this functionality to see if our rwsem has anybody waiting
to take a write lock
> >>>>so we can drop it and schedule for a bit to allow the
commit to continue.
> >>>>Thanks,
> >>>>
> >>>>Signed-off-by: Josef Bacik <jbacik@fusionio.com>
> >>>
> >>>FYI, I once tried to introduce something like this before, but
my use
> >>>case was pretty weak so it was not accepted at the time. I
don''t think
> >>>there were any objections to the API itself though, and I think
it''s
> >>>potentially a good idea if you use case justifies it.
> >>
> >>Exactly, I''m concerned about the use case: readers
can''t starve writers.
> >>Of course, lots of existing readers can temporarily prevent a
writer from
> >>acquiring, but those readers would already have the lock. Any new
readers
> >>wouldn''t be able to prevent a waiting writer from
obtaining the lock.
> >>
> >>Josef,
> >>Could you be more explicit, maybe with some detailed numbers about
the
> >>condition you report?
> >>
> >
> >Sure, this came from a community member
> >
> >http://article.gmane.org/gmane.comp.file-systems.btrfs/28081
> >
> >With the old approach we could block between 1-2 seconds waiting for
this rwsem,
> >and with the new approach where we allow many more of these caching
threads we
> >were staving out the writer for 80 seconds.
> >
> >So what happens is these threads will scan our extent tree to put
together the
> >free space cache, and they''ll hold this lock while they are
doing the scanning.
> >The only way they will drop this lock is if we hit need_resched(), but
because
> >these threads are going to do quite a bit of IO I imagine
we''re not ever being
> >flagged with need_resched() because we schedule while waiting for IO. 
So these
> >threads will hold onto this lock for bloody ever without giving it up
so the
> >committer can take the write lock.  His patch to "fix" the
problem was to have
> >an atomic that let us know somebody was waiting for a write lock and
then we''d
> >drop the reader lock and schedule.
> 
> Thanks for the additional clarification.
> 
> >So really we''re just using a rwsem in a really mean way for
writers.  I''m open
> >to other suggestions but I think this probably the cleanest way.
> 
> Is there substantial saved state at the point where the caching thread is
> checking need_resched() that precludes dropping and reacquiring the
> extent_commit_sem (or before find_next_key())?  Not that it''s a
cleaner
> solution; just want to understand better the situation.
>
Yes I had thought of just dropping our locks everytime we had to do
find_next_key() but that isn''t going to work.  We do have to save state
but
that''s not the hard part, it''s the fact that we could race
with the committing
transaction and lose space.  So what would happen is something like this

caching_thread:
	save last cached offset
	drop locks
	find_next_key
		get a ref on the current commit root
		search down to the next leaf
	re-take locks
	process leaf

transaction committer:
	acquire locks
	swap commit root
	write transaction
	unpin all extents up to the last saved cached offset

So if the caching thread grabs a ref on the commit root before the transaction
committer swaps out the commit root we are dealing with too old of a tree.  So
say the leaf we''re going to process next has data that was
free''ed (and
therefore would have been unpinned) during that transaction, because it is after
our last cached offset we don''t unpin it because we know that the
caching thread
is going to find the leaf where that extent is not there and add free space for
that extent.  However we got the leaf from two transactions ago which will still
show that extent in use, so we won''t add free space for it, which will
cause us
to leak the extent.  We need to make sure we are always consistently on the
previous extent root which is why we hold this lock.

I hope that makes sense.  Thanks,

Josef

Andrew Morton

2013-Sep-16 23:05 UTC

head link

Re: [PATCH] rwsem: add rwsem_is_contended

On Fri, 30 Aug 2013 10:14:01 -0400 Josef Bacik <jbacik@fusionio.com>
wrote:
> Btrfs uses an rwsem to control access to its extent tree.  Threads will
hold a
> read lock on this rwsem while they scan the extent tree, and if
need_resched()
> they will drop the lock and schedule.  The transaction commit needs to take
a
> write lock for this rwsem for a very short period to switch out the commit
> roots.  If there are a lot of threads doing this caching operation we can
starve
> out the committers which slows everybody out.  To address this we want to
add
> this functionality to see if our rwsem has anybody waiting to take a write
lock
> so we can drop it and schedule for a bit to allow the commit to continue.
> Thanks,
> 
This sounds rather nasty and hacky.  Rather then working around a
locking shortcoming in a caller it would be better to fix/enhance the
core locking code.  What would such a change need to do?

Presently rwsem waiters are fifo-queued, are they not?  So the commit
thread will eventually get that lock.  Apparently that''s not working
adequately for you but I don''t fully understand what it is about these
dynamics which is causing observable problems.
> I''ve cc''ed people who seemed like they may be in
charge/familiar with this code,
> hopefully I got the right people.
> 
>  include/linux/rwsem.h |    1 +
>  lib/rwsem.c           |   17 +++++++++++++++++
This will break CONFIG_RWSEM_GENERIC_SPINLOCK=n?

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Josef Bacik

2013-Sep-17 00:05 UTC

head link

Re: [PATCH] rwsem: add rwsem_is_contended

On Mon, Sep 16, 2013 at 04:05:47PM -0700, Andrew Morton
wrote:> On Fri, 30 Aug 2013 10:14:01 -0400 Josef Bacik <jbacik@fusionio.com>
wrote:
> 
> > Btrfs uses an rwsem to control access to its extent tree.  Threads
will hold a
> > read lock on this rwsem while they scan the extent tree, and if
need_resched()
> > they will drop the lock and schedule.  The transaction commit needs to
take a
> > write lock for this rwsem for a very short period to switch out the
commit
> > roots.  If there are a lot of threads doing this caching operation we
can starve
> > out the committers which slows everybody out.  To address this we want
to add
> > this functionality to see if our rwsem has anybody waiting to take a
write lock
> > so we can drop it and schedule for a bit to allow the commit to
continue.
> > Thanks,
> > 
> 
> This sounds rather nasty and hacky.  Rather then working around a
> locking shortcoming in a caller it would be better to fix/enhance the
> core locking code.  What would such a change need to do?
> 
> Presently rwsem waiters are fifo-queued, are they not?  So the commit
> thread will eventually get that lock.  Apparently that''s not
working
> adequately for you but I don''t fully understand what it is about
these
> dynamics which is causing observable problems.
> 
So the problem is not that its normal lock starvation, it''s more our
particular
use case that is causing the starvation.  We can have lots of people holding
readers and simply never give them up for long periods of time, which is why we
need this is_contended helper so we know to drop things and let the committer
through.  Thanks,

Josef

David Daney

2013-Sep-17 00:29 UTC

head link

Re: [PATCH] rwsem: add rwsem_is_contended

On 09/16/2013 05:05 PM, Josef Bacik wrote:> On Mon, Sep 16, 2013 at 04:05:47PM -0700, Andrew Morton wrote:
>> On Fri, 30 Aug 2013 10:14:01 -0400 Josef Bacik
<jbacik@fusionio.com> wrote:
>>
>>> Btrfs uses an rwsem to control access to its extent tree.  Threads
will hold a
>>> read lock on this rwsem while they scan the extent tree, and if
need_resched()
>>> they will drop the lock and schedule.  The transaction commit needs
to take a
>>> write lock for this rwsem for a very short period to switch out the
commit
>>> roots.  If there are a lot of threads doing this caching operation
we can starve
>>> out the committers which slows everybody out.  To address this we
want to add
>>> this functionality to see if our rwsem has anybody waiting to take
a write lock
>>> so we can drop it and schedule for a bit to allow the commit to
continue.
>>> Thanks,
>>>
>>
>> This sounds rather nasty and hacky.  Rather then working around a
>> locking shortcoming in a caller it would be better to fix/enhance the
>> core locking code.  What would such a change need to do?
>>
>> Presently rwsem waiters are fifo-queued, are they not?  So the commit
>> thread will eventually get that lock.  Apparently that''s not
working
>> adequately for you but I don''t fully understand what it is
about these
>> dynamics which is causing observable problems.
>>
>
> So the problem is not that its normal lock starvation, it''s more
our particular
> use case that is causing the starvation.  We can have lots of people
holding
> readers and simply never give them up for long periods of time, which is
why we
> need this is_contended helper so we know to drop things and let the
committer
> through.  Thanks,
You could easily achieve the same thing by putting an "is_contending" 
flag in parallel with the rwsem and testing that:

DECLARE_RWSEM(foo);
atomic_t is_contended =  ATOMIC_INIT(0);
.
.
.

    /* writing context */
    atomic_inc(&is_contended);
    down_write(&foo);
    do_writing_action();
    up_write(&foo);
    atomic_dec(&is_contended);


    /* reading context */
    down_read(&foo);
    while (!atomic_read(&is_contended))
       do_reading_actions();
    up_read(&foo);


David Daney

Peter Hurley

2013-Sep-17 00:37 UTC

head link

Re: [PATCH] rwsem: add rwsem_is_contended

On 09/16/2013 08:29 PM, David Daney wrote:> On 09/16/2013 05:05 PM, Josef Bacik wrote:
>> On Mon, Sep 16, 2013 at 04:05:47PM -0700, Andrew Morton wrote:
>>> On Fri, 30 Aug 2013 10:14:01 -0400 Josef Bacik
<jbacik@fusionio.com> wrote:
>>>
>>>> Btrfs uses an rwsem to control access to its extent tree. 
Threads will hold a
>>>> read lock on this rwsem while they scan the extent tree, and if
need_resched()
>>>> they will drop the lock and schedule.  The transaction commit
needs to take a
>>>> write lock for this rwsem for a very short period to switch out
the commit
>>>> roots.  If there are a lot of threads doing this caching
operation we can starve
>>>> out the committers which slows everybody out.  To address this
we want to add
>>>> this functionality to see if our rwsem has anybody waiting to
take a write lock
>>>> so we can drop it and schedule for a bit to allow the commit to
continue.
>>>> Thanks,
>>>>
>>>
>>> This sounds rather nasty and hacky.  Rather then working around a
>>> locking shortcoming in a caller it would be better to fix/enhance
the
>>> core locking code.  What would such a change need to do?
>>>
>>> Presently rwsem waiters are fifo-queued, are they not?  So the
commit
>>> thread will eventually get that lock.  Apparently that''s
not working
>>> adequately for you but I don''t fully understand what it is
about these
>>> dynamics which is causing observable problems.
>>>
>>
>> So the problem is not that its normal lock starvation, it''s
more our particular
>> use case that is causing the starvation.  We can have lots of people
holding
>> readers and simply never give them up for long periods of time, which
is why we
>> need this is_contended helper so we know to drop things and let the
committer
>> through.  Thanks,
>
> You could easily achieve the same thing by putting an
"is_contending" flag in parallel with the rwsem and testing that:
Which adds a bunch more bus-locked operations to contended over, when
a unlocked if (list_empty()) is sufficient.

Regards,
Peter Hurley

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

David Daney

2013-Sep-17 01:08 UTC

head link

Re: [PATCH] rwsem: add rwsem_is_contended

On 09/16/2013 05:37 PM, Peter Hurley wrote:> On 09/16/2013 08:29 PM, David Daney wrote:
>> On 09/16/2013 05:05 PM, Josef Bacik wrote:
>>> On Mon, Sep 16, 2013 at 04:05:47PM -0700, Andrew Morton wrote:
>>>> On Fri, 30 Aug 2013 10:14:01 -0400 Josef Bacik
<jbacik@fusionio.com>
>>>> wrote:
>>>>
>>>>> Btrfs uses an rwsem to control access to its extent tree. 
Threads
>>>>> will hold a
>>>>> read lock on this rwsem while they scan the extent tree,
and if
>>>>> need_resched()
>>>>> they will drop the lock and schedule.  The transaction
commit needs
>>>>> to take a
>>>>> write lock for this rwsem for a very short period to switch
out the
>>>>> commit
>>>>> roots.  If there are a lot of threads doing this caching
operation
>>>>> we can starve
>>>>> out the committers which slows everybody out.  To address
this we
>>>>> want to add
>>>>> this functionality to see if our rwsem has anybody waiting
to take
>>>>> a write lock
>>>>> so we can drop it and schedule for a bit to allow the
commit to
>>>>> continue.
>>>>> Thanks,
>>>>>
>>>>
>>>> This sounds rather nasty and hacky.  Rather then working around
a
>>>> locking shortcoming in a caller it would be better to
fix/enhance the
>>>> core locking code.  What would such a change need to do?
>>>>
>>>> Presently rwsem waiters are fifo-queued, are they not?  So the
commit
>>>> thread will eventually get that lock.  Apparently
that''s not working
>>>> adequately for you but I don''t fully understand what
it is about these
>>>> dynamics which is causing observable problems.
>>>>
>>>
>>> So the problem is not that its normal lock starvation,
it''s more our
>>> particular
>>> use case that is causing the starvation.  We can have lots of
people
>>> holding
>>> readers and simply never give them up for long periods of time,
which
>>> is why we
>>> need this is_contended helper so we know to drop things and let the
>>> committer
>>> through.  Thanks,
>>
>> You could easily achieve the same thing by putting an
"is_contending"
>> flag in parallel with the rwsem and testing that:
>
> Which adds a bunch more bus-locked operations to contended over
Would that be a problem in this particular case?  Has it been measured?
> , when
> a unlocked if (list_empty()) is sufficient.
I don''t object to adding rwsem_is_contended() *if* it is required.  I 
was just pointing out that there may be other options.

The patch adds a bunch of new semantics to rwsem.  There is a trade off 
between increased complexity of core code, and generalizing subsystem 
specific optimizations that may not be globally useful.

Is it worth it in this case?  I do not know.

<insert quote relating to occam''s razor>

David Daney

Josef Bacik

2013-Sep-17 01:11 UTC

head link

Re: [PATCH] rwsem: add rwsem_is_contended

On Mon, Sep 16, 2013 at 06:08:42PM -0700, David Daney
wrote:> On 09/16/2013 05:37 PM, Peter Hurley wrote:
> >On 09/16/2013 08:29 PM, David Daney wrote:
> >>On 09/16/2013 05:05 PM, Josef Bacik wrote:
> >>>On Mon, Sep 16, 2013 at 04:05:47PM -0700, Andrew Morton wrote:
> >>>>On Fri, 30 Aug 2013 10:14:01 -0400 Josef Bacik
<jbacik@fusionio.com>
> >>>>wrote:
> >>>>
> >>>>>Btrfs uses an rwsem to control access to its extent
tree.  Threads
> >>>>>will hold a
> >>>>>read lock on this rwsem while they scan the extent
tree, and if
> >>>>>need_resched()
> >>>>>they will drop the lock and schedule.  The transaction
commit needs
> >>>>>to take a
> >>>>>write lock for this rwsem for a very short period to
switch out the
> >>>>>commit
> >>>>>roots.  If there are a lot of threads doing this
caching operation
> >>>>>we can starve
> >>>>>out the committers which slows everybody out.  To
address this we
> >>>>>want to add
> >>>>>this functionality to see if our rwsem has anybody
waiting to take
> >>>>>a write lock
> >>>>>so we can drop it and schedule for a bit to allow the
commit to
> >>>>>continue.
> >>>>>Thanks,
> >>>>>
> >>>>
> >>>>This sounds rather nasty and hacky.  Rather then working
around a
> >>>>locking shortcoming in a caller it would be better to
fix/enhance the
> >>>>core locking code.  What would such a change need to do?
> >>>>
> >>>>Presently rwsem waiters are fifo-queued, are they not?  So
the commit
> >>>>thread will eventually get that lock.  Apparently
that''s not working
> >>>>adequately for you but I don''t fully understand
what it is about these
> >>>>dynamics which is causing observable problems.
> >>>>
> >>>
> >>>So the problem is not that its normal lock starvation,
it''s more our
> >>>particular
> >>>use case that is causing the starvation.  We can have lots of
people
> >>>holding
> >>>readers and simply never give them up for long periods of time,
which
> >>>is why we
> >>>need this is_contended helper so we know to drop things and let
the
> >>>committer
> >>>through.  Thanks,
> >>
> >>You could easily achieve the same thing by putting an
"is_contending"
> >>flag in parallel with the rwsem and testing that:
> >
> >Which adds a bunch more bus-locked operations to contended over
> 
> Would that be a problem in this particular case?  Has it been measured?
> 
> >, when
> >a unlocked if (list_empty()) is sufficient.
> 
> I don''t object to adding rwsem_is_contended() *if* it is required.
I was
> just pointing out that there may be other options.
> 
> The patch adds a bunch of new semantics to rwsem.  There is a trade off
> between increased complexity of core code, and generalizing subsystem
> specific optimizations that may not be globally useful.
> 
> Is it worth it in this case?  I do not know.
> 
So what you suggested is actually what we did in order to prove that this was
what the problem was.  I''m ok with continuing to do that, I just
figured adding
something like rwsem_is_contended() would be nice in case anybody else runs into
the issue in the future, plus it would save me an atomic_t in an already large
structure.  Thanks,

Josef

Peter Hurley

2013-Sep-17 01:22 UTC

head link

Re: [PATCH] rwsem: add rwsem_is_contended

On 09/16/2013 09:11 PM, Josef Bacik wrote:> On Mon, Sep 16, 2013 at 06:08:42PM -0700, David Daney wrote:
>> On 09/16/2013 05:37 PM, Peter Hurley wrote:
>>> On 09/16/2013 08:29 PM, David Daney wrote:
>>>> On 09/16/2013 05:05 PM, Josef Bacik wrote:
>>>>> On Mon, Sep 16, 2013 at 04:05:47PM -0700, Andrew Morton
wrote:
>>>>>> On Fri, 30 Aug 2013 10:14:01 -0400 Josef Bacik
<jbacik@fusionio.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Btrfs uses an rwsem to control access to its extent
tree.  Threads
>>>>>>> will hold a
>>>>>>> read lock on this rwsem while they scan the extent
tree, and if
>>>>>>> need_resched()
>>>>>>> they will drop the lock and schedule.  The
transaction commit needs
>>>>>>> to take a
>>>>>>> write lock for this rwsem for a very short period
to switch out the
>>>>>>> commit
>>>>>>> roots.  If there are a lot of threads doing this
caching operation
>>>>>>> we can starve
>>>>>>> out the committers which slows everybody out.  To
address this we
>>>>>>> want to add
>>>>>>> this functionality to see if our rwsem has anybody
waiting to take
>>>>>>> a write lock
>>>>>>> so we can drop it and schedule for a bit to allow
the commit to
>>>>>>> continue.
>>>>>>> Thanks,
>>>>>>>
>>>>>>
>>>>>> This sounds rather nasty and hacky.  Rather then
working around a
>>>>>> locking shortcoming in a caller it would be better to
fix/enhance the
>>>>>> core locking code.  What would such a change need to
do?
>>>>>>
>>>>>> Presently rwsem waiters are fifo-queued, are they not? 
So the commit
>>>>>> thread will eventually get that lock.  Apparently
that''s not working
>>>>>> adequately for you but I don''t fully
understand what it is about these
>>>>>> dynamics which is causing observable problems.
>>>>>>
>>>>>
>>>>> So the problem is not that its normal lock starvation,
it''s more our
>>>>> particular
>>>>> use case that is causing the starvation.  We can have lots
of people
>>>>> holding
>>>>> readers and simply never give them up for long periods of
time, which
>>>>> is why we
>>>>> need this is_contended helper so we know to drop things and
let the
>>>>> committer
>>>>> through.  Thanks,
>>>>
>>>> You could easily achieve the same thing by putting an
"is_contending"
>>>> flag in parallel with the rwsem and testing that:
>>>
>>> Which adds a bunch more bus-locked operations to contended over
>>
>> Would that be a problem in this particular case?  Has it been measured?
>>
>>> , when
>>> a unlocked if (list_empty()) is sufficient.
>>
>> I don''t object to adding rwsem_is_contended() *if* it is
required.  I was
>> just pointing out that there may be other options.
>>
>> The patch adds a bunch of new semantics to rwsem.  There is a trade off
>> between increased complexity of core code, and generalizing subsystem
>> specific optimizations that may not be globally useful.
>>
>> Is it worth it in this case?  I do not know.
>>
>
> So what you suggested is actually what we did in order to prove that this
was
> what the problem was.  I''m ok with continuing to do that, I just
figured adding
> something like rwsem_is_contended() would be nice in case anybody else runs
into
> the issue in the future, plus it would save me an atomic_t in an already
large
> structure.
I saw the original patch you linked to earlier in the discussion, and
I agree that for your use case adding a contention test is cleaner and clearer
than other options.

That said, I think this extension is only useful for readers: writers should be
getting their business done and releasing the sem.

Also, I think the comment above the function should be clearer that the lock
must already be held by the caller; IOW, this is not a trylock replacement.

Regards,
Peter Hurley

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Ingo Molnar

2013-Sep-17 06:53 UTC

head link

Re: [PATCH] rwsem: add rwsem_is_contended

* Andrew Morton <akpm@linux-foundation.org> wrote:
> On Fri, 30 Aug 2013 10:14:01 -0400 Josef Bacik <jbacik@fusionio.com>
wrote:
> 
> > Btrfs uses an rwsem to control access to its extent tree.  Threads 
> > will hold a read lock on this rwsem while they scan the extent tree, 
> > and if need_resched() they will drop the lock and schedule.  The 
> > transaction commit needs to take a write lock for this rwsem for a 
> > very short period to switch out the commit roots.  If there are a lot 
> > of threads doing this caching operation we can starve out the 
> > committers which slows everybody out.  To address this we want to add 
> > this functionality to see if our rwsem has anybody waiting to take a 
> > write lock so we can drop it and schedule for a bit to allow the 
> > commit to continue. Thanks,
> 
> This sounds rather nasty and hacky.  Rather then working around a 
> locking shortcoming in a caller it would be better to fix/enhance the 
> core locking code.  What would such a change need to do?
> 
> Presently rwsem waiters are fifo-queued, are they not?  So the commit 
> thread will eventually get that lock.  Apparently that''s not
working
> adequately for you but I don''t fully understand what it is about
these
> dynamics which is causing observable problems.
It would be nice to see the whole solution, together with the btrfs patch.

The problem I have is that this new primitive is only superficially like 
spin_is_contended(): in the spinlock case dropping the lock will guarantee 
some sort of progress, because another CPU will almost certainly pick up 
the lock if we cpu_relax().

In the rwsem case there''s no such guarantee of progress, especially if
a
read-lock is dropped. So I''d like to see how it''s implemented
in practice.

Thanks,

	Ingo

Maybe Matching Threads

Search for more reasonably related threads

Btrfs devel - Aug 2013 - [PATCH] rwsem: add rwsem_is_contended

[PATCH] rwsem: add rwsem_is_contended

Re: [PATCH] rwsem: add rwsem_is_contended

Re: [PATCH] rwsem: add rwsem_is_contended

Re: [PATCH] rwsem: add rwsem_is_contended

Re: [PATCH] rwsem: add rwsem_is_contended

Re: [PATCH] rwsem: add rwsem_is_contended

Re: [PATCH] rwsem: add rwsem_is_contended

Re: [PATCH] rwsem: add rwsem_is_contended

Re: [PATCH] rwsem: add rwsem_is_contended

Re: [PATCH] rwsem: add rwsem_is_contended

Re: [PATCH] rwsem: add rwsem_is_contended

Re: [PATCH] rwsem: add rwsem_is_contended

Re: [PATCH] rwsem: add rwsem_is_contended

Re: [PATCH] rwsem: add rwsem_is_contended

Re: [PATCH] rwsem: add rwsem_is_contended

Re: [PATCH] rwsem: add rwsem_is_contended

Re: [PATCH] rwsem: add rwsem_is_contended

Re: [PATCH] rwsem: add rwsem_is_contended

Maybe Matching Threads