Waiman Long
2014-Apr-04 17:08 UTC
[PATCH v8 01/10] qspinlock: A generic 4-byte queue spinlock implementation
On 04/04/2014 12:57 PM, Konrad Rzeszutek Wilk wrote:
> On Fri, Apr 04, 2014 at 03:00:12PM +0200, Peter Zijlstra wrote:
>> So I'm just not ever going to pick up this patch; I spent a week trying
>> to reverse engineer this; I posted a 7-patch series creating the
>> equivalent, but in a gradual and readable fashion:
>>
>>   http://lkml.kernel.org/r/20140310154236.038181843 at infradead.org
>>
>> You keep on ignoring that; I'll keep on ignoring your patches.
>>
>> I might at some point rewrite some of your pv stuff on top to get this
>> moving again, but I'm not really motivated to work with you atm.
> Uh? Did you CC also xen-devel at lists.xenproject.org on your patches, Peter?
> I hadn't had a chance to see or comment on them :-(

Peter's patch is a rewrite of my patches 1-4; there is no PV or unfair
lock support in there.

-Longman
Ingo Molnar
2014-Apr-04 17:54 UTC
[PATCH v8 01/10] qspinlock: A generic 4-byte queue spinlock implementation
* Waiman Long <waiman.long at hp.com> wrote:

> On 04/04/2014 12:57 PM, Konrad Rzeszutek Wilk wrote:
> > On Fri, Apr 04, 2014 at 03:00:12PM +0200, Peter Zijlstra wrote:
> >> So I'm just not ever going to pick up this patch; I spent a week trying
> >> to reverse engineer this; I posted a 7-patch series creating the
> >> equivalent, but in a gradual and readable fashion:
> >>
> >>   http://lkml.kernel.org/r/20140310154236.038181843 at infradead.org
> >>
> >> You keep on ignoring that; I'll keep on ignoring your patches.
> >>
> >> I might at some point rewrite some of your pv stuff on top to get this
> >> moving again, but I'm not really motivated to work with you atm.
> > Uh? Did you CC also xen-devel at lists.xenproject.org on your patches, Peter?
> > I hadn't had a chance to see or comment on them :-(
>
> Peter's patch is a rewrite of my patches 1-4; there is no PV or
> unfair lock support in there.

It is a fine-grained split-up, which does one thing at a time, so it all
becomes reviewable and mergeable (and the claimed effects become
testable!).

Please use that as a base.

Thanks,

	Ingo
Peter Zijlstra
2014-Apr-07 14:09 UTC
[PATCH v8 01/10] qspinlock: A generic 4-byte queue spinlock implementation
On Fri, Apr 04, 2014 at 01:08:16PM -0400, Waiman Long wrote:
> Peter's patch is a rewrite of my patches 1-4; there is no PV or unfair
> lock support in there.

Yes, because your patches were unreadable and entirely non-obvious.

And while I appreciate that it's not entirely your fault (the subject is
hard), you didn't even try to make it better and explain things in a
normal, gradual fashion.

So what I did was start with a 'simple' correct implementation (although
I suppose I could have started simpler still and added 3-4 more patches)
and then added each optimization on top, explaining the what and the why
for each of them.

The result is similar code, but the path to it is ever so much easier to
understand and review.

And no, it doesn't have PV support; I spent the whole week trying to
reverse engineer your patch 1, whereas if you'd presented it in the form
I posted I might have agreed to it in a day or so.

I still have to look at the PV bits; but seeing how you have shown no
interest in writing coherent and understandable patch sets, I'm less
inclined to go stare at them for another few weeks and rewrite that code
too.

Also, there are a few subtle but important differences between the patch
sets. Your series only makes x86_64 use the qspinlocks; the very last
thing we want is for i386 and x86_64 to use different spinlock
implementations; we want fewer differences between them, not more.

You stuff a whole lot of code into arch/x86 for no reason whatsoever.

Prior to that, I also rewrote your benchmark thing; using jiffies for
timing is just vile. Not to mention that you require a full kernel build
and reboot (which is just stupidly slow on big machines) to test
anything.
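For context on the jiffies complaint: jiffies advance only at HZ
granularity (10 ms at HZ=100), far too coarse for lock operations that
complete in nanoseconds. A minimal sketch of a lock-test module timed
with sched_clock() instead; the module and all of its names here are
hypothetical, not code from either patch series:

    #include <linux/module.h>
    #include <linux/sched.h>      /* sched_clock(); <linux/sched/clock.h> on newer kernels */
    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(test_lock);

    static int __init locktest_init(void)
    {
            u64 start, delta_ns;
            unsigned long i;

            /* sched_clock() returns nanoseconds; jiffies would quantize
             * the whole run to multiples of 1/HZ seconds. */
            start = sched_clock();
            for (i = 0; i < 1000000; i++) {
                    spin_lock(&test_lock);
                    spin_unlock(&test_lock);
            }
            delta_ns = sched_clock() - start;

            pr_info("locktest: %lu lock/unlock pairs in %llu ns\n",
                    i, (unsigned long long)delta_ns);
            return 0;       /* stay loaded until rmmod */
    }

    static void __exit locktest_exit(void)
    {
    }

    module_init(locktest_init);
    module_exit(locktest_exit);
    MODULE_LICENSE("GPL");

Built as a .ko, this fits the insmod/rmmod workflow Waiman describes
below: no reboot needed, only a module rebuild per test run.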
Waiman Long
2014-Apr-07 16:59 UTC
[PATCH v8 01/10] qspinlock: A generic 4-byte queue spinlock implementation
On 04/07/2014 10:09 AM, Peter Zijlstra wrote:
> On Fri, Apr 04, 2014 at 01:08:16PM -0400, Waiman Long wrote:
>> Peter's patch is a rewrite of my patches 1-4; there is no PV or unfair
>> lock support in there.
> Yes, because your patches were unreadable and entirely non-obvious.
>
> And while I appreciate that it's not entirely your fault (the subject is
> hard), you didn't even try to make it better and explain things in a
> normal, gradual fashion.
>
> So what I did was start with a 'simple' correct implementation (although
> I suppose I could have started simpler still and added 3-4 more patches)
> and then added each optimization on top, explaining the what and the why
> for each of them.
>
> The result is similar code, but the path to it is ever so much easier to
> understand and review.

I appreciate the time you spent rewriting the code to make it easier to
review, and I will base my next patch series on your rewrite. However, I
am going to make the following minor changes:

1. I would like the series to compile against tip/master and v3.15-rc.
   Your patches use some constants, like __BYTE_ORDER__, that are not
   defined there; I will change that so they build on top of the latest
   upstream code.

2. Your patch uses atomic_test_and_set_bit to set the pending bit. I
   would prefer to use atomic_cmpxchg(), which ensures that the pending
   bit won't be set when other tasks are already on the queue. This also
   enables me to set the lock bit with a simple store rather than an
   atomic instruction. The change didn't affect performance much when I
   tested it on Westmere, but on Ivy Bridge it had a pretty big
   performance impact.

3. I doubt it is a good idea for the queue head to pound on the lock
   cacheline with atomic_cmpxchg() throughout its waiting loop, as that
   makes it much harder for the lock holder to access the lock cacheline.
   I will use a simple atomic_read() to query the status of the lock
   before doing any write to it.

> And no, it doesn't have PV support; I spent the whole week trying to
> reverse engineer your patch 1, whereas if you'd presented it in the form
> I posted I might have agreed to it in a day or so.
>
> I still have to look at the PV bits; but seeing how you have shown no
> interest in writing coherent and understandable patch sets, I'm less
> inclined to go stare at them for another few weeks and rewrite that code
> too.
>
> Also, there are a few subtle but important differences between the patch
> sets. Your series only makes x86_64 use the qspinlocks; the very last
> thing we want is for i386 and x86_64 to use different spinlock
> implementations; we want fewer differences between them, not more.

I have no problem with making qspinlock the default for all x86 code.

> You stuff a whole lot of code into arch/x86 for no reason whatsoever.
>
> Prior to that, I also rewrote your benchmark thing; using jiffies for
> timing is just vile. Not to mention that you require a full kernel build
> and reboot (which is just stupidly slow on big machines) to test
> anything.

Yes, using jiffies isn't the best idea; it was just the easier way. BTW,
I don't need to reboot the machine to run my tests: I only need to build
the .ko file and use insmod to insert it into the running kernel. When
the test is done, rmmod removes it again. I do need to rebuild the kernel
when I make changes to the qspinlock patches themselves.

-Longman
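A sketch of the two atomic-API changes Waiman describes in points 2 and
3 above: take the pending bit with cmpxchg only when nobody is queued,
and have the queue head spin on plain reads before attempting the write.
The value encoding and helper names are assumptions for illustration,
not the actual qspinlock code; in the real slowpath the compare value
also carries the queue-tail encoding:

    #include <linux/types.h>
    #include <linux/atomic.h>
    #include <asm/processor.h>          /* cpu_relax() */

    /* Assumed word layout, for illustration only:
     * bit 0 = locked, bit 8 = pending, bits 16+ = queue tail. */
    #define MY_LOCKED_VAL   (1U << 0)
    #define MY_PENDING_VAL  (1U << 8)
    #define MY_TAIL_MASK    (~0U << 16)

    /*
     * Point 2: take the pending bit with cmpxchg so it is acquired only
     * when no one is queued; a blind test-and-set would grab it even
     * with waiters already lined up behind the lock.
     */
    static inline bool my_try_set_pending(atomic_t *lock)
    {
            int val = atomic_read(lock);

            if (val & (MY_PENDING_VAL | MY_TAIL_MASK))
                    return false;       /* waiters already queued */
            return atomic_cmpxchg(lock, val, val | MY_PENDING_VAL) == val;
    }

    /*
     * Point 3: spin with plain reads, which keep the lock cacheline in
     * shared state, and pay for the exclusive-ownership cmpxchg only
     * once the lock actually looks free.
     */
    static inline void my_queue_head_spin(atomic_t *lock)
    {
            for (;;) {
                    while (atomic_read(lock) & MY_LOCKED_VAL)
                            cpu_relax();        /* read-only wait */
                    if (atomic_cmpxchg(lock, 0, MY_LOCKED_VAL) == 0)
                            return;
            }
    }

The cmpxchg succeeds only if nothing changed the word between the read
and the write, so the pending bit is never taken over the head of a
queue; and because the waiting loop issues only reads, the lock holder
is not fighting the waiter for exclusive ownership of the cacheline on
every iteration.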