Waiman Long
2014-Apr-04 17:08 UTC
[PATCH v8 01/10] qspinlock: A generic 4-byte queue spinlock implementation
On 04/04/2014 12:57 PM, Konrad Rzeszutek Wilk wrote:> On Fri, Apr 04, 2014 at 03:00:12PM +0200, Peter Zijlstra wrote: >> So I'm just not ever going to pick up this patch; I spend a week trying >> to reverse engineer this; I posted a 7 patch series creating the >> equivalent, but in a gradual and readable fashion: >> >> http://lkml.kernel.org/r/20140310154236.038181843 at infradead.org >> >> You keep on ignoring that; I'll keep on ignoring your patches. >> >> I might at some point rewrite some of your pv stuff on top to get this >> moving again, but I'm not really motivated to work with you atm. > Uh? Did you CC also xen-devel at lists.xenproject.org on your patches Peter? > I hadn't had a chance to see or comment on them :-( >Peter's patch is a rewrite of my patches 1-4, there is no PV or unfair lock support in there. -Longman
Ingo Molnar
2014-Apr-04 17:54 UTC
[PATCH v8 01/10] qspinlock: A generic 4-byte queue spinlock implementation
* Waiman Long <waiman.long at hp.com> wrote:> On 04/04/2014 12:57 PM, Konrad Rzeszutek Wilk wrote: > >On Fri, Apr 04, 2014 at 03:00:12PM +0200, Peter Zijlstra wrote: > >>So I'm just not ever going to pick up this patch; I spend a week trying > >>to reverse engineer this; I posted a 7 patch series creating the > >>equivalent, but in a gradual and readable fashion: > >> > >> http://lkml.kernel.org/r/20140310154236.038181843 at infradead.org > >> > >>You keep on ignoring that; I'll keep on ignoring your patches. > >> > >>I might at some point rewrite some of your pv stuff on top to get this > >>moving again, but I'm not really motivated to work with you atm. > >Uh? Did you CC also xen-devel at lists.xenproject.org on your patches Peter? > >I hadn't had a chance to see or comment on them :-( > > > > Peter's patch is a rewrite of my patches 1-4, there is no PV or > unfair lock support in there.It is a fine grained split-up, which does one thing at a time, so it all becomes reviewable and mergable (and the claimed effects become testable!). Please use that as a base. Thanks, Ingo
Peter Zijlstra
2014-Apr-07 14:09 UTC
[PATCH v8 01/10] qspinlock: A generic 4-byte queue spinlock implementation
On Fri, Apr 04, 2014 at 01:08:16PM -0400, Waiman Long wrote:> Peter's patch is a rewrite of my patches 1-4, there is no PV or unfair lock > support in there.Yes, because your patches were unreadable and entirely non obvious. And while I appreciate that its not entirely your fault; the subject is hard, you didn't even try to make it better and explain things in a normal gradual fashion. So what I did was start with a 'simple' correct implementation (although I could have started simpler still I suppose and added 3-4 more patches) and then added each optimization on top and explained the what and why for them. The result is similar code, but the path is ever so much easier to understand and review. And no, it doesn't have PV support; I spend the whole week trying to reverse engineer your patch 1; whereas if you'd presented it in the form I posted I might have agreed in a day or so. I still have to look at the PV bits; but seeing how you have shown no interest in writing coherent and understandable patch sets I'm less inclined to go stare at them for another few weeks and rewrite that code too. Also; theres a few subtle but important differences between the patch sets. Your series only makes x86_64 use the qspinlocks; the very last we want is i386 and x86_64 to use different spinlock implementations; we want less differences between them, not more. You stuff a whole lot of code into arch/x86 for no reason what so ever. Prior to that; I also rewrote your benchmark thing; using jiffies for timing is just vile. Not to mention you require a full kernel build and reboot (which is just stupid slow on big machines) to test anything.
Waiman Long
2014-Apr-07 16:59 UTC
[PATCH v8 01/10] qspinlock: A generic 4-byte queue spinlock implementation
On 04/07/2014 10:09 AM, Peter Zijlstra wrote:> On Fri, Apr 04, 2014 at 01:08:16PM -0400, Waiman Long wrote: >> Peter's patch is a rewrite of my patches 1-4, there is no PV or unfair lock >> support in there. > Yes, because your patches were unreadable and entirely non obvious. > > And while I appreciate that its not entirely your fault; the subject is > hard, you didn't even try to make it better and explain things in a > normal gradual fashion. > > So what I did was start with a 'simple' correct implementation (although > I could have started simpler still I suppose and added 3-4 more patches) > and then added each optimization on top and explained the what and why > for them. > > The result is similar code, but the path is ever so much easier to > understand and review.I appreciate your time in rewriting the code to make it easier to review. I will based my next patch on your rewrite. However, I am going to make the following minor changes: 1. I would like to series to be compilable with tip/master and v3.15-rc. Your patches seem to use some constants like __BYTE_ORDER__ that are not defined there. I will make change that to make them compilable on top of the latest upstream code. 2. Your patch uses atomic_test_and_set_bit to set the pending bit. I would prefer to use atomic_cmpxchg() which will ensure that the pending bit won't be set when some other tasks are already on the queue. This will also enable me to allow the simple setting of the lock bit without using atomic instruction. This change doesn't change performance that much when I tested it on Westmere. But on IvyBridge, it had a pretty big performance impact. 3. I doubt it is a good idea for the queue head to use atomic_cmpxchg() to pound on the lock cacheline repeatedly in the waiting loop. It will make it much harder for the lock holder to access the lock cacheline. I will use a simple atomic_read to query the status of the lock before doing any write on it.> > And no, it doesn't have PV support; I spend the whole week trying to > reverse engineer your patch 1; whereas if you'd presented it in the form > I posted I might have agreed in a day or so. > > I still have to look at the PV bits; but seeing how you have shown no > interest in writing coherent and understandable patch sets I'm less > inclined to go stare at them for another few weeks and rewrite that code > too. > > Also; theres a few subtle but important differences between the patch > sets. Your series only makes x86_64 use the qspinlocks; the very last we > want is i386 and x86_64 to use different spinlock implementations; we > want less differences between them, not more.I have no problem of making qspinlock the default for all x86 code.> > You stuff a whole lot of code into arch/x86 for no reason what so ever. > > Prior to that; I also rewrote your benchmark thing; using jiffies for > timing is just vile. Not to mention you require a full kernel build and > reboot (which is just stupid slow on big machines) to test anything.Yes, using jiffies isn't the best idea, it is just an easier way. BTW, I don't need to reboot the machine to run my test, I only need to build the .ko file and use insmod to insert into the running kernel. When it is done, rmmod can be used to remove it. I do need to rebuild the kernel when I make changes to the qspinlock patch. -Longman
Possibly Parallel Threads
- [PATCH v8 01/10] qspinlock: A generic 4-byte queue spinlock implementation
- [PATCH v8 01/10] qspinlock: A generic 4-byte queue spinlock implementation
- [PATCH v8 01/10] qspinlock: A generic 4-byte queue spinlock implementation
- [PATCH v8 01/10] qspinlock: A generic 4-byte queue spinlock implementation
- [PATCH v8 01/10] qspinlock: A generic 4-byte queue spinlock implementation