Juergen Gross
2008-Dec-17 12:22 UTC
[Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
--
Juergen Gross                 Principal Developer IP SW OS6
                              Telephone: +49 (0) 89 636 47950
Fujitsu Siemens Computers     e-mail: juergen.gross@fujitsu-siemens.com
Otto-Hahn-Ring 6              Internet: www.fujitsu-siemens.com
D-81739 Muenchen              Company details: www.fujitsu-siemens.com/imprint.html
Jan Beulich
2008-Dec-17 15:06 UTC
Re: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
> --- a/include/asm-x86_64/mach-xen/asm/irqflags.h Sat Dec 13 16:00:43 2008 +0000
> +++ b/include/asm-x86_64/mach-xen/asm/irqflags.h Wed Dec 17 13:12:53 2008 +0100
> @@ -33,8 +33,12 @@ do { \
>  vcpu_info_t *_vcpu; \
>  barrier(); \
>  _vcpu = current_vcpu_info(); \
> - if ((_vcpu->evtchn_upcall_mask = (x)) == 0) { \
> + if ( !(x) ) { \

This isn't correct, as it breaks 0->1 transitions (there are a few instances of
this in the kernel).

> + _vcpu->no_desched = 0; \
> + _vcpu->evtchn_upcall_mask = 0; \
>  barrier(); /* unmask then check (avoid races) */ \
> + if ( unlikely(_vcpu->desched_delay) ) \
> + (void)((HYPERVISOR_sched_op(SCHEDOP_yield, _vcpu))?:0); \

Why not just cast the function result to void? Likewise further below...

>  if ( unlikely(_vcpu->evtchn_upcall_pending) ) \
>  force_evtchn_callback(); \
>  } \

Jan
Juergen Gross
2008-Dec-18 07:18 UTC
Re: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
Jan Beulich wrote:
>> --- a/include/asm-x86_64/mach-xen/asm/irqflags.h Sat Dec 13 16:00:43 2008 +0000
>> +++ b/include/asm-x86_64/mach-xen/asm/irqflags.h Wed Dec 17 13:12:53 2008 +0100
>> @@ -33,8 +33,12 @@ do { \
>>  vcpu_info_t *_vcpu; \
>>  barrier(); \
>>  _vcpu = current_vcpu_info(); \
>> - if ((_vcpu->evtchn_upcall_mask = (x)) == 0) { \
>> + if ( !(x) ) { \
>
> This isn't correct, as it breaks 0->1 transitions (there are a few instances of
> this in the kernel).

Thanks! I will correct it.

>> + _vcpu->no_desched = 0; \
>> + _vcpu->evtchn_upcall_mask = 0; \
>>  barrier(); /* unmask then check (avoid races) */ \
>> + if ( unlikely(_vcpu->desched_delay) ) \
>> + (void)((HYPERVISOR_sched_op(SCHEDOP_yield, _vcpu))?:0); \
>
> Why not just cast the function result to void? Likewise further below...

I took that from include/xen/hypercall.h, which mentioned problems with just
casting the function result.

Juergen
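For illustration, one way the hunk could look with the 0->1 problem addressed
is sketched below: the upcall mask is still assigned from (x), so a non-zero
value really is written back, while the no_desched/desched_delay fields from
the posted patch are kept as they were. The macro name is invented, how the
no_desched flag should behave on a non-zero restore is left open here just as
in the posted hunk, and this is only a sketch, not the revised patch Juergen
attached later in the thread.

#define xen_irq_restore_sketch(x)                                     \
do {                                                                  \
        vcpu_info_t *_vcpu;                                           \
        barrier();                                                    \
        _vcpu = current_vcpu_info();                                  \
        if ( !(x) )                                                   \
                _vcpu->no_desched = 0;                                \
        if ( (_vcpu->evtchn_upcall_mask = (x)) == 0 ) {               \
                barrier(); /* unmask then check (avoid races) */      \
                if ( unlikely(_vcpu->desched_delay) )                 \
                        (void)((HYPERVISOR_sched_op(SCHEDOP_yield,    \
                                                    _vcpu))?:0);      \
                if ( unlikely(_vcpu->evtchn_upcall_pending) )         \
                        force_evtchn_callback();                      \
        }                                                             \
} while (0)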
Jan Beulich
2008-Dec-18 07:41 UTC
Re: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
>>> Juergen Gross <juergen.gross@fujitsu-siemens.com> 18.12.08 08:18 >>>
> Jan Beulich wrote:
>>
>>> + _vcpu->no_desched = 0; \
>>> + _vcpu->evtchn_upcall_mask = 0; \
>>>  barrier(); /* unmask then check (avoid races) */ \
>>> + if ( unlikely(_vcpu->desched_delay) ) \
>>> + (void)((HYPERVISOR_sched_op(SCHEDOP_yield, _vcpu))?:0); \
>>
>> Why not just cast the function result to void? Likewise further below...
>
> I took that from include/xen/hypercall.h, which mentioned problems with just
> casting the function result.

Ah, yes, I recall. But then why don't you use that macro? After all, hypercall.h
must have been included if you're able to call HYPERVISOR_sched_op().

Jan
Juergen Gross
2008-Dec-19 08:12 UTC
Re: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
Jan Beulich wrote:
>>>> Juergen Gross <juergen.gross@fujitsu-siemens.com> 18.12.08 08:18 >>>
>> Jan Beulich wrote:
>>>> + _vcpu->no_desched = 0; \
>>>> + _vcpu->evtchn_upcall_mask = 0; \
>>>>  barrier(); /* unmask then check (avoid races) */ \
>>>> + if ( unlikely(_vcpu->desched_delay) ) \
>>>> + (void)((HYPERVISOR_sched_op(SCHEDOP_yield, _vcpu))?:0); \
>>> Why not just cast the function result to void? Likewise further below...
>> I took that from include/xen/hypercall.h, which mentioned problems with just
>> casting the function result.
>
> Ah, yes, I recall. But then why don't you use that macro? After all, hypercall.h
> must have been included if you're able to call HYPERVISOR_sched_op().

This was an artefact from previous tests, sorry.
The attached patch should be clean now.

Keir, what is your main objection?
Putting this interface into the hypervisor, or only the Linux part?
I still think this would be good to have in Linux, but our BS2000 port would
really be much easier having this interface in the hypervisor!
So if you don't mind adding only the first patch, this would be absolutely
okay for us.

Juergen
Keir Fraser
2008-Dec-19 09:10 UTC
Re: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
On 19/12/2008 08:12, "Juergen Gross" <juergen.gross@fujitsu-siemens.com> wrote:

>> Ah, yes, I recall. But then why don't you use that macro? After all,
>> hypercall.h must have been included if you're able to call
>> HYPERVISOR_sched_op().
>
> This was an artefact from previous tests, sorry.
> The attached patch should be clean now.
>
> Keir, what is your main objection?
> Putting this interface into the hypervisor, or only the Linux part?
> I still think this would be good to have in Linux, but our BS2000 port would
> really be much easier having this interface in the hypervisor!
> So if you don't mind adding only the first patch, this would be absolutely
> okay for us.

I haven't seen any win on any real world setup. So I remain unconvinced, and
it'll need more than you alone championing the patch to get it in. There
have been no other general comments so far (Jan's have been about specific
details).

 -- Keir
Juergen Gross
2008-Dec-19 09:25 UTC
Re: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
Keir Fraser wrote:
> On 19/12/2008 08:12, "Juergen Gross" <juergen.gross@fujitsu-siemens.com>
> wrote:
>> Keir, what is your main objection?
>> Putting this interface into the hypervisor, or only the Linux part?
>> I still think this would be good to have in Linux, but our BS2000 port would
>> really be much easier having this interface in the hypervisor!
>> So if you don't mind adding only the first patch, this would be absolutely
>> okay for us.
>
> I haven't seen any win on any real world setup. So I remain unconvinced, and
> it'll need more than you alone championing the patch to get it in. There
> have been no other general comments so far (Jan's have been about specific
> details).

Okay, would the following scenario be "real world" enough?

Multiple domUs being busy, leading to enough vcpu scheduling, and several
parallel kernel builds in dom0 acting as the benchmark.

Juergen
Jan Beulich
2008-Dec-19 09:33 UTC
Re: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
>>> Keir Fraser <keir.fraser@eu.citrix.com> 19.12.08 10:10 >>>
> I haven't seen any win on any real world setup. So I remain unconvinced, and
> it'll need more than you alone championing the patch to get it in. There
> have been no other general comments so far (Jan's have been about specific
> details).

I think I'd generally welcome a change like this, but I'm not certain how far
I feel convinced that the submission meets one very basic criterion:
avoidance of mis-use of the feature by a domain (simply stating that a vCPU
will be de-scheduled after 1ms anyway doesn't seem sufficient to me). This
might need to include ways to differentiate between Dom0/DomU and/or
CPU- vs IO-bound vCPU-s.

Beyond that, it seems questionable that tying this to event delivery being
disabled on the vCPU is very useful - Xen could do this on its own, without
needing a second flag. Having an extra flag really seems useful only when
one can set it while holding spin locks, which at least the Linux part of the
patch doesn't seem to aim at.

Jan
Keir Fraser
2008-Dec-19 09:56 UTC
Re: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
On 19/12/2008 09:25, "Juergen Gross" <juergen.gross@fujitsu-siemens.com> wrote:

>> I haven't seen any win on any real world setup. So I remain unconvinced, and
>> it'll need more than you alone championing the patch to get it in. There
>> have been no other general comments so far (Jan's have been about specific
>> details).
>
> Okay, would the following scenario be "real world" enough?
>
> Multiple domUs being busy, leading to enough vcpu scheduling, and several
> parallel kernel builds in dom0 acting as the benchmark.

Something like that would be better. Of course you'd need to measure work
done in the domUs as well, as one of the critical factors for this patch
would be how it affects fairness. It's one reason I'm leery of this patch --
our scheduler is unpredictable enough as it is without giving domains
another lever to pull!

 -- Keir
Keir Fraser
2008-Dec-19 09:56 UTC
Re: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
On 19/12/2008 09:33, "Jan Beulich" <jbeulich@novell.com> wrote:

>>>> Keir Fraser <keir.fraser@eu.citrix.com> 19.12.08 10:10 >>>
>> I haven't seen any win on any real world setup. So I remain unconvinced, and
>> it'll need more than you alone championing the patch to get it in. There
>> have been no other general comments so far (Jan's have been about specific
>> details).
>
> I think I'd generally welcome a change like this, but I'm not certain how far
> I feel convinced that the submission meets one very basic criterion:
> avoidance of mis-use of the feature by a domain (simply stating that a vCPU
> will be de-scheduled after 1ms anyway doesn't seem sufficient to me). This
> might need to include ways to differentiate between Dom0/DomU and/or
> CPU- vs IO-bound vCPU-s.

The most likely person to comment on that in the coming weeks would be
George, who's kindly signed up to do some design work on the scheduler.

 -- Keir
Juergen Gross
2008-Dec-19 10:06 UTC
Re: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
Jan Beulich wrote:
>>>> Keir Fraser <keir.fraser@eu.citrix.com> 19.12.08 10:10 >>>
>> I haven't seen any win on any real world setup. So I remain unconvinced, and
>> it'll need more than you alone championing the patch to get it in. There
>> have been no other general comments so far (Jan's have been about specific
>> details).
>
> I think I'd generally welcome a change like this, but I'm not certain how far
> I feel convinced that the submission meets one very basic criterion:
> avoidance of mis-use of the feature by a domain (simply stating that a vCPU
> will be de-scheduled after 1ms anyway doesn't seem sufficient to me). This
> might need to include ways to differentiate between Dom0/DomU and/or
> CPU- vs IO-bound vCPU-s.

It would be possible to sum up the mis-usages. The extra 1 msec could be
allowed only if the total consumed time of the vcpu is at least 1000 times
the extra time spent so far. Or we could allow it only if no mis-usage had
been recorded in the last second.

> Beyond that, it seems questionable that tying this to event delivery being
> disabled on the vCPU is very useful - Xen could do this on its own, without
> needing a second flag. Having an extra flag really seems useful only when
> one can set it while holding spin locks, which at least the Linux part of the
> patch doesn't seem to aim at.

My first test was not limited to the irq disabling. I tried to tie it to the
preemption_count of the current thread by adding special functions for
manipulating the preemption_count. I found this approach too complicated for
a first test, as I obviously missed some details (the resulting system did
work, but was much slower than before). I think it would be interesting to
spend some more time tuning this, but I wanted to get some feedback first.
And, to be honest, I don't think I am the right person to do this, as I'm
really no Linux scheduling specialist :-)

Regarding handling this in Xen only: not descheduling a vcpu while interrupts
are disabled is easy, but how would you deschedule it after interrupts are
allowed again?

Juergen
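To make the proposed accounting rule concrete, a check on the hypervisor side
might look roughly like this. All the vcpu fields used here (extra_time,
cpu_time, last_misuse) are hypothetical and exist in neither Xen nor the
posted patch; only struct vcpu, s_time_t and SECONDS() are real.

/*
 * Hypothetical sketch only - none of these vcpu fields exist in Xen.
 * It illustrates the rule proposed above: grant the extra 1 msec only
 * while the accumulated extra time stays below 1/1000 of the vcpu's
 * total consumed CPU time, and only if no misuse was recorded within
 * the last second.
 */
static int desched_delay_allowed(const struct vcpu *v, s_time_t now)
{
    if ( v->extra_time * 1000 > v->cpu_time )   /* over the 0.1% budget */
        return 0;
    if ( now - v->last_misuse < SECONDS(1) )    /* misused it recently  */
        return 0;
    return 1;
}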
Jan Beulich
2008-Dec-19 10:36 UTC
Re: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
>>> Juergen Gross <juergen.gross@fujitsu-siemens.com> 19.12.08 11:06 >>>
> Regarding handling this in Xen only: not descheduling a vcpu while interrupts
> are disabled is easy, but how would you deschedule it after interrupts are
> allowed again?

It's all the same as with the newly added flag of yours - it requires cooperation
from the guest (plus the forced de-schedule if it fails to do so). It's just that
you don't need to set two flags when disabling interrupts; that second flag
would only be needed when you want to avoid being de-scheduled for
reasons other than event delivery being disabled.

Jan
Juergen Gross
2008-Dec-19 10:42 UTC
Re: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
Jan Beulich wrote:
>>>> Juergen Gross <juergen.gross@fujitsu-siemens.com> 19.12.08 11:06 >>>
>> Regarding handling this in Xen only: not descheduling a vcpu while interrupts
>> are disabled is easy, but how would you deschedule it after interrupts are
>> allowed again?
>
> It's all the same as with the newly added flag of yours - it requires cooperation
> from the guest (plus the forced de-schedule if it fails to do so). It's just that
> you don't need to set two flags when disabling interrupts; that second flag
> would only be needed when you want to avoid being de-scheduled for
> reasons other than event delivery being disabled.

I just wanted to be independent of the event mechanism.
I'm fine with merging the two features, but it would be more limiting than
necessary. It's like the difference between disabling IRQs and avoiding
preemption of a thread: you can avoid preemption by disabling IRQs, but this
is not the best way to do it...

Juergen
Juergen Gross
2008-Dec-19 10:48 UTC
Re: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
Jan Beulich wrote:
>>>> Juergen Gross <juergen.gross@fujitsu-siemens.com> 19.12.08 11:06 >>>
>> Regarding handling this in Xen only: not descheduling a vcpu while interrupts
>> are disabled is easy, but how would you deschedule it after interrupts are
>> allowed again?
>
> It's all the same as with the newly added flag of yours - it requires cooperation
> from the guest (plus the forced de-schedule if it fails to do so). It's just that
> you don't need to set two flags when disabling interrupts; that second flag
> would only be needed when you want to avoid being de-scheduled for
> reasons other than event delivery being disabled.

Just another thought: my approach was more compatible. Only a guest which is
aware of the new flag will set it and in turn respect the request of the
hypervisor to give up control after enabling de-scheduling again. Old guests
would always be regarded as non-cooperative.

Juergen
George Dunlap
2008-Dec-19 15:15 UTC
Re: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
The general idea seems interesting. I think we've kicked it around
internally before, but ended up sticking with a "yield after spinning
for a while" strategy just for simplicity. However, as Juergen says,
this flag could, in principle, save all of the "spin for a while"
time-wasting in the first place.

As for mis-use: if we do things right, a guest shouldn't be able to
get an advantage from setting the flag when it doesn't need to. If we
add the ability to preempt it after 1ms, and deduct the extra credits
from the VM for the extra time run, then it will only run a little
longer, and then have to wait longer to be scheduled again. (I
think the more accurate credit accounting part of Naoki's patches is
sure to be included in the scheduler revision.) If it doesn't yield
after the critical section is over, it risks being pre-empted at the
next critical section.

The thing to test would be concurrent kernel builds and dbench, with
multiple domains, each domain's vcpus == pcpus.

Would you mind coding up a yield-after-spinning-a-while patch, and
comparing the results to your "don't-deschedule-me" patch, for the kernel
build at least, and possibly dbench? I'm including some patches which
should be included when testing the "yield after spinning a while"
patch, otherwise nothing interesting will happen. They're a bit
hackish, but seem to work pretty well for their purpose.

 -George

On Fri, Dec 19, 2008 at 9:56 AM, Keir Fraser <keir.fraser@eu.citrix.com> wrote:
> On 19/12/2008 09:33, "Jan Beulich" <jbeulich@novell.com> wrote:
>
>>>>> Keir Fraser <keir.fraser@eu.citrix.com> 19.12.08 10:10 >>>
>>> I haven't seen any win on any real world setup. So I remain unconvinced, and
>>> it'll need more than you alone championing the patch to get it in. There
>>> have been no other general comments so far (Jan's have been about specific
>>> details).
>>
>> I think I'd generally welcome a change like this, but I'm not certain how far
>> I feel convinced that the submission meets one very basic criterion:
>> avoidance of mis-use of the feature by a domain (simply stating that a vCPU
>> will be de-scheduled after 1ms anyway doesn't seem sufficient to me). This
>> might need to include ways to differentiate between Dom0/DomU and/or
>> CPU- vs IO-bound vCPU-s.
>
> The most likely person to comment on that in the coming weeks would be
> George, who's kindly signed up to do some design work on the scheduler.
>
> -- Keir
Juergen Gross
2009-Jan-12 12:55 UTC
Re: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
George Dunlap wrote:
> The general idea seems interesting. I think we've kicked it around
> internally before, but ended up sticking with a "yield after spinning
> for a while" strategy just for simplicity. However, as Juergen says,
> this flag could, in principle, save all of the "spin for a while"
> time-wasting in the first place.
>
> As for mis-use: if we do things right, a guest shouldn't be able to
> get an advantage from setting the flag when it doesn't need to. If we
> add the ability to preempt it after 1ms, and deduct the extra credits
> from the VM for the extra time run, then it will only run a little
> longer, and then have to wait longer to be scheduled again. (I
> think the more accurate credit accounting part of Naoki's patches is
> sure to be included in the scheduler revision.) If it doesn't yield
> after the critical section is over, it risks being pre-empted at the
> next critical section.
>
> The thing to test would be concurrent kernel builds and dbench, with
> multiple domains, each domain's vcpus == pcpus.
>
> Would you mind coding up a yield-after-spinning-a-while patch, and
> comparing the results to your "don't-deschedule-me" patch, for the kernel
> build at least, and possibly dbench? I'm including some patches which
> should be included when testing the "yield after spinning a while"
> patch, otherwise nothing interesting will happen. They're a bit
> hackish, but seem to work pretty well for their purpose.

It took some time (other problems, as always ;-) ), but here are the results:

Hardware: 4-cpu x86_64 machine, 8 GB memory.
Domain 0 with 4 vcpus, 8 other domains with 1 vcpu each spinning to force
vcpu scheduling.
8 parallel xen hypervisor builds on domain 0, plus scp from another machine
to add some network load.
Additional test with dbench after the build jobs.

Results with patched system (no deschedule):
--------------------------------------------
Domain 0 consumed 581.2 seconds, the other domains about 535 seconds each.
While the builds were running, 60 scp jobs finished.
Real time for the build was between 1167 and 1214 seconds (av. 1192 seconds).
Summed user time was 562.77 seconds, system time 12.17 seconds.
dbench: Throughput 141.764 MB/sec, 10 procs
System reaction to shell commands: okay

Original system:
----------------
Domain 0 consumed 583.8 seconds, the other domains about 540 seconds each.
While the builds were running, 60 scp jobs finished.
Real time for the build was between 1181 and 1222 seconds (av. 1204 seconds).
Summed user time was 563.02 seconds, system time 12.65 seconds.
dbench: Throughput 133.249 MB/sec, 10 procs
System reaction to shell commands: slower than the patched system

Yield in spinlock:
------------------
Domain 0 consumed 582.2 seconds, the other domains about 555 seconds each.
While the builds were running, 50 scp jobs finished.
Real time for the build was between 1226 and 1254 seconds (av. 1244 seconds).
Summed user time was 563.43 seconds, system time 12.63 seconds.
dbench: Throughput 145.218 MB/sec, 10 procs
System reaction to shell commands: sometimes "hiccups" for up to 30 seconds
George's hypervisor patches were included in this configuration.

Conclusion:
-----------
The differences are not really big, but my "no deschedule" patch had the
least elapsed time for the build jobs, while scp was able to transfer the
same amount of data as in the slower original system.
The "Yield in spinlock" patch had slightly better dbench performance, but
interactive shell commands were a pain sometimes! I suspect some problem in
George's patches during low system load to be the main reason for this
behaviour. Without George's patches, the "Yield in spinlock" variant was
very similar to the original system.

Juergen
Juergen Gross
2009-Jan-16 07:16 UTC
Re: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
Keir Fraser wrote:
> On 19/12/2008 09:25, "Juergen Gross" <juergen.gross@fujitsu-siemens.com>
> wrote:
>
>>> I haven't seen any win on any real world setup. So I remain unconvinced, and
>>> it'll need more than you alone championing the patch to get it in. There
>>> have been no other general comments so far (Jan's have been about specific
>>> details).
>>
>> Okay, would the following scenario be "real world" enough?
>>
>> Multiple domUs being busy, leading to enough vcpu scheduling, and several
>> parallel kernel builds in dom0 acting as the benchmark.
>
> Something like that would be better. Of course you'd need to measure work
> done in the domUs as well, as one of the critical factors for this patch
> would be how it affects fairness. It's one reason I'm leery of this patch --
> our scheduler is unpredictable enough as it is without giving domains
> another lever to pull!

Keir, is the data I posted recently okay?
I think my approach requires fewer changes than the "yield after spin"
variant, which needed more patches in the hypervisor and didn't seem to be
settled. Having my patches in the hypervisor at least would make life much
easier for our BS2000 system...
I would add some code to ensure a domain isn't misusing the new interface.

Juergen
Venefax
2009-Jan-16 07:38 UTC
RE: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
I can test it in a real-world situation.
I have SUSE 10-SP2 and have terrible performance issues with fully
virtualized SMP machines. I had to start using "Standard PC" as the HAL to
avoid the penalty.
Federico

-----Original Message-----
From: xen-devel-bounces@lists.xensource.com
[mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Juergen Gross
Sent: Friday, January 16, 2009 2:16 AM
To: Keir Fraser
Cc: George Dunlap; xen-devel@lists.xensource.com
Subject: Re: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
Juergen Gross
2009-Jan-16 07:48 UTC
Re: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
Venefax wrote:
> I can test it in a real-world situation.
> I have SUSE 10-SP2 and have terrible performance issues with fully
> virtualized SMP machines. I had to start using "Standard PC" as the HAL to
> avoid the penalty.
> Federico

Federico, thanks for your support. But my patches are for PV-domains (or at
least for domains with PV-drivers) only. I think you are using Windows as a
guest, so you would have to build a XEN-aware HAL...

If you are testing with PV-domains, too, you are welcome, of course!

Juergen
Venefax
2009-Jan-16 07:57 UTC
RE: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
Just for my education and the rest of the list, what are you talking about?
What is a PV domain compared to a Windows guest? I use the GPLPV drivers
from James.

-----Original Message-----
From: Juergen Gross [mailto:juergen.gross@fujitsu-siemens.com]
Sent: Friday, January 16, 2009 2:49 AM
To: Venefax
Cc: 'Keir Fraser'; 'George Dunlap'; xen-devel@lists.xensource.com
Subject: Re: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
Keir Fraser
2009-Jan-16 08:17 UTC
Re: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
On 16/01/2009 07:16, "Juergen Gross" <juergen.gross@fujitsu-siemens.com> wrote:

>> Something like that would be better. Of course you'd need to measure work
>> done in the domUs as well, as one of the critical factors for this patch
>> would be how it affects fairness. It's one reason I'm leery of this patch --
>> our scheduler is unpredictable enough as it is without giving domains
>> another lever to pull!
>
> Keir, is the data I posted recently okay?
> I think my approach requires fewer changes than the "yield after spin"
> variant, which needed more patches in the hypervisor and didn't seem to be
> settled. Having my patches in the hypervisor at least would make life much
> easier for our BS2000 system...
> I would add some code to ensure a domain isn't misusing the new interface.

It didn't sound like there was much average difference between the two
approaches, also that George's patches may be going in anyway for general
scheduling stability reasons, and also that any other observed hiccups may
simply point to limitations of the scheduler implementation which George may
look at further.

Do you have an explanation for why shell commands behave differently with
your patch, or alternatively why they can be delayed so long with the yield
approach?

The approach taken in Linux is not merely 'yield on spinlock', by the way; it
is 'block on event channel on spinlock', essentially turning a contended
spinlock into a sleeping mutex. I think that is quite different behaviour
from merely yielding and expecting the scheduler to do something sensible
with your yield request.

Overall, I think George should consider your patch as part of his overall
scheduler refurbishment work. I personally remain unconvinced that the
reactive approach cannot get predictable performance close to your approach,
and without needing new hypervisor interfaces.

 -- Keir
Juergen Gross
2009-Jan-16 08:19 UTC
Re: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
Venefax wrote:
> Just for my education and the rest of the list, what are you talking about?
> What is a PV domain compared to a Windows guest? I use the GPLPV drivers
> from James.

PV-domains are adapted to XEN in many aspects (memory management, I/O, trap
handling, ...). Instead of using the privileged x86 instructions, they
normally call the hypervisor via hypercalls to perform privileged operations.

HVM-domains like Windows require virtualization support in the processor
(VT-x for Intel, Pacifica/AMD-V for AMD). To boost I/O performance, it is
possible to use PV-drivers, which use a virtual PCI device to do I/O, but
most of the privileged actions are performed as on native systems, trapping
into the hypervisor if necessary.

You could read some of the links in
http://wiki.xensource.com/xenwiki/XenArchitecture; especially
http://wiki.xensource.com/xenwiki/XenArchitecture?action=AttachFile&do=get&target=Xen+Architecture_Q1+2008.pdf
http://www.linuxjournal.com/article/8540 and
http://www.linuxjournal.com/article/8909
might be useful.

My patches require special actions in the domain to avoid descheduling in
critical paths. In Windows this is normally an area covered by the HAL, so
you would have to port the HAL to be XEN-aware, which I think would be a
task for Microsoft to do...

Juergen
Juergen Gross
2009-Jan-16 09:36 UTC
Re: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
Keir Fraser wrote:
> On 16/01/2009 07:16, "Juergen Gross" <juergen.gross@fujitsu-siemens.com>
> wrote:
>
>>> Something like that would be better. Of course you'd need to measure work
>>> done in the domUs as well, as one of the critical factors for this patch
>>> would be how it affects fairness. It's one reason I'm leery of this patch --
>>> our scheduler is unpredictable enough as it is without giving domains
>>> another lever to pull!
>> Keir, is the data I posted recently okay?
>> I think my approach requires fewer changes than the "yield after spin"
>> variant, which needed more patches in the hypervisor and didn't seem to be
>> settled. Having my patches in the hypervisor at least would make life much
>> easier for our BS2000 system...
>> I would add some code to ensure a domain isn't misusing the new interface.
>
> It didn't sound like there was much average difference between the two
> approaches, also that George's patches may be going in anyway for general
> scheduling stability reasons, and also that any other observed hiccups may
> simply point to limitations of the scheduler implementation which George may
> look at further.

I think in extreme situations my approach will give better results. The
higher the number of vcpus, the better it will be. Avoiding descheduling in
a critical path should always be preferred to a statistical search for the
processor holding a resource.

> Do you have an explanation for why shell commands behave differently with
> your patch, or alternatively why they can be delayed so long with the yield
> approach?

No hard data. It must be related to the yield in my spinlock patch somehow,
as the problem did not occur with the same hypervisor and the "no deschedule"
patch in Linux. But the problem requires George's hypervisor patches to show
up.

> The approach taken in Linux is not merely 'yield on spinlock', by the way; it
> is 'block on event channel on spinlock', essentially turning a contended
> spinlock into a sleeping mutex. I think that is quite different behaviour
> from merely yielding and expecting the scheduler to do something sensible
> with your yield request.

Could you explain this a little bit more in detail, please?

> Overall, I think George should consider your patch as part of his overall
> scheduler refurbishment work. I personally remain unconvinced that the
> reactive approach cannot get predictable performance close to your approach,
> and without needing new hypervisor interfaces.

Perhaps a combination could be even better. My approach reduces latency while
holding a lock, while the 'block on event channel on spinlock' approach will
use the time of an otherwise spinning vcpu for productive work.

Juergen
Keir Fraser
2009-Jan-16 09:53 UTC
Re: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
On 16/01/2009 09:36, "Juergen Gross" <juergen.gross@fujitsu-siemens.com> wrote:

>> The approach taken in Linux is not merely 'yield on spinlock', by the way; it
>> is 'block on event channel on spinlock', essentially turning a contended
>> spinlock into a sleeping mutex. I think that is quite different behaviour
>> from merely yielding and expecting the scheduler to do something sensible
>> with your yield request.
>
> Could you explain this a little bit more in detail, please?

Jeremy Fitzhardinge did the implementation for Linux, so I'm cc'ing him in
case he remembers more details than me.

Basically each CPU allocates itself an IPI event channel at start of day.
When a CPU attempts to acquire a spinlock, it spins a short while (perhaps a
few microseconds?) and then adds itself to a bitmap stored in the lock
structure (I think, or it might be a linked list of sleepers?). It then
calls SCHEDOP_poll, listing its IPI evtchn as its wakeup requirement. When
the lock holder releases the lock, it checks for sleepers, and if it sees one
then it pings one of them (or is it all of them?) on its event channel, thus
waking it to take the lock.

 -- Keir
James Harper
2009-Jan-16 10:16 UTC
RE: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
> Venefax wrote:
>> I can test it in a real-world situation.
>> I have SUSE 10-SP2 and have terrible performance issues with fully
>> virtualized SMP machines. I had to start using "Standard PC" as the HAL to
>> avoid the penalty.
>> Federico
>
> Federico, thanks for your support. But my patches are for PV-domains (or at
> least for domains with PV-drivers) only. I think you are using Windows as a
> guest, so you would have to build a XEN-aware HAL...
>
> If you are testing with PV-domains, too, you are welcome, of course!

Juergen,

Do you think your changes could be applicable to HVM domains with
appropriately patched kernel spinlock routines?

I had previously wondered about optimizing spinlocks. My idea was
basically for Xen to set a bit in a structure to indicate which vcpus are
currently scheduled, and my modified spinlock acquire routine would
check if the current vcpu wants a spinlock that is held by a currently
unscheduled vcpu, and if so yield to Xen to let the other vcpu schedule.

The only thing I would need from Xen is to know which vcpus are
currently scheduled; the rest would be DomU-based.

Does that approximate what you do? I'll re-read your patch; I seem to
remember something about borrowing time from Xen to keep the vcpu a
little longer if a spinlock was held, so maybe you are taking a
proactive approach to my reactive approach?

The likelihood of this actually doing anything useful assumes that:
. Windows always uses the KeAcquireXxx and KeReleaseXxx calls and there
  is no inlined spinlock access in the kernel (which would bypass my hooks)
. when Windows spins, it doesn't yield already
. Xen actually deschedules a vcpu with a spinlock held often enough for
  this to matter

Kernel patching only works on 32 bits though, so I'm not sure I'll bother.

James
Juergen Gross
2009-Jan-16 10:31 UTC
Re: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
James Harper wrote:
> Juergen,
>
> Do you think your changes could be applicable to HVM domains with
> appropriately patched kernel spinlock routines?

Absolutely, yes. I'm doing this with our BS2000 port to x86/XEN, which is
running as an HVM-domain using some PV-interfaces.

> I had previously wondered about optimizing spinlocks. My idea was
> basically for Xen to set a bit in a structure to indicate which vcpus are
> currently scheduled, and my modified spinlock acquire routine would
> check if the current vcpu wants a spinlock that is held by a currently
> unscheduled vcpu, and if so yield to Xen to let the other vcpu schedule.
>
> The only thing I would need from Xen is to know which vcpus are
> currently scheduled; the rest would be DomU-based.
>
> Does that approximate what you do? I'll re-read your patch; I seem to
> remember something about borrowing time from Xen to keep the vcpu a
> little longer if a spinlock was held, so maybe you are taking a
> proactive approach to my reactive approach?

Correct. I avoid losing the physical cpu in critical paths.
In my case I've used my new interface whenever asynchronous interrupts are
disabled explicitly, as this is done whenever a critical code path is
entered.

Juergen
Keir Fraser
2009-Jan-16 10:41 UTC
Re: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
On 16/01/2009 10:16, "James Harper" <james.harper@bendigoit.com.au> wrote:

> I had previously wondered about optimizing spinlocks. My idea was
> basically for Xen to set a bit in a structure to indicate which vcpus are
> currently scheduled, and my modified spinlock acquire routine would
> check if the current vcpu wants a spinlock that is held by a currently
> unscheduled vcpu, and if so yield to Xen to let the other vcpu schedule.

That's a lot more like our existing Linux pv_ops spinlock handling
(yield/block instead of spin) than Juergen's patch (don't deschedule me
while in a critical section). The difference from what you suggest is that
we heuristically detect unscheduled lock holders by spinning a short
while.

You can pv up your Windows spinlocks in the block-instead-of-spin way
already (and yield-instead-of-spin is obviously even easier).

 -- Keir
James Harper
2009-Jan-16 11:01 UTC
RE: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
> On 16/01/2009 10:16, "James Harper" <james.harper@bendigoit.com.au> wrote:
>
>> I had previously wondered about optimizing spinlocks. My idea was
>> basically for Xen to set a bit in a structure to indicate which vcpus are
>> currently scheduled, and my modified spinlock acquire routine would
>> check if the current vcpu wants a spinlock that is held by a currently
>> unscheduled vcpu, and if so yield to Xen to let the other vcpu schedule.
>
> That's a lot more like our existing Linux pv_ops spinlock handling
> (yield/block instead of spin) than Juergen's patch (don't deschedule me
> while in a critical section). The difference from what you suggest is that
> we heuristically detect unscheduled lock holders by spinning a short
> while.
>
> You can pv up your Windows spinlocks in the block-instead-of-spin way
> already (and yield-instead-of-spin is obviously even easier).

But only in spinlocks that I 'own' completely, right? I'm more concerned
about spinlocks that I share with Windows (e.g. in NDIS).

James
Keir Fraser
2009-Jan-16 11:14 UTC
Re: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
On 16/01/2009 11:01, "James Harper" <james.harper@bendigoit.com.au> wrote:

> But only in spinlocks that I 'own' completely, right? I'm more concerned
> about spinlocks that I share with Windows (e.g. in NDIS).

So only some critical sections will be pv'ed? Then the best you can
currently do is yield-on-spin in your custom spin_lock().

 -- Keir
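A yield-instead-of-spin acquire loop of the kind Keir suggests could look
roughly like the following. The lock layout, spin threshold and test-and-set
primitive are placeholders, and a Windows driver would issue the hypercall
through whatever hypercall stub it already has; only SCHEDOP_yield itself is
part of the real interface.

/* Sketch only: lock layout, threshold and primitives are placeholders. */
#define SPIN_TRIES 1000

static void yield_spin_lock(volatile long *lock)
{
    for ( ; ; )
    {
        unsigned int i;

        /* Spin briefly in the hope that the holder is still running. */
        for ( i = 0; i < SPIN_TRIES; i++ )
        {
            if ( *lock == 0 && __sync_lock_test_and_set(lock, 1) == 0 )
                return;                          /* acquired */
            __asm__ __volatile__ ( "pause" );    /* x86 spin-loop hint */
        }

        /* Holder is probably descheduled: give the pCPU back to Xen. */
        (void)HYPERVISOR_sched_op(SCHEDOP_yield, NULL);
    }
}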
Jan Beulich
2009-Jan-16 11:18 UTC
RE: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
>>> "James Harper" <james.harper@bendigoit.com.au> 16.01.09 12:01 >>> >> You can pv up your Windows spinlocks in the block-instead-of-spin way >> already (and yield-instead-of-spin is obviously even easier). >> > >But only in spinlocks that I ''own'' completely right? I''m more concerned >about spinlocks that I share with Windows (eg in NDIS).As long as you can hook the respective OS interface (and you know it is always used), you could do this on all spinlocks, since all that''s needed is logic in the acquire/release code (unless you want to add state to each lock [like the spinning CPUs bitmap Keir mentioned], but that shouldn''t be necessary to achieve the intended effect). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Steve Prochniak
2009-Jan-16 14:40 UTC
RE: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
'Enlightened' Windows OSes have spinlock routines that make a hypercall to
yield the CPU to another VM. So if you comply with the Windows hypervisor
spec, you can get a performance boost when virtualized. This doesn't help
you out with pre-Vista versions, though...

Look at the disassembly of KfAcquireSpinLock.

-----Original Message-----
From: xen-devel-bounces@lists.xensource.com
[mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of James Harper
Sent: Friday, January 16, 2009 6:02 AM
To: Keir Fraser; Juergen Gross; Venefax
Cc: George Dunlap; xen-devel@lists.xensource.com
Subject: RE: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
Jeremy Fitzhardinge
2009-Jan-16 17:41 UTC
Re: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
Keir Fraser wrote:
> On 16/01/2009 09:36, "Juergen Gross" <juergen.gross@fujitsu-siemens.com>
> wrote:
>
>>> The approach taken in Linux is not merely 'yield on spinlock', by the way; it
>>> is 'block on event channel on spinlock', essentially turning a contended
>>> spinlock into a sleeping mutex. I think that is quite different behaviour
>>> from merely yielding and expecting the scheduler to do something sensible
>>> with your yield request.
>>
>> Could you explain this a little bit more in detail, please?
>
> Jeremy Fitzhardinge did the implementation for Linux, so I'm cc'ing him in
> case he remembers more details than me.
>
> Basically each CPU allocates itself an IPI event channel at start of day.
> When a CPU attempts to acquire a spinlock, it spins a short while (perhaps a
> few microseconds?) and then adds itself to a bitmap stored in the lock
> structure (I think, or it might be a linked list of sleepers?). It then
> calls SCHEDOP_poll, listing its IPI evtchn as its wakeup requirement. When
> the lock holder releases the lock, it checks for sleepers, and if it sees one
> then it pings one of them (or is it all of them?) on its event channel, thus
> waking it to take the lock.

Yes, that's more or less right. Each lock has a count of how many cpus are
waiting for the lock; if it's non-zero on unlock, the unlocker kicks all the
waiting cpus via IPI. There's a per-cpu variable of "lock I am waiting
for"; the kicker looks at each cpu's entry and kicks it if it's waiting for
the lock being unlocked.

The locking side does the expected "spin for a while, then block on
timeout". The timeout is settable if you have the appropriate debugfs
option enabled (which also produces quite a lot of detailed stats about
locking behaviour). The IPI is never delivered as an event, BTW; the locker
uses the event poll hypercall to block until the event is pending (this
hypercall had some performance problems until relatively recent versions of
Xen; I'm not sure which release versions have the fix).

The lock itself is a simple byte spinlock, with no fairness guarantees; I'm
assuming (hoping) that the pathological cases that ticket locks were
introduced to solve will be mitigated by the timeout/blocking path (and/or
be less likely in a virtual environment anyway).

I measured a small performance improvement within the domain with this patch
(kernbench-type workload), but an overall 10% reduction in system-wide CPU
use with multiple competing domains.

J
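For readers following along, below is a heavily condensed sketch of the
mechanism described above. The real implementation lives in the pv-ops
kernel's Xen spinlock code and differs in many details (proper per-cpu
variables, interrupt handling, timeouts, stale-kick handling, statistics);
only the hypercall interfaces (SCHEDOP_poll, EVTCHNOP_send) are the real
ones, while the structure layout and helper names here are invented.

#include <linux/smp.h>
#include <asm/xen/hypercall.h>
#include <xen/interface/sched.h>
#include <xen/interface/event_channel.h>
/* (further kernel-internal headers omitted) */

#define SPIN_TIMEOUT 1000                  /* arbitrary spin count before blocking */

struct xen_spinlock {
    unsigned char lock;                    /* simple byte lock: 0 = free, 1 = held */
    unsigned short spinners;               /* vCPUs currently blocked on this lock */
};

/* Per-vCPU state; the real code uses proper per-cpu variables. */
static struct xen_spinlock *waiting_for[NR_CPUS];   /* "lock I am waiting for" */
static evtchn_port_t lock_kick_port[NR_CPUS];       /* IPI event channel, allocated
                                                       per vCPU at start of day */

static int spin_a_while(struct xen_spinlock *xl)
{
    unsigned int i;

    for (i = 0; i < SPIN_TIMEOUT; i++) {
        if (xl->lock == 0 && xchg(&xl->lock, 1) == 0)
            return 1;                      /* picked the lock up while spinning */
        cpu_relax();
    }
    return 0;
}

static void xen_spin_lock(struct xen_spinlock *xl)
{
    int cpu = smp_processor_id();
    evtchn_port_t port = lock_kick_port[cpu];

    while (xchg(&xl->lock, 1) != 0) {      /* fast path failed */
        if (spin_a_while(xl))
            return;

        /* Advertise which lock we sleep on, then block until kicked. */
        waiting_for[cpu] = xl;
        __sync_fetch_and_add(&xl->spinners, 1);
        barrier();

        /* Re-check once: the holder may have released the lock before it
         * could see us as a spinner, in which case no kick would come. */
        if (xl->lock != 0) {
            struct sched_poll poll = { .nr_ports = 1, .timeout = 0 };

            set_xen_guest_handle(poll.ports, &port);
            HYPERVISOR_sched_op(SCHEDOP_poll, &poll);  /* sleep until the kick */
        }

        __sync_fetch_and_add(&xl->spinners, -1);
        waiting_for[cpu] = NULL;
    }
}

static void xen_spin_unlock(struct xen_spinlock *xl)
{
    int cpu;

    wmb();                                 /* critical-section stores first */
    xl->lock = 0;
    if (xl->spinners == 0)
        return;                            /* nobody blocked: the common case */

    /* Kick every vCPU that recorded this lock as the one it waits for. */
    for_each_online_cpu(cpu) {
        if (waiting_for[cpu] == xl) {
            struct evtchn_send send = { .port = lock_kick_port[cpu] };
            HYPERVISOR_event_channel_op(EVTCHNOP_send, &send);
        }
    }
}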
Jeremy Fitzhardinge
2009-Jan-16 17:43 UTC
Re: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
James Harper wrote:
> Do you think your changes could be applicable to HVM domains with
> appropriately patched kernel spinlock routines?
>
> I had previously wondered about optimizing spinlocks. My idea was
> basically for Xen to set a bit in a structure to indicate which vcpus are
> currently scheduled, and my modified spinlock acquire routine would
> check if the current vcpu wants a spinlock that is held by a currently
> unscheduled vcpu, and if so yield to Xen to let the other vcpu schedule.
>
> The only thing I would need from Xen is to know which vcpus are
> currently scheduled; the rest would be DomU-based.

In a PV domain you can already get that information from the runstate_info
structure, which can be mapped into the domain's memory and just read
directly. I don't know if it's available to an HVM domain, but I don't think
it would be hard to implement if it isn't.

J
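For reference, the runstate area Jeremy mentions is registered and read
roughly as in the sketch below in a PV guest. VCPUOP_register_runstate_memory_area,
struct vcpu_runstate_info and RUNSTATE_running are part of Xen's public
interface; the per-vCPU array and the helper names are only illustrative,
error handling is omitted, and whether an HVM guest can use the same call
is, as said above, a separate question.

#include <linux/smp.h>
#include <asm/xen/hypercall.h>
#include <xen/interface/vcpu.h>

/* One runstate area per vCPU; Xen keeps it up to date once registered. */
static struct vcpu_runstate_info runstate[NR_CPUS];

/* Call this on each vCPU as it comes up (error handling omitted). */
static void register_runstate_area(int cpu)
{
    struct vcpu_register_runstate_memory_area area;

    set_xen_guest_handle(area.addr.h, &runstate[cpu]);
    (void)HYPERVISOR_vcpu_op(VCPUOP_register_runstate_memory_area, cpu, &area);
}

/* Is the given vCPU on a physical CPU right now? */
static int vcpu_is_running(int cpu)
{
    return runstate[cpu].state == RUNSTATE_running;
}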
George Dunlap
2009-Jan-19 17:15 UTC
Re: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
On Fri, Jan 16, 2009 at 5:41 PM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> Yes, that's more or less right. Each lock has a count of how many cpus are
> waiting for the lock; if it's non-zero on unlock, the unlocker kicks all the
> waiting cpus via IPI. There's a per-cpu variable of "lock I am waiting
> for"; the kicker looks at each cpu's entry and kicks it if it's waiting for
> the lock being unlocked.
>
> The locking side does the expected "spin for a while, then block on
> timeout". The timeout is settable if you have the appropriate debugfs
> option enabled (which also produces quite a lot of detailed stats about
> locking behaviour). The IPI is never delivered as an event, BTW; the locker
> uses the event poll hypercall to block until the event is pending (this
> hypercall had some performance problems until relatively recent versions of
> Xen; I'm not sure which release versions have the fix).
>
> The lock itself is a simple byte spinlock, with no fairness guarantees; I'm
> assuming (hoping) that the pathological cases that ticket locks were
> introduced to solve will be mitigated by the timeout/blocking path (and/or
> be less likely in a virtual environment anyway).
>
> I measured a small performance improvement within the domain with this patch
> (kernbench-type workload), but an overall 10% reduction in system-wide CPU
> use with multiple competing domains.

This is in the pv-ops kernel; is it in the Xen 2.6.18 kernel yet?

The advantage of the block approach over yielding is that you don't have
these crazy priority problems: the reason v0 (which is waiting for the
spinlock) is running right now and v1 (which holds the spinlock) is not is
usually because v1 is out of credits and v0 isn't; so calling "schedule"
often just results in v0 being chosen as the "best candidate" over again.
The solution in the patch I sent is to temporarily reduce the priority on a
yield; but that's inherently a little unpredictable. (Another option might
be to re-balance credits between vcpus on a yield.)

The disadvantage of this approach is that it is rather complicated, and
would have to be re-implemented for each OS. In theory it should be able
to be implemented in Windows, but it may not be that simple. And it's got
to be implemented all-or-nothing for each spinlock; i.e., if any caller of
spin_lock() for a given lock blocks, all callers of spin_unlock() on that
lock need to know to wake the blocker up. I don't expect that to be a
problem in Windows, but it may be.

Another thing to consider is how the approach applies to a related problem,
that of "synchronous" IPI function calls: i.e., when v0 sends an IPI to v1
to do something, and spins waiting for it to be done, expecting it to be
finished pretty quickly. But v1 is over credits, so it doesn't get to run,
and v0 burns its credits waiting.

At any rate, I'm working on the scheduler now, and I'll be considering the
"don't-deschedule" option in due time. :-)

Peace,
 -George
George Dunlap
2009-Jan-19 17:32 UTC
Re: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
On Mon, Jan 12, 2009 at 12:55 PM, Juergen Gross
<juergen.gross@fujitsu-siemens.com> wrote:
> Conclusion:
> -----------
> The differences are not really big, but my "no deschedule" patch had the
> least elapsed time for the build jobs, while scp was able to transfer the
> same amount of data as in the slower original system.
> The "Yield in spinlock" patch had slightly better dbench performance, but
> interactive shell commands were a pain sometimes! I suspect some problem in
> George's patches during low system load to be the main reason for this
> behaviour. Without George's patches, the "Yield in spinlock" variant was
> very similar to the original system.

Hmm, the shell performance is a little worrying. There may be something
strange going on...

Without my patches (at least, without the "yield reduces priority" patch),
"yield" is basically a no-op, so "yield in spinlock" is functionally
equivalent to the original system.

According to your numbers, the "user time" and "system time" were almost
exactly the same (only 0.6 seconds more system time), even though the
overall build took 52 seconds longer. Is it possible that the "yield"
patches actually made it run less often?

scp works over tcp, which is often sensitive to latency; so it's possible
that the lowered priority on yield caused "hiccoughs", both in the scp
connections and in the interactive shell performance.

Anyway, I'll be looking into it after doing a scheduler update.

Peace,
 -George
Juergen Gross
2009-Jan-20 07:56 UTC
Re: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
George Dunlap wrote:
> On Mon, Jan 12, 2009 at 12:55 PM, Juergen Gross
> <juergen.gross@fujitsu-siemens.com> wrote:
>> Conclusion:
>> -----------
>> The differences are not really big, but my "no deschedule" patch had the
>> least elapsed time for the build jobs, while scp was able to transfer the
>> same amount of data as in the slower original system.
>> The "Yield in spinlock" patch had slightly better dbench performance, but
>> interactive shell commands were a pain sometimes! I suspect some problem in
>> George's patches during low system load to be the main reason for this
>> behaviour. Without George's patches, the "Yield in spinlock" variant was
>> very similar to the original system.
>
> Hmm, the shell performance is a little worrying. There may be something
> strange going on...
>
> Without my patches (at least, without the "yield reduces priority" patch),
> "yield" is basically a no-op, so "yield in spinlock" is functionally
> equivalent to the original system.
>
> According to your numbers, the "user time" and "system time" were almost
> exactly the same (only 0.6 seconds more system time), even though the
> overall build took 52 seconds longer. Is it possible that the "yield"
> patches actually made it run less often?

I assume this could be possible.
Do you think the following is reasonable?

If a vcpu is waiting for a lock it will yield, which will reduce its
priority. This will increase the latency if other vcpus are ready to run.
Normally the vcpu waiting for a lock is in some kind of critical path, which
might be delayed significantly by the yield. In sum, the complete machine
isn't burning cycles spinning, but the code paths with lock conflicts are
the losers...

> scp works over tcp, which is often sensitive to latency; so it's possible
> that the lowered priority on yield caused "hiccoughs", both in the scp
> connections and in the interactive shell performance.

This sounds reasonable to me.

Juergen
Jeremy Fitzhardinge
2009-Jan-20 20:12 UTC
Re: [Xen-devel] [Patch 2 of 2]: PV-domain SMP performance Linux-part
George Dunlap wrote:
> On Fri, Jan 16, 2009 at 5:41 PM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>
>> Yes, that's more or less right. Each lock has a count of how many cpus are
>> waiting for the lock; if it's non-zero on unlock, the unlocker kicks all the
>> waiting cpus via IPI. There's a per-cpu variable of "lock I am waiting
>> for"; the kicker looks at each cpu's entry and kicks it if it's waiting for
>> the lock being unlocked.
>>
>> The locking side does the expected "spin for a while, then block on
>> timeout". The timeout is settable if you have the appropriate debugfs
>> option enabled (which also produces quite a lot of detailed stats about
>> locking behaviour). The IPI is never delivered as an event, BTW; the locker
>> uses the event poll hypercall to block until the event is pending (this
>> hypercall had some performance problems until relatively recent versions of
>> Xen; I'm not sure which release versions have the fix).
>>
>> The lock itself is a simple byte spinlock, with no fairness guarantees; I'm
>> assuming (hoping) that the pathological cases that ticket locks were
>> introduced to solve will be mitigated by the timeout/blocking path (and/or
>> be less likely in a virtual environment anyway).
>>
>> I measured a small performance improvement within the domain with this patch
>> (kernbench-type workload), but an overall 10% reduction in system-wide CPU
>> use with multiple competing domains.
>
> This is in the pv-ops kernel; is it in the Xen 2.6.18 kernel yet?

Yes. No plans to backport.

> Another thing to consider is how the approach applies to a related problem,
> that of "synchronous" IPI function calls: i.e., when v0 sends an IPI to v1
> to do something, and spins waiting for it to be done, expecting it to be
> finished pretty quickly. But v1 is over credits, so it doesn't get to run,
> and v0 burns its credits waiting.

Yes. Some kind of direct yield might work in that case. In practice it
hasn't been a huge problem in Linux, because most synchronous IPIs are for
cross-cpu TLB flushes, which we use a hypercall for anyway.

J