Igor Pavlikevich
2013-Feb-25  08:20 UTC
[PATCH] credit: fix race condition in csched_vcpu flags
When vcpu stops yielding, it''s possible to overwrite
CSCHED_FLAG_VCPU_PARKED flag after clearing CSCHED_FLAG_VCPU_YIELD.
 332.970218393 -------x d1v1 runstate_continue d1v1 running->running
]332.970248371 -------x d1v1   28005(2:8:5) 2 [ 1 1 ]                  yielding,
flags=0x0002
]332.970251968 -------x d1v1                                           clearing
0x0002, flags=0
 332.970252175 -------x d1v1 runstate_continue d1v1 running->running
]332.970282574 -------x d1v1   28005(2:8:5) 2 [ 1 1 ]                  yielding,
flags=0x0002
]332.970283068 x------| d32767v0                                       entering
csched_acct(), d1v1 flags=0x0002
]332.970285763 x------| d32767v0   28003(2:8:3) 2 [ 1 1 ]              calling
vcpu_pause_nosync() for d1v1
]332.970286647 -------x d1v1                                           clearing
0x0002, flags=0
]332.970286771 x------| d32767v0                                       flags |=
CSCHED_FLAG_VCPU_PARKED, flags=0x0003
]332.970286990 -------x d1v1   2800e(2:8:e) 2 [ 1 1cacfca ]
]332.970287122 -------x d1v1   2800f(2:8:f) 3 [ 7fff 1cacfca ffffffff ]
]332.970287242 -------x d1v1   2800a(2:8:a) 4 [ 1 1 7fff 7 ]           exiting
schedule(), flags=0
 332.970287417 -------x d1v1 runstate_change d1v1 running->offline
 332.970287629 -------x d?v? runstate_change d32767v7 runnable->running
]332.970288397 x------- d32767v0                                       exiting
csched_acct(), flags=0x0003
]332.995349690 x------- d32767v0                                       entering
csched_acct(), d1v1 flags=0x0000, flag lost.
]332.995352170 x------- d32767v0   28003(2:8:3) 2 [ 1 1 ]              calling
vcpu_pause_nosync() for d1v1 again
From this moment d1v1 has extra +1 in pause_count, so it goes offline right
after credit>prv->credits_per_tslice forever.
Signed-off-by: Igor Pavlikevich <ipavlikevich@gmail.com>
diff -r 66f563be41d9 -r 4d2fb69d362b xen/common/sched_credit.c
--- a/xen/common/sched_credit.c	Tue Feb 19 10:49:53 2013 +0100
+++ b/xen/common/sched_credit.c	Mon Feb 25 07:54:02 2013 +0000
@@ -1155,7 +1155,9 @@ csched_acct(void* dummy)
                 {
                     SCHED_STAT_CRANK(vcpu_park);
                     vcpu_pause_nosync(svc->vcpu);
+                    vcpu_schedule_lock_irq(svc->vcpu);
                     svc->flags |= CSCHED_FLAG_VCPU_PARKED;
+                    vcpu_schedule_unlock_irq(svc->vcpu);
                 }
 
                 /* Lower bound on credits */
George Dunlap
2013-Feb-26  14:19 UTC
Re: [PATCH] credit: fix race condition in csched_vcpu flags
On Mon, Feb 25, 2013 at 8:20 AM, Igor Pavlikevich <ipavlikevich@gmail.com>wrote:> When vcpu stops yielding, it''s possible to overwrite > CSCHED_FLAG_VCPU_PARKED flag after clearing CSCHED_FLAG_VCPU_YIELD. > > 332.970218393 -------x d1v1 runstate_continue d1v1 running->running > ]332.970248371 -------x d1v1 28005(2:8:5) 2 [ 1 1 ] > yielding, flags=0x0002 > ]332.970251968 -------x d1v1 > clearing 0x0002, flags=0 > 332.970252175 -------x d1v1 runstate_continue d1v1 running->running > ]332.970282574 -------x d1v1 28005(2:8:5) 2 [ 1 1 ] > yielding, flags=0x0002 > ]332.970283068 x------| d32767v0 > entering csched_acct(), d1v1 flags=0x0002 > ]332.970285763 x------| d32767v0 28003(2:8:3) 2 [ 1 1 ] > calling vcpu_pause_nosync() for d1v1 > ]332.970286647 -------x d1v1 > clearing 0x0002, flags=0 > ]332.970286771 x------| d32767v0 > flags |= CSCHED_FLAG_VCPU_PARKED, flags=0x0003 > ]332.970286990 -------x d1v1 2800e(2:8:e) 2 [ 1 1cacfca ] > ]332.970287122 -------x d1v1 2800f(2:8:f) 3 [ 7fff 1cacfca ffffffff ] > ]332.970287242 -------x d1v1 2800a(2:8:a) 4 [ 1 1 7fff 7 ] > exiting schedule(), flags=0 > 332.970287417 -------x d1v1 runstate_change d1v1 running->offline > 332.970287629 -------x d?v? runstate_change d32767v7 runnable->running > ]332.970288397 x------- d32767v0 > exiting csched_acct(), flags=0x0003 > ]332.995349690 x------- d32767v0 > entering csched_acct(), d1v1 flags=0x0000, flag lost. > ]332.995352170 x------- d32767v0 28003(2:8:3) 2 [ 1 1 ] > calling vcpu_pause_nosync() for d1v1 again > > From this moment d1v1 has extra +1 in pause_count, so it goes offline > right after credit>prv->credits_per_tslice forever. >Good catch -- I guess we don''t really test the scheduler "cap" functionality enough to catch this one. Unfortunately I think the real problem is that the ->flags field should be uniformly using the atomic bit operations. Can you try the attached patch and see if it works for you? -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Igor Pavlikevich
2013-Feb-26  18:34 UTC
Re: [PATCH] credit: fix race condition in csched_vcpu flags
On Tue, Feb 26, 2013 at 6:19 PM, George Dunlap <George.Dunlap@eu.citrix.com>wrote:> On Mon, Feb 25, 2013 at 8:20 AM, Igor Pavlikevich <ipavlikevich@gmail.com>wrote: > >> When vcpu stops yielding, it''s possible to overwrite >> CSCHED_FLAG_VCPU_PARKED flag after clearing CSCHED_FLAG_VCPU_YIELD. >> >> 332.970218393 -------x d1v1 runstate_continue d1v1 running->running >> ]332.970248371 -------x d1v1 28005(2:8:5) 2 [ 1 1 ] >> yielding, flags=0x0002 >> ]332.970251968 -------x d1v1 >> clearing 0x0002, flags=0 >> 332.970252175 -------x d1v1 runstate_continue d1v1 running->running >> ]332.970282574 -------x d1v1 28005(2:8:5) 2 [ 1 1 ] >> yielding, flags=0x0002 >> ]332.970283068 x------| d32767v0 >> entering csched_acct(), d1v1 flags=0x0002 >> ]332.970285763 x------| d32767v0 28003(2:8:3) 2 [ 1 1 ] >> calling vcpu_pause_nosync() for d1v1 >> ]332.970286647 -------x d1v1 >> clearing 0x0002, flags=0 >> ]332.970286771 x------| d32767v0 >> flags |= CSCHED_FLAG_VCPU_PARKED, flags=0x0003 >> ]332.970286990 -------x d1v1 2800e(2:8:e) 2 [ 1 1cacfca ] >> ]332.970287122 -------x d1v1 2800f(2:8:f) 3 [ 7fff 1cacfca ffffffff ] >> ]332.970287242 -------x d1v1 2800a(2:8:a) 4 [ 1 1 7fff 7 ] >> exiting schedule(), flags=0 >> 332.970287417 -------x d1v1 runstate_change d1v1 running->offline >> 332.970287629 -------x d?v? runstate_change d32767v7 runnable->running >> ]332.970288397 x------- d32767v0 >> exiting csched_acct(), flags=0x0003 >> ]332.995349690 x------- d32767v0 >> entering csched_acct(), d1v1 flags=0x0000, flag lost. >> ]332.995352170 x------- d32767v0 28003(2:8:3) 2 [ 1 1 ] >> calling vcpu_pause_nosync() for d1v1 again >> >> From this moment d1v1 has extra +1 in pause_count, so it goes offline >> right after credit>prv->credits_per_tslice forever. >> > > Good catch -- I guess we don''t really test the scheduler "cap" > functionality enough to catch this one. > > Unfortunately I think the real problem is that the ->flags field should be > uniformly using the atomic bit operations. > > Can you try the attached patch and see if it works for you? > >Thanks for help, your patch works well. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel