Marek Marczykowski
2011-Jul-06 18:57 UTC
[Xen-devel] locking mechanism in hotplug scripts not working
Hello,
I''ve found that locking isn''t working as it should... It
allows many
processes to claim lock simultaneously:
Full _claim_lock and description of race:
48 _claim_lock()
49 {
50 local lockdir="$1"
51 local owner=$(_lock_owner "$lockdir")
52 local retries=0
53
54 while [ $retries -lt $LOCK_RETRIES ]
55 do
56 mkdir "$lockdir" 2>/dev/null && trap
"_release_lock $lockdir;
sigerr" ERR &&
57 _update_lock_info "$lockdir" && return
58
59 local new_owner=$(_lock_owner "$lockdir")
60 if [ "$new_owner" != "$owner" ]
61 then
62 owner="$new_owner"
63 retries=0
64 else
65 local pid=$(echo $owner | cut -d : -f 1)
66 if [ -n "$pid" -a "$pid" != "unknown" -a
! -f
"/proc/$pid/status" ]
67 then
68 _release_lock $lockdir
69 fi
70 fi
71
72 if [ $retries -gt $LOCK_SPINNING_RETRIES ]
73 then
74 sleep $LOCK_SLEEPTIME
75 else
76 sleep 0
77 fi
78 retries=$(($retries + 1))
79 done
80 _steal_lock "$lockdir"
81 }
The race is:
P1: claim_lock - success
P2: claim_lock - read lock owner ("owner" file in lock dir) - l51
P1: done things and release_lock
P1: exit
P3: claim_lock - success
P2: notice that P1 is dead (read previously) -> release_lock l68 (!!!)
P2: claim_lock l56 - success
Both P2 and P3 in critical section
I don''t have idea how to fix it in current shape. Some workaround is to
remove lines 64-69... Perhaps proper fix is to use flock(1) utility, but
this will may be less portable...
--
Pozdrawiam / Best Regards,
Marek Marczykowski | RLU #390519
marmarek at mimuw edu pl | xmpp:marmarek at staszic waw pl
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Ian Campbell
2011-Jul-07 08:16 UTC
Re: [Xen-devel] locking mechanism in hotplug scripts not working
On Wed, 2011-07-06 at 19:57 +0100, Marek Marczykowski wrote:> Hello, > > I''ve found that locking isn''t working as it should... It allows many > processes to claim lock simultaneously:[...]> The race is: > P1: claim_lock - success > P2: claim_lock - read lock owner ("owner" file in lock dir) - l51 > P1: done things and release_lock > P1: exit > P3: claim_lock - success > P2: notice that P1 is dead (read previously) -> release_lock l68 (!!!) > P2: claim_lock l56 - success > > Both P2 and P3 in critical sectionUrk, yes, I think you are correct about this.> I don''t have idea how to fix it in current shape.Me neither.> Some workaround is to remove lines 64-69...I don''t think that would be all that bad. It seems like this is trying to handle the case where a script exits without unlocking but we have a trap on exit handler for that. If the trap handler isn''t working then either (a) the hotplug script has added another trap handler during the critical section overwriting this one IMHO this is buggy if it doesn''t also release the lock or (b) there is a bug of some sort in the shell implementation itself. We should fix cases of (a) and ignore cases of (b) since it indicates a more fundamental problem is at work.> Perhaps proper fix is to use flock(1) utility, but > this will may be less portable...The scripts in question live under tools/hotplug/Linux, which suggests they are at least somewhat Linux specific, where I think it is reasonable to rely on flock(1) being available (it comes from util-linux which is widespread in the Linux world). Looking at flock(1) it seems that using it would involve quite a bit of restructuring of the callers (since we''d likely be using the 3rd form shown in the manpage). Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel