Marek Marczykowski
2011-Jul-06 18:57 UTC
[Xen-devel] locking mechanism in hotplug scripts not working
Hello, I''ve found that locking isn''t working as it should... It allows many processes to claim lock simultaneously: Full _claim_lock and description of race: 48 _claim_lock() 49 { 50 local lockdir="$1" 51 local owner=$(_lock_owner "$lockdir") 52 local retries=0 53 54 while [ $retries -lt $LOCK_RETRIES ] 55 do 56 mkdir "$lockdir" 2>/dev/null && trap "_release_lock $lockdir; sigerr" ERR && 57 _update_lock_info "$lockdir" && return 58 59 local new_owner=$(_lock_owner "$lockdir") 60 if [ "$new_owner" != "$owner" ] 61 then 62 owner="$new_owner" 63 retries=0 64 else 65 local pid=$(echo $owner | cut -d : -f 1) 66 if [ -n "$pid" -a "$pid" != "unknown" -a ! -f "/proc/$pid/status" ] 67 then 68 _release_lock $lockdir 69 fi 70 fi 71 72 if [ $retries -gt $LOCK_SPINNING_RETRIES ] 73 then 74 sleep $LOCK_SLEEPTIME 75 else 76 sleep 0 77 fi 78 retries=$(($retries + 1)) 79 done 80 _steal_lock "$lockdir" 81 } The race is: P1: claim_lock - success P2: claim_lock - read lock owner ("owner" file in lock dir) - l51 P1: done things and release_lock P1: exit P3: claim_lock - success P2: notice that P1 is dead (read previously) -> release_lock l68 (!!!) P2: claim_lock l56 - success Both P2 and P3 in critical section I don''t have idea how to fix it in current shape. Some workaround is to remove lines 64-69... Perhaps proper fix is to use flock(1) utility, but this will may be less portable... -- Pozdrawiam / Best Regards, Marek Marczykowski | RLU #390519 marmarek at mimuw edu pl | xmpp:marmarek at staszic waw pl _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Jul-07 08:16 UTC
Re: [Xen-devel] locking mechanism in hotplug scripts not working
On Wed, 2011-07-06 at 19:57 +0100, Marek Marczykowski wrote:> Hello, > > I''ve found that locking isn''t working as it should... It allows many > processes to claim lock simultaneously:[...]> The race is: > P1: claim_lock - success > P2: claim_lock - read lock owner ("owner" file in lock dir) - l51 > P1: done things and release_lock > P1: exit > P3: claim_lock - success > P2: notice that P1 is dead (read previously) -> release_lock l68 (!!!) > P2: claim_lock l56 - success > > Both P2 and P3 in critical sectionUrk, yes, I think you are correct about this.> I don''t have idea how to fix it in current shape.Me neither.> Some workaround is to remove lines 64-69...I don''t think that would be all that bad. It seems like this is trying to handle the case where a script exits without unlocking but we have a trap on exit handler for that. If the trap handler isn''t working then either (a) the hotplug script has added another trap handler during the critical section overwriting this one IMHO this is buggy if it doesn''t also release the lock or (b) there is a bug of some sort in the shell implementation itself. We should fix cases of (a) and ignore cases of (b) since it indicates a more fundamental problem is at work.> Perhaps proper fix is to use flock(1) utility, but > this will may be less portable...The scripts in question live under tools/hotplug/Linux, which suggests they are at least somewhat Linux specific, where I think it is reasonable to rely on flock(1) being available (it comes from util-linux which is widespread in the Linux world). Looking at flock(1) it seems that using it would involve quite a bit of restructuring of the callers (since we''d likely be using the 3rd form shown in the manpage). Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel