Chun Yan Liu
2010-Nov-25 03:10 UTC
[Xen-devel][PATCH]improve suspend_evtchn lock processing
While doing migration, sometimes found suspend lock file was not unlinked in previous operations, then there is an obsolete lock file in place, which causes the current and later migration cannot get lock. That happens seldomly but do happen. After checking the source code, I found there are some places that potentially cause lock file unlinked, including: 1) in lock_suspend_event() function, when write_exact fails, it doesn''t remove the lock file and later there is no chance to remove it. 2) in xc_suspend_evtchn_init() function, in "cleanup" block, I can''t see any reason for checking if(suspend_evtchn != -1), if suspend_evtchn=-1, it doesn''t remove the lock file and later there is no chance to remove it. These places have been modified in the following patch, and, in addition, a function clean_obsolete_lock() is added to clean an invalid lock file which is unlinked from previous operations for any kind of reason (e.g., kill the process). Any comments? Signed-off-by Chunyan Liu diff -r 3c4c3d48a835 tools/libxc/xc_suspend.c --- a/tools/libxc/xc_suspend.c Thu Aug 26 11:16:56 2010 +0100 +++ b/tools/libxc/xc_suspend.c Thu Nov 25 18:44:35 2010 +0800 @@ -16,8 +16,40 @@ #include "xc_private.h" #include "xenguest.h" +#include #define SUSPEND_LOCK_FILE "/var/lib/xen/suspend_evtchn" +/* cleanup obsolete suspend lock file which is unlinked for any reason, +so that current process can get lock */ +static void clean_obsolete_lock(int domid) +{ + int fd, pid, n; + char buf[128]; + char suspend_file[256]; + + snprintf(suspend_file, sizeof(suspend_file), "%s_%d_lock.d", + SUSPEND_LOCK_FILE, domid); + fd = open(suspend_file, O_RDWR); + + if (fd < 0) + return; + + n = read(fd, buf, 127); + + close(fd); + + if (n > 0) + { + sscanf(buf, "%d", &pid); + /* pid does not exist, this lock file is obsolete, just delete it */ + if ( kill(pid,0) ) + { + unlink(suspend_file); + return; + } + } +} + static int lock_suspend_event(xc_interface *xch, int domid) { int fd, rc; @@ -27,6 +59,7 @@ snprintf(suspend_file, sizeof(suspend_file), "%s_%d_lock.d", SUSPEND_LOCK_FILE, domid); + clean_obsolete_lock(domid); mask = umask(022); fd = open(suspend_file, O_CREAT | O_EXCL | O_RDWR, 0666); if (fd < 0) @@ -41,6 +74,9 @@ rc = write_exact(fd, buf, strlen(buf)); close(fd); + if(rc) + unlink(suspend_file); + return rc; } @@ -127,8 +163,7 @@ return suspend_evtchn; cleanup: - if (suspend_evtchn != -1) - xc_suspend_evtchn_release(xch, xce, domid, suspend_evtchn); + xc_suspend_evtchn_release(xch, xce, domid, suspend_evtchn); return -1; } _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Jackson
2010-Nov-25 20:03 UTC
Re: [Xen-devel][PATCH]improve suspend_evtchn lock processing
Chun Yan Liu writes ("[Xen-devel][PATCH]improve suspend_evtchn lock processing"):> While doing migration, sometimes found suspend lock file was not > unlinked in previous operations, then there is an obsolete lock file > in place, which causes the current and later migration cannot get > lock. That happens seldomly but do happen.I think we should be using fcntl or flock, not an O_EXCL lockfile. This kind of messing about with pids reduces the proportion of situations where things go wrong, but it cannot be made race-free. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Chun Yan Liu
2010-Nov-26 03:43 UTC
Re: [Xen-devel][PATCH]improve suspend_evtchn lock processing
Yes. Using fctrl or flock instead can avoid the problem, better than an O_EXCL lock file. I made a patch to use flock instead, please have a review: diff -r 3c4c3d48a835 tools/libxc/xc_suspend.c --- a/tools/libxc/xc_suspend.c Thu Aug 26 11:16:56 2010 +0100 +++ b/tools/libxc/xc_suspend.c Fri Nov 26 19:41:23 2010 +0800 @@ -16,19 +16,19 @@ #include "xc_private.h" #include "xenguest.h" +#include #define SUSPEND_LOCK_FILE "/var/lib/xen/suspend_evtchn" static int lock_suspend_event(xc_interface *xch, int domid) { int fd, rc; mode_t mask; - char buf[128]; char suspend_file[256]; snprintf(suspend_file, sizeof(suspend_file), "%s_%d_lock.d", SUSPEND_LOCK_FILE, domid); mask = umask(022); - fd = open(suspend_file, O_CREAT | O_EXCL | O_RDWR, 0666); + fd = open(suspend_file, O_CREAT | O_RDWR, 0666); if (fd < 0) { ERROR("Can''t create lock file for suspend event channel %s\n", @@ -36,9 +36,7 @@ return -EINVAL; } umask(mask); - snprintf(buf, sizeof(buf), "%10ld", (long)getpid()); - - rc = write_exact(fd, buf, strlen(buf)); + rc = flock(fd, LOCK_EX | LOCK_NB); close(fd); return rc; @@ -46,8 +44,7 @@ static int unlock_suspend_event(xc_interface *xch, int domid) { - int fd, pid, n; - char buf[128]; + int fd, rc; char suspend_file[256]; snprintf(suspend_file, sizeof(suspend_file), "%s_%d_lock.d", @@ -57,22 +54,13 @@ if (fd < 0) return -EINVAL; - n = read(fd, buf, 127); - + rc = flock(fd, LOCK_UN | LOCK_NB); close(fd); - if (n > 0) - { - sscanf(buf, "%d", &pid); - /* We are the owner, so we can simply delete the file */ - if (pid == getpid()) - { - unlink(suspend_file); - return 0; - } - } + if(!rc) + unlink(suspend_file); - return -EPERM; + return rc; } int xc_await_suspend(xc_interface *xch, int xce, int suspend_evtchn) @@ -127,8 +115,7 @@ return suspend_evtchn; cleanup: - if (suspend_evtchn != -1) - xc_suspend_evtchn_release(xch, xce, domid, suspend_evtchn); + xc_suspend_evtchn_release(xch, xce, domid, suspend_evtchn); return -1; }>>> Ian Jackson 11/25/10 3:03 PM >>>Chun Yan Liu writes ("[Xen-devel][PATCH]improve suspend_evtchn lock processing"):> While doing migration, sometimes found suspend lock file was not > unlinked in previous operations, then there is an obsolete lock file > in place, which causes the current and later migration cannot get > lock. That happens seldomly but do happen.I think we should be using fcntl or flock, not an O_EXCL lockfile. This kind of messing about with pids reduces the proportion of situations where things go wrong, but it cannot be made race-free. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Jackson
2010-Dec-14 18:46 UTC
Re: [Xen-devel][PATCH]improve suspend_evtchn lock processing
Chun Yan Liu writes ("Re: [Xen-devel][PATCH]improve suspend_evtchn lock processing"):> Yes. Using fctrl or flock instead can avoid the problem, better than an O_EXCL lock file. > I made a patch to use flock instead, please have a review:Thanks for this patch and sorry for the delay replying. I have just a few comments:> diff -r 3c4c3d48a835 tools/libxc/xc_suspend.c > --- a/tools/libxc/xc_suspend.c Thu Aug 26 11:16:56 2010 +0100 > +++ b/tools/libxc/xc_suspend.c Fri Nov 26 19:41:23 2010 +0800 > @@ -16,19 +16,19 @@...> +#includeThat looks like a mistake.> @@ -127,8 +115,7 @@ > return suspend_evtchn; > > cleanup: > - if (suspend_evtchn != -1) > - xc_suspend_evtchn_release(xch, xce, domid, suspend_evtchn); > + xc_suspend_evtchn_release(xch, xce, domid, suspend_evtchn); > > return -1; > }What is this change for ? Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Chun Yan Liu
2010-Dec-22 06:32 UTC
答复: Re: [Xen-devel][PATCH]improve suspend_evtchn lock processing
Sorry for delay, following is an explanation:>> --- a/tools/libxc/xc_suspend.c Thu Aug 26 11:16:56 2010 +0100 >> +++ b/tools/libxc/xc_suspend.c Fri Nov 26 19:41:23 2010 +0800 >> @@ -16,19 +16,19 @@ >... >> +#include>That looks like a mistake.I wonder if you think the header file has something wrong? flock is defined in <sys/file.h>. To use it, need to include that header file.>> @@ -127,8 +115,7 @@ >> return suspend_evtchn; >> >> cleanup: >> - if (suspend_evtchn != -1) >> - xc_suspend_evtchn_release(xch, xce, domid, suspend_evtchn); >> + xc_suspend_evtchn_release(xch, xce, domid, suspend_evtchn); >> >> return -1; >> }>What is this change for ?if (suspend_evtchn != -1) part is unnecessary and might leave an orphan lock file. Only suspend_evtchn > 0 means we got the lock successfully, all other values means we failed to bind suspend event channel, so we should do cleanup work, including: the lock file we created before should be removed. Otherwise, in xc_save, if suspend_evtchn = -1, there will be no chance to remove the lock file. I noticed this part was added by Jiang, Yunhong, I''ve talked with him before, he also thought that the if (suspend_evtchn != -1) part should be removed altogether. Thanks. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Maybe Matching Threads
- [PATCH] Janitorial work on xc_save.c
- [PATCH 0 of 4] Support for VM generation ID save/restore and migrate
- [PATCH 0 of 3] Support for VM generation ID save/restore and migrate
- [PATCH 0 of 2] Support for VM generation ID save/restore and migrate
- Manipulating Arch specific code generator state