Hi all, I think this is a serious bug, existing in the pvops 2.6.31.13 (also existing in linux 2.6.35); In the xenbus driver (drivers/xen/xenfs/xenbus.c), the function of xenbus_file_read has a section of source code like this: if (ret != sz) { if (i == 0) i = -EFAULT; goto out; } /* Clear out buffer if it has been consumed */ if (rb->cons == rb->len) { list_del(&rb->list); kfree(rb); if (list_empty(&u->read_buffers)) break; rb = list_entry(u->read_buffers.next, struct read_buffer, list); } It should be like this: // if (ret != sz) { if (ret != 0) { if (i == 0) i = -EFAULT; goto out; } This bug makes the read_buffer not be cleared most of the time. If the xenstore client uses PTHREAD to create a thread to receive reply message, the problem will incur. The new thread can not read what it wants to read, since the list is not empty. I found this problem from the xenstore client xs_watch function. xs_watch creates the new thread on demand. So I recommend that in the function of read_message(xen/tools/xenstore/xs.c), if using thread to receive message, in the case of read fault, it should signal to the listener and print out the error. Jun Zhu Citrix Systems UK _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 08/25/2010 11:57 AM, Jun Zhu (Intern) wrote:> Hi all, > > I think this is a serious bug, existing in the pvops 2.6.31.13 (also existing in linux 2.6.35); > > In the xenbus driver (drivers/xen/xenfs/xenbus.c), the function of xenbus_file_read has a section of source code like this: > if (ret != sz) { > if (i == 0) > i = -EFAULT; > goto out; > } > /* Clear out buffer if it has been consumed */ > if (rb->cons == rb->len) { > list_del(&rb->list); > kfree(rb); > if (list_empty(&u->read_buffers)) > break; > rb = list_entry(u->read_buffers.next, > struct read_buffer, list); > } > It should be like this: > // if (ret != sz) { > if (ret != 0) { > if (i == 0) > i = -EFAULT; > goto out; > } > This bug makes the read_buffer not be cleared most of the time. If the xenstore client uses PTHREAD to create a thread to receive reply message, the problem will incur. The new thread can not read what it wants to read, since the list is not empty. > > I found this problem from the xenstore client xs_watch function. xs_watch creates the new thread on demand. So I recommend that in the function of read_message(xen/tools/xenstore/xs.c), if using thread to receive message, in the case of read fault, it should signal to the listener and print out the error.I don''t follow your description of the problem. The full context of the code in question is this: ret = copy_to_user(ubuf + i, &rb->msg[rb->cons], sz); i += sz - ret; rb->cons += sz - ret; if (ret != sz) { if (i == 0) i = -EFAULT; goto out; } copy_to_user returns the number of uncopied bytes, so if ret != sz, then some part of the data was not copied due to a memory fault problem. Changing the test to "ret != 0" means that it will return a partial result (at the end of a page, for example) without reporting an error, and discarding the uncopied data. If you''re hitting the "ret != sz" case often, then I think there''s some problem with the memory you''re passing into read() (either the buffer itself, or the size argument). Thanks, J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 08/25/2010 12:11 PM, Jeremy Fitzhardinge wrote:> copy_to_user returns the number of uncopied bytes, so if ret != sz, then > some part of the data was not copied due to a memory fault problem.No, wait, I''ve contradicted myself. copy_to_user does return the number of uncopied bytes, so the result should be 0 on success, and you''re right, the test should be "ret != 0". Sorry about that. J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 08/25/2010 11:57 AM, Jun Zhu (Intern) wrote:> This bug makes the read_buffer not be cleared most of the time. If the xenstore client uses PTHREAD to create a thread to receive reply message, the problem will incur. The new thread can not read what it wants to read, since the list is not empty.This still puzzles me. The bug you found means that partially-successful reads will not be reported correctly, but completely successful and completely unsuccessful reads should be OK. So if you''re seeing a problem as a result, it still indicates there''s something wrong with what you''re passing to read(). J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Hi, Sorry for my description. "Does not report correctly" is not the problem of xenbus read. I mean that in the xenstore/xs.c code, when using another thread (#ifdef USE_PTHREAD) to receive responses from xenbus, if this thread exits incorrectly, the listener should be noticed, or the listener will wait for response for ever. Jun Zhu Citrix Systems UK ________________________________________ From: Jeremy Fitzhardinge [jeremy@goop.org] Sent: Wednesday, August 25, 2010 3:19 PM To: Jun Zhu (Intern) Cc: xen-devel@lists.xensource.com Subject: Re: [Xen-devel] A bug in Xenbus driver On 08/25/2010 11:57 AM, Jun Zhu (Intern) wrote:> This bug makes the read_buffer not be cleared most of the time. If the xenstore client uses PTHREAD to create a thread to receive reply message, the problem will incur. The new thread can not read what it wants to read, since the list is not empty.This still puzzles me. The bug you found means that partially-successful reads will not be reported correctly, but completely successful and completely unsuccessful reads should be OK. So if you''re seeing a problem as a result, it still indicates there''s something wrong with what you''re passing to read(). J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 08/26/2010 01:25 AM, Jun Zhu (Intern) wrote:> Sorry for my description. "Does not report correctly" is not the problem of xenbus read. > I mean that in the xenstore/xs.c code, when using another thread (#ifdef USE_PTHREAD) to receive responses from xenbus, if this thread exits incorrectly, the listener should be noticed, or the listener will wait for response for ever.How does that relate to your proposed fix? J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Hi Besides xenbus driver bug, the other problem is that there will be a deadlock if the read_thread exits! Now I have recognized where the deadlock comes if the child thread (read_thread in xs.c) exists. After the child thread dies, the main thread (read_reply in xs.c) will go on. So in the following invocation of read_reply (no reading thread exists now), read_reply will use read_message(h) == -1 to read the reply. If the reply is XS_WATCH_EVENT type, the read_message function returns zero but leaves the reply_list empty, so read_reply goes to condvar_wait(&h->reply_condvar, &h->reply_mutex, h) since the reply_list is empty. The problem incurs here. Since condvar_wait is a different definition when using PTHREAD to read reply message, it will just wait on this signal, do nothing. This is one possibility of deadlocks. You can replay this bug by commenting read_message function in the read_thread, just letting the read_thread die directly. In this case, the reading thread does nothing. I am not sure I have a good solution to fix this problem now. Maybe we can let the read_message function returns a different value when the reply is XS_WATCH_EVENT, and the read_reply function checks. If it is XS_WATCH_EVENT type, call the read_message again. Jun Zhu Citrix Systems UK ________________________________________ From: Jeremy Fitzhardinge [jeremy@goop.org] Sent: Thursday, August 26, 2010 12:34 PM To: Jun Zhu (Intern) Cc: xen-devel@lists.xensource.com Subject: Re: [Xen-devel] A bug in Xenbus driver On 08/26/2010 01:25 AM, Jun Zhu (Intern) wrote:> Sorry for my description. "Does not report correctly" is not the problem of xenbus read. > I mean that in the xenstore/xs.c code, when using another thread (#ifdef USE_PTHREAD) to receive responses from xenbus, if this thread exits incorrectly, the listener should be noticed, or the listener will wait for response for ever.How does that relate to your proposed fix? J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Hi, Besides xenbus driver bug, the problem is that there will be a deadlock if the read_thread exits! Now I have recognized where the deadlock comes if the child thread (read_thread in xs.c) exists. After the child thread dies, the main thread (read_reply in xs.c) will go on. So in the following invocation of read_reply (no reading thread exists now), read_reply will use read_message(h) == -1 to read the reply. If the reply is XS_WATCH_EVENT type, the read_message function returns zero but leaves the reply_list empty, so read_reply goes to condvar_wait(&h->reply_condvar, &h->reply_mutex, h) since the reply_list is empty. The problem incurs here. Since condvar_wait is a different definition when using PTHREAD to read reply message, it will just wait on this signal, do nothing. This is one possibility of deadlocks. You can replay this bug by commenting read_message function in the read_thread, just letting the read_thread die directly. In this case, the reading thread does nothing. I am not sure I have a good solution to fix this problem now. Maybe we can let the read_message function returns a different value when the reply is XS_WATCH_EVENT, and the read_reply function checks. If it is XS_WATCH_EVENT type, call the read_message again. Jun Zhu Citrix Systems UK ________________________________________ From: Jeremy Fitzhardinge [jeremy@goop.org] Sent: Thursday, August 26, 2010 12:34 PM To: Jun Zhu (Intern) Cc: xen-devel@lists.xensource.com Subject: Re: [Xen-devel] A bug in Xenbus driver On 08/26/2010 01:25 AM, Jun Zhu (Intern) wrote:> Sorry for my description. "Does not report correctly" is not the problem of xenbus read. > I mean that in the xenstore/xs.c code, when using another thread (#ifdef USE_PTHREAD) to receive responses from xenbus, if this thread exits incorrectly, the listener should be noticed, or the listener will wait for response for ever.How does that relate to your proposed fix? J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Seemingly Similar Threads
- [PATCH][RFC] permit domU userspace to watch xenstore
- [PATCH 0 of 2 V3] xs: fixes to watch thread creation
- Re: [libvirt] increase number of libvirt threads by starting tansient guest doamin - is it a bug?
- Bug#767295: Bug#767295: xl: apparent memory leak
- Re: [libvirt] increase number of libvirt threads by starting tansient guest doamin - is it a bug?