Hi!

xend causes python to segfault on startup.
The changeset in error is: 21904:6a0dd2c29999

Core was generated by `python2.5'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f7ffca981de in _malloc_prefork () from /usr/lib/libc.so.12
(gdb) bt
#0  0x00007f7ffca981de in _malloc_prefork () from /usr/lib/libc.so.12
#1  0x00007f7ffca98473 in free () from /usr/lib/libc.so.12
#2  0x00007f7ffac0db52 in _xc_clean_hcall_buf (m=<value optimized out>) at xc_private.c:226
#3  0x00007f7ffac0e955 in xc_interface_close (xch=0x7f7ff9f47000) at xc_private.c:240
#4  0x00007f7ffb0054f9 in PyXc_dealloc (self=0x7f7ffd114378) at xen/lowlevel/xc/xc.c:2967
#5  0x00007f7ffd9a0923 in PyEval_EvalFrameEx () from /usr/pkg/lib/libpython2.5.so.1.0
#6  0x00007f7ffd9a401d in PyEval_EvalFrameEx () from /usr/pkg/lib/libpython2.5.so.1.0
#7  0x00007f7ffd9a498f in PyEval_EvalCodeEx () from /usr/pkg/lib/libpython2.5.so.1.0
#8  0x00007f7ffd9a3cb9 in PyEval_EvalFrameEx () from /usr/pkg/lib/libpython2.5.so.1.0
#9  0x00007f7ffd9a401d in PyEval_EvalFrameEx () from /usr/pkg/lib/libpython2.5.so.1.0
#10 0x00007f7ffd9a498f in PyEval_EvalCodeEx () from /usr/pkg/lib/libpython2.5.so.1.0
#11 0x00007f7ffd9a49e0 in PyEval_EvalCode () from /usr/pkg/lib/libpython2.5.so.1.0
#12 0x00007f7ffd9bbf0b in run_mod () from /usr/pkg/lib/libpython2.5.so.1.0
#13 0x00007f7ffd9bbfbb in PyRun_FileExFlags () from /usr/pkg/lib/libpython2.5.so.1.0
#14 0x00007f7ffd9bd30c in PyRun_SimpleFileExFlags () from /usr/pkg/lib/libpython2.5.so.1.0
#15 0x00007f7ffd9c5816 in Py_Main () from /usr/pkg/lib/libpython2.5.so.1.0
#16 0x0000000000400884 in ___start ()
#17 0x0000000000000003 in ?? ()
#18 0x00007f7ffffffdc8 in ?? ()
#19 0x00007f7ffffffddf in ?? ()
#20 0x00007f7ffffffdfa in ?? ()
#21 0x0000000000000000 in ?? ()

Christoph

--
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

On Wed, 2010-08-04 at 14:12 +0100, Christoph Egger wrote:
> Hi!
>
> xend causes python to segfault on startup.
> The changeset in error is: 21904:6a0dd2c29999

There is a fix for this in:

[PATCH 1 of 6] xl: PCI code cleanups

However the python wrapper ought not segfault because of an unexpected
error code in either case

Stefano, please apply that, this has cropped up several times for people
now, typically as a spurious error message in xl.

On Wednesday 04 August 2010 15:27:02 Gianni Tedesco wrote:
> On Wed, 2010-08-04 at 14:12 +0100, Christoph Egger wrote:
> > Hi!
> >
> > xend causes python to segfault on startup.
> > The changeset in error is: 21904:6a0dd2c29999
>
> There is a fix for this in:
>
> [PATCH 1 of 6] xl: PCI code cleanups
>
> However the python wrapper ought not segfault because of an unexpected
> error code in either case
>
> Stefano, please apply that, this has cropped up several times for people
> now, typically as a spurious error message in xl.

How does this affect the python wrapper code?

Christoph

On Wed, 4 Aug 2010, Christoph Egger wrote:
> On Wednesday 04 August 2010 15:27:02 Gianni Tedesco wrote:
> > On Wed, 2010-08-04 at 14:12 +0100, Christoph Egger wrote:
> > > Hi!
> > >
> > > xend causes python to segfault on startup.
> > > The changeset in error is: 21904:6a0dd2c29999
> >
> > There is a fix for this in:
> >
> > [PATCH 1 of 6] xl: PCI code cleanups
> >
> > However the python wrapper ought not segfault because of an unexpected
> > error code in either case
> >
> > Stefano, please apply that, this has cropped up several times for people
> > now, typically as a spurious error message in xl.
>
> How does this affect the python wrapper code?

It doesn't, in fact:

changeset:   21907:6a0dd2c29999
parent:      21904:9f49667fec71
user:        Ian Campbell <ian.campbell@citrix.com>
date:        Fri Jul 30 16:20:48 2010 +0100
summary:     libxc: free thread specific hypercall buffer on xc_interface_close

I am going to revert this and leave it to Ian to fix it properly
(currently on vacation).

On Wed, 2010-08-04 at 14:27 +0100, Gianni Tedesco wrote:
> On Wed, 2010-08-04 at 14:12 +0100, Christoph Egger wrote:
> > Hi!
> >
> > xend causes python to segfault on startup.
> > The changeset in error is: 21904:6a0dd2c29999
>
> There is a fix for this in:
>
> [PATCH 1 of 6] xl: PCI code cleanups

My mistake, ignore the spam, local changes in what I thought was a
pristine tree. This is unrelated to the afore-mentioned bug. Of course
xend etc. is not using xl yet, so I am living in my own fantasy :)

On Wed, 2010-08-04 at 15:55 +0100, Stefano Stabellini wrote:
> > > On Wed, 2010-08-04 at 14:12 +0100, Christoph Egger wrote:
> > > > Hi!
> > > >
> > > > xend causes python to segfault on startup.
> > > > The changeset in error is: 21904:6a0dd2c29999
>
> It doesn't, in fact:
>
> changeset:   21907:6a0dd2c29999
> parent:      21904:9f49667fec71
> user:        Ian Campbell <ian.campbell@citrix.com>
> date:        Fri Jul 30 16:20:48 2010 +0100
> summary:     libxc: free thread specific hypercall buffer on
> xc_interface_close
>
> I am going to revert this and leave it to Ian to fix it properly
> (currently on vacation).

I'm currently looking at this but I'm not seeing this issue, xend starts
up fine and I can start a (PV) VM.

When you said "segfault on startup" did you mean of xend or of a domain?
(I think the former).

Can you give me a little more information about your environment please?
Is it NetBSD by any chance?

Please could you reapply this changeset and add some tracing to
hcall_buf_prep and _xc_clean_hcall_buf to print out the hcall_buf and
hcall_buf->buf as they are allocated and freed. The line numbers
indicate that the free(hcall_buf->buf) is faulting. We've just called
unlock_pages on the same address but since we seem to deliberately throw
away any errors from munlock (see "safe_munlock") that doesn't really
tell us much about its validity.

Perhaps this whole area needs looking at with an eye to NetBSD
portability?

Ian.

On Wed, 11 Aug 2010, Ian Campbell wrote:
> On Wed, 2010-08-04 at 15:55 +0100, Stefano Stabellini wrote:
> > > > On Wed, 2010-08-04 at 14:12 +0100, Christoph Egger wrote:
> > > > > Hi!
> > > > >
> > > > > xend causes python to segfault on startup.
> > > > > The changeset in error is: 21904:6a0dd2c29999
> >
> > It doesn't, in fact:
> >
> > changeset:   21907:6a0dd2c29999
> > parent:      21904:9f49667fec71
> > user:        Ian Campbell <ian.campbell@citrix.com>
> > date:        Fri Jul 30 16:20:48 2010 +0100
> > summary:     libxc: free thread specific hypercall buffer on
> > xc_interface_close
> >
> > I am going to revert this and leave it to Ian to fix it properly
> > (currently on vacation).
>
> I'm currently looking at this but I'm not seeing this issue, xend starts
> up fine and I can start a (PV) VM.
>
> When you said "segfault on startup" did you mean of xend or of a domain?
> (I think the former).
>
> Can you give me a little more information about your environment please?
> Is it NetBSD by any chance?
>
> Please could you reapply this changeset and add some tracing to
> hcall_buf_prep and _xc_clean_hcall_buf to print out the hcall_buf and
> hcall_buf->buf as they are allocated and freed. The line numbers
> indicate that the free(hcall_buf->buf) is faulting. We've just called
> unlock_pages on the same address but since we seem to deliberately throw
> away any errors from munlock (see "safe_munlock") that doesn't really
> tell us much about its validity.
>
> Perhaps this whole area needs looking at with an eye to NetBSD
> portability?

I gave Christoph a few days to reply. I'll reapply the patch for now,
but if Christoph can come up with a good explanation of the problem I'll
revert it again or fix the bug.

On Friday 13 August 2010 13:58:53 Stefano Stabellini wrote:
> On Wed, 11 Aug 2010, Ian Campbell wrote:
> > On Wed, 2010-08-04 at 15:55 +0100, Stefano Stabellini wrote:
> > > > > On Wed, 2010-08-04 at 14:12 +0100, Christoph Egger wrote:
> > > > > > Hi!
> > > > > >
> > > > > > xend causes python to segfault on startup.
> > > > > > The changeset in error is: 21904:6a0dd2c29999
> > >
> > > It doesn't, in fact:
> > >
> > > changeset:   21907:6a0dd2c29999
> > > parent:      21904:9f49667fec71
> > > user:        Ian Campbell <ian.campbell@citrix.com>
> > > date:        Fri Jul 30 16:20:48 2010 +0100
> > > summary:     libxc: free thread specific hypercall buffer on
> > > xc_interface_close
> > >
> > > I am going to revert this and leave it to Ian to fix it properly
> > > (currently on vacation).
> >
> > I'm currently looking at this but I'm not seeing this issue, xend starts
> > up fine and I can start a (PV) VM.
> >
> > When you said "segfault on startup" did you mean of xend or of a domain?
> > (I think the former).
> >
> > Can you give me a little more information about your environment please?
> > Is it NetBSD by any chance?

Yes, it is NetBSD -current.

> > Please could you reapply this changeset and add some tracing to
> > hcall_buf_prep and _xc_clean_hcall_buf to print out the hcall_buf and
> > hcall_buf->buf as they are allocated and freed. The line numbers
> > indicate that the free(hcall_buf->buf) is faulting. We've just called
> > unlock_pages on the same address but since we seem to deliberately throw
> > away any errors from munlock (see "safe_munlock") that doesn't really
> > tell us much about its validity.

Will do when I get some time.

> > Perhaps this whole area needs looking at with an eye to NetBSD
> > portability?
>
> I gave Christoph a few days to reply. I'll reapply the patch for now,
> but if Christoph can come up with a good explanation of the problem I'll
> revert it again or fix the bug.

I haven't had the opportunity to do further analysis.
I am pretty busy with nested virtualization. Sorry.
If you apply it, I'll revert it in my local tree to keep it in a working
state. (I already have several local patches to do so, i.e. I have the
blktap/noblktap changes for libxl in my tree, I need to ping Ian Jackson
for this again)

Christoph

On Wednesday 11 August 2010 15:11:54 Ian Campbell wrote:
> On Wed, 2010-08-04 at 15:55 +0100, Stefano Stabellini wrote:
> > > > On Wed, 2010-08-04 at 14:12 +0100, Christoph Egger wrote:
> > > > > Hi!
> > > > >
> > > > > xend causes python to segfault on startup.
> > > > > The changeset in error is: 21904:6a0dd2c29999
> >
> > It doesn't, in fact:
> >
> > changeset:   21907:6a0dd2c29999
> > parent:      21904:9f49667fec71
> > user:        Ian Campbell <ian.campbell@citrix.com>
> > date:        Fri Jul 30 16:20:48 2010 +0100
> > summary:     libxc: free thread specific hypercall buffer on
> > xc_interface_close
> >
> > I am going to revert this and leave it to Ian to fix it properly
> > (currently on vacation).
>
> I'm currently looking at this but I'm not seeing this issue, xend starts
> up fine and I can start a (PV) VM.
>
> When you said "segfault on startup" did you mean of xend or of a domain?
> (I think the former).
>
> Can you give me a little more information about your environment please?
> Is it NetBSD by any chance?
>
> Please could you reapply this changeset and add some tracing to
> hcall_buf_prep and _xc_clean_hcall_buf to print out the hcall_buf and
> hcall_buf->buf as they are allocated and freed. The line numbers
> indicate that the free(hcall_buf->buf) is faulting. We've just called
> unlock_pages on the same address but since we seem to deliberately throw
> away any errors from munlock (see "safe_munlock") that doesn't really
> tell us much about its validity.

I tracked down where the error happens. In safe_munlock(),
the munlock() fails.

The trace is:

xc_interface_close -> _xc_clean_hcall_buf -> unlock_pages -> safe_munlock ->
munlock

hcall_buf->buf has the address 0x7f7ffdfe7040

In unlock_pages, the address and length passed to munlock() are:

laddr 0x7f7ffdfe7000, llen 0x2000

The reason why munlock() fails is that mlock() hasn't been called before.
The hcall_buf_prep() is not called at all before the first call to
_xc_clean_hcall_buf().

Christoph

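The laddr/llen pair reported above is what one would get by rounding a
one-page buffer that starts 0x40 bytes into a page outwards to whole
pages. The following is only a sketch of that arithmetic, assuming 4 KiB
pages, a buffer length of one page, and that unlock_pages() rounds in
roughly this way; the ex_* names are hypothetical and not part of libxc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on amd64. */
#define EX_PAGE_SIZE ((uintptr_t)0x1000)
#define EX_PAGE_MASK (~(EX_PAGE_SIZE - 1))

/* Hypothetical helper type: the page-rounded range handed to munlock(). */
struct ex_range {
    uintptr_t laddr; /* start rounded down to a page boundary */
    size_t    llen;  /* length rounded up to whole pages */
};

/* Round [addr, addr + len) outwards so it covers only whole pages. */
static struct ex_range ex_page_round(uintptr_t addr, size_t len)
{
    struct ex_range r;

    assert(len > 0);
    r.laddr = addr & EX_PAGE_MASK;
    r.llen = (size_t)(((addr - r.laddr) + len + EX_PAGE_SIZE - 1)
                      & EX_PAGE_MASK);
    return r;
}
```

With addr 0x7f7ffdfe7040 and len 0x1000 this gives laddr 0x7f7ffdfe7000
and llen 0x2000, because the unaligned buffer straddles two pages. A
page-aligned buffer of the same length would give llen 0x1000.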
Thanks for the analysis. I'm a bit confused though.

On Wed, 2010-08-18 at 11:44 +0100, Christoph Egger wrote:
> I tracked down where the error happens. In safe_munlock(),
> the munlock() fails.
>
> The trace is:
>
> xc_interface_close -> _xc_clean_hcall_buf -> unlock_pages -> safe_munlock ->
> munlock
>
> hcall_buf->buf has the address 0x7f7ffdfe7040

Mustn't this be page aligned, due to
    hcall_buf->buf = xc_memalign(PAGE_SIZE, PAGE_SIZE);
?

This appears to turn into valloc on NetBSD which (at least according to
the Linux manpages) returns a page aligned result.

> In unlock_pages, the address and length passed to munlock() are:
>
> laddr 0x7f7ffdfe7000, llen 0x2000

> The reason why munlock() fails is that mlock() hasn't been called before.
> The hcall_buf_prep() is not called at all before the first call to
> _xc_clean_hcall_buf().

If hcall_buf_prep() has never been called then
"pthread_getspecific(hcall_buf_pkey)" should return NULL and
_xc_clean_hcall_buf will never be called from xc_clean_hcall_buf.
_xc_clean_hcall_buf also ignores NULL values itself.

However you say that hcall_buf_pkey is not NULL, but rather contains a
valid hcall_buf containing 0x7f7ffdfe7040. The only call to
"pthread_setspecific(hcall_buf_pkey, ...)" with a non-NULL value is in
hcall_buf_prep(), so it must have been called at some point.

Please can you confirm if _xc_init_hcall_buf() is ever called and what
the behaviour of "pthread_getspecific(hcall_buf_pkey)" is if
_xc_init_hcall_buf() has never been called. I think it is supposed to
return NULL in this case and we certainly rely on that.

pthread_getspecific(hcall_buf_pkey) is supposed to return NULL on error,
however hcall_buf_pkey is uninitialised until _xc_init_hcall_buf,
perhaps on NetBSD the uninitialised value somehow looks valid? It's not
clear what the correct value to initialise a pthread_key_t to in order
for it to appear invalid until it is properly setup is, but I suppose we
should be initialising it before use. Please can you try this patch:

diff -r 7e4d798e8726 tools/libxc/xc_private.c
--- a/tools/libxc/xc_private.c	Wed Aug 18 11:29:35 2010 +0100
+++ b/tools/libxc/xc_private.c	Wed Aug 18 13:12:41 2010 +0100
@@ -232,17 +236,19 @@ static void _xc_clean_hcall_buf(void *m)
     pthread_setspecific(hcall_buf_pkey, NULL);
 }
 
+static void _xc_init_hcall_buf(void)
+{
+    pthread_key_create(&hcall_buf_pkey, _xc_clean_hcall_buf);
+}
+
 static void xc_clean_hcall_buf(void)
 {
     void *hcall_buf = pthread_getspecific(hcall_buf_pkey);
 
+    pthread_once(&hcall_buf_pkey_once, _xc_init_hcall_buf);
+
     if (hcall_buf)
         _xc_clean_hcall_buf(hcall_buf);
-}
-
-static void _xc_init_hcall_buf(void)
-{
-    pthread_key_create(&hcall_buf_pkey, _xc_clean_hcall_buf);
 }
 
 int hcall_buf_prep(void **addr, size_t len)

If that doesn't work perhaps you can reduce the issue to a simple test
case like the attached? (which doesn't reproduce the issue for me on
Linux) If you can do that then please run it with the attached libxc
patch and post the output.

Ian.

On Wed, 2010-08-18 at 13:14 +0100, Ian Campbell wrote:
> Please can you try this patch:

Ooops, that's not quite right (it still calls pthread_getspecific before
_xc_init_hcall_buf). Please can you try this replacement for the
original patch instead:

diff -r ddbd38da0739 -r d31bfd188fd0 tools/libxc/xc_private.c
--- a/tools/libxc/xc_private.c	Wed Aug 18 13:14:57 2010 +0100
+++ b/tools/libxc/xc_private.c	Wed Aug 18 13:17:48 2010 +0100
@@ -57,6 +57,8 @@ xc_interface *xc_interface_open(xentooll
     return 0;
 }
 
+static void xc_clean_hcall_buf(void);
+
 int xc_interface_close(xc_interface *xch)
 {
     int rc = 0;
@@ -68,6 +70,9 @@ int xc_interface_close(xc_interface *xch
         rc = xc_interface_close_core(xch, xch->fd);
         if (rc) PERROR("Could not close hypervisor interface");
     }
+
+    xc_clean_hcall_buf();
+
     free(xch);
     return rc;
 }
@@ -180,6 +185,8 @@ int hcall_buf_prep(void **addr, size_t l
 int hcall_buf_prep(void **addr, size_t len) { return 0; }
 void hcall_buf_release(void **addr, size_t len) { }
 
+static void xc_clean_hcall_buf(void) { }
+
 #else /* !__sun__ */
 
 int lock_pages(void *addr, size_t len)
@@ -228,6 +235,13 @@ static void _xc_init_hcall_buf(void)
 static void _xc_init_hcall_buf(void)
 {
     pthread_key_create(&hcall_buf_pkey, _xc_clean_hcall_buf);
+}
+
+static void xc_clean_hcall_buf(void)
+{
+    pthread_once(&hcall_buf_pkey_once, _xc_init_hcall_buf);
+
+    _xc_clean_hcall_buf(pthread_getspecific(hcall_buf_pkey));
 }
 
 int hcall_buf_prep(void **addr, size_t len)

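The ordering the replacement patch relies on can be shown in isolation:
run pthread_once() before any pthread_getspecific(), so the key has
always been created by the time it is read. This is a rough standalone
sketch, not libxc code. All ex_* names are hypothetical, malloc()/free()
stand in for the real buffer allocation and page locking, and the
counter is only there to make the behaviour observable in a
single-threaded test.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t ex_key;            /* undefined until created */
static pthread_once_t ex_key_once = PTHREAD_ONCE_INIT;
static int ex_cleanups;                 /* buffers actually freed */

/* Destructor and explicit cleanup; a NULL argument is a no-op. */
static void ex_clean(void *m)
{
    if (m == NULL)
        return;
    free(m);
    ex_cleanups++;
    pthread_setspecific(ex_key, NULL);
}

static void ex_init_key(void)
{
    int rc = pthread_key_create(&ex_key, ex_clean);

    assert(rc == 0);
    (void)rc;
}

/* Return this thread's buffer, or NULL if none was ever prepared. */
static void *ex_get(void)
{
    /* Create the key first so the lookup never sees an undefined key. */
    pthread_once(&ex_key_once, ex_init_key);
    return pthread_getspecific(ex_key);
}

/* Allocate the per-thread buffer on first use. */
static int ex_prep(size_t len)
{
    void *buf;

    if (ex_get() != NULL)
        return 0;
    buf = malloc(len);
    if (buf == NULL)
        return -1;
    if (pthread_setspecific(ex_key, buf) != 0) {
        free(buf);
        return -1;
    }
    return 0;
}

/* Safe to call whether or not ex_prep() ever ran. */
static void ex_close(void)
{
    ex_clean(ex_get());
}
```

Calling ex_close() before any ex_prep() does nothing here, because the
key is created on demand and a freshly created key reads back as NULL.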
On Wednesday 18 August 2010 14:14:19 Ian Campbell wrote:
> Thanks for the analysis. I'm a bit confused though.
>
> On Wed, 2010-08-18 at 11:44 +0100, Christoph Egger wrote:
> > I tracked down where the error happens. In safe_munlock(),
> > the munlock() fails.
> >
> > The trace is:
> >
> > xc_interface_close -> _xc_clean_hcall_buf -> unlock_pages -> safe_munlock
> > -> munlock
> >
> > hcall_buf->buf has the address 0x7f7ffdfe7040
>
> Mustn't this be page aligned, due to
>     hcall_buf->buf = xc_memalign(PAGE_SIZE, PAGE_SIZE);
> ?
>
> This appears to turn into valloc on NetBSD which (at least according to
> the Linux manpages) returns a page aligned result.

Yes, correct.

> > In unlock_pages, the address and length passed to munlock() are:
> >
> > laddr 0x7f7ffdfe7000, llen 0x2000
> >
> > The reason why munlock() fails is that mlock() hasn't been called before.
> > The hcall_buf_prep() is not called at all before the first call to
> > _xc_clean_hcall_buf().
>
> If hcall_buf_prep() has never been called then
> "pthread_getspecific(hcall_buf_pkey)" should return NULL and
> _xc_clean_hcall_buf will never be called from xc_clean_hcall_buf.
> _xc_clean_hcall_buf also ignores NULL values itself.

Who calls hcall_buf_prep() in your case?

Only hypercalls call hcall_buf_prep().
What if no hypercalls are called during xend startup?

If you call xc_clean_hcall_buf() from xc_interface_close()
then you should also call hcall_buf_prep() from xc_interface_open().

> However you say that hcall_buf_pkey is not NULL, but rather contains a
> valid hcall_buf containing 0x7f7ffdfe7040.

hcall_buf itself has the address 0x7f7ffdfe7000.

hcall_buf->buf has the address 0x7f7ffdfe7040.

> The only call to "pthread_setspecific(hcall_buf_pkey, ...)" with a non-NULL
> value is in hcall_buf_prep(), so it must have been called at some point.

In that case, I am puzzled why I don't get the trace.
Something really fishy is going on.

> Please can you confirm if _xc_init_hcall_buf() is ever called and what
> the behaviour of "pthread_getspecific(hcall_buf_pkey)" is if
> _xc_init_hcall_buf() has never been called. I think it is supposed to
> return NULL in this case and we certainly rely on that.

_xc_init_hcall_buf() is not called. pthread_getspecific() should return NULL
but doesn't.

I am starting to ask myself "How did libxc ever work?". It feels like we are
hunting down a long-term hidden bug.

> pthread_getspecific(hcall_buf_pkey) is supposed to return NULL on error,
> however hcall_buf_pkey is uninitialised until _xc_init_hcall_buf,
> perhaps on NetBSD the uninitialised value somehow looks valid? It's not
> clear what the correct value to initialise a pthread_key_t to in order
> for it to appear invalid until it is properly setup is, but I suppose we
> should be initialising it before use. Please can you try this patch:

I tried the replacement patch from the other mail.
With it, xend does not crash, hcall_buf is NULL,
pthread_getspecific() returns NULL,
and I am not able to start a guest with 'xm'

Xend has probably crashed! Invalid or missing HTTP status code.

> If that doesn't work perhaps you can reduce the issue to a simple test
> case like the attached? (which doesn't reproduce the issue for me on
> Linux) If you can do that then please run it with the attached libxc
> patch and post the output.

xc_interface is 0x7f7ffdb03800
before prep buf is 0x7f7ffdb0b000 / 0x7f7ffdb0b040
after prep buf is 0x7f7ffdb0b000 / 0x7f7ffdb20000
after release buf is 0x7f7ffdb0b000 / 0x7f7ffdb0b040
xc interface close returned 0

No crash. Is this the expected output?

Christoph

On Wed, 2010-08-18 at 15:02 +0100, Christoph Egger wrote:
> On Wednesday 18 August 2010 14:14:19 Ian Campbell wrote:
> > > In unlock_pages, the address and length passed to munlock() are:
> > >
> > > laddr 0x7f7ffdfe7000, llen 0x2000
> > >
> > > The reason why munlock() fails is that mlock() hasn't been called before.
> > > The hcall_buf_prep() is not called at all before the first call to
> > > _xc_clean_hcall_buf().
> >
> > If hcall_buf_prep() has never been called then
> > "pthread_getspecific(hcall_buf_pkey)" should return NULL and
> > _xc_clean_hcall_buf will never be called from xc_clean_hcall_buf.
> > _xc_clean_hcall_buf also ignores NULL values itself.
>
> Who calls hcall_buf_prep() in your case?
>
> Only hypercalls call hcall_buf_prep().
> What if no hypercalls are called during xend startup?

Then I would have expected pthread_getspecific(hcall_buf_pkey) to return
NULL (because _xc_init_hcall_buf was never called) and therefore for
xc_clean_hcall_buf not to do any unlocking.

However I think my expectation was wrong. If _xc_init_hcall_buf is never
called then hcall_buf_pkey is undefined but not necessarily invalid --
and it seems to be the case on your system that it turns out to be valid
(perhaps pthread_key_t is valid on NetBSD and invalid on Linux or
something like that) and therefore we try to unlock some random address.

My updated patch ensured that hcall_buf_pkey is always initialised
before use.

> If you call xc_clean_hcall_buf() from xc_interface_close()
> then you should also call hcall_buf_prep() from xc_interface_open().
>
> > However you say that hcall_buf_pkey is not NULL, but rather contains a
> > valid hcall_buf containing 0x7f7ffdfe7040.
>
> hcall_buf itself has the address 0x7f7ffdfe7000.
>
> hcall_buf->buf has the address 0x7f7ffdfe7040.

That's very odd -- hcall_buf->buf is allocated with xc_memalign and
therefore should be page aligned. Are you sure the addresses aren't the
other way round?

> > The only call to "pthread_setspecific(hcall_buf_pkey, ...)" with a non-NULL
> > value is in hcall_buf_prep(), so it must have been called at some point.
>
> In that case, I am puzzled why I don't get the trace.
> Something really fishy is going on.
>
> > Please can you confirm if _xc_init_hcall_buf() is ever called and what
> > the behaviour of "pthread_getspecific(hcall_buf_pkey)" is if
> > _xc_init_hcall_buf() has never been called. I think it is supposed to
> > return NULL in this case and we certainly rely on that.
>
> _xc_init_hcall_buf() is not called. pthread_getspecific() should return NULL
> but doesn't.
>
> I am starting to ask myself "How did libxc ever work?". It feels like we are
> hunting down a long-term hidden bug.

Previously _xc_clean_hcall_buf would be called IFF hcall_buf_prep had
been called. My patch changed this to also be called on close (even if
hcall_buf_prep was never called) and could therefore access an
uninitialised hcall_buf_pkey.

I am reasonably confident that before my patch libxc was OK.

> > pthread_getspecific(hcall_buf_pkey) is supposed to return NULL on error,
> > however hcall_buf_pkey is uninitialised until _xc_init_hcall_buf,
> > perhaps on NetBSD the uninitialised value somehow looks valid? It's not
> > clear what the correct value to initialise a pthread_key_t to in order
> > for it to appear invalid until it is properly setup is, but I suppose we
> > should be initialising it before use. Please can you try this patch:
>
> I tried the replacement patch from the other mail.
> With it, xend does not crash, hcall_buf is NULL,
> pthread_getspecific() returns NULL,

OK, I think that suggests that my updated patch does the right thing
here.

> and I am not able to start a guest with 'xm'
>
> Xend has probably crashed! Invalid or missing HTTP status code.

There was another HTTP (XML/RPC) related mail on the list this morning
-- is this related to that? Are you sure it is related to the libxc
patch? (did you by any chance update to python2.7 recently?)

> > If that doesn't work perhaps you can reduce the issue to a simple test
> > case like the attached? (which doesn't reproduce the issue for me on
> > Linux) If you can do that then please run it with the attached libxc
> > patch and post the output.
>
> xc_interface is 0x7f7ffdb03800
> before prep buf is 0x7f7ffdb0b000 / 0x7f7ffdb0b040
> after prep buf is 0x7f7ffdb0b000 / 0x7f7ffdb20000
> after release buf is 0x7f7ffdb0b000 / 0x7f7ffdb0b040
> xc interface close returned 0
>
> No crash. Is this the expected output?

It looks correct but didn't reproduce the crash so is of limited
utility.

Ian.

On Wednesday 18 August 2010 16:59:30 Ian Campbell wrote:
> On Wed, 2010-08-18 at 15:02 +0100, Christoph Egger wrote:
> > On Wednesday 18 August 2010 14:14:19 Ian Campbell wrote:
> > > > In unlock_pages, the address and length passed to munlock() are:
> > > >
> > > > laddr 0x7f7ffdfe7000, llen 0x2000
> > > >
> > > > The reason why munlock() fails is that mlock() hasn't been called
> > > > before. The hcall_buf_prep() is not called at all before the first
> > > > call to _xc_clean_hcall_buf().
> > >
> > > If hcall_buf_prep() has never been called then
> > > "pthread_getspecific(hcall_buf_pkey)" should return NULL and
> > > _xc_clean_hcall_buf will never be called from xc_clean_hcall_buf.
> > > _xc_clean_hcall_buf also ignores NULL values itself.
> >
> > Who calls hcall_buf_prep() in your case?
> >
> > Only hypercalls call hcall_buf_prep().
> > What if no hypercalls are called during xend startup?
>
> Then I would have expected pthread_getspecific(hcall_buf_pkey) to return
> NULL (because _xc_init_hcall_buf was never called) and therefore for
> xc_clean_hcall_buf not to do any unlocking.
>
> However I think my expectation was wrong. If _xc_init_hcall_buf is never
> called then hcall_buf_pkey is undefined but not necessarily invalid --
> and it seems to be the case on your system that it turns out to be valid
> (perhaps pthread_key_t is valid on NetBSD and invalid on Linux or
> something like that) and therefore we try to unlock some random address.

To make it even more mysterious, the "random" address is always the same
even across machine reboots.

> My updated patch ensured that hcall_buf_pkey is always initialised
> before use.

Yes, but we also need to figure out why hcall_buf_prep is never called.
Who calls hcall_buf_prep() on your machine?
Can you provide a call trace when hcall_buf_prep() is called the first time,
please?

> > If you call xc_clean_hcall_buf() from xc_interface_close()
> > then you should also call hcall_buf_prep() from xc_interface_open().
> >
> > > However you say that hcall_buf_pkey is not NULL, but rather contains a
> > > valid hcall_buf containing 0x7f7ffdfe7040.
> >
> > hcall_buf itself has the address 0x7f7ffdfe7000.
> >
> > hcall_buf->buf has the address 0x7f7ffdfe7040.
>
> That's very odd -- hcall_buf->buf is allocated with xc_memalign and
> therefore should be page aligned. Are you sure the addresses aren't the
> other way round?

Yes, I am.

> > > The only call to "pthread_setspecific(hcall_buf_pkey, ...)" with a
> > > non-NULL value is in hcall_buf_prep(), so it must have been called at
> > > some point.
> >
> > In that case, I am puzzled why I don't get the trace.
> > Something really fishy is going on.
> >
> > > Please can you confirm if _xc_init_hcall_buf() is ever called and what
> > > the behaviour of "pthread_getspecific(hcall_buf_pkey)" is if
> > > _xc_init_hcall_buf() has never been called. I think it is supposed to
> > > return NULL in this case and we certainly rely on that.
> >
> > _xc_init_hcall_buf() is not called. pthread_getspecific() should return
> > NULL but doesn't.
> >
> > I am starting to ask myself "How did libxc ever work?". It feels like we
> > are hunting down a long-term hidden bug.
>
> Previously _xc_clean_hcall_buf would be called IFF hcall_buf_prep had
> been called. My patch changed this to also be called on close (even if
> hcall_buf_prep was never called) and could therefore access an
> uninitialised hcall_buf_pkey.

Calling _xc_clean_hcall_buf() unconditionally and hcall_buf_prep()
conditionally sounds to me like calling free() unconditionally
and malloc() conditionally.

I will give calling hcall_buf_prep() from xc_interface_open() a try
with your patch tomorrow.

> I am reasonably confident that before my patch libxc was OK.

And is ok again after it has been backed out. :)

> > > pthread_getspecific(hcall_buf_pkey) is supposed to return NULL on
> > > error, however hcall_buf_pkey is uninitialised until
> > > _xc_init_hcall_buf, perhaps on NetBSD the uninitialised value somehow
> > > looks valid? It's not clear what the correct value to initialise a
> > > pthread_key_t to in order for it to appear invalid until it is properly
> > > setup is, but I suppose we should be initialising it before use. Please
> > > can you try this patch:
> >
> > I tried the replacement patch from the other mail.
> > With it, xend does not crash, hcall_buf is NULL,
> > pthread_getspecific() returns NULL,
>
> OK, I think that suggests that my updated patch does the right thing
> here.

Is it possible that xend can call xc_interface_close() during startup
and hcall_buf_prep() later when xend interacts with xm?

> > and I am not able to start a guest with 'xm'
> >
> > Xend has probably crashed! Invalid or missing HTTP status code.
>
> There was another HTTP (XML/RPC) related mail on the list this morning

I saw this mail. No, I don't think it is related to this.

> -- is this related to that? Are you sure it is related to the libxc
> patch?

Yes.

> (did you by any chance update to python2.7 recently?)

No, I am on python 2.5.

> > > If that doesn't work perhaps you can reduce the issue to a simple test
> > > case like the attached? (which doesn't reproduce the issue for me on
> > > Linux) If you can do that then please run it with the attached libxc
> > > patch and post the output.
> >
> > xc_interface is 0x7f7ffdb03800
> > before prep buf is 0x7f7ffdb0b000 / 0x7f7ffdb0b040
> > after prep buf is 0x7f7ffdb0b000 / 0x7f7ffdb20000
> > after release buf is 0x7f7ffdb0b000 / 0x7f7ffdb0b040
> > xc interface close returned 0
> >
> > No crash. Is this the expected output?
>
> It looks correct but didn't reproduce the crash so is of limited
> utility.
>
> Ian.

Christoph

On Wed, 2010-08-18 at 17:02 +0100, Christoph Egger wrote:
> > > hcall_buf itself has the address 0x7f7ffdfe7000.
> > >
> > > hcall_buf->buf has the address 0x7f7ffdfe7040.
> >
> > That's very odd -- hcall_buf->buf is allocated with xc_memalign and
> > therefore should be page aligned. Are you sure the addresses aren't the
> > other way round?
>
> Yes, I am.

Then NetBSD has a bug where it apparently returns non-page aligned
allocations from valloc.

> Calling _xc_clean_hcall_buf() unconditionally and hcall_buf_prep()
> conditionally sounds to me like calling free() unconditionally
> and malloc() conditionally.

Not quite because (in the updated patch) _xc_clean_hcall_buf()
effectively contains a check to see if hcall_buf_prep() has been called.
IOW it is OK for the same reason free(NULL) is OK.

> Is it possible that xend can call xc_interface_close() during startup
> and hcall_buf_prep() later when xend interacts with xm?

Yes, but this is safe.

Ian.
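
The alignment question above could be checked outside libxc with a few
lines of C. This is only a sketch: posix_memalign() is used here as a
stand-in for whatever xc_memalign() ends up calling (valloc, per the
thread), and the ex_* names are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif

#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Non-zero if p lies on a page_size boundary (page_size: power of two). */
static int ex_is_page_aligned(const void *p, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return ((uintptr_t)p & (page_size - 1)) == 0;
}

/* Allocate one page, page aligned; returns NULL on failure. */
static void *ex_alloc_page(void)
{
    long ps = sysconf(_SC_PAGESIZE);
    void *p = NULL;

    if (ps <= 0)
        return NULL;
    if (posix_memalign(&p, (size_t)ps, (size_t)ps) != 0)
        return NULL;
    return p;
}
```

Printing ex_is_page_aligned() for hcall_buf->buf right after the
allocation would show whether the allocator or something later is
responsible for the 0x40 offset reported earlier in the thread.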