Hi,

I'm working on NetBSD support in pygrub: I've posted my work on the
NetBSD port-xen list [1]. It adds support for FFSv2 to tools/libfsimage
and NetBSD partition/disklabel support to tools/pygrub.
I started off using xen-unstable and the xl tools on a kernel 3.2, but
I'm now on xen-4.3 and kernel 3.10.

Retrieving the kernel and arguments appears to work, but the boot
process waits for a console escape character before continuing:

dom0# pygrub --debug --output-directory=/tmp /dev/mapper/vg00-ffsv2
linux (kernel /tmp/boot_kernel.NR37Lm)(args "root=xbd0 ")
dom0# md5sum /tmp/boot_kernel.NR37Lm
9e492b31e18f7e816f3e374d7f365857 /tmp/boot_kernel.NR37Lm
dom0# $XEN/tools/xcutils/readnotes /tmp/boot_kernel.NR37Lm
__xen_guest: GUEST_OS=NetBSD,GUEST_VER=4.99,XEN_VER=xen-3.0,LOADER=generic,
VIRT_BASE=0xffffffff80000000,ELF_PADDR_OFFSET=0xffffffff80000000,
VIRT_ENTRY=0xffffffff80100000,HYPERCALL_PAGE=0x00000101,BSD_SYMTAB=yes

dom0# xl create -c ffsv2/config
Parsing config from ffsv2/config
#XXX here it hangs in a xenconsole until a Control-] is entered
^]
Daemon running with PID 3635
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
2006, 2007, 2008, 2009, 2010, 2011, 2012
The NetBSD Foundation, Inc. All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California. All rights reserved.

NetBSD 6.1 (XEN3_DOMU)
total memory = 512 MB
avail memory = 487 MB
mainbus0 (root)
hypervisor0 at mainbus0: Xen version 4.3.1-pre
[... snip]
boot device: xbd0
root on xbd0a dumps on xbd0b
Your machine does not initialize mem_clusters; sparse_dumps disabled
root file system type: ffs
/etc/rc.conf is not configured. Multiuser boot aborted.
Enter pathname of shell or RETURN for /bin/sh: /bin/ksh
# md5 /netbsd
MD5 (/netbsd) = 9e492b31e18f7e816f3e374d7f365857

I've tracked this issue down to the boot hanging in a xenconsole:
xenconsoled creates a xenconsole which basically does nothing.
The boot continues with a new xenconsole after sending a control-].
I experimented with different boot arguments like "console=hvc0
xencons=tty" to no avail.

I'm really clueless about what's wrong.

Any help appreciated.

[1] http://mail-index.netbsd.org/port-xen/2013/09/02/msg008020.html
On Sat, 2013-09-07 at 11:56 +0000, M. Boerschig wrote:
> Hi,
>
> I'm working on NetBSD support in pygrub: I've posted my work on the
> NetBSD port-xen list [1]. It adds support for FFSv2 to tools/libfsimage
> and NetBSD partition/disklabel support to tools/pygrub.
> I started off using xen-unstable and the xl tools on a kernel 3.2, but
> I'm now on xen-4.3 and kernel 3.10.

I think this implies a Linux dom0?

I imagine this is equally broken with a Linux domU, because it doesn't
seem likely to be guest specific, given that it is mostly before the
guest runs...

> Retrieving the kernel and arguments appears to work, but the boot
> process waits for a console escape character before continuing:
>
> [... snip]
>
> I've tracked this issue down to the boot hanging in a xenconsole:
> xenconsoled creates a xenconsole which basically does nothing.
> The boot continues with a new xenconsole after sending a control-].
> I experimented with different boot arguments like "console=hvc0
> xencons=tty" to no avail.
>
> I'm really clueless about what's wrong.

This stuff has been a bit fragile in the past, I wouldn't be too
surprised if it had regressed, especially since it isn't especially
amenable to automated testing.

What is supposed to happen is that pygrub gets launched attached to a
pty, and that pty gets written to xenstore such that the xenconsole
client connects to it and presents the pygrub output to the user as if
it were the guest console. xl acts as a pump copying data back and forth
between the console pty and pygrub's pty (since both console and pygrub
expect to be a slave). Much of this code is in libxl_bootloader.c.

Once pygrub exits this pty disappears and the console client should
reattach to the real guest console pty, which is provided by
xenconsoled.

It sounds like that first connection is not going to the right place or
is otherwise broken.

Are you seeing the pygrub menu? Do you expect to? Perhaps the automatic
exit of the first session is not working?

The last relevant looking commit to libxl_bootloader.c was
7253e0fd1aeb3ae7d4714bcc1d86b846b3331995 which looks like the sort of
thing which might accidentally introduce such behaviour.

> Any help appreciated.
>
> [1] http://mail-index.netbsd.org/port-xen/2013/09/02/msg008020.html
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
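
The "pump" role described in this message can be illustrated with a
minimal Python sketch. It is not the libxl implementation (that lives in
C, in libxl_bootloader.c and libxl_aoutils.c): the function names pump()
and write_all() are made up for illustration, it copies in one direction
only, and it is meant to be tried with plain pipes rather than ptys.

```python
import os
import select


def write_all(fd, data):
    """os.write may write fewer bytes than requested, so loop until done."""
    while data:
        written = os.write(fd, data)
        data = data[written:]


def pump(src_fd, dst_fd, timeout_ms=1000):
    """Copy bytes from src_fd to dst_fd until the source ends.

    Returns (bytes_copied, reason), where reason is "eof", "hangup"
    or "timeout". Hypothetical helper, one direction only.
    """
    poller = select.poll()
    poller.register(src_fd, select.POLLIN)
    copied = 0
    while True:
        events = poller.poll(timeout_ms)
        if not events:
            return copied, "timeout"
        _fd, mask = events[0]
        if mask & select.POLLIN:
            # Drain pending data first, even if POLLHUP is also set.
            data = os.read(src_fd, 4096)
            if not data:
                return copied, "eof"
            write_all(dst_fd, data)
            copied += len(data)
        elif mask & (select.POLLHUP | select.POLLERR):
            # The other end is gone and nothing is left to read.
            return copied, "hangup"
```

The point of the sketch is the termination condition: the copy loop only
ends once the source reports end-of-file or a hangup, which is the kind
of event the console hand-over described above relies on.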
On 09/09/13 14:35, Ian Campbell wrote:
> On Sat, 2013-09-07 at 11:56 +0000, M. Boerschig wrote:
> I think this implies a Linux dom0?

Yes, of course. I didn't even get a recent xen (> 4.2) to compile on my
NetBSD machine.

> I imagine this is equally broken with a Linux domU, because it doesn't
> seem likely to be guest specific, given that it is mostly before the
> guest runs...

An Ubuntu domU works on this same machine, so I think this problem is
limited to my use case (or a similar one, like Solaris).

> [... snip]
>
> Are you seeing the pygrub menu? Do you expect to? Perhaps the automatic
> exit of the first session is not working?

No, and I didn't expect a menu to appear.
I assumed a menu is only displayed if the run_grub() method succeeds in
finding a valid GRUB menu.lst/config on the disk.
I used the sniff_solaris() code as a guideline: if I understand the code
correctly it just returns a configuration dictionary without displaying
anything.

> The last relevant looking commit to libxl_bootloader.c was
> 7253e0fd1aeb3ae7d4714bcc1d86b846b3331995 which looks like the sort of
> thing which might accidentally introduce such behaviour.

Thanks for the pointer, I'll investigate if pygrub's exit is properly
handled later this week.
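
For readers unfamiliar with the "sniffer" pattern mentioned in this
message, here is a rough Python sketch of a function that only inspects
the guest filesystem and returns a configuration dictionary, without
drawing any menu. Everything in it is an assumption for illustration:
sniff_netbsd() and FakeFs are hypothetical names, the dictionary keys
are guessed, and the real pygrub code and the posted patch may look
quite different. The /netbsd path and the root=xbd0 argument are taken
from the transcript earlier in the thread.

```python
class FakeFs:
    """Hypothetical stand-in for the filesystem handle pygrub works with."""

    def __init__(self, paths):
        self.paths = set(paths)

    def file_exists(self, path):
        return path in self.paths


def sniff_netbsd(fs, cfg):
    """Hypothetical sniffer: fill in cfg if a NetBSD kernel is found.

    Returns cfg unchanged when /netbsd is absent, so the caller can fall
    through to other detection methods. Nothing is displayed.
    """
    if not fs.file_exists("/netbsd"):
        return cfg
    result = dict(cfg)              # do not modify the caller's dict
    result["kernel"] = "/netbsd"    # kernel path seen in the transcript
    result["args"] = "root=xbd0"    # boot argument seen in the transcript
    return result
```

Because such a function returns immediately, pygrub finishes much faster
than when it waits for a user at an interactive menu.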
On 09.09.13 19:52, M. Boerschig wrote:
> On 09/09/13 14:35, Ian Campbell wrote:
>> On Sat, 2013-09-07 at 11:56 +0000, M. Boerschig wrote:
>> I think this implies a Linux dom0?
>
> Yes, of course. I didn't even get a recent xen (> 4.2) to compile on my
> NetBSD machine.

Can you report build errors on this list, please?

Christoph
On Mon, 2013-09-09 at 19:52 +0200, M. Boerschig wrote:
> On 09/09/13 14:35, Ian Campbell wrote:
> > On Sat, 2013-09-07 at 11:56 +0000, M. Boerschig wrote:
> > I think this implies a Linux dom0?
>
> Yes, of course. I didn't even get a recent xen (> 4.2) to compile on my
> NetBSD machine.
>
> > I imagine this is equally broken with a Linux domU, because it doesn't
> > seem likely to be guest specific, given that it is mostly before the
> > guest runs...
>
> An Ubuntu domU works on this same machine, so I think this problem is
> limited to my use case (or a similar one, like solaris)

In this case do you get the interactive menu? I wonder if passing
bootloader_args = ["--entry=0"] would reproduce the same error you are
seeing by skipping the menu and making it behave like sniff_solaris?

> [... snip]
>
> > Are you seeing the pygrub menu? Do you expect to? Perhaps the automatic
> > exit of the first session is not working?
>
> No, and I didn't expect a menu to appear.
> I assumed a menu is only displayed if the run_grub()-method succeeds in
> finding a valid GRUB menu.lst/config on the disk.
> I used the sniff_solaris() code as a guideline: if I understand the code
> correctly it just returns a configuration dictionary without displaying
> anything.

OK.

> > The last relevant looking commit to libxl_bootloader.c was
> > 7253e0fd1aeb3ae7d4714bcc1d86b846b3331995 which looks like the sort of
> > thing which might accidentally introduce such behaviour.
>
> Thanks for the pointer, I'll investigate if pygrub's exit is properly
> handled later this week.

Great, please let us know how you get on!

Ian.
On Fri, 2013-09-13 at 18:00 +0000, M. Boerschig wrote:
> On 09/10/13 08:56, Ian Campbell wrote:
> > [... snip]
> >
> > In this case do you get the interactive menu? I wonder if passing
> > bootloader_args = ["--entry=0"] would reproduce the same error you are
> > seeing by skipping the menu and making it behave like sniff_solaris?
>
> This triggers exactly the same error as with booting a netbsd domU.
> I tested this also on a Ubuntu 13.04 dom0 with an unmodified stable-4.3
> (4.3.1-pre) on a different machine (with another 13.04 domU), just to
> make sure it's nothing I broke.
> So either this problem is specific to debian(-derived) systems or a
> generic problem with libxl.

Probably generic libxl.

> [... snip]
>
> I did have less time than anticipated to familiarize with libxl, as I'm
> quite busy studying for an exam ...
> However, I did some debugging and the
> libxl_aoutils.c:datacopier_pollhup_handled() never received a POLLHUP
> event, although execution goes through
> libxl_bootloader.c:bootloader_finished().
> Maybe someone more knowledgable about libxl might want a look at this.

Ian J (cc-d) knows all about POSIX semantics, i.e. when POLLHUP can and
should be received etc. I think I might leave this one to him ;-)
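
As background on when POLLHUP shows up at all, the following Python
sketch demonstrates one general property using an ordinary pipe: the
reader only sees a hangup once every descriptor for the writing side has
been closed. This is an illustration under assumptions, not a diagnosis
of the problem in this thread: wait_for_hangup() is a made-up helper,
the behaviour shown is that of pipes on Linux, and ptys can differ in
detail between operating systems.

```python
import os
import select


def wait_for_hangup(fd, timeout_ms=1000):
    """Drain fd and report whether a hangup (or EOF) arrived in time.

    Returns True on POLLHUP or end-of-file, False on timeout or on any
    other event. Hypothetical helper for illustration only.
    """
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    while True:
        events = poller.poll(timeout_ms)
        if not events:
            return False                 # nothing happened: no hangup seen
        mask = events[0][1]
        if mask & select.POLLIN:
            if os.read(fd, 4096) == b"":
                return True              # end-of-file counts as hangup here
            continue                     # data drained, poll again
        return bool(mask & select.POLLHUP)


if __name__ == "__main__":
    r, w = os.pipe()
    spare = os.dup(w)                    # a second descriptor for the write side
    os.write(w, b"x")
    os.close(w)
    print(wait_for_hangup(r, 100))       # write side still open via 'spare'
    os.close(spare)
    print(wait_for_hangup(r, 100))       # all writers closed
```

So a copy loop waiting for POLLHUP can wait indefinitely if some process
still holds the far end open; whether anything like that happens in the
case reported here is not established in this thread.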
Ian Campbell writes ("Re: [Xen-devel] Booting NetBSD in pygrub"):
> I imagine this is equally broken with a Linux domU, because it doesn't
> seem likely to be guest specific, given that it is mostly before the
> guest runs...

It may be that the guest fs makes pygrub behave differently.
I'm suspecting some kind of race.

> > I've tracked this issue down to the boot hanging in a xenconsole:
> > xenconsoled creates a xenconsole which basically does nothing.
> > The boot continues with a new xenconsole after sending a control-].
> > I experimented with different boot arguments like "console=hvc0
> > xencons=tty" to no avail.
> >
> > I'm really clueless about what's wrong.

Can you please run this with xl's verbosity turned way up?
  xl -vvvv create [etc]

Also, can you see (in ps) whether pygrub is running, and send us the
output of
  xenstore-ls -fp
?

Ian.
On Fri, 2013-09-13 at 17:16 +0100, Ian Jackson wrote:
> Ian Campbell writes ("Re: [Xen-devel] Booting NetBSD in pygrub"):
> > I imagine this is equally broken with a Linux domU, because it doesn't
> > seem likely to be guest specific, given that it is mostly before the
> > guest runs...
>
> It may be that the guest fs makes pygrub behave differently.

It's pygrub being in non-interactive mode either due to explicit
--entry=FOO or because of the guest type being Solaris or NetBSD for
which pygrub doesn't support an interactive menu.

> I'm suspecting some kind of race.

Yes, given the above pygrub will exit more quickly than normal.
On 09/10/13 08:56, Ian Campbell wrote:
> On Mon, 2013-09-09 at 19:52 +0200, M. Boerschig wrote:
>> [... snip]
>>
>> An Ubuntu domU works on this same machine, so I think this problem is
>> limited to my use case (or a similar one, like solaris)
>
> In this case do you get the interactive menu? I wonder if passing
> bootloader_args = ["--entry=0"] would reproduce the same error you are
> seeing by skipping the menu and making it behave like sniff_solaris?

This triggers exactly the same error as booting a NetBSD domU.
I also tested this on an Ubuntu 13.04 dom0 with an unmodified stable-4.3
(4.3.1-pre) on a different machine (with another 13.04 domU), just to
make sure it's nothing I broke.
So either this problem is specific to Debian(-derived) systems or it is
a generic problem with libxl.

> [... snip]
>
>> Thanks for the pointer, I'll investigate if pygrub's exit is properly
>> handled later this week.
>
> Great, please let us know how you get on!

I had less time than anticipated to familiarize myself with libxl, as
I'm quite busy studying for an exam ...
However, I did some debugging and
libxl_aoutils.c:datacopier_pollhup_handled() never received a POLLHUP
event, although execution goes through
libxl_bootloader.c:bootloader_finished().
Maybe someone more knowledgeable about libxl might want to take a look
at this.

> Ian.