Hi all,

I have written initial support for running rump kernels directly on top of the Xen hypervisor. Rump kernels essentially consist of unmodified kernel drivers, but without the added baggage associated with a full operating system, such as a scheduling policy, virtual memory, multiprocess support, and so forth. In essence, the work enables running minimal single-image application domains on top of Xen while relying on real-world proven kernel-quality drivers, including file systems, TCP/IP, SoftRAID, disk encryption, etc. Rump kernels provide a subset of the POSIX'y application interfaces relevant for e.g. file systems and networking, so adapting existing applications is possible.

I have pushed the implementation to GitHub. If you wish to try the few demos I put together, follow the instructions in the README. Please report any and all bugs via the GitHub interface. Testing so far was light, but given that I wrote less than 1k lines of code including comments and whitespace, I hope I haven't managed to cram too many bugs in there. I've done my testing on an x86_32 Dom0 with Xen 4.2.2.

https://github.com/anttikantee/rumpuser-xen/

I'll explain the implementation in a bit more detail. Rump kernels are made possible by the anykernel architecture of NetBSD. Rump kernels run on top of the rump kernel hypercall layer, so the implementation was a matter of writing those hypercalls for the Xen hypervisor. I started looking at the Xen Mini-OS to figure out how to bootstrap a domU, and quickly realized that Mini-OS implements almost everything the rump kernel hypercall layer requires: a build infra, cooperative thread scheduling, physical memory allocation, simple interfaces to I/O devices such as block/net, and so forth. As a result, the implementation is more or less plugged on top of Mini-OS, and contains a lot of code unnecessary for rump kernels. I'm unsure if I should fully fork Mini-OS or attempt to merge some of my changes back. For example, it seems like the host namespace leaks into Mini-OS (i.e. -nostdinc isn't used), and it would be nice to get that fixed. If anyone has any smart ideas about which direction to go in, please advise.

I thank the people who have suggested this project over the years. I believe the first one to suggest it on a public list was Jean-Yves Migeon quite some years ago, so explicit thanks go to him and "you know who you are" thanks go to others. I also thank Juan RP of Void Linux, who re-added support to Void for Xen 4.2 about 5 minutes after I told him I'd like to do some Xen testing with an x86_32 Dom0 (moving to 64-bit systems is on my TODO list ;)

- antti
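P.S. For the curious, here is a rough sketch of what driving a rump kernel from C looks like. This one is hosted in a POSIX process rather than a Xen domU, the file name is made up, and the exact headers and link line may vary with your build, so treat it as illustrative only:

    #include <rump/rump.h>
    #include <rump/rump_syscalls.h>
    #include <rump/rumpdefs.h>      /* RUMP_O_* constants */

    #include <stdio.h>
    #include <string.h>

    int
    main(void)
    {
            const char msg[] = "served by a rump kernel\n";
            int fd;

            /* bootstrap a rump kernel inside this process */
            if (rump_init() != 0) {
                    fprintf(stderr, "rump_init failed\n");
                    return 1;
            }

            /*
             * The file lives in the rump kernel's in-memory rumpfs
             * root, not on the host file system.
             */
            fd = rump_sys_open("/hello.txt",
                RUMP_O_CREAT | RUMP_O_RDWR, 0644);
            if (fd == -1)
                    return 1;
            rump_sys_write(fd, msg, strlen(msg));
            rump_sys_close(fd);
            return 0;
    }

Linking is something like "cc demo.c -lrumpvfs -lrump -lrumpuser -lpthread", though the exact set of rump components depends on which drivers you want.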
On Fri, 2013-08-16 at 02:58 +0300, Antti Kantee wrote:
> Hi all,
>
> I have written initial support for running rump kernels directly on top
> of the Xen hypervisor. Rump kernels essentially consist of unmodified
> kernel drivers, but without the added baggage associated with a full
> operating system, such as a scheduling policy, virtual memory,
> multiprocess support, and so forth. In essence, the work enables running
> minimal single-image application domains on top of Xen while relying on
> real-world proven kernel-quality drivers, including file systems,
> TCP/IP, SoftRAID, disk encryption, etc. Rump kernels provide a subset of
> the POSIX'y application interfaces relevant for e.g. file systems and
> networking, so adapting existing applications is possible.

Sounds really cool! What sort of applications have you tried this with?

Does this provide enough of a POSIX-like system (libc etc) to run "normal" applications, or do applications need to be written explicitly for rump use?

What I'm wondering is if this would be a more maintainable way to implement the qemu stub domains or the xenstore one?

> I have pushed the implementation to GitHub. If you wish to try the few
> demos I put together, follow the instructions in the README. Please
> report any and all bugs via the GitHub interface. Testing so far was
> light, but given that I wrote less than 1k lines of code including
> comments and whitespace, I hope I haven't managed to cram too many bugs
> in there.

I think the usual statistic is that 1000 lines should only have a few dozen bugs ;-D

> I've done my testing on an x86_32 Dom0 with Xen 4.2.2.
>
> https://github.com/anttikantee/rumpuser-xen/
>
> I'll explain the implementation in a bit more detail. Rump kernels are
> made possible by the anykernel architecture of NetBSD. Rump kernels run
> on top of the rump kernel hypercall layer, so the implementation was a
> matter of writing those hypercalls for the Xen hypervisor. I started
> looking at the Xen Mini-OS to figure out how to bootstrap a domU, and
> quickly realized that Mini-OS implements almost everything the rump
> kernel hypercall layer requires: a build infra, cooperative thread
> scheduling, physical memory allocation, simple interfaces to I/O devices
> such as block/net, and so forth. As a result, the implementation is more
> or less plugged on top of Mini-OS, and contains a lot of code
> unnecessary for rump kernels. I'm unsure if I should fully fork Mini-OS
> or attempt to merge some of my changes back. For example, it seems like
> the host namespace leaks into Mini-OS (i.e. -nostdinc isn't used), and
> it would be nice to get that fixed. If anyone has any smart ideas about
> which direction to go in, please advise.

Ian Jackson has also been looking at these sorts of issues with mini-os recently, I'll let him comment on where he has gotten to etc.

I think as a general rule we'd be happy with any cleanups made to mini-os.

Ian.
On 17.8.2013 17:17, Ian Campbell wrote:
> Sounds really cool! What sort of applications have you tried this with?

With rump kernels hosted on POSIX systems, I've tested "everything", e.g. TCP/IP with firefox and sshd, file systems with ls, tar, etc., nfs, and so forth. When hosted on mini-os, things will be a bit different. Let me elaborate below.

> Does this provide enough of a POSIX-like system (libc etc) to run
> "normal" applications, or do applications need to be written explicitly
> for rump use?

Like the name suggests, a rump kernel is a kernel with some pieces missing. I've sometimes (jokingly?) described a rump kernel as KaaS (kernel-as-a-service), because an application can request services from a rump kernel, but a rump kernel will not directly affect the application -- think exec, fork, mmap, signals, etc., which are provided by "normal" kernels but do not fit into a request-response model.

The first problem with complex applications is going to be their use of fork, exec, userspace threads, dlfun etc. For example, sshd does quite a few tricks with execs and forks when it starts. When running on a POSIX host, these interfaces come from the host even if sshd uses a TCP/IP service provided by a rump kernel. When the rump kernel container is a Xen domU instead of a process, well, I'm not sure how to define exec for a domU, and doubly so for exec's fd semantics ...

The second, and easier, problem is that "libc etc" is not included in the scope of rump kernels and has to come from somewhere else. The good news is that the syscall stubs provided by rump kernels are fully compatible with the POSIX'y API and could be used as such (look at <rump/rump_syscalls.h> if curious). The non-syscall bits would still have to appear from somewhere. I noticed mini-os has a HAVE_LIBC conditional, but I did not yet look into how much of "libc etc" that would provide.

So, to answer your question, applications do not need to be explicitly written to use rump kernels, but all of the interfaces used by applications of course need to be provided somehow. You could conceivably run e.g. a simple web server very easily (especially one which uses an event loop rather than user threads or fork), but I wouldn't bet on firefox running on top of mini-os/rump kernel anytime soon.

> What I'm wondering is if this would be a more maintainable way to
> implement the qemu stub domains or the xenstore one?

I've played with Xen for only a few days now, so I've yet to discover what the maintenance problem with the current implementation is. If you can elaborate, I can reciprocate.

>> less than 1k lines of code including
>> comments and whitespace, I hope I haven't managed to cram too many bugs
>> in there.
>
> I think the usual statistic is that 1000 lines should only have a few
> dozen bugs ;-D

My approach for beating the odds: more comments and whitespace ;)

> Ian Jackson has also been looking at these sorts of issues with mini-os
> recently, I'll let him comment on where he has gotten to etc.
>
> I think as a general rule we'd be happy with any cleanups made to
> mini-os.

Nice. I'd be happy if I can strip out the mini-os bits from my GitHub repo and somehow make my work pluggable into upstream mini-os.
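To make the "fully compatible syscall stubs" point a bit more concrete, here is an illustrative sketch (not from the repo; the RUMP_ constants come from the rump headers, and it assumes the TCP/IP components are linked in). The calls have the same shape as their POSIX counterparts, just with a rump_sys_/RUMP_ prefix:

    #include <rump/rump.h>
    #include <rump/rump_syscalls.h>
    #include <rump/rumpdefs.h>      /* RUMP_AF_*, RUMP_SOCK_* */

    #include <stdio.h>

    int
    main(void)
    {
            int s;

            rump_init();

            /*
             * Same shape as socket(2), but the socket is created
             * inside the rump kernel's TCP/IP stack.
             */
            s = rump_sys_socket(RUMP_AF_INET, RUMP_SOCK_STREAM, 0);
            if (s == -1) {
                    fprintf(stderr, "no networking faction linked in?\n");
                    return 1;
            }
            printf("fd %d from the rump kernel\n", s);
            rump_sys_close(s);
            return 0;
    }

The link line would be something like "-lrumpnet_inet -lrumpnet_net -lrumpnet -lrump -lrumpuser", again depending on which components you want.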
Antti Kantee writes ("Re: [Xen-users] rump kernels running on the Xen hypervisor"):
> So, to answer your question, applications do not need to be explicitly
> written to use rump kernels, but all of the interfaces used by
> applications of course need to be provided somehow. [...]

This is all very exciting and as Ian Jackson says very similar to something I've been working on. I started from the other end and it may be that I can combine what I've done so far with what you've done.

I compiled up your example, against Xen 4.4-unstable, and I'm afraid it doesn't work for me. Console log below. Do you have any suggestions for debugging it, or should I just plunge in?

Did you test this on i386, or should I rebuild as amd64?

I think this approach will be an excellent one for stub domains such as qemu, proposed stub-pygrub, etc.

But thinking about what you've done, I think we probably want to do something a bit different with block and networking.

Does the NetBSD VFS have
 - tmpfs on a variable-sized ramdisk
 - romfs based on a cpio archive
or the like? Producing an ffs image for something like a qemu-dm is going to be annoying.

And often networking wants to be handled by something like SOCKS rather than by having an extra TCP stack in the stub domain. The reason for this is that it avoids having to allocate MAC and IP addresses to a whole bunch of service domains; the administrator probably wants them to piggyback on dom0's networking.

Regards,
Ian.

Xen Minimal OS!
  start_info: 001fa000(VA)
    nr_pages: 0x800
  shared_inf: 0x7f03e000(MA)
     pt_base: 001fd000(VA)
nr_pt_frames: 0x5
    mfn_list: 001f8000(VA)
   mod_start: 0x0(VA)
     mod_len: 0
       flags: 0x0
    cmd_line: 3
  stack:      001d2240-001f2240
MM: Init
      _text: 00000000(VA)
     _etext: 0016df32(VA)
   _erodata: 00199000(VA)
     _edata: 0019ddbc(VA)
stack start: 001d2240(VA)
       _end: 001f73f8(VA)
  start_pfn: 205
    max_pfn: 800
Mapping memory range 0x400000 - 0x800000
setting 00000000-00199000 readonly
skipped 00001000
MM: Initialise page allocator for 207000(207000)-0(800000)
MM: done
Demand map pfns at 801000-80801000.
Initialising timer interface
Initialising console ... done.
gnttab_table mapped at 00801000.
Initialising scheduler
xenbus initialised on irq 1 mfn 0x60cb1
Dummy main: start_info=001f3180
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004,
    2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 6.99.23 (RUMP-ROAST) #0: Tue Aug 20 18:33:04 BST 2013
    iwj@mariner:/u/iwj/work/Rump-kernels/xen/extras/rumpuser-xen/rumpobj/lib/librump
total memory = unlimited (host limit)
timecounter: Timecounters tick every 10.000 msec
timecounter: Timecounter "rumpclk" frequency 100 Hz quality 0
cpu0 at thinair0: rump virtual cpu
root file system type: rumpfs
******************* BLKFRONT for device/vbd/768 **********
Page fault at linear address c2c2c36a, eip 0016961c, regs 0024ff04, sp c2c2c2c2, our_sp 0024fee8, code 0
Thread: biopoll
EIP: 16961c, EFLAGS 10212.
EBX: c2c2c2c2 ECX: 0020c0bc EDX: 0020c138
ESI: 00000000 EDI: 0000001a EBP: 0024ff80 EAX: c2c2c2c2
DS: c2c2e021 ES: e021 orig_eax: ffffffff, eip: 0016961c
CS: e019 EFLAGS: 00010212
esp: c2c2c2c2 ss: c2c2c2c2
base is 0x24ff80 caller is 0x169819
base is 0x24ffa0 caller is 0x169a0a
base is 0x24ffc0 caller is 0xfbf2
base is 0x24fff0 caller is 0x31ad
c2c2c2b0:
Page fault in pagetable walk (access to invalid memory?).
On Wed, 2013-08-21 at 14:41 +0100, Ian Jackson wrote:
> Antti Kantee writes ("Re: [Xen-users] rump kernels running on the Xen hypervisor"):
> > So, to answer your question, applications do not need to be explicitly
> > written to use rump kernels, but all of the interfaces used by
> > applications of course need to be provided somehow. [...]
>
> This is all very exciting and as Ian Jackson says

Oh dear, if *you* are going to start getting us confused I'm not sure what chance we have. :-P

Ian.
On 21.8.2013 16:41, Ian Jackson wrote:
> Antti Kantee writes ("Re: [Xen-users] rump kernels running on the Xen hypervisor"):
>> So, to answer your question, applications do not need to be explicitly
>> written to use rump kernels, but all of the interfaces used by
>> applications of course need to be provided somehow. [...]
>
> This is all very exciting and as Ian Jackson says very similar to
> something I've been working on. I started from the other end and it
> may be that I can combine what I've done so far with what you've done.

It would be great to find immediate synergies. Can you be more specific about what you've been working on?

> I compiled up your example, against Xen 4.4-unstable, and I'm afraid
> it doesn't work for me. Console log below. Do you have any
> suggestions for debugging it, or should I just plunge in?
>
> Did you test this on i386, or should I rebuild as amd64?

I'm testing i386 dom0+domU with Xen 4.2.2. But I think we should make all of them work eventually, so might as well start now. I fixed/pushed one use-after-free which stood out. If the above wasn't it ...

I'm not sure I can teach anything about debugging Xen guests on these lists. I've been using simple gdbsx. Additionally, "l *0xEIP" in gdb has been quite effective for debugging crashes even without gdbsx -- the rump kernel bits are quite well tested, and everything outside of them is so simple that it's usually easy to just guess what's going wrong. For debugging, everything is built with symbols, so you can dive right in.

> I think this approach will be an excellent one for stub domains such
> as qemu, proposed stub-pygrub, etc.
>
> But thinking about what you've done, I think we probably want to do
> something a bit different with block and networking.
>
> Does the NetBSD VFS have
>  - tmpfs on a variable-sized ramdisk
>  - romfs based on a cpio archive
> or the like? Producing an ffs image for something like a qemu-dm is
> going to be annoying.

You can create an FFS image with the portable makefs userspace utility and even edit the contents with the equally userspace fs-utils. Though, I'm not sure what qemu-dm is or why it needs an FFS image. For me, this is a bit like reading the proverbial math book where they leave 20 intermediate steps out of a proof because they're considered "obvious" ;)

> And often networking wants to be handled by something like SOCKS
> rather than by having an extra TCP stack in the stub domain. The
> reason for this is that it avoids having to allocate MAC and IP
> addresses to a whole bunch of service domains; the administrator
> probably wants them to piggyback on dom0's networking.

Ok, sounds like we shouldn't include a full TCP/IP stack for that use case. There's something called "sockin" for rump kernels that includes only the sockets layer but assumes the actual networking stack is elsewhere. I wrote it originally so that NFS clients in rump kernels could use host networking ... because configuring IP/MAC addresses for each NFS client wasn't attractive. Maybe sockin fits your use case too? (I'm guessing here. see: math book)

- antti
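P.S. In case the makefs and "l *0xEIP" bits were too terse, here is roughly what I mean. The image name, size, and staging directory are made up, so adjust to taste:

    # build a 16MB FFS image from the contents of ./staging
    $ makefs -t ffs -s 16m myimage.ffs ./staging
    # poke at the image as an unprivileged user with fs-utils
    $ fsu_ls myimage.ffs /

And for the crash you posted, something along these lines should map the faulting EIP back to source:

    $ gdb <the built domU binary>
    (gdb) l *0x16961c       # the EIP from your page fault report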
The sauce thickens.

Up until recently, rump kernels on Xen offered only the kernel API (i.e. a POSIX'y set of syscalls), but we now also support a mostly complete user-level API courtesy of an unmodified NetBSD libc (just a few build infra modifications which will hopefully go away soon'ish). Since the kernel services and libc interfaces come directly from a real OS, a good number of real programs will compile and work out-of-the-box as standalone DomU's.

As an early proof-of-concept, I've thrown together a domain which boots as a web server. To try it out, clone the repo (*), run "./buildxen.sh", and create a domain with xl create. The service will configure an IP address for itself using dhcp, and after that it's port 80 as usual.

*) https://github.com/anttikantee/rumpuser-xen/

- antti
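P.S. Spelled out, the steps are roughly the following; "domain_config" below is a stand-in for whichever config file the repo's README points you at:

    $ git clone https://github.com/anttikantee/rumpuser-xen/
    $ cd rumpuser-xen
    $ ./buildxen.sh
    # create the domain and attach to its console
    $ xl create -c domain_config

Once the domU has picked up an address via dhcp, point a browser (or curl) at it on port 80.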