Rusty Russell
2005-Feb-26 20:57 UTC
[Xen-devel] Proposal for init/kexec/hotplug format for Xen
Hi all,

This has a degree of overlap with Jeremy's excellent work: I've been
looking at the bundling of device information passed to guest OSes when
they boot, and future uses for kexec and possibly the implementation of
hotplug.

For kexec and bare-metal bringup, the PPC64 port uses a fairly simple
header + flattened tree of keyword/value pairs (on PPC64, used to hold
the Open Firmware tree plus Linux extras). This offers flexibility for
new virtual devices, etc; I propose that we adopt this format or
something very similar for Xen, first by putting a pointer into it in
start_info_t, and then migrate entries across as appropriate.

Here's the code from PPC64:

/* Definitions used by the flattened device tree */
#define OF_DT_HEADER		0xd00dfeed	/* 4: version, 4: total size */
#define OF_DT_BEGIN_NODE	0x1		/* Start node: full name */
#define OF_DT_END_NODE		0x2		/* End node */
#define OF_DT_PROP		0x3		/* Property: name off, size, content */
#define OF_DT_END		0x9

#define OF_DT_VERSION		1

/*
 * This is what gets passed to the kernel by prom_init or kexec
 *
 * The dt struct contains the device tree structure, full pathes and
 * property contents. The dt strings contain a separate block with just
 * the strings for the property names, and is fully page aligned and
 * self contained in a page, so that it can be kept around by the kernel,
 * each property name appears only once in this page (cheap compression)
 *
 * the mem_rsvmap contains a map of reserved ranges of physical memory,
 * passing it here instead of in the device-tree itself greatly simplifies
 * the job of everybody. It's just a list of u64 pairs (base/size) that
 * ends when size is 0
 */
struct boot_param_header
{
	u32	magic;			/* magic word OF_DT_HEADER */
	u32	totalsize;		/* total size of DT block */
	u32	off_dt_struct;		/* offset to structure */
	u32	off_dt_strings;		/* offset to strings */
	u32	off_mem_rsvmap;		/* offset to memory reserve map */
	u32	version;		/* format version */
	u32	last_comp_version;	/* last compatible version */
	/* version 2 fields below */
	u32	boot_cpuid_phys;	/* Which physical CPU id we're booting on */
};

Thoughts?
Rusty.
-- 
A bad analogy is like a leaky screwdriver -- Richard Braakman

-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xen-devel
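As an illustration of the memory reserve map described in the header comment (a list of u64 base/size pairs that ends when size is 0), here is a minimal C sketch that validates the header and walks the map. The function name `dt_sum_reserved` is hypothetical, the kernel's `u32` is replaced by `uint32_t`, and the blob is assumed to already be in host byte order; none of this is taken from the PPC64 sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OF_DT_HEADER 0xd00dfeed

/* Same field layout as the quoted PPC64 header, with fixed-width types. */
struct boot_param_header {
    uint32_t magic;
    uint32_t totalsize;
    uint32_t off_dt_struct;
    uint32_t off_dt_strings;
    uint32_t off_mem_rsvmap;
    uint32_t version;
    uint32_t last_comp_version;
    uint32_t boot_cpuid_phys;
};

/*
 * Hypothetical helper: walk the reserve map and add up the reserved sizes.
 * Assumes host byte order. Returns the number of entries before the
 * terminator, or -1 if the blob is malformed.
 */
int dt_sum_reserved(const void *blob, size_t len, uint64_t *total)
{
    struct boot_param_header h;
    size_t off;
    int count = 0;

    if (len < sizeof(h))
        return -1;
    memcpy(&h, blob, sizeof(h));
    if (h.magic != OF_DT_HEADER || h.totalsize > len)
        return -1;

    *total = 0;
    for (off = h.off_mem_rsvmap; ; off += 2 * sizeof(uint64_t)) {
        uint64_t pair[2];            /* pair[0] = base, pair[1] = size */

        if (off > h.totalsize || h.totalsize - off < sizeof(pair))
            return -1;               /* ran off the end without a terminator */
        memcpy(pair, (const char *)blob + off, sizeof(pair));
        if (pair[1] == 0)
            break;                   /* size 0 ends the list */
        *total += pair[1];
        count++;
    }
    return count;
}
```

The bounds check on every step is a design choice for a blob handed across a trust boundary: a guest should not assume the terminator is present.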
Keir Fraser
2005-Feb-27 10:53 UTC
Re: [Xen-devel] Proposal for init/kexec/hotplug format for Xen
On 26 Feb 2005, at 20:57, Rusty Russell wrote:

> This has a degree of overlap with Jeremy's excellent work: I've been
> looking at the bundling of device information passed to guest OSes when
> they boot, and future uses for kexec and possibly the implementation of
> hotplug.
>
> For kexec and bare-metal bringup, the PPC64 port uses a fairly simple
> header + flattened tree of keyword/value pairs (on PPC64, used to hold
> the Open Firmware tree plus Linux extras). This offers flexibility for
> new virtual devices, etc; I propose that we adopt this format or
> something very similar for Xen, first by putting a pointer into it in
> start_info_t, and then migrate entries across as appropriate.

I like the idea of bringing out device discovery, bringup, teardown,
recovery all into its own driver or subsystem -- it seems the obvious
way to go. But I think the 'device tree' should be in the
to-be-designed persistent store, and we publish an interface to allow
guests to peek/poke that store.

 -- Keir
Rusty Russell
2005-Feb-27 11:46 UTC
Re: [Xen-devel] Proposal for init/kexec/hotplug format for Xen
On Sun, 2005-02-27 at 10:53 +0000, Keir Fraser wrote:
> On 26 Feb 2005, at 20:57, Rusty Russell wrote:
>
> > This has a degree of overlap with Jeremy's excellent work: I've been
> > looking at the bundling of device information passed to guest OSes when
> > they boot, and future uses for kexec and possibly the implementation of
> > hotplug.
> >
> > For kexec and bare-metal bringup, the PPC64 port uses a fairly simple
> > header + flattened tree of keyword/value pairs (on PPC64, used to hold
> > the Open Firmware tree plus Linux extras). This offers flexibility for
> > new virtual devices, etc; I propose that we adopt this format or
> > something very similar for Xen, first by putting a pointer into it in
> > start_info_t, and then migrate entries across as appropriate.
>
> I like the idea of bringing out device discovery, bringup, teardown,
> recovery all into its own driver or subsystem -- it seems the obvious
> way to go. But I think the 'device tree' should be in the
> to-be-designed persistent store, and we publish an interface to allow
> guests to peek/poke that store.

OK, I'll hack some simple code together in anticipation of you supplying
a place to put it. The PPC64 code has baggage we don't want, but the
tree-of-keyword-value-pairs idea is a winner I think.

I'll give you a call Monday and if you're free I'll head over to
Cambridge and show you what I've got.

Thanks!
Rusty.
-- 
A bad analogy is like a leaky screwdriver -- Richard Braakman
Anthony Liguori
2005-Feb-27 15:25 UTC
Re: [Xen-devel] Proposal for init/kexec/hotplug format for Xen
Keir Fraser wrote:

>> For kexec and bare-metal bringup, the PPC64 port uses a fairly simple
>> header + flattened tree of keyword/value pairs (on PPC64, used to hold
>> the Open Firmware tree plus Linux extras). This offers flexibility for
>> new virtual devices, etc; I propose that we adopt this format or
>> something very similar for Xen, first by putting a pointer into it in
>> start_info_t, and then migrate entries across as appropriate.
>
> I like the idea of bringing out device discovery, bringup, teardown,
> recovery all into its own driver or subsystem -- it seems the obvious
> way to go. But I think the 'device tree' should be in the
> to-be-designed persistent store, and we publish an interface to allow
> guests to peek/poke that store.

I think publishing domain-information in an OF-like tree would be great.

I think we want the persistent store to be outside the OF-tree though.
It would provide a good buffer against ill-written management apps.

The way I envision this working is to have a persistent store in
user-space on a privileged domain that exported within its tree the OF
device-tree. This way management app information (the domain's name, an
icon associated with it, etc.) would not be stored in the OF tree. If
you needed to blow away a portion of the store because of a misbehaving
management app you do not lose any of the vital device information.

Does PPC64 or rHype provide a mechanism to notify user-space daemons
when a value in the tree changes?

I think this is a great proposal.

Regards,
Anthony Liguori
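The separation described here (management data and the device tree living in different subtrees of one store, so that one can be discarded without touching the other) can be sketched in C as follows. All names (`store_node`, `store_add`, `store_find`, `store_remove`) and the in-memory layout are hypothetical; the thread does not specify how the persistent store is represented.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One node in a hypothetical keyword/value store. */
struct store_node {
    char *name;
    char *value;               /* NULL for pure directory nodes */
    struct store_node *child;  /* first child */
    struct store_node *next;   /* next sibling */
};

static char *dup_str(const char *s)
{
    char *d;

    if (!s)
        return NULL;
    d = malloc(strlen(s) + 1);
    if (d)
        strcpy(d, s);
    return d;
}

/* Create a node and link it under parent (parent may be NULL for a root). */
struct store_node *store_add(struct store_node *parent,
                             const char *name, const char *value)
{
    struct store_node *n = calloc(1, sizeof(*n));

    if (!n)
        return NULL;
    n->name = dup_str(name);
    if (!n->name) {
        free(n);
        return NULL;
    }
    n->value = dup_str(value);
    if (parent) {
        n->next = parent->child;
        parent->child = n;
    }
    return n;
}

/* Look up a direct child by name. */
struct store_node *store_find(struct store_node *parent, const char *name)
{
    struct store_node *n;

    for (n = parent->child; n; n = n->next)
        if (strcmp(n->name, name) == 0)
            return n;
    return NULL;
}

/* Free a node, its later siblings, and everything below them. */
static void store_free(struct store_node *n)
{
    while (n) {
        struct store_node *next = n->next;

        store_free(n->child);
        free(n->name);
        free(n->value);
        free(n);
        n = next;
    }
}

/* Unlink and free one named child and its whole subtree; -1 if absent. */
int store_remove(struct store_node *parent, const char *name)
{
    struct store_node **pp;

    for (pp = &parent->child; *pp; pp = &(*pp)->next) {
        if (strcmp((*pp)->name, name) == 0) {
            struct store_node *victim = *pp;

            *pp = victim->next;
            victim->next = NULL;   /* keep siblings out of the free */
            store_free(victim);
            return 0;
        }
    }
    return -1;
}
```

With a per-domain root holding a `devices` subtree and a `tools` subtree, removing `tools` leaves every entry under `devices` in place, which is the property the message asks for.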
Keir Fraser
2005-Feb-27 15:48 UTC
Re: [Xen-devel] Proposal for init/kexec/hotplug format for Xen
On 27 Feb 2005, at 15:25, Anthony Liguori wrote:

>> I like the idea of bringing out device discovery, bringup, teardown,
>> recovery all into its own driver or subsystem -- it seems the obvious
>> way to go. But I think the 'device tree' should be in the
>> to-be-designed persistent store, and we publish an interface to allow
>> guests to peek/poke that store.
>
> I think publishing domain-information in an OF-like tree would be
> great.
>
> I think we want the persistent store to be outside the OF-tree though.
> It would provide a good buffer against ill-written management apps.
>
> The way I envision this working is to have a persistent store in
> user-space on a privileged domain that exported within its tree the
> OF device-tree. This way management app information (the domain's
> name, an icon associated with it, etc.) would not be stored in the OF
> tree. If you needed to blow away a portion of the store because of a
> misbehaving management app you do not lose any of the vital device
> information.

So you agree that the device info ought to reside within the persistent
store? I certainly wasn't suggesting that the persistent store exists
within the device tree -- I don't think that statement even makes sense
(device tree is per-domain; persistent store is global).

 -- Keir
Anthony Liguori
2005-Feb-27 15:54 UTC
Re: [Xen-devel] Proposal for init/kexec/hotplug format for Xen
Keir Fraser wrote:

>> The way I envision this working is to have a persistent store in
>> user-space on a privileged domain that exported within its tree the
>> OF device-tree. This way management app information (the domain's
>> name, an icon associated with it, etc.) would not be stored in the OF
>> tree. If you needed to blow away a portion of the store because of
>> a misbehaving management app you do not lose any of the vital device
>> information.
>
> So you agree that the device info ought to reside within the
> persistent store? I certainly wasn't suggesting that

Yes. I was only pointing out that I think the persistent store should
be in userspace. One approach would be to have another tree within the
hypervisor that was the global persistent store. An advantage of that
would be that it would be accessible by all domains without any special
supporting software (like a TCP/IP stack).

I think keeping the store in userspace has more advantages (mainly
robustness and extensibility). It sounds like we're all in agreement
though :-)

Regards,
Anthony Liguori
Harry Butterworth
2005-Feb-27 16:05 UTC
Re: [Xen-devel] Proposal for init/kexec/hotplug format for Xen
Anthony Liguori wrote:

> The way I envision this working is to have a persistent store in
> user-space on a privileged domain that exported within its tree the OF
> device-tree.

Keir and I briefly discussed the idea of putting the persistent store
into a fault-tolerant domain to avoid it becoming a single point of
failure for clustered systems.

I think something along these lines is the right way to go for the long
term. In the near term, if the rest of the system is architected such
that fault-tolerant domains can be introduced transparently (for example
by ensuring, amongst other things, that the API exposed to driver
authors for the FE-BE inter-domain communication interface is compatible
with an alternative network transparent implementation so you don't have
to rewrite all the drivers ;-) then the domain running the persistent
store can be upgraded with fault-tolerance in the future.

-- 
Harry Butterworth <harry@hebutterworth.freeserve.co.uk>
Keir Fraser
2005-Feb-27 16:09 UTC
Re: [Xen-devel] Proposal for init/kexec/hotplug format for Xen
On 27 Feb 2005, at 15:54, Anthony Liguori wrote:

> Yes. I was only pointing out that I think the persistent store should
> be in userspace. One approach would be to have another tree within
> the hypervisor that was the global persistent store. An advantage of
> that would be that it would be accessible by all domains without any
> special supporting software (like a TCP/IP stack).

I think we will provide a custom protocol for allowing guest kernels to
access the persistent store -- it needn't be very complicated, and will
allow things like basic device bootstrap to be done via the store
without any chicken-and-egg or deadlock problems. Also there may be
security implications in allowing arbitrary guests to make TCP
connections to domain0 (at the very least, there may be possible DoS
attacks) -- of course we allow this by default right now, but we don't
want to make it a requirement of using Xen.

> I think keeping the store in userspace has more advantages (mainly
> robustness and extensibility). It sounds like we're all in agreement
> though :-)

Yes, I think we are. :-)

 -- Keir
Anthony Liguori
2005-Feb-27 16:16 UTC
Re: [Xen-devel] Proposal for init/kexec/hotplug format for Xen
Keir Fraser wrote:

> I think we will provide a custom protocol for allowing guest kernels
> to access the persistent store -- it needn't be very complicated, and
> will allow things like basic device bootstrap to be done via the store
> without any chicken-and-egg or deadlock problems. Also there may be
> security implications in allowing arbitrary guests to make TCP
> connections to domain0 (at the very least, there may be possible DoS
> attacks) -- of course we allow this by default right now, but we don't
> want to make it a requirement of using Xen.

One of the things I'd like to see in the new management tools is a
higher level interdomain communication library. A very useful
abstraction would be an interdomain stream built on top of shared memory
and event channels.

Interdomain streams would be perfect for something like this.

Regards,
Anthony Liguori
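One way to picture an interdomain stream built on shared memory and event channels is a single-producer, single-consumer byte ring plus a notification hook. The C sketch below is only that: `idc_ring`, `idc_write`, `idc_read` and `idc_count_notify` are hypothetical names, the notification callback stands in for an event-channel send, and the memory barriers a real cross-domain ring needs are marked in comments but not implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 64u   /* power of two, so free-running indices wrap cleanly */

/* Would live in a page shared between the two domains. */
struct idc_ring {
    uint32_t prod;            /* bytes ever written; advanced only by the writer */
    uint32_t cons;            /* bytes ever read; advanced only by the reader */
    uint8_t data[RING_SIZE];
};

/* Stand-in for an event-channel notification (hypothetical). */
typedef void (*idc_notify_fn)(void *ctx);

/* Example notifier that just counts how often the peer was kicked. */
void idc_count_notify(void *ctx)
{
    ++*(int *)ctx;
}

/* Copy up to len bytes into the ring; returns the number actually queued. */
size_t idc_write(struct idc_ring *r, const void *buf, size_t len,
                 idc_notify_fn notify, void *ctx)
{
    const uint8_t *p = buf;
    size_t space = RING_SIZE - (uint32_t)(r->prod - r->cons);
    size_t i;

    if (len > space)
        len = space;
    for (i = 0; i < len; i++)
        r->data[(uint32_t)(r->prod + i) % RING_SIZE] = p[i];
    /* A real ring needs a write barrier here, before publishing prod. */
    r->prod += (uint32_t)len;
    if (len && notify)
        notify(ctx);          /* tell the peer that data is waiting */
    return len;
}

/* Copy up to len bytes out of the ring; returns the number actually read. */
size_t idc_read(struct idc_ring *r, void *buf, size_t len)
{
    uint8_t *p = buf;
    size_t avail = (uint32_t)(r->prod - r->cons);
    size_t i;

    if (len > avail)
        len = avail;
    /* A real ring needs a read barrier here, after sampling prod. */
    for (i = 0; i < len; i++)
        p[i] = r->data[(uint32_t)(r->cons + i) % RING_SIZE];
    r->cons += (uint32_t)len;
    return len;
}
```

Short writes rather than blocking keep the sketch free of any scheduling assumptions; a caller that gets a short count would wait for the peer's notification and retry.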
Harry Butterworth
2005-Feb-27 16:31 UTC
Re: [Xen-devel] Proposal for init/kexec/hotplug format for Xen
On Sun, 2005-02-27 at 10:16 -0600, Anthony Liguori wrote:

> One of the things I'd like to see in the new management tools is a
> higher level interdomain communication library. A very useful
> abstraction would be an interdomain stream built on top of shared memory
> and event channels.
>
> Interdomain streams would be perfect for something like this.

Hey, this would help to make subsequent introduction of fault-tolerant
domains transparent too ;-) --- OK, I'll stop now :-)

-- 
Harry Butterworth <harry@hebutterworth.freeserve.co.uk>
Anthony Liguori
2005-Feb-27 16:42 UTC
Re: [Xen-devel] Proposal for init/kexec/hotplug format for Xen
Harry Butterworth wrote:

>>Interdomain streams would be perfect for something like this.
>
>Hey, this would help to make subsequent introduction of fault-tolerant
>domains transparent too ;-) --- OK, I'll stop now :-)

lol :-)

I can briefly outline the current design. Note the names are not really
important.

libxen-hcall
    Library for exporting hypercalls.

libxen-idc
    Library for interdomain-communication primitives (providing the
    Sys-V IPC mechanisms seems like a reasonable start).

libxen-store
    Interface for interacting with the domain store. Persisting daemons
    and these sorts of things link against this.

libxen
    Management tool interface. Provide high-level functions that
    authors of management tools would expect. In most cases, this would
    be all that would be needed to link against. Developers can link
    against lower level libraries if necessary.

All libraries should use autotools, pkg-config, proper versioning info,
etc. All interfaces should be documented (this is a requirement of EAL
besides being good practice).

We could begin work today on libxen-hcall and libxen-idc while we work
out what the store is going to look like and how the OF structure is
going to work. Thoughts?

Regards,
Anthony Liguori
Keir Fraser
2005-Feb-27 17:32 UTC
Re: [Xen-devel] Proposal for init/kexec/hotplug format for Xen
On 27 Feb 2005, at 16:42, Anthony Liguori wrote:

> libxen-idc
> Library for interdomain-communication primitives (providing the Sys-V
> IPC mechanisms seems like a reasonable start).

A reasonable interface to user-space, but what about when one endpoint
is inside the kernel? I'm also concerned that SysV IPC is usually used
between mutually trusting parties -- this is *not* necessarily the case
between management services and managed domains. We need e.g., control
over event-channel masking to be able to limit management resources
consumed by overzealous or malicious guests.

Providing SysV IPC would be generally useful, but perhaps not the right
thing for guest access to management services (persistent store, console
daemon, ...).

 -- Keir
Anthony Liguori
2005-Feb-27 17:59 UTC
Re: [Xen-devel] Proposal for init/kexec/hotplug format for Xen
Keir Fraser wrote:

> A reasonable interface to user-space, but what about when one endpoint
> is inside the kernel? I'm also concerned that SysV IPC is usually used
> between mutually trusting parties -- this is *not* necessarily the
> case between management services and managed domains. We need e.g.,
> control over event-channel masking to be able to limit management
> resources consumed by overzealous or malicious guests.

I'm not suggesting we use the SysV interfaces, just the mechanisms
(named shared memory, message queues, semaphores). As primitives, this
seems like a good place to start. As interfaces, I agree that SysV is
not the way to go :-)

The persistent store protocol could be implemented on top of these
mechanisms. Any primitives would also have to be shared between kernel
and userspace (either through common header files or perhaps a kernel
interface to userspace for these mechanisms).

It's really just about generalizing what's already being used.

Regards,
Anthony Liguori
Harry Butterworth
2005-Feb-27 18:12 UTC
Re: [Xen-devel] Proposal for init/kexec/hotplug format for Xen
On Sun, 2005-02-27 at 10:42 -0600, Anthony Liguori wrote:

> We could begin work today on libxen-hcall and libxen-idc while we work
> out what the store is going to look like and how the OF structure is
> going to work. Thoughts?

The most difficult aspect of the inter-domain communication API to
express from the point of view of forwards compatibility with a
fault-tolerant implementation is that, in a fault-tolerant system with
different levels of fault tolerance, some domains will come and go
whilst others persist across failures.

So, basically, the system model has to include the concept of different
domains coming and going and the API must be sufficient for surviving
domains to be able to implement a correct recovery.

In turn, one of the difficult aspects of recovery is the problem of
stale messages in the system. For example, if you have a driver domain
providing a fault-tolerant domain with access to shared storage then, if
the FT domain temporarily loses connectivity with the driver domain and
then reconnects, it faces the problem that there may still be some of
its old requests outstanding in the driver domain which could interfere
with its subsequent operation.

This kind of thing is a general problem that applies to all protocols in
fault-tolerant systems and there is a choice as to whether to deal with
stale messages on a per-protocol basis or come up with a global solution
that works for all protocols.

In the past, I've had some success with small clusters with a global
approach that basically quiesces the whole system when something
changes:

  - the domain topology is determined;
  - communication is established between all domains;
  - clients in all domains are told the communication network is
    connected;
  - clients make use of it;
  - something goes wrong;
  - the domain topology is redetermined;
  - all the clients are told the communication network is disconnected
    and they quiesce all stale operations;
  - once all clients are quiesced they are reconnected to a new epoch of
    the communications network;
  - in the new epoch, all clients are guaranteed there is no stale
    activity in progress from the previous epoch.

This deals with the problem of restarting protocols amongst the domains
that recover connectivity after a failure but, on its own, isn't quite
sufficient because of the problem of disconnected domains.

Consider the following example: A FT domain is served access to shared
storage by two independent driver domains. To start with, the FT domain
sends all I/O down a path through one of the driver domains. There is a
problem and that domain becomes disconnected from the FT domain. The FT
domain starts to send its I/O down through the other driver domain but
stale requests outstanding in the disconnected domain are still in
progress to the storage and interfere with its operation.

One possible solution to the problem of disconnected domains is to
maintain a lease such that when a domain is disconnected it is
sufficient to wait for the lease to expire to guarantee that the
disconnected domain will have stopped and will not interfere with
subsequent operation.

Another possible solution is to have some kind of fencing scheme which
can prevent the disconnected domain from being able to access the shared
resource after it is disconnected.

The global quiesce and lease schemes are OK for fail-stop fault-tolerant
systems with relatively infrequent failures but are not appropriate for
byzantine fault tolerant systems. For byzantine fault-tolerance you're
going to need to contain the effect of a failure to the minimum scope
and you can't rely on a domain stopping when its lease expires so you
need some kind of fencing scheme for shared resources.

Trying to think too far ahead is possibly dangerous but you might at
least like to evaluate any proposed IDC API against the above scenarios
to see how well it might serve you in the future.

-- 
Harry Butterworth <harry@hebutterworth.freeserve.co.uk>
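The epoch and lease ideas in this message can be made concrete with a small C sketch. Note the assumptions: stamping each request with an epoch number and rejecting mismatches is one possible way to detect stale messages, not something the message prescribes (it describes quiescing before reconnecting), and every name here (`idc_endpoint`, `idc_request`, `idc_accept`, `idc_peer_stopped`) is hypothetical. Time is an abstract counter supplied by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A request stamped with the epoch in which it was issued. */
struct idc_request {
    uint32_t epoch;
    uint32_t op;
};

/* Per-peer connection state as seen by one domain. */
struct idc_endpoint {
    uint32_t epoch;         /* current epoch of the communication network */
    bool connected;
    uint64_t lease_expiry;  /* after this time a disconnected peer must have stopped */
};

/* Topology changed: clients are told the network is disconnected. */
void idc_disconnect(struct idc_endpoint *e)
{
    e->connected = false;
}

/* All clients have quiesced: start a new epoch and reconnect. */
void idc_reconnect(struct idc_endpoint *e)
{
    e->epoch++;
    e->connected = true;
}

/* Accept a request only while connected and only from the current epoch. */
bool idc_accept(const struct idc_endpoint *e, const struct idc_request *rq)
{
    return e->connected && rq->epoch == e->epoch;
}

/*
 * Lease rule for fail-stop peers: once a peer is disconnected and its
 * lease has run out, it can be assumed to have stopped.
 */
bool idc_peer_stopped(const struct idc_endpoint *e, uint64_t now)
{
    return !e->connected && now >= e->lease_expiry;
}
```

As the message points out, the lease check only helps when peers really do stop on expiry; for byzantine failures a fencing mechanism at the shared resource would be needed instead, which this sketch does not attempt.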
Anthony Liguori
2005-Feb-27 18:24 UTC
Re: [Xen-devel] Proposal for init/kexec/hotplug format for Xen
Harry Butterworth wrote:

>>We could begin work today on libxen-hcall and libxen-idc while we work
>>out what the store is going to look like and how the OF structure is
>>going to work. Thoughts?
>
>The most difficult aspect of the inter-domain communication API to
>express from the point of view of forwards compatibility with a
>fault-tolerant implementation is that, in a fault-tolerant system with
>different levels of fault tolerance, some domains will come and go
>whilst others persist across failures.

I'm not sure fault-tolerance has to be implemented at the IDC primitive
level. That seems like something that's implemented at a slightly
higher-level in the stack.

For instance, I'm not sure how to even think about what a fault-tolerant
semaphore would be; however, I can certainly imagine being able to
implement a fault-tolerant protocol that uses semaphores and shared
memory.

I think fault-tolerant primitives can quite comfortably sit on top of
lower-level primitives. My initial reaction is that a fault-tolerant
primitive is going to have a fair bit of overhead. I think the interface
is going to be fairly different too. I'm not sure you want to pay the
price of transparent fault-tolerance in all circumstances.

It would probably be better to expect to implement a separate set of
fault tolerant devices and just design the non-tolerant devices for
maximum code-reuse.

Regards,
Anthony Liguori
Harry Butterworth
2005-Feb-27 18:55 UTC
Re: [Xen-devel] Proposal for init/kexec/hotplug format for Xen
On Sun, 2005-02-27 at 12:24 -0600, Anthony Liguori wrote:
> Harry Butterworth wrote:
>
> >>We could begin work today on libxen-hcall and libxen-idc while we work
> >>out what the store is going to look like and how the OF structure is
> >>going to work. Thoughts?
> >
> >The most difficult aspect of the inter-domain communication API to
> >express from the point of view of forwards compatibility with a
> >fault-tolerant implementation is that, in a fault-tolerant system with
> >different levels of fault tolerance, some domains will come and go
> >whilst others persist across failures.

When I said "from the point of view of forwards compatibility with a
fault-tolerant implementation" above I meant from the point of view of
forwards compatibility with a fault-tolerant _domain_ implementation.

> I'm not sure fault-tolerance has to be implemented at the IDC primitive
> level. That seems like something that's implemented at a slightly
> higher-level in the stack.

Right, the IDC primitives themselves do not have to be fault tolerant...

> It would probably be better to expect to implement a separate set of
> fault tolerant devices and just design the non-tolerant devices for
> maximum code-reuse.

...the trick is to implement a set of IDC primitives that A) can be used
as the underlying communication mechanism to implement fault tolerant
domains by, for example, using the replicated state machine approach to
create a fault-tolerant domain out of a set of base domains and B) are
then compatible with providing _exactly_ _the_ _same_ _API_ inside the
fault-tolerant domain such that the software running inside the FT
domain can be the same software that would run in a base domain.

With this approach, you only have to implement fault-tolerance once and
from then on you get it for free wherever you want it and you get
_maximum_ code reuse because you can reuse all of your non-fault
tolerant code as fault-tolerant code simply by running it unchanged but
in a fault-tolerant domain.

Do not underestimate the importance of this.

-- 
Harry Butterworth <harry@hebutterworth.freeserve.co.uk>
Keir Fraser
2005-Feb-27 18:57 UTC
Re: [Xen-devel] Proposal for init/kexec/hotplug format for Xen
On 27 Feb 2005, at 18:55, Harry Butterworth wrote:

> ..the trick is to implement a set of IDC primitives that A) can be used
> as the underlying communication mechanism to implement fault tolerant
> domains by, for example, using the replicated state machine approach to
> create a fault-tolerant domain out of a set of base domains and B) are
> then compatible with providing _exactly_ _the_ _same_ _API_ inside the
> fault-tolerant domain such that the software running inside the FT
> domain can be the same software that would run in a base domain.

The IDC primitives are likely to be *so* low level that this will be a
non-issue. Anything that needs to be fault-tolerance aware in any way I
think will be in higher layers.

Really the libidc is almost a no-op -- we have shared memory and
notifications -- semaphores and message queues on top of that is very
little code.

You sound like you are more worried about the device-channel
setup/teardown/probe/recovery code. That would be above libidc, if we
use libidc at all.

 -- Keir
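To give a feel for how little code a message queue over shared memory and notifications might need, here is a minimal single-producer, single-consumer ring sketch in C. Everything in it is hypothetical: `idc_ring`, `idc_send`, `idc_recv` and the `notify` callback (standing in for an event-channel notification) are illustrative names, not part of libidc or Xen, and the memory barriers a real cross-domain ring needs are only marked by comments.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <assert.h>

#define IDC_SLOTS     8      /* number of message slots (power of two) */
#define IDC_MSG_BYTES 32     /* fixed maximum message size */

/* Hypothetical layout of a page shared between two domains. */
struct idc_ring {
    uint32_t prod;                        /* advanced by the sender only */
    uint32_t cons;                        /* advanced by the receiver only */
    char slot[IDC_SLOTS][IDC_MSG_BYTES];
};

/* Stand-in for sending an event-channel notification to the peer. */
typedef void (*idc_notify_fn)(void *ctx);

/* Returns 0 on success, -1 if the ring is full or the message too big. */
int idc_send(struct idc_ring *r, const void *msg, size_t len,
             idc_notify_fn notify, void *ctx)
{
    if (len > IDC_MSG_BYTES || r->prod - r->cons == IDC_SLOTS)
        return -1;
    memcpy(r->slot[r->prod % IDC_SLOTS], msg, len);
    /* A real implementation needs a write barrier here. */
    r->prod++;
    if (notify)
        notify(ctx);
    return 0;
}

/* Copies one message out; returns 0, or -1 if the ring is empty. */
int idc_recv(struct idc_ring *r, void *msg, size_t len)
{
    if (len > IDC_MSG_BYTES || r->cons == r->prod)
        return -1;
    /* A real implementation needs a read barrier here. */
    memcpy(msg, r->slot[r->cons % IDC_SLOTS], len);
    r->cons++;
    return 0;
}

/* Counts notifications; used only by the self-check below. */
static void idc_count_notify(void *ctx)
{
    (*(int *)ctx)++;
}

/* Single-threaded self-check; returns 0 if the ring behaves as described. */
int idc_ring_selfcheck(void)
{
    struct idc_ring r;
    char out[IDC_MSG_BYTES];
    int kicks = 0, i;

    memset(&r, 0, sizeof(r));
    for (i = 0; i < IDC_SLOTS; i++)
        if (idc_send(&r, "ping", 5, idc_count_notify, &kicks) != 0)
            return 1;
    if (idc_send(&r, "ping", 5, idc_count_notify, &kicks) != -1)
        return 2;                         /* ring must report full */
    if (idc_recv(&r, out, 5) != 0 || strcmp(out, "ping") != 0)
        return 3;
    if (kicks != IDC_SLOTS)
        return 4;                         /* one notification per send */
    return 0;
}
```

A blocking receive (the "semaphore" part) would wait for the notification when `idc_recv` reports an empty ring; that wait is left out because it depends on the guest OS.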
Anthony Liguori
2005-Feb-27 19:10 UTC
Re: [Xen-devel] Proposal for init/kexec/hotplug format for Xen
Keir Fraser wrote:

> The IDC primitives are likely to be *so* low level that this will be a
> non-issue. Anything that needs to be fault-tolerance aware in any
> way I think will be in higher layers.
>
> Really the libidc is almost a no-op -- we have shared memory and
> notifications -- semaphores and message queues on top of that is very
> little code.

I agree.

> You sound like you are more worried about the device-channel
> setup/teardown/probe/recovery code. That would be above libidc, if we
> use libidc at all.

I'm currently prototyping a semaphore mechanism. One of the nice things
I realized is that if a message queue uses semaphores, then something
like xcs is unnecessary.

What I'm thinking about right now is how to assign out ports for
notification. It's somewhat non-trivial to figure out the best way to
manage that. Any thoughts?

Regards,
Anthony Liguori
Harry Butterworth
2005-Feb-27 19:38 UTC
Re: [Xen-devel] Proposal for init/kexec/hotplug format for Xen
On Sun, 2005-02-27 at 18:57 +0000, Keir Fraser wrote:
> On 27 Feb 2005, at 18:55, Harry Butterworth wrote:
>
> > ..the trick is to implement a set of IDC primitives that A) can be used
> > as the underlying communication mechanism to implement fault tolerant
> > domains by, for example, using the replicated state machine approach to
> > create a fault-tolerant domain out of a set of base domains and B) are
> > then compatible with providing _exactly_ _the_ _same_ _API_ inside the
> > fault-tolerant domain such that the software running inside the FT
> > domain can be the same software that would run in a base domain.
>
> The IDC primitives are likely to be *so* low level that this will be a
> non-issue. Anything that needs to be fault-tolerance aware in any
> way I think will be in higher layers.

Part of what I'm trying to say is that, if you get the architecture
right, the only code in the system that needs to be aware of its own
fault tolerance is the code that takes N base domains and creates a
fault-tolerant domain from them. All the other code in the system is
completely unaware of its own fault tolerance (it only needs
multi-pathing and hotplug to deal with external failures), even when it
is made fault-tolerant by virtue of running in a fault-tolerant domain.

If you get the architecture wrong, on the other hand, then you'll end
up implementing fault-tolerance over and over again in all the
different components that require it, which is much more work and much
more error prone.

> Really the libidc is almost a no-op -- we have shared memory and
> notifications -- semaphores and message queues on top of that is very
> little code.

So, for example, if the IDC API exposes the shared memory
implementation to the clients then the API will not be compatible with
the above trick, because shared memory will not work for the
communication between the base domains and the FT domain as it is not
compatible with the distributed consensus protocol and replication.
This would mean that all the clients written to the shared memory API
would need to be rewritten to a different API before they could run
inside a FT domain.

On the other hand, a shared memory implementation used to implement a
channel model API is OK, because the shared memory implementation will
be nice and efficient for inter-base-domain communication and the
shared memory implementation can be transparently replaced with a
message passing implementation for base-to-FT-domain communication,
which must go via the consensus protocol.

Similarly, an API which assumes all domains are mutually fully
connected will not work with the above trick, because other domains can
only communicate with the FT domains by routing inter-domain
communication through their base domains and the consensus protocol.

> You sound like you are more worried about the device-channel
> setup/teardown/probe/recovery code. That would be above libidc, if we
> use libidc at all.

I'm not worried :-)

Right, so the code above libidc, including the device-channel
setup/teardown/probe/recovery code, should run unchanged in both base
domains and FT domains. This code will link against the IDC API, which
means that the IDC API must be the same in both base and FT domains,
which means in turn that the IDC API is subject to the constraints
above.

Assuming you are ever interested in doing any of this at all of course,
which you might not be.

--
Harry Butterworth <harry@hebutterworth.freeserve.co.uk>
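The channel-model argument above can be sketched as an operations table that hides the transport from the client. This is only an illustration under stated assumptions: `idc_channel`, `idc_transport_ops`, `idc_shm_ops` and `idc_msg_ops` are hypothetical names, and both transports are simulated with a local buffer (the message-passing one merely counts where a real system would run a consensus round).

```c
#include <stddef.h>
#include <string.h>
#include <assert.h>

struct idc_channel;

/* Hypothetical transport interface: clients never see what is behind it. */
struct idc_transport_ops {
    int (*send)(struct idc_channel *ch, const void *buf, size_t len);
    int (*recv)(struct idc_channel *ch, void *buf, size_t len);
};

struct idc_channel {
    const struct idc_transport_ops *ops;
    char buf[64];        /* stand-in for a shared page or an in-flight message */
    size_t used;
    unsigned rounds;     /* simulated consensus rounds (message transport only) */
};

/* Transport 1: shared memory between base domains (simulated). */
static int shm_send(struct idc_channel *ch, const void *buf, size_t len)
{
    if (len > sizeof(ch->buf))
        return -1;
    memcpy(ch->buf, buf, len);
    ch->used = len;
    return 0;
}

/* Transport 2: message passing via a consensus protocol (simulated by
 * counting a round and then delivering through the same buffer). */
static int msg_send(struct idc_channel *ch, const void *buf, size_t len)
{
    ch->rounds++;
    return shm_send(ch, buf, len);
}

/* Shared receive path: returns the number of bytes copied, or -1. */
static int common_recv(struct idc_channel *ch, void *buf, size_t len)
{
    if (len < ch->used)
        return -1;
    memcpy(buf, ch->buf, ch->used);
    return (int)ch->used;
}

const struct idc_transport_ops idc_shm_ops = { .send = shm_send, .recv = common_recv };
const struct idc_transport_ops idc_msg_ops = { .send = msg_send, .recv = common_recv };

/* Client code written once against the channel API only. */
int idc_client_roundtrip(struct idc_channel *ch, const char *text)
{
    char reply[64];
    size_t len = strlen(text) + 1;

    if (ch->ops->send(ch, text, len) != 0)
        return -1;
    if (ch->ops->recv(ch, reply, sizeof(reply)) != (int)len)
        return -1;
    return strcmp(reply, text) == 0 ? 0 : -1;
}

/* Runs the same client over both transports; returns 0 on success. */
int idc_channel_selfcheck(void)
{
    struct idc_channel base = { .ops = &idc_shm_ops };
    struct idc_channel ft = { .ops = &idc_msg_ops };

    if (idc_client_roundtrip(&base, "hello") != 0)
        return 1;
    if (idc_client_roundtrip(&ft, "hello") != 0)
        return 2;
    if (base.rounds != 0 || ft.rounds != 1)
        return 3;
    return 0;
}
```

The point of the sketch is that `idc_client_roundtrip` compiles and runs unchanged whichever operations table the channel was created with.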
Keir Fraser
2005-Feb-27 21:49 UTC
Re: [Xen-devel] Proposal for init/kexec/hotplug format for Xen
On 27 Feb 2005, at 19:10, Anthony Liguori wrote:

>> You sound like you are more worried about the device-channel
>> setup/teardown/probe/recovery code. That would be above libidc, if we
>> use libidc at all.
>
> I'm currently prototyping a semaphore mechanism. One of the nice
> things I realized is that if a message queue uses semaphores, then
> something like xcs is unnecessary.

The purpose of xcs was to shim under xend and allow certain message
types to be redirected to other tools. It won't be required in the
next-gen tools anyway -- its usefulness is entirely orthogonal to
whether or not we have semaphores.

> What I'm thinking about right now is how to assign out ports for
> notification. It's somewhat non-trivial to figure out the best way to
> manage that. Any thoughts?

The difficulty may be that, in the case of normal SysV IPC, you have a
common OS instance to manage shmem namespaces and semaphores and so on.
For IDC over Xen you do not have this luxury, unless you modify Xen, or
you are building over higher-level communication primitives (which
perhaps defeats the purpose).

 -- Keir
Anthony Liguori
2005-Feb-27 22:39 UTC
Re: [Xen-devel] Proposal for init/kexec/hotplug format for Xen
Keir Fraser wrote:

>> What I'm thinking about right now is how to assign out ports for
>> notification. It's somewhat non-trivial to figure out the best way
>> to manage that. Any thoughts?
>
> The difficulty may be that, in the case of normal SysV IPC you have a
> common OS instance to manage shmem namespaces and semaphores and so
> on. For IDC over Xen you do not have this luxury, unless you modify
> Xen, or you are building over higher-level communication primitives
> (which perhaps defeats the purpose).

This is what makes the OF directory structure so interesting. The
per-domain OF structure could be used as an IDC namespace. This
requires no additional modification to Xen (other than what the OF
structure would).

Another approach would be to have a common, well-known shared memory
location for storing namespace info for each domain. That's a bit
hairy though.

I agree that we don't want to modify Xen in order to implement
higher-level communications mechanisms.

Regards,

> -- Keir
Harry Butterworth
2005-Feb-27 23:29 UTC
Re: [Xen-devel] Proposal for init/kexec/hotplug format for Xen
On Sun, 2005-02-27 at 16:39 -0600, Anthony Liguori wrote:
> Keir Fraser wrote:
>
> >> What I'm thinking about right now is how to assign out ports for
> >> notification. It's somewhat non-trivial to figure out the best way
> >> to manage that. Any thoughts?
> >
> > The difficulty may be that, in the case of normal SysV IPC you have a
> > common OS instance to manage shmem namespaces and semaphores and so
> > on. For IDC over Xen you do not have this luxury, unless you modify
> > Xen, or you are building over higher-level communication primitives
> > (which perhaps defeats the purpose).
>
> This is what makes the OF directory structure so interesting. The
> per-domain OF structure could be used as an IDC namespace. This
> requires no additional modification to Xen (other than what the OF
> structure would).

Say you have an IDC API which allows services to be installed in
different domains and accessed remotely from other domains.

For resource discovery and hotplug, what you require is a service that
implements a publish and subscribe API; call it a registry for the sake
of argument.

When a driver domain boots, it can be passed the IDC API address of the
registry to use to publish the devices that it is serving.

When a guest domain boots, it can be passed the IDC API address of the
registry to use to discover the devices it is allowed to use.

When driver domains discover devices, they publish the availability of
the devices in their registry.

When guest domains boot, they connect to their registry to subscribe to
notification of device availability. If the connection process has an
asynchronous completion then the protocol might specify that
notification for all devices already registered on connection is given
to the guest domain before completion of the connection process. This
allows the guest domain to know how long to wait for devices to appear
before continuing with the boot process.

When driver domains lose access to devices or discover new ones, they
keep their registry updated, which in turn notifies connected guest
domains.

If one of the classes of devices that can be advertised in a registry
is allowed to be a bus device which implements a registry interface,
then you can implement a hierarchical space.

Also, there's no reason that the driver domains need to be directly
connected to the registry used by the guest domains. You could, for
example, have the driver domain connect to a registry connected to a
domain implementing access control, which would then republish the
availability of the devices in separate registries for a number of
guest domains according to the access control policy configured for
those guests.

Another option might be for a guest domain to republish the
availability of devices in a child registry for a child domain created
out of the resources of the guest domain.

Maybe you can think of how to construct something like this based
around the OF directory structure.

--
Harry Butterworth <harry@hebutterworth.freeserve.co.uk>
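The registry described above can be sketched in a few lines of C. This is a single-address-space illustration only, assuming a synchronous subscribe: `registry`, `registry_publish`, `registry_withdraw` and `registry_subscribe` are hypothetical names, the callback stands in for an inter-domain notification, and access control and hierarchy are left out.

```c
#include <stddef.h>
#include <string.h>
#include <assert.h>

#define REG_MAX_DEVICES 16
#define REG_MAX_SUBS    4

/* Called with available=1 when a device appears and 0 when it goes away. */
typedef void (*reg_notify_fn)(void *ctx, const char *device, int available);

struct reg_subscriber {
    reg_notify_fn fn;
    void *ctx;
};

struct registry {
    const char *devices[REG_MAX_DEVICES];
    size_t ndevices;
    struct reg_subscriber subs[REG_MAX_SUBS];
    size_t nsubs;
};

/* Driver-domain side: advertise a device and notify current subscribers. */
int registry_publish(struct registry *r, const char *device)
{
    size_t i;

    if (r->ndevices == REG_MAX_DEVICES)
        return -1;
    r->devices[r->ndevices++] = device;
    for (i = 0; i < r->nsubs; i++)
        r->subs[i].fn(r->subs[i].ctx, device, 1);
    return 0;
}

/* Driver-domain side: a device was lost; returns -1 if it was not listed. */
int registry_withdraw(struct registry *r, const char *device)
{
    size_t i, j;

    for (i = 0; i < r->ndevices; i++) {
        if (strcmp(r->devices[i], device) != 0)
            continue;
        r->ndevices--;
        r->devices[i] = r->devices[r->ndevices];
        for (j = 0; j < r->nsubs; j++)
            r->subs[j].fn(r->subs[j].ctx, device, 0);
        return 0;
    }
    return -1;
}

/* Guest side: every device already published is replayed before this
 * returns, so the caller knows the initial set is complete on return. */
int registry_subscribe(struct registry *r, reg_notify_fn fn, void *ctx)
{
    size_t i;

    if (r->nsubs == REG_MAX_SUBS)
        return -1;
    for (i = 0; i < r->ndevices; i++)
        fn(ctx, r->devices[i], 1);
    r->subs[r->nsubs].fn = fn;
    r->subs[r->nsubs].ctx = ctx;
    r->nsubs++;
    return 0;
}

/* Example guest callback: keeps a count of currently visible devices. */
static void count_devices(void *ctx, const char *device, int available)
{
    (void)device;
    *(int *)ctx += available ? 1 : -1;
}

/* Walks through boot, hotplug and removal; returns 0 on success. */
int registry_selfcheck(void)
{
    struct registry reg;
    int visible = 0;

    memset(&reg, 0, sizeof(reg));
    registry_publish(&reg, "vbd0");
    registry_publish(&reg, "vif0");
    if (registry_subscribe(&reg, count_devices, &visible) != 0)
        return 1;
    if (visible != 2)
        return 2;                 /* both boot-time devices were replayed */
    registry_publish(&reg, "vbd1");
    if (visible != 3)
        return 3;                 /* hotplug reaches the subscriber */
    if (registry_withdraw(&reg, "vif0") != 0 || visible != 2)
        return 4;
    if (registry_withdraw(&reg, "nonesuch") != -1)
        return 5;
    return 0;
}
```

Republishing into a child registry, as suggested for access control, would amount to a subscriber callback that calls `registry_publish` on a second `struct registry` for the devices the policy allows.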
Keir Fraser
2005-Feb-28 08:59 UTC
Re: [Xen-devel] Proposal for init/kexec/hotplug format for Xen
On 27 Feb 2005, at 22:39, Anthony Liguori wrote:

>>> What I'm thinking about right now is how to assign out ports for
>>> notification. It's somewhat non-trivial to figure out the best way
>>> to manage that. Any thoughts?
>>
>> The difficulty may be that, in the case of normal SysV IPC you have a
>> common OS instance to manage shmem namespaces and semaphores and so
>> on. For IDC over Xen you do not have this luxury, unless you modify
>> Xen, or you are building over higher-level communication primitives
>> (which perhaps defeats the purpose).
>
> This is what makes the OF directory structure so interesting. The
> per-domain OF structure could be used as an IDC namespace. This
> requires no additional modification to Xen (other than what the OF
> structure would).

If you're talking about the persistent store, then each guest will
probably 'connect' via some custom protocol and present an event
channel at that time. Or we'll preallocate one when we build the
domain. Then all notifications will be on that channel.

 -- Keir
Rusty Russell
2005-Feb-28 12:06 UTC
Re: [Xen-devel] Proposal for init/kexec/hotplug format for Xen
On Sun, 2005-02-27 at 09:25 -0600, Anthony Liguori wrote:
> Keir Fraser wrote:
> > I like the idea of bringing out device discovery, bringup, teardown,
> > recovery all into its own driver or subsystem -- it seems the obvious
> > way to go. But I think the 'device tree' should be in the
> > to-be-designed persistent store, and we publish an interface to allow
> > guests to peek/poke that store.
>
> I think publishing domain-information in an OF-like tree would be great.

I'm not convinced of the persistent store idea, at least for this. I
think it's simpler to have it dropped into memory at boot (just like
the initrd image), and then later messages are sent which update it
(ie. hotplug) which have a similar form.

The former is implemented, but breaks when I actually test it (you can
see that code #ifdef'ed out, debugging now). I added simple routines
so that you can build a device tree and pass it to the domain builder.

Included below for your reading pleasure,
Rusty.

diff -urpN --exclude='*.py' --exclude dist --exclude html --exclude ps --exclude '*-xen0' --exclude '*-xenU' --exclude 'pristine-*' --exclude TAGS --exclude '*.o' --exclude asm-offsets.h --exclude asm-offsets.s --exclude .chkbuild --exclude '*~' --exclude '.*.d' --exclude classlist.h --exclude devlist.h --exclude asm --exclude banner.h --exclude compile.h --minimal xen-unstable/linux-2.6.10-xen-sparse/arch/xen/i386/kernel/setup.c xen-unstable-devtree/linux-2.6.10-xen-sparse/arch/xen/i386/kernel/setup.c
--- xen-unstable/linux-2.6.10-xen-sparse/arch/xen/i386/kernel/setup.c	2005-02-26 01:20:55.000000000 +1100
+++ xen-unstable-devtree/linux-2.6.10-xen-sparse/arch/xen/i386/kernel/setup.c	2005-02-28 09:27:36.000000000 +1100
@@ -1142,6 +1142,11 @@ static unsigned long __init setup_memory
 	}
 #endif
 
+	/* We're going to keep pointers into this */
+	if (xen_start_info.devtree_len)
+		reserve_bootmem(xen_start_info.devtree_start,
+				xen_start_info.devtree_len);
+
 	phys_to_machine_mapping = (unsigned int *)xen_start_info.mfn_list;
 
 	return max_low_pfn;
diff -urpN --exclude='*.py' --exclude dist --exclude html --exclude ps --exclude '*-xen0' --exclude '*-xenU' --exclude 'pristine-*' --exclude TAGS --exclude '*.o' --exclude asm-offsets.h --exclude asm-offsets.s --exclude .chkbuild --exclude '*~' --exclude '.*.d' --exclude classlist.h --exclude devlist.h --exclude asm --exclude banner.h --exclude compile.h --minimal xen-unstable/linux-2.6.10-xen-sparse/arch/xen/kernel/devtree.c xen-unstable-devtree/linux-2.6.10-xen-sparse/arch/xen/kernel/devtree.c
--- xen-unstable/linux-2.6.10-xen-sparse/arch/xen/kernel/devtree.c	1970-01-01 10:00:00.000000000 +1000
+++ xen-unstable-devtree/linux-2.6.10-xen-sparse/arch/xen/kernel/devtree.c	2005-02-28 10:08:44.000000000 +1100
@@ -0,0 +1,148 @@
+/******************************************************************************
+ * Simple device tree unbundling and query interface.
+ *
+ * Copyright (C) 2005 Rusty Russell IBM Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+#include <asm-xen/devtree.h>
+
+LIST_HEAD(devtree_root);
+spinlock_t devtree_lock;
+
+/* Skip separator, return next path element. */
+static inline const char *path_element(const char *path, unsigned int *len)
+{
+	path += strspn(path, "/");
+	*len = strcspn(path, "/");
+	return path;
+}
+
+struct xen_devtree *devtree_get(const char *path)
+{
+	struct xen_devtree *i;
+	struct list_head *list = &devtree_root;
+	unsigned int pathlen;
+
+	assert_spin_locked(&devtree_lock);
+
+	path = path_element(path, &pathlen);
+next_level:
+	list_for_each_entry(i, list, siblings) {
+		/* Doesn't match? Keep looking through names. */
+		if (strlen(i->name) != pathlen
+		    || memcmp(path, i->name, pathlen) != 0)
+			continue;
+
+		path = path_element(path, &pathlen);
+		/* Finished iteration? This is the element they want. */
+		if (pathlen == 0)
+			return i;
+
+		list = &i->children;
+		goto next_level;
+	}
+	return NULL;
+}
+
+/* This is recursive, but used for debugging simple trees, so an OK hack. */
+static void dump_tree(struct list_head *list, int indent)
+{
+	struct xen_devtree *i;
+
+	list_for_each_entry(i, list, siblings) {
+		unsigned int j;
+		for (j = 0; j < indent; j++)
+			printk(" ");
+		printk("- %s (%u)\n", i->name, i->length);
+		dump_tree(&i->children, indent+2);
+	}
+}
+
+#define DEVTREE_VERSION 1
+
+/* Initial routine to populate tree from flattened structure. */
+static void __init devtree_init(void)
+{
+	struct xen_devtree_header *head;
+	const struct xen_devtree_node *nodes;
+	const char *names;
+	void *data;
+	struct xen_devtree **dev_array;
+	unsigned int i, length;
+
+	BUG_ON(!list_empty(&devtree_root));
+
+	head = (void *)xen_start_info.devtree_start;
+	length = xen_start_info.devtree_len;
+
+	/* No device tree at all currently for domain 0. */
+	if (length == 0)
+		return;
+
+	if (head->last_comp_version > DEVTREE_VERSION)
+		panic("Incompatible Xen device tree version %i (%i)\n",
+		      head->version, head->last_comp_version);
+
+	/* Simple sanity checks. */
+	if (head->root_offset >= length
+	    || head->root_offset + head->num_nodes * sizeof(*nodes) >= length
+	    || head->name_offset >= length
+	    || head->data_offset >= length)
+		panic("Xen device tree v%i %u/%u/%u/%u overflows %lu\n",
+		      head->version, head->root_offset, head->num_nodes,
+		      head->name_offset, head->data_offset,
+		      length);
+
+	nodes = (void *)head + head->root_offset;
+	names = (void *)head + head->name_offset;
+	data = (void *)head + head->data_offset;
+
+	/* Set up temporary node number to node mapping. */
+	dev_array = kmalloc(sizeof(*dev_array) * head->num_nodes, GFP_KERNEL);
+
+	/* Element 0 is simply a root node marker; ignore it. */
+	for (i = 1; i < head->num_nodes; i++) {
+		BUG_ON(nodes[i].parent >= i);
+
+		dev_array[i] = kmalloc(sizeof(*dev_array[i]), GFP_KERNEL);
+		dev_array[i]->name = names + nodes[i].name;
+		dev_array[i]->length = nodes[i].len;
+		dev_array[i]->data = data + nodes[i].data;
+		INIT_LIST_HEAD(&dev_array[i]->children);
+		/* Locking here is simple paranoia. */
+		spin_lock_irq(&devtree_lock);
+		if (nodes[i].parent == 0)
+			list_add(&dev_array[i]->siblings, &devtree_root);
+		else
+			list_add(&dev_array[i]->siblings,
+				 &dev_array[nodes[i].parent]->children);
+		spin_unlock_irq(&devtree_lock);
+	}
+	kfree(dev_array);
+
+	dump_tree(&devtree_root, 0);
+}
+
+core_initcall(devtree_init);
diff -urpN --exclude='*.py' --exclude dist --exclude html --exclude ps --exclude '*-xen0' --exclude '*-xenU' --exclude 'pristine-*' --exclude TAGS --exclude '*.o' --exclude asm-offsets.h --exclude asm-offsets.s --exclude .chkbuild --exclude '*~' --exclude '.*.d' --exclude classlist.h --exclude devlist.h --exclude asm --exclude banner.h --exclude compile.h --minimal xen-unstable/linux-2.6.10-xen-sparse/include/asm-xen/devtree.h xen-unstable-devtree/linux-2.6.10-xen-sparse/include/asm-xen/devtree.h
--- xen-unstable/linux-2.6.10-xen-sparse/include/asm-xen/devtree.h	1970-01-01 10:00:00.000000000 +1000
+++ xen-unstable-devtree/linux-2.6.10-xen-sparse/include/asm-xen/devtree.h	2005-02-28 10:05:46.000000000 +1100
@@ -0,0 +1,27 @@
+#ifndef __ASM_DEVTREE_H__
+#define __ASM_DEVTREE_H__
+#include <linux/list.h>
+#include <linux/types.h>
+#include <linux/spinlock.h>
+#include <linux/init.h>
+
+struct xen_devtree
+{
+	/* Siblings list */
+	struct list_head siblings;
+
+	/* Children */
+	struct list_head children;
+
+	const char *name;
+	u32 length;
+	void *data;
+};
+
+/* Root of the tree, and the big lock to protect all allocations. */
+extern struct list_head devtree_root;
+extern spinlock_t devtree_lock;
+
+/* Must be holding devtree_lock. Returns NULL if no such entry. */
+struct xen_devtree *devtree_get(const char *path);
+#endif /* __ASM_DEVTREE_H__ */
diff -urpN --exclude='*.py' --exclude dist --exclude html --exclude ps --exclude '*-xen0' --exclude '*-xenU' --exclude 'pristine-*' --exclude TAGS --exclude '*.o' --exclude asm-offsets.h --exclude asm-offsets.s --exclude .chkbuild --exclude '*~' --exclude '.*.d' --exclude classlist.h --exclude devlist.h --exclude asm --exclude banner.h --exclude compile.h --minimal xen-unstable/tools/libxc/Makefile xen-unstable-devtree/tools/libxc/Makefile
--- xen-unstable/tools/libxc/Makefile	2005-02-26 01:20:58.000000000 +1100
+++ xen-unstable-devtree/tools/libxc/Makefile	2005-02-28 04:59:05.000000000 +1100
@@ -29,6 +29,7 @@ SRCS += xc_misc.c
 SRCS += xc_physdev.c
 SRCS += xc_private.c
 SRCS += xc_rrobin.c
+SRCS += xc_devtree.c
 SRCS += xc_vmx_build.c
 
 CFLAGS += -Wall
diff -urpN --exclude='*.py' --exclude dist --exclude html --exclude ps --exclude '*-xen0' --exclude '*-xenU' --exclude 'pristine-*' --exclude TAGS --exclude '*.o' --exclude asm-offsets.h --exclude asm-offsets.s --exclude .chkbuild --exclude '*~' --exclude '.*.d' --exclude classlist.h --exclude devlist.h --exclude asm --exclude banner.h --exclude compile.h --minimal xen-unstable/tools/libxc/xc.h xen-unstable-devtree/tools/libxc/xc.h
--- xen-unstable/tools/libxc/xc.h	2005-02-07 15:12:20.000000000 +1100
+++ xen-unstable-devtree/tools/libxc/xc.h	2005-02-28 07:07:35.000000000 +1100
@@ -169,6 +169,30 @@ int xc_shadow_control(int xc_handle,
 
 struct XcIOContext;
 
+/*\
+ * Functions to handle device trees, which are fed into the image in a
+ * flattened form to describe their environment. Device trees are
+ * trees of keyword value pairs, eg /cpu/1/foo = 1.
+\*/
+struct devtree;
+
+struct devtree *xc_devtree_root_alloc(void);
+
+#define xc_devtree_add(r, d, n, var) \
+	xc_devtree_add_bytes(r, d, n, &(var), sizeof(var), __alignof__(var))
+
+#define xc_devtree_add_string(r, d, n, string) \
+	xc_devtree_add_bytes(r, d, n, (string), strlen(string)+1, 1)
+
+struct devtree *xc_devtree_add_bytes(struct devtree *root,
+				     const char *dirname,
+				     const char *name,
+				     void *data,
+				     u32 length,
+				     u32 align);
+
+void xc_devtree_free(struct devtree *root);
+
 /**
  * This function will save a domain running Linux to an IO context. This
  * IO context is currently a private interface making this function difficult
@@ -198,7 +222,8 @@ int xc_linux_build(int xc_handle,
                    const char *cmdline,
                    unsigned int control_evtchn,
                    unsigned long flags,
-                   unsigned int vcpus);
+                   unsigned int vcpus,
+                   struct devtree *devtree);
 
 int xc_plan9_build (int xc_handle,
diff -urpN --exclude='*.py' --exclude dist --exclude html --exclude ps --exclude '*-xen0' --exclude '*-xenU' --exclude 'pristine-*' --exclude TAGS --exclude '*.o' --exclude asm-offsets.h --exclude asm-offsets.s --exclude .chkbuild --exclude '*~' --exclude '.*.d' --exclude classlist.h --exclude devlist.h --exclude asm --exclude banner.h --exclude compile.h --minimal xen-unstable/tools/libxc/xc_devtree.c xen-unstable-devtree/tools/libxc/xc_devtree.c
--- xen-unstable/tools/libxc/xc_devtree.c	1970-01-01 10:00:00.000000000 +1000
+++ xen-unstable-devtree/tools/libxc/xc_devtree.c	2005-02-28 10:09:32.000000000 +1100
@@ -0,0 +1,222 @@
+/*
+ * Code to set up the device tree for a partition.
+ *
+ * Copyright (C) 2005 Rusty Russell IBM Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+#include "xc_private.h"
+
+#define DEVTREE_VERSION 1
+#define DEVTREE_COMPAT_VERSION 1
+
+struct devtree
+{
+	struct devtree *sibling;
+	struct devtree *children;
+
+	u32 datalen;
+	u32 align;
+	void *data;
+
+	char name[0];
+};
+
+/* Skip separator, return next path element. */
+static inline const char *path_element(const char *path, unsigned int *len)
+{
+	path += strspn(path, "/");
+	*len = strcspn(path, "/");
+	return path;
+}
+
+/* Find this name (of this length) under the root. */
+static struct devtree *find(struct devtree *root,
+			    const char *name, unsigned int len)
+{
+	struct devtree *i;
+	for (i = root->children; i; i = i->sibling)
+		if (strlen(i->name) == len && memcmp(name, i->name, len) == 0)
+			return i;
+	return NULL;
+}
+
+static struct devtree *xc_dt_new(const char *name, unsigned int namelen,
+				 void *data, unsigned int datalen,
+				 unsigned int align)
+{
+	struct devtree *dt = malloc(sizeof(*dt) + namelen + 1 + datalen);
+
+	if (!dt)
+		return NULL;
+
+	dt->children = NULL;
+	dt->sibling = NULL;
+
+	memcpy(dt->name, name, namelen);
+	dt->name[namelen] = '\0';
+
+	dt->datalen = datalen;
+	dt->align = align;
+	dt->data = dt->name + namelen + 1;
+	memcpy(dt->data, data, datalen);
+	return dt;
+}
+
+struct devtree *xc_devtree_root_alloc(void)
+{
+	return xc_dt_new(NULL, 0, NULL, 0, 1);
+}
+
+/* Find directory, make it if necessary. */
+static struct devtree *find_dir(struct devtree *root, const char *dirname)
+{
+	unsigned int dirlen;
+	struct devtree *subdir;
+
+	dirname = path_element(dirname, &dirlen);
+	if (dirlen == 0)
+		return root;
+
+	subdir = find(root, dirname, dirlen);
+	if (!subdir) {
+		subdir = xc_dt_new(dirname, dirlen, NULL, 0, 0);
+		if (!subdir)
+			return NULL;
+		subdir->sibling = root->children;
+		root->children = subdir;
+	}
+	return find_dir(subdir, dirname + dirlen);
+}
+
+struct devtree *xc_devtree_add_bytes(struct devtree *root,
+				     const char *dirname,
+				     const char *name,
+				     void *data,
+				     u32 length,
+				     u32 align)
+{
+	struct devtree *dir, *dt;
+
+	dir = find_dir(root, dirname);
+	if (!dir)
+		return NULL;
+
+	dt = xc_dt_new(name, strlen(name), data, length, align);
+	if (!dt)
+		return NULL;
+
+	dt->sibling = dir->children;
+	dir->children = dt;
+	return root;
+}
+
+void xc_devtree_free(struct devtree *root)
+{
+	struct devtree *i;
+
+	for (i = root->children; i; i = i->sibling)
+		xc_devtree_free(i);
+	free(root);
+}
+
+/* Assume alignment is a power of 2. */
+static inline u32 align_up(u32 val, u32 alignment)
+{
+	return (val + alignment-1) & ~(alignment-1);
+}
+
+static u32 calculate_sizes(struct devtree *root, u32 *namelen, u32 *datalen,
+			   u32 *maxalign)
+{
+	u32 nodes = 1;
+	struct devtree *i;
+
+	*namelen += strlen(root->name) + 1;
+	*datalen = align_up(*datalen, root->align) + root->datalen;
+
+	if (root->align > *maxalign)
+		*maxalign = root->align;
+
+	for (i = root->children; i; i = i->sibling)
+		nodes += calculate_sizes(i, namelen, datalen, maxalign);
+	return nodes;
+}
+
+/* FIXME: In future if trees get big, search for duplicate names. */
+static void copy_nodes(struct devtree *root, u32 parent_index,
+		       struct xen_devtree_node *nodes, u32 *nodeindex,
+		       char *names, u32 *nameindex,
+		       void *data, u32 *dataindex)
+{
+	struct devtree *i;
+
+	/* Copy myself in. */
+	nodes[*nodeindex].parent = parent_index;
+	strcpy(names + *nameindex, root->name);
+	nodes[*nodeindex].name = *nameindex;
+	*nameindex += strlen(root->name) + 1;
+	*dataindex = align_up(*dataindex, root->align);
+	memcpy(data + *dataindex, root->data, root->datalen);
+	nodes[*nodeindex].data = *dataindex;
+	*dataindex += root->datalen;
+	nodes[*nodeindex].len = root->datalen;
+
+	parent_index = (*nodeindex);
+	(*nodeindex)++;
+
+	/* Copy children in. */
+	for (i = root->children; i; i = i->sibling)
+		copy_nodes(i, parent_index, nodes, nodeindex,
+			   names, nameindex, data, dataindex);
+}
+
+struct xen_devtree_header *xc_devtree_flatten(struct devtree *root, u32 *len)
+{
+	u32 nodes, namelen, datalen, maxalign;
+	u32 nodeindex, nameindex, dataindex;
+	struct xen_devtree_header *hdr;
+
+	namelen = datalen = 0;
+	maxalign = 1;
+
+	/* Walk once to calculate lengths (and hence offsets). */
+	nodes = calculate_sizes(root, &namelen, &datalen, &maxalign);
+
+	*len = sizeof(*hdr) + nodes * sizeof(struct xen_devtree_node);
+	*len += namelen;
+	*len = align_up(*len, maxalign);
+	*len += datalen;
+
+	hdr = malloc(*len);
+	if (!hdr)
+		return NULL;
+
+	hdr->version = DEVTREE_VERSION;
+	hdr->last_comp_version = DEVTREE_COMPAT_VERSION;
+	hdr->root_offset = sizeof(*hdr);
+	hdr->num_nodes = nodes;
+	hdr->name_offset = hdr->root_offset
+		+ nodes * sizeof(struct xen_devtree_node);
+	hdr->data_offset = align_up(hdr->name_offset + namelen, maxalign);
+
+	nodeindex = nameindex = dataindex = 0;
+	/* Walk a second time actually copying. */
+	copy_nodes(root, 0,
+		   (void *)hdr + hdr->root_offset, &nodeindex,
+		   (void *)hdr + hdr->name_offset, &nameindex,
+		   (void *)hdr + hdr->data_offset, &dataindex);
+	return hdr;
+}
diff -urpN --exclude='*.py' --exclude dist --exclude html --exclude ps --exclude '*-xen0' --exclude '*-xenU' --exclude 'pristine-*' --exclude TAGS --exclude '*.o' --exclude asm-offsets.h --exclude asm-offsets.s --exclude .chkbuild --exclude '*~' --exclude '.*.d' --exclude classlist.h --exclude devlist.h --exclude asm --exclude banner.h --exclude compile.h --minimal xen-unstable/tools/libxc/xc_linux_build.c xen-unstable-devtree/tools/libxc/xc_linux_build.c
--- xen-unstable/tools/libxc/xc_linux_build.c	2005-02-26 01:20:58.000000000 +1100
+++ xen-unstable-devtree/tools/libxc/xc_linux_build.c	2005-02-28 06:25:05.000000000 +1100
@@ -52,7 +52,8 @@ static int setup_guest(int xc_handle,
                        unsigned long shared_info_frame,
                        unsigned int control_evtchn,
                        unsigned long flags,
-                       unsigned int vcpus)
+                       unsigned int vcpus,
+                       void *flat_tree, unsigned long flat_size)
 {
     l1_pgentry_t *vl1tab=NULL, *vl1e=NULL;
     l2_pgentry_t *vl2tab=NULL, *vl2e=NULL;
@@ -72,6 +73,8 @@ static int setup_guest(int xc_handle,
     struct domain_setup_info dsi;
     unsigned long vinitrd_start;
     unsigned long vinitrd_end;
+    unsigned long vdevtree_start;
+    unsigned long vdevtree_end;
     unsigned long vphysmap_start;
     unsigned long vphysmap_end;
     unsigned long vstartinfo_start;
@@ -110,7 +113,9 @@ static int setup_guest(int xc_handle,
      */
     vinitrd_start = round_pgup(dsi.v_end);
     vinitrd_end = vinitrd_start + initrd_len;
-    vphysmap_start = round_pgup(vinitrd_end);
+    vdevtree_start = round_pgup(vinitrd_end);
+    vdevtree_end = vdevtree_start + flat_size;
+    vphysmap_start = round_pgup(vdevtree_end);
     vphysmap_end = vphysmap_start + (nr_pages * sizeof(unsigned long));
     vpt_start = round_pgup(vphysmap_end);
     for ( nr_pt_pages = 2; ; nr_pt_pages++ )
@@ -131,6 +136,7 @@ static int setup_guest(int xc_handle,
     printf("VIRTUAL MEMORY ARRANGEMENT:\n"
            " Loaded kernel: %08lx->%08lx\n"
            " Init. ramdisk: %08lx->%08lx\n"
+           " Device tree: %08lx->%08lx\n"
            " Phys-Mach map: %08lx->%08lx\n"
            " Page tables: %08lx->%08lx\n"
            " Start info: %08lx->%08lx\n"
@@ -138,6 +144,7 @@ static int setup_guest(int xc_handle,
            " TOTAL: %08lx->%08lx\n",
            dsi.v_kernstart, dsi.v_kernend,
            vinitrd_start, vinitrd_end,
+           vdevtree_start, vdevtree_end,
            vphysmap_start, vphysmap_end,
            vpt_start, vpt_end,
            vstartinfo_start, vstartinfo_end,
@@ -187,6 +194,15 @@ static int setup_guest(int xc_handle,
         }
     }
 
+    /* Load the flattened device tree. */
+    for ( i = (vdevtree_start - dsi.v_start);
+          i < (vdevtree_end - dsi.v_start); i += PAGE_SIZE )
+    {
+        xc_copy_to_domain_page(xc_handle, dom,
+                               page_array[i>>PAGE_SHIFT],
+                               flat_tree + i);
+    }
+
     if ( (mmu = init_mmu_updates(xc_handle, dom)) == NULL )
         goto error_out;
 
@@ -323,7 +339,8 @@ int xc_linux_build(int xc_handle,
                    const char *cmdline,
                    unsigned int control_evtchn,
                    unsigned long flags,
-                   unsigned int vcpus)
+                   unsigned int vcpus,
+                   struct devtree *devtree)
 {
     dom0_op_t launch_op, op;
     int initrd_fd = -1;
@@ -333,7 +350,9 @@ int xc_linux_build(int xc_handle,
     unsigned long nr_pages;
     char *image = NULL;
     unsigned long image_size, initrd_size=0;
+    u32 flat_size = 0;
     unsigned long vstartinfo_start, vkern_entry;
+    struct xen_devtree_header *flat_tree = NULL;
 
     if ( (nr_pages = xc_get_tot_pages(xc_handle, domid)) < 0 )
     {
@@ -383,13 +402,22 @@ int xc_linux_build(int xc_handle,
         ERROR("Domain is already constructed");
         goto error_out;
     }
+    if ( devtree )
+    {
+        flat_tree = xc_devtree_flatten(devtree, &flat_size);
+        if ( !flat_tree ) {
+            ERROR("Out of memory flattening device tree");
+            goto error_out;
+        }
+    }
 
     if ( setup_guest(xc_handle, domid, image, image_size,
                      initrd_gfd, initrd_size, nr_pages,
                      &vstartinfo_start, &vkern_entry,
                      ctxt, cmdline,
                      op.u.getdomaininfo.shared_info_frame,
-                     control_evtchn, flags, vcpus) < 0 )
+                     control_evtchn, flags, vcpus,
+                     flat_tree, flat_size) < 0 )
     {
         ERROR("Error constructing guest OS");
         goto error_out;
@@
-401,6 +429,8 @@ int xc_linux_build(int xc_handle, gzclose(initrd_gfd); if ( image != NULL ) free(image); + if ( flat_tree ) + free(flat_tree); ctxt->flags = 0; diff -urpN --exclude=''*.py'' --exclude dist --exclude html --exclude ps --exclude ''*-xen0'' --exclude ''*-xenU'' --exclude ''pristine-*'' --exclude TAGS --exclude ''*.o'' --exclude asm-offsets.h --exclude asm-offsets.s --exclude .chkbuild --exclude ''*~'' --exclude ''.*.d'' --exclude classlist.h --exclude devlist.h --exclude asm --exclude banner.h --exclude compile.h --minimal xen-unstable/tools/libxc/xc_private.h xen-unstable-devtree/tools/libxc/xc_private.h --- xen-unstable/tools/libxc/xc_private.h 2005-02-07 15:12:19.000000000 +1100 +++ xen-unstable-devtree/tools/libxc/xc_private.h 2005-02-28 07:08:07.000000000 +1100 @@ -202,4 +202,9 @@ void xc_map_memcpy(unsigned long dst, ch int xch, u32 dom, unsigned long *parray, unsigned long vstart); +/* Returns a flattened structure: free() as normal. */ +struct xen_devtree_header; +struct devtree; +struct xen_devtree_header *xc_devtree_flatten(struct devtree *root, u32 *len); + #endif /* __XC_PRIVATE_H__ */ diff -urpN --exclude=''*.py'' --exclude dist --exclude html --exclude ps --exclude ''*-xen0'' --exclude ''*-xenU'' --exclude ''pristine-*'' --exclude TAGS --exclude ''*.o'' --exclude asm-offsets.h --exclude asm-offsets.s --exclude .chkbuild --exclude ''*~'' --exclude ''.*.d'' --exclude classlist.h --exclude devlist.h --exclude asm --exclude banner.h --exclude compile.h --minimal xen-unstable/tools/python/xen/lowlevel/xc/xc.c xen-unstable-devtree/tools/python/xen/lowlevel/xc/xc.c --- xen-unstable/tools/python/xen/lowlevel/xc/xc.c 2005-02-26 01:20:59.000000000 +1100 +++ xen-unstable-devtree/tools/python/xen/lowlevel/xc/xc.c 2005-02-28 10:18:30.000000000 +1100 @@ -349,19 +349,34 @@ static PyObject *pyxc_linux_build(PyObje u32 dom; char *image, *ramdisk = NULL, *cmdline = ""; int control_evtchn, flags = 0, vcpus = 1; + struct devtree *root = NULL; static char 
*kwd_list[] = { "dom", "control_evtchn", "image", "ramdisk", "cmdline", "flags", "vcpus", NULL }; +#if 0 + root = xc_devtree_root_alloc(); + root = xc_devtree_add_string(root, "/", "test", "teststring"); + { + int x = 7; + root = xc_devtree_add(root, "/dir/subdir", "seven", x); + } +#endif + if ( !PyArg_ParseTupleAndKeywords(args, kwds, "iis|ssii", kwd_list, &dom, &control_evtchn, &image, &ramdisk, &cmdline, &flags, &vcpus) ) return NULL; + if ( xc_linux_build(xc->xc_handle, dom, image, - ramdisk, cmdline, control_evtchn, flags, vcpus) != 0 ) + ramdisk, cmdline, control_evtchn, flags, vcpus, root) != 0 ) return PyErr_SetFromErrno(xc_error); + +#if 0 + xc_devtree_free(root); +#endif Py_INCREF(zero); return zero; diff -urpN --exclude=''*.py'' --exclude dist --exclude html --exclude ps --exclude ''*-xen0'' --exclude ''*-xenU'' --exclude ''pristine-*'' --exclude TAGS --exclude ''*.o'' --exclude asm-offsets.h --exclude asm-offsets.s --exclude .chkbuild --exclude ''*~'' --exclude ''.*.d'' --exclude classlist.h --exclude devlist.h --exclude asm --exclude banner.h --exclude compile.h --minimal xen-unstable/xen/include/public/xen.h xen-unstable-devtree/xen/include/public/xen.h --- xen-unstable/xen/include/public/xen.h 2005-02-26 01:21:04.000000000 +1100 +++ xen-unstable-devtree/xen/include/public/xen.h 2005-02-28 04:50:02.000000000 +1100 @@ -428,8 +428,30 @@ typedef struct { _MEMORY_PADDING(F); memory_t mod_len; /* 56: Size (bytes) of pre-loaded module. */ _MEMORY_PADDING(G); - u8 cmd_line[MAX_CMDLINE]; /* 64 */ -} PACKED start_info_t; /* 320 bytes */ + memory_t devtree_start; /* 64: VIRTUAL address of device tree. */ + _MEMORY_PADDING(H); + memory_t devtree_len; /* 72: Size (bytes) of device tree. */ + _MEMORY_PADDING(I); + u8 cmd_line[MAX_CMDLINE]; /* 80 */ +} PACKED start_info_t; /* 336 bytes */ + +/* Nodes are arranged so parent always preceeds us. 
*/ +struct xen_devtree_node +{ + u32 parent; /* Index of parent (root has parent 0 = self) */ + u32 name; /* Offset of name in name table. */ + u32 data, len; /* Offset of data in data table, and length */ +}; + +struct xen_devtree_header +{ + u32 version; /* Version of this structure. */ + u32 last_comp_version; /* You must be >= this to read it. */ + u32 root_offset; /* Offset to root of tree. */ + u32 num_nodes; /* Total number of nodes in tree. */ + u32 name_offset; /* Offset to (nul-terminated) names table. */ + u32 data_offset; /* Offset to data table. */ +}; /* These flags are passed in the 'flags' field of start_info_t. */ #define SIF_PRIVILEGED (1<<0) /* Is the domain privileged? */
-- 
A bad analogy is like a leaky screwdriver -- Richard Braakman
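For readers who want to see how a guest might consume this layout, here is a minimal sketch of a reader. Only the two struct layouts come from the patch; the `dt_*` helpers, the hand-built `dt_example` blob, the version number 1, and the choice to count the trailing NUL in a string's length are all assumptions made for illustration. It relies on the property stated in the patch comment that a parent always precedes its children.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layouts as in the patch above, with u32 spelled uint32_t. */
struct xen_devtree_node {
    uint32_t parent;    /* index of parent; the root is its own parent */
    uint32_t name;      /* offset into the names table */
    uint32_t data, len; /* offset into the data table, and length */
};

struct xen_devtree_header {
    uint32_t version, last_comp_version;
    uint32_t root_offset, num_nodes;
    uint32_t name_offset, data_offset;
};

/* Hypothetical reader helpers: none of these names appear in the patch. */
static const struct xen_devtree_node *dt_nodes(const struct xen_devtree_header *hdr)
{
    return (const struct xen_devtree_node *)((const char *)hdr + hdr->root_offset);
}

static const char *dt_name(const struct xen_devtree_header *hdr, uint32_t idx)
{
    assert(idx < hdr->num_nodes);
    return (const char *)hdr + hdr->name_offset + dt_nodes(hdr)[idx].name;
}

/* A reader that understands format version `mine` may parse the blob only
 * if the blob's last compatible version is not newer than that. */
static int dt_compatible(const struct xen_devtree_header *hdr, uint32_t mine)
{
    return hdr->last_comp_version <= mine;
}

/* Return the index of the child of `parent` called `name`, or 0 if there
 * is none (index 0 is the root, which is never anyone's child). */
static uint32_t dt_find_child(const struct xen_devtree_header *hdr,
                              uint32_t parent, const char *name)
{
    const struct xen_devtree_node *nodes = dt_nodes(hdr);
    uint32_t i;

    /* Parents always precede their children, so start after `parent`. */
    for (i = parent + 1; i < hdr->num_nodes; i++)
        if (nodes[i].parent == parent && strcmp(dt_name(hdr, i), name) == 0)
            return i;
    return 0;
}

/* Return a pointer to the node's data and store its length in *len. */
static const void *dt_data(const struct xen_devtree_header *hdr,
                           uint32_t idx, uint32_t *len)
{
    assert(idx < hdr->num_nodes);
    *len = dt_nodes(hdr)[idx].len;
    return (const char *)hdr + hdr->data_offset + dt_nodes(hdr)[idx].data;
}

/* Hand-built example blob: root -> "dir" -> "test" = "teststring". */
struct dt_example_blob {
    struct xen_devtree_header hdr;
    struct xen_devtree_node nodes[3];
    char names[12];   /* "", "dir", "test" at offsets 0, 1, 5 */
    char data[12];    /* "teststring" plus its NUL */
};

static const struct dt_example_blob dt_example = {
    { 1, 1,
      offsetof(struct dt_example_blob, nodes), 3,
      offsetof(struct dt_example_blob, names),
      offsetof(struct dt_example_blob, data) },
    { { 0, 0, 0, 0 },     /* root: its own parent, empty name, no data */
      { 0, 1, 0, 0 },     /* "dir": child of root, no data */
      { 1, 5, 0, 11 } },  /* "test": child of "dir", 11 bytes of data */
    "\0dir\0test",
    "teststring"
};
```

A lookup of `/dir/test` is then two `dt_find_child` calls starting from index 0, followed by `dt_data` on the result. Because the tables are addressed purely by offsets from the header, the blob can be copied into the guest's address space without any fix-ups.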
harry@hebutterworth.freeserve.co.uk
2005-Feb-28 12:22 UTC
Re: Re: [Xen-devel] Proposal for init/kexec/hotplug format for Xen
Why have two separate mechanisms when the mechanism required for hotplug can be used to build the tree from scratch?
Rusty Russell
2005-Feb-28 13:02 UTC
Re: Re: [Xen-devel] Proposal for init/kexec/hotplug format for Xen
On Mon, 2005-02-28 at 13:22 +0100, harry@hebutterworth.freeserve.co.uk wrote:
> Why have two separate mechanisms when the mechanism required for
> hotplug can be used to build the tree from scratch?

The only difference would be the transport for hotplug events, which is infrastructure which needs to exist anyway to transport other information between domains.

We'll see when the code's finished, but I see the code being:

__init boot_devtree_setup()
{
	devtree_add(boot_devtree_ptr);
}

hotplug_receive_event()
{
	receive data
	if (op == ADD)
		devtree_add(data);
	else if (op == REMOVE)
		devtree_remove(data);
}

i.e. the mere difference in transport should be a few lines of code.

Rusty.
-- 
A bad analogy is like a leaky screwdriver -- Richard Braakman
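The pseudocode in this message can be fleshed out into something compilable. What follows is only a sketch under stated assumptions: the fixed-size table, the path strings, `struct hotplug_event`, and `devtree_count` are invented for illustration, and the boot data is modelled as a plain array of paths rather than the flattened tree. The point it tries to show is the one the message makes, that the boot path and the hotplug path funnel into the same `devtree_add`.

```c
#include <assert.h>
#include <string.h>

#define DT_MAX_DEVS 16   /* hypothetical table size */
#define DT_PATH_LEN 32   /* hypothetical maximum path length, including NUL */

enum hotplug_op { OP_ADD, OP_REMOVE };

struct hotplug_event {   /* hypothetical event layout */
    enum hotplug_op op;
    const char *path;
};

/* Toy in-guest device table; an empty string marks a free slot. */
static char devs[DT_MAX_DEVS][DT_PATH_LEN];

static int devtree_find(const char *path)
{
    int i;
    for (i = 0; i < DT_MAX_DEVS; i++)
        if (devs[i][0] != '\0' && strcmp(devs[i], path) == 0)
            return i;
    return -1;
}

/* Add a device; adding one that is already present is a no-op. */
static int devtree_add(const char *path)
{
    size_t n;
    int i;

    assert(path != NULL);
    n = strlen(path);
    if (n == 0 || n >= DT_PATH_LEN)
        return -1;
    if (devtree_find(path) >= 0)
        return 0;
    for (i = 0; i < DT_MAX_DEVS; i++) {
        if (devs[i][0] == '\0') {
            memcpy(devs[i], path, n + 1);
            return 0;
        }
    }
    return -1;   /* table full */
}

static int devtree_remove(const char *path)
{
    int i = devtree_find(path);
    if (i < 0)
        return -1;
    devs[i][0] = '\0';
    return 0;
}

static int devtree_count(void)
{
    int i, n = 0;
    for (i = 0; i < DT_MAX_DEVS; i++)
        if (devs[i][0] != '\0')
            n++;
    return n;
}

/* Boot path: everything handed over at boot goes through devtree_add(). */
static void boot_devtree_setup(const char *const *paths, int n)
{
    int i;
    for (i = 0; i < n; i++)
        devtree_add(paths[i]);
}

/* Hotplug path: same add/remove routines, different transport. */
static int hotplug_receive_event(const struct hotplug_event *ev)
{
    switch (ev->op) {
    case OP_ADD:    return devtree_add(ev->path);
    case OP_REMOVE: return devtree_remove(ev->path);
    }
    return -1;
}
```

Calling `boot_devtree_setup` with two paths and then feeding `hotplug_receive_event` one add and one remove leaves two devices in the table, with no code specific to either transport beyond the two small entry points.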
harry@hebutterworth.freeserve.co.uk
2005-Feb-28 13:08 UTC
Re: Re: Re: [Xen-devel] Proposal for init/kexec/hotplug format for Xen
You are thinking about the code differences from the point of view of the code running inside the domain. What about the code outside the domain? For example, is the code required to remove a device from one domain and give it to another going to have to be different depending on whether the target domain has been booted yet?

> Message date : Feb 28 2005, 01:01 PM
> From : "Rusty Russell" <rusty@rustcorp.com.au>
> To : harry@hebutterworth.freeserve.co.uk
> Copy to : "Anthony Liguori" <aliguori@us.ibm.com>, "Keir Fraser" <Keir.Fraser@cl.cam.ac.uk>, "Jeremy Katz" <katzj@redhat.com>, "Xen Mailing List" <xen-devel@lists.sourceforge.net>
> Subject : Re: Re: [Xen-devel] Proposal for init/kexec/hotplug format for Xen
>
> On Mon, 2005-02-28 at 13:22 +0100, harry@hebutterworth.freeserve.co.uk
> wrote:
> > Why have two separate mechanisms when the mechanism required for
> > hotplug can be used to build the tree from scratch?
>
> The only difference would be the transport for hotplug events, which is
> infrastructure which needs to exist anyway to transport other
> information between domains.
>
> We'll see when the code's finished, but I see the code being:
>
> __init boot_devtree_setup()
> {
> 	devtree_add(boot_devtree_ptr);
> }
>
> hotplug_receive_event()
> {
> 	receive data
> 	if (op == ADD)
> 		devtree_add(data);
> 	else if (op == REMOVE)
> 		devtree_remove(data);
> }
>
> ie. the mere difference in transport should be a few lines of code.
> Rusty.
> --
> A bad analogy is like a leaky screwdriver -- Richard Braakman
Rusty Russell
2005-Feb-28 13:32 UTC
Re: Re: Re: [Xen-devel] Proposal for init/kexec/hotplug format for Xen
On Mon, 2005-02-28 at 14:08 +0100, harry@hebutterworth.freeserve.co.uk wrote:
> You are thinking about the code differences from the point of view of
> the code running inside the domain. What about the code outside the
> domain, for example, is the code required to remove a device from one
> domain and give it to another going to have to be different depending
> on whether the target domain has been booted yet?

Fundamentally true, because you can't send a hotplug event to a domain which hasn't been booted yet. Whatever your mechanism, this difference has to be handled at some level, and I don't think the difference in transport makes it harder.

Cheers,
Rusty.
-- 
A bad analogy is like a leaky screwdriver -- Richard Braakman
Harry Butterworth
2005-Feb-28 13:40 UTC
Re: Re: Re: [Xen-devel] Proposal for init/kexec/hotplug format for Xen
No, not fundamentally true. The registry approach decouples the generation of the hotplug events from the provider of devices such that if the domain hasn't booted yet then the provider of devices doesn't care: the registry will generate all the required hotplug events when the domain connects after it boots.

As an aside, my previous messages were breaking threading so I changed from the web-based email to Evolution. Should be OK now.

On Tue, 2005-03-01 at 00:32 +1100, Rusty Russell wrote:
> On Mon, 2005-02-28 at 14:08 +0100, harry@hebutterworth.freeserve.co.uk
> wrote:
> > You are thinking about the code differences from the point of view of
> > the code running inside the domain. What about the code outside the
> > domain, for example, is the code required to remove a device from one
> > domain and give it to another going to have to be different depending
> > on whether the target domain has been booted yet?
>
> Fundamentally true, because you can't send a hotplug event to a domain
> which hasn't been booted yet. Whatever your mechanism, this
> difference has to be handled at some level, and I don't think the
> difference in transport makes it harder.
>
> Cheers,
> Rusty.
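The decoupling described in this message can be illustrated with a small sketch. Nothing here is an existing Xen interface: `struct registry`, `registry_add_device`, `registry_connect`, and the callback type are hypothetical names, and a real registry would sit behind some inter-domain transport rather than a function pointer. The sketch only shows the behaviour being argued for, that the device provider calls the same function whether or not the domain has booted, and the registry replays the stored devices as hotplug events when the domain connects.

```c
#include <assert.h>
#include <stddef.h>

#define REG_MAX_DEVS 8   /* hypothetical capacity */

/* Callback a domain registers to receive "device added" events. */
typedef void (*hotplug_cb)(const char *device, void *ctx);

struct registry {
    const char *devices[REG_MAX_DEVS];
    size_t ndevices;
    hotplug_cb listener;   /* NULL until the domain has connected */
    void *ctx;
};

/* Called by the device provider. It does not need to know whether the
 * target domain is running: the device is recorded, and an event is
 * delivered immediately only if a listener is already attached. */
static int registry_add_device(struct registry *r, const char *device)
{
    if (r->ndevices == REG_MAX_DEVS)
        return -1;
    r->devices[r->ndevices++] = device;
    if (r->listener != NULL)
        r->listener(device, r->ctx);
    return 0;
}

/* Called when the domain connects after boot: replay every device
 * recorded so far as a hotplug event, then keep the listener attached. */
static void registry_connect(struct registry *r, hotplug_cb cb, void *ctx)
{
    size_t i;

    assert(cb != NULL);
    r->listener = cb;
    r->ctx = ctx;
    for (i = 0; i < r->ndevices; i++)
        cb(r->devices[i], ctx);
}

/* Example listener: counts the events it receives in *(int *)ctx. */
static void count_event(const char *device, void *ctx)
{
    (void)device;
    ++*(int *)ctx;
}
```

With this shape, devices added before `registry_connect` produce no events at the time and are delivered in a batch on connect, while devices added afterwards are delivered at once, so the guest sees a single stream of add events in both cases.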
Keir Fraser
2005-Feb-28 14:33 UTC
Re: [Xen-devel] Proposal for init/kexec/hotplug format for Xen
On 28 Feb 2005, at 13:40, Harry Butterworth wrote:
> No, not fundamentally true. The registry approach decouples the
> generation of the hotplug events from the provider of devices such that
> if the domain hasn't booted yet then the provider of devices doesn't
> care: the registry will generate all the required hotplug events when
> the domain connects after it boots.

Yes, this is what I envision. Cooking things into binary-encoded device trees or hotplug events could also work, and might be less code in the guest OS, but I think it's a less clean design and would need more special-casing in domain0.

 -- Keir
Anthony Liguori
2005-Feb-28 15:46 UTC
Re: [Xen-devel] Proposal for init/kexec/hotplug format for Xen
I have a few questions about some of the future features:

1) What is the transport going to be for querying/updating? How are you going to handle security?

2) Are we going to use any sort of standard for storing device information? If so, what?

3) How does this change the device setup exchange? Right now, a series of control messages is exchanged so that the back-end can get notified to create the virtual device, and then something has to proxy some information from the front-end (usually just a shared memory location) to that back-end. All of this is still necessary, right? How does a value being updated trigger an appropriate event?

Regards,
Anthony Liguori

>Included below for your reading pleasure,
>Rusty.
Rusty Russell
2005-Feb-28 16:46 UTC
Re: Re: Re: [Xen-devel] Proposal for init/kexec/hotplug format for Xen
On Mon, 2005-02-28 at 13:40 +0000, Harry Butterworth wrote:
> No, not fundamentally true. The registry approach decouples the
> generation of the hotplug events from the provider of devices such that
> if the domain hasn't booted yet then the provider of devices doesn't
> care: the registry will generate all the required hotplug events when
> the domain connects after it boots.

Good point; you are, of course, correct. Moreover, the Xen guys have convinced me that they want such a persistent store for other purposes.

So ignore my code and look at an explicit Xen interface to such a store, rather than having the domain keep track of their own copy.

Cheers,
Rusty.
-- 
A bad analogy is like a leaky screwdriver -- Richard Braakman
Jeremy Katz
2005-Feb-28 20:28 UTC
Re: [Xen-devel] Proposal for init/kexec/hotplug format for Xen
On Mon, 2005-02-28 at 23:06 +1100, Rusty Russell wrote:
>On Sun, 2005-02-27 at 09:25 -0600, Anthony Liguori wrote:
>> Keir Fraser wrote:
>> > I like the idea of bringing out device discovery, bringup, teardown,
>> > recovery all into its own driver or subsystem -- it seems the obvious
>> > way to go. But I think the 'device tree' should be in the
>> > to-be-designed persistent store, and we publish an interface to allow
>> > guests to peek/poke that store.
>>
>> I think publishing domain-information in an OF-like tree would be great.
>
>I'm not convinced of the persistent store idea, at least for this. I
>think it's simpler to have it dropped into memory at boot (just like the
>initrd image), and then later messages are sent which update it (ie.
>hotplug) which have a similar form.

I tend to agree wrt the persistent store. Given the choice between complexity in
a) guest OS (multiplied by N guest OS's) or
b) domain0/tools
I'd rather batch my complexity up to where it only has to be written once and not every time you port an OS. Keeping things simple for the guest (like this) seems reasonable to me.

>The former is implemented, but breaks when I actually test it (you can
>see that code #ifdef'ed out, debugging now). I added simple routines so
>that you can build a device tree and pass it to the domain builder.
>
>Included below for your reading pleasure,

Cool. I actually like this better conceptually than what I was working on (but didn't have working due to a flurry of meetings before, after and during LinuxWorld). I'll play with it some here and see how easily it can tie in with the sysfs bits, which I should also just clean up separately and post an update.

Jeremy
Harry Butterworth
2005-Feb-28 22:24 UTC
Re: [Xen-devel] Proposal for init/kexec/hotplug format for Xen
On Mon, 2005-02-28 at 15:28 -0500, Jeremy Katz wrote:
> I'd rather batch my complexity up to where it only has to be written
> once and not every time you port an OS. Keeping things simple for the
> guest (like this) seems reasonable to me.

I think you are going to have to write some OS specific code every time you port an OS anyway. The question is where it ends up.

With the registry approach you have a clean protocol in domain 0 and the guest OS specific code stays in each corresponding guest OS.

With the batching up the complexity approach I think there is a risk that you end up with a bundle of guest OS specific code all coupled together in the domain 0 code which seems less good to me.
-- 
Harry Butterworth <harry@hebutterworth.freeserve.co.uk>
Harry Butterworth
2005-Mar-01 13:15 UTC
Re: Re: Re: [Xen-devel] Proposal for init/kexec/hotplug format for Xen
> So ignore my code and look at an explicit Xen interface to such a store,
> rather than having the domain keep track of their own copy.

I'm trying to get stuck into the USB code as not having finished it is starting to look a bit lame. I was assuming that you, Keir and Anthony were looking at this stuff.

In any case, I'm not sure you want an explicit Xen interface for this. It should be sufficient to define a good IDC API for Xen and then build on top of that.

The registry is just a service accessible to a domain using the IDC API.

You need to solve the bootstrap issue for domain 0. There are two obvious alternatives:

A) have a root registry inside Xen, and then domain 0 can have the same interface as the other domains; or

B) push everything out into domain 0 in a platform-specific way, and then have domain 0 build a root registry out of the platform-specific discovery.

I was under the impression that the architectural direction was to push as much as possible out of Xen into domain 0, which is consistent with the second alternative.

In a clustered system with FT domains, the code running in the FT domain would be provided with access (over the IDC API) to one registry per base domain, which would give it hotplug notification about the devices accessible via that base domain. This means that the code in the FT domain would access multiple registries over the IDC API and pool the results, doing multipathing for any devices which were accessible through multiple base domains.

OK, I'll take my bunny off the boil now. The key to success is a good IDC API.
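The pooling step in the clustered case can be made concrete with a toy example. This is a sketch only: `struct dev_pool`, `pool_add_path`, and `pool_paths` are hypothetical, device identity is modelled as a plain string, and the transport that would deliver notifications from each base domain's registry is left out. It shows the bookkeeping the message describes, where the same device reported by several registries ends up as one pooled entry with several paths.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_MAX 16   /* hypothetical capacity */

struct pooled_dev {
    const char *id;   /* identity shared across base domains (assumed) */
    int paths;        /* number of base domains that reported the device */
};

struct dev_pool {
    struct pooled_dev devs[POOL_MAX];
    int count;
};

/* Record that `id` is reachable through one more base domain. Returns the
 * number of paths now known for the device, or -1 if the pool is full. */
static int pool_add_path(struct dev_pool *p, const char *id)
{
    int i;

    assert(id != NULL);
    for (i = 0; i < p->count; i++)
        if (strcmp(p->devs[i].id, id) == 0)
            return ++p->devs[i].paths;
    if (p->count == POOL_MAX)
        return -1;
    p->devs[p->count].id = id;
    p->devs[p->count].paths = 1;
    p->count++;
    return 1;
}

/* Number of known paths to `id`; 0 if the device is not in the pool. */
static int pool_paths(const struct dev_pool *p, const char *id)
{
    int i;

    for (i = 0; i < p->count; i++)
        if (strcmp(p->devs[i].id, id) == 0)
            return p->devs[i].paths;
    return 0;
}
```

A device with more than one path is a candidate for multipathing; a matching removal routine would decrement the count and drop the entry when it reaches zero.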
Keir Fraser
2005-Mar-01 15:08 UTC
Re: [Xen-devel] Proposal for init/kexec/hotplug format for Xen
On 1 Mar 2005, at 13:15, Harry Butterworth wrote:
> In any case, I'm not sure you want an explicit Xen interface for this.
> It should be sufficient to define a good IDC API for Xen and then build
> on top of that.
>
> The registry is just a service accessible to a domain using the IDC
> API.

I think Rusty doesn't really mean a 'Xen' interface. Access to the persistent store (PS) will build on the primitives already provided by Xen, requiring no extra modifications in the hypervisor.

 -- Keir
Harry Butterworth
2005-Mar-01 15:40 UTC
Re: Re: Re: [Xen-devel] Proposal for init/kexec/hotplug format for Xen
[The mail reflector seems to have dropped the original of this so here's another copy for the mail archive]

On Tue, 2005-03-01 at 03:46 +1100, Rusty Russell wrote:
> So ignore my code and look at an explicit Xen interface to such a store,
> rather than having the domain keep track of their own copy.

I'm trying to get stuck into the USB code as not having finished it is starting to look a bit lame. I was assuming that you, Keir and Anthony were looking at this stuff.

In any case, I'm not sure you want an explicit Xen interface for this. It should be sufficient to define a good IDC API for Xen and then build on top of that.

The registry is just a service accessible to a domain using the IDC API.

You need to solve the bootstrap issue for domain 0: two obvious alternatives:

A) have a root registry inside Xen and then domain 0 can have the same interface as the other domains or

B) push everything out into domain 0 in a platform specific way and then have domain 0 build a root registry out of the platform specific discovery.

I was under the impression that the architectural direction was to push as much as possible out of Xen into domain 0 which is consistent with the second alternative.

In a clustered system with FT domains, the code running in the FT domain would be provided with access (over the IDC API) to one registry per base domain which would give it hotplug notification about the devices accessible via that base domain. This means that the code in the FT domain would access multiple registries over the IDC API and pool the results, doing multipathing for any devices which were accessible through multiple base domains.

OK, I'll take my bunny off the boil now. The key to success is a good IDC API.