There are some folks here interested in using Myrinet with Xen. Applications using Myrinet link against a library that registers memory and DMAs directly to it. The library loads the memory addresses onto the card, which is setting off some alarms, and we're trying to work out exactly how XenoLinux guests would handle this situation.

What memory issues might arise when using this in a guest domain with privileged drivers, in non-privileged domains (with one privileged domain actually using the libraries), and with multiple privileged domains simultaneously? Is it possible?

I would think that in the case of one privileged domain bridging to other, non-privileged domains, full-speed transfers would be impossible if any copying is necessary. But would several privileged guests using the libraries be able to coexist? Would there be swapping and memory pinning issues? The card returns a port, one of several (I believe six). Would it be possible for six guests to each have access to one of these channels?

Thanks for any input!

-------------------------------------------------------
SF.Net email is sponsored by Shop4tech.com-Lowest price on Blank Media
100pk Sonic DVD-R 4x for only $29 -100pk Sonic DVD+R for only $33
Save 50% off Retail on Ink & Toner - Free Shipping and Free Gift.
http://www.shop4tech.com/z/Inkjet_Cartridges/9_108_r285
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xen-devel
On Wed, 25 Aug 2004, Tim Freeman wrote:
> What memory issues might arise using this in a guest domain with privileged drivers, in non privileged domains (with one privileged domain actually using the libraries), and with multiple privileged domains simultaneously? Is it possible?

Hairy but doable is my guess. You'll have to ask permission about where to set up DMA, as you don't want guests splatting all over memory (I assume). This says to me that DOM0 or Xen has to do it for you?

In general it'd be nice to have a way to let guests virtualize hardware such as Myrinet.

ron
> There are some folks here interested in using Myrinet with Xen. Applications using Myrinet link against a library that registers memory and DMAs directly to it. The library loads the memory addresses onto the card which is setting off some alarms and we're attempting to think about exactly how XenoLinux guests will handle this situation.

Presumably it's not just a library, and there's some trusted OS component that pins the physical pages and registers their bus addresses with the Myrinet NIC?

I guess the library sets up a mmap of some of the NIC's control registers, then provides a set of library functions to do open/close, send/receive/RDMA etc. and provide block-until-receive functionality.

> What memory issues might arise using this in a guest domain with privileged drivers, in non privileged domains (with one privileged domain actually using the libraries), and with multiple privileged domains simultaneously? Is it possible?

Interesting.

Assuming I'm right about there being a trusted OS component that deals with the creation/deletion of memory apertures, you'd want exactly one of these, running in e.g. domain 0. You'd then need to create a way of virtualising this functionality to other domains, so that their OS component doesn't talk to the card directly but talks to the controlling domain, which will then interact with Xen's MMU to check that the pages actually belong to the domain, and then pin them and register them with the NIC.

This is going to require a little coding, but shouldn't be too hard. It's quite an interesting problem, so I'd be happy to help with the design.
It's something we'll have to do for InfiniBand anyhow.

> I would think in the case that the one privileged domain bridging to other, non-privileged domains that full speed transfers would be impossible if there is any copying necessary.

I presume it's also possible to use the Myrinet card as a plain Ethernet/IP interface with a suitable kernel driver (yes, I know this sucks)?

You could use this to enable the privileged domain to at least provide network connectivity to other domains using the normal netback/netfront drivers (the privileged domain could also use the normal library directly). This would be zero-copy into the domain, but the normal OS stack would usually end up copying things into the application socket buffer.

> But would several privileged guests using the libraries be able to coexist? Would there be swapping and memory pinning issues? The card returns a port, one of several (I believe six). Would it be possible for six guests to each have access to one of these channels?

Only six ports? That's a bit lame. I'd like to see the memory mapped communication extended right down into user-space applications in multiple domains, but six doesn't give a whole lot of flexibility...

Ian
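
[Illustration] The controlling-domain scheme Ian outlines in this message (check that the pages belong to the requesting domain, then pin them, then register them with the NIC) can be sketched as a small toy model. This is only a sketch under stated assumptions, not Xen or Myrinet code: `ControlDomain`, `page_owner`, `nic_table` and `RegistrationError` are hypothetical names invented here, and the ownership table merely stands in for whatever Xen's memory management would actually be asked.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class RegistrationError(Exception):
    """Raised when a domain asks to register a frame it does not own."""


@dataclass
class ControlDomain:
    """Hypothetical stand-in for the single trusted component in domain 0."""

    # frame number -> owning domain id (stand-in for the hypervisor's records)
    page_owner: dict[int, int]
    # frames that must not be reassigned while the NIC may DMA to them
    pinned: set[int] = field(default_factory=set)
    # domain id -> frames registered with the (simulated) NIC
    nic_table: dict[int, list[int]] = field(default_factory=dict)

    def register(self, domid: int, frames: list[int]) -> None:
        # 1. Validate ownership of every frame before changing any state,
        #    so a rejected request leaves nothing pinned or registered.
        for frame in frames:
            if self.page_owner.get(frame) != domid:
                raise RegistrationError(
                    f"frame {frame} is not owned by domain {domid}"
                )
        # 2. Pin the frames.
        self.pinned.update(frames)
        # 3. Only now hand the frames to the NIC on the guest's behalf.
        self.nic_table.setdefault(domid, []).extend(frames)
```

A guest's OS component would call something like `register(domid, frames)` through the controlling domain instead of writing addresses to the card itself; a request naming another domain's frame is refused before anything is pinned.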
What exactly is the "port" thing that the card returns? Is it the channel that the client uses to talk to the card (with the number of clients therefore being limited by the number of ports)?

Having multiple domains with hardware privileges to configure the card would not work. The proper way to do things would be as Ian described, with one privileged kernel agent in dom0 that configures the hardware.

If you really need a workaround, you could always put one card per domain into the machine and give each domain privileges to access its respective card. Obviously, this is overkill ;-) but it should be a quick fix. Note that if domains have hardware privileges then you shouldn't give them to any potentially malicious users!

HTH,
Mark
On Wed, 25 Aug 2004 21:27:25 +0100 Ian Pratt <Ian.Pratt@cl.cam.ac.uk> wrote:
> > There are some folks here interested in using Myrinet with Xen. Applications using Myrinet link against a library that registers memory and DMAs directly to it. The library loads the memory addresses onto the card which is setting off some alarms and we're attempting to think about exactly how XenoLinux guests will handle this situation.
>
> Presumably it's not just a library, and there's some trusted OS component that pins the physical pages and registers their bus addresses with the Myrinet NIC?
>
> I guess the library sets up a mmap of some of the NICs control registers, then provides a set of library functions to do open/close, send/receive/RDMA etc and provide block-until-receive functionality.

Yes, as far as I understand. There is an OS component that pins the pages, and from then on the API can interact directly with the NIC. There is also a processor and firmware on the NIC itself; the library on the host is pretty lightweight in my understanding. I think the library does share a specific not-for-data memory range with the NIC, and that is how it can control it directly.

I wonder what effect the control library getting less processor time (because it is running under a VMM) will have. It is userspace, so I assume it is coded to expect unpredictable scheduling.

> > What memory issues might arise using this in a guest domain with privileged drivers, in non privileged domains (with one privileged domain actually using the libraries), and with multiple privileged domains simultaneously? Is it possible?
>
> Interesting.
>
> Assuming I'm right about there being a trusted OS component that deals with the creation/deletion of memory apertures, you'd want exactly one of these, running in e.g. domain 0.
> You'd then need to create a way of virtualising this functionality to other domains, so their OS component doesn't talk to the card directly but talks to the controlling domain that will then interact with Xen's mmu to check the pages actually belong to the domain, and then pin them and register them with the NIC.
>
> This is going to require a little coding, but shouldn't be too hard. It's quite an interesting problem, so I'd be happy to help with the design. It's something we'll have to do for InfiniBand anyhow.

It sounds like only the OS component will need to be modified to pass the negotiation on to domain0 to pin the memory. Once that is done, I presume the library can interact with this shared memory without modification (i.e., no virtualized address issues, right?). And the NIC will also be allowed to DMA to the memory at any time.

So the coding involved would be to create an idealized interface for guest domains specifically for Myrinet (or would it be best to create one for a certain 'class' of devices, like the I/O paper discusses)?

The modifying-the-OS-interface-only approach only allows for six guests, I think. This is what you mean by saying the six ports won't provide much flexibility, right?

(To answer Mark's question: yes.) I believe each library thread expects to totally control each channel via shared memory. To multiplex this without modifying the card or library code sounds complicated to me, especially because we want zero-copy straight to the application (not through the guest OS's buffers) to attain the desired speeds. How would that even work? Control messages would be intercepted, interpreted, and passed to the card channel's real control registers, and for the data, specific subregions of the channel's memory allocation would be mapped to separate domains? Yikes.

Do your plans for InfiniBand allow 100s of guests to each have high speed networking?
How much might the performance degrade?

> > I would think in the case that the one privileged domain bridging to other, non-privileged domains that full speed transfers would be impossible if there is any copying necessary.
>
> I presume it's also possible to use the Myrinet card as a plain Ethernet/IP interface with a suitable kernel driver (yes, I know this sucks)?
>
> You could use this to enable the privileged domain to at least provide network connectivity to other domains using the normal netback/netfront drivers. (the privileged domain could also use the normal library directly). This would be zero-copy into the domain, but the normal OS stack would usually end up copying things into the application socket buffer.

Well, this 'sucks' but it is still cool to do it this way, because the domains would probably still get better than plain Ethernet performance and weird OSs could take advantage of highER performance networking.

If I'm thinking about this correctly, it sounds like all of these domains' traffic could be put onto one Myrinet channel and five special domains could truly take advantage of Myrinet? Is that feasible? This would rule out every domain being able to use the special Myrinet message passing protocols, but there could be some interesting mixed latency MPI simulations. I think they might be interested in this anyhow.

This is all a very nascent interest; I think we're just trying to grasp the issues. I'm sorry I don't know more about the guts of how Myrinet works. I appreciate everyone's responses, thank you!

> > But would several privileged guests using the libraries be able to coexist? Would there be swapping and memory pinning issues? The card returns a port, one of several (I believe six). Would it be possible for six guests to each have access to one of these channels?
>
> Only six ports? That's a bit lame.
> I'd like to see the memory mapped communication extended right down into user-space applications in multiple domains, but six doesn't give a whole lot of flexibility...
>
> Ian
IMO, this is roughly what would need to be done:

Direct data path:
The OS component would have to be modified so that in dom0 it would perform the usual tasks of pinning memory AND talking to the hardware, but in unprivileged domains it would pin memory itself and then request that dom0 set up the hardware. This is control path, not data path, so the indirection shouldn't hurt performance - guest applications can talk to the hardware directly.

It may be possible to use an existing library as-is, I'm not sure.

Writing the code to do this should be quite tractable for someone with the appropriate experience. I'd imagine that user applications would get performance similar to that in non-virtualised configurations, with the qualification that if you run lots of domains on one CPU, they will obviously tend to get less CPU time and higher latency anyway.

This approach limits you to no more clients than you have channels.

Multiplexed data path:
Multiplexing multiple guests onto a single channel seems a bit more difficult. Perhaps it could be done with modifications to allow dom0 to control the channel, with other domains requesting data path as well as control path operations from it. This could still give zero copy into guest applications, but there might be some performance hit in latency due to the extra level of indirection, although suitable pipelining may provide good bandwidth (as for the existing net and block drivers).

This would be more work to implement than direct data path. I guess there's also the possibility that your next interface might have lots of channels, making such multiplexing less important...

> Do your plans for infiniband allow 100s of guests to each have high speed networking? How much might the performance degrade?

Simply having plenty of channels on the host interface card would be more straightforward than sharing them; see the above comment for the direct data path.
I don't personally know what is planned regarding InfiniBand support, though.

> If I'm thinking about this correctly, it sounds like all of these domains' traffic could be put onto one Myrinet channel and five special domains could truly take advantage of Myrinet?

As for the issue of multiplexing some domains onto an Ethernet-type interface and having some privileged domains also accessing the card directly: yes, this sounds plausible in the first scenario described above (control-path multiplexing with direct data-path).

Just my $0.02
Mark
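
[Illustration] The mixed arrangement discussed in this message (a handful of domains each holding a channel for the direct data path, everyone else sharing one channel that dom0 bridges Ethernet-style) can be sketched as a toy allocator. This is a sketch under assumptions, not anything from the Myrinet software: `ChannelAllocator`, `attach` and `detach` are hypothetical names, the count of six channels comes from Tim's "I believe six", and reserving channel 0 for the shared path is an arbitrary choice made here.

```python
from __future__ import annotations


class ChannelAllocator:
    """Toy model: exclusive channels plus one shared, dom0-bridged channel."""

    def __init__(self, num_channels: int = 6, shared_channel: int = 0) -> None:
        self.shared_channel = shared_channel
        # Every channel except the shared one can be handed to a single domain.
        self.free = [c for c in range(num_channels) if c != shared_channel]
        self.direct: dict[int, int] = {}  # domain id -> exclusive channel
        self.shared: set[int] = set()     # domains using the bridged path

    def attach(self, domid: int, want_direct: bool) -> tuple[str, int]:
        """Return ("direct", channel) or ("shared", channel) for this domain."""
        if domid in self.direct:
            return ("direct", self.direct[domid])
        if want_direct and self.free:
            channel = self.free.pop(0)
            self.shared.discard(domid)
            self.direct[domid] = channel
            return ("direct", channel)
        # No exclusive channel requested or left: fall back to the shared path.
        self.shared.add(domid)
        return ("shared", self.shared_channel)

    def detach(self, domid: int) -> None:
        """Release whatever the domain holds so the channel can be reused."""
        channel = self.direct.pop(domid, None)
        if channel is not None:
            self.free.append(channel)
        self.shared.discard(domid)
```

With the defaults, the first five domains that ask for a direct channel get one each, and any further domain lands on the shared channel until one of the five detaches. This mirrors the limit stated above: the direct data path supports no more clients than there are channels.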
Thank you for your comments, they are very helpful. Your reply makes it sound like some of these options could be feasible, so I will put everyone's comments to the folks with the cluster in question.

Thank you, everyone, for your help!

On Fri, 27 Aug 2004 16:14:58 +0100 Mark Williamson <Mark.Williamson@cl.cam.ac.uk> wrote:
> IMO, this is roughly what would need to be done:
>
> Direct data path:
> The OS component would have to be modified so that in dom0 it would perform the usual tasks of pinning memory AND talking to the hardware but in unprivileged domains it would pin memory itself and then request that dom0 set up the hardware. This is control path, not data path so the indirection shouldn't hurt performance - guest applications can talk to the hardware directly.
>
> It may be possible to use an existing library as-is, I'm not sure.
>
> Writing the code to do this should be quite tractable for someone with the appropriate experience. I'd imagine that user applications would receive similar performance to in non-virtualised configurations, with the qualification that if you run lots of domains on one CPU, they will obviously tend to experience less CPU time and higher latency anyway.
>
> This approach limits you to no more clients than you have channels.
>
> Multiplexed data path:
> Multiplexing multiple guests onto a single channel seems a bit more difficult. Perhaps it could be done with modifications to allow dom0 to control the channel, with other domains requesting data path as well as control path operations from it. This could still give zero copy into guest applications but there might be some performance hit in latency due to the extra level of indirection, although suitable pipelining may provide good bandwidth (as for the existing net and block drivers).
>
> This would be more work to implement than direct data path.
> I guess there's also the possibility that your next interface might have lots of channels, making such multiplexing less important...
>
> > Do your plans for infiniband allow 100s of guests to each have high speed networking? How much might the performance degrade?
>
> Simply having plenty of channels on the host interface card would be more straightforward than sharing them, see the above comment for the direct data path.
>
> I don't personally know what is planned regarding InfiniBand support, though.
>
> > If I'm thinking about this correctly, it sounds like all of these domains' traffic could be put onto one Myrinet channel and five special domains could truly take advantage of Myrinet?
>
> As for the issue of multiplexing some domains onto an Ethernet-type interface and having some privileged domains also accessing the card directly, yes this sounds plausible in the first scenario described above (control-path multiplexing with direct data-path).
>
> Just my $0.02
> Mark