Hi,

Is there any information... any information at all... on the proposed
driver API for crossbow and how this may link to Xen?

I'm working on a Nemo driver for a virtualizable NIC and I have no idea
how this may link in with crossbow: are there new entry points for
adding/removing packet classification entries? Does the driver get
notification of vnic creation so that it may allocate the necessary
h/w resources?

I don't really want to come up with my own vnic architecture, but
without any information on crossbow (which is supposed to be an
*Open*Solaris project) I will have little choice but to do so.

Paul

--
Paul Durrant
http://www.linkedin.com/in/pdurrant
* pdurrant at gmail.com [2006-09-22 13:33:00]
> Is there any information... any information at all... on the
> proposed driver API for crossbow and how this may link to Xen?
> I'm working on a Nemo driver for a virtualizable NIC and I have no
> idea how this may link in with crossbow: are there new entry points
> for adding/removing packet classification entries? Does the driver get
> notification of vnic creation so that it may allocate the necessary
> h/w resources?

There are several parts to this, some of which are further advanced
than others.

The Crossbow vnic currently implemented provides support for
virtualising a NIC within a single OS instance (e.g. dom0 in Xen).
This includes assigning MAC addresses, rx rings, etc. to vnics.

Exactly how far along this work has got, and how much of it has
detailed specifications, is better answered by someone more closely
involved in the Crossbow project than me.

So far there is no written design for an API to support mapping the
hardware resources of a "virtualisable" NIC directly into the address
space of a (Xen) guest domain.

Mike Speer and I discussed this last week and agreed that we would
work on a proposal "real soon now".

Generally speaking, the intention is to extend the resource
declarations currently made by mac drivers to allow them to indicate
their capability in this respect. This would be used by a combination
of the vnic infrastructure, the Xen network backend driver [1] and the
Xen network frontend driver to map the relevant regions into the guest
domain and rewire the interrupt(s).

We'd need a hardware-specific plugin for the frontend driver, for
which Michael made the excellent suggestion of re-using the existing
mac interface.

Footnotes:
[1] A new one that sits on top of the mac interface - not the current
Nemo driver.

dme.
--
David Edmondson, Sun Microsystems, http://www.dme.org
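To make the idea of extending the mac resource declarations a little
more concrete - purely a sketch, since the message above says no
written design exists yet, and every name below is invented for
illustration - a driver might advertise its guest-mappable units along
these lines:

    #include <sys/types.h>

    /*
     * Hypothetical sketch only: one way a mac driver could declare that a
     * subset of its hardware (say, an Rx/Tx ring pair plus its doorbell
     * page) can be mapped directly into a guest domain.  None of these
     * names exist in the mac interface today.
     */
    typedef struct hypo_guest_map_capab {
        uint_t  gmc_nunits;     /* mappable ring pairs available */

        /* physical address and size of the region backing one unit */
        int     (*gmc_get_region)(void *driver, uint_t unit,
                    uint64_t *paddr, size_t *len);

        /* interrupt that would be rewired to the guest for that unit */
        int     (*gmc_get_intr)(void *driver, uint_t unit, int *irq);
    } hypo_guest_map_capab_t;

The backend driver could walk such a declaration to pick a free unit
for a guest; how the frontend plugin then drives the mapped unit is
exactly the hardware-specific part mentioned above.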
* dme at sun.com [2006-09-22 14:23:37]
> So far there is no written design for an API to support mapping the
> hardware resources of a "virtualisable" NIC directly into the address
> space of a (Xen) guest domain.

I should add that I'm aware of the work recently made available by
Solarflare on the xen-devel list and will be looking closely at it,
with a hope that we can agree on the inter-domain protocol components
of the work.

dme.
--
David Edmondson, Sun Microsystems, http://www.dme.org
Paul Durrant wrote On 09/22/06 05:33:
> Hi,
>
> Is there any information... any information at all... on the
> proposed driver API for crossbow and how this may link to Xen?

(Dave commented on the Xen interaction)

> I'm working on a Nemo driver for a virtualizable NIC and I have no
> idea how this may link in with crossbow: are there new entry points
> for adding/removing packet classification entries? Does the driver get

Yes, all those entry points are there, and we're getting them out in
source code along with some documentation as soon as we legally can.

About creation of VNICs: it is already possible with any Nemo-compliant
driver, without any need to change the driver. However it won't be
using the h/w classification capabilities; that part needs the
Crossbow extensions.

> notification of vnic creation so that it may allocate the necessary
> h/w resources?

Not quite - at least not in the current approach. The virtualization
happens at the MAC layer. The driver is oblivious to where its
resources are assigned.

> I don't really want to come up with my own vnic architecture, but
> without any information on crossbow (which is supposed to be an
> *Open*Solaris project) I will have little choice but to do so.

It is a project; it's in the making. I think we have probably reached
the point where we can start inviting members of the OpenSolaris
community to our design discussions. We usually hold those on Tuesday
afternoons.

Kais.

>
> Paul
>
Paul,

> * pdurrant at gmail.com [2006-09-22 13:33:00]
> > Is there any information... any information at all... on the
> > proposed driver API for crossbow and how this may link to Xen?
> > I'm working on a Nemo driver for a virtualizable NIC and I have no
> > idea how this may link in with crossbow: are there new entry points
> > for adding/removing packet classification entries? Does the driver get
> > notification of vnic creation so that it may allocate the necessary
> > h/w resources?

Kais is working on defining the APIs. He should be able to send the
draft out very soon. Also Nicolas and Carol are working through the
process to get the Crossbow gate src out to opensolaris. It will make
a lot more sense when the code and draft are all out. But in short,
there will be entry points for adding/removing classification rules,
and the stack will tell the driver which Rx rings etc. to assign to
each rule. The idea is that during registration of the NIC, Nemo finds
out about all the classification capabilities and Rx and Tx rings of
the H/W. Whenever a VNIC is created, Nemo will decide, based on
various criteria/policy, whether to run it out of H/W (by setting a
rule and assigning an Rx ring) or use software classification to make
it work.

Dave gave some details below. I'll try to clarify some more.

> There are several parts to this, some of which are further advanced
> than others.
>
> The Crossbow vnic currently implemented provides support for
> virtualising a NIC within a single OS instance (e.g. dom0 in Xen).
> This includes assigning MAC addresses, rx rings, etc. to vnics.
>
> Exactly how far along this work has got, and how much of it has
> detailed specifications, is better answered by someone more closely
> involved in the Crossbow project than me.

It's coming soon.

> So far there is no written design for an API to support mapping the
> hardware resources of a "virtualisable" NIC directly into the address
> space of a (Xen) guest domain.
>
> Mike Speer and I discussed this last week and agreed that we would
> work on a proposal "real soon now".

That would be nice.

> Generally speaking, the intention is to extend the resource
> declarations currently made by mac drivers to allow them to indicate
> their capability in this respect. This would be used by a combination
> of the vnic infrastructure, the Xen network backend driver [1] and the
> Xen network frontend driver to map the relevant regions into the guest
> domain and rewire the interrupt(s).
>
> We'd need a hardware-specific plugin for the frontend driver, for
> which Michael made the excellent suggestion of re-using the existing
> mac interface.

Yes, you will probably want to use most of the Crossbow/MAC framework
for how to map the relevant region to the domU. Also the NIC
capability in this direction needs to be communicated up front.

Cheers,
Sunay

>
> Footnotes:
> [1] A new one that sits on top of the mac interface - not the current
> Nemo driver.
>
> dme.
> --
> David Edmondson, Sun Microsystems, http://www.dme.org
> _______________________________________________
> crossbow-discuss mailing list
> crossbow-discuss at opensolaris.org
> http://opensolaris.org/mailman/listinfo/crossbow-discuss
>

--
Sunay Tripathi
Sr. Staff Engineer
Solaris Core Networking Technologies
Sun MicroSystems Inc.

Solaris Networking: http://www.opensolaris.org/os/community/networking
Project Crossbow:   http://www.opensolaris.org/os/project/crossbow
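Since the draft is not out yet, the following is only a guess at the
shape such an interface could take - all of the names are invented,
not the Crossbow API. It shows a driver reporting its rings and rule
capacity at registration time and exposing entry points the stack can
call to add or remove classification rules:

    #include <sys/types.h>
    #include <netinet/in.h>

    /*
     * Hypothetical sketch -- not the (unpublished) Crossbow API.
     * A rule maps some match criteria onto a hardware Rx ring.
     */
    typedef struct hypo_class_rule {
        uint8_t         cr_dst_mac[6];  /* match destination MAC ... */
        in_addr_t       cr_dst_ip;      /* ... and/or destination IPv4 */
        uint8_t         cr_proto;       /* IPPROTO_TCP/IPPROTO_UDP, 0 = any */
        uint16_t        cr_dst_port;    /* 0 = any */
        uint_t          cr_rx_ring;     /* ring to steer matches to */
    } hypo_class_rule_t;

    typedef struct hypo_class_capab {
        /* reported to the stack when the NIC registers */
        uint_t  cc_nrxrings;            /* hardware Rx rings */
        uint_t  cc_ntxrings;            /* hardware Tx rings */
        uint_t  cc_nrules;              /* classification entries */

        /* entry points the stack calls to program the hardware */
        int     (*cc_rule_add)(void *driver, const hypo_class_rule_t *rule);
        int     (*cc_rule_remove)(void *driver, const hypo_class_rule_t *rule);
    } hypo_class_capab_t;

In this picture the driver only executes what it is told; the decision
of which VNIC or flow gets a hardware ring stays above it, as the
message describes.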
On Sep 23, 2006, at 9:50 AM, Sunay Tripathi wrote:
> It will make a lot more sense when the code and draft are all out.
> But in short, there will be entry points for adding/removing
> classification rules, and the stack will tell the driver which Rx
> rings etc. to assign to each rule. The idea is that during
> registration of the NIC, Nemo finds out about all the classification
> capabilities and Rx and Tx rings of the H/W. Whenever a VNIC is
> created, Nemo will decide, based on various criteria/policy, whether
> to run it out of H/W (by setting a rule and assigning an Rx ring) or
> use software classification to make it work.

I'm also currently working on the architecture document for VNICs,
which will provide more information on how they take advantage of the
capabilities provided by the underlying NIC hardware. First draft
coming to this alias soon...

Nicolas.
On 9/23/06, Sunay Tripathi <Sunay.Tripathi at eng.sun.com> wrote:
>
> Kais is working on defining the APIs. He should be able to send the
> draft out very soon. Also Nicolas and Carol are working through the
> process to get the Crossbow gate src out to opensolaris. It will make
> a lot more sense when the code and draft are all out. But in short,
> there will be entry points for adding/removing classification rules,
> and the stack will tell the driver which Rx rings etc. to assign to
> each rule. The idea is that during registration of the NIC, Nemo finds
> out about all the classification capabilities and Rx and Tx rings of
> the H/W. Whenever a VNIC is created, Nemo will decide, based on
> various criteria/policy, whether to run it out of H/W (by setting a
> rule and assigning an Rx ring) or use software classification to make
> it work.
>

Thanks Sunay. This all sounds plausible, but would it perhaps be more
flexible to consider a scheme where crossbow 'allocates' resources
from the h/w? E.g. it asks for new RX and TX queues from the driver;
if successful, it then asks the driver to re-direct traffic matching a
certain 5-tuple/3-tuple to them. If it fails, however, then crossbow
can fall back to doing the virtualisation in s/w alone.

The reason I ask is that having a scheme where the h/w driver tells
crossbow how many resources it has up front essentially means those
resources must be reserved for crossbow's use. This is wasteful if
they are never used and will impact on other clients such as a
user-level TCP/IP stack.

Paul

--
Paul Durrant
http://www.linkedin.com/in/pdurrant
> On 9/23/06, Sunay Tripathi <Sunay.Tripathi at eng.sun.com> wrote:
> >
> > Kais is working on defining the APIs. He should be able to send the
> > draft out very soon. Also Nicolas and Carol are working through the
> > process to get the Crossbow gate src out to opensolaris. It will make
> > a lot more sense when the code and draft are all out. But in short,
> > there will be entry points for adding/removing classification rules,
> > and the stack will tell the driver which Rx rings etc. to assign to
> > each rule. The idea is that during registration of the NIC, Nemo finds
> > out about all the classification capabilities and Rx and Tx rings of
> > the H/W. Whenever a VNIC is created, Nemo will decide, based on
> > various criteria/policy, whether to run it out of H/W (by setting a
> > rule and assigning an Rx ring) or use software classification to make
> > it work.
> >
>
> Thanks Sunay. This all sounds plausible, but would it perhaps be more
> flexible to consider a scheme where crossbow 'allocates' resources
> from the h/w? E.g. it asks for new RX and TX queues from the driver;
> if successful, it then asks the driver to re-direct traffic matching a
> certain 5-tuple/3-tuple to them. If it fails, however, then crossbow
> can fall back to doing the virtualisation in s/w alone.

The goals are bigger than just being able to use the Rx rings etc.
We anticipate that Rx rings will always be in short supply, so the
stack wants to know what's available in order to use them best. An
interesting scenario is, with some support from intrd2.0, observing
which traffic type (VNIC or flow) is seeing heavy traffic and moving
them in/out of H/W based on need. If the H/W resources can disappear
underneath us, it will get tricky.

> The reason I ask is that having a scheme where the h/w driver tells
> crossbow how many resources it has up front essentially means those
> resources must be reserved for crossbow's use. This is wasteful if
> they are never used and will impact on other clients such as a
> user-level TCP/IP stack.

Interesting. Our assumption was that the NIC (and all its resources)
is owned by GLDv3. We will have direct APIs to the GLD driver to set
classification rules and request (or show preference for) H/W Rx
rings. So why can't userland TCP and the kernel stack live together,
both requesting what they want and stating their preference for H/W
support, and let GLDv3 manage it the best it can?

We can potentially add a flag to the API so that if preference is
stated for the Rx ring and none is available, then creation of the new
VNIC or flow will fail. This will give control to the sysadmin to
choose which things he wants to run out of the H/W.

Do you need something more than this?

Cheers,
Sunay

>
> Paul
>
> --
> Paul Durrant
> http://www.linkedin.com/in/pdurrant
>

--
Sunay Tripathi
Sr. Staff Engineer
Solaris Core Networking Technologies
Sun MicroSystems Inc.

Solaris Networking: http://www.opensolaris.org/os/community/networking
Project Crossbow:   http://www.opensolaris.org/os/project/crossbow
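To pin down what such a flag might mean - these values are invented
for illustration, not a committed interface - the caller would state
whether a hardware ring is merely preferred or strictly required, and
creation fails only in the latter case when none is free:

    /*
     * Hypothetical flag values for VNIC/flow creation -- an illustration
     * of the behaviour described above, not an actual Crossbow definition.
     */
    #define HYPO_RES_HW_PREFER   0x01   /* take a hardware Rx ring if one is
                                           free, otherwise fall back to
                                           software classification */
    #define HYPO_RES_HW_REQUIRE  0x02   /* fail the VNIC/flow creation (e.g.
                                           with ENOSPC) if no hardware Rx
                                           ring can be assigned */

The strict variant is what would give the sysadmin the control
described above, for example as an option on the administrative
command that creates the VNIC or flow.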
On 9/25/06, Sunay Tripathi <Sunay.Tripathi at eng.sun.com> wrote:
> The goals are bigger than just being able to use the Rx rings etc.
> We anticipate that Rx rings will always be in short supply, so the
> stack wants to know what's available in order to use them best. An
> interesting scenario is, with some support from intrd2.0, observing
> which traffic type (VNIC or flow) is seeing heavy traffic and moving
> them in/out of H/W based on need. If the H/W resources can disappear
> underneath us, it will get tricky.

We'll have about 4k RX 'rings' available in h/w, so I don't think there
should be too much of a supply issue unless under very heavy load.
Moving heavily loaded connections to a separate ring sounds like a
good plan but, if you're anticipating a shortage of supply, then
presumably the code must be able to fail gracefully if no h/w resource
is available and simply not perform the migration.

> > The reason I ask is that having a scheme where the h/w driver tells
> > crossbow how many resources it has up front essentially means those
> > resources must be reserved for crossbow's use. This is wasteful if
> > they are never used and will impact on other clients such as a
> > user-level TCP/IP stack.
>
> Interesting. Our assumption was that the NIC (and all its resources)
> is owned by GLDv3. We will have direct APIs to the GLD driver to set
> classification rules and request (or show preference for) H/W Rx
> rings. So why can't userland TCP and the kernel stack live together,
> both requesting what they want and stating their preference for H/W
> support, and let GLDv3 manage it the best it can?

It's fine for GLD to assume free access to the h/w providing it can
accept the occasional failure. A user-level stack essentially needs at
least one vnic per process to ensure security, so a system can hog a
fair few of them depending upon what it's doing.

> We can potentially add a flag to the API so that if preference is
> stated for the Rx ring and none is available, then creation of the new
> VNIC or flow will fail. This will give control to the sysadmin to
> choose which things he wants to run out of the H/W.
>
> Do you need something more than this?
>

No, that sounds reasonable: a driver interface that simply requests new
resources from the h/w when the client needs them will be most
flexible. Declaring resources up-front in this model should not be
necessary. I'd envisage something along the lines of:

m_vnic_create/destroy entry points - to request new vnic
creation/destruction (creation should cause the driver to register the
interface to the vnic using the normal mac_register() call).

m_vnic_steer entry point - to be passed a description of a flow or
flows that should be steered to a particular vnic.

mac_open() could then be used to access the vnic (or perhaps vmac?) as
if it were just another piece of h/w.

Paul

--
Paul Durrant
http://www.linkedin.com/in/pdurrant
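Spelling that proposal out a little further - the structure and
argument types below are guesses added for illustration, and nothing
here is part of the existing mac/Nemo interface:

    #include <sys/types.h>
    #include <netinet/in.h>

    /* a 5-tuple (or 3-tuple, with the src fields zeroed) flow description */
    typedef struct hypo_flow_desc {
        in_addr_t       fd_src_ip;      /* 0 for a 3-tuple match */
        in_addr_t       fd_dst_ip;
        uint8_t         fd_proto;       /* IPPROTO_TCP or IPPROTO_UDP */
        uint16_t        fd_src_port;    /* 0 for a 3-tuple match */
        uint16_t        fd_dst_port;
    } hypo_flow_desc_t;

    typedef struct hypo_vnic_ops {
        /*
         * Create a vnic on demand; on success the driver calls
         * mac_register() for the new instance and returns its id.
         */
        int     (*m_vnic_create)(void *driver, uint_t *vnic_idp);
        int     (*m_vnic_destroy)(void *driver, uint_t vnic_id);

        /* steer traffic matching 'fd' to the given vnic */
        int     (*m_vnic_steer)(void *driver, uint_t vnic_id,
                    const hypo_flow_desc_t *fd);
    } hypo_vnic_ops_t;

The point of this shape is the allocate-on-demand model: no ring or
queue is committed until a client actually asks for a vnic and a
steering rule.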
Paul,

On Sep 27, 2006, at 5:45 AM, Paul Durrant wrote:

> m_vnic_create/destroy entry points - to request new vnic
> creation/destruction (creation should cause the driver to register the
> interface to the vnic using the normal mac_register() call).

Do you require a kernel API in this case? Currently VNIC
creation/deletion is driven by a new libvnicadm library. This library
is used by dladm(1M) and we could envision exporting it as a public
interface as well. Note that VNIC creation already triggers a
mac_register() today; more on that below.

> m_vnic_steer entry point - to be passed a description of a flow or
> flows that should be steered to a particular vnic.

Traffic is steered to the VNICs according to the MAC address
associated with the VNIC. The MAC address is basically what
characterizes the main flow of the VNIC [1]. Within a VNIC, or any
other data-link for that matter, we will allow sub-flows to be
defined. The plan is to allow separate rings to be associated with
these sub-flows, if hardware resources are available. When a sub-flow
is created, it can then be associated with a callback routine which
will be invoked to process traffic matching that sub-flow. Would this
work for you?

> mac_open() could then be used to access the vnic (or perhaps vmac?) as
> if it were just another piece of h/w.

That's already the case today. The VNIC driver registers a separate
MAC for each created VNIC. To the rest of the system, VNICs look and
behave like regular NICs, and can be opened through mac_open() of
course.

Thanks,
Nicolas.

[1] we are also considering allowing the administrator to have
multiple VNICs share a MAC address of the underlying NIC; in this
case the IP address will be part of what characterizes the main flow
of a VNIC.

--
Nicolas Droux, Solaris Kernel Networking
Sun Microsystems, Inc. http://blogs.sun.com/droux
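A rough picture of the sub-flow/callback arrangement described above -
the names are invented, since the real Crossbow types are not public
yet:

    #include <sys/types.h>
    #include <sys/stream.h>             /* mblk_t */

    /* callback invoked for traffic matching the sub-flow */
    typedef void (*hypo_subflow_rx_t)(void *cb_arg, mblk_t *mp_chain);

    typedef struct hypo_subflow {
        uint8_t                 sf_proto;       /* transport protocol */
        uint16_t                sf_local_port;  /* local (destination) port */
        hypo_subflow_rx_t       sf_rx;          /* receive callback */
        void                    *sf_rx_arg;     /* passed back to sf_rx */
    } hypo_subflow_t;

    /*
     * Attach/detach a sub-flow on an existing data-link (a VNIC or a NIC);
     * a hardware Rx ring would be bound to it when one is available.
     */
    extern int hypo_subflow_add(const char *linkname, hypo_subflow_t *sf);
    extern int hypo_subflow_remove(const char *linkname, hypo_subflow_t *sf);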
> On 9/25/06, Sunay Tripathi <Sunay.Tripathi at eng.sun.com> wrote:
> > The goals are bigger than just being able to use the Rx rings etc.
> > We anticipate that Rx rings will always be in short supply, so the
> > stack wants to know what's available in order to use them best. An
> > interesting scenario is, with some support from intrd2.0, observing
> > which traffic type (VNIC or flow) is seeing heavy traffic and moving
> > them in/out of H/W based on need. If the H/W resources can disappear
> > underneath us, it will get tricky.
>
> We'll have about 4k RX 'rings' available in h/w, so I don't think there
> should be too much of a supply issue unless under very heavy load.
> Moving heavily loaded connections to a separate ring sounds like a
> good plan but, if you're anticipating a shortage of supply, then
> presumably the code must be able to fail gracefully if no h/w resource
> is available and simply not perform the migration.

I think it would be worthwhile to make a distinction between
connections and flows. For userland TCP, what you are dealing with is
connections (5 tuples specified), and for the time being we are
dealing with wider definitions for a flow (dst MAC or IP address, VLAN
tag, transport, or dst port). Multiple connections will map into a
flow.

For userland TCP, you shouldn't need to create a VNIC. All you want to
do is map a connection to an Rx ring and you are done. Crossbow will
allow you to do that without needing to create VNICs.

As for resources and graceful failure, your suggestion is correct,
except we want to enforce it at the top of the GLDv3 layer (with the
possibility of a mac interface). The mac owns the resources. Both
dladm and netrcm will trigger creation of VNICs or flow-specific
policies at the mac layer (pseudo H/W layer) and can state their
preference for whether they want to run out of H/W, i.e. use a H/W Rx
ring. I think userland TCP should become a client of the same
interface and request specific TCP connections to be assigned to the
H/W Rx ring. It's your choice to decide whether you want to fail in
case no Rx rings are available or let the S/W classifier and soft
rings deal with it.

> > > The reason I ask is that having a scheme where the h/w driver tells
> > > crossbow how many resources it has up front essentially means those
> > > resources must be reserved for crossbow's use. This is wasteful if
> > > they are never used and will impact on other clients such as a
> > > user-level TCP/IP stack.
> >
> > Interesting. Our assumption was that the NIC (and all its resources)
> > is owned by GLDv3. We will have direct APIs to the GLD driver to set
> > classification rules and request (or show preference for) H/W Rx
> > rings. So why can't userland TCP and the kernel stack live together,
> > both requesting what they want and stating their preference for H/W
> > support, and let GLDv3 manage it the best it can?
>
> It's fine for GLD to assume free access to the h/w providing it can
> accept the occasional failure. A user-level stack essentially needs at
> least one vnic per process to ensure security, so a system can hog a
> fair few of them depending upon what it's doing.

Why do you want to create a VNIC per process? How does it offer more
security? I think you achieve the same thing as long as the underlying
Rx rings etc. are per process or per connection.

> > We can potentially add a flag to the API so that if preference is
> > stated for the Rx ring and none is available, then creation of the
> > new VNIC or flow will fail. This will give control to the sysadmin
> > to choose which things he wants to run out of the H/W.
> >
> > Do you need something more than this?
> >
>
> No, that sounds reasonable: a driver interface that simply requests new
> resources from the h/w when the client needs them will be most
> flexible. Declaring resources up-front in this model should not be
> necessary. I'd envisage something along the lines of:
>
> m_vnic_create/destroy entry points - to request new vnic
> creation/destruction (creation should cause the driver to register the
> interface to the vnic using the normal mac_register() call).
>
> m_vnic_steer entry point - to be passed a description of a flow or
> flows that should be steered to a particular vnic.
>
> mac_open() could then be used to access the vnic (or perhaps vmac?) as
> if it were just another piece of h/w.

The interfaces will be along those lines. The only difference is that
the mac will own the resources, and both dladm/netrcm and userland TCP
libraries will have to ask the mac to get the resources assigned. I
don't think the smarts on how to assign the hardware resources need to
be in the driver.

Cheers,
Sunay

>
> Paul
>
> --
> Paul Durrant
> http://www.linkedin.com/in/pdurrant
>

--
Sunay Tripathi
Sr. Staff Engineer
Solaris Core Networking Technologies
Sun MicroSystems Inc.

Solaris Networking: http://www.opensolaris.org/os/community/networking
Project Crossbow:   http://www.opensolaris.org/os/project/crossbow
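One way to picture the connection-versus-flow distinction and the
mac-owned resource model sketched above - illustrative only, with
invented names:

    #include <sys/types.h>
    #include <netinet/in.h>

    typedef enum {
        HYPO_MATCH_CONNECTION,  /* full 5-tuple: src/dst IP, proto, ports */
        HYPO_MATCH_FLOW         /* wider: dst MAC or IP, VLAN, proto, port */
    } hypo_match_kind_t;

    typedef struct hypo_match {
        hypo_match_kind_t       m_kind;
        uint8_t                 m_dst_mac[6];   /* flows only */
        uint16_t                m_vlan;         /* flows only, 0 = untagged */
        in_addr_t               m_src_ip;       /* connections only */
        in_addr_t               m_dst_ip;
        uint8_t                 m_proto;
        uint16_t                m_src_port;     /* connections only */
        uint16_t                m_dst_port;
    } hypo_match_t;

    /*
     * The mac layer, not the driver, owns the rings: clients such as
     * dladm/netrcm or a userland TCP library ask it to bind a match to
     * a ring, stating whether hardware is preferred or required.
     */
    extern int hypo_mac_flow_bind(const char *linkname,
        const hypo_match_t *match, uint_t flags, uint_t *rx_ring);

Many connections would land inside a single flow, which is why a
userland TCP stack can ask for per-connection bindings without ever
creating a VNIC.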
On 9/27/06, Nicolas Droux <Nicolas.Droux at sun.com> wrote:
> Paul,
>
> On Sep 27, 2006, at 5:45 AM, Paul Durrant wrote:
>
> > m_vnic_create/destroy entry points - to request new vnic
> > creation/destruction (creation should cause the driver to register the
> > interface to the vnic using the normal mac_register() call).
>
> Do you require a kernel API in this case?

Personally not from a client point of view, although I guess Xen
required some sort of kernel API to set up a new vnic and then map it
into a domU.

> Currently VNIC creation/deletion is driven by a new libvnicadm
> library. This library is used by dladm(1M) and we could envision
> exporting it as a public interface as well. Note that VNIC creation
> already triggers a mac_register() today; more on that below.
>
> > m_vnic_steer entry point - to be passed a description of a flow or
> > flows that should be steered to a particular vnic.
>
> Traffic is steered to the VNICs according to the MAC address
> associated with the VNIC. The MAC address is basically what
> characterizes the main flow of the VNIC [1]. Within a VNIC, or any
> other data-link for that matter, we will allow sub-flows to be
> defined. The plan is to allow separate rings to be associated with
> these sub-flows, if hardware resources are available. When a sub-flow
> is created, it can then be associated with a callback routine which
> will be invoked to process traffic matching that sub-flow. Would this
> work for you?

Yes. Our h/w will not steer traffic based on MAC address - our point
of view is that the MAC is associated with the physical port rather
than any virtualization thereof. We steer traffic to multiple receive
queues based only on IP header information.

> > mac_open() could then be used to access the vnic (or perhaps vmac?) as
> > if it were just another piece of h/w.
>
> That's already the case today. The VNIC driver registers a separate
> MAC for each created VNIC. To the rest of the system, VNICs look and
> behave like regular NICs, and can be opened through mac_open() of
> course.

Cool. Sounds good.

Paul

> Thanks,
> Nicolas.
>
> [1] we are also considering allowing the administrator to have
> multiple VNICs share a MAC address of the underlying NIC; in this
> case the IP address will be part of what characterizes the main flow
> of a VNIC.
>
> --
> Nicolas Droux, Solaris Kernel Networking
> Sun Microsystems, Inc. http://blogs.sun.com/droux

--
Paul Durrant
http://www.linkedin.com/in/pdurrant
On 9/27/06, Sunay Tripathi <Sunay.Tripathi at eng.sun.com> wrote:
>
> I think it would be worthwhile to make a distinction between
> connections and flows. For userland TCP, what you are dealing with is
> connections (5 tuples specified), and for the time being we are
> dealing with wider definitions for a flow (dst MAC or IP address, VLAN
> tag, transport, or dst port). Multiple connections will map into a
> flow.
>

Ok - I'd assumed that you were using a narrower definition of flow,
based on a 5 or 3-tuple rather than any layer 2 information.

> For userland TCP, you shouldn't need to create a VNIC. All you want to
> do is map a connection to an Rx ring and you are done. Crossbow will
> allow you to do that without needing to create VNICs.
>

Indeed, although we may map several connections to the same RX queue.
For Xen though, we most definitely need to operate a VNIC.

> As for resources and graceful failure, your suggestion is correct,
> except we want to enforce it at the top of the GLDv3 layer (with the
> possibility of a mac interface). The mac owns the resources. Both
> dladm and netrcm will trigger creation of VNICs or flow-specific
> policies at the mac layer (pseudo H/W layer) and can state their
> preference for whether they want to run out of H/W, i.e. use a H/W Rx
> ring. I think userland TCP should become a client of the same
> interface and request specific TCP connections to be assigned to the
> H/W Rx ring. It's your choice to decide whether you want to fail in
> case no Rx rings are available or let the S/W classifier and soft
> rings deal with it.
>

It would be nice if we could make our user-level stack talk to the
driver through the same interface as Xen and Crossbow - that's one of
my aims, and that's why I want to see more API details: to determine
whether that is possible.

> > > > The reason I ask is that having a scheme where the h/w driver tells
> > > > crossbow how many resources it has up front essentially means those
> > > > resources must be reserved for crossbow's use. This is wasteful if
> > > > they are never used and will impact on other clients such as a
> > > > user-level TCP/IP stack.
> > >
> > > Interesting. Our assumption was that the NIC (and all its resources)
> > > is owned by GLDv3. We will have direct APIs to the GLD driver to set
> > > classification rules and request (or show preference for) H/W Rx
> > > rings. So why can't userland TCP and the kernel stack live together,
> > > both requesting what they want and stating their preference for H/W
> > > support, and let GLDv3 manage it the best it can?
> >
> > It's fine for GLD to assume free access to the h/w providing it can
> > accept the occasional failure. A user-level stack essentially needs at
> > least one vnic per process to ensure security, so a system can hog a
> > fair few of them depending upon what it's doing.
>
> Why do you want to create a VNIC per process? How does it offer more
> security? I think you achieve the same thing as long as the underlying
> Rx rings etc. are per process or per connection.
>

The main issue is denial of service. As you say, we essentially need
to firewall RX queues from each other, binding them to a process - but
that's essentially what I meant by VNIC.

> > > We can potentially add a flag to the API so that if preference is
> > > stated for the Rx ring and none is available, then creation of the
> > > new VNIC or flow will fail. This will give control to the sysadmin
> > > to choose which things he wants to run out of the H/W.
> > >
> > > Do you need something more than this?
> > >
> >
> > No, that sounds reasonable: a driver interface that simply requests new
> > resources from the h/w when the client needs them will be most
> > flexible. Declaring resources up-front in this model should not be
> > necessary. I'd envisage something along the lines of:
> >
> > m_vnic_create/destroy entry points - to request new vnic
> > creation/destruction (creation should cause the driver to register the
> > interface to the vnic using the normal mac_register() call).
> >
> > m_vnic_steer entry point - to be passed a description of a flow or
> > flows that should be steered to a particular vnic.
> >
> > mac_open() could then be used to access the vnic (or perhaps vmac?) as
> > if it were just another piece of h/w.
>
> The interfaces will be along those lines. The only difference is that
> the mac will own the resources, and both dladm/netrcm and userland TCP
> libraries will have to ask the mac to get the resources assigned. I
> don't think the smarts on how to assign the hardware resources need to
> be in the driver.
>

Ok - I guess that will become clearer when the API is published. My
concern about publishing all resources to the mac layer in one go, and
it expecting them all to be available, is that if, e.g., I need to
populate 4000 RX queues with say 256 9k buffers each, that's rather a
lot of DMA memory I've just swallowed. If, on the other hand, the mac
layer specifically requests a resource to be enabled/disabled then
that's ok.

Paul

--
Paul Durrant
http://www.linkedin.com/in/pdurrant
* Darren.Reed at Sun.COM [2006-Sep-28 20:43 UTC] - [crossbow-discuss] Re: driver API information
Paul Durrant wrote:

> On 9/27/06, Nicolas Droux <Nicolas.Droux at sun.com> wrote:
>
>> Paul,
>>
>> On Sep 27, 2006, at 5:45 AM, Paul Durrant wrote:
>>
>> > m_vnic_create/destroy entry points - to request new vnic
>> > creation/destruction (creation should cause the driver to register the
>> > interface to the vnic using the normal mac_register() call).
>>
>> Do you require a kernel API in this case?
>
> Personally not from a client point of view, although I guess Xen
> required some sort of kernel API to set up a new vnic and then map it
> into a domU.
>
>> Currently VNIC creation/deletion is driven by a new libvnicadm
>> library. This library is used by dladm(1M) and we could envision
>> exporting it as a public interface as well. Note that VNIC creation
>> already triggers a mac_register() today; more on that below.
>>
>> > m_vnic_steer entry point - to be passed a description of a flow or
>> > flows that should be steered to a particular vnic.
>>
>> Traffic is steered to the VNICs according to the MAC address
>> associated with the VNIC. The MAC address is basically what
>> characterizes the main flow of the VNIC [1]. Within a VNIC, or any
>> other data-link for that matter, we will allow sub-flows to be
>> defined. The plan is to allow separate rings to be associated with
>> these sub-flows, if hardware resources are available. When a sub-flow
>> is created, it can then be associated with a callback routine which
>> will be invoked to process traffic matching that sub-flow. Would this
>> work for you?
>
> Yes. Our h/w will not steer traffic based on MAC address - our point
> of view is that the MAC is associated with the physical port rather
> than any virtualization thereof. We steer traffic to multiple receive
> queues based only on IP header information.

Can I ask which IPv4/v6 header fields you plan to allow demultiplexing
on? Do you have any plans to include any UDP/TCP header fields?

Darren
On Sep 28, 2006, at 1:02 PM, Paul Durrant wrote:

> On 9/27/06, Nicolas Droux <Nicolas.Droux at sun.com> wrote:
>> Paul,
>>
>> On Sep 27, 2006, at 5:45 AM, Paul Durrant wrote:
>>
>> > m_vnic_create/destroy entry points - to request new vnic
>> > creation/destruction (creation should cause the driver to register the
>> > interface to the vnic using the normal mac_register() call).
>>
>> Do you require a kernel API in this case?
>
> Personally not from a client point of view, although I guess Xen
> required some sort of kernel API to set up a new vnic and then map it
> into a domU.

The current plan is to create the VNICs via libvnicadm from the Xen
admin interface; no dedicated kernel interface will be used for that
operation directly from Xen.

>> Currently VNIC creation/deletion is driven by a new libvnicadm
>> library. This library is used by dladm(1M) and we could envision
>> exporting it as a public interface as well. Note that VNIC creation
>> already triggers a mac_register() today; more on that below.
>>
>> > m_vnic_steer entry point - to be passed a description of a flow or
>> > flows that should be steered to a particular vnic.
>>
>> Traffic is steered to the VNICs according to the MAC address
>> associated with the VNIC. The MAC address is basically what
>> characterizes the main flow of the VNIC [1]. Within a VNIC, or any
>> other data-link for that matter, we will allow sub-flows to be
>> defined. The plan is to allow separate rings to be associated with
>> these sub-flows, if hardware resources are available. When a sub-flow
>> is created, it can then be associated with a callback routine which
>> will be invoked to process traffic matching that sub-flow. Would this
>> work for you?
>
> Yes. Our h/w will not steer traffic based on MAC address - our point
> of view is that the MAC is associated with the physical port rather
> than any virtualization thereof. We steer traffic to multiple receive
> queues based only on IP header information.

We're planning to support both models, i.e. a per-VNIC MAC address or
a MAC address shared by multiple VNICs (see the "[1]" note in my
previous email).

Thanks,
Nicolas.

>> > mac_open() could then be used to access the vnic (or perhaps
>> > vmac?) as if it were just another piece of h/w.
>>
>> That's already the case today. The VNIC driver registers a separate
>> MAC for each created VNIC. To the rest of the system, VNICs look and
>> behave like regular NICs, and can be opened through mac_open() of
>> course.
>
> Cool. Sounds good.
>
> Paul
>
>> Thanks,
>> Nicolas.
>>
>> [1] we are also considering allowing the administrator to have
>> multiple VNICs share a MAC address of the underlying NIC; in this
>> case the IP address will be part of what characterizes the main flow
>> of a VNIC.
>>
>> --
>> Nicolas Droux, Solaris Kernel Networking
>> Sun Microsystems, Inc. http://blogs.sun.com/droux
>
> --
> Paul Durrant
> http://www.linkedin.com/in/pdurrant

--
Nicolas Droux, Solaris Kernel Networking
Sun Microsystems, Inc. http://blogs.sun.com/droux
* pdurrant at gmail.com [2006-09-28 20:02:59]
> Our h/w will not steer traffic based on MAC address - our point of
> view is that the MAC is associated with the physical port rather
> than any virtualization thereof. We steer traffic to multiple
> receive queues based only on IP header information.

That seems unfortunate from a Xen perspective, where the typical
approach is to assign a MAC address to a guest domain.

Is there no support for using multiple MAC addresses?

dme.
--
David Edmondson, Sun Microsystems, http://www.dme.org
> That seems unfortunate from a Xen perspective, where the typical
> approach is to assign a MAC address to a guest domain.

From an architectural perspective, that seems the most elegant (and
most correct) approach. Everything else seems like a layering
violation to me.

--
meem
On 9/29/06, David Edmondson <dme at sun.com> wrote:
>
> That seems unfortunate from a Xen perspective, where the typical
> approach is to assign a MAC address to a guest domain.
>
> Is there no support for using multiple MAC addresses?
>

There's no support in h/w for this yet. We have to run the NIC in
promiscuous mode and then dynamically add 5-tuple filters to steer
connections to the correct VNIC when we see them getting set up.
Future h/w will have the ability to steer on the basis of MAC address.

Paul

--
Paul Durrant
http://www.linkedin.com/in/pdurrant
On 9/29/06, Peter Memishian <peter.memishian at sun.com> wrote:
>
> > That seems unfortunate from a Xen perspective, where the typical
> > approach is to assign a MAC address to a guest domain.
>
> From an architectural perspective, that seems the most elegant (and
> most correct) approach. Everything else seems like a layering
> violation to me.
>

Agreed; for Xen, virtualising on the basis of MAC address is the
correct thing to do. For Crossbow, I'm not sure. For user-level
stacks, definitely not.

Paul

--
Paul Durrant
http://www.linkedin.com/in/pdurrant
On 9/28/06, Nicolas Droux <Nicolas.Droux at sun.com> wrote:
>
> We're planning to support both models, i.e. a per-VNIC MAC address or
> a MAC address shared by multiple VNICs (see the "[1]" note in my
> previous email).
>

That's good.

Paul

--
Paul Durrant
http://www.linkedin.com/in/pdurrant
On 9/28/06, Darren.Reed at sun.com <Darren.Reed at sun.com> wrote:
>
> Can I ask which IPv4/v6 header fields you plan to allow demultiplexing
> on? Do you have any plans to include any UDP/TCP header fields?
>

Our h/w can steer based on the 5-tuple (i.e. src IP, dst IP, TCP/UDP,
src port, dst port) so, yes, we include the UDP/TCP ports from the
header.

Paul

--
Paul Durrant
http://www.linkedin.com/in/pdurrant
> > That seems unfortunate from a Xen perspective, where the typical
> > approach is to assign a MAC address to a guest domain.
>
> From an architectural perspective, that seems the most elegant (and
> most correct) approach. Everything else seems like a layering
> violation to me.

I think if you have factory MAC addresses, or can live with the
creation of random MAC addresses, then MAC-based VNICs should be the
preferred model. However, a large number of customers are used to
dealing with IP addresses and have asked that, since they already have
a unique IP address per machine (or virtual machine), they just be
able to use that. That's why we are investigating IP-based VNICs.

At the end of the day, it's just classification based on IP addresses.
The only minor layering violation is some code sitting in the VNIC
layer which turns around the multicast and broadcast packets to other
IP-based VNICs (stated simply). At the end of the day, if we follow
strict layering, that also ends up producing more hackiness. Remember
the interfaces between TCP and IP when they were "supposedly" separate
layers :)

Cheers,
Sunay

>
> --
> meem
> _______________________________________________
> crossbow-discuss mailing list
> crossbow-discuss at opensolaris.org
> http://opensolaris.org/mailman/listinfo/crossbow-discuss
>

--
Sunay Tripathi
Sr. Staff Engineer
Solaris Core Networking Technologies
Sun MicroSystems Inc.

Solaris Networking: http://www.opensolaris.org/os/community/networking
Project Crossbow:   http://www.opensolaris.org/os/project/crossbow
> * pdurrant at gmail.com [2006-09-28 20:02:59]
> > Our h/w will not steer traffic based on MAC address - our point of
> > view is that the MAC is associated with the physical port rather
> > than any virtualization thereof. We steer traffic to multiple
> > receive queues based only on IP header information.
>
> That seems unfortunate from a Xen perspective, where the typical
> approach is to assign a MAC address to a guest domain.
>
> Is there no support for using multiple MAC addresses?

Support for multiple MAC addresses is already there. Support for
choosing a factory MAC address to build VNICs on top of is coming.

Cheers,
Sunay

>
> dme.
> --
> David Edmondson, Sun Microsystems, http://www.dme.org
> _______________________________________________
> crossbow-discuss mailing list
> crossbow-discuss at opensolaris.org
> http://opensolaris.org/mailman/listinfo/crossbow-discuss
>

--
Sunay Tripathi
Sr. Staff Engineer
Solaris Core Networking Technologies
Sun MicroSystems Inc.

Solaris Networking: http://www.opensolaris.org/os/community/networking
Project Crossbow:   http://www.opensolaris.org/os/project/crossbow
* Sunay.Tripathi at eng.sun.com [2006-09-29 16:42:42]
>> * pdurrant at gmail.com [2006-09-28 20:02:59]
>> > Our h/w will not steer traffic based on MAC address - our point of
>> > view is that the MAC is associated with the physical port rather
>> > than any virtualization thereof. We steer traffic to multiple
>> > receive queues based only on IP header information.
>>
>> That seems unfortunate from a Xen perspective, where the typical
>> approach is to assign a MAC address to a guest domain.
>>
>> Is there no support for using multiple MAC addresses?
>
> Support for multiple MAC addresses is already there. Support for
> choosing a factory MAC address to build VNICs on top of is coming.

The question related specifically to the hardware that Paul described.

dme.
--
David Edmondson, Sun Microsystems, http://www.dme.org
> * Sunay.Tripathi at eng.sun.com [2006-09-29 16:42:42]
> >> * pdurrant at gmail.com [2006-09-28 20:02:59]
> >> > Our h/w will not steer traffic based on MAC address - our point of
> >> > view is that the MAC is associated with the physical port rather
> >> > than any virtualization thereof. We steer traffic to multiple
> >> > receive queues based only on IP header information.
> >>
> >> That seems unfortunate from a Xen perspective, where the typical
> >> approach is to assign a MAC address to a guest domain.
> >>
> >> Is there no support for using multiple MAC addresses?
> >
> > Support for multiple MAC addresses is already there. Support for
> > choosing a factory MAC address to build VNICs on top of is coming.
>
> The question related specifically to the hardware that Paul described.

Ah, my bad :)

> dme.
> --
> David Edmondson, Sun Microsystems, http://www.dme.org

--
Sunay Tripathi
Sr. Staff Engineer
Solaris Core Networking Technologies
Sun MicroSystems Inc.

Solaris Networking: http://www.opensolaris.org/os/community/networking
Project Crossbow:   http://www.opensolaris.org/os/project/crossbow