Hi,

Is there any information... any information at all... on the proposed
driver API for crossbow and how this may link to Xen?

I'm working on a Nemo driver for a virtualizable NIC and I have no idea
how this may link in with crossbow: are there new entry points for
adding/removing packet classification entries? Does the driver get
notification of vnic creation so that it may allocate the necessary
h/w resources?

I don't really want to come up with my own vnic architecture, but
without any information on crossbow (which is supposed to be an
*Open*Solaris project) I will have little choice but to do so.

Paul

--
Paul Durrant
http://www.linkedin.com/in/pdurrant
* pdurrant at gmail.com [2006-09-22 13:33:00]
> Is there any information... any information at all... on the
> proposed driver API for crossbow and how this may link to Xen?
> I'm working on a Nemo driver for a virtualizable NIC and I have no
> idea how this may link in with crossbow: are there new entry points
> for adding/removing packet classification entries? Does the driver get
> notification of vnic creation so that it may allocate the necessary
> h/w resources?

There are several parts to this, some of which are further advanced
than others.

The Crossbow vnic currently implemented provides support for
virtualising a NIC within a single OS instance (e.g. dom0 in Xen).
This includes assigning MAC addresses, rx rings, etc. to vnics.

Exactly how far along this work has got, and how much of it has
detailed specifications, is better answered by someone more closely
involved in the Crossbow project than me.

So far there is no written design for an API to support mapping the
hardware resources of a "virtualisable" NIC directly into the address
space of a (Xen) guest domain.

Mike Speer and I discussed this last week and agreed that we would
work on a proposal "real soon now".

Generally speaking, the intention is to extend the resource
declarations currently made by mac drivers to allow them to indicate
their capability in this respect. This would be used by a combination
of the vnic infrastructure, the Xen network backend driver [1] and the
Xen network frontend driver to map the relevant regions into the guest
domain and rewire the interrupt(s).

We'd need a hardware-specific plugin for the frontend driver, for
which Michael made the excellent suggestion of re-using the existing
mac interface.

Footnotes:
[1] A new one that sits on top of the mac interface - not the current
Nemo driver.

dme.
--
David Edmondson, Sun Microsystems, http://www.dme.org
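To make the idea of extending the mac resource declarations a little
more concrete - purely a sketch, since the message above says no
written design exists yet, and every name below is invented for
illustration - a driver might advertise its guest-mappable units along
these lines:

    #include <sys/types.h>

    /*
     * Hypothetical sketch only: one way a mac driver could declare that a
     * subset of its hardware (say, an Rx/Tx ring pair plus its doorbell
     * page) can be mapped directly into a guest domain.  None of these
     * names exist in the mac interface today.
     */
    typedef struct hypo_guest_map_capab {
        uint_t  gmc_nunits;     /* mappable ring pairs available */

        /* physical address and size of the region backing one unit */
        int     (*gmc_get_region)(void *driver, uint_t unit,
                    uint64_t *paddr, size_t *len);

        /* interrupt that would be rewired to the guest for that unit */
        int     (*gmc_get_intr)(void *driver, uint_t unit, int *irq);
    } hypo_guest_map_capab_t;

The backend driver could walk such a declaration to pick a free unit
for a guest; how the frontend plugin then drives the mapped unit is
exactly the hardware-specific part mentioned above.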
* dme at sun.com [2006-09-22 14:23:37]
> So far there is no written design for an API to support mapping the
> hardware resources of a "virtualisable" NIC directly into the address
> space of a (Xen) guest domain.

I should add that I'm aware of the work recently made available by
Solarflare on the xen-devel list and will be looking closely at it,
with a hope that we can agree on the inter-domain protocol components
of the work.

dme.
--
David Edmondson, Sun Microsystems, http://www.dme.org
Paul Durrant wrote On 09/22/06 05:33:
> Hi,
>
> Is there any information... any information at all... on the
> proposed driver API for crossbow and how this may link to Xen?

(Dave commented on the Xen interaction)

> I'm working on a Nemo driver for a virtualizable NIC and I have no
> idea how this may link in with crossbow: are there new entry points
> for adding/removing packet classification entries? Does the driver get

Yes, all those entry points are there, and we're getting them out in
source code along with some documentation as soon as we legally can.

About creation of VNICs: it is already possible with any Nemo-compliant
driver, without any need to change the driver. However it won't be
using the h/w classification capabilities; that part needs the
Crossbow extensions.

> notification of vnic creation so that it may allocate the necessary
> h/w resources?

Not quite - at least not in the current approach. The virtualization
happens at the MAC layer. The driver is oblivious to where its
resources are assigned.

> I don't really want to come up with my own vnic architecture, but
> without any information on crossbow (which is supposed to be an
> *Open*Solaris project) I will have little choice but to do so.

It is a project; it's in the making. I think we have probably reached
the point where we can start inviting members of the OpenSolaris
community to our design discussions. We usually hold those on Tuesday
afternoons.

Kais.

>
> Paul
>
Paul,

> * pdurrant at gmail.com [2006-09-22 13:33:00]
> > Is there any information... any information at all... on the
> > proposed driver API for crossbow and how this may link to Xen?
> > I'm working on a Nemo driver for a virtualizable NIC and I have no
> > idea how this may link in with crossbow: are there new entry points
> > for adding/removing packet classification entries? Does the driver get
> > notification of vnic creation so that it may allocate the necessary
> > h/w resources?

Kais is working on defining the APIs. He should be able to send the
draft out very soon. Also Nicolas and Carol are working through the
process to get the Crossbow gate src out to opensolaris. It will make
a lot more sense when the code and draft are all out. But in short,
there will be entry points for adding/removing classification rules,
and the stack will tell the driver which Rx rings etc. to assign to
each rule. The idea is that during registration of the NIC, Nemo finds
out about all the classification capabilities and Rx and Tx rings of
the H/W. Whenever a VNIC is created, Nemo will decide, based on
various criteria/policy, whether to run it out of H/W (by setting a
rule and assigning an Rx ring) or use software classification to make
it work.

Dave gave some details below. I'll try to clarify some more.

> There are several parts to this, some of which are further advanced
> than others.
>
> The Crossbow vnic currently implemented provides support for
> virtualising a NIC within a single OS instance (e.g. dom0 in Xen).
> This includes assigning MAC addresses, rx rings, etc. to vnics.
>
> Exactly how far along this work has got, and how much of it has
> detailed specifications, is better answered by someone more closely
> involved in the Crossbow project than me.

It's coming soon.

> So far there is no written design for an API to support mapping the
> hardware resources of a "virtualisable" NIC directly into the address
> space of a (Xen) guest domain.
>
> Mike Speer and I discussed this last week and agreed that we would
> work on a proposal "real soon now".

That would be nice.

> Generally speaking, the intention is to extend the resource
> declarations currently made by mac drivers to allow them to indicate
> their capability in this respect. This would be used by a combination
> of the vnic infrastructure, the Xen network backend driver [1] and the
> Xen network frontend driver to map the relevant regions into the guest
> domain and rewire the interrupt(s).
>
> We'd need a hardware-specific plugin for the frontend driver, for
> which Michael made the excellent suggestion of re-using the existing
> mac interface.

Yes, you will probably want to use most of the Crossbow/MAC framework
for how to map the relevant region to the domU. Also the NIC
capability in this direction needs to be communicated up front.

Cheers,
Sunay

>
> Footnotes:
> [1] A new one that sits on top of the mac interface - not the current
> Nemo driver.
>
> dme.
> --
> David Edmondson, Sun Microsystems, http://www.dme.org
> _______________________________________________
> crossbow-discuss mailing list
> crossbow-discuss at opensolaris.org
> http://opensolaris.org/mailman/listinfo/crossbow-discuss
>

--
Sunay Tripathi
Sr. Staff Engineer
Solaris Core Networking Technologies
Sun MicroSystems Inc.

Solaris Networking: http://www.opensolaris.org/os/community/networking
Project Crossbow:   http://www.opensolaris.org/os/project/crossbow
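Since the draft is not out yet, the following is only a guess at the
shape such an interface could take - all of the names are invented,
not the Crossbow API. It shows a driver reporting its rings and rule
capacity at registration time and exposing entry points the stack can
call to add or remove classification rules:

    #include <sys/types.h>
    #include <netinet/in.h>

    /*
     * Hypothetical sketch -- not the (unpublished) Crossbow API.
     * A rule maps some match criteria onto a hardware Rx ring.
     */
    typedef struct hypo_class_rule {
        uint8_t         cr_dst_mac[6];  /* match destination MAC ... */
        in_addr_t       cr_dst_ip;      /* ... and/or destination IPv4 */
        uint8_t         cr_proto;       /* IPPROTO_TCP/IPPROTO_UDP, 0 = any */
        uint16_t        cr_dst_port;    /* 0 = any */
        uint_t          cr_rx_ring;     /* ring to steer matches to */
    } hypo_class_rule_t;

    typedef struct hypo_class_capab {
        /* reported to the stack when the NIC registers */
        uint_t  cc_nrxrings;            /* hardware Rx rings */
        uint_t  cc_ntxrings;            /* hardware Tx rings */
        uint_t  cc_nrules;              /* classification entries */

        /* entry points the stack calls to program the hardware */
        int     (*cc_rule_add)(void *driver, const hypo_class_rule_t *rule);
        int     (*cc_rule_remove)(void *driver, const hypo_class_rule_t *rule);
    } hypo_class_capab_t;

In this picture the driver only executes what it is told; the decision
of which VNIC or flow gets a hardware ring stays above it, as the
message describes.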
On Sep 23, 2006, at 9:50 AM, Sunay Tripathi wrote:
> It will make a lot more sense when the code and draft are all out.
> But in short, there will be entry points for adding/removing
> classification rules, and the stack will tell the driver which Rx
> rings etc. to assign to each rule. The idea is that during
> registration of the NIC, Nemo finds out about all the classification
> capabilities and Rx and Tx rings of the H/W. Whenever a VNIC is
> created, Nemo will decide, based on various criteria/policy, whether
> to run it out of H/W (by setting a rule and assigning an Rx ring) or
> use software classification to make it work.

I'm also currently working on the architecture document for VNICs,
which will provide more information on how they take advantage of the
capabilities provided by the underlying NIC hardware. First draft
coming to this alias soon...

Nicolas.
On 9/23/06, Sunay Tripathi <Sunay.Tripathi at eng.sun.com> wrote:
>
> Kais is working on defining the APIs. He should be able to send the
> draft out very soon. Also Nicolas and Carol are working through the
> process to get the Crossbow gate src out to opensolaris. It will make
> a lot more sense when the code and draft are all out. But in short,
> there will be entry points for adding/removing classification rules,
> and the stack will tell the driver which Rx rings etc. to assign to
> each rule. The idea is that during registration of the NIC, Nemo finds
> out about all the classification capabilities and Rx and Tx rings of
> the H/W. Whenever a VNIC is created, Nemo will decide, based on
> various criteria/policy, whether to run it out of H/W (by setting a
> rule and assigning an Rx ring) or use software classification to make
> it work.
>

Thanks Sunay. This all sounds plausible, but would it perhaps be more
flexible to consider a scheme where crossbow 'allocates' resources
from the h/w? E.g. it asks for new RX and TX queues from the driver;
if successful, it then asks the driver to re-direct traffic matching a
certain 5-tuple/3-tuple to them. If it fails, however, then crossbow
can fall back to doing the virtualisation in s/w alone.

The reason I ask is that having a scheme where the h/w driver tells
crossbow how many resources it has up front essentially means those
resources must be reserved for crossbow's use. This is wasteful if
they are never used and will impact on other clients such as a
user-level TCP/IP stack.

Paul

--
Paul Durrant
http://www.linkedin.com/in/pdurrant
> On 9/23/06, Sunay Tripathi <Sunay.Tripathi at eng.sun.com> wrote:
> >
> > Kais is working on defining the APIs. He should be able to send the
> > draft out very soon. Also Nicolas and Carol are working through the
> > process to get the Crossbow gate src out to opensolaris. It will make
> > a lot more sense when the code and draft are all out. But in short,
> > there will be entry points for adding/removing classification rules,
> > and the stack will tell the driver which Rx rings etc. to assign to
> > each rule. The idea is that during registration of the NIC, Nemo finds
> > out about all the classification capabilities and Rx and Tx rings of
> > the H/W. Whenever a VNIC is created, Nemo will decide, based on
> > various criteria/policy, whether to run it out of H/W (by setting a
> > rule and assigning an Rx ring) or use software classification to make
> > it work.
> >
>
> Thanks Sunay. This all sounds plausible, but would it perhaps be more
> flexible to consider a scheme where crossbow 'allocates' resources
> from the h/w? E.g. it asks for new RX and TX queues from the driver;
> if successful, it then asks the driver to re-direct traffic matching a
> certain 5-tuple/3-tuple to them. If it fails, however, then crossbow
> can fall back to doing the virtualisation in s/w alone.

The goals are bigger than just being able to use the Rx rings etc.
We anticipate that Rx rings will always be in short supply, so the
stack wants to know what's available in order to use them best. An
interesting scenario is, with some support from intrd2.0, observing
which traffic type (VNIC or flow) is seeing heavy traffic and moving
them in/out of H/W based on need. If the H/W resources can disappear
underneath us, it will get tricky.

> The reason I ask is that having a scheme where the h/w driver tells
> crossbow how many resources it has up front essentially means those
> resources must be reserved for crossbow's use. This is wasteful if
> they are never used and will impact on other clients such as a
> user-level TCP/IP stack.

Interesting. Our assumption was that the NIC (and all its resources)
is owned by GLDv3. We will have direct APIs to the GLD driver to set
classification rules and request (or show preference for) H/W Rx
rings. So why can't userland TCP and the kernel stack live together,
both requesting what they want and stating their preference for H/W
support, and let GLDv3 manage it the best it can?

We can potentially add a flag to the API so that if preference is
stated for the Rx ring and none is available, then creation of the new
VNIC or flow will fail. This will give control to the sysadmin to
choose which things he wants to run out of the H/W.

Do you need something more than this?

Cheers,
Sunay

>
> Paul
>
> --
> Paul Durrant
> http://www.linkedin.com/in/pdurrant
>

--
Sunay Tripathi
Sr. Staff Engineer
Solaris Core Networking Technologies
Sun MicroSystems Inc.

Solaris Networking: http://www.opensolaris.org/os/community/networking
Project Crossbow:   http://www.opensolaris.org/os/project/crossbow
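To pin down what such a flag might mean - these values are invented
for illustration, not a committed interface - the caller would state
whether a hardware ring is merely preferred or strictly required, and
creation fails only in the latter case when none is free:

    /*
     * Hypothetical flag values for VNIC/flow creation -- an illustration
     * of the behaviour described above, not an actual Crossbow definition.
     */
    #define HYPO_RES_HW_PREFER   0x01   /* take a hardware Rx ring if one is
                                           free, otherwise fall back to
                                           software classification */
    #define HYPO_RES_HW_REQUIRE  0x02   /* fail the VNIC/flow creation (e.g.
                                           with ENOSPC) if no hardware Rx
                                           ring can be assigned */

The strict variant is what would give the sysadmin the control
described above, for example as an option on the administrative
command that creates the VNIC or flow.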
On 9/25/06, Sunay Tripathi <Sunay.Tripathi at eng.sun.com> wrote:
> The goals are bigger than just being able to use the Rx rings etc.
> We anticipate that Rx rings will always be in short supply, so the
> stack wants to know what's available in order to use them best. An
> interesting scenario is, with some support from intrd2.0, observing
> which traffic type (VNIC or flow) is seeing heavy traffic and moving
> them in/out of H/W based on need. If the H/W resources can disappear
> underneath us, it will get tricky.

We'll have about 4k RX 'rings' available in h/w, so I don't think there
should be too much of a supply issue unless under very heavy load.
Moving heavily loaded connections to a separate ring sounds like a
good plan but, if you're anticipating a shortage of supply, then
presumably the code must be able to fail gracefully if no h/w resource
is available and simply not perform the migration.

> > The reason I ask is that having a scheme where the h/w driver tells
> > crossbow how many resources it has up front essentially means those
> > resources must be reserved for crossbow's use. This is wasteful if
> > they are never used and will impact on other clients such as a
> > user-level TCP/IP stack.
>
> Interesting. Our assumption was that the NIC (and all its resources)
> is owned by GLDv3. We will have direct APIs to the GLD driver to set
> classification rules and request (or show preference for) H/W Rx
> rings. So why can't userland TCP and the kernel stack live together,
> both requesting what they want and stating their preference for H/W
> support, and let GLDv3 manage it the best it can?

It's fine for GLD to assume free access to the h/w providing it can
accept the occasional failure. A user-level stack essentially needs at
least one vnic per process to ensure security, so a system can hog a
fair few of them depending upon what it's doing.

> We can potentially add a flag to the API so that if preference is
> stated for the Rx ring and none is available, then creation of the new
> VNIC or flow will fail. This will give control to the sysadmin to
> choose which things he wants to run out of the H/W.
>
> Do you need something more than this?
>

No, that sounds reasonable: a driver interface that simply requests new
resources from the h/w when the client needs them will be most
flexible. Declaring resources up-front in this model should not be
necessary. I'd envisage something along the lines of:

m_vnic_create/destroy entry points - to request new vnic
creation/destruction (creation should cause the driver to register the
interface to the vnic using the normal mac_register() call).

m_vnic_steer entry point - to be passed a description of a flow or
flows that should be steered to a particular vnic.

mac_open() could then be used to access the vnic (or perhaps vmac?) as
if it were just another piece of h/w.

Paul

--
Paul Durrant
http://www.linkedin.com/in/pdurrant
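Spelling that proposal out a little further - the structure and
argument types below are guesses added for illustration, and nothing
here is part of the existing mac/Nemo interface:

    #include <sys/types.h>
    #include <netinet/in.h>

    /* a 5-tuple (or 3-tuple, with the src fields zeroed) flow description */
    typedef struct hypo_flow_desc {
        in_addr_t       fd_src_ip;      /* 0 for a 3-tuple match */
        in_addr_t       fd_dst_ip;
        uint8_t         fd_proto;       /* IPPROTO_TCP or IPPROTO_UDP */
        uint16_t        fd_src_port;    /* 0 for a 3-tuple match */
        uint16_t        fd_dst_port;
    } hypo_flow_desc_t;

    typedef struct hypo_vnic_ops {
        /*
         * Create a vnic on demand; on success the driver calls
         * mac_register() for the new instance and returns its id.
         */
        int     (*m_vnic_create)(void *driver, uint_t *vnic_idp);
        int     (*m_vnic_destroy)(void *driver, uint_t vnic_id);

        /* steer traffic matching 'fd' to the given vnic */
        int     (*m_vnic_steer)(void *driver, uint_t vnic_id,
                    const hypo_flow_desc_t *fd);
    } hypo_vnic_ops_t;

The point of this shape is the allocate-on-demand model: no ring or
queue is committed until a client actually asks for a vnic and a
steering rule.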
Paul,

On Sep 27, 2006, at 5:45 AM, Paul Durrant wrote:

> m_vnic_create/destroy entry points - to request new vnic
> creation/destruction (creation should cause the driver to register the
> interface to the vnic using the normal mac_register() call).

Do you require a kernel API in this case? Currently VNIC
creation/deletion is driven by a new libvnicadm library. This library
is used by dladm(1M) and we could envision exporting it as a public
interface as well. Note that VNIC creation already triggers a
mac_register() today; more on that below.

> m_vnic_steer entry point - to be passed a description of a flow or
> flows that should be steered to a particular vnic.

Traffic is steered to the VNICs according to the MAC address
associated with the VNIC. The MAC address is basically what
characterizes the main flow of the VNIC [1]. Within a VNIC, or any
other data-link for that matter, we will allow sub-flows to be
defined. The plan is to allow separate rings to be associated with
these sub-flows, if hardware resources are available. When a sub-flow
is created, it can then be associated with a callback routine which
will be invoked to process traffic matching that sub-flow. Would this
work for you?

> mac_open() could then be used to access the vnic (or perhaps vmac?) as
> if it were just another piece of h/w.

That's already the case today. The VNIC driver registers a separate
MAC for each created VNIC. To the rest of the system, VNICs look and
behave like regular NICs, and can be opened through mac_open() of
course.

Thanks,
Nicolas.

[1] we are also considering allowing the administrator to have
multiple VNICs share a MAC address of the underlying NIC; in this
case the IP address will be part of what characterizes the main flow
of a VNIC.

--
Nicolas Droux, Solaris Kernel Networking
Sun Microsystems, Inc. http://blogs.sun.com/droux
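A rough picture of the sub-flow/callback arrangement described above -
the names are invented, since the real Crossbow types are not public
yet:

    #include <sys/types.h>
    #include <sys/stream.h>             /* mblk_t */

    /* callback invoked for traffic matching the sub-flow */
    typedef void (*hypo_subflow_rx_t)(void *cb_arg, mblk_t *mp_chain);

    typedef struct hypo_subflow {
        uint8_t                 sf_proto;       /* transport protocol */
        uint16_t                sf_local_port;  /* local (destination) port */
        hypo_subflow_rx_t       sf_rx;          /* receive callback */
        void                    *sf_rx_arg;     /* passed back to sf_rx */
    } hypo_subflow_t;

    /*
     * Attach/detach a sub-flow on an existing data-link (a VNIC or a NIC);
     * a hardware Rx ring would be bound to it when one is available.
     */
    extern int hypo_subflow_add(const char *linkname, hypo_subflow_t *sf);
    extern int hypo_subflow_remove(const char *linkname, hypo_subflow_t *sf);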
> On 9/25/06, Sunay Tripathi <Sunay.Tripathi at eng.sun.com> wrote:
> > The goals are bigger than just being able to use the Rx rings etc.
> > We anticipate that Rx rings will always be in short supply, so the
> > stack wants to know what's available in order to use them best. An
> > interesting scenario is, with some support from intrd2.0, observing
> > which traffic type (VNIC or flow) is seeing heavy traffic and moving
> > them in/out of H/W based on need. If the H/W resources can disappear
> > underneath us, it will get tricky.
>
> We'll have about 4k RX 'rings' available in h/w, so I don't think there
> should be too much of a supply issue unless under very heavy load.
> Moving heavily loaded connections to a separate ring sounds like a
> good plan but, if you're anticipating a shortage of supply, then
> presumably the code must be able to fail gracefully if no h/w resource
> is available and simply not perform the migration.

I think it would be worthwhile to make a distinction between
connections and flows. For userland TCP, what you are dealing with is
connections (5 tuples specified), and for the time being we are
dealing with wider definitions for a flow (dst MAC or IP address, VLAN
tag, transport, or dst port). Multiple connections will map into a
flow.

For userland TCP, you shouldn't need to create a VNIC. All you want to
do is map a connection to an Rx ring and you are done. Crossbow will
allow you to do that without needing to create VNICs.

As for resources and graceful failure, your suggestion is correct,
except we want to enforce it at the top of the GLDv3 layer (with the
possibility of a mac interface). The mac owns the resources. Both
dladm and netrcm will trigger creation of VNICs or flow-specific
policies at the mac layer (pseudo H/W layer) and can state their
preference for whether they want to run out of H/W, i.e. use a H/W Rx
ring. I think userland TCP should become a client of the same
interface and request specific TCP connections to be assigned to the
H/W Rx ring. It's your choice to decide whether you want to fail in
case no Rx rings are available or let the S/W classifier and soft
rings deal with it.

> > > The reason I ask is that having a scheme where the h/w driver tells
> > > crossbow how many resources it has up front essentially means those
> > > resources must be reserved for crossbow's use. This is wasteful if
> > > they are never used and will impact on other clients such as a
> > > user-level TCP/IP stack.
> >
> > Interesting. Our assumption was that the NIC (and all its resources)
> > is owned by GLDv3. We will have direct APIs to the GLD driver to set
> > classification rules and request (or show preference for) H/W Rx
> > rings. So why can't userland TCP and the kernel stack live together,
> > both requesting what they want and stating their preference for H/W
> > support, and let GLDv3 manage it the best it can?
>
> It's fine for GLD to assume free access to the h/w providing it can
> accept the occasional failure. A user-level stack essentially needs at
> least one vnic per process to ensure security, so a system can hog a
> fair few of them depending upon what it's doing.

Why do you want to create a VNIC per process? How does it offer more
security? I think you achieve the same thing as long as the underlying
Rx rings etc. are per process or per connection.

> > We can potentially add a flag to the API so that if preference is
> > stated for the Rx ring and none is available, then creation of the
> > new VNIC or flow will fail. This will give control to the sysadmin
> > to choose which things he wants to run out of the H/W.
> >
> > Do you need something more than this?
> >
>
> No, that sounds reasonable: a driver interface that simply requests new
> resources from the h/w when the client needs them will be most
> flexible. Declaring resources up-front in this model should not be
> necessary. I'd envisage something along the lines of:
>
> m_vnic_create/destroy entry points - to request new vnic
> creation/destruction (creation should cause the driver to register the
> interface to the vnic using the normal mac_register() call).
>
> m_vnic_steer entry point - to be passed a description of a flow or
> flows that should be steered to a particular vnic.
>
> mac_open() could then be used to access the vnic (or perhaps vmac?) as
> if it were just another piece of h/w.

The interfaces will be along those lines. The only difference is that
the mac will own the resources, and both dladm/netrcm and userland TCP
libraries will have to ask the mac to get the resources assigned. I
don't think the smarts on how to assign the hardware resources need to
be in the driver.

Cheers,
Sunay

>
> Paul
>
> --
> Paul Durrant
> http://www.linkedin.com/in/pdurrant
>

--
Sunay Tripathi
Sr. Staff Engineer
Solaris Core Networking Technologies
Sun MicroSystems Inc.

Solaris Networking: http://www.opensolaris.org/os/community/networking
Project Crossbow:   http://www.opensolaris.org/os/project/crossbow
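One way to picture the connection-versus-flow distinction and the
mac-owned resource model sketched above - illustrative only, with
invented names:

    #include <sys/types.h>
    #include <netinet/in.h>

    typedef enum {
        HYPO_MATCH_CONNECTION,  /* full 5-tuple: src/dst IP, proto, ports */
        HYPO_MATCH_FLOW         /* wider: dst MAC or IP, VLAN, proto, port */
    } hypo_match_kind_t;

    typedef struct hypo_match {
        hypo_match_kind_t       m_kind;
        uint8_t                 m_dst_mac[6];   /* flows only */
        uint16_t                m_vlan;         /* flows only, 0 = untagged */
        in_addr_t               m_src_ip;       /* connections only */
        in_addr_t               m_dst_ip;
        uint8_t                 m_proto;
        uint16_t                m_src_port;     /* connections only */
        uint16_t                m_dst_port;
    } hypo_match_t;

    /*
     * The mac layer, not the driver, owns the rings: clients such as
     * dladm/netrcm or a userland TCP library ask it to bind a match to
     * a ring, stating whether hardware is preferred or required.
     */
    extern int hypo_mac_flow_bind(const char *linkname,
        const hypo_match_t *match, uint_t flags, uint_t *rx_ring);

Many connections would land inside a single flow, which is why a
userland TCP stack can ask for per-connection bindings without ever
creating a VNIC.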
On 9/27/06, Nicolas Droux <Nicolas.Droux at sun.com> wrote:
> Paul,
>
> On Sep 27, 2006, at 5:45 AM, Paul Durrant wrote:
>
> > m_vnic_create/destroy entry points - to request new vnic
> > creation/destruction (creation should cause the driver to register the
> > interface to the vnic using the normal mac_register() call).
>
> Do you require a kernel API in this case?

Personally not from a client point of view, although I guess Xen
required some sort of kernel API to set up a new vnic and then map it
into a domU.

> Currently VNIC creation/deletion is driven by a new libvnicadm
> library. This library is used by dladm(1M) and we could envision
> exporting it as a public interface as well. Note that VNIC creation
> already triggers a mac_register() today; more on that below.
>
> > m_vnic_steer entry point - to be passed a description of a flow or
> > flows that should be steered to a particular vnic.
>
> Traffic is steered to the VNICs according to the MAC address
> associated with the VNIC. The MAC address is basically what
> characterizes the main flow of the VNIC [1]. Within a VNIC, or any
> other data-link for that matter, we will allow sub-flows to be
> defined. The plan is to allow separate rings to be associated with
> these sub-flows, if hardware resources are available. When a sub-flow
> is created, it can then be associated with a callback routine which
> will be invoked to process traffic matching that sub-flow. Would this
> work for you?

Yes. Our h/w will not steer traffic based on MAC address - our point
of view is that the MAC is associated with the physical port rather
than any virtualization thereof. We steer traffic to multiple receive
queues based only on IP header information.

> > mac_open() could then be used to access the vnic (or perhaps vmac?) as
> > if it were just another piece of h/w.
>
> That's already the case today. The VNIC driver registers a separate
> MAC for each created VNIC. To the rest of the system, VNICs look and
> behave like regular NICs, and can be opened through mac_open() of
> course.

Cool. Sounds good.

Paul

> Thanks,
> Nicolas.
>
> [1] we are also considering allowing the administrator to have
> multiple VNICs share a MAC address of the underlying NIC; in this
> case the IP address will be part of what characterizes the main flow
> of a VNIC.
>
> --
> Nicolas Droux, Solaris Kernel Networking
> Sun Microsystems, Inc. http://blogs.sun.com/droux

--
Paul Durrant
http://www.linkedin.com/in/pdurrant
On 9/27/06, Sunay Tripathi <Sunay.Tripathi at eng.sun.com> wrote:
>
> I think it would be worthwhile to make a distinction between
> connections and flows. For userland TCP, what you are dealing with is
> connections (5 tuples specified), and for the time being we are
> dealing with wider definitions for a flow (dst MAC or IP address, VLAN
> tag, transport, or dst port). Multiple connections will map into a
> flow.
>

Ok - I'd assumed that you were using a narrower definition of flow,
based on a 5 or 3-tuple rather than any layer 2 information.

> For userland TCP, you shouldn't need to create a VNIC. All you want to
> do is map a connection to an Rx ring and you are done. Crossbow will
> allow you to do that without needing to create VNICs.
>

Indeed, although we may map several connections to the same RX queue.
For Xen though, we most definitely need to operate a VNIC.

> As for resources and graceful failure, your suggestion is correct,
> except we want to enforce it at the top of the GLDv3 layer (with the
> possibility of a mac interface). The mac owns the resources. Both
> dladm and netrcm will trigger creation of VNICs or flow-specific
> policies at the mac layer (pseudo H/W layer) and can state their
> preference for whether they want to run out of H/W, i.e. use a H/W Rx
> ring. I think userland TCP should become a client of the same
> interface and request specific TCP connections to be assigned to the
> H/W Rx ring. It's your choice to decide whether you want to fail in
> case no Rx rings are available or let the S/W classifier and soft
> rings deal with it.
>

It would be nice if we could make our user-level stack talk to the
driver through the same interface as Xen and Crossbow - that's one of
my aims, and that's why I want to see more API details: to determine
whether that is possible.

> > > > The reason I ask is that having a scheme where the h/w driver tells
> > > > crossbow how many resources it has up front essentially means those
> > > > resources must be reserved for crossbow's use. This is wasteful if
> > > > they are never used and will impact on other clients such as a
> > > > user-level TCP/IP stack.
> > >
> > > Interesting. Our assumption was that the NIC (and all its resources)
> > > is owned by GLDv3. We will have direct APIs to the GLD driver to set
> > > classification rules and request (or show preference for) H/W Rx
> > > rings. So why can't userland TCP and the kernel stack live together,
> > > both requesting what they want and stating their preference for H/W
> > > support, and let GLDv3 manage it the best it can?
> >
> > It's fine for GLD to assume free access to the h/w providing it can
> > accept the occasional failure. A user-level stack essentially needs at
> > least one vnic per process to ensure security, so a system can hog a
> > fair few of them depending upon what it's doing.
>
> Why do you want to create a VNIC per process? How does it offer more
> security? I think you achieve the same thing as long as the underlying
> Rx rings etc. are per process or per connection.
>

The main issue is denial of service. As you say, we essentially need
to firewall RX queues from each other, binding them to a process - but
that's essentially what I meant by VNIC.

> > > We can potentially add a flag to the API so that if preference is
> > > stated for the Rx ring and none is available, then creation of the
> > > new VNIC or flow will fail. This will give control to the sysadmin
> > > to choose which things he wants to run out of the H/W.
> > >
> > > Do you need something more than this?
> > >
> >
> > No, that sounds reasonable: a driver interface that simply requests new
> > resources from the h/w when the client needs them will be most
> > flexible. Declaring resources up-front in this model should not be
> > necessary. I'd envisage something along the lines of:
> >
> > m_vnic_create/destroy entry points - to request new vnic
> > creation/destruction (creation should cause the driver to register the
> > interface to the vnic using the normal mac_register() call).
> >
> > m_vnic_steer entry point - to be passed a description of a flow or
> > flows that should be steered to a particular vnic.
> >
> > mac_open() could then be used to access the vnic (or perhaps vmac?) as
> > if it were just another piece of h/w.
>
> The interfaces will be along those lines. The only difference is that
> the mac will own the resources, and both dladm/netrcm and userland TCP
> libraries will have to ask the mac to get the resources assigned. I
> don't think the smarts on how to assign the hardware resources need to
> be in the driver.
>

Ok - I guess that will become clearer when the API is published. My
concern about publishing all resources to the mac layer in one go, and
it expecting them all to be available, is that if, e.g., I need to
populate 4000 RX queues with say 256 9k buffers each, that's rather a
lot of DMA memory I've just swallowed. If, on the other hand, the mac
layer specifically requests a resource to be enabled/disabled then
that's ok.

Paul

--
Paul Durrant
http://www.linkedin.com/in/pdurrant
* Darren.Reed at Sun.COM [2006-Sep-28 20:43 UTC] - [crossbow-discuss] Re: driver API information
Paul Durrant wrote:

> On 9/27/06, Nicolas Droux <Nicolas.Droux at sun.com> wrote:
>
>> Paul,
>>
>> On Sep 27, 2006, at 5:45 AM, Paul Durrant wrote:
>>
>> > m_vnic_create/destroy entry points - to request new vnic
>> > creation/destruction (creation should cause the driver to register the
>> > interface to the vnic using the normal mac_register() call).
>>
>> Do you require a kernel API in this case?
>
> Personally not from a client point of view, although I guess Xen
> required some sort of kernel API to set up a new vnic and then map it
> into a domU.
>
>> Currently VNIC creation/deletion is driven by a new libvnicadm
>> library. This library is used by dladm(1M) and we could envision
>> exporting it as a public interface as well. Note that VNIC creation
>> already triggers a mac_register() today; more on that below.
>>
>> > m_vnic_steer entry point - to be passed a description of a flow or
>> > flows that should be steered to a particular vnic.
>>
>> Traffic is steered to the VNICs according to the MAC address
>> associated with the VNIC. The MAC address is basically what
>> characterizes the main flow of the VNIC [1]. Within a VNIC, or any
>> other data-link for that matter, we will allow sub-flows to be
>> defined. The plan is to allow separate rings to be associated with
>> these sub-flows, if hardware resources are available. When a sub-flow
>> is created, it can then be associated with a callback routine which
>> will be invoked to process traffic matching that sub-flow. Would this
>> work for you?
>
> Yes. Our h/w will not steer traffic based on MAC address - our point
> of view is that the MAC is associated with the physical port rather
> than any virtualization thereof. We steer traffic to multiple receive
> queues based only on IP header information.

Can I ask which IPv4/v6 header fields you plan to allow demultiplexing
on? Do you have any plans to include any UDP/TCP header fields?

Darren
On Sep 28, 2006, at 1:02 PM, Paul Durrant wrote:

> On 9/27/06, Nicolas Droux <Nicolas.Droux at sun.com> wrote:
>> Paul,
>>
>> On Sep 27, 2006, at 5:45 AM, Paul Durrant wrote:
>>
>> > m_vnic_create/destroy entry points - to request new vnic
>> > creation/destruction (creation should cause the driver to register the
>> > interface to the vnic using the normal mac_register() call).
>>
>> Do you require a kernel API in this case?
>
> Personally not from a client point of view, although I guess Xen
> required some sort of kernel API to set up a new vnic and then map it
> into a domU.

The current plan is to create the VNICs via libvnicadm from the Xen
admin interface; no dedicated kernel interface will be used for that
operation directly from Xen.

>> Currently VNIC creation/deletion is driven by a new libvnicadm
>> library. This library is used by dladm(1M) and we could envision
>> exporting it as a public interface as well. Note that VNIC creation
>> already triggers a mac_register() today; more on that below.
>>
>> > m_vnic_steer entry point - to be passed a description of a flow or
>> > flows that should be steered to a particular vnic.
>>
>> Traffic is steered to the VNICs according to the MAC address
>> associated with the VNIC. The MAC address is basically what
>> characterizes the main flow of the VNIC [1]. Within a VNIC, or any
>> other data-link for that matter, we will allow sub-flows to be
>> defined. The plan is to allow separate rings to be associated with
>> these sub-flows, if hardware resources are available. When a sub-flow
>> is created, it can then be associated with a callback routine which
>> will be invoked to process traffic matching that sub-flow. Would this
>> work for you?
>
> Yes. Our h/w will not steer traffic based on MAC address - our point
> of view is that the MAC is associated with the physical port rather
> than any virtualization thereof. We steer traffic to multiple receive
> queues based only on IP header information.

We're planning to support both models, i.e. a per-VNIC MAC address or
a MAC address shared by multiple VNICs (see the "[1]" note in my
previous email).

Thanks,
Nicolas.

>> > mac_open() could then be used to access the vnic (or perhaps
>> > vmac?) as if it were just another piece of h/w.
>>
>> That's already the case today. The VNIC driver registers a separate
>> MAC for each created VNIC. To the rest of the system, VNICs look and
>> behave like regular NICs, and can be opened through mac_open() of
>> course.
>
> Cool. Sounds good.
>
> Paul
>
>> Thanks,
>> Nicolas.
>>
>> [1] we are also considering allowing the administrator to have
>> multiple VNICs share a MAC address of the underlying NIC; in this
>> case the IP address will be part of what characterizes the main flow
>> of a VNIC.
>>
>> --
>> Nicolas Droux, Solaris Kernel Networking
>> Sun Microsystems, Inc. http://blogs.sun.com/droux
>
> --
> Paul Durrant
> http://www.linkedin.com/in/pdurrant

--
Nicolas Droux, Solaris Kernel Networking
Sun Microsystems, Inc. http://blogs.sun.com/droux
* pdurrant at gmail.com [2006-09-28 20:02:59]
> Our h/w will not steer traffic based on MAC address - our point of
> view is that the MAC is associated with the physical port rather
> than any virtualization thereof. We steer traffic to multiple
> receive queues based only on IP header information.

That seems unfortunate from a Xen perspective, where the typical
approach is to assign a MAC address to a guest domain.

Is there no support for using multiple MAC addresses?

dme.
--
David Edmondson, Sun Microsystems, http://www.dme.org
> That seems unfortunate from a Xen perspective, where the typical
> approach is to assign a MAC address to a guest domain.

From an architectural perspective, that seems the most elegant (and
most correct) approach. Everything else seems like a layering
violation to me.

--
meem
On 9/29/06, David Edmondson <dme at sun.com> wrote:
>
> That seems unfortunate from a Xen perspective, where the typical
> approach is to assign a MAC address to a guest domain.
>
> Is there no support for using multiple MAC addresses?
>

There's no support in h/w for this yet. We have to run the NIC in
promiscuous mode and then dynamically add 5-tuple filters to steer
connections to the correct VNIC when we see them getting set up.
Future h/w will have the ability to steer on the basis of MAC address.

Paul

--
Paul Durrant
http://www.linkedin.com/in/pdurrant
On 9/29/06, Peter Memishian <peter.memishian at sun.com> wrote:
>
> > That seems unfortunate from a Xen perspective, where the typical
> > approach is to assign a MAC address to a guest domain.
>
> From an architectural perspective, that seems the most elegant (and
> most correct) approach. Everything else seems like a layering
> violation to me.
>

Agreed; for Xen, virtualising on the basis of MAC address is the
correct thing to do. For Crossbow, I'm not sure. For user-level
stacks, definitely not.

Paul

--
Paul Durrant
http://www.linkedin.com/in/pdurrant
On 9/28/06, Nicolas Droux <Nicolas.Droux at sun.com> wrote:
>
> We're planning to support both models, i.e. a per-VNIC MAC address or
> a MAC address shared by multiple VNICs (see the "[1]" note in my
> previous email).
>

That's good.

Paul

--
Paul Durrant
http://www.linkedin.com/in/pdurrant
On 9/28/06, Darren.Reed at sun.com <Darren.Reed at sun.com> wrote:
>
> Can I ask which IPv4/v6 header fields you plan to allow demultiplexing
> on? Do you have any plans to include any UDP/TCP header fields?
>

Our h/w can steer based on the 5-tuple (i.e. src IP, dst IP, TCP/UDP,
src port, dst port) so, yes, we include the UDP/TCP ports from the
header.

Paul

--
Paul Durrant
http://www.linkedin.com/in/pdurrant
> > That seems unfortunate from a Xen perspective, where the typical
> > approach is to assign a MAC address to a guest domain.
>
> From an architectural perspective, that seems the most elegant (and
> most correct) approach. Everything else seems like a layering
> violation to me.

I think if you have factory MAC addresses, or can live with the
creation of random MAC addresses, then MAC-based VNICs should be the
preferred model. However, a large number of customers are used to
dealing with IP addresses and have asked that, since they already have
a unique IP address per machine (or virtual machine), they just be
able to use that. That's why we are investigating IP-based VNICs.

At the end of the day, it's just classification based on IP addresses.
The only minor layering violation is some code sitting in the VNIC
layer which turns around the multicast and broadcast packets to other
IP-based VNICs (stated simply). At the end of the day, if we follow
strict layering, that also ends up producing more hackiness. Remember
the interfaces between TCP and IP when they were "supposedly" separate
layers :)

Cheers,
Sunay

>
> --
> meem
> _______________________________________________
> crossbow-discuss mailing list
> crossbow-discuss at opensolaris.org
> http://opensolaris.org/mailman/listinfo/crossbow-discuss
>

--
Sunay Tripathi
Sr. Staff Engineer
Solaris Core Networking Technologies
Sun MicroSystems Inc.

Solaris Networking: http://www.opensolaris.org/os/community/networking
Project Crossbow:   http://www.opensolaris.org/os/project/crossbow
> * pdurrant at gmail.com [2006-09-28 20:02:59]
> > Our h/w will not steer traffic based on MAC address - our point of
> > view is that the MAC is associated with the physical port rather
> > than any virtualization thereof. We steer traffic to multiple
> > receive queues based only on IP header information.
>
> That seems unfortunate from a Xen perspective, where the typical
> approach is to assign a MAC address to a guest domain.
>
> Is there no support for using multiple MAC addresses?

Support for multiple MAC addresses is already there. Support for
choosing a factory MAC address to build VNICs on top of is coming.

Cheers,
Sunay

>
> dme.
> --
> David Edmondson, Sun Microsystems, http://www.dme.org
> _______________________________________________
> crossbow-discuss mailing list
> crossbow-discuss at opensolaris.org
> http://opensolaris.org/mailman/listinfo/crossbow-discuss
>

--
Sunay Tripathi
Sr. Staff Engineer
Solaris Core Networking Technologies
Sun MicroSystems Inc.

Solaris Networking: http://www.opensolaris.org/os/community/networking
Project Crossbow:   http://www.opensolaris.org/os/project/crossbow
* Sunay.Tripathi at eng.sun.com [2006-09-29 16:42:42]
>> * pdurrant at gmail.com [2006-09-28 20:02:59]
>> > Our h/w will not steer traffic based on MAC address - our point of
>> > view is that the MAC is associated with the physical port rather
>> > than any virtualization thereof. We steer traffic to multiple
>> > receive queues based only on IP header information.
>>
>> That seems unfortunate from a Xen perspective, where the typical
>> approach is to assign a MAC address to a guest domain.
>>
>> Is there no support for using multiple MAC addresses?
>
> Support for multiple MAC addresses is already there. Support for
> choosing a factory MAC address to build VNICs on top of is coming.

The question related specifically to the hardware that Paul described.

dme.
--
David Edmondson, Sun Microsystems, http://www.dme.org
> * Sunay.Tripathi at eng.sun.com [2006-09-29 16:42:42]
> >> * pdurrant at gmail.com [2006-09-28 20:02:59]
> >> > Our h/w will not steer traffic based on MAC address - our point of
> >> > view is that the MAC is associated with the physical port rather
> >> > than any virtualization thereof. We steer traffic to multiple
> >> > receive queues based only on IP header information.
> >>
> >> That seems unfortunate from a Xen perspective, where the typical
> >> approach is to assign a MAC address to a guest domain.
> >>
> >> Is there no support for using multiple MAC addresses?
> >
> > Support for multiple MAC addresses is already there. Support for
> > choosing a factory MAC address to build VNICs on top of is coming.
>
> The question related specifically to the hardware that Paul described.

Ah, my bad :)

> dme.
> --
> David Edmondson, Sun Microsystems, http://www.dme.org

--
Sunay Tripathi
Sr. Staff Engineer
Solaris Core Networking Technologies
Sun MicroSystems Inc.

Solaris Networking: http://www.opensolaris.org/os/community/networking
Project Crossbow:   http://www.opensolaris.org/os/project/crossbow