Sunay Tripathi
2007-Jul-23 18:24 UTC
[crossbow-discuss] small packet performance, latency and forwarding
Guys,

We have a basic implementation in place for this work:
http://www.opensolaris.org/os/project/crossbow/Design_softringset.txt

Basically the idea is to keep the data paths very tight and always do dynamic polling when possible (independent of the workload), while utilizing more than one CPU to parallelize the workload. As part of this work, I am trying to see what can be done for small-packet forwarding performance and also latency (btw, there are two projects coming online soon to specifically target forwarding and latency).

So if people have needs/suggestions/stakes in this area, I would recommend reading the above document and diving in. As for how to test some of these things, you can use 'ttcp' (http://sd.wareonearth.com/~phil/net/ttcp/) on a back-to-back setup for starters (10Gb NICs might be better). Use the '-D' option with small writes (64 bytes) to disable Nagle and actually send small packets on the wire.

What I typically use:
server: ./ttcp -s -r -v -u -b 262144 -l 64 -n 500000
client: ./ttcp -D -s -v -u -t <hostname> -b 262144 -l 64 -n 500000

This is with UDP, and if you snoop the wire you will see the small packets going by.

Cheers,
Sunay

--
Sunay Tripathi
Distinguished Engineer
Solaris Core Operating System
Sun Microsystems Inc.

Solaris Networking: http://www.opensolaris.org/os/community/networking
Project Crossbow: http://www.opensolaris.org/os/project/crossbow
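To verify the test traffic, snoop the interface on either box while ttcp runs. The interface name below is just an example, and the common ttcp versions default to port 5001 (adjust the filter if you pass -p):

  # snoop -d e1000g0 -c 20 udp port 5001

Each 64-byte payload should show up as a 106-byte frame once the UDP, IP, and Ethernet headers are added.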
Garrett D'Amore
2007-Jul-23 22:28 UTC
[crossbow-discuss] small packet performance, latency and forwarding
Am I reading this correctly, in that polling for receive packets is only used if the NIC has more than a single rx ring?

Most of the commonly available NICs only have a single receive ring. (Exceptions are nxge, ce, some models of bge, bnx, and certain Intel NICs.) Notably, the e1000g devices found on current Niagara hardware only have a single receive ring.

When the rx ring gets overfull, it would be nice to be able to dynamically switch to polling somehow.

-- Garrett

Sunay Tripathi wrote:
> Guys,
>
> We have a basic implementation in place for this work
> http://www.opensolaris.org/os/project/crossbow/Design_softringset.txt
> [...]
Sunay Tripathi
2007-Jul-24 01:39 UTC
[crossbow-discuss] small packet performance, latency and forwarding
Garrett D'Amore wrote:
> Am I reading this correctly, in that polling for receive packets is only
> used if the NIC has more than a single rx ring?

Yes. The issue is that we can't let one consumer (which can be a service, or a VNIC tied to a virtual machine) own the entire NIC. So if the NIC has only one rx ring, we never put that ring in poll mode; the packets always fly up to the S/W classifier and soft ring set (SRS), which mimic rx rings in a pseudo-H/W layer. At that point, the SRS can still be put into polling mode, since it is unique to the VNIC or the service.

Now, having said that, this is probably suboptimal for small-packet forwarding performance. We could still put the entire NIC into poll mode, but what would be a good way to decide that? (I don't necessarily like the idea of adding a tunable.) Can we do it by adding a property to the NIC? Is there a more intuitive way?

> Most of the commonly available NICs only have a single receive ring.
> (Exceptions are nxge, ce, some models of bge, bnx, and certain Intel
> NICs.) Notably, the e1000g devices found on current Niagara hardware
> only have a single receive ring.

Correct.

> When the rx ring gets overfull, it would be nice to be able to
> dynamically switch to polling somehow.

Agreed. If you have any suggestions, let me know.

Thanks,
Sunay

--
Sunay Tripathi
Distinguished Engineer
Solaris Core Operating System
Sun Microsystems Inc.

Solaris Networking: http://www.opensolaris.org/os/community/networking
Project Crossbow: http://www.opensolaris.org/os/project/crossbow
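To make the SRS poll/interrupt switch concrete, here is a minimal sketch of the kind of state machine involved. The names and watermark scheme are illustrative assumptions, not the actual Crossbow code:

  /*
   * Illustrative per-SRS receive state machine (not the real interfaces).
   * The SRS accumulates packets from the interrupt path; once its backlog
   * crosses a high watermark, interrupts are turned off and a worker
   * thread polls instead, until the backlog drains below a low watermark.
   */
  #include <sys/types.h>

  typedef enum { SRS_MODE_INTR, SRS_MODE_POLL } srs_mode_t;

  typedef struct srs {
  	srs_mode_t	srs_mode;
  	uint_t		srs_qdepth;	/* packets queued, not yet drained */
  	uint_t		srs_hiwat;	/* enter poll mode above this */
  	uint_t		srs_lowat;	/* return to interrupts below this */
  } srs_t;

  /* Interrupt path: called for each chain delivered to this SRS. */
  void
  srs_rx_enqueue(srs_t *srs, uint_t npkts)
  {
  	srs->srs_qdepth += npkts;
  	if (srs->srs_mode == SRS_MODE_INTR &&
  	    srs->srs_qdepth > srs->srs_hiwat) {
  		srs->srs_mode = SRS_MODE_POLL;
  		/* disable rx delivery to us; wake the SRS worker thread */
  	}
  }

  /* Worker path: called after each poll/drain pass. */
  void
  srs_rx_drained(srs_t *srs, uint_t npkts)
  {
  	srs->srs_qdepth -= npkts;
  	if (srs->srs_mode == SRS_MODE_POLL &&
  	    srs->srs_qdepth < srs->srs_lowat) {
  		srs->srs_mode = SRS_MODE_INTR;
  		/* re-arm the interrupt-driven delivery path */
  	}
  }

Locking is omitted; a real version would serialize mode changes between the interrupt and worker paths.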
Garrett D'Amore
2007-Jul-24 01:51 UTC
[crossbow-discuss] small packet performance, latency and forwarding
Sunay Tripathi wrote:
> Yes. The issue is that we can't let one consumer (which can be a
> service, or a VNIC tied to a virtual machine) own the entire NIC.
> So if the NIC has only one rx ring, we never put that ring in poll
> mode [...]
>
> Now, having said that, this is probably suboptimal for small-packet
> forwarding performance. We could still put the entire NIC into poll
> mode, but what would be a good way to decide that? (I don't
> necessarily like the idea of adding a tunable.) Can we do it by
> adding a property to the NIC? Is there a more intuitive way?

When inbound packets arrive faster than the upper layers can process them, that would be the point to do it. So when the squeues or soft rings (or whatever the Crossbow analog is) fill up, you may as well put the NIC in polling mode. Take it back out of polling mode when you catch back up (i.e., you are able to verify that the ring is no longer full and a poll returns no packet).

If there is no soft ring in the middle, then simply let the NIC tell you that its hardware ring is full... perhaps the NIC driver can *ask* to be put into polling mode in this situation?

-- Garrett
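A sketch of the "driver asks" idea raised above. Every name here is hypothetical (nothing like mac_poll_request() is confirmed to exist in the MAC layer), so read this purely as a shape for the interface:

  /*
   * Hypothetical driver-side hook: on each rx interrupt, check how full
   * the hardware ring is and, past a threshold, ask the MAC layer to
   * switch this ring to polling. All xx_* helpers are invented stubs.
   */
  #include <sys/types.h>

  typedef struct xx_softc xx_softc_t;

  extern uint_t xx_rx_ring_fill(xx_softc_t *);	/* descriptors outstanding */
  extern uint_t xx_rx_ring_size(xx_softc_t *);
  extern void   xx_disable_rx_intr(xx_softc_t *);
  extern void   xx_rx_process(xx_softc_t *);
  extern void   mac_poll_request(xx_softc_t *);	/* hypothetical MAC call */

  static void
  xx_rx_intr(xx_softc_t *sc)
  {
  	uint_t fill = xx_rx_ring_fill(sc);

  	if (fill > (xx_rx_ring_size(sc) * 3) / 4) {
  		/*
  		 * Ring is nearly full: quiesce the interrupt and ask the
  		 * MAC layer to start calling our poll entry point.
  		 */
  		xx_disable_rx_intr(sc);
  		mac_poll_request(sc);
  	} else {
  		xx_rx_process(sc);	/* normal interrupt-mode drain */
  	}
  }

The MAC layer would then call the driver's poll entry until a poll returns no packets and the fill level is back under the threshold, at which point rx interrupts are re-armed.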
Sunay Tripathi
2007-Jul-24 02:20 UTC
[crossbow-discuss] [vnm-discuss] small packet performance, latency and forwarding
Garrett D'Amore wrote:
> When inbound packets arrive faster than the upper layers can process
> them, that would be the point to do it. So when the squeues or soft
> rings (or whatever the Crossbow analog is) fill up, you may as well put
> the NIC in polling mode. Take it back out of polling mode when you
> catch back up (i.e., you are able to verify that the ring is no longer
> full and a poll returns no packet).

It's this determination that's hard. Inbound packets go through the system via multiple paths with different processing overheads. The fact that one path can't keep up doesn't mean the entire NIC should stop, because packets of other types can still easily be dealt with via a different path. In mixed-traffic environments, it's not easy to determine that we are struggling. Sometimes user-specified policies also play a role in making this hard. Assume the traffic for a zone/Xen guest (identified via its MAC address) is not getting processed because that particular zone/guest has very little CPU assigned to it. In such a case, even though we can figure out that the system can't keep up, we can't put the NIC into polling mode. The NIC in that case is a shared resource.

> If there is no soft ring in the middle, then simply let the NIC tell
> you that its hardware ring is full... perhaps the NIC driver can *ask*
> to be put into polling mode in this situation?

Your assumption is that all packets are meant for one consumer, which is not necessarily true.

Cheers,
Sunay

--
Sunay Tripathi
Distinguished Engineer
Solaris Core Operating System
Sun Microsystems Inc.

Solaris Networking: http://www.opensolaris.org/os/community/networking
Project Crossbow: http://www.opensolaris.org/os/project/crossbow
Garrett D'Amore
2007-Jul-24 02:23 UTC
[crossbow-discuss] [vnm-discuss] small packet performance, latency and forwarding
Sunay Tripathi wrote:
> It's this determination that's hard. Inbound packets go through the
> system via multiple paths with different processing overheads. [...]
> The NIC in that case is a shared resource.
>
> Your assumption is that all packets are meant for one consumer,
> which is not necessarily true.

Is the problem that there is no logical consumer to do the polling? What about in the software classification layer?

If you assume that the problem is that the *hardware* ring is full, then it doesn't matter *who* the traffic is for... the packets are arriving in the hardware ring too fast, and the host CPU can't keep up with them, even if all it has to do is shuffle them to another rx queue.

-- Garrett
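For a sense of scale on the hardware-ring argument: minimum-size Ethernet frames on a 1 Gb/s link arrive at up to about 1.49 Mpps (84 byte-times per frame including preamble and inter-frame gap, so 10^9 / 672 bits ≈ 1,488,000 frames/sec). A typical 256-descriptor rx ring therefore fills in roughly 170 microseconds if the host stops draining it, so at these rates the ring-full condition described above is easy to hit.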
Sunay Tripathi
2007-Jul-24 02:31 UTC
[crossbow-discuss] [vnm-discuss] small packet performance, latency and forwarding
Garrett D'Amore wrote:
> Is the problem that there is no logical consumer to do the polling?

Correct.

> What about in the software classification layer?
>
> If you assume that the problem is that the *hardware* ring is full, then
> it doesn't matter *who* the traffic is for... the packets are arriving
> in the hardware ring too fast, and the host CPU can't keep up with
> them, even if all it has to do is shuffle them to another rx queue.

I haven't noticed that to be the issue; the bottlenecks seem to be above that. If what you say is true, then we can do something between the S/W classification layer and the NIC. Worth investigating. In the small-packet forwarding case, can you do some experiments to verify this?

Sunay

--
Sunay Tripathi
Distinguished Engineer
Solaris Core Operating System
Sun Microsystems Inc.

Solaris Networking: http://www.opensolaris.org/os/community/networking
Project Crossbow: http://www.opensolaris.org/os/project/crossbow
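One cheap way to tell whether drops are happening at the hardware ring or higher up is to compare driver-level and stack-level counters under load. The statistic names vary per driver, so treat these as examples only:

  # kstat -m e1000g | grep -i norcvbuf
  # netstat -s | grep udpInOverflows

A climbing norcvbuf-style counter points at ring/buffer exhaustion in the driver, while udpInOverflows points at the socket layer above.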
Garrett D'Amore
2007-Jul-24 04:03 UTC
[crossbow-discuss] [vnm-discuss] small packet performance, latency and forwarding
Sunay Tripathi wrote:
> I haven't noticed that to be the issue; the bottlenecks seem to be
> above that. If what you say is true, then we can do something between
> the S/W classification layer and the NIC. Worth investigating. In
> the small-packet forwarding case, can you do some experiments to
> verify this?

I'll see what I can do later. I definitely see that with small packets the receive ring fills up quite quickly, but this is with Nevada (though with soft rings enabled), not with Crossbow.

-- Garrett
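For the experiments, an fbt aggregation over the driver module is a quick way to see which receive-path functions are getting hammered (substitute the module name for your NIC):

  # dtrace -n 'fbt:e1000g::entry { @calls[probefunc] = count(); }'

It counts calls rather than time, but it makes it obvious whether the interrupt path or a worker/poll path is doing the draining.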