I just thought I'd give my GLDv3 driver a simple netperf test now that Crossbow has integrated, and I find that, whereas I could achieve 9.3Gbps before, I can now only get ~3Gbps with the same driver code (barring necessary interface changes for Crossbow) and the same hardware. The crux of the problem, I think, is the sheer quantity of processing going on per received packet. This theory is supported by the fact that, when I turn on LRO, I get 8.6Gbps for the same test with the same CPU bindings. Is there any way to turn off Crossbow's huge bump-in-the-stack, since I have no VNICs and am therefore not remotely interested in flow classification or resource control?

Paul
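To put the figures above in per-packet terms, here is a rough back-of-the-envelope sketch. It assumes 1500-byte frames and a single CPU handling every received packet; neither is stated in the thread, and the helper names are purely illustrative. Under those assumptions, 9.3Gbps corresponds to roughly 775,000 packets per second (about 1.3 microseconds per packet), while ~3Gbps corresponds to roughly 250,000 packets per second (about 4 microseconds per packet). LRO helps because it coalesces segments, so fewer packets travel up the stack for the same byte count.

```c
#include <assert.h>

/*
 * Packets per second needed to sustain a given throughput.
 * The frame size is an assumption (the thread does not state the MTU),
 * and Ethernet framing overhead is ignored for simplicity.
 */
static double
packets_per_second(double gbps, double frame_bytes)
{
    assert(frame_bytes > 0.0);
    return (gbps * 1e9) / (frame_bytes * 8.0);
}

/*
 * Time available per packet, in nanoseconds, if a single CPU
 * processes every received packet.
 */
static double
ns_per_packet(double gbps, double frame_bytes)
{
    return 1e9 / packets_per_second(gbps, frame_bytes);
}
```

For example, `ns_per_packet(3.0, 1500.0)` gives the per-packet time implied by the post-Crossbow result, which can be compared against `ns_per_packet(9.3, 1500.0)` for the earlier one.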
Paul,

It's due to the fact that so far the focus has been virtualization and 1 gigE NICs without multiple Rx rings etc. So even for 10 gigE NICs which have multiple Rx/Tx rings, we still try to scale them using S/W fanout, which might be the issue you are facing. We are working on the problem right now. In the meantime, disable some of the S/W scaling and you should get some of your performance back. Put this in /etc/system:

    set mac_soft_ring_enable=0
    set mac:mac_rx_soft_ring_count=0
    set mac:mac_rx_soft_ring_10gig_count=0

You will have to reboot the machine after setting this. BTW, this is only a partial workaround and in no way a supported feature. The better solution is being worked on.

Cheers,
Sunay

--
Sunay Tripathi
Distinguished Engineer
Solaris Core Operating System
Sun MicroSystems Inc.

Solaris Networking: http://www.opensolaris.org/os/community/networking
Project Crossbow: http://www.opensolaris.org/os/project/crossbow

_______________________________________________
crossbow-discuss mailing list
crossbow-discuss at opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/crossbow-discuss
Paul,

I forgot to make one thing clear though: for your kind of NIC (which has multiple Rx/Tx rings), a partial port to Crossbow is going to be very harmful. Keep in mind that unless Crossbow sees all the Rx/Tx rings, it will not use them. Also, dynamic interrupt blanking has been removed (we don't need a heuristic-based approach) and replaced by dynamic polling. So unless you expose interfaces for dynamic polling, you will see huge differences.

So do enable dynamic polling, and then do the things I mentioned in my previous message, to give us some data.

Thanks,
Sunay
Sunay Tripathi wrote:
> I forgot to make one thing clear though - for your kind of NIC
> (which has multiple Rx/Tx rings) a partial port to Crossbow
> is going to be very harmful. Keep in mind, that unless crossbow
> sees all the Rx/Tx rings, it will not use them. Also, dynamic
> interrupt blanking has been removed (don't need heuristic based
> approach) and replaced by dynamic polling. So unless you expose
> interfaces for dynamic polling, you will see huge differences.

Sunay,

Thanks for the info. From looking at the new API, I don't believe I can expose my multiple RX/TX rings to Crossbow, because my traffic steering algorithm is fixed (it's an LFSR hash based on TCP/IP headers) and the current h/w does not support multiple MAC addresses (although I could do this in s/w, of course). Is there a way I can take advantage of the polling API without needing to claim levels of virtualization that the h/w does not support?

Paul

PS: In general, I don't think it's a good idea to assume that h/w that can steer traffic based on TCP/IP headers can also steer based on MAC address/VLAN tag.
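For readers unfamiliar with this kind of steering, the following is a minimal sketch of an LFSR-style hash over the TCP/IP 4-tuple that picks an RX ring. It is not the algorithm of Paul's hardware, which the thread does not describe: the 16-bit register width, the 0x1021 feedback polynomial, the 0xffff seed, the field order, and every name here (`flow_key_t`, `lfsr_feed`, `flow_hash`, `pick_rx_ring`) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow key; not a structure from any real driver. */
typedef struct flow_key {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
} flow_key_t;

/*
 * Shift one byte, most significant bit first, through a 16-bit LFSR.
 * The feedback polynomial (0x1021) is an illustrative choice only.
 */
static uint16_t
lfsr_feed(uint16_t state, uint8_t byte)
{
    for (int bit = 7; bit >= 0; bit--) {
        unsigned in = (unsigned)(byte >> bit) & 1u;
        unsigned out = (unsigned)(state >> 15) & 1u;

        state = (uint16_t)(state << 1);
        if (in ^ out)
            state = (uint16_t)(state ^ 0x1021u);
    }
    return state;
}

/* Hash the 4-tuple field by field so struct padding never matters. */
static uint16_t
flow_hash(const flow_key_t *k)
{
    uint16_t s = 0xffff;    /* assumed seed */

    for (int shift = 24; shift >= 0; shift -= 8)
        s = lfsr_feed(s, (uint8_t)(k->src_ip >> shift));
    for (int shift = 24; shift >= 0; shift -= 8)
        s = lfsr_feed(s, (uint8_t)(k->dst_ip >> shift));
    s = lfsr_feed(s, (uint8_t)(k->src_port >> 8));
    s = lfsr_feed(s, (uint8_t)(k->src_port & 0xff));
    s = lfsr_feed(s, (uint8_t)(k->dst_port >> 8));
    s = lfsr_feed(s, (uint8_t)(k->dst_port & 0xff));
    return s;
}

/* Every packet of a flow yields the same hash, hence the same ring. */
static unsigned
pick_rx_ring(const flow_key_t *k, unsigned nrings)
{
    assert(nrings > 0);
    return (unsigned)flow_hash(k) % nrings;
}
```

The point relevant to the thread is that the ring choice depends only on TCP/IP header fields. Nothing in such a scheme looks at the destination MAC address or VLAN tag, which is why this style of steering does not by itself provide layer 2 classification.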
Paul,

You can simply expose one RX hardware ring group with multiple RX rings inside that group. Then you do the steering in hardware (RSS) to the multiple RX rings as you did before. This is how our model maps to your type of NIC.

If someone then creates multiple VNICs on your NIC, we'll do the L2 classification to these VNICs in software, in mac; you don't have to do that yourself.

To recap, there are roughly three combinations of L2 hardware classification and RSS that we support:

1 - Some NICs support multiple hardware groups with a single ring per group, and do L2 hardware classification between the groups.

2 - Some NICs support only one hardware group with multiple rings and RSS between these rings, but no L2 hardware classification. This is the case of your hardware.

3 - Some other NICs can do both at the same time: multiple hardware groups with L2 hardware classification between the groups, then RSS between multiple RX rings in each group.

Some NICs can do 1 or 2 but not a combination of both at the same time.

What might be confusing is the name of the flag "MAC_VIRT_LEVEL1", which you need to raise in your driver in this case, even though you support only one hardware group and have no layer 2 hardware classification assist for virtualization.

Nicolas.
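The three combinations Nicolas lists can be summarized as a function of two numbers: how many RX groups the hardware exposes and how many rings each group holds. The sketch below is only a model of that description. The enum, the function names, and the extra "single ring" case are hypothetical and do not come from the mac framework headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the combinations described above. */
typedef enum rx_model {
    RX_MODEL_SINGLE_RING,   /* one group, one ring: nothing to spread */
    RX_MODEL_L2_GROUPS,     /* case 1: many groups, one ring each */
    RX_MODEL_RSS_ONLY,      /* case 2: one group, RSS across its rings */
    RX_MODEL_L2_AND_RSS     /* case 3: many groups, RSS inside each */
} rx_model_t;

static rx_model_t
classify_rx_model(unsigned ngroups, unsigned rings_per_group)
{
    assert(ngroups > 0 && rings_per_group > 0);

    if (ngroups > 1)
        return (rings_per_group > 1) ? RX_MODEL_L2_AND_RSS
                                     : RX_MODEL_L2_GROUPS;
    return (rings_per_group > 1) ? RX_MODEL_RSS_ONLY
                                 : RX_MODEL_SINGLE_RING;
}

/*
 * Whether the hardware can separate traffic by layer 2 address.
 * When it cannot (case 2), the thread says mac classifies traffic
 * for VNICs in software instead.
 */
static bool
hw_l2_classification(rx_model_t model)
{
    return model == RX_MODEL_L2_GROUPS || model == RX_MODEL_L2_AND_RSS;
}
```

Paul's NIC, with one group and several RSS-steered rings, would map to `RX_MODEL_RSS_ONLY` in this model.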
Nicolas Droux wrote:
[snip]

Nicolas,

Thanks for the detailed explanation. I'll update my driver and try again.

> What might be confusing is the name of the flag "MAC_VIRT_LEVEL1" which
> you need to raise in your driver in this case, although you support only
> one hardware group but no layer 2 hardware classification assist for
> virtualization.

That was what was throwing me; it looked like I could not claim MAC_VIRT_LEVEL1 without being able to steer traffic to rings based on MAC address.

Paul
Nicolas Droux wrote:
> You can simply expose one RX hardware ring group with multiple RX rings
> inside that group. Then you do the steering in hardware (RSS) to the
> multiple RX rings as you did before. This is how our model maps to your
> type of NIC.

Nicolas,

I'm trying to code this up now and I'm confused as to how I set up my single group. I've opted for MAC_GROUP_TYPE_STATIC (which I think is correct), but in my mr_gget() method I apparently need to set up mgi_addmac() and mgi_remmac() entry points (looking at the code, it doesn't seem possible to leave these NULL). How do I implement these, given that my h/w only supports a single MAC address and thus I do not steer traffic based on MAC address? Do I implement them and just fail any call to mgi_addmac()?

Paul
Paul,

On Jan 19, 2009, at 5:43 AM, Paul Durrant wrote:
> I'm trying to code this up now and I'm confused as to how I set up my
> single group. I've opted for MAC_GROUP_TYPE_STATIC (which I think is
> correct) but in my mr_gget() method I apparently need to set up
> mgi_addmac() and mgi_remmac() entry points (looking at the code, it
> doesn't seem possible to leave these NULL); how do I implement these given
> that my h/w only supports a single MAC address and thus I do not steer
> traffic based on MAC address? Do I implement them and just fail any call
> to mgi_addmac()?

For drivers that support ring groups, all unicast MAC addresses, including the primary MAC address, are always programmed through the addmac/remmac entry points of the rings capability. Your single address will be programmed through these entry points on your ring group, instead of through the mc_unicst entry point.

Nicolas.

--
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux
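Following Nicolas's answer, one plausible shape for the two entry points on single-address hardware is to accept the first address (the primary one) and refuse any further, different address. The sketch below models only that bookkeeping in plain user-space C. The type and function names are hypothetical, the signatures merely resemble driver callbacks, and the choice of ENOSPC and EINVAL as error codes is an assumption rather than something the thread specifies.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ETHERADDRL 6     /* bytes in an Ethernet address */

/* Hypothetical per-group state for hardware with one address filter. */
typedef struct sketch_group {
    bool    sg_addr_set;
    uint8_t sg_addr[SKETCH_ETHERADDRL];
} sketch_group_t;

/* Accept the first address; refuse a second, different one. */
static int
sketch_addmac(void *arg, const uint8_t *mac)
{
    sketch_group_t *g = arg;

    if (g->sg_addr_set) {
        /* Re-adding the same address is treated as harmless. */
        if (memcmp(g->sg_addr, mac, SKETCH_ETHERADDRL) == 0)
            return 0;
        return ENOSPC;      /* the only filter slot is taken */
    }

    memcpy(g->sg_addr, mac, SKETCH_ETHERADDRL);
    g->sg_addr_set = true;
    /* A real driver would program the hardware address filter here. */
    return 0;
}

/* Remove the address only if it is the one currently programmed. */
static int
sketch_remmac(void *arg, const uint8_t *mac)
{
    sketch_group_t *g = arg;

    if (!g->sg_addr_set ||
        memcmp(g->sg_addr, mac, SKETCH_ETHERADDRL) != 0)
        return EINVAL;

    g->sg_addr_set = false;
    /* A real driver would clear the hardware address filter here. */
    return 0;
}
```

With this arrangement the primary address arrives through the add entry point, as Nicolas describes, and any further addresses for VNICs would be refused by the driver, leaving their classification to software.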
Nicolas Droux wrote:
> For drivers that support ring groups, all unicast MAC addresses,
> including the primary MAC address, are always programmed through the
> addmac/remmac entry points of the rings capability. Your single address
> will be programmed through these entry points on your ring group instead
> of the mc_unicst entry point.

Thanks. I just found the bit of code that fails mac_register() if mc_unicst is set... that was the cause of my mac_register() panic.

I seem to be up and running now; I just need to implement the addmac/remmac entry points properly and then I can do some more performance runs.

Cheers,
Paul