Andrew Gallatin
2008-May-07 15:49 UTC
[crossbow-discuss] questions from a 10GbE driver author
Hi,

I maintain a driver for a 10GbE NIC which supports multiple hardware tx/rx rings. We can steer rx packets into rings using the "standard" NDIS6 Toeplitz hashing on TCP port numbers, IP addresses, etc. We can also steer packets based on MAC address. Would this NIC be considered to be capable of supporting crossbow?

Also, can crossbow do things like steer outgoing packets to the correct ring (for serialization, to prevent out-of-order packets) based on the same sort of hashing that our NIC is doing in hardware? E.g., if we're hashing on TCP src+dst port via a Toeplitz hash, will crossbow's tx ring selection mechanism choose the same outgoing ring for the connection as our driver chose for the incoming traffic?

How hard is it to convert a 10GbE driver from plain GLDv3 to crossbow? Is there a guide or howto somewhere?

When is crossbow scheduled for integration into Nevada?

Thanks,

Drew

This message posted from opensolaris.org
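For readers unfamiliar with the hashing under discussion, the following is a minimal Python sketch of a Toeplitz hash over a TCP/IPv4 4-tuple, followed by a ring pick. It is an illustration under stated assumptions, not the NIC's firmware or Crossbow's logic: the 40-byte key is only an example value, the helper names (`toeplitz_hash`, `flow_bytes`, `ring_for_flow`) are hypothetical, and the modulo ring pick stands in for whatever indirection the hardware really uses.

```python
import socket
import struct

# Example 40-byte key (assumption); real hardware uses whatever key the
# driver programs into it.
KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0"
    "d0ca2bcbae7b30b477cb2da38030f20c"
    "6a42b73bbeac01fa"
)


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """For each set bit i of data (MSB first), XOR in the 32-bit window
    of the key that starts at key bit i."""
    if len(key) < len(data) + 4:
        raise ValueError("key must be at least len(data) + 4 bytes long")
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i in range(len(data) * 8):
        if data[i // 8] & (0x80 >> (i % 8)):
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def flow_bytes(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """Hash input in network byte order: src addr, dst addr, src port, dst port."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!HH", src_port, dst_port))


def ring_for_flow(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                  nrings: int) -> int:
    """Hypothetical ring pick: hash modulo the number of rings."""
    return toeplitz_hash(KEY, flow_bytes(src_ip, dst_ip, src_port, dst_port)) % nrings
```

Because the hash depends only on the key and the 4-tuple, every packet of one connection maps to the same ring. Whether the reverse direction maps to the same ring depends on how the tuple is ordered on transmit, which is the question raised above.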
Paul Durrant
2008-May-07 16:02 UTC
[crossbow-discuss] questions from a 10GbE driver author
Andrew Gallatin wrote:
> I maintain a driver for a 10GbE nic which supports multiple hardware tx/rx rings. We can steer rx packets into rings using the "standard" NDIS6 Toeplitz hashing on TCP port numbers, IP addresses, etc. We can also steer packets based on MAC address. Would this NIC be considered to be capable of supporting crossbow?

I've been wondering the same thing. We too can steer based on a Toeplitz hash, a basic LFSR hash and other factors. Alas, Solaris seems to use a completely different hash internally. Also, on transmitted TCP packets there is no connection information accessible from a MAC driver (since we don't get the STREAMS queue passed down) so, for affinity purposes, any hash has to be recalculated in s/w. It would be highly useful if the stack would store a driver connection hash passed to it on the receive side and pass it down as metadata to the transmit side (even if the hash was not used internally).

Also, I've mentioned many times that Windows standardized on Toeplitz some time ago (it's even in 2k3) but there seems little interest in using this hash in Solaris (which seems odd since most NICs are likely to support it going forwards).

Paul
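The idea of storing the receive-side hash with the connection and handing it back on transmit can be sketched in a few lines of Python. This is only a model of the proposal, with made-up names (`Connection`, `on_receive`, `pick_tx_ring`) and an assumed modulo mapping from hash to ring; it does not describe any existing Solaris or driver interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Connection:
    """Stand-in for the stack's per-connection state (hypothetical)."""
    name: str
    rx_hash: Optional[int] = None  # last hash the NIC reported for this flow


def on_receive(conn: Connection, hw_hash: int) -> None:
    """Receive path: remember the hash the hardware computed, even if the
    stack itself never uses it."""
    conn.rx_hash = hw_hash


def pick_tx_ring(conn: Connection, nrings: int, fallback_ring: int = 0) -> int:
    """Transmit path: reuse the stored hash as metadata, so no software
    rehash is needed. Falls back to a fixed ring before any packet has
    been received on the connection."""
    if conn.rx_hash is None:
        return fallback_ring
    return conn.rx_hash % nrings
```

With 8 rings, a connection whose receive hash is 11 would transmit on ring 3 under this mapping, the same ring a modulo-based receive steering would have chosen.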
Sunay Tripathi
2008-May-07 19:12 UTC
[crossbow-discuss] questions from a 10GbE driver author
Hi Andrew,

> I maintain a driver for a 10GbE nic which supports multiple hardware tx/rx rings. We can steer rx packets
> into rings using the "standard" NDIS6 Toeplitz hashing on TCP port numbers, IP addresses, etc. We can also
> steer packets based on MAC address. Would this NIC be considered to be capable of supporting crossbow?

If you have a GLDv3 NIC, it would work with Crossbow as level 0 VIRT support. But in your case, I think it is capable of advanced virtualization and Crossbow can make use of it. If you have multiple Rx/Tx rings where we can steer MAC addresses to an Rx ring and the Rx ring can have its own MSI-X interrupt (when in interrupt mode), then the NIC is level 1 VIRT capable. If you deal with VLAN tags, IP addresses, protocols and ports, then it becomes level 2 VIRT capable. So it seems like you are definitely level 1 capable, which is pretty good.

> Also, can crossbow do things like steer outgoing packets to the correct ring (for serialization to prevent
> out-of-order packets) based on the same sort of hashing that our NIC is doing in hardware? E.g., if we're
> hashing on TCP src+dst port via a Toeplitz hash, will crossbow's tx ring selection mechanism choose the
> same outgoing ring for the connection as our driver chose for the incoming traffic?

So we do that today, i.e. we pair up Rx and Tx rings together to keep the flows streamlined. But we treat all Tx rings as equal and we pick them sequentially based on how Rx rings get assigned. It seems to me that we will need to introduce a new function in the MAC/driver API to help us pick the correct Tx ring for a given Rx ring. Or, if the Toeplitz hash function is not encumbered by any license, we can implement it in the MAC layer and use it to pick a Tx ring for a given Rx ring. Does that make sense?

> How hard is it to convert a 10GbE driver from plain GLDv3 to crossbow? Is there a guide or howto somewhere?

There is a MAC/driver provider document on the Crossbow docs page. But I think Kais/Roamer were writing a specific document that describes the levels with their capabilities and something along the lines of a howto. If you already have a GLDv3 driver, changing it to Crossbow is fairly simple (a few days' work) for level 1 virtualization support. One thing to note is that the Crossbow MAC/driver API is something informal that might become official at some point, just as the GLDv3 interfaces are still informal but we are looking to make them officially available as a Solaris API now.

> When is crossbow scheduled for integration into Nevada?

Sometime later this year. Fall seems like a target right now.

Cheers,
Sunay

-- 
Sunay Tripathi
Distinguished Engineer
Solaris Core Operating System
Sun MicroSystems Inc.

Solaris Networking: http://www.opensolaris.org/os/community/networking
Project Crossbow: http://www.opensolaris.org/os/project/crossbow
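The three VIRT levels described in this reply can be restated as a small decision function. The Python sketch below is one reading of the email, not an official definition from the Crossbow documents; the `NicCaps` fields and the `virt_level` name are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NicCaps:
    """Capability flags as described in the email (field names are hypothetical)."""
    gldv3: bool
    multiple_rings: bool = False
    mac_steering: bool = False      # MAC address -> Rx ring
    per_ring_msix: bool = False     # each Rx ring has its own MSI-X interrupt
    l2_l4_steering: bool = False    # VLAN tags, IP addresses, protocols, ports


def virt_level(caps: NicCaps) -> Optional[int]:
    """Return 0, 1 or 2, or None if the driver is not GLDv3 at all."""
    if not caps.gldv3:
        return None
    level1 = caps.multiple_rings and caps.mac_steering and caps.per_ring_msix
    if not level1:
        return 0
    return 2 if caps.l2_l4_steering else 1
```

For example, a GLDv3 driver with multiple rings, MAC steering and per-ring MSI-X (the last being an assumption about the NIC in question) comes out as level 1 here, matching the assessment in the reply.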
Sunay Tripathi
2008-May-17 00:05 UTC
[crossbow-discuss] questions from a 10GbE driver author
Andrew,

Sorry I had dropped this thread ... More below.

Andrew Gallatin wrote:
> Sunay Tripathi wrote:
>
>>> Also, can crossbow do things like steer outgoing packets to the
>>> correct ring (for serialization to prevent out-of-order packets)
>>> based on the same sort of hashing that our NIC is doing in hardware?
>>> E.g., if we're hashing on TCP src+dst port via a Toeplitz hash, will
>>> crossbow's tx ring selection mechanism choose the same outgoing ring
>>> for the connection as our driver chose for the incoming traffic?
>>
>> So we do that today, i.e. we pair up Rx and Tx rings together to keep
>> the flows streamlined. But we treat all Tx rings as equal and we pick
>> them sequentially based on how Rx rings get assigned. It seems to me
>> that we will need to introduce a new function in the MAC/driver API
>> to help us pick the correct Tx ring for a given Rx ring. Or, if the
>> Toeplitz hash function is not encumbered by any license, we can
>> implement it in the MAC layer and use it to pick a Tx ring for a given
>> Rx ring. Does that make sense?
>
> "Sort of". When you say "treat all Tx rings as equal and we pick
> them sequentially based on how Rx rings get assigned", do you
> mean that all outgoing packets for a particular connection
> will be sent on a particular ring, just not necessarily
> using the same ring pair as the incoming traffic is using?

That's correct.

> For example, say we have one active connection and 8 rx/tx
> rings. Assume our NIC hashed the connection to rx ring 3.
> Will the outgoing traffic for the connection be sent:
>
> 1) randomly to any ring from 0..7. Packet 0 may go out
> ring 7, packet 1 is sent from ring 2, etc.
>
> 2) always to the same "randomly" chosen ring. E.g., all
> packets are sent from ring 2.
>
> 3) always from the same ring based on rx hash value. E.g.,
> all packets are sent from ring 3.

Well, it's a bit more complicated. See the MAC/Driver API on the docs page. For each link (which has a MAC address), we assign it an Rx ring and a Tx ring. The Rx ring is chosen by telling the NIC that packets for MAC address A go to Rx ring 3 (for example). And then we pick one of the available Tx rings to pair with Rx ring 3 (say Tx ring 2). At that point all packets that have MAC address A in src or dst will use Rx ring 3 and Tx ring 2. Now, we can be more intelligent in picking the Tx ring to pair with an Rx ring. So in a sense we do your option 2 above, and we can do option 3 if we know the hash you use.

> Option 3 seems to provide the best performance on other
> OSes. And as Paul Durrant mentioned earlier, it can be
> achieved without doing any software hashing. All you need
> to do is associate the rx hash value with the connection,
> and pass it as metadata (like we do MSS for LSO) to the driver's
> transmit routine.

Yes, we will need to know the hash function and make some code changes to do that.

>>> How hard is it to convert a 10GbE driver from plain GLDv3 to
>>> crossbow? Is there a guide or howto somewhere?
>>
>> There is a MAC/driver provider document on the Crossbow docs page. But I
>> think Kais/Roamer were writing a specific document that describes
>> the levels with their capabilities and something along the lines of a howto.
>> If you already have a GLDv3 driver, changing it to Crossbow is fairly
>> simple (a few days' work) for level 1 virtualization support.
>
> Excellent, thank you. I will look at this.
>
> Drew

Cheers,
Sunay
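The per-link pairing described in this reply (one Rx ring and one Tx ring per MAC address, with Tx rings handed out in order) can be modelled roughly as follows. This Python sketch is an interpretation of the email only: the `RingPairer` class is hypothetical, and the round-robin order is an assumption about what "pick them sequentially" means.

```python
from itertools import cycle
from typing import Dict, Tuple


class RingPairer:
    """Pairs each link (MAC address) with an Rx ring and a Tx ring."""

    def __init__(self, n_tx_rings: int) -> None:
        if n_tx_rings < 1:
            raise ValueError("need at least one Tx ring")
        # Tx rings are treated as equal and handed out in order (assumed).
        self._next_tx = cycle(range(n_tx_rings))
        self.pairs: Dict[str, Tuple[int, int]] = {}

    def add_link(self, mac: str, rx_ring: int) -> Tuple[int, int]:
        """Record that the NIC steers `mac` to `rx_ring`, and pair that
        Rx ring with the next available Tx ring. Idempotent per MAC."""
        if mac not in self.pairs:
            self.pairs[mac] = (rx_ring, next(self._next_tx))
        return self.pairs[mac]

    def tx_ring_for(self, mac: str) -> int:
        """All outgoing traffic for this link uses its paired Tx ring."""
        return self.pairs[mac][1]
```

Under this model every packet for a given MAC address leaves on one fixed Tx ring (option 2 in the question), but that ring is unrelated to the ring the NIC's hash selected on receive, which is why option 3 would need the hash to be known to the MAC layer.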
Sunay Tripathi
2008-May-17 00:06 UTC
[crossbow-discuss] questions from a 10GbE driver author
Paul,

I just replied to Andrew's question, which was similar. If you guys can read the MAC/Driver API doc and propose the changes needed, we would try and make it happen.

Cheers,
Sunay
Daniel Liebster
2008-Dec-30 16:39 UTC
[crossbow-discuss] questions from a 10GbE driver author
Could you please provide the tunables for this driver (e.g. offload engines, etc.) available in OpenSolaris? We had rather poor 10GbE performance in OpenSolaris till I got lucky with TCP tunables from Solaris 10.

Thanks
Dan
Hi Dan,

What's the driver you're using?

Thanks,
Samuel
Daniel Liebster
2008-Dec-31 15:17 UTC
[crossbow-discuss] questions from a 10GbE driver author
We're using the ixgb driver. The servers are x4500 with Sun PCI-X 10Gb cards.

# modinfo |grep ixgb
160 fffffffff7e8a000 c6d8 221 1 ixgb (Intel 10Gb Ethernet)

We were getting a peak throughput of about 1.5 Gb/s per stream (up to about 5 Gb/s if we open multiple streams). I found these ndd settings via Google, and they got us up to about 6 Gb/s.

ndd -set /dev/tcp tcp_xmit_hiwat 400000
ndd -set /dev/tcp tcp_xmit_lowat 32768
ndd -set /dev/tcp tcp_recv_hiwat 400000
ndd -set /dev/tcp tcp_deferred_acks_max 16
ndd -set /dev/tcp tcp_local_dacks_max 16
ndd -set /dev/tcp tcp_max_buf 2097152
ndd -set /dev/tcp tcp_cwnd_max 2097152

Thanks for taking the time to reply!

Dan
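One likely reason larger send and receive buffers help is the usual window limit on a single TCP stream: throughput cannot exceed the window divided by the round-trip time. The Python sketch below only does that arithmetic; the 1 ms round-trip time is an assumed figure for illustration, since no RTT was reported in the thread, and the function names are made up.

```python
def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on one TCP stream: one window per round trip."""
    return window_bytes * 8 / rtt_seconds


def window_needed_bytes(target_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: the window needed to keep the pipe full."""
    return target_bps * rtt_seconds / 8


if __name__ == "__main__":
    rtt = 0.001  # assumed 1 ms RTT, not a measured value
    limit = max_throughput_bps(400_000, rtt)
    print(f"400000-byte window at {rtt * 1000:.1f} ms RTT: {limit / 1e9:.2f} Gb/s max")
    need = window_needed_bytes(10e9, rtt)
    print(f"window to fill 10 Gb/s at the same RTT: {need:.0f} bytes")
```

At a shorter RTT the same 400000-byte window allows proportionally more, so the real ceiling on a given network depends on the measured round-trip time.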