On Mon, Nov 17, 2014 at 12:38:16PM +0200, Michael S. Tsirkin wrote:
> On Mon, Nov 17, 2014 at 09:44:23AM +0200, Gleb Natapov wrote:
> > On Sun, Nov 16, 2014 at 08:56:04PM +0200, Michael S. Tsirkin wrote:
> > > On Sun, Nov 16, 2014 at 06:18:18PM +0200, Gleb Natapov wrote:
> > > > Hi Michael,
> > > >
> > > > I am playing with the vhost multiqueue capability and have a
> > > > question about vhost multiqueue and RSS (receive side steering).
> > > > My setup has a Mellanox ConnectX-3 NIC which supports multiqueue
> > > > and RSS. The network-related parameters for qemu are:
> > > >
> > > >   -netdev tap,id=hn0,script=qemu-ifup.sh,vhost=on,queues=4
> > > >   -device virtio-net-pci,netdev=hn0,id=nic1,mq=on,vectors=10
> > > >
> > > > In the guest I ran "ethtool -L eth0 combined 4" to enable
> > > > multiqueue.
> > > >
> > > > I am running one tcp stream into the guest using iperf. Since
> > > > there is only one tcp stream I expect it to be handled by one
> > > > queue only, but this seems not to be the case. ethtool -S on the
> > > > host shows that the stream is handled by one queue in the NIC,
> > > > just as I would expect, but in the guest all 4 virtio-input
> > > > interrupts are incremented. Am I missing any configuration?
> > >
> > > I don't see anything obviously wrong with what you describe.
> > > Maybe, somehow, the same irqfd got bound to multiple MSI vectors?
> >
> > It does not look like this is what is happening, judging by the way
> > interrupts are distributed between queues. They are not distributed
> > uniformly; often I see one queue get most of the interrupts and the
> > others get much less, and then it changes.
>
> Weird. That would happen if you transmitted from multiple CPUs.
> You did pin iperf to a single CPU within the guest, did you not?
>
No, I didn't, because I didn't expect it to matter for input interrupts.
When I run iperf on the host, the rx queue that receives all the packets
depends only on the connection itself, not on the cpu iperf is running
on (I tested that). When I pin iperf in the guest I do indeed see that
all interrupts arrive at the same irq vector. Is the number after
virtio-input in /proc/interrupts any indication of the queue a packet
arrived on? (On the host I can use ethtool -S to check which queue
receives packets, but unfortunately this does not work for a virtio nic
in a guest.) Because if it is, the way RSS works in virtio is not how it
works on the host, and not what I would expect after reading about RSS:
the queue a packet arrives on should be calculated by hashing fields
from the packet header only.

--
        Gleb.
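The per-queue counters being compared here can be watched live from
inside the guest. A minimal sketch (the "virtio0-input.N" interrupt
names are typical for the Nth rx queue of the first virtio device but
vary by distro and device; the iperf invocation is an example):

    # In the guest: highlight which rx-queue interrupt counters change.
    watch -d -n 1 'grep virtio /proc/interrupts'

    # On the host: drive a single tcp stream into the guest.
    iperf -c <guest-ip> -t 60

With pure header-hash steering, only one virtio-input line should
advance for a single stream; in this thread all four do.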
On Mon, Nov 17, 2014 at 01:22:07PM +0200, Gleb Natapov wrote:
> No, I didn't, because I didn't expect it to matter for input
> interrupts. When I run iperf on the host, the rx queue that receives
> all the packets depends only on the connection itself, not on the cpu
> iperf is running on (I tested that).

This really depends on the type of networking card you have on the
host, and how it's configured.

I think you will get something more closely resembling this behaviour
if you enable RFS in the host.

> When I pin iperf in the guest I do indeed see that all interrupts
> arrive at the same irq vector. Is the number after virtio-input in
> /proc/interrupts any indication of the queue a packet arrived on? (On
> the host I can use ethtool -S to check which queue receives packets,
> but unfortunately this does not work for a virtio nic in a guest.)

I think it is.

> Because if it is, the way RSS works in virtio is not how it works on
> the host, and not what I would expect after reading about RSS: the
> queue a packet arrives on should be calculated by hashing fields from
> the packet header only.

Yes, what virtio has is not RSS - it's an accelerated RFS, really.
The point is to try and take application locality into account.
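The RFS knobs Michael refers to are documented in
Documentation/networking/scaling.txt. A minimal sketch of enabling RFS
on the host (the device name and table sizes are illustrative, not
values anyone in the thread used):

    # Size the global flow table that matches packets to the cpu of
    # the consuming application.
    echo 32768 > /proc/sys/net/core/rps_sock_flow_entries

    # Give each rx queue of the host NIC a share of the flow table.
    for q in /sys/class/net/eth0/queues/rx-*; do
        echo 2048 > "$q/rps_flow_cnt"
    done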
On Mon, Nov 17, 2014 at 01:58:20PM +0200, Michael S. Tsirkin wrote:
> On Mon, Nov 17, 2014 at 01:22:07PM +0200, Gleb Natapov wrote:
> > Because if it is, the way RSS works in virtio is not how it works on
> > the host, and not what I would expect after reading about RSS: the
> > queue a packet arrives on should be calculated by hashing fields
> > from the packet header only.
>
> Yes, what virtio has is not RSS - it's an accelerated RFS, really.

OK, if what virtio has is RFS and not RSS, my test results make sense.
Thanks!

--
        Gleb.
On 11/17/2014 07:58 PM, Michael S. Tsirkin wrote:
> Yes, what virtio has is not RSS - it's an accelerated RFS, really.

Strictly speaking, not aRFS. aRFS requires a programmable filter and
needs the driver to fill the filter on demand. For virtio-net, this is
done automatically on the host side (tun/tap). There's no guest
involvement.

> The point is to try and take application locality into account.

Yes, the locality is achieved as follows (consider an N-vcpu guest with
N queues):

- the virtio-net driver provides a default 1:1 mapping between vcpu and
  txq through XPS
- the virtio-net driver suggests a default irq affinity hint, also for
  a 1:1 mapping between vcpu and txq/rxq

With all this, each vcpu gets its own private txq/rxq pair. The host
side implementation (tun/tap) then makes sure that if the packets of a
flow were received from the guest on queue N, it will also use queue N
to transmit the packets of that flow to the guest.
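The two default 1:1 mappings Jason describes can be inspected from
inside the guest. A minimal sketch, assuming a multiqueue virtio-net
device named eth0 (the irq number is illustrative; find the real
vectors in /proc/interrupts):

    # XPS: hex cpu mask of the vcpus mapped to each tx queue.
    for q in /sys/class/net/eth0/queues/tx-*; do
        echo -n "$q: "; cat "$q/xps_cpus"
    done

    # irq affinity: where each virtio-input vector is allowed to fire.
    grep virtio /proc/interrupts
    cat /proc/irq/25/smp_affinity   # 25 stands in for a real vector

    # Pinning the receiver to one vcpu keeps its acks on one txq, so
    # tun/tap steers the flow's rx traffic to the matching rx queue.
    taskset -c 0 iperf -s

This is consistent with Gleb's observation above: once iperf is pinned,
all interrupts for the stream arrive at one vector.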
On 11/18/2014 09:37 AM, Zhang Haoyu wrote:
> > > Yes, what virtio has is not RSS - it's an accelerated RFS, really.
> >
> > OK, if what virtio has is RFS and not RSS, my test results make
> > sense. Thanks!
>
> I think the RSS emulation for the virtio-mq NIC is implemented in
> tun_select_queue(), am I missing something?
>
> Thanks,
> Zhang Haoyu

Yes, if RSS is short for Receive Side Steering, which is a generic
technology. But RSS is usually short for Receive Side Scaling, which is
a technology commonly used by Windows; it is implemented through an
indirection table in the card, which is obviously not supported in tun
currently.
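On a host NIC with real Receive Side Scaling, the indirection table
Jason mentions can be read and rewritten with standard ethtool options.
A minimal sketch (eth0 is illustrative; a virtio-net nic in a guest of
this era rejects these, which matches the point above):

    # Show the rx flow hash indirection table: hash buckets -> rx queue.
    ethtool -x eth0

    # Rewrite the table to spread hash buckets evenly over 4 rx queues.
    ethtool -X eth0 equal 4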
On Tue, Nov 18, 2014 at 11:41:11AM +0800, Jason Wang wrote:
> Yes, if RSS is short for Receive Side Steering, which is a generic
> technology. But RSS is usually short for Receive Side Scaling, which
> is a technology commonly used by Windows; it is implemented through
> an indirection table in the card, which is obviously not supported in
> tun currently.

Hmm, I had the impression that "Receive Side Steering" and "Receive
Side Scaling" were interchangeable. The software implementation of RSS
is called "Receive Packet Steering" according to
Documentation/networking/scaling.txt, not "Receive Packet Scaling".
Those damn TLAs are confusing.

--
        Gleb.
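For completeness, the "Receive Packet Steering" that scaling.txt
describes (the software fallback when the NIC cannot steer in hardware)
is configured as a per-rx-queue cpu mask; a minimal sketch, with an
illustrative device and mask:

    # Allow packets from rx queue 0 to be processed on cpus 0-3.
    echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus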