Xu, Dongxiao
2009-Nov-27 02:26 UTC
[Xen-devel][Pv-ops][PATCH] Netback multiple tasklet support
Current netback uses one pair of tasklets for Tx/Rx data transactions. The netback tasklet can only run on one CPU at a time, and it is used to serve all the netfronts; therefore it has become a performance bottleneck. This patch uses multiple tasklet pairs to replace the current single pair in dom0.

Assuming that Dom0 has CPUNR VCPUs, we define CPUNR pairs of tasklets (CPUNR for Tx, and CPUNR for Rx). Each pair of tasklets serves a specific group of netfronts. Also, for those global and static variables, we duplicated them for each group in order to avoid the spinlock.

Test scenario:
We use ten 1G NIC interfaces to talk with 10 VMs (netfront) in the server, so the total bandwidth is 10G.
On the host machine, bind each guest's netfront to one NIC interface.
On the client machine, run netperf tests against each guest.

Test Case    Packet Size    Throughput(Mbps)    Dom0 CPU Util    Guests CPU Util
w/o patch    1400           4304.30             400.33%          112.21%
w/ patch     1400           9533.13             461.64%          243.81%

BTW, when we tested this patch, we found that the domain_lock in the grant table operation becomes a bottleneck. We temporarily removed the global domain_lock to achieve good performance.

Best Regards,
-- Dongxiao
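[Editorial note: to make the grouping concrete, below is a minimal sketch of how per-group state and tasklet pairs could be laid out, assuming one group per online dom0 VCPU. The struct name xen_netbk and the handlers net_tx_action/net_rx_action follow the later discussion in this thread; the fields and the init function are illustrative assumptions, not the patch itself.]

#include <linux/init.h>
#include <linux/interrupt.h>
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/slab.h>
#include <linux/cpumask.h>

/* Netback's tasklet handlers, shown here only as forward declarations. */
static void net_tx_action(unsigned long group);
static void net_rx_action(unsigned long group);

/* Hypothetical per-group netback state; the real patch moves much more of
 * the formerly-global state (pending rings, mmap areas, ...) in here. */
struct xen_netbk {
	struct tasklet_struct tx_tasklet;    /* Tx tasklet for this group */
	struct tasklet_struct rx_tasklet;    /* Rx tasklet for this group */
	struct list_head net_schedule_list;  /* netfronts served by this group */
	spinlock_t schedule_list_lock;
};

static struct xen_netbk *xen_netbk;      /* array of xen_netbk_nr_groups entries */
static unsigned int xen_netbk_nr_groups;

static int __init netbk_init_groups(void)
{
	unsigned int i;

	xen_netbk_nr_groups = num_online_cpus();
	xen_netbk = kcalloc(xen_netbk_nr_groups, sizeof(*xen_netbk), GFP_KERNEL);
	if (!xen_netbk)
		return -ENOMEM;

	for (i = 0; i < xen_netbk_nr_groups; i++) {
		tasklet_init(&xen_netbk[i].tx_tasklet, net_tx_action, i);
		tasklet_init(&xen_netbk[i].rx_tasklet, net_rx_action, i);
		INIT_LIST_HEAD(&xen_netbk[i].net_schedule_list);
		spin_lock_init(&xen_netbk[i].schedule_list_lock);
	}
	return 0;
}

Each netfront is then assigned to one group at connect time, and its event-channel interrupt schedules only that group's tasklets, so independent groups can run on different dom0 VCPUs concurrently.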
Ian Campbell
2009-Nov-27 09:42 UTC
Re: [Xen-devel][Pv-ops][PATCH] Netback multiple tasklet support
Hi,

Does this change have any impact on the responsiveness of domain 0 userspace while the host is under heavy network load? We have found that the netback tasklets can completely dominate dom0's VCPU to the point where no userspace process ever gets the chance to run; since this includes sshd and the management toolstack, that can be quite annoying.

The issue was probably specific to using a single-VCPU domain 0 in XenServer, but since your patch introduces a tasklet per VCPU it could possibly happen to a multi-VCPU domain 0.

For XenServer we converted the tasklets into a kernel thread, at the cost of a small reduction in overall throughput but yielding a massive improvement in domain 0 responsiveness. Unfortunately the change was made by someone who has since left Citrix and I cannot locate the numbers he left behind :-(

Our patch is attached. A netback thread per domain 0 VCPU might be interesting to experiment with?

Ian.

On Fri, 2009-11-27 at 02:26 +0000, Xu, Dongxiao wrote:
> Current netback uses one pair of tasklets for Tx/Rx data transactions. The netback tasklet can only run on one CPU at a time, and it is used to serve all the netfronts; therefore it has become a performance bottleneck. This patch uses multiple tasklet pairs to replace the current single pair in dom0.
> Assuming that Dom0 has CPUNR VCPUs, we define CPUNR pairs of tasklets (CPUNR for Tx, and CPUNR for Rx). Each pair of tasklets serves a specific group of netfronts. Also, for those global and static variables, we duplicated them for each group in order to avoid the spinlock.
>
> Test scenario:
> We use ten 1G NIC interfaces to talk with 10 VMs (netfront) in the server, so the total bandwidth is 10G.
> On the host machine, bind each guest's netfront to one NIC interface.
> On the client machine, run netperf tests against each guest.
>
> Test Case    Packet Size    Throughput(Mbps)    Dom0 CPU Util    Guests CPU Util
> w/o patch    1400           4304.30             400.33%          112.21%
> w/ patch     1400           9533.13             461.64%          243.81%
>
> BTW, when we tested this patch, we found that the domain_lock in the grant table operation becomes a bottleneck. We temporarily removed the global domain_lock to achieve good performance.
>
> Best Regards,
> -- Dongxiao
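[Editorial note: as a rough illustration of the tasklet-to-thread conversion Ian describes (a minimal single-worker sketch under assumed names, not the attached XenServer patch), the pattern looks roughly like this; netbk_action_pending() and netbk_do_action() are placeholders for the existing Tx/Rx processing.]

#include <linux/kthread.h>
#include <linux/wait.h>
#include <linux/sched.h>
#include <linux/err.h>

static bool netbk_action_pending(void);   /* placeholder: is work queued? */
static void netbk_do_action(void);        /* placeholder: the old tasklet body */

static DECLARE_WAIT_QUEUE_HEAD(netbk_wq);
static struct task_struct *netbk_task;

static int netbk_kthread(void *unused)
{
	while (!kthread_should_stop()) {
		wait_event_interruptible(netbk_wq,
					 netbk_action_pending() ||
					 kthread_should_stop());
		/* Being an ordinary task, the thread can be preempted, so
		 * sshd and the toolstack still get CPU time under load. */
		cond_resched();
		netbk_do_action();
	}
	return 0;
}

static int __init netbk_start_thread(void)
{
	netbk_task = kthread_run(netbk_kthread, NULL, "netback");
	return IS_ERR(netbk_task) ? PTR_ERR(netbk_task) : 0;
}

/* Interrupt handlers then call wake_up(&netbk_wq) where they previously
 * called tasklet_schedule(). */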
Xu, Dongxiao
2009-Nov-27 16:08 UTC
RE: [Xen-devel][Pv-ops][PATCH] Netback multiple tasklet support
Ian, thanks for your comments. Some explanations below.

Best Regards,
-- Dongxiao

Ian Campbell wrote:
> Hi,
>
> Does this change have any impact on the responsiveness of domain 0 userspace while the host is under heavy network load? We have found that the netback tasklets can completely dominate dom0's VCPU to the point where no userspace process ever gets the chance to run; since this includes sshd and the management toolstack, that can be quite annoying.
>
> The issue was probably specific to using a single-VCPU domain 0 in XenServer, but since your patch introduces a tasklet per VCPU it could possibly happen to a multi-VCPU domain 0.

The former case you found is because all the netfronts are processed by one dom0 tasklet, therefore the only vcpu which handles the tasklet becomes super busy and has no time to handle userspace work. My patch separates netback's workload across different tasklets, and if those tasklets are bound to different vcpus in dom0 (by irqbalance or by manually pinning interrupts), the total CPU utilization is spread evenly over the vcpus, which makes dom0 more scalable.

Take our test case as an example: the system is under heavy network load, with throughput close to the total network bandwidth (9.55G/10G), but it only uses ~460% of dom0's CPU (dom0 has 10 vcpus in total handling the network, so each vcpu spends ~46% on the network workload). Therefore for 1G NICs there will be no problem. Also, most current 10G NICs have multi-queue technology, so interrupts are delivered to different cpus and each dom0 vcpu only needs to handle part of the workload; I believe there will be no problem there either.

> For XenServer we converted the tasklets into a kernel thread, at the cost of a small reduction in overall throughput but yielding a massive improvement in domain 0 responsiveness. Unfortunately the change was made by someone who has since left Citrix and I cannot locate the numbers he left behind :-(
>
> Our patch is attached. A netback thread per domain 0 VCPU might be interesting to experiment with?

Adding the kernel thread mechanism to netback is a good way to improve dom0's responsiveness in the UP case. However, for a multi-vcpu dom0 I think it may not be needed. In any case it is a separate story from my multiple-tasklet approach.

> Ian.
>
> On Fri, 2009-11-27 at 02:26 +0000, Xu, Dongxiao wrote:
>> Current netback uses one pair of tasklets for Tx/Rx data transactions. The netback tasklet can only run on one CPU at a time, and it is used to serve all the netfronts; therefore it has become a performance bottleneck. This patch uses multiple tasklet pairs to replace the current single pair in dom0. Assuming that Dom0 has CPUNR VCPUs, we define CPUNR pairs of tasklets (CPUNR for Tx, and CPUNR for Rx). Each pair of tasklets serves a specific group of netfronts. Also, for those global and static variables, we duplicated them for each group in order to avoid the spinlock.
>>
>> Test scenario:
>> We use ten 1G NIC interfaces to talk with 10 VMs (netfront) in the server, so the total bandwidth is 10G.
>> On the host machine, bind each guest's netfront to one NIC interface.
>> On the client machine, run netperf tests against each guest.
>>
>> Test Case    Packet Size    Throughput(Mbps)    Dom0 CPU Util    Guests CPU Util
>> w/o patch    1400           4304.30             400.33%          112.21%
>> w/ patch     1400           9533.13             461.64%          243.81%
>>
>> BTW, when we tested this patch, we found that the domain_lock in the grant table operation becomes a bottleneck. We temporarily removed the global domain_lock to achieve good performance.
>>
>> Best Regards,
>> -- Dongxiao
Ian Pratt
2009-Nov-27 16:15 UTC
RE: [Xen-devel][Pv-ops][PATCH] Netback multiple tasklet support
> Test Case    Packet Size    Throughput(Mbps)    Dom0 CPU Util    Guests CPU Util
> w/o patch    1400           4304.30             400.33%          112.21%
> w/ patch     1400           9533.13             461.64%          243.81%
>
> BTW, when we tested this patch, we found that the domain_lock in the grant table operation becomes a bottleneck. We temporarily removed the global domain_lock to achieve good performance.

What are the figures with the domain_lock still present?

How many VCPUs did dom0 have? (It would be good to see numbers for 2, 3 and 4 VCPUs.)

I'd rather see use of kthreads than tasklets as this enables more control over QoS (I believe there are patches for this).

Thanks,
Ian
Xu, Dongxiao
2009-Nov-27 16:57 UTC
RE: [Xen-devel][Pv-ops][PATCH] Netback multiple tasklet support
Ian Pratt wrote:
>> Test Case    Packet Size    Throughput(Mbps)    Dom0 CPU Util    Guests CPU Util
>> w/o patch    1400           4304.30             400.33%          112.21%
>> w/ patch     1400           9533.13             461.64%          243.81%
>>
>> BTW, when we tested this patch, we found that the domain_lock in the grant table operation becomes a bottleneck. We temporarily removed the global domain_lock to achieve good performance.
>
> What are the figures with the domain_lock still present? How many VCPUs did dom0 have (it would be good to see numbers for 2, 3 and 4 VCPUs)?
>
> I'd rather see use of kthreads than tasklets as this enables more control over QoS (I believe there are patches for this).
>
> Thanks,
> Ian

The domain lock is in the grant_op hypercall. If the multiple tasklets fight with each other for this big domain lock, it becomes a bottleneck and hurts the performance.

Our test system has 16 LPs in total, so we have 16 vcpus in dom0 by default; 10 of them are used to handle the network load. For our test case, dom0's total cpu utilization is ~461.64%, so each vcpu occupies ~46%.

Actually the multiple tasklets in netback already improve the QoS of the system, so I think they also help each vcpu get better responsiveness. I can try to write another patch which replaces the tasklet with a kthread, because I think it is a different job from the multi-tasklet netback support (the kthread is used to guarantee the responsiveness of userspace, whereas multi-tasklet netback is used to remove dom0's cpu utilization bottleneck). However, I am not sure whether the QoS improvement from that change is needed on an MP system?

Thanks!
Dongxiao
Ian Pratt
2009-Nov-28 13:15 UTC
RE: [Xen-devel][Pv-ops][PATCH] Netback multiple tasklet support
> The domain lock is in the grant_op hypercall. If the multiple tasklets fight with each other for this big domain lock, it becomes a bottleneck and hurts the performance.
> Our test system has 16 LPs in total, so we have 16 vcpus in dom0 by default; 10 of them are used to handle the network load. For our test case, dom0's total vcpu utilization is ~461.64%, so each vcpu occupies ~46%.

Having 10 VCPUs for dom0 doesn't seem like a good idea -- it really oughtn't to need that many CPUs to handle IO load. Have you got any results with e.g. 2 or 4 VCPUs?

When we switch over to using netchannel2 by default this issue should largely go away anyhow, as the copy is not done by dom0. Have you done any tests with netchannel2?

> Actually the multiple tasklets in netback already improve the QoS of the system, so I think they also help each vcpu get better responsiveness.
> I can try to write another patch which replaces the tasklet with a kthread, because I think it is a different job from the multi-tasklet netback support (the kthread is used to guarantee the responsiveness of userspace, whereas multi-tasklet netback is used to remove dom0's cpu utilization bottleneck). However, I am not sure whether the QoS improvement from that change is needed on an MP system?

Have you looked at the patch that XenServer uses to replace the tasklets by kthreads?

Thanks,
Ian
Xu, Dongxiao
2009-Dec-02 10:17 UTC
RE: [Xen-devel][Pv-ops][PATCH] Netback multiple tasklet support
Hi, according to your feedback, I revised my patch and am resending it now.

[PATCH 01]: Use multiple tasklet pairs to replace the current single pair in dom0.
[PATCH 02]: Replace the tasklet with a kernel thread. It may hurt the performance, but it improves responsiveness from userspace.

Test scenario:
We use ten 1G NIC interfaces to talk with 10 VMs (netfront) in the server, so the total bandwidth is 10G.
On the host machine, bind each guest's netfront to one NIC interface.
On the client machine, run netperf tests against each guest.

Test Case               Throughput(Mbps)    Dom0 CPU Util    Guests CPU Util
w/o any patch           4304.30             400.33%          112.21%
w/ 01 patch             9533.13             461.64%          243.81%
w/ 01 and 02 patches    7942.68             597.83%          250.53%

From the result we can see that the case "w/ 01 and 02 patches" didn't reach/near the total bandwidth. This is because some vcpus in dom0 are saturated due to context switches with other tasks, which hurts performance. To prove this idea, I did an experiment which sets the kernel thread to the SCHED_FIFO policy, in order to avoid preemption by normal tasks. The result is shown below, and it gets good performance. However, like the tasklet, setting the kernel thread to high priority also hurts userspace responsiveness, because userspace applications (for example, sshd) cannot preempt that netback kernel thread.

w/ hi-priority kthread  9535.74             543.56%          241.26%

For netchannel2, it omits the grant copy in dom0; I didn't try it yet. But I used xenoprofile on the current netback system to get a feeling that the grant copy occupies ~1/6 of dom0's cpu cycles (including Xen and the dom0 vmlinux).

BTW, the 02 patch is ported from the patch given by Ian Campbell. You can add your Signed-off-by if you want. :)

Best Regards,
-- Dongxiao

Ian Pratt wrote:
>> The domain lock is in the grant_op hypercall. If the multiple tasklets fight with each other for this big domain lock, it becomes a bottleneck and hurts the performance.
>> Our test system has 16 LPs in total, so we have 16 vcpus in dom0 by default; 10 of them are used to handle the network load. For our test case, dom0's total vcpu utilization is ~461.64%, so each vcpu occupies ~46%.
>
> Having 10 VCPUs for dom0 doesn't seem like a good idea -- it really oughtn't to need that many CPUs to handle IO load. Have you got any results with e.g. 2 or 4 VCPUs?
>
> When we switch over to using netchannel2 by default this issue should largely go away anyhow, as the copy is not done by dom0. Have you done any tests with netchannel2?
>
>> Actually the multiple tasklets in netback already improve the QoS of the system, so I think they also help each vcpu get better responsiveness.
>> I can try to write another patch which replaces the tasklet with a kthread, because I think it is a different job from the multi-tasklet netback support (the kthread is used to guarantee the responsiveness of userspace, whereas multi-tasklet netback is used to remove dom0's cpu utilization bottleneck). However, I am not sure whether the QoS improvement from that change is needed on an MP system?
>
> Have you looked at the patch that XenServer uses to replace the tasklets by kthreads?
>
> Thanks,
> Ian
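[Editorial note: for reference, the SCHED_FIFO experiment mentioned above can be reproduced with something like the sketch below. The thread function name is a placeholder and the priority value is an arbitrary assumption, not taken from the patches.]

#include <linux/kthread.h>
#include <linux/sched.h>
#include <linux/err.h>

static int netbk_kthread(void *data);     /* placeholder: the netback worker loop */
static struct task_struct *netbk_task;

static int netbk_start_rt_thread(void)
{
	struct sched_param param = { .sched_priority = 1 };

	netbk_task = kthread_run(netbk_kthread, NULL, "netback");
	if (IS_ERR(netbk_task))
		return PTR_ERR(netbk_task);

	/* SCHED_FIFO keeps the worker from being preempted by normal tasks,
	 * which restores throughput but starves userspace (e.g. sshd) under
	 * sustained load -- exactly the trade-off described above. */
	return sched_setscheduler(netbk_task, SCHED_FIFO, &param);
}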
Jeremy Fitzhardinge
2009-Dec-03 21:28 UTC
Re: [Xen-devel][Pv-ops][PATCH] Netback multiple tasklet support
On 12/02/09 02:17, Xu, Dongxiao wrote:
> Hi, according to your feedback, I revised my patch and am resending it now.
>
> [PATCH 01]: Use multiple tasklet pairs to replace the current single pair in dom0.
> [PATCH 02]: Replace the tasklet with a kernel thread. It may hurt the performance, but it improves responsiveness from userspace.
>
> Test scenario:
> We use ten 1G NIC interfaces to talk with 10 VMs (netfront) in the server, so the total bandwidth is 10G.
> On the host machine, bind each guest's netfront to one NIC interface.
> On the client machine, run netperf tests against each guest.
>
> Test Case               Throughput(Mbps)    Dom0 CPU Util    Guests CPU Util
> w/o any patch           4304.30             400.33%          112.21%
> w/ 01 patch             9533.13             461.64%          243.81%
> w/ 01 and 02 patches    7942.68             597.83%          250.53%
>
> From the result we can see that the case "w/ 01 and 02 patches" didn't reach/near the total bandwidth. This is because some vcpus in dom0 are saturated due to context switches with other tasks, which hurts performance. To prove this idea, I did an experiment which sets the kernel thread to the SCHED_FIFO policy, in order to avoid preemption by normal tasks. The result is shown below, and it gets good performance. However, like the tasklet, setting the kernel thread to high priority also hurts userspace responsiveness, because userspace applications (for example, sshd) cannot preempt that netback kernel thread.
>
> w/ hi-priority kthread  9535.74             543.56%          241.26%
>
> For netchannel2, it omits the grant copy in dom0; I didn't try it yet. But I used xenoprofile on the current netback system to get a feeling that the grant copy occupies ~1/6 of dom0's cpu cycles (including Xen and the dom0 vmlinux).
>
> BTW, the 02 patch is ported from the patch given by Ian Campbell. You can add your Signed-off-by if you want. :)

I've applied this to the xen/dom0/backend/netback-tasklet branch for now. However, I noticed a number of problems with a quick lookover of the code:

  * "netbk" should either be static, or have a longer name (mentioning xen)
  * same with "foreign_page_tracker"
      o (the foreign page tracker API should have better names, but that's not your problem)
  * What's cpu_online_nr for? I don't think it should be necessary at all, and if it is, then it needs a much more distinct name.
  * If they're really per-cpu variables, they should use the percpu mechanism
  * How do you relate the number of online CPUs to the whole group index/pending index computation? It isn't obvious how they're connected, or how it guarantees that the index is enough.
  * What happens if you start hotplugging cpus?
  * All the repeated netbk[group_idx]. expressions would be improved by defining a local pointer for that value.

J
Xu, Dongxiao
2009-Dec-04 02:13 UTC
RE: [Xen-devel][Pv-ops][PATCH] Netback multiple tasklet support
Hi Jeremy, thanks much for your comments; here are some explanations for the 01 patch.

Best Regards,
-- Dongxiao

Jeremy Fitzhardinge wrote:
> On 12/02/09 02:17, Xu, Dongxiao wrote:
>> Hi, according to your feedback, I revised my patch and am resending it now.
>>
>> [PATCH 01]: Use multiple tasklet pairs to replace the current single pair in dom0.
>> [PATCH 02]: Replace the tasklet with a kernel thread. It may hurt the performance, but it improves responsiveness from userspace.
>>
>> Test scenario:
>> We use ten 1G NIC interfaces to talk with 10 VMs (netfront) in the server, so the total bandwidth is 10G.
>> On the host machine, bind each guest's netfront to one NIC interface.
>> On the client machine, run netperf tests against each guest.
>>
>> Test Case               Throughput(Mbps)    Dom0 CPU Util    Guests CPU Util
>> w/o any patch           4304.30             400.33%          112.21%
>> w/ 01 patch             9533.13             461.64%          243.81%
>> w/ 01 and 02 patches    7942.68             597.83%          250.53%
>>
>> From the result we can see that the case "w/ 01 and 02 patches" didn't reach/near the total bandwidth. This is because some vcpus in dom0 are saturated due to context switches with other tasks, which hurts performance. To prove this idea, I did an experiment which sets the kernel thread to the SCHED_FIFO policy, in order to avoid preemption by normal tasks. The result is shown below, and it gets good performance. However, like the tasklet, setting the kernel thread to high priority also hurts userspace responsiveness, because userspace applications (for example, sshd) cannot preempt that netback kernel thread.
>>
>> w/ hi-priority kthread  9535.74             543.56%          241.26%
>>
>> For netchannel2, it omits the grant copy in dom0; I didn't try it yet. But I used xenoprofile on the current netback system to get a feeling that the grant copy occupies ~1/6 of dom0's cpu cycles (including Xen and the dom0 vmlinux).
>>
>> BTW, the 02 patch is ported from the patch given by Ian Campbell. You can add your Signed-off-by if you want. :)
>
> I've applied this to the xen/dom0/backend/netback-tasklet branch for now. However, I noticed a number of problems with a quick lookover of the code:
>
>   * "netbk" should either be static, or have a longer name (mentioning xen)

OK, I will rename it.

>   * same with "foreign_page_tracker"
>       o (the foreign page tracker API should have better names, but that's not your problem)
>   * What's cpu_online_nr for? I don't think it should be necessary at all, and if it is, then it needs a much more distinct name.
>   * If they're really per-cpu variables, they should use the percpu mechanism

Actually those tasklets are not per-cpu variables. We just defined cpu_online_nr tasklets, in order to get the best performance if each tasklet could run on a separate cpu. However, they are not bound to cpus: some tasklets may run on the same dom0 vcpu due to interrupt delivery affinity. Therefore these tasklets are not per-cpu variables.

>   * How do you relate the number of online CPUs to the whole group index/pending index computation? It isn't obvious how they're connected, or how it guarantees that the index is enough.

Same explanation as above. Whether the number of online cpus is greater or less than the number of tasklets does not matter in our case. We set them to the same value only to get the best performance.

>   * What happens if you start hotplugging cpus?

It doesn't matter. Assume that dom0 has 3 vcpus now, so there are 3 tasklets to handle network traffic. At some point the admin adds one vcpu to dom0. In this case, 4 vcpus will handle the three tasklets. Compare that with the current situation without my patch, where all of dom0's vcpus handle only one tasklet, which is a bottleneck.

>   * All the repeated netbk[group_idx]. expressions would be improved by defining a local pointer for that value.

OK, I will improve it. Thanks!

> J
Jeremy Fitzhardinge
2009-Dec-04 02:33 UTC
Re: [Xen-devel][Pv-ops][PATCH] Netback multiple tasklet support
On 12/03/09 18:13, Xu, Dongxiao wrote:
>>   * same with "foreign_page_tracker"
>>       o (the foreign page tracker API should have better names, but that's not your problem)
>>   * What's cpu_online_nr for? I don't think it should be necessary at all, and if it is, then it needs a much more distinct name.
>>   * If they're really per-cpu variables, they should use the percpu mechanism
>
> Actually those tasklets are not per-cpu variables.
> We just defined cpu_online_nr tasklets, in order to get the best performance if each tasklet could run on a separate cpu. However, they are not bound to cpus: some tasklets may run on the same dom0 vcpu due to interrupt delivery affinity. Therefore these tasklets are not per-cpu variables.

OK, you should name the variable after what it actually means, not what its value happens to be. It seems like a parameter which should be adjustable via sysfs or something. How did you arrive at 3?

>>   * How do you relate the number of online CPUs to the whole group index/pending index computation? It isn't obvious how they're connected, or how it guarantees that the index is enough.
>
> Same explanation as above. Whether the number of online cpus is greater or less than the number of tasklets does not matter in our case. We set them to the same value only to get the best performance.

Nevertheless, it isn't at all clear how we can be certain the index calculations are guaranteed to stay below the number of tasklets. There is a lot of code scattered around the place; perhaps you could condense it into a smaller number of places?

In fact, the overall patch size is very large, and hard to review and test. Could you please give some thought to how you can incrementally modify netback to get the result you want. For example, keep the current functional structure, but make the changes to generalize to N processing handlers (but keeping N=1), then convert the softirq to a tasklet, then make N > 1.

Thanks,
J
Xu, Dongxiao
2009-Dec-08 09:22 UTC
RE: [Xen-devel][Pv-ops][PATCH] Netback multiple tasklet support
Jeremy, I have revised the patch according to your suggestion. See attachment.

0001: Keep the group number as 1, and put all the global/static variables into struct xen_netbk. Do some preparation for multiple tasklet support.
0002: Support for netback multiple tasklets.
0003: Use a kernel thread to replace the tasklet in order to ensure dom0 userspace QoS.

Thanks!
Dongxiao

Jeremy Fitzhardinge wrote:
> On 12/03/09 18:13, Xu, Dongxiao wrote:
>>>   * same with "foreign_page_tracker"
>>>       o (the foreign page tracker API should have better names, but that's not your problem)
>>>   * What's cpu_online_nr for? I don't think it should be necessary at all, and if it is, then it needs a much more distinct name.
>>>   * If they're really per-cpu variables, they should use the percpu mechanism
>>
>> Actually those tasklets are not per-cpu variables.
>> We just defined cpu_online_nr tasklets, in order to get the best performance if each tasklet could run on a separate cpu. However, they are not bound to cpus: some tasklets may run on the same dom0 vcpu due to interrupt delivery affinity. Therefore these tasklets are not per-cpu variables.
>
> OK, you should name the variable after what it actually means, not what its value happens to be. It seems like a parameter which should be adjustable via sysfs or something.
>
> How did you arrive at 3?
>
>>>   * How do you relate the number of online CPUs to the whole group index/pending index computation? It isn't obvious how they're connected, or how it guarantees that the index is enough.
>>
>> Same explanation as above. Whether the number of online cpus is greater or less than the number of tasklets does not matter in our case. We set them to the same value only to get the best performance.
>
> Nevertheless, it isn't at all clear how we can be certain the index calculations are guaranteed to stay below the number of tasklets. There is a lot of code scattered around the place; perhaps you could condense it into a smaller number of places?
>
> In fact, the overall patch size is very large, and hard to review and test. Could you please give some thought to how you can incrementally modify netback to get the result you want. For example, keep the current functional structure, but make the changes to generalize to N processing handlers (but keeping N=1), then convert the softirq to a tasklet, then make N > 1.
>
> Thanks,
> J
Jeremy Fitzhardinge
2009-Dec-09 20:23 UTC
Re: [Xen-devel][Pv-ops][PATCH] Netback multiple tasklet support
On 12/08/09 01:22, Xu, Dongxiao wrote:
> Jeremy, I have revised the patch according to your suggestion. See attachment.
> 0001: Keep the group number as 1, and put all the global/static variables into struct xen_netbk. Do some preparation for multiple tasklet support.
> 0002: Support for netback multiple tasklets.
> 0003: Use a kernel thread to replace the tasklet in order to ensure dom0 userspace QoS.

Thanks, this looks much better. The only thing which leapt out at me from a first reading is that "group_nr" is still too generic a name for a global symbol. Something like "xen_netbk_nr_groups" perhaps?

Also, is it worth making it a tunable? Presumably it needn't scale exactly with the number of dom0 cpus; if you only have one or two gbit interfaces, then you could saturate that pretty quickly with a small number of cpus, regardless of how many domains you have.

I've pushed this out in its own branch: xen/dom0/backend/netback-tasklet; please post any future patches against this branch.

J
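[Editorial note: the tunable Jeremy suggests could look something like the sketch below; this is a guess at the shape, not the eventual patch, and the default value and parameter name spelling are assumptions.]

#include <linux/module.h>
#include <linux/moduleparam.h>

/* Number of netback Tx/Rx processing groups. Exposed as a parameter so an
 * administrator can size it to the NIC count rather than the dom0 VCPU
 * count, e.g. "netback.xen_netbk_nr_groups=2" on the dom0 command line. */
static unsigned int xen_netbk_nr_groups = 1;
module_param(xen_netbk_nr_groups, uint, 0444);
MODULE_PARM_DESC(xen_netbk_nr_groups,
		 "Number of netback processing groups (tasklet/thread pairs)");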
Xu, Dongxiao
2009-Dec-10 03:29 UTC
RE: [Xen-devel][Pv-ops][PATCH] Netback multiple tasklet support
Jeremy Fitzhardinge wrote:
> On 12/08/09 01:22, Xu, Dongxiao wrote:
>> Jeremy, I have revised the patch according to your suggestion. See attachment.
>> 0001: Keep the group number as 1, and put all the global/static variables into struct xen_netbk. Do some preparation for multiple tasklet support.
>> 0002: Support for netback multiple tasklets.
>> 0003: Use a kernel thread to replace the tasklet in order to ensure dom0 userspace QoS.
>
> Thanks, this looks much better. The only thing which leapt out at me from a first reading is that "group_nr" is still too generic a name for a global symbol. Something like "xen_netbk_nr_groups" perhaps?

Thanks, I will change the name.

> Also, is it worth making it a tunable? Presumably it needn't scale exactly with the number of dom0 cpus; if you only have one or two gbit interfaces, then you could saturate that pretty quickly with a small number of cpus, regardless of how many domains you have.

How many CPUs serve a NIC interface is determined by how its interrupt is delivered. If the system only has two gbit interfaces, and they deliver interrupts to CPU0 and CPU1, then the situation is: two CPUs handle two tasklets, and the other CPUs are idle. group_nr just defines the maximum number of tasklets; it doesn't decide which CPU handles which tasklet.

> I've pushed this out in its own branch: xen/dom0/backend/netback-tasklet; please post any future patches against this branch.

What's my next step for getting this netback-tasklet tree merged into xen/master?

Thanks!
Dongxiao

> J
Ian Campbell
2009-Dec-10 09:07 UTC
Re: [Xen-devel][Pv-ops][PATCH] Netback multiple tasklet support
On Wed, 2009-12-09 at 20:23 +0000, Jeremy Fitzhardinge wrote:
> I've pushed this out in its own branch: xen/dom0/backend/netback-tasklet; please post any future patches against this branch.

This is probably a bit pointless given the code is commented out but I wanted to get it out there while it was in my head:

Subject: xen: ensure locking gnttab_copy_grant_page is safe against interrupts.

Now that netback processing occurs in a thread instead of a tasklet gnttab_copy_grant_page needs to be safe against interrupts.

The code is currently commented out in this tree but on 2.6.18 we observed a deadlock where the netback thread called gnttab_copy_grant_page, locked gnttab_dma_lock for writing, was interrupted and on return from interrupt the network stack's TX tasklet ended up calling __gnttab_dma_map_page via the hardware driver->swiotlb and tries to take gnttab_dma_lock for reading.

Correct the commented code so we don't get bitten if/when it is re-enabled.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: "Xu, Dongxiao" <dongxiao.xu@intel.com>
---
 drivers/xen/grant-table.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index c653970..f25a2bc 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -828,14 +828,14 @@ int gnttab_copy_grant_page(grant_ref_t ref, struct page **pagep)
 	mfn = pfn_to_mfn(pfn);
 	new_mfn = virt_to_mfn(new_addr);

-//	write_seqlock(&gnttab_dma_lock); /* protects __gnttab_dma_map_page on 2.6.18 */
+//	write_seqlock_irq(&gnttab_dma_lock); /* protects __gnttab_dma_map_page on 2.6.18 */

 	/* Make seq visible before checking page_mapped. */
 	smp_mb();

 	/* Has the page been DMA-mapped? */
 	if (unlikely(page_mapped(page))) {
-		//write_sequnlock(&gnttab_dma_lock);
+		//write_sequnlock_irq(&gnttab_dma_lock);
 		put_page(new_page);
 		err = -EBUSY;
 		goto out;
@@ -855,7 +855,7 @@ int gnttab_copy_grant_page(grant_ref_t ref, struct page **pagep)
 	BUG_ON(err);
 	BUG_ON(unmap.status);

-//	write_sequnlock(&gnttab_dma_lock);
+//	write_sequnlock_irq(&gnttab_dma_lock);

 	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
 		set_phys_to_machine(page_to_pfn(new_page), INVALID_P2M_ENTRY);
--
1.5.6.5
Jeremy Fitzhardinge
2009-Dec-10 17:54 UTC
Re: [Xen-devel][Pv-ops][PATCH] Netback multiple tasklet support
On 12/10/09 01:07, Ian Campbell wrote:
> Subject: xen: ensure locking gnttab_copy_grant_page is safe against interrupts.
>
> Now that netback processing occurs in a thread instead of a tasklet gnttab_copy_grant_page needs to be safe against interrupts.
>
> The code is currently commented out in this tree but on 2.6.18 we observed a deadlock where the netback thread called gnttab_copy_grant_page, locked gnttab_dma_lock for writing, was interrupted and on return from interrupt the network stack's TX tasklet ended up calling __gnttab_dma_map_page via the hardware driver->swiotlb and tries to take gnttab_dma_lock for reading.
>
> Correct the commented code so we don't get bitten if/when it is re-enabled.

What's the issue here? I'm inclined to just remove the commented-out code if it isn't being used, and re-evaluate the locking if/when it becomes necessary.

J
Jeremy Fitzhardinge
2009-Dec-10 18:01 UTC
Re: [Xen-devel][Pv-ops][PATCH] Netback multiple tasklet support
On 12/09/09 19:29, Xu, Dongxiao wrote:
>> Also, is it worth making it a tunable? Presumably it needn't scale exactly with the number of dom0 cpus; if you only have one or two gbit interfaces, then you could saturate that pretty quickly with a small number of cpus, regardless of how many domains you have.
>
> How many CPUs serve a NIC interface is determined by how its interrupt is delivered. If the system only has two gbit interfaces, and they deliver interrupts to CPU0 and CPU1, then the situation is: two CPUs handle two tasklets, and the other CPUs are idle. group_nr just defines the maximum number of tasklets; it doesn't decide which CPU handles which tasklet.

So does this mean that a given vcpu will be used to handle the interrupt if it happens to be running on a pcpu with affinity for the device? Or that particular devices will be handled by particular vcpus?

>> I've pushed this out in its own branch: xen/dom0/backend/netback-tasklet; please post any future patches against this branch.
>
> What's my next step for getting this netback-tasklet tree merged into xen/master?

Hm, well, I guess:

  * I'd like to see some comments from Keir/Ian(s)/others that this is basically the right approach. It looks OK to me, but I don't have much experience with performance in the field.
      o does nc2 make nc1 obsolete?
  * Testing to make sure it really works. Netback is clearly critical functionality, so I'd like to be sure we're not introducing big regressions.

J
Ian Campbell
2009-Dec-10 18:07 UTC
Re: [Xen-devel][Pv-ops][PATCH] Netback multiple tasklet support
On Thu, 2009-12-10 at 17:54 +0000, Jeremy Fitzhardinge wrote:
> On 12/10/09 01:07, Ian Campbell wrote:
> > Subject: xen: ensure locking gnttab_copy_grant_page is safe against interrupts.
> >
> > Now that netback processing occurs in a thread instead of a tasklet gnttab_copy_grant_page needs to be safe against interrupts.
> >
> > The code is currently commented out in this tree but on 2.6.18 we observed a deadlock where the netback thread called gnttab_copy_grant_page, locked gnttab_dma_lock for writing, was interrupted and on return from interrupt the network stack's TX tasklet ended up calling __gnttab_dma_map_page via the hardware driver->swiotlb and tries to take gnttab_dma_lock for reading.
> >
> > Correct the commented code so we don't get bitten if/when it is re-enabled.
>
> What's the issue here?

A deadlock if someone naively uncomments the existing code.

> I'm inclined to just remove the commented-out code if it isn't being used, and re-evaluate the locking if/when it becomes necessary.

That would be fine also.

Ian.
Xu, Dongxiao
2009-Dec-11 01:34 UTC
RE: [Xen-devel][Pv-ops][PATCH] Netback multiple tasklet support
Jeremy Fitzhardinge wrote:
> On 12/09/09 19:29, Xu, Dongxiao wrote:
>>> Also, is it worth making it a tunable? Presumably it needn't scale exactly with the number of dom0 cpus; if you only have one or two gbit interfaces, then you could saturate that pretty quickly with a small number of cpus, regardless of how many domains you have.
>>
>> How many CPUs serve a NIC interface is determined by how its interrupt is delivered. If the system only has two gbit interfaces, and they deliver interrupts to CPU0 and CPU1, then the situation is: two CPUs handle two tasklets, and the other CPUs are idle. group_nr just defines the maximum number of tasklets; it doesn't decide which CPU handles which tasklet.
>
> So does this mean that a given vcpu will be used to handle the interrupt if it happens to be running on a pcpu with affinity for the device? Or that particular devices will be handled by particular vcpus?

If the NIC device is owned by Dom0, then its interrupt affinity is related to Dom0's *VCPUs* (I think it is not PCPUs). Which VCPU handles the device interrupt is determined by the interrupt affinity, either set manually by commands such as "echo XXX > /proc/irq/<irq_num>/smp_affinity", or adjusted automatically by irqbalance.

>>> I've pushed this out in its own branch: xen/dom0/backend/netback-tasklet; please post any future patches against this branch.
>>
>> What's my next step for getting this netback-tasklet tree merged into xen/master?
>
> Hm, well, I guess:
>
>   * I'd like to see some comments from Keir/Ian(s)/others that this is basically the right approach. It looks OK to me, but I don't have much experience with performance in the field.
>       o does nc2 make nc1 obsolete?
>   * Testing to make sure it really works. Netback is clearly critical functionality, so I'd like to be sure we're not introducing big regressions.

I will do another round of testing for this patch, and will reply to you then.

Thanks!
Dongxiao

> J
Jan Beulich
2009-Dec-11 08:34 UTC
Re: [Xen-devel][Pv-ops][PATCH] Netback multiple tasklet support
>>> Ian Campbell <Ian.Campbell@citrix.com> 10.12.09 19:07 >>>
> On Thu, 2009-12-10 at 17:54 +0000, Jeremy Fitzhardinge wrote:
>> On 12/10/09 01:07, Ian Campbell wrote:
>> > Subject: xen: ensure locking gnttab_copy_grant_page is safe against interrupts.
>> >
>> > Now that netback processing occurs in a thread instead of a tasklet gnttab_copy_grant_page needs to be safe against interrupts.
>> >
>> > The code is currently commented out in this tree but on 2.6.18 we observed a deadlock where the netback thread called gnttab_copy_grant_page, locked gnttab_dma_lock for writing, was interrupted and on return from interrupt the network stack's TX tasklet ended up calling __gnttab_dma_map_page via the hardware driver->swiotlb and tries to take gnttab_dma_lock for reading.
>> >
>> > Correct the commented code so we don't get bitten if/when it is re-enabled.
>>
>> What's the issue here?
>
> A deadlock if someone naively uncomments the existing code.

Btw., can any of you explain why 2.6.18 needs this (and the related) code, but pv-ops doesn't?

Thanks, Jan
Ian Campbell
2009-Dec-11 09:34 UTC
Re: [Xen-devel][Pv-ops][PATCH] Netback multiple tasklet support
On Fri, 2009-12-11 at 08:34 +0000, Jan Beulich wrote:
> Btw., can any of you explain why 2.6.18 needs this (and the related) code, but pv-ops doesn't?

The 2.6.18 bits came from http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/a395e58bd234

I can't immediately see why this wouldn't apply to pv-ops too. There are comments in drivers/pci/xen-iommu.c (e.g. in xen_dma_unmap_page) which suggest someone has thought about it at some point...

Ian.
Konrad Rzeszutek Wilk
2009-Dec-11 14:24 UTC
Re: [Xen-devel][Pv-ops][PATCH] Netback multiple tasklet support
On Fri, Dec 11, 2009 at 09:34:42AM +0000, Ian Campbell wrote:
> On Fri, 2009-12-11 at 08:34 +0000, Jan Beulich wrote:
> > Btw., can any of you explain why 2.6.18 needs this (and the related) code, but pv-ops doesn't?
>
> The 2.6.18 bits came from http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/a395e58bd234

Oh boy. That code (and its friends):

	/* Has the page been DMA-mapped? */
+	if (unlikely(page_mapped(page))) {

caused us at Virtual Iron so much headache. If a guest had iSCSI-backed storage, and the page was grant-mapped back to Dom0, in some cases we would see two page references taken with one put, meaning that 'page_mapped(page)' would be true forever. Joshua Nicholas traced it down to the TCP stack taking a page reference and never giving it up. The end result is that during migration (and shutdown) the guest would never die due to outstanding page grants.

We never troubleshot the TCP stack, but we implemented a mechanism in blkback similar to what netback does: squash the page reference when done. Not the cleanest, but it did the job.

> I can't immediately see why this wouldn't apply to pv-ops too. There are comments in drivers/pci/xen-iommu.c (e.g. in xen_dma_unmap_page) which suggest someone has thought about it at some point...

Let's wait and see if it's needed?
Jan Beulich
2010-Jan-13 10:17 UTC
Re: [Xen-devel][Pv-ops][PATCH] Netback multiple tasklet support
>>> Ian Campbell <Ian.Campbell@citrix.com> 10.12.09 10:07 >>>
> Subject: xen: ensure locking gnttab_copy_grant_page is safe against interrupts.
>
> Now that netback processing occurs in a thread instead of a tasklet gnttab_copy_grant_page needs to be safe against interrupts.
>
> The code is currently commented out in this tree but on 2.6.18 we observed a deadlock where the netback thread called gnttab_copy_grant_page, locked gnttab_dma_lock for writing, was interrupted and on return from interrupt the network stack's TX tasklet ended up calling __gnttab_dma_map_page via the hardware driver->swiotlb and tries to take gnttab_dma_lock for reading.
>
> Correct the commented code so we don't get bitten if/when it is re-enabled.

Wouldn't safe-against-softirq be sufficient then? Or if not (i.e. if meant to be generic), wouldn't the irq-safe variant need to be used independent of the new feature (and then also in the 2.6.18 tree)?

Jan
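[Editorial note: to illustrate Jan's point, here is a minimal sketch of the two write-side seqlock variants. Only gnttab_dma_lock and the deadlock scenario come from the patch above; the surrounding functions are illustrative.]

#include <linux/seqlock.h>

static DEFINE_SEQLOCK(gnttab_dma_lock);

/* If the reader (__gnttab_dma_map_page) can only run from softirq context
 * (e.g. the network stack's TX tasklet), blocking bottom halves on the
 * write side is already enough to avoid the deadlock: */
static void writer_softirq_safe(void)
{
	write_seqlock_bh(&gnttab_dma_lock);
	/* ... update grant/p2m state ... */
	write_sequnlock_bh(&gnttab_dma_lock);
}

/* If the reader may also run from hard interrupt context, the stronger
 * irq-disabling variant used in the patch is needed instead: */
static void writer_irq_safe(void)
{
	write_seqlock_irq(&gnttab_dma_lock);
	/* ... update grant/p2m state ... */
	write_sequnlock_irq(&gnttab_dma_lock);
}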
Ian Campbell
2010-Jan-14 16:55 UTC
Re: [Xen-devel][Pv-ops][PATCH] Netback multiple tasklet support
On Wed, 2010-01-13 at 10:17 +0000, Jan Beulich wrote:
> >>> Ian Campbell <Ian.Campbell@citrix.com> 10.12.09 10:07 >>>
> > Subject: xen: ensure locking gnttab_copy_grant_page is safe against interrupts.
> >
> > Now that netback processing occurs in a thread instead of a tasklet gnttab_copy_grant_page needs to be safe against interrupts.
> >
> > The code is currently commented out in this tree but on 2.6.18 we observed a deadlock where the netback thread called gnttab_copy_grant_page, locked gnttab_dma_lock for writing, was interrupted and on return from interrupt the network stack's TX tasklet ended up calling __gnttab_dma_map_page via the hardware driver->swiotlb and tries to take gnttab_dma_lock for reading.
> >
> > Correct the commented code so we don't get bitten if/when it is re-enabled.
>
> Wouldn't safe-against-softirq be sufficient then?

Yes, I guess so.

> Or if not (i.e. if meant to be generic), wouldn't the irq-safe variant need to be used independent of the new feature (and then also in the 2.6.18 tree)?

Ian.
Cui, Dexuan
2010-Mar-17 08:46 UTC
[Xen-devel] [PATCH] [pv-ops] fix dom0 S3 when MSI is used.
The old commit a234848f works only when the device supports D3hot; when the device only supports D3cold, the device doesn't work properly after resuming from Dom0 S3.

A better workaround is invoking the PHYSDEVOP_restore_msi hypercall. The patch reverts the old commit and invokes the hypercall.

Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index b40c6d0..c6bffe2 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -20,6 +20,7 @@
 #include <asm/errno.h>
 #include <asm/io.h>

+#include <asm/xen/hypercall.h>
 #include <asm/xen/hypervisor.h>

 #include "pci.h"
@@ -271,8 +272,7 @@ void write_msi_msg(unsigned int irq, struct msi_msg *msg)
 {
 	struct irq_desc *desc = irq_to_desc(irq);

-	if (!xen_initial_domain())
-		write_msi_msg_desc(desc, msg);
+	write_msi_msg_desc(desc, msg);
 }

 static int msi_free_irqs(struct pci_dev* dev);
@@ -347,6 +347,20 @@ static void __pci_restore_msix_state(struct pci_dev *dev)

 void pci_restore_msi_state(struct pci_dev *dev)
 {
+	if (xen_initial_domain()) {
+		struct physdev_restore_msi physdev;
+
+		if (!dev->msi_enabled && !dev->msix_enabled)
+			return;
+
+		pci_intx_for_msi(dev, 0);
+
+		physdev.bus = dev->bus->number;
+		physdev.devfn = dev->devfn;
+		HYPERVISOR_physdev_op(PHYSDEVOP_restore_msi, &physdev);
+
+		return;
+	}
 	__pci_restore_msi_state(dev);
 	__pci_restore_msix_state(dev);
 }
diff --git a/include/xen/interface/physdev.h b/include/xen/interface/physdev.h
index ac5de37..516d2b4 100644
--- a/include/xen/interface/physdev.h
+++ b/include/xen/interface/physdev.h
@@ -144,6 +144,13 @@ struct physdev_manage_pci {
 	uint8_t devfn;
 };

+#define PHYSDEVOP_restore_msi            19
+struct physdev_restore_msi {
+	/* IN */
+	uint8_t bus;
+	uint8_t devfn;
+};
+
 #define PHYSDEVOP_manage_pci_add_ext     20
 struct physdev_manage_pci_ext {
 	/* IN */
Konrad Rzeszutek Wilk
2010-Mar-17 14:28 UTC
Re: [Xen-devel] [PATCH] [pv-ops] fix dom0 S3 when MSI is used.
On Wed, Mar 17, 2010 at 04:46:47PM +0800, Cui, Dexuan wrote:
> The old commit a234848f works only when the device supports D3hot; when the device only supports D3cold, the device doesn't work properly after resuming from Dom0 S3.
> A better workaround is invoking the PHYSDEVOP_restore_msi hypercall. The patch reverts the old commit and invokes the hypercall.
>
> Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
>
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index b40c6d0..c6bffe2 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -20,6 +20,7 @@
>  #include <asm/errno.h>
>  #include <asm/io.h>
>
> +#include <asm/xen/hypercall.h>
>  #include <asm/xen/hypervisor.h>
>
>  #include "pci.h"
> @@ -271,8 +272,7 @@ void write_msi_msg(unsigned int irq, struct msi_msg *msg)
>  {
>  	struct irq_desc *desc = irq_to_desc(irq);
>
> -	if (!xen_initial_domain())
> -		write_msi_msg_desc(desc, msg);
> +	write_msi_msg_desc(desc, msg);

Nice. That will remove the other platform build problem.

>  }
>
>  static int msi_free_irqs(struct pci_dev* dev);
> @@ -347,6 +347,20 @@ static void __pci_restore_msix_state(struct pci_dev *dev)
>
>  void pci_restore_msi_state(struct pci_dev *dev)
>  {
> +	if (xen_initial_domain()) {

That won't do. If you try to compile this kernel on other platforms (say PPC), this will throw a huge problem.

> +		struct physdev_restore_msi physdev;
> +
> +		if (!dev->msi_enabled && !dev->msix_enabled)
> +			return;

This seems redundant.

I think the problem you are trying to address is going to show up when doing PCI passthrough using pciback and xen-pcifront. The mechanism that was employed there to make it work was to utilize arch_setup_msi_irqs (in arch/x86/kernel/apic/io_apic.c) and make it call functions in arch/x86/pci/xen.c. That code then figures out if you are running in privileged or unprivileged mode and makes the appropriate call.

Perhaps you should use that idea and expand on it. I would suggest you take a look at how PPC implements this and see if there is something that can be borrowed from their mechanism.

> +
> +		pci_intx_for_msi(dev, 0);
> +
> +		physdev.bus = dev->bus->number;
> +		physdev.devfn = dev->devfn;
> +		HYPERVISOR_physdev_op(PHYSDEVOP_restore_msi, &physdev);
> +
> +		return;
> +	}
>  	__pci_restore_msi_state(dev);
>  	__pci_restore_msix_state(dev);
>  }
> diff --git a/include/xen/interface/physdev.h b/include/xen/interface/physdev.h
> index ac5de37..516d2b4 100644
> --- a/include/xen/interface/physdev.h
> +++ b/include/xen/interface/physdev.h
> @@ -144,6 +144,13 @@ struct physdev_manage_pci {
>  	uint8_t devfn;
>  };
>
> +#define PHYSDEVOP_restore_msi            19
> +struct physdev_restore_msi {
> +	/* IN */
> +	uint8_t bus;
> +	uint8_t devfn;
> +};
> +
>  #define PHYSDEVOP_manage_pci_add_ext     20
>  struct physdev_manage_pci_ext {
>  	/* IN */
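[Editorial note: a hypothetical sketch of the dispatch pattern Konrad describes, keeping Xen-specific work out of generic drivers/pci/msi.c by deciding in the x86 MSI setup path whether to use the native or the Xen path. The xen_setup_msi_irqs/native_setup_msi_irqs names are placeholders, not the exact pv-ops symbols.]

#include <linux/pci.h>
#include <xen/xen.h>

/* Placeholders: the Xen path would live in arch/x86/pci/xen.c and the
 * native path is whatever io_apic.c does today. */
extern int xen_setup_msi_irqs(struct pci_dev *dev, int nvec, int type);
extern int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type);

int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
{
	/* Decide once, in arch code, so generic PCI code stays unmodified. */
	if (xen_domain())
		return xen_setup_msi_irqs(dev, nvec, type);

	return native_setup_msi_irqs(dev, nvec, type);
}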
Cui, Dexuan
2010-Mar-18 03:05 UTC
RE: [Xen-devel] [PATCH] [pv-ops] fix dom0 S3 when MSI is used.
Konrad Rzeszutek Wilk wrote:
> On Wed, Mar 17, 2010 at 04:46:47PM +0800, Cui, Dexuan wrote:
>> The old commit a234848f works only when the device supports D3hot; when the device only supports D3cold, the device doesn't work properly after resuming from Dom0 S3. A better workaround is invoking the PHYSDEVOP_restore_msi hypercall. The patch reverts the old commit and invokes the hypercall.
>>
>> Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
>>
>> [patch quoted above]
>>
>> @@ -347,6 +347,20 @@ static void __pci_restore_msix_state(struct pci_dev *dev)
>>
>>  void pci_restore_msi_state(struct pci_dev *dev)
>>  {
>> +	if (xen_initial_domain()) {
>
> That won't do. If you try to compile this kernel on other platforms (say PPC), this will throw a huge problem.

Hi Konrad, thanks for the comments! The patch is meant to act as "a better workaround" for the actual Dom0 S3 issue for now. I think the MSI implementation in pv-ops dom0 will be reworked in future, just like the IOAPIC rework that happened. This patch is not meant to fix all the existing issues. :-)

>> +		struct physdev_restore_msi physdev;
>> +
>> +		if (!dev->msi_enabled && !dev->msix_enabled)
>> +			return;
>
> This seems redundant.

It seems not redundant: if (!dev->msi_enabled && !dev->msix_enabled), we don't need to go further and invoke the hypercall.

> I think the problem you are trying to address is going to show up when doing PCI passthrough using pciback and xen-pcifront.

Actually the bug I'm trying to fix is limited to pv dom0 itself: on my host, after Dom0 S3 resume, the SATA disk can't work properly in Dom0 and Dom0's filesystem becomes inaccessible. With pci=nomsi, or with the patch, Dom0 S3 works fine.

> The mechanism that was employed there to make it work was to utilize arch_setup_msi_irqs (in arch/x86/kernel/apic/io_apic.c) and make it call functions in arch/x86/pci/xen.c. That code then figures out if you are running in privileged or unprivileged mode and makes the appropriate call.
>
> Perhaps you should use that idea and expand on it. I would suggest you take a look at how PPC implements this and see if there is something that can be borrowed from their mechanism.

Agree. I think this could be considered in the coming MSI rework in pv-ops dom0.

Thanks,
-- Dexuan
Jeremy Fitzhardinge
2010-Mar-19 01:03 UTC
[Xen-devel] Re: [PATCH] [pv-ops] fix dom0 S3 when MSI is used.
On 03/17/2010 01:46 AM, Cui, Dexuan wrote:
> The old commit a234848f works only when the device supports D3hot; when the device only supports D3cold, the device doesn't work properly after resuming from Dom0 S3.

What branches should I apply this to? Both 2.6.31 and .32, or just .32?

> A better workaround is invoking the PHYSDEVOP_restore_msi hypercall. The patch reverts the old commit and invokes the hypercall.

Is this a new hypercall?

Aside from that, it looks fine as a stopgap until we do MSI properly.

Thanks,
J

> Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
>
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index b40c6d0..c6bffe2 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -20,6 +20,7 @@
>  #include <asm/errno.h>
>  #include <asm/io.h>
>
> +#include <asm/xen/hypercall.h>
>  #include <asm/xen/hypervisor.h>
>
>  #include "pci.h"
> @@ -271,8 +272,7 @@ void write_msi_msg(unsigned int irq, struct msi_msg *msg)
>  {
>  	struct irq_desc *desc = irq_to_desc(irq);
>
> -	if (!xen_initial_domain())
> -		write_msi_msg_desc(desc, msg);
> +	write_msi_msg_desc(desc, msg);
>  }
>
>  static int msi_free_irqs(struct pci_dev* dev);
> @@ -347,6 +347,20 @@ static void __pci_restore_msix_state(struct pci_dev *dev)
>
>  void pci_restore_msi_state(struct pci_dev *dev)
>  {
> +	if (xen_initial_domain()) {
> +		struct physdev_restore_msi physdev;
> +
> +		if (!dev->msi_enabled && !dev->msix_enabled)
> +			return;
> +
> +		pci_intx_for_msi(dev, 0);
> +
> +		physdev.bus = dev->bus->number;
> +		physdev.devfn = dev->devfn;
> +		HYPERVISOR_physdev_op(PHYSDEVOP_restore_msi, &physdev);
> +
> +		return;
> +	}
>  	__pci_restore_msi_state(dev);
>  	__pci_restore_msix_state(dev);
>  }
> diff --git a/include/xen/interface/physdev.h b/include/xen/interface/physdev.h
> index ac5de37..516d2b4 100644
> --- a/include/xen/interface/physdev.h
> +++ b/include/xen/interface/physdev.h
> @@ -144,6 +144,13 @@ struct physdev_manage_pci {
>  	uint8_t devfn;
>  };
>
> +#define PHYSDEVOP_restore_msi            19
> +struct physdev_restore_msi {
> +	/* IN */
> +	uint8_t bus;
> +	uint8_t devfn;
> +};
> +
>  #define PHYSDEVOP_manage_pci_add_ext     20
>  struct physdev_manage_pci_ext {
>  	/* IN */
Jeremy Fitzhardinge
2010-Mar-19 01:04 UTC
Re: [Xen-devel] [PATCH] [pv-ops] fix dom0 S3 when MSI is used.
On 03/17/2010 07:28 AM, Konrad Rzeszutek Wilk wrote:
>>  static int msi_free_irqs(struct pci_dev* dev);
>> @@ -347,6 +347,20 @@ static void __pci_restore_msix_state(struct pci_dev *dev)
>>
>>  void pci_restore_msi_state(struct pci_dev *dev)
>>  {
>> +	if (xen_initial_domain()) {
>
> That won't do. If you try to compile this kernel on other platforms (say PPC), this will throw a huge problem.

Actually, <xen/xen.h> now defines these for all architectures, so they're safe to put into arch-independent code.

J
Cui, Dexuan
2010-Mar-19 01:29 UTC
[Xen-devel] RE: [PATCH] [pv-ops] fix dom0 S3 when MSI is used.
Jeremy Fitzhardinge wrote:> On 03/17/2010 01:46 AM, Cui, Dexuan wrote: >> The old commit a234848f works only when the device supports D3hot; >> when the device only supports D3cold, the device doesn''t work >> properly after resuming from Dom0 S3. >> > What branches should I apply this to? Both 2.6.31 and .32, or just > .32?I think it should be both. :-)>> A better workaround is invoking the PHYSDEVOP_restore_msi hypercall. >> The patch reverts the old commit and invokes the hypercall. >> > Is this a new hypercall?No. The hypercall has been there from changeset 18937: 2dffa6ceb0af, 15 months ago. :-)> > Aside from that, it looks fine as a stopgap until we do MSI properly.Agree. Thanks, -- Dexuan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Xu, Dongxiao
2010-Apr-26 14:27 UTC
[Xen-devel][Pv-ops][PATCH 0/3] Resend: Netback multiple thread support
Hi Jeremy and all, I''d like to make an update on these patches. The main logic is not changed, and I only did a rebase towards the upstream pv-ops kernel. See attached patch. The original patches are checked in Jeremy''s netback-tasklet branch.

Let me explain the main idea of the patchset again: Current netback uses one pair of tasklets for Tx/Rx data transaction. Netback tasklet could only run at one CPU at a time, and it is used to serve all the netfronts. Therefore it has become a performance bottleneck. This patch is to use multiple tasklet pairs to replace the current single pair in dom0. Assuming that Dom0 has CPUNR VCPUs, we define CPUNR pairs of tasklets (CPUNR for Tx, and CPUNR for Rx). Each pair of tasklets serves a specific group of netfronts. Also, for those global and static variables, we duplicated them for each group in order to avoid the spinlock.

PATCH 01: Generilize static/global variables into ''struct xen_netbk''.
PATCH 02: Multiple tasklets support.
PATCH 03: Use Kernel thread to replace the tasklet.

Recently I re-tested the patchset with an Intel 10G multi-queue NIC device, and used 10 outside 1G NICs to do netperf tests with that 10G NIC.

Case 1: Dom0 has more than 10 vcpus pinned with each physical CPU. With the patchset, the performance is 2x of the original throughput.

Case 2: Dom0 has 4 vcpus pinned with 4 physical CPUs. With the patchset, the performance is 3.7x of the original throughput.

When we tested this patch, we found that the domain_lock in the grant table operation (gnttab_copy()) becomes a bottleneck. We temporarily removed the global domain_lock to achieve good performance.

Thanks, Dongxiao

Jeremy Fitzhardinge wrote:> On 12/09/09 19:29, Xu, Dongxiao wrote: >>> Also, is it worth making it a tunable? Presumably it needn''t scale >>> exactly with the number of dom0 cpus; if you only have one or two >>> gbit interfaces, then you could saturate that pretty quickly with a >>> small number of cpus, regardless of how many domains you have. >>> >> How many CPUs are serving for the NIC interface is determined by how >> interrupt is delivered. If system only has two gbit interfaces, and >> they deliver interrupts to CPU0 and CPU1, then the case is: two >> CPUs handle two tasklets. Other CPUs are idle. The group_nr just >> defines the max number of tasklets, however it doesn''t decide how >> tasklet is handled by CPU. >> > > So does this mean that a given vcpu will be used to handle the > interrupt if it happens to be running on a pcpu with affinity for the > device? Or that particular devices will be handled by particular > vcpus? > >>> I''ve pushed this out in its own branch: >>> xen/dom0/backend/netback-tasklet; please post any future patches >>> against this branch. >>> >> What''s my next step for this netback-tasklet tree merging into >> xen/master? >> > > Hm, well, I guess: > > * I''d like to see some comments from Keir/Ian(s)/others that this is > basically the right approach. It looks OK to me, but I don''t > have much experience with performance in the field. > o does nc2 make nc1 obsolete? > * Testing to make sure it really works. Netback is clearly > critical functionality, so I''d like to be sure we''re not > introducing big regressions > > J

_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
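To make the shape of the series concrete: everything that used to be a file-scope global in netback.c becomes a field of a per-group structure, and one Tx/Rx tasklet pair is created per group at init time. The sketch below is only illustrative; the field and function names are assumptions, not the exact contents of the patches.

#include <linux/cpumask.h>
#include <linux/errno.h>
#include <linux/init.h>
#include <linux/interrupt.h>
#include <linux/skbuff.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

/* One instance per group (at most one group per dom0 VCPU).  Former
 * globals such as the rx/tx queues and pending-request state move in
 * here so that groups never contend on shared data. */
struct xen_netbk {
    struct tasklet_struct tx_tasklet;
    struct tasklet_struct rx_tasklet;
    struct sk_buff_head rx_queue;
    struct sk_buff_head tx_queue;
    spinlock_t lock;
    unsigned int nr_netifs;     /* interfaces assigned to this group */
    /* ... pending_tx_info, mmap_pages, grant-op arrays, etc. ... */
};

static struct xen_netbk *netbk_groups;
static unsigned int netbk_group_count;

static int __init netbk_init_groups(void (*tx_fn)(unsigned long),
                                    void (*rx_fn)(unsigned long))
{
    unsigned int i;

    netbk_group_count = num_online_cpus();
    netbk_groups = kcalloc(netbk_group_count, sizeof(*netbk_groups),
                           GFP_KERNEL);
    if (!netbk_groups)
        return -ENOMEM;

    for (i = 0; i < netbk_group_count; i++) {
        struct xen_netbk *netbk = &netbk_groups[i];

        skb_queue_head_init(&netbk->rx_queue);
        skb_queue_head_init(&netbk->tx_queue);
        spin_lock_init(&netbk->lock);
        /* pass the group itself as the tasklet context */
        tasklet_init(&netbk->tx_tasklet, tx_fn, (unsigned long)netbk);
        tasklet_init(&netbk->rx_tasklet, rx_fn, (unsigned long)netbk);
    }
    return 0;
}

Each interface then carries a reference to its group, and the tasklet handlers take the group as their argument instead of touching globals.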
Konrad Rzeszutek Wilk
2010-Apr-27 00:19 UTC
Re: [Xen-devel][Pv-ops][PATCH 0/3] Resend: Netback multiple thread support
> Case 1: Dom0 has more than 10 vcpus pinned with each physical CPU. > With the patchset, the performance is 2x of the original throughput. > > Case 2: Dom0 has 4 vcpus pinned with 4 physical CPUs. > With the patchset, the performance is 3.7x of the original throughput.What was the original throughput? 1GB? _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Xu, Dongxiao
2010-Apr-27 00:40 UTC
RE: [Xen-devel][Pv-ops][PATCH 0/3] Resend: Netback multiple thread support
Konrad Rzeszutek Wilk wrote:>> Case 1: Dom0 has more than 10 vcpus pinned with each physical CPU. >> With the patchset, the performance is 2x of the original throughput. >> >> Case 2: Dom0 has 4 vcpus pinned with 4 physical CPUs. >> With the patchset, the performance is 3.7x of the original >> throughput. > > What was the original throughput? 1GB?

The total bandwidth for both cases should be 10G, since we are using a 10G NIC card. For case 1, the original throughput is 3539.33 Mbps, and for case 2, the original throughput is 1310.72 Mbps. Thanks, Dongxiao _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Xu, Dongxiao
2010-Apr-27 03:02 UTC
RE: [Xen-devel][Pv-ops][PATCH 0/3] Resend: Netback multiple thread support
Xu, Dongxiao wrote:> Hi Jeremy and all, > > I''d like to make an update on these patches. The main logic is not > changed, and I only did a rebase towards the upstream pv-ops kernel. > See attached patch. The original patches are checked in Jeremy''s > netback-tasklet branch. > > Let me explain the main idea of the patchset again: > > Current netback uses one pair of tasklets for Tx/Rx data transaction. > Netback tasklet could only run at one CPU at a time, and it is used to > serve all the netfronts. Therefore it has become a performance > bottleneck. This patch is to use multiple tasklet pairs to replace the > current single pair in dom0. > > Assuming that Dom0 has CPUNR VCPUs, we define CPUNR pairs of > tasklets (CPUNR for Tx, and CPUNR for Rx). Each pair of tasklets > serves a specific group of netfronts. Also, for those global and static > variables, we duplicated them for each group in order to avoid the > spinlock. > > PATCH 01: Generilize static/global variables into ''struct xen_netbk''. > > PATCH 02: Multiple tasklets support. > > PATCH 03: Use Kernel thread to replace the tasklet. > > Recently I re-tested the patchset with an Intel 10G multi-queue NIC > device, and used 10 outside 1G NICs to do netperf tests with that 10G > NIC.

Here are more details about the test: On the host side, we launch 10 HVM guests, each installed with the PV VNIF driver; all the vif interfaces are bound to a 10G NIC through a bridge, so the 10 guests share the 10G bandwidth. Outside the host, we use 10 1G NIC interfaces to do netperf tests with the 10 HVM guests.

Thanks, Dongxiao

> > Case 1: Dom0 has more than 10 vcpus pinned with each physical CPU. > With the patchset, the performance is 2x of the original throughput. > > Case 2: Dom0 has 4 vcpus pinned with 4 physical CPUs. > With the patchset, the performance is 3.7x of the original throughput. > > When we tested this patch, we found that the domain_lock in the grant table > operation (gnttab_copy()) becomes a bottleneck. We temporarily > removed the global domain_lock to achieve good performance. > > Thanks, > Dongxiao > > Jeremy Fitzhardinge wrote: >> On 12/09/09 19:29, Xu, Dongxiao wrote: >>>> Also, is it worth making it a tunable? Presumably it needn''t scale >>>> exactly with the number of dom0 cpus; if you only have one or two >>>> gbit interfaces, then you could saturate that pretty quickly with a >>>> small number of cpus, regardless of how many domains you have. >>>> >>> How many CPUs are serving for the NIC interface is determined by how >>> interrupt is delivered. If system only has two gbit interfaces, and >>> they deliver interrupts to CPU0 and CPU1, then the case is: two >>> CPUs handle two tasklets. Other CPUs are idle. The group_nr just >>> defines the max number of tasklets, however it doesn''t decide how >>> tasklet is handled by CPU. >>> >> >> So does this mean that a given vcpu will be used to handle the >> interrupt if it happens to be running on a pcpu with affinity for the >> device? Or that particular devices will be handled by particular >> vcpus? >> >>>> I''ve pushed this out in its own branch: >>>> xen/dom0/backend/netback-tasklet; please post any future patches >>>> against this branch. >>>> >>> What''s my next step for this netback-tasklet tree merging into >>> xen/master? >>> >> >> Hm, well, I guess: >> >> * I''d like to see some comments from Keir/Ian(s)/others that this is >> basically the right approach. It looks OK to me, but I don''t >> have much experience with performance in the field. >> o does nc2 make nc1 obsolete?
>> * Testing to make sure it really works. Netback is clearly >> critical functionality, so I''d like to be sure we''re not >> introducing big regressions >> >> J > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Steven Smith
2010-Apr-27 10:49 UTC
Re: [Xen-devel][Pv-ops][PATCH 0/3] Resend: Netback multiple thread support
> I''d like to make an update on these patches. The main logic is not > changed, and I only did a rebase towards the upstream pv-ops kernel. > See attached patch. The original patches are checked in Jeremy''s > netback-tasklet branch.I have a couple of (quite minor) comments on the patches: 0001-Netback-Generilize-static-global-variables-into-stru.txt:> diff --git a/drivers/xen/netback/netback.c b/drivers/xen/netback/netback.c > index c24debf..a484b0a 100644 > --- a/drivers/xen/netback/netback.c > +++ b/drivers/xen/netback/netback.c > @@ -49,18 +49,13 @@ > > /*define NETBE_DEBUG_INTERRUPT*/ > > +struct xen_netbk *xen_netbk;> +int group_nr = 1; > +struct page_foreign_tracker *foreign_page_tracker;I think these two would benefit from more descriptive names, given that they''re not static. If I was feeling pedantic, I''d complain that this includes some bits of support for multiple struct xen_netbks, rather than just moving all of the fields around, which reduces its obviously-correct-ness quite a bit. Even more pedantically, it might be better to pass around a struct xen_netbk in a few places, rather than an int group, so that you get better compiler type checking. 0002-Netback-Multiple-tasklets-support.txt: Design question: you do your balancing by making each tasklet serve roughly equal numbers of remote domains. Would it not have been easier to make them serve equal numbers of netfronts? You could then get rid of the domain_entry business completely and just have a count of serviced netfronts in each xen_netbk, which might be a bit easier to deal with.> diff --git a/drivers/xen/netback/interface.c b/drivers/xen/netback/interface.c > index 21c1f95..bdf3c1d 100644 > --- a/drivers/xen/netback/interface.c > +++ b/drivers/xen/netback/interface.c > @@ -54,6 +54,59 @@ > static unsigned long netbk_queue_length = 32; > module_param_named(queue_length, netbk_queue_length, ulong, 0644); > > > +static void remove_domain_from_list(struct xen_netbk *netbk, > + struct xen_netif *netif) > +{ > + struct domain_entry *dom_entry = NULL; > + int group = netif->group; > + > + list_for_each_entry(dom_entry, > + &netbk[group].group_domain_list, dom) { > + if (dom_entry->domid == netif->domid) > + break; > + } > + if (!dom_entry) > + return;Can this ever happen? If so, you might have issues when several netfronts all end in the same frontend domain e.g.: -- netfront A arrives and is added to list -- netfront B arrives and is added to list -- netfront B is removed -- netfront B is removed again. It''s no longer in the list, but A is, so A gets removed instead. The end result being that netback thinks the group is idle, but it actually has netfront A in it. I guess the worst that can happen is that you don''t balance across tasklets properly, but it''d still be better avoided. 
If it *can''t* happen, there should probably be some kind of warning when it does.> + > + spin_lock(&netbk[netif->group].group_domain_list_lock); > + netbk[netif->group].group_domain_nr--; > + list_del(&dom_entry->dom); > + spin_unlock(&netbk[netif->group].group_domain_list_lock); > + kfree(dom_entry); > +} > + > static void __netif_up(struct xen_netif *netif) > { > enable_irq(netif->irq); > > @@ -70,6 +123,7 @@ static int net_open(struct net_device *dev) > { > struct xen_netif *netif = netdev_priv(dev); > if (netback_carrier_ok(netif)) { > + add_domain_to_list(xen_netbk, group_nr, netif); > __netif_up(netif); > netif_start_queue(dev); > } > @@ -79,8 +133,10 @@ static int net_open(struct net_device *dev) > static int net_close(struct net_device *dev) > { > struct xen_netif *netif = netdev_priv(dev); > - if (netback_carrier_ok(netif)) > + if (netback_carrier_ok(netif)) { > __netif_down(netif); > + remove_domain_from_list(xen_netbk, netif); > + } > netif_stop_queue(dev); > return 0; > }Okay, so if the interface gets down''d and then up''d it''ll potentially move to a different group. How much testing has that situation had? I''d be tempted to add the interface to the list as soon as it''s created and then leave it there until it''s removed, and not rebalance during its lifetime at all, and hence avoid the issue.> @@ -1570,6 +1570,7 @@ static int __init netback_init(void) > /* We can increase reservation by this much in net_rx_action(). */ > // balloon_update_driver_allowance(NET_RX_RING_SIZE); > > + group_nr = num_online_cpus(); > xen_netbk = kzalloc(group_nr * sizeof(struct xen_netbk), GFP_KERNEL); > if (!xen_netbk) { > printk(KERN_ALERT "%s: out of memory\n", __func__);What happens if the number of online CPUs changes while netback is running? In particular, do we stop trying to send a tasklet/thread to a CPU which has been offlined? 0003-Use-Kernel-thread-to-replace-the-tasklet.txt:> diff --git a/drivers/xen/netback/netback.c b/drivers/xen/netback/netback.c > index 773cd4f..8b55efc 100644 > --- a/drivers/xen/netback/netback.c > +++ b/drivers/xen/netback/netback.c > +static int netbk_action_thread(void *index) > +{ > + int group = (long)index; > + struct xen_netbk *netbk = &xen_netbk[group]; > + while (1) { > + wait_event_interruptible(netbk->netbk_action_wq, > + rx_work_todo(group) > + || tx_work_todo(group)); > + cond_resched(); > + > + if (rx_work_todo(group)) > + net_rx_action(group); > + > + if (tx_work_todo(group)) > + net_tx_action(group); > + }Hmm... You use kthread_stop() on this thread, so you should probably test kthread_should_stop() every so often.> + > + return 0; > +} > + > + > static int __init netback_init(void) > { > int i;Apart from that, it all looks fine to me. Steven. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
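For the kthread_should_stop() point, the worker loop would need to look roughly like the sketch below, which also folds the stop condition into the wait so the thread can be woken for shutdown. This is only an illustration; it assumes the todo/action helpers have been converted to take a struct xen_netbk pointer, as suggested above, and that netbk_action_wq is the wait queue from the patch.

#include <linux/kthread.h>
#include <linux/sched.h>
#include <linux/wait.h>

static int netbk_action_thread(void *data)
{
    struct xen_netbk *netbk = data;

    while (!kthread_should_stop()) {
        /* sleep until there is work or we are asked to exit */
        wait_event_interruptible(netbk->netbk_action_wq,
                                 rx_work_todo(netbk) ||
                                 tx_work_todo(netbk) ||
                                 kthread_should_stop());
        cond_resched();

        if (kthread_should_stop())
            break;

        if (rx_work_todo(netbk))
            net_rx_action(netbk);

        if (tx_work_todo(netbk))
            net_tx_action(netbk);
    }

    return 0;
}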
Jeremy Fitzhardinge
2010-Apr-27 18:37 UTC
Re: [Xen-devel][Pv-ops][PATCH 0/3] Resend: Netback multiple thread support
On 04/27/2010 03:49 AM, Steven Smith wrote:>> I''d like to make an update on these patches. The main logic is not >> changed, and I only did a rebase towards the upstream pv-ops kernel. >> See attached patch. The original patches are checked in Jeremy''s >> netback-tasklet branch. >> > I have a couple of (quite minor) comments on the patches: > > 0001-Netback-Generilize-static-global-variables-into-stru.txt: > >> diff --git a/drivers/xen/netback/netback.c b/drivers/xen/netback/netback.c >> index c24debf..a484b0a 100644 >> --- a/drivers/xen/netback/netback.c >> +++ b/drivers/xen/netback/netback.c >> @@ -49,18 +49,13 @@ >> >> /*define NETBE_DEBUG_INTERRUPT*/ >> >> +struct xen_netbk *xen_netbk; >> > >> +int group_nr = 1; >> +struct page_foreign_tracker *foreign_page_tracker; >> > I think these two would benefit from more descriptive names, given > that they''re not static. > >Yes. Actually I thought I raised the same points the first time through and Dongxiao had posted patches addressing them. I have to admit I haven''t looked at the reposted patches in detail yet. Have we suffered a regression here? Hm, maybe its just this issue which slipped through.> If I was feeling pedantic, I''d complain that this includes some bits > of support for multiple struct xen_netbks, rather than just moving all > of the fields around, which reduces its obviously-correct-ness quite a > bit. >I''m always in favour of having more, smaller, functionally distinct patches, so if this patch can be further subdivided then it would be nice to do so. It''s especially useful when trying to track down regressions via bisection.> Even more pedantically, it might be better to pass around a struct > xen_netbk in a few places, rather than an int group, so that you get > better compiler type checking. >+1> Apart from that, it all looks fine to me. >Thanks for looking at this. It had been missing the gaze of some networking-savvy eyes. J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Steven Smith
2010-Apr-28 09:31 UTC
Re: [Xen-devel][Pv-ops][PATCH 0/3] Resend: Netback multiple thread support
> >> diff --git a/drivers/xen/netback/netback.c b/drivers/xen/netback/netback.c > >> index c24debf..a484b0a 100644 > >> --- a/drivers/xen/netback/netback.c > >> +++ b/drivers/xen/netback/netback.c > >> @@ -49,18 +49,13 @@ > >> > >> /*define NETBE_DEBUG_INTERRUPT*/ > >> > >> +struct xen_netbk *xen_netbk; > >> > > > >> +int group_nr = 1; > >> +struct page_foreign_tracker *foreign_page_tracker; > >> > > I think these two would benefit from more descriptive names, given > > that they''re not static. > > Yes. Actually I thought I raised the same points the first time through > and Dongxiao had posted patches addressing them. I have to admit I > haven''t looked at the reposted patches in detail yet. Have we suffered > a regression here? > > Hm, maybe its just this issue which slipped through.I think so, yes (assuming the patches posted on the 26th of April are the most recent version).> > Apart from that, it all looks fine to me. > Thanks for looking at this. It had been missing the gaze of some > networking-savvy eyes.There is one other potential issue which just occurred to me. These patches assign netifs to groups pretty much arbitrarily, beyond trying to keep the groups balanced. It might be better to try to group interfaces so that the tasklet runs on the same VCPU as the interrupt i.e. grouping interfaces according to interrupt affinity. That would have two main benefits: -- Less cross-VCPU traffic, and hence better cache etc. behaviour. -- Potentially better balancing. If you find that you''ve accidentally assigned two high-traffic interfaces to the same group, irqbalance or whatnot should rebalance the interrupts to different vcpus, but that doesn''t automatically do us much good because most of the work is done in the tasklet (which will still only run on one vcpu and hence become a bottleneck). If we rebalanced the netif groups when irqbalance rebalanced the interrupts then we''d bypass the issue. Of course, someone would need to go and implement the rebalance-in-response-to-irqbalance, which would be non-trivial. You could imagine doing it by just getting rid of the explicit group field in struct xen_netif and using smp_processor_id() instead, but that would need quite a bit of extra thought about what happens if e.g. the start_xmit and the tx complete interrupt happen on different vcpus. It sounds like the patch provides a useful improvement as it stands, and the rebalancing would probably be a bit of a pain, so I don''t think this is a blocker to an immediate merge, but it''d be nice if someone could look at it eventually. Steven. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Xu, Dongxiao
2010-Apr-28 10:27 UTC
RE: [Xen-devel][Pv-ops][PATCH 0/3] Resend: Netback multiple thread support
Hi Steven, Thanks for your careful review. Some explanation inline. For your consideration of group and interrupt affinity, I will reply in another thread. Thanks, Dongxiao

Steven Smith wrote:>> I''d like to make an update on these patches. The main logic is not >> changed, and I only did a rebase towards the upstream pv-ops kernel. >> See attached patch. The original patches are checked in Jeremy''s >> netback-tasklet branch. > I have a couple of (quite minor) comments on the patches: > > 0001-Netback-Generilize-static-global-variables-into-stru.txt: >> diff --git a/drivers/xen/netback/netback.c >> b/drivers/xen/netback/netback.c index c24debf..a484b0a 100644 --- >> a/drivers/xen/netback/netback.c +++ b/drivers/xen/netback/netback.c >> @@ -49,18 +49,13 @@ >> >> /*define NETBE_DEBUG_INTERRUPT*/ >> >> +struct xen_netbk *xen_netbk; > >> +int group_nr = 1; >> +struct page_foreign_tracker *foreign_page_tracker; > I think these two would benefit from more descriptive names, given > that they''re not static.

Oops... I thought I had modified this when Jeremy commented on it last time; maybe there was some mistake and I left it until today... I swear I will change this in the next version of the patchset. ;-)

> > > > If I was feeling pedantic, I''d complain that this includes some bits > of support for multiple struct xen_netbks, rather than just moving all > of the fields around, which reduces its obviously-correct-ness quite a > bit.

Actually I was struggling with how to split the first patch into smaller ones, but I don''t have much idea, since the patch changes the function interface/data structure, so the corresponding callers need to change too, which generates a 1k-line patch...

> > Even more pedantically, it might be better to pass around a struct > xen_netbk in a few places, rather than an int group, so that you get > better compiler type checking.

I will change this in the next version of the patch.

> > > > 0002-Netback-Multiple-tasklets-support.txt: > Design question: you do your balancing by making each tasklet serve > roughly equal numbers of remote domains. Would it not have been > easier to make them serve equal numbers of netfronts? You could then > get rid of the domain_entry business completely and just have a count > of serviced netfronts in each xen_netbk, which might be a bit easier > to deal with.

According to my understanding, one guest with the VNIF driver is represented by one netfront. Is this true? Therefore I don''t understand the difference between "number of domains" and "number of netfronts"; I used to think they were the same. Please correct me if my understanding is wrong. Actually, in the very early stage of this patch, I used a simple method to identify which group a netfront belongs to, by calculating (domid % online_cpu_nr()). The advantage is simplicity; however, it may leave the netfront count unbalanced between different groups.
I will try to remove "domain_entry" related code in next version patch.> >> diff --git a/drivers/xen/netback/interface.c >> b/drivers/xen/netback/interface.c index 21c1f95..bdf3c1d 100644 --- >> a/drivers/xen/netback/interface.c +++ >> b/drivers/xen/netback/interface.c @@ -54,6 +54,59 @@ >> static unsigned long netbk_queue_length = 32; >> module_param_named(queue_length, netbk_queue_length, ulong, 0644); >> >> >> +static void remove_domain_from_list(struct xen_netbk *netbk, >> + struct xen_netif *netif) >> +{ >> + struct domain_entry *dom_entry = NULL; >> + int group = netif->group; >> + >> + list_for_each_entry(dom_entry, >> + &netbk[group].group_domain_list, dom) { >> + if (dom_entry->domid == netif->domid) >> + break; >> + } >> + if (!dom_entry) >> + return; > Can this ever happen? If so, you might have issues when several > netfronts all end in the same frontend domain e.g.: > > -- netfront A arrives and is added to list > -- netfront B arrives and is added to list > -- netfront B is removed > -- netfront B is removed again. It''s no longer in the list, > but A is, so A gets removed instead. > > The end result being that netback thinks the group is idle, but it > actually has netfront A in it. I guess the worst that can happen is > that you don''t balance across tasklets properly, but it''d still be > better avoided. > > If it *can''t* happen, there should probably be some kind of warning > when it does. > >> + >> + spin_lock(&netbk[netif->group].group_domain_list_lock); >> + netbk[netif->group].group_domain_nr--; >> + list_del(&dom_entry->dom); >> + spin_unlock(&netbk[netif->group].group_domain_list_lock); >> + kfree(dom_entry); +} >> + >> static void __netif_up(struct xen_netif *netif) >> { >> enable_irq(netif->irq); >> >> @@ -70,6 +123,7 @@ static int net_open(struct net_device *dev) { >> struct xen_netif *netif = netdev_priv(dev); >> if (netback_carrier_ok(netif)) { >> + add_domain_to_list(xen_netbk, group_nr, netif); >> __netif_up(netif); >> netif_start_queue(dev); >> } >> @@ -79,8 +133,10 @@ static int net_open(struct net_device *dev) >> static int net_close(struct net_device *dev) >> { >> struct xen_netif *netif = netdev_priv(dev); >> - if (netback_carrier_ok(netif)) >> + if (netback_carrier_ok(netif)) { >> __netif_down(netif); >> + remove_domain_from_list(xen_netbk, netif); >> + } >> netif_stop_queue(dev); >> return 0; >> } > Okay, so if the interface gets down''d and then up''d it''ll potentially > move to a different group. How much testing has that situation had? > > I''d be tempted to add the interface to the list as soon as it''s > created and then leave it there until it''s removed, and not rebalance > during its lifetime at all, and hence avoid the issue. > >> @@ -1570,6 +1570,7 @@ static int __init netback_init(void) >> /* We can increase reservation by this much in net_rx_action(). */ >> // balloon_update_driver_allowance(NET_RX_RING_SIZE); >> >> + group_nr = num_online_cpus(); >> xen_netbk = kzalloc(group_nr * sizeof(struct xen_netbk), >> GFP_KERNEL); if (!xen_netbk) { printk(KERN_ALERT "%s: out of >> memory\n", __func__); > What happens if the number of online CPUs changes while netback is > running? In particular, do we stop trying to send a tasklet/thread to > a CPU which has been offlined?The group_nr just defines the max number of tasklets, however it doesn''t decide which tasklet is handled by which CPU. 
It is decided by the delivery of interrupt.> > > 0003-Use-Kernel-thread-to-replace-the-tasklet.txt: >> diff --git a/drivers/xen/netback/netback.c >> b/drivers/xen/netback/netback.c index 773cd4f..8b55efc 100644 --- >> a/drivers/xen/netback/netback.c +++ b/drivers/xen/netback/netback.c >> +static int netbk_action_thread(void *index) >> +{ >> + int group = (long)index; >> + struct xen_netbk *netbk = &xen_netbk[group]; >> + while (1) { >> + wait_event_interruptible(netbk->netbk_action_wq, >> + rx_work_todo(group) + || tx_work_todo(group)); >> + cond_resched(); >> + >> + if (rx_work_todo(group)) >> + net_rx_action(group); >> + >> + if (tx_work_todo(group)) >> + net_tx_action(group); >> + } > Hmm... You use kthread_stop() on this thread, so you should probably > test kthread_should_stop() every so often.OK, I will modify it.> >> + >> + return 0; >> +} >> + >> + >> static int __init netback_init(void) >> { >> int i; > > > Apart from that, it all looks fine to me. > > Steven._______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Xu, Dongxiao
2010-Apr-28 11:36 UTC
RE: [Xen-devel][Pv-ops][PATCH 0/3] Resend: Netback multiple thread support
Steven Smith wrote:>>>> diff --git a/drivers/xen/netback/netback.c >>>> b/drivers/xen/netback/netback.c index c24debf..a484b0a 100644 --- >>>> a/drivers/xen/netback/netback.c +++ b/drivers/xen/netback/netback.c >>>> @@ -49,18 +49,13 @@ >>>> >>>> /*define NETBE_DEBUG_INTERRUPT*/ >>>> >>>> +struct xen_netbk *xen_netbk; >>>> >>> >>>> +int group_nr = 1; >>>> +struct page_foreign_tracker *foreign_page_tracker; >>>> >>> I think these two would benefit from more descriptive names, given >>> that they''re not static. >> >> Yes. Actually I thought I raised the same points the first time >> through and Dongxiao had posted patches addressing them. I have to >> admit I haven''t looked at the reposted patches in detail yet. Have >> we suffered a regression here? >> >> Hm, maybe its just this issue which slipped through. > I think so, yes (assuming the patches posted on the 26th of April are > the most recent version). > >>> Apart from that, it all looks fine to me. >> Thanks for looking at this. It had been missing the gaze of some >> networking-savvy eyes. > There is one other potential issue which just occurred to me. These > patches assign netifs to groups pretty much arbitrarily, beyond trying > to keep the groups balanced. It might be better to try to group > interfaces so that the tasklet runs on the same VCPU as the interrupt > i.e. grouping interfaces according to interrupt affinity. That would > have two main benefits: > > -- Less cross-VCPU traffic, and hence better cache etc. behaviour. > -- Potentially better balancing. If you find that you''ve accidentally > assigned two high-traffic interfaces to the same group, irqbalance > or whatnot should rebalance the interrupts to different vcpus, but > that doesn''t automatically do us much good because most of the work > is done in the tasklet (which will still only run on one vcpu and > hence become a bottleneck). If we rebalanced the netif groups when > irqbalance rebalanced the interrupts then we''d bypass the issue. > > Of course, someone would need to go and implement the > rebalance-in-response-to-irqbalance, which would be non-trivial.Your idea is workable if the netfront is bound with a single queue NIC via a bridge. Hence we know which interrupt is used to serve the netfront, and then we can group netfronts according to the interrupt affinity. And as you said, the effort is non-trivial. However if the NIC is multi-queued, which has only one interface but multiple interrupt queues. All netfronts are bounded with this interface via one bridge. We have no idea which interrupt queue is serving for a specified netfront. So the rebalance according to interrupt affinity is a challenge. Do you have idea on this point? Thanks, Dongxiao> > You could imagine doing it by just getting rid of the explicit group > field in struct xen_netif and using smp_processor_id() instead, but > that would need quite a bit of extra thought about what happens if > e.g. the start_xmit and the tx complete interrupt happen on different > vcpus. > > It sounds like the patch provides a useful improvement as it stands, > and the rebalancing would probably be a bit of a pain, so I don''t > think this is a blocker to an immediate merge, but it''d be nice if > someone could look at it eventually. > > Steven._______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Steven Smith
2010-Apr-28 11:51 UTC
Re: [Xen-devel][Pv-ops][PATCH 0/3] Resend: Netback multiple thread support
> >> I''d like to make an update on these patches. The main logic is not > >> changed, and I only did a rebase towards the upstream pv-ops kernel. > >> See attached patch. The original patches are checked in Jeremy''s > >> netback-tasklet branch. > > I have a couple of (quite minor) comments on the patches: > > > > 0001-Netback-Generilize-static-global-variables-into-stru.txt: > >> diff --git a/drivers/xen/netback/netback.c > >> b/drivers/xen/netback/netback.c index c24debf..a484b0a 100644 --- > >> a/drivers/xen/netback/netback.c +++ b/drivers/xen/netback/netback.c > >> @@ -49,18 +49,13 @@ > >> > >> /*define NETBE_DEBUG_INTERRUPT*/ > >> > >> +struct xen_netbk *xen_netbk; > > > >> +int group_nr = 1; > >> +struct page_foreign_tracker *foreign_page_tracker; > > I think these two would benefit from more descriptive names, given > > that they''re not static. > Oops...I thought I had modified this when Jeremy commented this > last time, maybe there was some mistake and I left it until today...Easily done.> > If I was feeling pedantic, I''d complain that this includes some bits > > of support for multiple struct xen_netbks, rather than just moving all > > of the fields around, which reduces its obviously-correct-ness quite a > > bit. > Actually I was struggling how to split the first patch into small ones, > however I don''t have much idea since the patch changes the function > Interface/data structure, so the corresponding caller needs change too, > which generates a 1k line of patch...The approach I would take would be something like this: 1) Gather all the global data together into a struct xen_netbk, and then have a single, global, instance of that structure. Go through and turn every reference to a bit of global data into a reference to a field of that structure. This will be a massive patch, but it''s purely mechanical and it''s very easy to check that it''s safe. 2) Introduce struct ext_page and use it everywhere you use it in the current patch. This should be fairly small. 3) Generalise to multiple struct xen_netbks by changing the single global instance into a struct xen_netbk * and fixing the resulting compile errors. Another big patch, but provided you remember to initialise everything properly the compiler will check almost all of it for you. This is to some extent a bikeshed argument, so if you prefer the current patch structure then that would work as well.> > Even more pedantically, it might be better to pass around a struct > > xen_netbk in a few places, rather than an int group, so that you get > > better compiler type checking. > I will change this in next version of patch.Thanks.> > 0002-Netback-Multiple-tasklets-support.txt: > > Design question: you do your balancing by making each tasklet serve > > roughly equal numbers of remote domains. Would it not have been > > easier to make them serve equal numbers of netfronts? You could then > > get rid of the domain_entry business completely and just have a count > > of serviced netfronts in each xen_netbk, which might be a bit easier > > to deal with. > According to my understanding, one guest with VNIF driver represented > by one netfront. Is this true? Therefore I don''t understand the difference > between "number of domains" and "number of netfronts", I used to thought > they were the same. Please correct me my understanding is wrong.I think we might be using slightly different terminology here. 
When I say ``netfront'''', I mean the frontend half of a virtual network interface, rather than the netfront driver, so a single domain can be configured with multiple netfronts in the same way that a single physical host can have multiple ixgbes (say), despite only having one ixgbe driver loaded. So, my original point was that it might be better to balance interfaces such that the number of interfaces in each group is the same, ignoring the frontend domain ID completely. This would mean that if, for instance, a domain had two very busy NICs then they wouldn''t be forced to share a tasklet, which might otherwise be a bottleneck. The downside, of course, is that it would allow domains with multiple vNICs to use more dom0 CPU time, potentially aggravating starvation and unfairness problems. On the other hand, a domain with N vNICs wouldn''t be able to do any more damage than N domains with 1 vNIC each, so I don''t think it''s too bad.> Actually in the very early stage of this patch, I use a simple method to > identify which group does a netfront belong to, by calculating > (domid % online_cpu_nr()). The advantage is simple, however it may > cause netfront count unbalanced between different groups.Well, any static scheme will potentially come unbalanced some of the time, if different interfaces experience different levels of traffic. But see the other thread for another discussion of balancing issues.> I will try to remove "domain_entry" related code in next version patch.Thanks.> >> @@ -1570,6 +1570,7 @@ static int __init netback_init(void) > >> /* We can increase reservation by this much in net_rx_action(). */ > >> // balloon_update_driver_allowance(NET_RX_RING_SIZE); > >> > >> + group_nr = num_online_cpus(); > >> xen_netbk = kzalloc(group_nr * sizeof(struct xen_netbk), > >> GFP_KERNEL); if (!xen_netbk) { printk(KERN_ALERT "%s: out of > >> memory\n", __func__); > > What happens if the number of online CPUs changes while netback is > > running? In particular, do we stop trying to send a tasklet/thread to > > a CPU which has been offlined? > The group_nr just defines the max number of tasklets, however it doesn''t decide > which tasklet is handled by which CPU. It is decided by the delivery of interrupt.Ah, yes, so it is. Thanks for explaining it.> > 0003-Use-Kernel-thread-to-replace-the-tasklet.txt: > >> diff --git a/drivers/xen/netback/netback.c > >> b/drivers/xen/netback/netback.c index 773cd4f..8b55efc 100644 --- > >> a/drivers/xen/netback/netback.c +++ b/drivers/xen/netback/netback.c > >> +static int netbk_action_thread(void *index) > >> +{ > >> + int group = (long)index; > >> + struct xen_netbk *netbk = &xen_netbk[group]; > >> + while (1) { > >> + wait_event_interruptible(netbk->netbk_action_wq, > >> + rx_work_todo(group) + || tx_work_todo(group)); > >> + cond_resched(); > >> + > >> + if (rx_work_todo(group)) > >> + net_rx_action(group); > >> + > >> + if (tx_work_todo(group)) > >> + net_tx_action(group); > >> + } > > Hmm... You use kthread_stop() on this thread, so you should probably > > test kthread_should_stop() every so often. > OK, I will modify it.Thanks. Steven. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
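Balancing on interface count rather than frontend domain ID then reduces to picking the least-loaded group when an interface comes up and decrementing the counter when it goes away, roughly as below. This is a sketch only: it reuses the netbk_groups/netbk_group_count names from the earlier sketch, and the nr_netifs counter and netbk_balance_lock are assumptions rather than anything in the posted patches.

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(netbk_balance_lock);

/* Pick the group currently serving the fewest interfaces; the per-group
 * counter replaces the domain_entry list entirely. */
static struct xen_netbk *netbk_pick_group(struct xen_netif *netif)
{
    struct xen_netbk *best = &netbk_groups[0];
    unsigned int i;

    spin_lock(&netbk_balance_lock);
    for (i = 1; i < netbk_group_count; i++)
        if (netbk_groups[i].nr_netifs < best->nr_netifs)
            best = &netbk_groups[i];
    best->nr_netifs++;
    spin_unlock(&netbk_balance_lock);

    netif->group = best - netbk_groups;
    return best;
}

static void netbk_put_group(struct xen_netif *netif)
{
    spin_lock(&netbk_balance_lock);
    netbk_groups[netif->group].nr_netifs--;
    spin_unlock(&netbk_balance_lock);
}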
Steven Smith
2010-Apr-28 12:04 UTC
Re: [Xen-devel][Pv-ops][PATCH 0/3] Resend: Netback multiple thread support
> >>> Apart from that, it all looks fine to me. > >> Thanks for looking at this. It had been missing the gaze of some > >> networking-savvy eyes. > > There is one other potential issue which just occurred to me. These > > patches assign netifs to groups pretty much arbitrarily, beyond trying > > to keep the groups balanced. It might be better to try to group > > interfaces so that the tasklet runs on the same VCPU as the interrupt > > i.e. grouping interfaces according to interrupt affinity. That would > > have two main benefits: > > > > -- Less cross-VCPU traffic, and hence better cache etc. behaviour. > > -- Potentially better balancing. If you find that you''ve accidentally > > assigned two high-traffic interfaces to the same group, irqbalance > > or whatnot should rebalance the interrupts to different vcpus, but > > that doesn''t automatically do us much good because most of the work > > is done in the tasklet (which will still only run on one vcpu and > > hence become a bottleneck). If we rebalanced the netif groups when > > irqbalance rebalanced the interrupts then we''d bypass the issue. > > > > Of course, someone would need to go and implement the > > rebalance-in-response-to-irqbalance, which would be non-trivial. > Your idea is workable if the netfront is bound with a single queue > NIC via a bridge. Hence we know which interrupt is used to serve the > netfront, and then we can group netfronts according to the interrupt > affinity. And as you said, the effort is non-trivial. > > However if the NIC is multi-queued, which has only one interface but > multiple interrupt queues. All netfronts are bounded with this interface > via one bridge. We have no idea which interrupt queue is serving for > a specified netfront. So the rebalance according to interrupt affinity > is a challenge. Do you have idea on this point?Sorry, I should have been clearer here. When I said ``interrupt'''' I meant the event channel interrupt which the netfront instance will use to notify netback, not the physical hardware interrupt of whatever physical NIC is ultimately associated with it. We should always know which event channel a given netfront is using, and hence which interrupt, and so we should be able to find out its affinity. In effect, we''d rebalance in response to messages from the guest to netback, which is at least vaguely reasonable as a proxy for actual load. There are at least three relevant contexts here: -- Interrupts generated by the hardware -- The netback tasklets -- Interrupts generated by the guest As you say, doing anything based on where hardware interrupts are being delivered is somewhere between hard an impossible, but it might be possible to do something useful with the interrupts from the guest. Steven. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
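In code, the grouping-by-affinity idea would amount to something like the sketch below: place the netif in the group belonging to the CPU that its event-channel irq is currently affine to, falling back to a simple hash otherwise. It is purely illustrative, assumes one group per dom0 VCPU, and assumes the affinity mask of the irq descriptor is accessible at this point (true for kernels of this era); rebalancing would re-run it whenever the affinity changes.

#include <linux/cpumask.h>
#include <linux/irq.h>

/* Choose a group for a netif based on where its event-channel irq is
 * delivered. */
static int netif_pick_group_by_affinity(struct xen_netif *netif)
{
    struct irq_desc *desc = irq_to_desc(netif->irq);
    int cpu = desc ? cpumask_first(desc->affinity) : nr_cpu_ids;

    if (cpu >= netbk_group_count)       /* empty or oversized mask */
        cpu = netif->domid % netbk_group_count;
    return cpu;
}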
Xu, Dongxiao
2010-Apr-28 12:23 UTC
RE: [Xen-devel][Pv-ops][PATCH 0/3] Resend: Netback multiple thread support
Steven Smith wrote:>>>> I''d like to make an update on these patches. The main logic is not >>>> changed, and I only did a rebase towards the upstream pv-ops >>>> kernel. See attached patch. The original patches are checked in >>>> Jeremy''s netback-tasklet branch. >>> I have a couple of (quite minor) comments on the patches: >>> >>> 0001-Netback-Generilize-static-global-variables-into-stru.txt: >>>> diff --git a/drivers/xen/netback/netback.c >>>> b/drivers/xen/netback/netback.c index c24debf..a484b0a 100644 --- >>>> a/drivers/xen/netback/netback.c +++ >>>> b/drivers/xen/netback/netback.c @@ -49,18 +49,13 @@ >>>> >>>> /*define NETBE_DEBUG_INTERRUPT*/ >>>> >>>> +struct xen_netbk *xen_netbk; >>> >>>> +int group_nr = 1; >>>> +struct page_foreign_tracker *foreign_page_tracker; >>> I think these two would benefit from more descriptive names, given >>> that they''re not static. >> Oops...I thought I had modified this when Jeremy commented this >> last time, maybe there was some mistake and I left it until today... >> Easily done. > >>> If I was feeling pedantic, I''d complain that this includes some bits >>> of support for multiple struct xen_netbks, rather than just moving >>> all of the fields around, which reduces its obviously-correct-ness >>> quite a bit. >> Actually I was struggling how to split the first patch into small >> ones, >> however I don''t have much idea since the patch changes the function >> Interface/data structure, so the corresponding caller needs change >> too, >> which generates a 1k line of patch... > The approach I would take would be something like this: > > 1) Gather all the global data together into a struct xen_netbk, and > then have a single, global, instance of that structure. Go through > and turn every reference to a bit of global data into a reference > to a field of that structure. This will be a massive patch, but > it''s purely mechanical and it''s very easy to check that it''s safe. > > 2) Introduce struct ext_page and use it everywhere you use it in the > current patch. This should be fairly small. > > 3) Generalise to multiple struct xen_netbks by changing the single > global instance into a struct xen_netbk * and fixing the resulting > compile errors. Another big patch, but provided you remember to > initialise everything properly the compiler will check almost all > of it for you. > > This is to some extent a bikeshed argument, so if you prefer the > current patch structure then that would work as well.Thanks for your suggestion, I will have a try on this.> >>> Even more pedantically, it might be better to pass around a struct >>> xen_netbk in a few places, rather than an int group, so that you get >>> better compiler type checking. >> I will change this in next version of patch. Thanks. > >>> 0002-Netback-Multiple-tasklets-support.txt: >>> Design question: you do your balancing by making each tasklet serve >>> roughly equal numbers of remote domains. Would it not have been >>> easier to make them serve equal numbers of netfronts? You could >>> then get rid of the domain_entry business completely and just have >>> a count of serviced netfronts in each xen_netbk, which might be a >>> bit easier to deal with. >> According to my understanding, one guest with VNIF driver represented >> by one netfront. Is this true? Therefore I don''t understand the >> difference between "number of domains" and "number of netfronts", I >> used to thought >> they were the same. Please correct me my understanding is wrong. 
> I think we might be using slightly different terminology here. When I > say ``netfront'''', I mean the frontend half of a virtual network > interface, rather than the netfront driver, so a single domain can be > configured with multiple netfronts in the same way that a single > physical host can have multiple ixgbes (say), despite only having one > ixgbe driver loaded. > > So, my original point was that it might be better to balance > interfaces such that the number of interfaces in each group is the > same, ignoring the frontend domain ID completely. This would mean > that if, for instance, a domain had two very busy NICs then they > wouldn''t be forced to share a tasklet, which might otherwise be a > bottleneck. > > The downside, of course, is that it would allow domains with multiple > vNICs to use more dom0 CPU time, potentially aggravating starvation > and unfairness problems. On the other hand, a domain with N vNICs > wouldn''t be able to do any more damage than N domains with 1 vNIC > each, so I don''t think it''s too bad.It''s my mis-understanding previously, thanks for explaination on this point. Regards, Dongxiao> >> Actually in the very early stage of this patch, I use a simple >> method to >> identify which group does a netfront belong to, by calculating >> (domid % online_cpu_nr()). The advantage is simple, however it may >> cause netfront count unbalanced between different groups. > Well, any static scheme will potentially come unbalanced some of the > time, if different interfaces experience different levels of traffic. > But see the other thread for another discussion of balancing issues. > >> I will try to remove "domain_entry" related code in next version >> patch. Thanks. > >>>> @@ -1570,6 +1570,7 @@ static int __init netback_init(void) >>>> /* We can increase reservation by this much in net_rx_action(). >>>> */ // balloon_update_driver_allowance(NET_RX_RING_SIZE); >>>> >>>> + group_nr = num_online_cpus(); >>>> xen_netbk = kzalloc(group_nr * sizeof(struct xen_netbk), >>>> GFP_KERNEL); if (!xen_netbk) { printk(KERN_ALERT "%s: out of >>>> memory\n", __func__); >>> What happens if the number of online CPUs changes while netback is >>> running? In particular, do we stop trying to send a tasklet/thread >>> to a CPU which has been offlined? >> The group_nr just defines the max number of tasklets, however it >> doesn''t decide which tasklet is handled by which CPU. It is decided >> by the delivery of interrupt. > Ah, yes, so it is. Thanks for explaining it. > >>> 0003-Use-Kernel-thread-to-replace-the-tasklet.txt: >>>> diff --git a/drivers/xen/netback/netback.c >>>> b/drivers/xen/netback/netback.c index 773cd4f..8b55efc 100644 --- >>>> a/drivers/xen/netback/netback.c +++ b/drivers/xen/netback/netback.c >>>> +static int netbk_action_thread(void *index) >>>> +{ >>>> + int group = (long)index; >>>> + struct xen_netbk *netbk = &xen_netbk[group]; >>>> + while (1) { >>>> + wait_event_interruptible(netbk->netbk_action_wq, >>>> + rx_work_todo(group) + || tx_work_todo(group)); >>>> + cond_resched(); + >>>> + if (rx_work_todo(group)) >>>> + net_rx_action(group); >>>> + >>>> + if (tx_work_todo(group)) >>>> + net_tx_action(group); >>>> + } >>> Hmm... You use kthread_stop() on this thread, so you should probably >>> test kthread_should_stop() every so often. >> OK, I will modify it. > Thanks. > > Steven._______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jan Beulich
2010-Apr-28 12:43 UTC
Re: [Xen-devel][Pv-ops][PATCH 0/3] Resend: Netback multiple thread support
>>> Steven Smith <steven.smith@citrix.com> 28.04.10 13:51 >>> >2) Introduce struct ext_page and use it everywhere you use it in the > current patch. This should be fairly small.In working through the patches to make them usable on our forward ported trees, I wondered what this is good for at all, for two reasons: On 64-bits embedding the data directly into page->mapping would be possible without any other consideration. Even on 32-bits embedding is possible based on the observation that the two fields together don''t need more than 32 bits (idx always being less than MAX_PENDING_REQS [which itself could even grow significantly] and group being bounded by NR_CPUS).>I think we might be using slightly different terminology here. When I >say ``netfront'''', I mean the frontend half of a virtual network >interface, rather than the netfront driver, so a single domain can be >configured with multiple netfronts in the same way that a single >physical host can have multiple ixgbes (say), despite only having one >ixgbe driver loaded. > >So, my original point was that it might be better to balance >interfaces such that the number of interfaces in each group is the >same, ignoring the frontend domain ID completely. This would mean >that if, for instance, a domain had two very busy NICs then they >wouldn''t be forced to share a tasklet, which might otherwise be a >bottleneck.As you had pointed out in an earlier reply, the use of the domain ID here is flawed anyway - it had to be replaced for the whole set to be usable for us. We count netif-s and balance based on that count, at once eliminating the need to do any allocation when adding a new netif. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
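A sketch of the embedding Jan describes: pack the (group, idx) pair into a single word and store it directly in page->mapping, so no separate tracker structure is needed. The bit split and helper names are assumptions, and a real implementation also has to stay clear of the low mapping bits the core VM uses for its own flags.

#include <linux/mm_types.h>

#define NETBK_IDX_BITS  16      /* comfortably above MAX_PENDING_REQS */

static inline void netbk_set_page_ext(struct page *pg,
                                      unsigned int group, unsigned int idx)
{
    pg->mapping = (struct address_space *)(unsigned long)
        (((unsigned long)group << NETBK_IDX_BITS) | idx);
}

static inline unsigned int netbk_page_group(struct page *pg)
{
    return (unsigned long)pg->mapping >> NETBK_IDX_BITS;
}

static inline unsigned int netbk_page_idx(struct page *pg)
{
    return (unsigned long)pg->mapping & ((1UL << NETBK_IDX_BITS) - 1);
}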
Xu, Dongxiao
2010-Apr-28 13:33 UTC
RE: [Xen-devel][Pv-ops][PATCH 0/3] Resend: Netback multiple thread support
Steven Smith wrote:>>>>> Apart from that, it all looks fine to me. >>>> Thanks for looking at this. It had been missing the gaze of some >>>> networking-savvy eyes. >>> There is one other potential issue which just occurred to me. These >>> patches assign netifs to groups pretty much arbitrarily, beyond >>> trying to keep the groups balanced. It might be better to try to >>> group interfaces so that the tasklet runs on the same VCPU as the >>> interrupt i.e. grouping interfaces according to interrupt affinity. >>> That would have two main benefits: >>> >>> -- Less cross-VCPU traffic, and hence better cache etc. behaviour. >>> -- Potentially better balancing. If you find that you''ve >>> accidentally assigned two high-traffic interfaces to the same >>> group, irqbalance or whatnot should rebalance the interrupts to >>> different vcpus, but that doesn''t automatically do us much good >>> because most of the work is done in the tasklet (which will >>> still only run on one vcpu and hence become a bottleneck). If >>> we rebalanced the netif groups when irqbalance rebalanced the >>> interrupts then we''d bypass the issue. >>> >>> Of course, someone would need to go and implement the >>> rebalance-in-response-to-irqbalance, which would be non-trivial. >> Your idea is workable if the netfront is bound with a single queue >> NIC via a bridge. Hence we know which interrupt is used to serve the >> netfront, and then we can group netfronts according to the interrupt >> affinity. And as you said, the effort is non-trivial. >> >> However if the NIC is multi-queued, which has only one interface but >> multiple interrupt queues. All netfronts are bounded with this >> interface via one bridge. We have no idea which interrupt queue is >> serving for >> a specified netfront. So the rebalance according to interrupt >> affinity is a challenge. Do you have idea on this point? > Sorry, I should have been clearer here. When I said ``interrupt'''' I > meant the event channel interrupt which the netfront instance will use > to notify netback, not the physical hardware interrupt of whatever > physical NIC is ultimately associated with it. We should always know > which event channel a given netfront is using, and hence which > interrupt, and so we should be able to find out its affinity. In > effect, we''d rebalance in response to messages from the guest to > netback, which is at least vaguely reasonable as a proxy for actual > load.

OK, I understand: what you were thinking about is netfront TX, while I was talking about netfront RX. In my solution, each tasklet PAIR is assigned to a group, so I think the optimization should work for both directions. We have a common view that RX rebalancing is hard to implement, so the optimization point is TX rebalancing. Do you think TX rebalancing would have side effects on the RX direction?

However, I would not include this logic in the next version of the patch, since the change is not small and needs more effort.

Thanks, Dongxiao

> > There are at least three relevant contexts here: > > -- Interrupts generated by the hardware > -- The netback tasklets > -- Interrupts generated by the guest > > As you say, doing anything based on where hardware interrupts are > being delivered is somewhere between hard an impossible, but it might > be possible to do something useful with the interrupts from the guest. > > Steven.

_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Steven Smith
2010-Apr-30 07:29 UTC
Re: [Xen-devel][Pv-ops][PATCH 0/3] Resend: Netback multiple thread support
> >>> Steven Smith <steven.smith@citrix.com> 28.04.10 13:51 >>> > >2) Introduce struct ext_page and use it everywhere you use it in the > > current patch. This should be fairly small. > In working through the patches to make them usable on our forward > ported trees, I wondered what this is good for at all, for two reasons: > > On 64-bits embedding the data directly into page->mapping would > be possible without any other consideration. > > Even on 32-bits embedding is possible based on the observation > that the two fields together don''t need more than 32 bits (idx > always being less than MAX_PENDING_REQS [which itself could > even grow significantly] and group being bounded by NR_CPUS).Good point; I hadn''t noticed that. That would be a nicer way of doing things. It sounds like you''ve had a pretty good look at these patches. Did you see anything else worth pointing out? Steven. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Steven Smith
2010-Apr-30 07:35 UTC
Re: [Xen-devel][Pv-ops][PATCH 0/3] Resend: Netback multiple thread support
> >> However if the NIC is multi-queued, which has only one interface but > >> multiple interrupt queues. All netfronts are bounded with this > >> interface via one bridge. We have no idea which interrupt queue is > >> serving for > >> a specified netfront. So the rebalance according to interrupt > >> affinity is a challenge. Do you have idea on this point? > > Sorry, I should have been clearer here. When I said ``interrupt'''' I > > meant the event channel interrupt which the netfront instance will use > > to notify netback, not the physical hardware interrupt of whatever > > physical NIC is ultimately associated with it. We should always know > > which event channel a given netfront is using, and hence which > > interrupt, and so we should be able to find out its affinity. In > > effect, we''d rebalance in response to messages from the guest to > > netback, which is at least vaguely reasonable as a proxy for actual > > load. > > OK, I understand, what you were thinking about is on netfront TX, > while I was talking about the netfront RX. > > In my solution, each tasklet PAIR will be assigned to a group. So I think > the optimization should work for both directions.Yep.> As we have a common view that rebalance on RX rebalance is hard to > implement, and the optimization point is on TX rebalance. Do you think > if TX rebalance would have side effect on RX direction?Most network traffic is at least a little bit symmetrical, so rebalancing for TX should help at least a little bit for RX-heavy workloads as well. There are some potential oddities if you have some interfaces which are TX-heavy and some which are RX-heavy, though. I think this needs more thought, and probably some experimentation, before we can decide on the best approach. Of course, there''s no reason not to use a very simple balancer initially (e.g. equal numbers of netifs in each group) and only do the more complicated bits if it starts causing problems.> However in my next version of patch, I would not include this logic > since the change is not small and needs more effort.Perfectly reasonable. Steven. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jan Beulich
2010-Apr-30 08:27 UTC
Re: [Xen-devel][Pv-ops][PATCH 0/3] Resend: Netback multiple thread support
>>> Steven Smith <steven.smith@citrix.com> 30.04.10 09:29 >>> >It sounds like you''ve had a pretty good look at these patches. Did >you see anything else worth pointing out?No, the major ones you had already pointed out. Despite your earlier comments, even in the latest version the load balancing code still doesn''t seem right. I found it necessary to do this right in __netif_{up,down}(), so that there is no potential of mis-balanced increments and decrements (we also still have the [pointless] list inserts/removes in our version of the patches, which was what actually pointed out the issue by way of seeing crashes or endless loops). This in turn made it necessary to add logic to ignore the first (couple of?) invocation(s) of netif_be_int() (i.e. before the netif as assigned to a group). Also in spite of your earlier comments, the use of kthread_should_stop() in the latest version of the patches still seems insufficient - afaict it should also be used in the expression passed to wait_event_interruptible(). Minor ones are - should not use kzalloc() for allocating the (huge) array of struct xen_netbk in netback_init() - the changes to netif_be_dbg() are bogus - tasklets get replaced unconditionally with kthreads - I opted to make this dependent on a command line option, as the general description hinted both having their up- and downsides - placement of fields in struct xen_netbk (I''d prefer all small fields to precede the large arrays - on x86 this results in smaller code) Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
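The tasklet-versus-kthread switch Jan mentions could be wired up as an ordinary parameter, roughly as below; the wait condition shown earlier already includes kthread_should_stop(), which covers the other point. This is a sketch under assumed names (netbk_use_kthread, netbk->task and the *_tasklet wrappers are made up for illustration), not what any of the existing trees actually do.

#include <linux/err.h>
#include <linux/interrupt.h>
#include <linux/kthread.h>
#include <linux/moduleparam.h>

/* 0 = classic tasklet pair per group, 1 = one kernel thread per group.
 * Defaulting to tasklets preserves the old behaviour. */
static int netbk_use_kthread;
module_param(netbk_use_kthread, int, 0444);
MODULE_PARM_DESC(netbk_use_kthread,
                 "Use kernel threads instead of tasklets for netback work");

static int netbk_start_group(struct xen_netbk *netbk, int group)
{
    if (netbk_use_kthread) {
        netbk->task = kthread_run(netbk_action_thread, netbk,
                                  "netback/%d", group);
        if (IS_ERR(netbk->task)) {
            int err = PTR_ERR(netbk->task);
            netbk->task = NULL;
            return err;
        }
    } else {
        tasklet_init(&netbk->tx_tasklet, net_tx_action_tasklet,
                     (unsigned long)netbk);
        tasklet_init(&netbk->rx_tasklet, net_rx_action_tasklet,
                     (unsigned long)netbk);
    }
    return 0;
}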