Xu, Dongxiao
2010-May-05 07:25 UTC
[Xen-devel][Pv-ops][PATCH 0/4 v4] Netback multiple threads support
This is netback multithread support patchset version 4.

Main changes from v3:
1. Patchset is against the xen/next tree.
2. Merge group and idx into netif->mapping.
3. Use vmalloc to allocate netbk structures.

Main changes from v2:
1. Merge "group" and "idx" into "netif->mapping", so page_ext is no longer used.
2. Put netbk_add_netif() and netbk_remove_netif() into __netif_up() and __netif_down().
3. Change the usage of kthread_should_stop().
4. Use __get_free_pages() to replace kzalloc().
5. Modify the changes to netif_be_dbg().
6. Use MODPARM_netback_kthread to determine whether to use a tasklet or a kernel thread.
7. Put small fields at the front, and large arrays at the end, of struct xen_netbk.
8. Add more checks in netif_page_release().

Current netback uses one pair of tasklets for Tx/Rx data transactions. The netback tasklet can only run on one CPU at a time, and it serves all the netfronts, so it has become a performance bottleneck. This patchset replaces the current single tasklet pair in dom0 with multiple tasklet pairs.

Assuming Dom0 has CPUNR VCPUs, we define CPUNR tasklet pairs (CPUNR for Tx, and CPUNR for Rx). Each pair of tasklets serves a specific group of netfronts. We also duplicate the global and static variables for each group, in order to avoid the spinlock.

PATCH 01: Generalize static/global variables into struct xen_netbk.
PATCH 02: Introduce a new struct type page_ext.
PATCH 03: Multiple tasklets support.
PATCH 04: Use a kernel thread to replace the tasklet.

Recently I re-tested the patchset with an Intel 10G multi-queue NIC, using 10 external 1G NICs to run netperf tests against that 10G NIC.

Case 1: Dom0 has more than 10 VCPUs, each pinned to a physical CPU. With the patchset, throughput is 2x the original.

Case 2: Dom0 has 4 VCPUs pinned to 4 physical CPUs. With the patchset, throughput is 3.7x the original.
When we tested this patchset, we found that the domain_lock taken in the grant table operation (gnttab_copy()) becomes a bottleneck. We temporarily removed the global domain_lock to achieve good performance.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Xu, Dongxiao
2010-May-11 08:57 UTC
RE: [Xen-devel][Pv-ops][PATCH 0/4 v4] Netback multiple threads support
Hi,

Do you have any comments on this version of the patchset?

Thanks,
Dongxiao

Xu, Dongxiao wrote:
> This is netback multithread support patchset version 4.
> [patchset description quoted in full; see the original posting above]
Xu, Dongxiao
2010-May-13 01:14 UTC
RE: [Xen-devel][Pv-ops][PATCH 0/4 v4] Netback multiple threads support
Hi Steven and Jan,

I modified the code according to your comments, and the latest version is version 4. Do you have any further comments on this version?

Thanks,
Dongxiao

Xu, Dongxiao wrote:
> [patchset description quoted in full; see the original posting above]
Jan Beulich
2010-May-17 07:08 UTC
RE: [Xen-devel][Pv-ops][PATCH 0/4 v4] Netback multiple threads support
>>> "Xu, Dongxiao" <dongxiao.xu@intel.com> 13.05.10 03:14 >>>
> I modified the code according to your comments, and the latest version is version 4.
> Do you have further comments or consideration on this version?

Looked good to me, based on my comparison with the patch version we use in our forward-ported trees.

Jan
Steven Smith
2010-May-21 17:34 UTC
Re: [Xen-devel][Pv-ops][PATCH 0/4 v4] Netback multiple threads support
> Hi Steven and Jan,
>
> I modified the code according to your comments, and the latest
> version is version 4. Do you have further comments or consideration
> on this version?

No, that all looks fine to me.

Sorry about the delay in replying; I thought I'd already responded, but I seem to have dropped it on the floor somewhere.

Steven.

> Xu, Dongxiao wrote:
> > [patchset description quoted in full; see the original posting above]
Xu, Dongxiao
2010-May-25 02:05 UTC
RE: [Xen-devel][Pv-ops][PATCH 0/4 v4] Netback multiple threads support
Thank you for your acknowledgement.

Regards,
Dongxiao

Steven Smith wrote:
> [patchset description quoted in full; see the original posting above]