Andy,

I put some profiling around calls to GrantAccess and EndAccess, and have the following results:

XenNet TxBufferGC        Count = 108351, Avg Time = 227989
XenNet TxBufferFree      Count = 0,      Avg Time = 0
XenNet RxBufferAlloc     Count = 108353, Avg Time = 17349
XenNet RxBufferFree      Count = 0,      Avg Time = 0
XenNet ReturnPacket      Count = 65231,  Avg Time = 1106
XenNet RxBufferCheck     Count = 108353, Avg Time = 124069
XenNet Linearize         Count = 129024, Avg Time = 29333
XenNet SendPackets       Count = 129024, Avg Time = 67107
XenNet SendQueuedPackets Count = 237369, Avg Time = 73055
XenNet GrantAccess       Count = 194325, Avg Time = 25878
XenNet EndAccess         Count = 194261, Avg Time = 27181

The time for GrantAccess and EndAccess is, I think, quite significant in the scheme of things, especially as TxBufferGC and RxBufferCheck (the two large times) will both make multiple calls to GrantAccess and EndAccess.

What I'd like to do is implement a compromise between my previous buffer management approach (used lots of memory, but no allocate/grant per packet) and your approach (uses minimum memory, but allocate/grant per packet). We would maintain a pool of packets and buffers, and grow and shrink the pool dynamically, as follows:
. Create a freelist of packets and buffers
. When we need a new packet or buffer, and there are none on the freelist, allocate them and grant the buffer.
. When we are done with them, put them on the freelist
. Keep a count of the minimum size of the freelists. If the free list has been greater than some value (32?) for some time (5 seconds?) then free half of the items on the list.
. Maybe keep a freelist per processor too, to avoid the need for spinlocks where we are running at DISPATCH_LEVEL

I think that gives us a pretty good compromise between memory usage and calls to allocate/grant/ungrant/free.

I was going to look at getting rid of the Linearize, but if we don't Linearize then we have to GrantAccess the kernel-supplied buffer, and I think a (max) 1500 byte memcpy is going to be cheaper than a call to GrantAccess...

Thoughts?

James
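A minimal sketch of the freelist being proposed, in the style of Windows kernel code. The GrantAccess/EndAccess wrappers, backend_domid, virt_to_mfn and the POOLED_PAGE layout are assumptions for illustration, not the driver's actual names; each page is granted once when it enters the pool and keeps its grant ref while it sits on the freelist (error handling omitted).

typedef struct _POOLED_PAGE {
  struct _POOLED_PAGE *next;
  PVOID va;            /* virtual address of the page */
  grant_ref_t gref;    /* grant ref issued once, when the page is created */
} POOLED_PAGE;

static POOLED_PAGE *freelist;    /* singly linked freelist of granted pages */
static LONG freelist_count;
static LONG freelist_min;        /* minimum depth seen in the current interval */
static KSPIN_LOCK freelist_lock;

POOLED_PAGE *get_pooled_page(void)
{
  POOLED_PAGE *pp;
  KIRQL old_irql;

  KeAcquireSpinLock(&freelist_lock, &old_irql);
  pp = freelist;
  if (pp) {
    freelist = pp->next;
    if (--freelist_count < freelist_min)
      freelist_min = freelist_count;
  }
  KeReleaseSpinLock(&freelist_lock, old_irql);

  if (!pp) {
    /* freelist empty: allocate and grant a fresh page (slow path) */
    pp = ExAllocatePoolWithTag(NonPagedPool, sizeof(POOLED_PAGE), 'PnuX');
    pp->va = ExAllocatePoolWithTag(NonPagedPool, PAGE_SIZE, 'PnuX');
    pp->gref = GrantAccess(backend_domid, virt_to_mfn(pp->va), FALSE);
  }
  return pp;
}

void put_pooled_page(POOLED_PAGE *pp)
{
  KIRQL old_irql;

  /* the page stays granted; it just goes back on the list */
  KeAcquireSpinLock(&freelist_lock, &old_irql);
  pp->next = freelist;
  freelist = pp;
  freelist_count++;
  KeReleaseSpinLock(&freelist_lock, old_irql);
}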
> What I'd like to do is implement a compromise between my previous
> buffer management approach (used lots of memory, but no allocate/grant
> per packet) and your approach (uses minimum memory, but allocate/grant
> per packet). We would maintain a pool of packets and buffers, and grow
> and shrink the pool dynamically, as follows:
> . Create a freelist of packets and buffers
> . When we need a new packet or buffer, and there are none on the
> freelist, allocate them and grant the buffer.
> . When we are done with them, put them on the freelist
> . Keep a count of the minimum size of the freelists. If the free list
> has been greater than some value (32?) for some time (5 seconds?) then
> free half of the items on the list.
> . Maybe keep a freelist per processor too, to avoid the need for
> spinlocks where we are running at DISPATCH_LEVEL
>
> I think that gives us a pretty good compromise between memory usage and
> calls to allocate/grant/ungrant/free.

I have implemented something like the above, a 'page pool' which is a list of pre-granted pages. This drops the time spent in TxBufferGC and SendQueuedPackets by 30-50%. A good start I think, although there doesn't appear to be much improvement in the iperf results, maybe only 20%.

It's time for sleep now, but when I get a chance I'll add the same logic to the receive path, and clean it up so xennet can unload properly (currently it leaks and/or crashes on unload).

James
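For context, a sketch of how a pre-granted pool page might be consumed on the transmit side, which is where the TxBufferGC/SendQueuedPackets savings would come from: queuing a packet costs a memcpy plus a ring slot, and the grant ref that travels with the page is reused as-is. The xennet_info context and the XenNet_* helper names are hypothetical; the ring types and macros are the standard netif/ring ones from the Xen public headers.

static VOID XenNet_QueueTxPacket(struct xennet_info *xi, PNDIS_PACKET packet)
{
  POOLED_PAGE *pp = get_pooled_page();   /* already granted to the backend */
  netif_tx_request_t *tx;
  ULONG length;

  /* linearize: copy the NDIS buffer chain into the single pool page */
  length = XenNet_CopyPacketToPage(packet, pp->va);

  tx = RING_GET_REQUEST(&xi->tx, xi->tx.req_prod_pvt);
  tx->gref = pp->gref;       /* reuse the grant issued when the page was pooled */
  tx->offset = 0;
  tx->size = (uint16_t)length;
  tx->flags = 0;
  tx->id = XenNet_SaveTxContext(xi, pp, packet); /* so TxBufferGC can find it later */
  xi->tx.req_prod_pvt++;
}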
James,

Could you please provide me some context and details of this work. This seems related to the work we are doing in netchannel2 to reuse grants, but I don't think I understand what it is that you are trying to do and how it is related.

Thanks
Renato
> James,
>
> Could you please provide me some context and details of this work.
> This seems related to the work we are doing in netchannel2 to reuse
> grants, but I don't think I understand what it is that you are trying
> to do and how it is related.

The solution I ended up implementing was to keep a list of pre-allocated, pre-granted pages. Any time we need a new page (either for putting on the rx list, or for copying a tx packet to) it comes from the list. If there are no pages on the list, a new page is allocated and granted. When we are finished with the page, it goes back on the free list.

I'll also be writing some sort of garbage collector which runs periodically (maybe every x seconds, or every x calls to 'put_page_on_freelist'). If during that interval the number of free pages has been constantly above some threshold (32?), then we will ungrant and free half the pages on the list. This will keep memory usage reasonable while keeping performance good.

In the tx path, the windows xennet driver currently takes the sg list of buffers per packet and copies them to a single page buffer. At first I thought there would be some performance to be had in just passing the backend the list of pages, but it looks like the memory copy operation is much less expensive than the grant operation.

James
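A sketch of the garbage collector described above, continuing with the same hypothetical freelist structures: it runs periodically, and if the freelist never dipped below the threshold during the interval it ungrants and frees half of the pooled pages. The 32-page threshold is the guess from the mail, and EndAccess stands in for whatever the driver actually calls to revoke a grant.

#define FREELIST_THRESHOLD 32

static VOID trim_freelist(void)
{
  POOLED_PAGE *victims = NULL;
  KIRQL old_irql;

  KeAcquireSpinLock(&freelist_lock, &old_irql);
  if (freelist_min > FREELIST_THRESHOLD) {
    LONG to_free = freelist_count / 2;
    while (to_free-- && freelist) {
      POOLED_PAGE *pp = freelist;     /* detach half the list under the lock */
      freelist = pp->next;
      freelist_count--;
      pp->next = victims;
      victims = pp;
    }
  }
  freelist_min = freelist_count;      /* start a new measurement interval */
  KeReleaseSpinLock(&freelist_lock, old_irql);

  while (victims) {                   /* ungrant and free outside the lock */
    POOLED_PAGE *pp = victims;
    victims = pp->next;
    EndAccess(pp->gref);
    ExFreePoolWithTag(pp->va, 'PnuX');
    ExFreePoolWithTag(pp, 'PnuX');
  }
}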
James,

Thanks for the reply. It seems that your changes do not include netback, i.e. all the changes are limited to netfront. Correct? In that case, your changes avoid the cost of issuing and revoking the grant (i.e. adding and removing the grant from the grant table). I assume netback is still doing hypercalls for grant operations on every I/O operation (i.e. grant map for TX and grant copy operation for RX). In netchannel2 we plan to avoid the grant operations in netback as well.

In my experiments I also see overheads on issuing and revoking grants due to the use of atomic operations, but these are much less expensive than copying an entire packet as you do on the TX path. I am surprised by your results. Can you give more details about your configuration and how you are comparing the cost of copy versus issuing grants on TX?

Thanks
Renato
> In my experiments I also see overheads on issuing and revoking grants
> due to the use of atomic operations, but these are much less expensive
> than copying an entire packet as you do on the TX path. I am surprised
> by your results. Can you give more details about your configuration and
> how you are comparing the cost of copy versus issuing grants on TX?

I think you are right in saying that the cost of issuing and revoking grants is due to the use of atomic operations. Having looked into it some more, it looks like KeAcquireSpinLock (the windows lock operation) is fairly expensive.

Under windows, it is the code that gets the next free ref that is protected by spinlocks. I believe that if we only get the ref once, but then reuse that ref over and over, then we'd get a lot better performance.

James
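One way to read "get the ref once, reuse it over and over" in concrete terms, assuming the classic grant-table entry layout and flag names from the Xen public headers: the spinlocked free-ref stack is only touched when a ref is first claimed, and after that only the shared grant entry itself is rewritten per use. This is a sketch, not the actual driver code; a production version would need an atomic test-and-clear of the flags when revoking.

static void regrant_page(grant_entry_t *shared_table, grant_ref_t ref,
                         domid_t otherend, uint32_t mfn, BOOLEAN readonly)
{
  grant_entry_t *entry = &shared_table[ref];

  entry->frame = mfn;
  entry->domid = otherend;
  KeMemoryBarrier();       /* frame/domid must be visible before flags */
  entry->flags = GTF_permit_access | (readonly ? GTF_readonly : 0);
}

static BOOLEAN end_access_keep_ref(grant_entry_t *shared_table, grant_ref_t ref)
{
  grant_entry_t *entry = &shared_table[ref];

  if (entry->flags & (GTF_reading | GTF_writing))
    return FALSE;          /* backend still has the page mapped */
  entry->flags = 0;        /* the ref stays allocated, ready for the next reuse */
  return TRUE;
}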
Hi Renato,

Santos, Jose Renato G wrote:
> James,
>
> Could you please provide me some context and details of this work.
> This seems related to the work we are doing in netchannel2 to reuse grants,

Is there any update on this work since the last Xen Summit? I'm particularly interested in anything that has happened with multi-queue NICs. Two reasons:

1/ Our NIC is multiqueue and I'd like to ensure we're compatible with any Netchannel2 work done to exploit this.

2/ Our 'accelerated plugin' implementation could potentially make use of some of the skb allocator mods to avoid a copy on rx.

Cheers,

Greg
--
Greg Law    glaw@solarflare.com    +44 1223 518 040
Santos, Jose Renato G | 2008-Mar-03 18:35 UTC | RE: [Xen-devel] Netchannel2 (was more profiling)
Greg,

We are currently working on support for multi-queue NICs. This will be the first feature available. Grant reuse and copy on the guest will come later. We don't have code available yet, but we have a spec of the interface between netback and the device driver for multi-queue support. It would be good if your drivers supported that interface. I will send you the spec later in a private email.

Regards
Renato
> I think you are right in saying that the cost of issuing and revoking
> grants is due to the use of atomic operations. Having looked into it
> some more, it looks like KeAcquireSpinLock (the windows lock operation)
> is fairly expensive.
>
> Under windows, it is the code that gets the next free ref that is
> protected by spinlocks. I believe that if we only get the ref once, but
> then reuse that ref over and over, then we'd get a lot better
> performance.

Yes. Avoiding the spinlock should improve performance. Definitely, it should be a win on the RX path. But is it worth it in the TX path, if you now have to copy the packet? Do you have experimental data showing that copying is better than the spinlock? I don't have much experience with Windows, but I think this would be very surprising...

Regards
Renato
> > Under windows, it is the code that gets the next free ref that is
> > protected by spinlocks. I believe that if we only get the ref once,
> > but then reuse that ref over and over, then we'd get a lot better
> > performance.
>
> Yes. Avoiding the spinlock should improve performance. Definitely, it
> should be a win on the RX path. But is it worth it in the TX path, if
> you now have to copy the packet? Do you have experimental data showing
> that copying is better than the spinlock? I don't have much experience
> with Windows, but I think this would be very surprising...

Looking at the profiling data that I have collected, the copy operation (max 1500 bytes, probably around 200-300 bytes on average) does appear to consume less CPU time than the acquire spinlock operation.

The other thing to consider is that Windows seems to give us 2-3 separate pages of data per packet, one containing the header, another containing the next header, and one containing the layer 3 data. This would be three get-grant-entry operations. However, if I can avoid the spinlock-per-grant-entry, and also avoid copying, then things will be even better!

Btw, with 'request-rx-copy = 1', does that mean that the backend still makes copies of the data to give to us?

James
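For reference, a sketch of the linearize step being weighed against those three grant operations: the 2-3 NDIS buffers that make up a packet are copied into a single destination page (for example a pre-granted pool page), so only one grant entry is needed per packet. The function name matches the hypothetical helper used in the earlier transmit sketch; the NDIS calls themselves are standard NDIS 5 APIs.

static ULONG XenNet_CopyPacketToPage(PNDIS_PACKET packet, PUCHAR dest)
{
  PNDIS_BUFFER buffer;
  PVOID va;
  UINT buffer_length, total_length, offset = 0;

  NdisQueryPacket(packet, NULL, NULL, &buffer, &total_length);
  while (buffer) {
    NdisQueryBufferSafe(buffer, &va, &buffer_length, NormalPagePriority);
    if (!va || offset + buffer_length > PAGE_SIZE)
      return 0;                        /* caller must fail the send */
    NdisMoveMemory(dest + offset, va, buffer_length);
    offset += buffer_length;
    NdisGetNextBuffer(buffer, &buffer);
  }
  return offset;                       /* bytes copied into the single page */
}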