Hi all,

I am exploring the possibility of designing a custom hardware acceleration solution using an ASIC or an FPGA to accelerate some part of Xen. Basically, I am looking for some part of the code that could be built in hardware to make it faster. Does anybody know where I could get some statistics on the code, such as the most-called functions, the most parallelizable functions, etc.? If you can think of something that would be useful in HW, I would be very interested to know.

Thanks,
Jad.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Jad Naous
> Sent: 26 January 2006 08:52
> To: xen-devel@lists.xensource.com
> Subject: [Xen-devel] Custom Hardware Acceleration
>
> Hi all,
> I am exploring the possibility of designing a custom hardware
> acceleration solution using an ASIC or an FPGA to accelerate
> some part of Xen. Basically, I am looking for some part of
> the code that could be built in hardware to make it faster.
> Does anybody know where I could get some statistics on the
> code, such as the most called functions, the most
> parallelizable functions, etc... If you could think of
> something that would be useful in HW I would be very
> interested to know.
> Thanks,
> Jad.

Xen's "load" on the system comes mainly from performing, on the CPU, tasks that in the non-Xen case would be handled by the ordinary hardware/microprocessor itself, such as trapping memory-mapped IO accesses, IOIO accesses, or page-table translation (using the "shadow page-table" scheme).

Unfortunately, unless your ASIC becomes part of the CPU itself, I think it's unlikely that you'd be able to speed things up noticeably. The speed of, for example, block device accesses depends very much on the speed of transfer to/from the block device, and very little on the speed of the operations added by Xen. (Admittedly, adding a layer of software between the requesting software and the hardware servicing the request will ALWAYS add some delay, and you can only ADD delay, never take it away.)

Unless, of course, you want to build a very complex device, such as a multi-guest-capable graphics card (one that can draw to multiple display surfaces at any given time and switch between them based on some software/hardware switching mechanism). A company called Level5 has a network card that supports multiple guests...

I'd be very interested in hearing a differing view, of course.

[Note that I work for AMD, but I'm a software engineer, so I don't necessarily understand all of the hardware implications...]

--
Mats
On Thu, Jan 26, 2006 at 12:52:29AM -0800, Jad Naous wrote:
> Hi all,
> I am exploring the possibility of designing a custom hardware
> acceleration solution using an ASIC or an FPGA to accelerate some part
> of Xen. Basically, I am looking for some part of the code that could be
> built in hardware to make it faster. Does anybody know where I could get
> some statistics on the code, such as the most called functions, the most
> parallelizable functions, etc... If you could think of something that
> would be useful in HW I would be very interested to know.
> Thanks,
> Jad.

You could make a custom NIC FPGA that can handle paravirtualized network receive. The NIC can inspect the destination MAC address of the incoming packet and DMA it to a pre-allocated space in the domU (removing the need for the page flip). It will require modifying the Xen network drivers, but should be pretty cool.

Thanks,
Jon
Yes, if I do end up doing it on an FPGA, it would be open-source. However, I'm also open to suggestions that don't necessarily end up on a separate (from the CPU) standalone ASIC/FPGA, but would instead be integrated into the CPU. For example, are there any processor instructions (extensions) Xen developers would like to see in future processors?

I'm doing research into either including reprogrammable hardware in a CPU (research oriented) or on a separate chip that is close to the CPU (implementation), which could then be configured according to the application running on it. Returning to the example I mentioned, the instruction could then be implemented on the FPGA so that Xen could use it. Any ideas similar to this would also be very interesting. Of course, the instruction idea applies only to on-chip reprogrammable logic; the concept for an off-chip FPGA is similar, but limited by the communication speed with the CPU.

Btw, sorry if this is not the appropriate place to post this. I'd be happy to move it somewhere else if I'm asked to, but I thought that this would be of interest to Xen developers.

Thanks,
Jad.

BSD Lazarus wrote:
> Hello Jad,
>
> Do you plan on an open-source FPGA design?
>
> -- David
>
> Jad Naous wrote:
>> Hi all,
>> I am exploring the possibility of designing a custom hardware
>> acceleration solution using an ASIC or an FPGA to accelerate some part
>> of Xen. Basically, I am looking for some part of the code that could
>> be built in hardware to make it faster. Does anybody know where I
>> could get some statistics on the code, such as the most called
>> functions, the most parallelizable functions, etc... If you could
>> think of something that would be useful in HW I would be very
>> interested to know.
>> Thanks,
>> Jad.
> Hi all,
> I am exploring the possibility of designing a custom hardware
> acceleration solution using an ASIC or an FPGA to accelerate
> some part of Xen. Basically, I am looking for some part of
> the code that could be built in hardware to make it faster.
> Does anybody know where I could get some statistics on the
> code, such as the most called functions, the most
> parallelizable functions, etc... If you could think of
> something that would be useful in HW I would be very
> interested to know.

You might like to take a look at the following paper, which Keir and I wrote in 2000. Although we designed it for a different purpose, it would work great with Xen and enable direct IO from guests with very low additional hardware cost.

http://www.cl.cam.ac.uk/users/iap10/gige.ps

Best,
Ian
Jon Mason wrote:
> On Thu, Jan 26, 2006 at 12:52:29AM -0800, Jad Naous wrote:
>> Hi all,
>> I am exploring the possibility of designing a custom hardware
>> acceleration solution using an ASIC or an FPGA to accelerate some part
>> of Xen. Basically, I am looking for some part of the code that could be
>> built in hardware to make it faster. Does anybody know where I could get
>> some statistics on the code, such as the most called functions, the most
>> parallelizable functions, etc... If you could think of something that
>> would be useful in HW I would be very interested to know.
>> Thanks,
>> Jad.
>
> You could make a custom NIC FPGA that can handle paravirtualized network
> receive. The NIC can inspect the destination MAC address of the incoming
> packet, and DMA it to a pre-allocated space in the domU (removing the need
> for the page flip). It will require modifying the Xen network drivers,
> but should be pretty cool.
>
> Thanks,
> Jon

Just wanted to update that we are going to implement the paravirtualized NIC on an FPGA, and to see if you guys can suggest where to start. We have never done any Xen development. The implementation should be done by the end of March.

Are there any suggestions on how to make our FPGA implementation as portable as possible to other boards? The problem is that we might end up using some IP cores for our implementation. If we have time, we'd get rid of those.

Thanks,
Jad.
On Mon, Jan 30, 2006 at 12:41:47PM -0800, Jad Naous wrote:
> Jon Mason wrote:
> >On Thu, Jan 26, 2006 at 12:52:29AM -0800, Jad Naous wrote:
> >>Hi all,
> >>I am exploring the possibility of designing a custom hardware
> >>acceleration solution using an ASIC or an FPGA to accelerate some part
> >>of Xen. Basically, I am looking for some part of the code that could be
> >>built in hardware to make it faster. Does anybody know where I could get
> >>some statistics on the code, such as the most called functions, the most
> >>parallelizable functions, etc... If you could think of something that
> >>would be useful in HW I would be very interested to know.
> >>Thanks,
> >>Jad.
> >
> >You could make a custom NIC FPGA that can handle paravirtualized network
> >receive. The NIC can inspect the destination MAC address of the incoming
> >packet, and DMA it to a pre-allocated space in the domU (removing the need
> >for the page flip). It will require modifying the Xen network drivers,
> >but should be pretty cool.
> >
> >Thanks,
> >Jon
>
> Just wanted to update that we are going to implement the paravirtualized
> NIC on an FPGA, and see if you guys can suggest where to start. We have
> never done any XEN development.

By "paravirtualized NIC", I assume you mean one that handles the network receive in hardware (or are there other features you are planning?). If so, I would think you would need the following things:

  - specialized FPGA firmware
  - custom device driver for the FPGA adapter
  - modified netfront driver (remove page flipping)
  - modified netback driver

The specialized firmware should have some type of data structure that contains the MAC address and corresponding DMA address(es) for each domain. It should be able to parse the MAC header in the adapter cache and DMA to the corresponding address (without dropping any packets).

The custom device driver should be able to program the adapter with MAC addresses and DMA addresses (as well as perform generic networking device driver functionality).

The Xen network drivers (netfront and netback) will have to be modified to take advantage of this new feature.

I would think you would want to have the basic driver and firmware working as your starting point, then modify them to add the MAC/DMA handling. Then modify the Xen networking drivers to take advantage of it.

> The implementation should be done by the end of March. Are there any

An extremely aggressive schedule. Let me know if there is anything I can do to help.

> suggestions on how to make our FPGA implementation as portable as
> possible to other boards? The problem is we might end up using some IP
> cores for our implementation. If we have time, we'd get rid of those.
> Thanks,
> Jad.

Hopefully all of this makes sense. If not, feel free to ask me to clarify.

Thanks,
Jon
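
For readers following the thread, the MAC-to-DMA table Jon describes might be sketched roughly as below. This is only an illustration under stated assumptions, not anything taken from Xen or from a real adapter: the names `rx_table`, `rx_entry`, `rx_table_program`, and `rx_table_lookup`, the fixed table size, and the single DMA address per guest are all hypothetical. The one hard fact it relies on is that the destination MAC address occupies the first 6 bytes of an Ethernet frame.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN     6
#define ETH_HDR_LEN 14   /* destination MAC + source MAC + EtherType */
#define MAX_GUESTS  8    /* hypothetical table size */

/* One entry: a guest's MAC address and the DMA address of its
 * pre-allocated receive buffer. All names here are illustrative. */
struct rx_entry {
    uint8_t  mac[MAC_LEN];
    uint64_t dma_addr;
    int      in_use;
};

struct rx_table {
    struct rx_entry entries[MAX_GUESTS];
};

/* Driver side: add a MAC -> DMA mapping, or update the DMA address if
 * the MAC is already present. Returns 0 on success, -1 if the table is full. */
int rx_table_program(struct rx_table *t, const uint8_t mac[MAC_LEN],
                     uint64_t dma_addr)
{
    struct rx_entry *free_slot = NULL;

    for (size_t i = 0; i < MAX_GUESTS; i++) {
        struct rx_entry *e = &t->entries[i];
        if (e->in_use && memcmp(e->mac, mac, MAC_LEN) == 0) {
            e->dma_addr = dma_addr;      /* existing guest: update in place */
            return 0;
        }
        if (!e->in_use && free_slot == NULL)
            free_slot = e;               /* remember the first free slot */
    }
    if (free_slot == NULL)
        return -1;

    memcpy(free_slot->mac, mac, MAC_LEN);
    free_slot->dma_addr = dma_addr;
    free_slot->in_use = 1;
    return 0;
}

/* Firmware side: read the destination MAC from the start of the frame
 * and look up where to DMA it. Returns 1 and stores the address in
 * *dma_addr_out on a match, 0 otherwise (unknown MAC or truncated frame). */
int rx_table_lookup(const struct rx_table *t, const uint8_t *frame,
                    size_t frame_len, uint64_t *dma_addr_out)
{
    if (frame_len < ETH_HDR_LEN)
        return 0;

    for (size_t i = 0; i < MAX_GUESTS; i++) {
        const struct rx_entry *e = &t->entries[i];
        if (e->in_use && memcmp(e->mac, frame, MAC_LEN) == 0) {
            *dma_addr_out = e->dma_addr;
            return 1;
        }
    }
    return 0;
}
```

A frame whose destination MAC is not in the table would presumably fall back to the normal netback path. Real hardware would more likely use a CAM or hash lookup than a linear scan, and a ring of buffers per guest rather than one address, so that packets are not dropped while a buffer is being refilled.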