Hi Jan, Keir --

My x86 assembly skills are much too poor to carefully evaluate and, if of value, implement this in Xen but given your previous interest, such as:

http://xenbits.xensource.com/xen-unstable.hg?rev/8de4b4e9a435

the following might be worth looking at.

Intel has just posted memcpy improvements for glibc for recent popular Intel processor families here:

http://article.gmane.org/gmane.comp.lib.glibc.alpha/15278

The preface to the above patch looks very enticing...

Semi-related, I wonder if you know, if there were a "copy_page_from_other_node()" to be used if the caller is fairly sure that the page is being copied between nodes, could this be made significantly faster than a normal copy_page()?

Thanks,
Dan
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
It has to be said, possibly tmem excepted, there is very little page copying in Xen.

 -- Keir

On 15/07/2010 19:15, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:
> Intel has just posted memcpy improvements for glibc for recent
> popular Intel processor families here:
>
> http://article.gmane.org/gmane.comp.lib.glibc.alpha/15278
I wasn't sure about that... Jan's patch to speed up copy_page (by 12%) went in before tmem was in-tree, so I assumed otherwise. Clearly my interest is for tmem, especially if a 2x-4x improvement is possible, but if there really is no significant advantage for non-tmem code, I will put it on my list... for sometime in the next century when I am a good x86 assembly programmer :-)

> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Thursday, July 15, 2010 3:35 PM
> To: Dan Magenheimer; Jan Beulich
> Cc: xen-devel@lists.xensource.com
> Subject: Re: Even faster page copy for Xen?
>
> It has to be said, possibly tmem excepted, there is very little page
> copying in Xen.
>
> -- Keir
Jan also added a copy_page hypercall, so he may have been optimising for that, since I guess the Novell kernels must use it.

 -- Keir

On 16/07/2010 00:36, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:
> I wasn't sure about that... Jan's patch to speed up
> copy_page (by 12%) went in before tmem was in-tree,
> so I assumed otherwise.
>>> Keir Fraser 07/16/10 9:57 AM >>>
> Jan also added a copy_page hypercall, so he may have been optimising for
> that, since I guess the Novell kernels must use it.

Yes, that was the very goal of that change.

Jan
>>> On 15.07.10 at 20:15, Dan Magenheimer <dan.magenheimer@oracle.com> wrote:
> Intel has just posted memcpy improvements for glibc for recent
> popular Intel processor families here:
>
> http://article.gmane.org/gmane.comp.lib.glibc.alpha/15278
>
> The preface to the above patch looks very enticing...

I'm not sure how much of this applies to the much more specific case of copying pages... Additionally, I don't think trying to use XMM registers in Xen would be a good idea.

> Semi-related, I wonder if you know, if there were a
> "copy_page_from_other_node()" to be used if the
> caller is fairly sure that the page is being copied
> between nodes, could this be made significantly faster
> than a normal copy_page()?

I would think that this should mostly be taken care of by using non-temporal stores (non-temporal loads unfortunately aren't available without using XMM registers). The only other meaningful tuning one could do would be to increase the prefetch distances and grow the distance between loads and stores. The latter would require the use of more registers and hence have other drawbacks.

Jan
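For readers following the thread, here is a minimal C sketch of the approach Jan describes: prefetch ahead in the source page and write the destination with non-temporal stores issued from general-purpose registers (MOVNTI, via the `_mm_stream_si64` intrinsic). This is not the Xen copy_page code. `copy_page_nt` and `PREFETCH_AHEAD` are hypothetical names, the prefetch distance is an arbitrary illustrative value, and the non-x86_64 branch simply falls back to memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__x86_64__)
#include <emmintrin.h>   /* _mm_stream_si64 (MOVNTI) */
#include <xmmintrin.h>   /* _mm_prefetch, _mm_sfence */
#endif

#define PAGE_SIZE      4096u
#define CACHE_LINE     64u
/* Hypothetical tuning knob: how far ahead of the current line to prefetch. */
#define PREFETCH_AHEAD (4u * CACHE_LINE)

/* Copy one page. dst and src must be 8-byte aligned and must not overlap. */
void copy_page_nt(void *dst, const void *src)
{
#if defined(__x86_64__)
    long long *d = dst;
    const long long *s = src;
    const size_t words_per_line = CACHE_LINE / sizeof(*s);
    size_t i, j;

    for (i = 0; i < PAGE_SIZE / sizeof(*s); i += words_per_line) {
        size_t ahead = i * sizeof(*s) + PREFETCH_AHEAD;

        /* Prefetch a later source line, but never past the end of the page. */
        if (ahead < PAGE_SIZE)
            _mm_prefetch((const char *)src + ahead, _MM_HINT_NTA);

        /* One cache line: eight 64-bit values, each written with MOVNTI. */
        for (j = 0; j < words_per_line; j++)
            _mm_stream_si64(&d[i + j], s[i + j]);
    }

    /* Order the non-temporal stores before any later stores. */
    _mm_sfence();
#else
    /* Assumption: no non-temporal path on other architectures in this sketch. */
    memcpy(dst, src, PAGE_SIZE);
#endif
}
```

Raising PREFETCH_AHEAD corresponds to the "increase the prefetch distances" tuning mentioned above; whether any particular value helps for cross-node copies would have to be measured.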
On Fri, Aug 6, 2010 at 12:57 AM, Jan Beulich <JBeulich@novell.com> wrote:
> I'm not sure how much of this applies to the much more specific
> case of copying pages... Additionally, I don't think trying to
> use XMM registers in Xen would be a good idea.

Why would you say using xmm/sse in Xen is a bad idea ? We already have a copy_page_sse2 (in copy_page.S) in our code base and available (by default) for x86_64. Is it a bad idea to use that ?
On Mon, Aug 9, 2010 at 10:47 AM, Dulloor <dulloor@gmail.com> wrote:
> Why would you say using xmm/sse in Xen is a bad idea ? We already have a
> copy_page_sse2 (in copy_page.S) in our code base and available (by default)
> for x86_64. Is it a bad idea to use that ?

Never mind about copy_page_sse2 ! That function name is misleading. But, still ... I need a copy_page routine and was planning to use sse. Is that not fine ?
>>> On 09.08.10 at 19:57, Dulloor <dulloor@gmail.com> wrote:
> On Mon, Aug 9, 2010 at 10:47 AM, Dulloor <dulloor@gmail.com> wrote:
>> Why would you say using xmm/sse in Xen is a bad idea ? We already have a
>> copy_page_sse2 (in copy_page.S) in our code base and available (by default)
>> for x86_64. Is it a bad idea to use that ?
>
> Never mind about copy_page_sse2 ! That function name is misleading.

Why - it is code that's dependent on SSE2 to be available. Note it doesn't have 'xmm' in its name - that indeed would be misleading.

> But, still ... I need a copy_page routine and was planning to use sse.
> Is that not fine ?

You can do so if you feel like saving/restoring all necessary XMM state isn't going to eat up all of the performance win...

Jan
> >> Why would you say using xmm/sse in Xen is a bad idea ? We already have a
> >> copy_page_sse2 (in copy_page.S) in our code base and available (by default)
> >> for x86_64. Is it a bad idea to use that ?
> >
> > Never mind about copy_page_sse2 ! That function name is misleading.
>
> Why - it is code that's dependent on SSE2 to be available. Note it
> doesn't have 'xmm' in its name - that indeed would be misleading.
>
> > But, still ... I need a copy_page routine and was planning to use sse.
> > Is that not fine ?
>
> You can do so if you feel like saving/restoring all necessary XMM
> state isn't going to eat up all of the performance win...

Again excuse my x86 ignorance, but on some architectures floating point registers can be saved/restored "lazily" because there is a privileged bit that disables their use (which can be trapped and used as a "floating-point dirty" bit). Is there anything equivalent for the XMM state? If so, then lazy save might be a good approach. If not, then I agree that the state save/restore overhead might eat up the performance win. (However, if we were to later use Linux memory compaction and NUMA page migration, the performance tradeoff might change to positive.)
On 10/08/2010 13:31, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:
>> You can do so if you feel like saving/restoring all necessary XMM
>> state isn't going to eat up all of the performance win...
>
> Again excuse my x86 ignorance, but on some architectures
> floating point registers can be saved/restored "lazily"
> because there is a privileged bit that disables their use
> (which can be trapped and used as a "floating-point dirty" bit).
> Is there anything equivalent for the XMM state? If so,
> then lazy save might be a good approach. If not, then I agree
> that the state save/restore overhead might eat up the performance
> win. (However, if we were to later use Linux memory compaction
> and NUMA page migration, the performance tradeoff might change
> to positive.)

We do lazy FPU/SSE restore already. But in any case, it is questionable how much faster you can make a non-temporal and/or non-local bulk memory copy: it ought to be bottlenecked on FSB bandwidth.

 -- Keir
>>> On 10.08.10 at 14:31, Dan Magenheimer <dan.magenheimer@oracle.com> wrote:
>> You can do so if you feel like saving/restoring all necessary XMM
>> state isn't going to eat up all of the performance win...
>
> Again excuse my x86 ignorance, but on some architectures
> floating point registers can be saved/restored "lazily"
> because there is a privileged bit that disables their use
> (which can be trapped and used as a "floating-point dirty" bit).
> Is there anything equivalent for the XMM state? If so,

CR0.TS covers both FP and XMM state.

> then lazy save might be a good approach. If not, then I agree

Lazy save, even in the kernel, is used mainly for avoiding the user context restore, not for dealing with in-kernel accesses to that register state. It certainly can be made to work, but again I'm uncertain it's worth it.

> that the state save/restore overhead might eat up the performance
> win. (However, if we were to later use Linux memory compaction
> and NUMA page migration, the performance tradeoff might change
> to positive.)

Jan
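To make the CR0.TS discussion concrete, here is a toy user-space model in C of lazy FP/XMM switching. It is a sketch under stated assumptions, not Xen's or Linux's implementation: every structure and function name below is hypothetical, a boolean stands in for CR0.TS, and an explicit function call stands in for the device-not-available (#NM) trap. It shows the two points made above: the trap lets a context switch skip the guest state entirely, but privileged code that wants to use XMM registers itself still has to save and restore whatever it clobbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_XMM 16

/* Hypothetical per-guest save area for XMM state. */
struct vcpu { unsigned char xmm_save[NR_XMM][16]; };

/* Hypothetical model of one CPU's FP/SSE unit. */
struct cpu {
    bool ts;                        /* models CR0.TS */
    unsigned char xmm[NR_XMM][16];  /* models the live XMM registers */
    struct vcpu *fpu_owner;         /* whose state is live in xmm[] */
    unsigned int saves, restores;   /* bookkeeping for the model only */
};

/* Context switch: leave the XMM registers alone, just arm the trap. */
void context_switch(struct cpu *c) { c->ts = true; }

/* Models the #NM handler, run on the first FP/SSE use after a switch. */
void device_not_available(struct cpu *c, struct vcpu *next)
{
    if (c->fpu_owner != next) {
        if (c->fpu_owner) {
            memcpy(c->fpu_owner->xmm_save, c->xmm, sizeof(c->xmm));
            c->saves++;
        }
        memcpy(c->xmm, next->xmm_save, sizeof(c->xmm));
        c->restores++;
        c->fpu_owner = next;
    }
    c->ts = false;                  /* models CLTS */
}

/* A guest touches XMM0: this traps only if TS is set. */
void guest_use_xmm(struct cpu *c, struct vcpu *v, unsigned char val)
{
    if (c->ts)
        device_not_available(c, v);
    memset(c->xmm[0], val, 16);
}

/* Privileged code using the first n XMM registers for its own purposes. */
void hypervisor_use_xmm(struct cpu *c, unsigned int n)
{
    unsigned char tmp[NR_XMM][16];
    bool ts = c->ts;

    if (n > NR_XMM)
        n = NR_XMM;
    c->ts = false;                  /* CLTS so our own use does not trap */
    memcpy(tmp, c->xmm, n * 16u);   /* save the registers we will clobber */
    c->saves++;
    memset(c->xmm, 0xAA, n * 16u);  /* stand-in for an XMM-based copy loop */
    memcpy(c->xmm, tmp, n * 16u);   /* put the owner's values back */
    c->restores++;
    c->ts = ts;                     /* leave TS as we found it */
}
```

In this model the extra save and restore inside hypervisor_use_xmm() is the overhead being weighed in the thread against any speedup from an XMM-based copy loop.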