Roland Mainz
2006-Oct-13 01:51 UTC
[qemu-discuss] Solaris libc with QEmu-Accelerated |memcpy()|, |bzero()| ... ?
Hi!

----

While thinking about how QEmu performance in "interpreter" mode (e.g. emulating AMD64 on SPARC) could be improved, I remembered that some platforms ship CPU-specific versions of libc&co. to improve performance...

... so the question would be: Is it useful to add another libc variant to Solaris which calls into the emulator code to "accelerate" functions like |memcpy()|, |bzero()| etc. ... ?

IMO it could short-cut a 1MB copy (1048576 bytes), which may otherwise result in at least 131072 emulated instructions (assuming 8byte/64bit transfers per instruction, not counting any loop/conditional/etc. instructions). If each emulated instruction takes ~40 native instructions on the host system we may gain a factor of 40 in such memory operations (OkOk, this is just a very rough estimate) ... :-)

I guess that other emulators like Bochs or virtualisation software like VMware or Xen may be able to benefit from such an API, too (the performance improvement would be much smaller than a factor of 40 but it would still save some CPU time) ...

Comments/suggestions/rants welcome...

----

Bye,
Roland

--
  __ .  . __
 (o.\ \/ /.o) roland.mainz at nrubsig.org
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 7950090
 (;O/ \/ \O;)
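The back-of-the-envelope estimate above can be written out as a minimal C sketch. The function names are made up for illustration, and the ~40 host instructions per emulated instruction is only the guess from the post, not a measured value.

```c
#include <stdint.h>

/* Number of emulated transfer instructions needed to move `bytes`
 * with `width`-byte transfers, ignoring loop and branch overhead
 * (the same simplification as in the estimate above).
 * `width` must be non-zero. */
static uint64_t emulated_transfers(uint64_t bytes, uint64_t width)
{
    return (bytes + width - 1) / width; /* round up for a partial last word */
}

/* Host instructions spent if every emulated instruction costs
 * `host_per_emulated` native instructions (assumed ~40 in the post). */
static uint64_t host_cost(uint64_t emulated, uint64_t host_per_emulated)
{
    return emulated * host_per_emulated;
}
```

For a 1MB copy with 8-byte transfers, `emulated_transfers(1048576, 8)` gives 131072, and `host_cost(131072, 40)` gives 5242880 host instructions; a native copy of the same block would need roughly 131072 transfers, which is where the "factor of 40" comes from.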
Roland Mainz
2006-Oct-14 22:42 UTC
[qemu-discuss] Solaris libc with QEmu-Accelerated |memcpy()|, |bzero()| ... ?
Roland Mainz wrote:
> While thinking about how the QEmu performance in the "interpreter" mode
> (e.g. emulating AMD64 on SPARC) could be improved I remembered that some
> platforms have CPU-specific versions of libc&co. to improve the
> performance...
> ... the question would be: Is it useful to add another libc variant to
> Solaris which calls into the emulator code to "accelerate" functions like
> |memcpy()|, |bzero()| etc. ... ?
> IMO it could short-cut a 1MB copy (1048576 bytes), which may result in
> at least 131072 emulated instructions (assuming 8byte/64bit transfers
> per instruction, not counting any loop/conditional/etc. instructions).
> If each emulation takes ~40 native instructions in the host system we may
> gain a factor of 40 in such memory operations (OkOk, this is just a very
> rough estimate) ... :-)
>
> I guess that other emulators like Bochs or virtualisation software like
> VMware or Xen may be able to benefit from such an API, too (the
> performance improvement would be much smaller than a factor of 40 but it
> would still save some CPU time) ...

One small clarification about VMware and Xen: both run native code on the native CPU, but AFAIK everything related to the MMU calls back into the virtualisation layer, which is quite expensive. A "block copy engine" (and a "zero block engine") in the virtualisation layer would be one call vs. >= 256 calls when you move a 1MB block with 4K pages (1048576 / 4096 = 256), and therefore it is IMO a good idea to add a general API which can be used by VMware&&Xen&&QEmu&&Bochs, too...

----

Bye,
Roland

P.S.: Can anyone forward this to the Xen people at Sun ?
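To make the "one call vs. >= 256 calls" comparison concrete, here is a rough C sketch. Everything in it is hypothetical: `layer_copy()` merely stands in for a trap into the virtualisation layer (no real VMware, Xen, QEmu or Bochs interface is used), and the counter exists only to show how often that boundary would be crossed under the one-crossing-per-page assumption from the post.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a call into the virtualisation layer.
 * The counter only makes the number of crossings visible. */
static unsigned long layer_calls;

static void layer_copy(void *dst, const void *src, size_t len)
{
    layer_calls++;
    memcpy(dst, src, len); /* the host performs the copy natively */
}

/* Page-at-a-time copy: one crossing per page touched.
 * A page size of 0 is treated as "whole range in one piece". */
static void copy_by_page(void *dst, const void *src, size_t len, size_t page)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (page == 0)
        page = len;
    while (len > 0) {
        size_t n = len < page ? len : page;
        layer_copy(d, s, n);
        d += n;
        s += n;
        len -= n;
    }
}

/* "Block copy engine": hand the whole range over in a single crossing. */
static void copy_by_block(void *dst, const void *src, size_t len)
{
    layer_copy(dst, src, len);
}
```

With a 1MB buffer and 4K pages, `copy_by_page()` crosses the boundary 256 times while `copy_by_block()` crosses it once. A "zero block engine" would follow the same pattern with `memset()` in place of `memcpy()`.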