I've been investigating why qemu-dm causes %CPU to be high (about 20% usage) when viewing fully virtualized guests with vncviewer.

I've looked at the code, and one area I'm curious about is the vram_dirty() function in tools/ioemu/hw/vga.c. Please correct me if I'm wrong, but vram_dirty() seems to use SSE inline functions to optimize its bit-shifting/masking/loading/storing/comparison operations when deciding whether dirty bits need to be set for a page within the shadow table. I also used gdb to make sure that I'm really executing the SSE-optimized version of vram_dirty() that utilizes the xmm0 registers.

So, out of curiosity, I decided to comment out the calls to vram_dirty() from vga_draw_graphic(); the guests still behave normally, but %CPU now drops to 8%. I know this is silly, and that I should expect a CPU drop from commenting out code, but why is vram_dirty() adding 12% CPU utilization when it can be commented out without crashing my viewer, and without my even noticing a difference in guest behavior? Can someone help me understand the purpose vram_dirty() serves, and perhaps why it seems to be so CPU intensive without really changing the way my virtual guest behaves?

Also, where else should I look in the code for possible explanations of why qemu-dm uses 20% CPU simply to view a guest? All comments and suggestions regarding this matter are appreciated.

thx,

T. McAfee
Xen Testing

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
There was a long thread about this topic already, plus a patch floating around. I don't think vram_dirty is the problem.

vram_dirty seems to be Xen-specific. Presumably, since we map the framebuffer directly into the guest, we can no longer use write-faulting to track dirtying. Instead, it looks like we rely on a double buffer to determine which portions of the screen change.

Regards,

Anthony Liguori
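The double-buffer scheme Anthony describes can be sketched roughly as follows. This is a minimal standalone illustration, not the actual qemu-dm code; the function names page_dirty and scan_vram are invented for the example, and the real vram_dirty uses SSE intrinsics rather than a plain memcmp:

```c
/* Sketch of shadow-framebuffer dirty detection: keep a private copy of
 * the guest framebuffer and compare it page by page on each refresh.
 * Hypothetical names; the real code lives in tools/ioemu/hw/vga.c. */
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Return 1 if the page at 'offset' differs from the shadow copy,
 * refreshing the shadow so the next scan sees the page as clean. */
int page_dirty(uint8_t *vram, uint8_t *shadow, size_t offset)
{
    if (memcmp(vram + offset, shadow + offset, PAGE_SIZE) != 0) {
        memcpy(shadow + offset, vram + offset, PAGE_SIZE);
        return 1;
    }
    return 0;
}

/* Scan the whole framebuffer, counting dirty pages.  In qemu-dm the
 * equivalent loop would mark each dirty page for redraw instead. */
size_t scan_vram(uint8_t *vram, uint8_t *shadow, size_t vram_size)
{
    size_t dirty = 0;
    for (size_t off = 0; off < vram_size; off += PAGE_SIZE)
        dirty += page_dirty(vram, shadow, off);
    return dirty;
}
```

Note that this scan touches every byte of both buffers on every refresh even when nothing changed, which is why it shows up as a constant CPU cost, unlike write-faulting, which costs nothing for an idle framebuffer.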
On Wed, Sep 20, 2006 at 04:01:44PM -0400, Tommie McAfee wrote:
> Also, where else should I look in the code for possible explanations to
> why qemu-dm uses 20% CPU simply to view a guest. All comments and
> suggestions regarding this matter are appreciated,

Rather than looking at the code, try using a profiling tool like OProfile to instrument exactly where the runtime is going. For example, running it against the qemu-dm instance on my Fedora Core 6 Xen host shows the following top hot-spots:

# opcontrol --setup --separate=library,kernel \
      --vmlinux=/usr/lib/debug/lib/modules/2.6.17-1.2647.fc6/vmlinux \
      -e CPU_CLK_UNHALTED:100000:
# opcontrol --start
# opreport /usr/lib64/xen/bin/qemu-dm -a -l 2>/dev/null | head -7
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
samples  cum. samples  %        cum. %   image name  symbol name
391713   391713        56.1887  56.1887  qemu-dm     vram_dirty
202108   593821        28.9911  85.1798  qemu-dm     _vnc_update_client
99266    693087        14.2391  99.4189  qemu-dm     vga_draw_line24_32
2023     695110         0.2902  99.7091  qemu-dm     vga_update_display

Fortunately this profile shows some clear hotspots to examine in greater detail - 3 symbols accounting for ~99% of all CPU time in qemu-dm. This is with a single VNC client connected, but not actively doing anything in the guest framebuffer, using RHEL-3 as the fully-virt guest.

If you have debuginfo available you can even get source file annotations of where the hits are. For example, taking the 2nd hit there:

# opannotate /usr/lib64/xen/bin/qemu-dm --source -i _vnc_update_client
....snip start of source....
 14800  1.5009 :        for (x = 0; x < X2DP_UP(vs, vs->ds->width); x++) {
  9928  1.0068 :            if (vs->dirty_row[y] & (1ULL << x)) {
936343 94.9594 :                if (memcmp(old_ptr, ptr, tile_bytes)) {
               :                    vs->has_update = 1;
     1 1.0e-04 :                    vs->update_row[y] |= (1ULL << x);
               :                    memcpy(old_ptr, ptr, tile_bytes);
               :                }
 14513  1.4718 :                vs->dirty_row[y] &= ~(1ULL << x);
               :            }
               :
  4742  0.4809 :            ptr += tile_bytes;
               :            old_ptr += tile_bytes;
               :        }
....snip rest of source....

Regards,
Dan.

--
|=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=|
|=-           Perl modules: http://search.cpan.org/~danberr/              -=|
|=-               Projects: http://freshmeat.net/~danielpb/               -=|
|=-  GnuPG: 7D3B9505   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505  -=|
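The annotated loop above tracks dirtiness with one 64-bit mask per row of tiles, clearing each bit once the tile has been handled. A self-contained sketch of that bookkeeping (illustrative only; mark_tile and flush_tiles are invented names, and the expensive memcmp/copy per tile is reduced to a counter):

```c
/* Per-row dirty bitmaps, as in the _vnc_update_client hot loop:
 * one uint64_t per row, one bit per horizontal tile. */
#include <assert.h>
#include <stdint.h>

#define MAX_ROWS 64

/* Mark one tile of one row dirty. */
void mark_tile(uint64_t *dirty_row, int row, int tile)
{
    dirty_row[row] |= 1ULL << tile;
}

/* Walk the bitmaps, "send" each dirty tile, and clear its bit so it is
 * not resent; returns the number of tiles processed. */
int flush_tiles(uint64_t *dirty_row, int rows, int tiles_per_row)
{
    int sent = 0;
    for (int y = 0; y < rows; y++) {
        for (int x = 0; x < tiles_per_row; x++) {
            if (dirty_row[y] & (1ULL << x)) {
                sent++;                       /* memcmp/memcpy would go here */
                dirty_row[y] &= ~(1ULL << x); /* clear: tile handled */
            }
        }
    }
    return sent;
}
```

The profile makes the cost structure clear: the bit tests are cheap (~1.5% each), while the memcmp on tiles whose dirty bit is set dominates at ~95%, so anything that leaves dirty bits set spuriously multiplies the expensive path.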
Since I wrote the code, all I can do is say that Anthony is completely correct: there's a shadow frame buffer that `vram_dirty' compares against to see if the HVM guest has modified its frame buffer. Deleting this function should result in a blank VGA screen for your HVM guest, so I'm really curious how you got this to work. Are you accessing the guest from the serial line rather than the VGA console?

--
Don Dugger
"Censeo Toto nos in Kansa esse decisse." - D. Gale
Donald.D.Dugger@intel.com
Ph: (303)443-3786
Donald,

> Are you accessing the guest from the serial line rather than the VGA
> console?

I'm sure that I'm accessing the guest from the VGA console.

> Deleting this function should result in a blank VGA screen for your HVM
> guest,

I'm not sure why, but I'm not seeing this. Here is how I commented out vram_dirty (as shown in gdb):

1563        for (y = 0; y < s->vram_size; y += TARGET_PAGE_SIZE){
1564            /*if (vram_dirty(s, y, TARGET_PAGE_SIZE))
1565                cpu_physical_memory_set_dirty(s->vram_offset + y); */
1566        }

After making these changes, I did a 'make && make install' from the tools/ioemu directory to rebuild the qemu-dm binary. The guests still behave normally after commenting out the calls to vram_dirty(), and these changes cause the %CPU to drop from 20 to about 8.

> there's a shadow frame buffer that `vram_dirty' compares to see if the
> HVM guest has modified its frame buffer

What could be causing the screen to be updated if vram_dirty() isn't even being called here to compare the shadow frame buffer with the HVM's frame buffer? Is anyone else seeing this?

Thanks for all of your help,

T. McAfee
Xen Testing
> > Deleting this function should result in a blank VGA screen for your HVM
> > guest,
>
> I'm not sure why, but I'm not seeing this. Here is how I commented
> out vram_dirty (as shown in gdb):
>
> 1563        for (y = 0; y < s->vram_size; y += TARGET_PAGE_SIZE){
> 1564            /*if (vram_dirty(s, y, TARGET_PAGE_SIZE))
> 1565                cpu_physical_memory_set_dirty(s->vram_offset + y); */
> 1566        }

It turns out that there was a bug a little later on which meant that the dirty bits were never actually cleared. Fixing this seems to have fairly drastically reduced qemu overhead. The fix is now present in xen-unstable.

Thanks,

Steven.
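Steven's diagnosis also explains the earlier observations: if dirty bits are never cleared, every page stays permanently flagged, so every refresh walks the full expensive redraw path, and the screen keeps updating even without vram_dirty() because everything is redrawn unconditionally. A minimal sketch of that failure mode (hypothetical names; the real dirty bitmap lives in qemu's physical-memory tracking, not a byte array):

```c
/* Contrast a refresh loop that clears dirty bits after redrawing with
 * the buggy variant that never does.  Illustrative only. */
#include <assert.h>
#include <stdint.h>

#define NPAGES 16

/* One display refresh: redraw pages whose dirty flag is set.  With
 * clear_after nonzero the flag is reset, so an unchanged page is free
 * on the next pass; with zero (the bug) every page stays dirty and the
 * expensive draw path runs forever.  Returns pages redrawn. */
int refresh(uint8_t *dirty, int clear_after)
{
    int redrawn = 0;
    for (int i = 0; i < NPAGES; i++) {
        if (dirty[i]) {
            redrawn++;            /* stands in for the expensive redraw */
            if (clear_after)
                dirty[i] = 0;
        }
    }
    return redrawn;
}
```

This is also why commenting out vram_dirty() seemed harmless: with the bits stuck on, the scan's only job, setting already-set bits, was redundant work.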
Thanks a lot, this fix does reduce qemu overhead.

Also, while testing I noticed a small bug with vncviewer using a sles9sp3 guest. My initial attempt to view a guest causes vnc to render a small window that doesn't respond to mouse movements. I have to close this window and then start the vncviewer again to get a resized and usable window. I'm only seeing this on my first attempt to view the guest. Any suggestions?

Thx again for your help,

T. McAfee
Xen Testing
> Also, while testing I noticed a small bug with vncviewer using a
> sles9sp3 guest. My initial attempt to view a guest causes vnc to render
> a small window that doesn't respond to mouse movements. I have to close
> this window and then start the vncviewer again to get a resized and
> usable window. I'm only seeing this on my first attempt to view the
> guest. Any suggestions?

Yep: update to a version of xen-unstable which has cset 11589. It should hit xen-unstable in the next couple of hours.

Steven.
SLES uses TightVNC, which has a problem with resize events (it doesn't handle them). If you replace the VNC client with RealVNC, which handles resize events, this problem should go away.

--
Don Dugger
"Censeo Toto nos in Kansa esse decisse." - D. Gale
Donald.D.Dugger@intel.com
Ph: (303)443-3786