志波 唐
2007-May-23 09:15 UTC
[Xen-users] Apache CGI Performance Big Degration in Dom0 vs. Native
Hi there,

I ran a test on an Apache server. The workload is a very simple CGI program compiled from helloworld.c. The OS is SLES 10, and the stress tool is ab (ApacheBench). Performance degrades sharply from native to Dom0.

Running in prefork mode:

                            Native    Dom0
Performance (requests/s)    3700      650
CPU%                        75%       99%

Running in worker mode:

                            Native    Dom0
Performance (requests/s)    1750      769
CPU%                        32%       26%

I also collected clock-cycle profiles with XenOprof and OProfile, shown below.

Domain 0, worker:

CPU: Core 2, speed 2666.75 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples  %        image name                       app name                         symbol name
71855    4.6699   xen-syms-pae-3.0.4_13138-0.33    xen-syms-pae-3.0.4_13138-0.33    get_page_from_l1e
71812    4.6672   xen-syms-pae-3.0.4_13138-0.33    xen-syms-pae-3.0.4_13138-0.33    handle_exception
58634    3.8107   xen-syms-pae-3.0.4_13138-0.33    xen-syms-pae-3.0.4_13138-0.33    put_page_from_l1e
56301    3.6591   xen-syms-pae-3.0.4_13138-0.33    xen-syms-pae-3.0.4_13138-0.33    do_update_va_mapping
49543    3.2199   vmlinux-2.6.16.46-0.10-xenpae    vmlinux-2.6.16.46-0.10-xenpae    _spin_lock

Domain 0, prefork:

CPU: Core 2, speed 2666.75 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples  %        image name                       app name                         symbol name
492796   20.0510  xen-syms-pae-3.0.4_13138-0.33    xen-syms-pae-3.0.4_13138-0.33    do_update_va_mapping
268506   10.9251  xen-syms-pae-3.0.4_13138-0.33    xen-syms-pae-3.0.4_13138-0.33    ptwr_do_page_fault
81074    3.2988   xen-syms-pae-3.0.4_13138-0.33    xen-syms-pae-3.0.4_13138-0.33    test_all_events
72424    2.9468   xen-syms-pae-3.0.4_13138-0.33    xen-syms-pae-3.0.4_13138-0.33    get_page_from_l1e
66056    2.6877   xen-syms-pae-3.0.4_13138-0.33    xen-syms-pae-3.0.4_13138-0.33    handle_exception
61031    2.4832   xen-syms-pae-3.0.4_13138-0.33    xen-syms-pae-3.0.4_13138-0.33    do_mmu_update

Native, worker:

CPU: Core 2, speed 2667.14 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples  %        app name                         symbol name
46048    5.8715   vmlinux-2.6.16.46-0.10-bigsmp    kmap_atomic
41587    5.3027   vmlinux-2.6.16.46-0.10-bigsmp    copy_page_range
39759    5.0696   vmlinux-2.6.16.46-0.10-bigsmp    unmap_vmas
38722    4.9373   vmlinux-2.6.16.46-0.10-bigsmp    page_fault
30638    3.9066   vmlinux-2.6.16.46-0.10-bigsmp    page_remove_rmap
29481    3.7591   vmlinux-2.6.16.46-0.10-bigsmp    __handle_mm_fault

Native, prefork:

CPU: Core 2, speed 2667.14 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples  %        app name                         symbol name
47505    6.9430   vmlinux-2.6.16.46-0.10-bigsmp    kmap_atomic
41182    6.0189   vmlinux-2.6.16.46-0.10-bigsmp    page_fault
36425    5.3236   vmlinux-2.6.16.46-0.10-bigsmp    copy_page_range
34533    5.0471   vmlinux-2.6.16.46-0.10-bigsmp    unmap_vmas
32155    4.6996   vmlinux-2.6.16.46-0.10-bigsmp    __handle_mm_fault
30848    4.5085   vmlinux-2.6.16.46-0.10-bigsmp    find_get_page
28589    4.1784   vmlinux-2.6.16.46-0.10-bigsmp    page_remove_rmap

With a pure HTML workload, I see little performance gap between Domain 0 and native, and the profile does not show nearly as many memory and page operations.

Is the Domain 0 performance degradation for CGI caused by inefficiency in Xen's memory and page handling? Any hints or experience with this?

Thanks in advance.

Regards,
Hunter Tang

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
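
To put numbers on the gap reported above, here is a minimal sketch that computes the native-to-Dom0 slowdown and the throughput per CPU percent from the four measurements in this post. The figures are copied from the prefork and worker tables. The names RESULTS, slowdown and requests_per_cpu_percent are made up for illustration and are not part of ab, OProfile or Xen.

```python
# Figures copied from the tables in this post: (requests/s, CPU %).
# The dictionary layout and function names are illustrative assumptions.
RESULTS = {
    "prefork": {"native": (3700, 75), "dom0": (650, 99)},
    "worker": {"native": (1750, 32), "dom0": (769, 26)},
}


def slowdown(mode):
    """Return native throughput divided by Dom0 throughput for one MPM."""
    native_rps, _ = RESULTS[mode]["native"]
    dom0_rps, _ = RESULTS[mode]["dom0"]
    return native_rps / dom0_rps


def requests_per_cpu_percent(mode, env):
    """Return requests/s delivered per percentage point of CPU used."""
    rps, cpu_percent = RESULTS[mode][env]
    return rps / cpu_percent


if __name__ == "__main__":
    for mode in RESULTS:
        print(f"{mode}: native is {slowdown(mode):.2f}x faster than Dom0")
        for env in ("native", "dom0"):
            eff = requests_per_cpu_percent(mode, env)
            print(f"  {env}: {eff:.2f} requests/s per CPU%")
```

By this arithmetic the prefork case is roughly 5.7 times slower in Dom0 and the worker case roughly 2.3 times slower. Prefork in Dom0 is also the only case that comes close to saturating the CPU.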