Yang, Xiaowei
2006-Nov-23 09:13 UTC
[Xen-devel] some performance issue of shadow2 on 2.6 Linux VMX and possible fix
On ia32e systems, old Linux 2.6 kernels (prior to 2.6.16) share one L4 page across all processes. At context switch, the kernel replaces the old L3 page in the l4e with the new one, and the current shadow code discards the old L3 shadow page at the same time. When context switches are very frequent (e.g. in a client/server model), this can be costly.

One solution is to pin the L3 page as well as the L4 page. This keeps the previous process's L3 shadow page around for later use. Tests show that it benefits benchmarks with frequent context switches, such as OLTP (server/client), CPU2k (multiple users) and specjbb (multiple warehouses).

But it also introduces some overhead. Because the L3 page table is pinned, it takes one or more extra page faults to unpin it after the process has terminated. KB performance is affected somewhat.

Thanks,
Xiaowei

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Ian Pratt
2006-Nov-23 10:22 UTC
RE: [Xen-devel] some performance issue of shadow2 on 2.6 Linux VMX and possible fix
> On ia32e system, old Linux 2.6 kernel (previous to 2.6.16) shares one l4
> page for all processes. At context switch, it replaces old L3 page in
> l4e with the new one. Current shadow discards old shadow page at the
> same time. When context switch (e.g. client/server model) is very
> frequent, it can be a high cost.

[Aside: I think it was 2.6.11 that moved to a proper 4-level pagetable, but certainly later than RHEL4 and SLES9 kernels shipped]

> One solution is to pin L3 page as well as L4 page. It reserves previous
> process' L3 shadow page for later use. The test shows it benefits
> benchmark with frequent context switch such as OLTP (server/client),
> CPU2k (multi users) and specjbb (multi warehouses).
>
> But it also introduces some overhead. As L3 page table is pinned, it
> needs 1+ extra page fault to be unpinned after the process is
> terminated. KB's performance has some impact.

I think for stuff like this we need to use some heuristics to detect old kernels and enable L3 pinning only for them, rather than slowing down everyone. E.g. enable pinning only if the number of L4 pages we currently have shadowed is <= #vcpus.

At the very least, we could pin only the L3 mapped in the specific L4 slot used by older Linux kernels.

Ian
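The check suggested above (pin L3 shadows only while the guest has no more shadowed L4 pages than vcpus) could be sketched roughly as follows. This is only an illustration of the idea, not Xen code: the function name and both parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of Xen.
 * A guest that shares one L4 page among all processes should never need
 * more shadowed L4 pages than it has vcpus, so a small L4 count is taken
 * as a hint that this is an old-style kernel worth pinning L3s for.
 */
static bool should_pin_l3(size_t n_l4_shadowed, size_t n_vcpus)
{
    assert(n_vcpus > 0);
    return n_l4_shadowed <= n_vcpus;
}
```

A guest with a per-process L4 will quickly exceed the vcpu count and so would not pay the unpinning overhead described in the original mail.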
Tim Deegan
2006-Nov-23 18:07 UTC
[Xen-devel] Re: some performance issue of shadow2 on 2.6 Linux VMX and possible fix
Hi,

At 17:13 +0800 on 23 Nov (1164301986), Yang, Xiaowei wrote:
> On ia32e system, old Linux 2.6 kernel (previous to 2.6.16) shares one l4
> page for all processes. At context switch, it replaces old L3 page in
> l4e with the new one. Current shadow discards old shadow page at the
> same time. When context switch (e.g. client/server model) is very
> frequent, it can be a high cost.
>
> One solution is to pin L3 page as well as L4 page. It reserves previous
> process' L3 shadow page for later use. The test shows it benefits
> benchmark with frequent context switch such as OLTP (server/client),
> CPU2k (multi users) and specjbb (multi warehouses).

Thanks! Changeset 12533:2fd223c64fc6 pins l3 shadows, and falls back to normal behaviour if we see the guest using too many l4 pages.

Cheers,

Tim.
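To see why pinning helps, and what the fallback does, here is a toy model in C. It is a sketch under stated assumptions and is not taken from the changeset: every name (`toy_shadow`, `toy_switch_l3`, `toy_note_l4_count`), the fixed-size pin table and the `limit` parameter are made up for illustration. It only counts how often an L3 shadow has to be built when the guest swaps L3 pages in a shared L4 slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PINNED 8   /* arbitrary size for this toy model */

struct toy_shadow {
    unsigned long pinned_l3[TOY_MAX_PINNED]; /* guest L3s with a kept shadow */
    size_t n_pinned;
    bool pin_l3;               /* is L3 pinning currently enabled? */
    unsigned long current_l3;  /* L3 installed in the shared L4 slot, 0 = none */
    unsigned long builds;      /* how many L3 shadows had to be built */
};

static bool toy_is_pinned(const struct toy_shadow *s, unsigned long l3)
{
    for (size_t i = 0; i < s->n_pinned; i++)
        if (s->pinned_l3[i] == l3)
            return true;
    return false;
}

/* The guest writes a new L3 into its l4e, i.e. a context switch. */
static void toy_switch_l3(struct toy_shadow *s, unsigned long guest_l3)
{
    assert(s != NULL && guest_l3 != 0);
    if (s->current_l3 == guest_l3)
        return;                          /* same process, nothing to do */
    if (!(s->pin_l3 && toy_is_pinned(s, guest_l3))) {
        s->builds++;                     /* no kept shadow: build one */
        if (s->pin_l3 && s->n_pinned < TOY_MAX_PINNED)
            s->pinned_l3[s->n_pinned++] = guest_l3;
    }
    s->current_l3 = guest_l3;            /* unpinned old shadow is dropped */
}

/* Fallback: stop pinning once the guest uses more L4 pages than 'limit'. */
static void toy_note_l4_count(struct toy_shadow *s, size_t n_l4, size_t limit)
{
    if (n_l4 > limit) {
        s->pin_l3 = false;
        s->n_pinned = 0;
    }
}
```

In this model, alternating between two processes four times builds four L3 shadows without pinning but only two with it. Once `toy_note_l4_count` sees more L4 pages than the (assumed) limit, every later switch builds a shadow again, which corresponds to the normal behaviour.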