Hi, all:

I am trying to study cache effects under Xen. I found that under virtualization, the cache latency is smaller than under a non-virtualized OS on the same machine.

I am using techniques similar to those in Figure 1 of this post:
http://blog.stuffedcow.net/2013/01/ivb-cache-replacement/

My host domain is CentOS 6.2, running Linux 3.4.35; the Xen version is 4.2.0. The guest OS is Ubuntu 12.04, kernel 3.2.0.

My CPU is an Intel Core i7-980X. It has 6 cores and runs constantly at 3.33GHz (hyperthreading disabled). Frequency scaling is also disabled. Each core has a 32KB L1 and a 256KB L2, and there is a shared 12MB L3 cache.

I am measuring cache latency; the workload is reading data from an array. The x axis is the array size, in log scale; the y axis is the average cycles per access.

Two lines are shown:
- The solid line is the experiment run on a non-virtualized OS. I pinned the task to a specific core to prevent migration.
- The dashed line is the same experiment run within a guest OS (configured with one VCPU, pinned to one core).

You can see that both lines show three jumps, at 32KB, 256KB, and around 12MB, which are the sizes of L1, L2, and L3.

The strange thing is that the measured time under virtualization is smaller than without it.

I am guessing Xen does some cache prefetching that explains this?

Thanks very much!

Sisu

--
Sisu Xi, PhD Candidate
http://www.cse.wustl.edu/~xis/
Department of Computer Science and Engineering
Campus Box 1045
Washington University in St. Louis
One Brookings Drive
St. Louis, MO 63130
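For reference, here is a minimal sketch of the kind of pointer-chasing latency test described above. This is an illustration, not the actual benchmark code: it assumes the array is linked into a single random pointer cycle (so the hardware prefetcher cannot follow the access pattern), timing via RDTSC (valid here because the TSC runs at the fixed 3.33GHz core clock), and pinning to core 0 via sched_setaffinity.

#define _GNU_SOURCE
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <x86intrin.h>   /* __rdtsc() */

int main(int argc, char **argv)
{
    size_t bytes = argc > 1 ? strtoull(argv[1], NULL, 0) : 32 * 1024;
    size_t n = bytes / sizeof(void *);

    /* Pin to core 0 so the task cannot migrate between caches. */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);
    sched_setaffinity(0, sizeof(set), &set);

    /* Link the array into one random cycle so every load depends on
     * the previous one and the access pattern defeats the prefetcher. */
    void **buf = malloc(n * sizeof(void *));
    size_t *order = malloc(n * sizeof(size_t));
    for (size_t i = 0; i < n; i++)
        order[i] = i;
    for (size_t i = n - 1; i > 0; i--) {        /* Fisher-Yates shuffle */
        size_t j = (size_t)rand() % (i + 1);
        size_t t = order[i]; order[i] = order[j]; order[j] = t;
    }
    for (size_t i = 0; i < n; i++)
        buf[order[i]] = &buf[order[(i + 1) % n]];

    /* Chase the pointers: exactly one dependent load per iteration. */
    const size_t iters = 100000000;
    void **p = buf;
    uint64_t t0 = __rdtsc();
    for (size_t i = 0; i < iters; i++)
        p = (void **)*p;
    uint64_t t1 = __rdtsc();

    /* Print p so the compiler cannot discard the timed loop. */
    printf("%zu bytes: %.2f cycles/access (%p)\n",
           bytes, (double)(t1 - t0) / iters, (void *)p);
    return 0;
}

Sweeping the array size (e.g. from 4KB past 12MB) and plotting cycles/access against size should reproduce the three jumps at the L1, L2, and L3 boundaries.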
Hi,

At 13:22 -0500 on 19 May (1368969755), Sisu Xi wrote:
> I am using techniques similar to those in Figure 1 of this post:
> http://blog.stuffedcow.net/2013/01/ivb-cache-replacement/
>
> My host domain is CentOS 6.2, running Linux 3.4.35; the Xen version
> is 4.2.0. The guest OS is Ubuntu 12.04, kernel 3.2.0.
>
> You can see that both lines show three jumps, at 32KB, 256KB, and
> around 12MB, which are the sizes of L1, L2, and L3.
>
> The strange thing is that the measured time under virtualization is
> smaller than without it.
>
> I am guessing Xen does some cache prefetching that explains this?

Not that I know of. And if you're using the random access patterns described in that blog, I don't see how prefetching would help.

My guess is there's some other confounding factor -- are you absolutely sure that you've turned off all the power management in both cases? Since you're measuring memory access time in CPU cycles, that could skew the graph in either direction.

You could try a CPU-bound test and see if the Xen case is faster there as well -- if so, it's definitely not cache behaviour.

We have seen cases where things like scheduler effects made a difference (e.g. if you're using a single-processor Linux in the Xen case, make sure to use a single-processor Linux on bare metal too, as that affects kernel performance).

Is your test array already populated and pinned in memory to avoid page faults?

Cheers,

Tim.
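On that last point, a minimal sketch of what pre-faulting and pinning the test buffer could look like, assuming standard POSIX/Linux mmap and mlock (the size and setup here are illustrative, not taken from the actual benchmark):

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t bytes = 16 * 1024 * 1024;    /* > 12MB L3; illustrative size */

    /* MAP_POPULATE (Linux-specific) pre-faults every page at mmap time,
     * so no demand page faults occur inside the timed loop. */
    char *buf = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    /* mlock() keeps the pages resident, so neither page faults nor
     * swapping can show up as extra latency in the measurement. */
    if (mlock(buf, bytes) != 0) { perror("mlock"); return 1; }

    /* Touch every page once more for good measure (write fault + TLB fill). */
    memset(buf, 1, bytes);

    puts("buffer populated and locked; run the timed loop here");
    munmap(buf, bytes);
    return 0;
}

If the latency gap between the Xen and bare-metal runs persists even with the buffer populated and locked like this, page faults can be ruled out as the confounding factor.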