With a Linux 3.2.23 PVOPs and Xen 4.0.4, I am observing that boot speed takes a hit when the following per_cpu access is made in arch/x86/xen/enlighthen.c:xen_vcpu_setup(). The slow down is caused by the term on the left of the equal sign, ie, access to the per cpu area. As you see from the timed samples below, the access can take anywhere from .6 to 1.2 secs (though I have seen it take 1.6 secs every single time with Linux 3.2.16 and Xen 4.0.3). Subsequent iterations from the loop do not incur the expense. It also doesn''t happen on older CPUs from what I can tell, only in the newer Sandybridge Intel i-series processors. My initial guess is that there is some sort of page fault taking place but I haven''t been able to confirm that. Any ideas on how to track this farther down into Xe n to isolate and understand what is going on? Thank you. Roger R. Cruz printk(KERN_ALERT "in cpu loop %d for shared info", cpu); ===> per_cpu(xen_vcpu,cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu]; printk(KERN_ALERT "in cpu loop %d computed shared info", cpu); Oct 25 20:24:11 23445LU-PBBFMWY kernel: [ 0.000000] in cpu loop 0 for shared info Oct 25 20:24:11 23445LU-PBBFMWY kernel: [ 1.238877] in cpu loop 0 computed shared info Oct 25 20:52:53 23445LU-PBBFMWY kernel: [ 0.000000] in cpu loop 0 for shared info Oct 25 20:52:53 23445LU-PBBFMWY kernel: [ 0.718365] in cpu loop 0 computed shared info Oct 25 20:41:27 23445LU-PBBFMWY kernel: [ 0.000000] in cpu loop 0 for shared info Oct 25 20:41:27 23445LU-PBBFMWY kernel: [ 0.671690] in cpu loop 0 computed shared info
On Fri, 2012-10-26 at 18:30 +0100, Roger Cruz wrote:> As you see from the timed samples below, the access can take anywhere > from .6 to 1.2 secs (though I have seen it take 1.6 secs every single > time with Linux 3.2.16 and Xen 4.0.3).Isn''t this just because setting up the per-cpu shared info makes the time available to printk. Note that the timestamp is zero beforehand (a big hint that time isn''t setup yet). Also see xen_clocksource_read(). IOW I think what is being measured here is the time from start of day until the second printk, not the time between the two prints. [...]> printk(KERN_ALERT "in cpu loop %d for shared info", cpu); > ===> per_cpu(xen_vcpu,cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu]; > printk(KERN_ALERT "in cpu loop %d computed shared info", cpu); > > Oct 25 20:24:11 23445LU-PBBFMWY kernel: [ 0.000000] in cpu loop 0 for shared info > Oct 25 20:24:11 23445LU-PBBFMWY kernel: [ 1.238877] in cpu loop 0 computed shared info > > Oct 25 20:52:53 23445LU-PBBFMWY kernel: [ 0.000000] in cpu loop 0 for shared info > Oct 25 20:52:53 23445LU-PBBFMWY kernel: [ 0.718365] in cpu loop 0 computed shared info > > Oct 25 20:41:27 23445LU-PBBFMWY kernel: [ 0.000000] in cpu loop 0 for shared info > Oct 25 20:41:27 23445LU-PBBFMWY kernel: [ 0.671690] in cpu loop 0 computed shared info > > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
Well, that may be a good clue or reason for what I have been thinking was a long time to setup the zonelists (that is how it shows up when looking at an untouched kernel). So if I understand your response correctly.. the time displayed will always be zero until the per_cpu area is set up? ________________________________________ From: Ian Campbell [Ian.Campbell@citrix.com] Sent: Friday, October 26, 2012 2:38 PM To: Roger Cruz Cc: xen-devel@lists.xen.org Subject: Re: [Xen-devel] Tracking down a boot speed issue On Fri, 2012-10-26 at 18:30 +0100, Roger Cruz wrote:> As you see from the timed samples below, the access can take anywhere > from .6 to 1.2 secs (though I have seen it take 1.6 secs every single > time with Linux 3.2.16 and Xen 4.0.3).Isn''t this just because setting up the per-cpu shared info makes the time available to printk. Note that the timestamp is zero beforehand (a big hint that time isn''t setup yet). Also see xen_clocksource_read(). IOW I think what is being measured here is the time from start of day until the second printk, not the time between the two prints. [...]> printk(KERN_ALERT "in cpu loop %d for shared info", cpu); > ===> per_cpu(xen_vcpu,cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu]; > printk(KERN_ALERT "in cpu loop %d computed shared info", cpu); > > Oct 25 20:24:11 23445LU-PBBFMWY kernel: [ 0.000000] in cpu loop 0 for shared info > Oct 25 20:24:11 23445LU-PBBFMWY kernel: [ 1.238877] in cpu loop 0 computed shared info > > Oct 25 20:52:53 23445LU-PBBFMWY kernel: [ 0.000000] in cpu loop 0 for shared info > Oct 25 20:52:53 23445LU-PBBFMWY kernel: [ 0.718365] in cpu loop 0 computed shared info > > Oct 25 20:41:27 23445LU-PBBFMWY kernel: [ 0.000000] in cpu loop 0 for shared info > Oct 25 20:41:27 23445LU-PBBFMWY kernel: [ 0.671690] in cpu loop 0 computed shared info > > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
A question remains in my mind as why the times are variable. Why does it take .67 secs and 1.6 secs other times for exactly the same boot code? ________________________________________ From: Ian Campbell [Ian.Campbell@citrix.com] Sent: Friday, October 26, 2012 2:38 PM To: Roger Cruz Cc: xen-devel@lists.xen.org Subject: Re: [Xen-devel] Tracking down a boot speed issue On Fri, 2012-10-26 at 18:30 +0100, Roger Cruz wrote:> As you see from the timed samples below, the access can take anywhere > from .6 to 1.2 secs (though I have seen it take 1.6 secs every single > time with Linux 3.2.16 and Xen 4.0.3).Isn''t this just because setting up the per-cpu shared info makes the time available to printk. Note that the timestamp is zero beforehand (a big hint that time isn''t setup yet). Also see xen_clocksource_read(). IOW I think what is being measured here is the time from start of day until the second printk, not the time between the two prints. [...]> printk(KERN_ALERT "in cpu loop %d for shared info", cpu); > ===> per_cpu(xen_vcpu,cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu]; > printk(KERN_ALERT "in cpu loop %d computed shared info", cpu); > > Oct 25 20:24:11 23445LU-PBBFMWY kernel: [ 0.000000] in cpu loop 0 for shared info > Oct 25 20:24:11 23445LU-PBBFMWY kernel: [ 1.238877] in cpu loop 0 computed shared info > > Oct 25 20:52:53 23445LU-PBBFMWY kernel: [ 0.000000] in cpu loop 0 for shared info > Oct 25 20:52:53 23445LU-PBBFMWY kernel: [ 0.718365] in cpu loop 0 computed shared info > > Oct 25 20:41:27 23445LU-PBBFMWY kernel: [ 0.000000] in cpu loop 0 for shared info > Oct 25 20:41:27 23445LU-PBBFMWY kernel: [ 0.671690] in cpu loop 0 computed shared info > > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
Please don''t top-post. On Fri, 2012-10-26 at 19:55 +0100, Roger Cruz wrote:> So if I understand your response correctly.. the time displayed will > always be zero until the per_cpu area is set up?Correct. Ian.
On Fri, 2012-10-26 at 20:02 +0100, Roger Cruz wrote:> A question remains in my mind as why the times are variable. Why does > it take .67 secs and 1.6 secs other times for exactly the same boot > code?Something before this point takes a variable amount of time perhaps? I don''t know what happens before this point in the boot. Ian.
>>> On 27.10.12 at 12:46, Ian Campbell <Ian.Campbell@citrix.com> wrote: > On Fri, 2012-10-26 at 20:02 +0100, Roger Cruz wrote: >> A question remains in my mind as why the times are variable. Why does >> it take .67 secs and 1.6 secs other times for exactly the same boot >> code? > > Something before this point takes a variable amount of time perhaps? I > don''t know what happens before this point in the boot.This may be a direct effect of not being the only VM on the underlying host (i.e. there can be any amount of stolen time included in that initial time period). Jan