Hi all,

I'm currently developing and testing a new scheduler for Xen, and I am seeing some very strange behaviour which I can't seem to pinpoint. For benchmarking purposes, I am running a task inside Mini-OS in a tight, busy-spinning loop for some time. The loop repeatedly polls NOW() until it exceeds a certain time limit. What I am observing is that NOW() seems to "jump" sometimes: two subsequent reads return values which differ by tens of milliseconds!

I notice that my scheduler gets invoked a couple of times, but it does *not* switch to another VCPU, and I doubt that the scheduler invocations alone take that long. So the loop should indeed be spinning continuously, with sporadic interruptions in the range of a few microseconds, but not tens of milliseconds. Yet this is not what I am seeing. I wonder where the (P)CPU goes during those time intervals, and so this possibly weird idea came up that Xen might use some trickery to detect and pause busy-spinning VCPUs. Is there anything like that in Xen (BTW: this is xen-3.2.1), and, if there is, can it be disabled for a given domain?

(Sorry if this is a silly question. Since my code is experimental and not well tested yet, there is of course the possibility that I made some stupid mistake. However, I've been staring at code, debug logs, etc. for several days now without much success, and I am slowly getting desperate. If Xen really does pause spinning VCPUs, it would explain everything.)

Thanks for any help

Rob

--
Robert Kaiser                                    http://wwwvs.informatik.fh-wiesbaden.de
Labor für Verteilte Systeme                      kaiser@informatik.fh-wiesbaden.de
FH Wiesbaden - University of Applied Sciences    tel: (+49)611-9495-1294
Kurt-Schumacher-Ring 18, 65197 Wiesbaden, Germany    fax: (+49)611-9495-1289

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
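For readers following the thread, the measurement described in the post above can be sketched roughly as follows. This is a minimal illustration and not the poster's actual code: `now_ns()`, `struct gap` and `spin_and_record_gaps()` are hypothetical names, and POSIX `clock_gettime()` stands in for the Mini-OS `NOW()` macro so the sketch can be built outside Mini-OS.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Stand-in for Mini-OS NOW() (assumption): nanoseconds from a monotonic clock. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* One observed "jump": the two consecutive clock reads that bracket it. */
struct gap {
    uint64_t before_ns;
    uint64_t after_ns;
};

/*
 * Busy-spin for duration_ns and record every pair of consecutive reads
 * whose difference exceeds threshold_ns. Only the first max_gaps entries
 * are stored; the return value is the total number of gaps seen.
 */
static size_t spin_and_record_gaps(uint64_t duration_ns, uint64_t threshold_ns,
                                   struct gap *gaps, size_t max_gaps)
{
    assert(gaps != NULL || max_gaps == 0);

    uint64_t start = now_ns();
    uint64_t prev = start;
    size_t seen = 0;

    for (;;) {
        uint64_t cur = now_ns();
        if (cur - prev > threshold_ns) {
            if (seen < max_gaps) {
                gaps[seen].before_ns = prev;
                gaps[seen].after_ns = cur;
            }
            seen++;
        }
        prev = cur;
        if (cur - start >= duration_ns)
            break;
    }
    return seen;
}
```

Recording the bracketing timestamps, rather than only counting jumps, makes it possible to line the gaps up later with scheduler or trace logs.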
Daniel Magenheimer
2008-Sep-19 14:36 UTC
RE: [Xen-devel] Does Xen detect busy-spinning VCPUs?
Is your Mini-OS pinned, and are you sure dom0 or other domains are not getting a piece of the PCPU? If so...

I've seen anecdotal evidence of long pauses that led me to wonder about interrupt latency here:

http://lists.xensource.com/archives/html/xen-devel/2008-08/msg00232.html

I don't recall the situation or the length of the pause, but perhaps you are seeing something similar. Unfortunately, I never pursued the answer to the interrupt latency question.
Daniel,

thanks for your response!

On Friday, 19 September 2008 at 16:36:13, Daniel Magenheimer wrote:
> Is your mini-OS pinned and you're sure dom0 or other
> domains are not getting a piece of the pcpu? If so...

Well, as I said: it's my own scheduler that decides. I can see that it gets invoked a few times, but I checked that it always returns the VCPU that is running the spinning loop. Thus, if everything outside my scheduler plays by (what I think are) the rules, that VCPU should be the only one to get access to the PCPU (except for interrupt-level activities, of course). So, just _assuming_ that interrupt processing does not eat up those tens of milliseconds, where else can they possibly go?

Any hints as to how I could proceed to pinpoint this problem? So far I have debugged my code by running the entire system on Qemu, using its built-in debug stub. However, anything time-related behaves completely differently on Qemu than on real hardware, so I can't use that setup any more. Presently, I'm trying to get Xen's built-in GDB stub to work, and I wonder if that will be any better than Qemu. AFAIU, the stub would have to preserve the TSC register contents across breakpoints, otherwise the time coordinate perceived by the system will jump erratically. I'm not sure it really does that, so this may turn out to be another dead end -- oh well...

> I've seen anecdotal evidence of long pauses that led me
> to wonder about interrupt latency here:
>
> http://lists.xensource.com/archives/html/xen-devel/2008-08/msg00232.html
>
> I don't recall the situation or the length of the pause
> but perhaps you are seeing something similar.

I am seeing situations where two subsequent calls to NOW() in the Mini-OS context deliver time coordinates that differ by 95(!) milliseconds. If this were due to interrupt latencies, surely that would have been noticed by someone?

Cheers

Rob
Geoffrey Lefebvre
2008-Sep-19 15:57 UTC
Re: [Xen-devel] Does Xen detect busy-spinning VCPUs?
> I am seeing situations where two subsequent calls to NOW() in the Mini-OS
> context deliver time coordinates that differ by 95(!) milliseconds. If this
> were due to interrupt latencies, surely that would have been noticed by
> someone?

These are long latencies. Have you tried comparing the values returned by NOW() in Mini-OS with reads from the TSC? NOW() is probably correct, but it's a good thing to rule out.

geoffrey
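The cross-check suggested above could look roughly like the sketch below. It assumes each sample pairs a NOW() value with a raw TSC read taken at (nearly) the same moment, and that the TSC frequency in kHz is known; `struct sample`, `tsc_delta_to_ns()` and `now_vs_tsc_skew_ns()` are hypothetical names, not Mini-OS or Xen APIs.

```c
#include <assert.h>
#include <stdint.h>

/* One paired reading: NOW() in nanoseconds and the raw TSC value. */
struct sample {
    uint64_t now_ns;
    uint64_t tsc;
};

/* Convert a TSC cycle count to nanoseconds: ns = cycles * 1e6 / kHz.
 * Split into quotient and remainder to avoid overflowing 64 bits. */
static uint64_t tsc_delta_to_ns(uint64_t tsc_delta, uint64_t tsc_khz)
{
    assert(tsc_khz != 0);
    return (tsc_delta / tsc_khz) * 1000000ull
         + ((tsc_delta % tsc_khz) * 1000000ull) / tsc_khz;
}

/*
 * Difference between the elapsed time according to NOW() and the elapsed
 * time according to the TSC, for two samples a (earlier) and b (later).
 * Near zero: both clocks agree that the time really passed.
 * Large positive: NOW() advanced much further than the TSC did.
 */
static int64_t now_vs_tsc_skew_ns(struct sample a, struct sample b,
                                  uint64_t tsc_khz)
{
    uint64_t by_now = b.now_ns - a.now_ns;
    uint64_t by_tsc = tsc_delta_to_ns(b.tsc - a.tsc, tsc_khz);
    return (int64_t)by_now - (int64_t)by_tsc;
}
```

If a 95 ms jump in NOW() is matched by a 95 ms advance in the TSC, the time genuinely elapsed while the loop was not running; if the TSC barely moved, the clock computation itself would be the suspect.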
Daniel Magenheimer
2008-Sep-19 16:13 UTC
RE: [Xen-devel] Does Xen detect busy-spinning VCPUs?
What does NOW() translate to in your Mini-OS? Is it built on a TSC read or on top of the paravirtualized time parameters exported by Xen via the shared page?

Once Xen is entered (for scheduling or anything else), Xen can choose to do anything it wants. It may do some housekeeping tasks, for example (there's code to zero out pages from a destroyed domain that gets executed asynchronously). But I'd think 95 ms is excessive, so if you are absolutely certain Xen is not scheduling another domain on that physical CPU during that timeslice, nor moving the Mini-OS to another physical CPU, this is very worthwhile to track down.

> surely that would have been noticed by someone?

Perhaps not, because most real environments have lots of scheduling events that could cause gaps like that.

One possible thought: if you turn on tracing (xentrace) and can isolate the suspect interval in the trace, you might get some clues as to what is happening.

Last, Mukesh Rathor has some new gdb technology working with Xen, though, as you've already discovered, time and debugging don't mix well. See Mukesh's talk at the last Xen summit for more info.

Dan
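As background to the question above about how NOW() is built: a paravirtualized guest typically combines a raw TSC read with the per-VCPU time parameters that Xen publishes in the shared info page. The sketch below shows only that arithmetic. `struct time_params` is a simplified, hypothetical stand-in whose fields mirror the names in Xen's `vcpu_time_info` (padding and the version-check retry loop that a real reader needs are omitted), and `scale_delta()` and `system_time_ns()` are illustrative helpers rather than the exact Mini-OS functions.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in (assumption) for the time fields Xen exports per VCPU. */
struct time_params {
    uint32_t version;           /* odd while Xen is updating the record   */
    uint64_t tsc_timestamp;     /* TSC value when system_time was sampled */
    uint64_t system_time;       /* ns since boot at tsc_timestamp         */
    uint32_t tsc_to_system_mul; /* 32-bit binary fraction: ns per tick    */
    int8_t   tsc_shift;         /* pre-shift applied to the TSC delta     */
};

/* Scale a TSC delta to nanoseconds: ((delta << shift) * mul_frac) >> 32. */
static uint64_t scale_delta(uint64_t delta, uint32_t mul_frac, int8_t shift)
{
    assert(shift > -64 && shift < 64);

    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;

    /* 64x32-bit multiply, keeping bits 32 and up of the product. */
    uint64_t hi = (delta >> 32) * mul_frac;
    uint64_t lo = (delta & 0xffffffffull) * mul_frac;
    return hi + (lo >> 32);
}

/* Current system time in ns, given a parameter snapshot and a TSC read. */
static uint64_t system_time_ns(const struct time_params *t, uint64_t tsc)
{
    return t->system_time
         + scale_delta(tsc - t->tsc_timestamp, t->tsc_to_system_mul,
                       t->tsc_shift);
}
```

Because the result depends on both the snapshot and the TSC read, a NOW() built this way can only be trusted if the snapshot it uses is consistent and belongs to the physical CPU whose TSC is being read, which is one reason comparing against raw TSC values is a useful sanity check.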
Robert Kaiser wrote:
> Well as I said: It's my own scheduler that decides. I can see that it gets
> invoked a few times, but I checked that it always returns the VCPU that is
> running the spinning loop. Thus, if everything outside my scheduler plays by
> (what I think are) the rules, that VCPU should be the only one to get access
> to the PCPU. (Except for interrupt-level activities, of course). So, just
> _assuming_ that interrupt processing does not eat up those tens of
> milliseconds, where else can they possibly go?
>
> Any hints as to how I could proceed to pinpoint this problem?

Try your Mini-OS test domain without your own changes to the Xen scheduler. When you run the test, use a uniprocessor dom0 bound to CPU 0, and bind your Mini-OS test domain to CPU 1. This will verify your test domain code independently of your Xen scheduler changes.

If your test domain is still seeing large time jumps, verify that the idle VCPU for CPU 1 is not getting any CPU time. If it is, your test domain is doing something that is causing it to block.

Steve