Hello,

I'm new to using xentrace and xenalyze and I am having problems running xenalyze on a large trace file. It always gives me a fatal error. If I run it on something like a 30 second trace it seems to work fine.

Is this a known issue or am I possibly doing something wrong? Do you think it would work if I truncate the file, or would it be missing stuff xenalyze expects? If there is no way to truncate it, perhaps I'll see if I can modify it to only show me a certain time frame - I haven't looked at the code yet, so I guess I'll have to see if that is possible.

I'm using xen3.4.3 with rhel5.4 dom0 running a rhel5.4 vm.

I'm trying to debug a vm hang at boot which sporadically occurs, so I just have trace running while I do a bunch of creates and deletes, and the trace file gets fairly large. If you have other ideas what might work better I would be interested in hearing them.

-------
-bash-3.2$ ls -la trace.raw
-rw-r--r-- 1 root root 13238044416 Jul 7 23:02 trace.raw
-bash-3.2$ xenalyze/xenalyze --cpu-hz=2.43G --summary trace.raw > out

-------
..
..
..
..
runstate_change old_runstate blocked, d1402v0 runstate runnable. Possible tsc skew.
runstate_change old_runstate blocked, d1402v0 runstate runnable. Possible tsc skew.
runstate_change old_runstate runnable, d1402v0 runstate running. Possible tsc skew.
Not updating.
FATAL: p->current null
] 20f101(20:f:101) 3 [ 802061ea ffffffff f ]
-----

Any help is appreciated,
Thanks,
Tom Graves

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
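
On the truncation question above: a raw trace can only be cut cleanly on a record boundary. The following is a minimal sketch of finding such a boundary, assuming the record layout from Xen's public trace.h header (a 32-bit header word with the event ID in bits 0-27, the number of extra 32-bit words in bits 28-30, and a flag in bit 31 indicating that a 64-bit TSC follows) and assuming little-endian data as written on x86. The function names record_size and last_boundary are hypothetical, and whether xenalyze accepts a file truncated this way is not established anywhere in this thread.

```python
import struct

# Assumed header layout (from Xen's public trace.h):
TRC_HD_CYCLE_FLAG = 1 << 31   # bit 31: a 64-bit TSC follows the header
TRC_HD_EXTRA_SHIFT = 28       # bits 28-30: count of extra 32-bit words
TRC_HD_EXTRA_MASK = 0x7


def record_size(header: int) -> int:
    """Size in bytes of one trace record, given its 32-bit header word."""
    words = 1                                  # the header word itself
    if header & TRC_HD_CYCLE_FLAG:
        words += 2                             # low and high halves of the TSC
    words += (header >> TRC_HD_EXTRA_SHIFT) & TRC_HD_EXTRA_MASK
    return 4 * words


def last_boundary(buf: bytes, limit: int) -> int:
    """Largest offset <= limit that falls on a record boundary in buf.

    Hypothetical helper: walks records from offset 0 and stops before the
    first record that would cross `limit` or run past the end of `buf`.
    """
    offset = 0
    while offset + 4 <= len(buf):
        (header,) = struct.unpack_from("<I", buf, offset)  # little-endian assumed
        size = record_size(header)
        if offset + size > limit or offset + size > len(buf):
            break
        offset += size
    return offset
```

This works on an in-memory buffer for clarity; a 13 GB trace would need the same walk done over a buffered file read instead.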
The file length itself probably isn't that important; rather, longer trace files give certain kinds of rare, probabilistic problem events more opportunity to occur.

The problem here looks like TSC skew -- xenalyze is having trouble figuring out how to process the records in the right order because of drift in the TSC value across cores (that's what the "Possible tsc skew" messages are about), and it ends up breaking an assumption as a result (hence the "FATAL: p->current null" message).

Can you give me the cs of the tip of your hg tree? I'll take a look and see if I have a local fix.

-George

On Thu, Jul 8, 2010 at 3:09 PM, Thomas Graves <tgraves@yahoo-inc.com> wrote:
> I'm new to using xentrace and xenalyze and I am having problems running
> xenalyze on a large trace file. It is always giving me a fatal error. If I
> run it on like a 30 second trace it seems to work fine.
> [...]
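
To illustrate the failure mode described above, here is a toy sketch of how merging per-CPU records by TSC can produce an impossible runstate sequence when one core's TSC lags another's. It is not xenalyze's code: the record tuples, the transition table, the TSC values and the names merge_by_tsc and check are all assumptions made up for the illustration.

```python
import heapq

# Simplified set of legal runstate transitions (an assumption for this toy model).
LEGAL = {
    ("blocked", "runnable"),
    ("runnable", "running"),
    ("running", "blocked"),
    ("running", "runnable"),
}


def merge_by_tsc(per_cpu):
    """Merge per-CPU record lists (each already sorted by TSC) into one list."""
    return list(heapq.merge(*per_cpu, key=lambda rec: rec[0]))


def check(records, start="blocked"):
    """Return the records whose old state contradicts the state tracked so far."""
    state, bad = start, []
    for tsc, old, new in records:
        if old != state or (old, new) not in LEGAL:
            bad.append((tsc, old, new))
        state = new
    return bad


# Records are (tsc, old_runstate, new_runstate) for a single vcpu.
cpu0 = [(100, "blocked", "runnable")]                 # wake-up logged on CPU 0
cpu1 = [(150, "runnable", "running"),                 # scheduled on CPU 1
        (300, "running", "blocked")]

# Same events, but CPU 1's TSC reads 80 cycles behind CPU 0's.
cpu1_skewed = [(tsc - 80, old, new) for tsc, old, new in cpu1]

in_sync = check(merge_by_tsc([cpu0, cpu1]))           # no contradictions
skewed = check(merge_by_tsc([cpu0, cpu1_skewed]))     # every record looks wrong
```

With synchronized TSCs the merged stream is consistent. With the skew, the "runnable -> running" record sorts ahead of the wake-up that caused it, which is the kind of contradiction the "Possible tsc skew" warnings report.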
-bash-3.2$ hg id
503e0902a86a+ tip
-bash-3.2$ hg parents
changeset:   49:503e0902a86a
tag:         tip
user:        George Dunlap <george.dunlap@eu.citrix.com>
date:        Tue Jun 22 17:11:51 2010 +0100
summary:     More xenalyze type fixes

I'm using a clone of http://xenbits.xensource.com/ext/xenalyze.hg, patched with "patch -p1 < back-patches/3.4.diff" and built with make on rhel5.4.

Let me know if you

Thanks,
Tom

On 7/8/10 9:24 AM, "George Dunlap" <George.Dunlap@eu.citrix.com> wrote:
> Can you give me the cs of the tip of your hg tree? I'll take a look
> and see if I have a local fix.
I had a work-around for the problem in a local patch-queue somewhere. I've pushed it (along with a bunch of local stuff I had lying around) -- do a pull and let me know if it works better.

-George

On Thu, Jul 8, 2010 at 3:46 PM, Thomas Graves <tgraves@yahoo-inc.com> wrote:
> -bash-3.2$ hg id
> 503e0902a86a+ tip
> [...]
Thanks for the updates. It ran a lot longer than before but it still ended up failing. I'll try truncating the file and a few other things. Let me know if you have any other ideas.

runstate_change old_runstate runnable, d1402v0 runstate blocked. Possible tsc skew.
runstate_change old_runstate runnable, d1402v0 runstate blocked. Possible tsc skew.
runstate_change old_runstate runnable, d1402v0 runstate blocked. Possible tsc skew.
runstate_change old_runstate runnable, d1402v0 runstate blocked. Possible tsc skew.
runstate_change old_runstate runnable, d1402v0 runstate blocked. Possible tsc skew.
runstate_change old_runstate runnable, d1402v0 runstate blocked. Possible tsc skew.
runstate_change old_runstate runnable, d1402v0 runstate blocked. Possible tsc skew.
runstate_change old_runstate runnable, d1402v0 runstate blocked. Possible tsc skew.
runstate_change old_runstate runnable, d1402v0 runstate blocked. Possible tsc skew.
runstate_change old_runstate blocked, d1402v0 runstate runnable. Possible tsc skew.
runstate_change old_runstate blocked, d1402v0 runstate runnable. Possible tsc skew.
runstate_change old_runstate runnable, d1402v0 runstate running. Possible tsc skew.
Not updating.
FATAL: p->current null
] 20f101(20:f:101) 3 [ 802061ea ffffffff f ]

Tom

On 7/8/10 11:36 AM, "George Dunlap" <George.Dunlap@eu.citrix.com> wrote:
> I had a work-around for the problem in a local patch-queue somewhere.
> I've pushed it (along with a bunch of local stuff I had lying around)
> -- do a pull and let me know if it works better.
OK, I unified all of the p->current checks, so all of them now only issue a warning and skip that record. Try it now.

On my to-do list is to add a Lamport clock to the runstate change trace records. Hopefully that will solve the intractable TSC drift problems once and for all.

-George

On 08/07/10 23:08, Thomas Graves wrote:
> Thanks for the updates. It ran a lot longer than before but it still
> ended up failing. I'll try truncating the file and a few other things.
> Let me know if you have any other ideas.
> [...]
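
As background on the Lamport clock idea mentioned above, here is a minimal sketch of how a logical stamp on runstate-change records could order them independently of the TSC. This is only an illustration of the general technique, not Xen's trace format or any planned implementation: the LamportTracer class, its fields and the record dictionary are all hypothetical.

```python
class LamportTracer:
    """Toy tracer that stamps runstate changes with a Lamport-style clock."""

    def __init__(self, ncpus):
        self.cpu_clock = [0] * ncpus   # one logical clock per physical CPU
        self.vcpu_stamp = {}           # last stamp carried by each vcpu
        self.records = []

    def runstate_change(self, cpu, local_tsc, vcpu, old, new):
        # Lamport rule: advance past both the local clock and the stamp
        # the vcpu brought with it from whichever CPU touched it last.
        stamp = max(self.cpu_clock[cpu], self.vcpu_stamp.get(vcpu, 0)) + 1
        self.cpu_clock[cpu] = stamp
        self.vcpu_stamp[vcpu] = stamp
        self.records.append(
            {"cpu": cpu, "tsc": local_tsc, "stamp": stamp,
             "vcpu": vcpu, "old": old, "new": new}
        )


# Events in their true order; CPU 1's TSC reads behind CPU 0's (made-up values).
tracer = LamportTracer(ncpus=2)
tracer.runstate_change(0, 100, "d1402v0", "blocked", "runnable")
tracer.runstate_change(1, 70, "d1402v0", "runnable", "running")
tracer.runstate_change(1, 220, "d1402v0", "running", "blocked")

by_tsc = [r["new"] for r in sorted(tracer.records, key=lambda r: r["tsc"])]
by_stamp = [r["new"] for r in sorted(tracer.records, key=lambda r: r["stamp"])]
```

Sorting by TSC puts "running" before the wake-up that made the vcpu runnable, while sorting by stamp recovers the causal order, because the stamp only depends on which event handed the vcpu to which, not on any core's cycle counter.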