Hi all, sorry to intrude on xen-devel, but I think I need direction
from the expertise here. I've admin'd Xen servers of various flavors
for a couple years, but never seen this before. After a period ranging
from several hours to several days, my primary database and
development DomU completely locks up. Net disconnects, but CPU(sec)
continues to tick in xentop. No errors, and nothing logged. All dom's
are CentOS, so I'm pasting below what I've already posted to
centos-devel and centos-virt.

On Mon, Jul 14, 2008 at 3:49 PM, Jerry Amundson <jamundso@gmail.com> wrote:
> Two Dell 6950 (now called R905, 4 Dual-Core AMD Opteron 8200 series)
> heartbeat/drbd nodes running the stock CentOS 5.2 Dom0. The domU's are
> the only resources in heartbeat.
> Dom1 is a perfectly running, updated, CentOS 5.2 Apache/MySQL/Samba
> Dom2 is a CentOS 4.6 software development and database server

So crash tells me that Dom2 gets to this point:

    SYSTEM MAP: System.map-2.6.9-67.0.20.ELxenU
  DEBUG KERNEL: /usr/lib/debug/lib/modules/2.6.9-67.0.20.ELxenU/vmlinux
                (2.6.9-67.0.20.ELxenU)
      DUMPFILE: /public/IntSys/tmp/m1.dmp
          CPUS: 6
          DATE: Mon Jul 14 11:53:59 2008
        UPTIME: 6 days, 11:39:33
  LOAD AVERAGE: 548.07, 542.95, 434.99
         TASKS: 2721
      NODENAME: monolith
       RELEASE: 2.6.9-67.0.20.ELxenU
       VERSION: #1 SMP Thu Jun 26 08:36:44 EDT 2008
       MACHINE: x86_64 (2194 Mhz)
        MEMORY: 10 GB
         PANIC: ""
           PID: 0
       COMMAND: "swapper"
          TASK: ffffffff80322b40 (1 of 6) [THREAD_INFO: ffffffff80426000]
           CPU: 0
         STATE: TASK_RUNNING
  WARNING: panic task not found

  crash> bt
  PID: 0 TASK: ffffffff80322b40 CPU: 0 COMMAND: "swapper"
   #0 [ffffffff80427ec0] schedule at ffffffff80294d9a
   #1 [ffffffff80427f98] cpu_idle at ffffffff8010b85d

  crash> kmem -i
                PAGES        TOTAL      PERCENTAGE
   TOTAL MEM  2621696        10 GB         ----
        FREE     8884      34.7 MB    0% of TOTAL MEM
        USED  2612812        10 GB   99% of TOTAL MEM
      SHARED        0            0    0% of TOTAL MEM
     BUFFERS    59585     232.8 MB    2% of TOTAL MEM
      CACHED  1325825       5.1 GB   50% of TOTAL MEM
        SLAB   358565       1.4 GB   13% of TOTAL MEM

  TOTAL HIGH        0            0    0% of TOTAL MEM
   FREE HIGH        0            0    0% of TOTAL HIGH
   TOTAL LOW  2621696        10 GB  100% of TOTAL MEM
    FREE LOW     8884      34.7 MB    0% of TOTAL LOW

  kmem: swap_info[0].swap_map at ffffff00001ea000 is unaccessible

So I see where the DomU is, but how did it get there? Can I find out
from crash, or do I need something "real-time" within the DomU? Of
course, searching has given me nothing to go on, hence this post, but
I'll continue...

Tia,
jerry

--
"Your life is trite and jaded, boring and confiscated." - Twisted Sister

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
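As a rough cross-check of the figures in the dump, the sketch below converts the kmem -i page counts into sizes and relates the load average to the CPU count. It assumes 4 KiB pages, which is typical for x86_64 but is not stated in the dump, and the helper names (pages_to_mib, percent_of, load_per_cpu) are made up for illustration.

```python
# Cross-check of the "kmem -i" and load-average figures quoted in the post.
# Assumption: 4 KiB pages (typical for x86_64; the dump does not say so).
PAGE_SIZE = 4096


def pages_to_mib(pages, page_size=PAGE_SIZE):
    """Convert a page count into MiB."""
    return pages * page_size / 2**20


def percent_of(part, total):
    """Return part as a percentage of total."""
    return 100.0 * part / total


def load_per_cpu(load_average, cpus):
    """Average number of runnable or blocked tasks per CPU."""
    return load_average / cpus


# Page counts copied from the kmem -i output in the post.
TOTAL_PAGES = 2621696
FREE_PAGES = 8884
CACHED_PAGES = 1325825
SLAB_PAGES = 358565

if __name__ == "__main__":
    for name, pages in [("total", TOTAL_PAGES), ("free", FREE_PAGES),
                        ("cached", CACHED_PAGES), ("slab", SLAB_PAGES)]:
        print(f"{name:>6}: {pages_to_mib(pages):10.1f} MiB "
              f"({percent_of(pages, TOTAL_PAGES):5.1f}% of total)")
    # 1-minute load average and CPU count, also from the dump header.
    print(f"load per CPU: {load_per_cpu(548.07, 6):.1f}")
```

Under that page-size assumption the numbers agree with what crash printed. Since Linux counts both runnable and uninterruptible tasks in the load average, a per-CPU figure this high may point at tasks stuck waiting rather than at memory exhaustion alone, though the dump excerpt by itself does not prove that.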
Jerry Amundson
2008-Jul-17 18:02 UTC
[Xen-devel] Re: Please help: domU becomes unresponsive
More info from crash... Can someone enlighten me as to what [swapper]
is attempting here?

   PID  PPID  CPU        TASK         ST  %MEM   VSZ   RSS  COMM
     0     0    0  ffffffff80322b40   RU   0.0     0     0  [swapper]
     0     1    1  ffffff800e7b1030   RU   0.0     0     0  [swapper]
     0     1    2  ffffff80006727f0   RU   0.0     0     0  [swapper]
     0     1    3  ffffff8000672030   RU   0.0     0     0  [swapper]
     0     1    4  ffffff80000057f0   RU   0.0     0     0  [swapper]
     0     1    5  ffffff8000005030   RU   0.0     0     0  [swapper]
     1     0    2  ffffff800e7b17f0   IN   0.0  4756   552  init
     2     1    0  ffffff80000037f0   IN   0.0     0     0  [migration/0]
     3     1    0  ffffff8000003030   IN   0.0     0     0  [ksoftirqd/0]
     4     1    0  ffffff80000a67f0   IN   0.0     0     0  [events/0]
     5     1    1  ffffff80000a6030   IN   0.0     0     0  [khelper]
     6     1    4  ffffff82800037f0   IN   0.0     0     0  [kthread]
     7     6    0  ffffff8280003030   IN   0.0     0     0  [xenwatch]
     8     6    0  ffffff82800127f0   IN   0.0     0     0  [xenbus]
    16     6    1  ffffff8280012030   IN   0.0     0     0  [migration/1]
    17     6    1  ffffff82800677f0   IN   0.0     0     0  [ksoftirqd/1]
    18     6    1  ffffff8280067030   RU   0.0     0     0  [events/1]
    20     6    2  ffffff82800697f0   IN   0.0     0     0  [migration/2]
    21     6    2  ffffff8280069030   IN   0.0     0     0  [ksoftirqd/2]
    22     6    2  ffffff82800b67f0   IN   0.0     0     0  [events/2]
    24     6    3  ffffff82800b6030   IN   0.0     0     0  [migration/3]
    25     6    3  ffffff80007047f0   IN   0.0     0     0  [ksoftirqd/3]

On Wed, Jul 16, 2008 at 9:50 AM, Jerry Amundson <jamundso@gmail.com> wrote:
> Hi all, sorry to intrude on xen-devel, but I think I need direction
> from the expertise here.
[...]

--
"Your life is trite and jaded, boring and confiscated." - Twisted Sister
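With 2721 tasks in the dump, reading the ps listing row by row is slow, so one option is to tally the ST column over the full output. The sketch below is a minimal example that assumes the nine-column layout shown in the listing (PID PPID CPU TASK ST %MEM VSZ RSS COMM); tally_states is a made-up helper name and the sample rows are copied from the post. For context, PID 0 [swapper] is the per-CPU idle task, so its RU entries are expected on idle CPUs.

```python
from collections import Counter


def tally_states(ps_text):
    """Count tasks per state (ST column) in crash 'ps' output.

    Assumes rows of the form: PID PPID CPU TASK ST %MEM VSZ RSS COMM.
    """
    counts = Counter()
    for line in ps_text.splitlines():
        # crash may prefix the active task on each CPU with '>'.
        fields = line.lstrip("> ").split()
        # Skip the header and anything that is not a data row.
        if len(fields) >= 9 and fields[0].isdigit():
            counts[fields[4]] += 1
    return counts


# A few rows copied from the listing in the post.
SAMPLE = """\
PID PPID CPU TASK ST %MEM VSZ RSS COMM
0 0 0 ffffffff80322b40 RU 0.0 0 0 [swapper]
1 0 2 ffffff800e7b17f0 IN 0.0 4756 552 init
7 6 0 ffffff8280003030 IN 0.0 0 0 [xenwatch]
18 6 1 ffffff8280067030 RU 0.0 0 0 [events/1]
"""

if __name__ == "__main__":
    for state, count in tally_states(SAMPLE).most_common():
        print(state, count)
```

Run over the complete ps output, a large count of UN (uninterruptible) tasks would be consistent with the load average of 548 and might suggest that tasks are blocked on I/O, which is a guess to verify rather than a diagnosis.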
Jerry Amundson
2008-Jul-22 03:52 UTC
[Xen-devel] Re: Please help: domU becomes unresponsive
No direction at all to resolve this?

Previously: Bare metal, Intel i686, 8 GB RAM, very stable.
Currently: Xen 3.0.3/4 guest, AMD x86_64, 10 GB RAM, locks up DAILY.

Needless to say, we can't keep the current platform for long.

Bug opened:
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1303

I have xentrace files that look normal to me - just a lot of
  __enter_scheduler
  do_block
  domain_wake
on different CPUs, over and over. That correlates with the hardware
being completely pegged during the lockup - 50% to dom0 and 50% to
domU.

Maybe it's CentOS 4? Maybe it's drbd? Maybe it's the above, combined
with high load and memory usage?
Maybe the answer is right in front of me, but I don't see it, and I'm
frustrated, and my office is frustrated.

jerry

--
"Be a good boy, and always let your conscience be your guide." - The Blue Fairy
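To turn "a lot of the same three events" into numbers, the formatted trace could be tallied per CPU and event name. The sketch below is only illustrative: it assumes text lines shaped like "CPU<n> <tsc> (+<delta>) <event> [...]", which depends on the formats file passed to xentrace_format, and the sample lines are invented rather than taken from a real trace.

```python
import re
from collections import Counter

# Assumed line shape: "CPU0  1000 (+     100)  do_block  [ ... ]".
# The real layout depends on the formats file given to xentrace_format.
LINE_RE = re.compile(r"^CPU(\d+)\s+\d+\s+\(\+\s*\d+\)\s+(\w+)")


def tally_events(lines):
    """Count occurrences of each (cpu, event_name) pair."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[(int(match.group(1)), match.group(2))] += 1
    return counts


# Invented sample lines, for illustration only.
SAMPLE_LINES = [
    "CPU0  1000 (+     100)  __enter_scheduler  [ hypothetical fields ]",
    "CPU0  1200 (+     200)  do_block  [ hypothetical fields ]",
    "CPU1  1300 (+     100)  domain_wake  [ hypothetical fields ]",
    "CPU0  1500 (+     300)  do_block  [ hypothetical fields ]",
]

if __name__ == "__main__":
    for (cpu, event), count in sorted(tally_events(SAMPLE_LINES).items()):
        print(f"CPU{cpu} {event}: {count}")
```

Comparing such tallies between a healthy period and a lockup might show whether the block/wake cycle is actually more frequent during the hang or only looks that way.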
James Harper
2008-Jul-22 04:11 UTC
RE: [Xen-devel] Re: Please help: domU becomes unresponsive
> Currently: Xen 3.0.3/4 guest, AMD x86_64, 10 GB RAM, locks up DAILY.

In the absence of any other advice, 3.0.x is pretty old and you are
unlikely to find anyone particularly familiar with any particular bugs
in it. I'd be upgrading to 3.2.x as the first step in resolving this
problem.

James
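Before and after an upgrade like this, it can help to confirm which hypervisor version dom0 is really running. The sketch below parses the "key : value" text printed by xm info and compares the xen_major and xen_minor fields against 3.2. The parse_xen_version helper and the sample text are illustrative assumptions, not output captured from the machines in this thread.

```python
def parse_xen_version(xm_info_text):
    """Return (major, minor) from 'xm info' style 'key : value' text."""
    fields = {}
    for line in xm_info_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return int(fields["xen_major"]), int(fields["xen_minor"])


def older_than(version, minimum=(3, 2)):
    """True if the (major, minor) version is below the given minimum."""
    return version < minimum


# Made-up sample in the shape of 'xm info' output.
SAMPLE = """\
host                   : node1.example
xen_major              : 3
xen_minor              : 0
xen_extra              : .3
"""

if __name__ == "__main__":
    version = parse_xen_version(SAMPLE)
    print("running Xen %d.%d, older than 3.2: %s"
          % (version[0], version[1], older_than(version)))
```

On a real dom0 the input text would come from running xm info, for example via subprocess, instead of the SAMPLE string.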
Jerry Amundson
2008-Jul-22 04:28 UTC
Re: [Xen-devel] Re: Please help: domU becomes unresponsive
On Mon, Jul 21, 2008 at 11:11 PM, James Harper <james.harper@bendigoit.com.au> wrote:
>> Currently: Xen 3.0.3/4 guest, AMD x86_64, 10 GB RAM, locks up DAILY.
>
> In the absence of any other advice, 3.0.x is pretty old and you are
> unlikely to find anyone particularly familiar with any particular bugs
> in it. I'd be upgrading to 3.2.x as the first step in resolving this
> problem.

Fair enough. That's something at least, and for that I thank you.

jerry

--
"Be a good boy, and always let your conscience be your guide." - The Blue Fairy
Jerry Amundson
2008-Jul-22 04:36 UTC
Re: [Xen-devel] Re: Please help: domU becomes unresponsive
On Mon, Jul 21, 2008 at 11:11 PM, James Harper <james.harper@bendigoit.com.au> wrote:
>> Currently: Xen 3.0.3/4 guest, AMD x86_64, 10 GB RAM, locks up DAILY.
>
> In the absence of any other advice, 3.0.x is pretty old and you are
> unlikely to find anyone particularly familiar with any particular bugs
> in it. I'd be upgrading to 3.2.x as the first step in resolving this
> problem.

Then again, I'm trying to keep this simple. Trying to shoe horn 3.2
into CentOS 5.2 x86_64 is non-trivial.

jerry

--
"Be a good boy, and always let your conscience be your guide." - The Blue Fairy
Jerry Amundson
2008-Jul-22 05:03 UTC
Re: [Xen-devel] Re: Please help: domU becomes unresponsive
On Mon, Jul 21, 2008 at 11:36 PM, Jerry Amundson <jamundso@gmail.com> wrote:
> Trying to shoe horn 3.2 into CentOS 5.2 x86_64 is non-trivial.

Ok. Found the src.rpm. That helps a bunch...
Sorry, long day!

jerry

--
"Be a good boy, and always let your conscience be your guide." - The Blue Fairy
Jerry Amundson
2008-Jul-26 19:31 UTC
Re: [Xen-devel] Re: Please help: domU becomes unresponsive
On Tue, Jul 22, 2008 at 12:03 AM, Jerry Amundson <jamundso@gmail.com> wrote:
> On Mon, Jul 21, 2008 at 11:36 PM, Jerry Amundson <jamundso@gmail.com> wrote:
>> Trying to shoe horn 3.2 into CentOS 5.2 x86_64 is non-trivial.
>
> Ok. Found the src.rpm. That helps a bunch...
> Sorry, long day!

For the archives, xen-3.2.0-0xs seems to be the solution.
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1303

jerry

--
"Years of Academy training... wasted!" - Buzz Lightyear