thr3ads.net - Xen users - [Xen-devel] Guest CentOS 5.4 64bit on-top XCP 0.5 issue / HP ProLiant BL460c G6 [Feb 2011]

Marco Sinhoreli

2011-Feb-18 17:57 UTC

[Xen-devel] Guest CentOS 5.4 64bit on-top XCP 0.5 issue / HP ProLiant BL460c G6

Hello all:

I'm running a CentOS 5.4 64-bit guest on top of XCP 0.5 (Xen Hypervisor
3.4.2), and it fails to finish booting. During the Linux kernel boot,
it goes into a loop like the one below:

<code>
INFO: task swapper:1 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
swapper       D ffff880004217d88     0     1      0     2               (L-TLB)
 ffff880004217c90  0000000000000246  0000000000000000  ffff8800010f3460
 0000000000000008  ffff88000425c7a0  ffff88001fa51860  0000000000001678
 ffff88000425c988  00000000000520a1
Call Trace:
 [<ffffffff80287879>] __wake_up_common+0x3e/0x68
 [<ffffffff80262fb3>] wait_for_completion+0x7d/0xaa
 [<ffffffff8028906a>] default_wake_function+0x0/0xe
 [<ffffffff80258b30>] pdflush+0x0/0x207
 [<ffffffff8029c577>] kthread_create+0xc1/0x141
 [<ffffffff8029c3f2>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80258b30>] pdflush+0x0/0x207
 [<ffffffff802889dd>] enqueue_task+0x41/0x56
 [<ffffffff80288a48>] __activate_task+0x56/0x6d
 [<ffffffff802490c3>] try_to_wake_up+0x392/0x3a4
 [<ffffffff80264931>] _spin_lock_irqsave+0x9/0x14
 [<ffffffff802c1187>] start_one_pdflush_thread+0x1b/0x2e
 [<ffffffff8065d20a>] pdflush_init+0xa/0x13
 [<ffffffff8064c7eb>] init+0x1f9/0x2fe
 [<ffffffff80260b2c>] child_rip+0xa/0x12
 [<ffffffff8064c5f2>] init+0x0/0x2fe
 [<ffffffff80260b22>] child_rip+0x0/0x12
</code>

This problem occurs only on HP ProLiant BL460c G6 blades [1]. It does not
occur on other servers running XenServer or XCP.


[1]
http://h10010.www1.hp.com/wwpc/us/en/sm/WF05a/3709945-3709945-3328410-241641-3328419-3884098.html

Cheers,

--
Marco Sinhoreli

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

nadaoneal

2011-Apr-15 18:51 UTC

[Xen-users] Re: Guest CentOS 5.4 64bit on-top XCP 0.5 issue / HP ProLiant BL460c G6

Hi Marco,

I hope you've already solved this issue to your satisfaction, but I thought
I'd post just in case. There's an issue that affects people who:
- Are running HP or Fujitsu servers with a hardware RAID
- Are running CentOS/RHEL Xen domUs on CentOS/RHEL Xen dom0s
- Are using kernels 2.6.18-194.x or greater (and really, who isn't?)
... it's currently affecting me and I think it's the one affecting you.

Please see these Red Hat and CentOS bug reports for some background
information:
http://bugs.centos.org/view.php?id=4515
https://bugzilla.redhat.com/show_bug.cgi?id=605444

- You should update your firmware if you haven't already, though that will
  not solve the problem on its own.
- You should ensure that your battery is charging correctly.
- You should switch your I/O scheduler, on both the dom0 and the domU, to
  noop. You can do this by adding "elevator=noop" to your kernel line in
  /etc/grub.conf and restarting.
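
To check which scheduler is active before and after the reboot, a minimal shell sketch follows. It assumes the sysfs file /sys/block/<device>/queue/scheduler, which lists the available schedulers with the active one in square brackets. The function names and the device name are illustrative only and do not come from the bug reports.

```shell
#!/bin/sh
# Print the active I/O scheduler from a scheduler listing such as
# "noop anticipatory deadline [cfq]" (the active one is in brackets).
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Show the active scheduler for a block device (hypothetical helper).
# The second argument overrides the sysfs root, which allows a dry run.
show_scheduler() {
    dev=$1
    sysroot=${2:-/sys/block}
    active_scheduler "$(cat "$sysroot/$dev/queue/scheduler")"
}

# Switch a device to noop at runtime. This needs root and does not
# survive a reboot, so the elevator=noop kernel parameter is still needed.
set_noop() {
    dev=$1
    sysroot=${2:-/sys/block}
    echo noop > "$sysroot/$dev/queue/scheduler"
}
```

On a dom0 or domU this might be used as `show_scheduler sda` and, as root, `set_noop sda`; substitute your own device name.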

In my case, I also have a blade (G5 instead of G6), and my stack trace harps
on fsync issues rather than pdflush issues, but I suspect you're
experiencing more or less the same issue. I'm currently on CentOS 5.6 and
2.6.18-238.9.1.el5xen, but I also see this issue on CentOS 5.5 and kernels
in the -194, -233, and earlier -238 ranges. I see it with Xen 3.0.3
(CentOS's version), 3.4.3, and 4.1 (http://www.gitco.de/repo/).

You're experiencing the issue right away, on boot, but if upgrading the
firmware and changing the scheduler fixes the boot issue, I would encourage
you to nevertheless run some tests in the guest domU to ensure that you're
okay during times of heavy disk access. I've been using dd to write 1GB to
disk to test:
$ dd if=/dev/zero of=./test1024M bs=1024k count=1024 conv=fsync
I found that before upgrading the firmware and changing the scheduler, this
would reliably make dmesg explode with "blocked for more than 120 seconds"
messages, and the write speed could be as low as 353 kB/s. Writing anything
less than 1GB did not cause issues as reliably.
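
To repeat this test several times and count the resulting warnings, a rough shell sketch follows. The function names, the default trial count, and the target file are assumptions for illustration; the dd invocation is the one above with the size made adjustable.

```shell
#!/bin/sh
# Count hung-task warnings in kernel log text read from stdin.
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_hung_tasks() {
    grep -c 'blocked for more than 120 seconds' || true
}

# Write COUNT blocks of 1024k zeros to FILE with a final fsync.
write_test() {
    file=$1
    count=${2:-1024}
    dd if=/dev/zero of="$file" bs=1024k count="$count" conv=fsync
}

# Run the write test TRIALS times and report new hung-task warnings.
run_trials() {
    trials=${1:-5}
    file=${2:-./test1024M}
    before=$(dmesg | count_hung_tasks)
    i=0
    while [ "$i" -lt "$trials" ]; do
        write_test "$file"
        i=$((i + 1))
    done
    after=$(dmesg | count_hung_tasks)
    echo "new hung-task warnings: $((after - before))"
    rm -f "$file"
}
```

Calling `run_trials 5` in the domU would write the 1GB file five times. dd prints the throughput of each run itself. The warning count is only approximate, since older messages can drop out of the dmesg buffer between the two readings.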

Since making these changes, I still sometimes see issues with this heavy
test, and still sometimes see a single "blocked for 120 seconds" message.
The write speed can be as low as 2MB/sec, but is generally closer to
50MB/sec. So I certainly don't have the answer, but these changes have made
a very material difference.

--
View this message in context:
http://xen.1045712.n5.nabble.com/Guest-CentOS-5-4-64bit-on-top-XCP-0-5-issue-HP-ProLiant-BL460c-G6-tp3391484p4306328.html
Sent from the Xen - User mailing list archive at Nabble.com.

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users

nadaoneal

2011-Apr-15 20:10 UTC

[Xen-users] Re: Guest CentOS 5.4 64bit on-top XCP 0.5 issue / HP ProLiant BL460c G6

I apologize for double-posting, but I've just "verified" with about 50 trials
that capping the RAID max write speed on the dom0 at about 50MB/sec seems to
allow the domU to very consistently write 1GB at 46-48MB/sec, without any
dmesg errors. So I would update my recommendations to:
 - install any relevant firmware upgrades and ensure there's no battery issue
 - on dom0: echo "50000" > /proc/sys/dev/raid/speed_limit_max
   ... if your domU uses a RAID configuration, you might want to do this on
   domU as well
 - on domU and dom0: change default scheduler to noop
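
The speed-limit step could be wrapped in a small helper that validates the value before writing it, as sketched below. The function name and the optional path argument are hypothetical additions. Note that this file is the md (software RAID) resync speed limit in KB/sec, so 50000 corresponds to roughly 50MB/sec.

```shell
#!/bin/sh
# Write a maximum speed (in KB/sec) to the md speed-limit file.
# The second argument overrides the path, which allows a dry run
# against an ordinary file.
set_raid_speed_max() {
    limit=$1
    path=${2:-/proc/sys/dev/raid/speed_limit_max}
    case $limit in
        ''|*[!0-9]*)
            echo "limit must be a whole number of KB/sec" >&2
            return 1
            ;;
    esac
    echo "$limit" > "$path"
}
```

As root on the dom0 this would be `set_raid_speed_max 50000`. The value is lost on reboot; one way to persist it is a `dev.raid.speed_limit_max = 50000` line in /etc/sysctl.conf.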

Again, hope this is helpful to someone. It's just a band-aid - the actual
fix will come either from the kernel or from a firmware update, or both,
eventually.

--
View this message in context:
http://xen.1045712.n5.nabble.com/Guest-CentOS-5-4-64bit-on-top-XCP-0-5-issue-HP-ProLiant-BL460c-G6-tp3391484p4306475.html
Sent from the Xen - User mailing list archive at Nabble.com.
