Benjamin Weaver
2011-Aug-29 12:37 UTC
[Xen-users] with heavy VM IO, clocksource causes random dom0 reboots
On Debian Squeeze (2.6.32-5) running Xen 4.0, I have created two Ubuntu Lucid Lynx (Ubuntu 10.04) VMs. The VMs, in a stress test, pass a large file between them via NFS file sharing. A previous entry in this forum helped to establish that some Ethernet cards improve VM IO performance.

However, our box, installed with better Intel NICs, is still rebooting under heavy VM IO loads. The kernel call trace is copied below. Note the first line, a reference to the PV clocksource.

Note these bug notices:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=603727

and this suggestion page:

http://wiki.xensource.com/xenwiki/xenpm#head-253cbbe6cf12fa31e10022610cd7090aa980921f

These pages (the last, in particular) describe IO jitter related to the clocksource. This would be consistent with the interrupt problems we have been experiencing.

1. I have been told to set clocksource=pit on dom0. How and where, in Squeeze, is this clocksource parameter set? At compile time, or at boot time? And if at boot time, in /etc/default/grub? How exactly, if in /etc/default/grub, is this parameter set? Lines in that file begin with capital letters:

GRUB_TIMEOUT=5
GRUB_CMDLINE_XEN="com1=9600,8n1 console=com1,vga noreboot"
GRUB_CMDLINE_LINUX="console=tty0 console=hvc0"

2. Is a package necessary for pit?

3. Should clocksource=pit be set on domUs as well?

Aug 29 06:28:53 vm2 kernel: [53400.204119] updatedb.mloc D 0000000000000000 0 3605 3601 0x00000000
Aug 29 06:28:53 vm2 kernel: [53400.204125] ffff8802ed071530 0000000000000286 0000000000000000 ffff88009babd260
Aug 29 06:28:53 vm2 kernel: [53400.204132] ffff8802e95f0000 0000000000000088 000000000000f9e0 ffff88009e399fd8
Aug 29 06:28:53 vm2 kernel: [53400.204138] 0000000000015780 0000000000015780 ffff8802e95e1530 ffff8802e95e1828
Aug 29 06:28:53 vm2 kernel: [53400.204144] Call Trace:
Aug 29 06:28:53 vm2 kernel: [53400.204154] [<ffffffff8102ddcc>] ? pvclock_clocksource_read+0x3a/0x8b
Aug 29 06:28:53 vm2 kernel: [53400.204160] [<ffffffff8110f19a>] ? sync_buffer+0x0/0x40
Aug 29 06:28:53 vm2 kernel: [53400.204166] [<ffffffff8130c16a>] ? io_schedule+0x73/0xb7
Aug 29 06:28:53 vm2 kernel: [53400.204169] [<ffffffff8110f1d5>] ? sync_buffer+0x3b/0x40
Aug 29 06:28:53 vm2 kernel: [53400.204174] [<ffffffff8130d42a>] ? _spin_unlock_irqrestore+0xd/0xe
Aug 29 06:28:53 vm2 kernel: [53400.204178] [<ffffffff8130c677>] ? __wait_on_bit+0x41/0x70
Aug 29 06:28:53 vm2 kernel: [53400.204181] [<ffffffff8110f19a>] ? sync_buffer+0x0/0x40
Aug 29 06:28:53 vm2 kernel: [53400.204185] [<ffffffff8130c711>] ? out_of_line_wait_on_bit+0x6b/0x77
Aug 29 06:28:53 vm2 kernel: [53400.204190] [<ffffffff81065f34>] ? wake_bit_function+0x0/0x23
Aug 29 06:28:53 vm2 kernel: [53400.204202] [<ffffffffa04bf824>] ? ocfs2_read_blocks+0x55d/0x6c2 [ocfs2]
Aug 29 06:28:53 vm2 kernel: [53400.204208] [<ffffffff8100eccf>] ? xen_restore_fl_direct_end+0x0/0x1
Aug 29 06:28:53 vm2 kernel: [53400.204217] [<ffffffffa04db9a1>] ? ocfs2_validate_inode_block+0x0/0x1ab [ocfs2]
Aug 29 06:28:53 vm2 kernel: [53400.204226] [<ffffffffa04db58c>] ? ocfs2_read_inode_block_full+0x37/0x51 [ocfs2]
Aug 29 06:28:53 vm2 kernel: [53400.204235] [<ffffffffa04d135b>] ? ocfs2_inode_lock_atime+0x73/0x23f [ocfs2]
Aug 29 06:28:53 vm2 kernel: [53400.204243] [<ffffffffa04c8564>] ? ocfs2_dir_foreach_blk+0x48/0x435 [ocfs2]
Aug 29 06:28:53 vm2 kernel: [53400.204248] [<ffffffff810fc34c>] ? filldir+0x0/0xb7
Aug 29 06:28:53 vm2 kernel: [53400.204257] [<ffffffffa04d13d5>] ? ocfs2_inode_lock_atime+0xed/0x23f [ocfs2]
Aug 29 06:28:53 vm2 kernel: [53400.204261] [<ffffffff810fc34c>] ? filldir+0x0/0xb7
Aug 29 06:28:53 vm2 kernel: [53400.204269] [<ffffffffa04c9af3>] ? ocfs2_readdir+0x161/0x1d0 [ocfs2]
Aug 29 06:28:53 vm2 kernel: [53400.204273] [<ffffffff810fc34c>] ? filldir+0x0/0xb7
Aug 29 06:28:53 vm2 kernel: [53400.204277] [<ffffffff810fc51c>] ? vfs_readdir+0x75/0xa7
Aug 29 06:28:53 vm2 kernel: [53400.204281] [<ffffffff810fc686>] ? sys_getdents+0x7a/0xc7
Aug 29 06:28:53 vm2 kernel: [53400.204285] [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
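For anyone reproducing this, the clocksource a running dom0 or domU kernel is using can be read from sysfs; that is the Linux kernel's clocksource, distinct from the Xen hypervisor's platform timer that the clocksource=pit parameter discussed below controls. A quick check, assuming a standard sysfs mount (on a 2.6.32 Xen PV kernel the current value is normally "xen"):

    cat /sys/devices/system/clocksource/clocksource0/current_clocksource
    cat /sys/devices/system/clocksource/clocksource0/available_clocksource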
Todd Deshane
2011-Aug-30 03:46 UTC
Re: [Xen-users] with heavy VM IO, clocksource causes random dom0 reboots
On Mon, Aug 29, 2011 at 8:37 AM, Benjamin Weaver
<benjamin.weaver@phon.ox.ac.uk> wrote:
> On Debian Squeeze (2.6.32-5) running Xen 4.0, I have created two Ubuntu Lucid
> Lynx (Ubuntu 10.04) VMs. The VMs, in a stress test, pass a large file
> between them via NFS file sharing. A previous entry in this forum helped to
> establish that some Ethernet cards improve VM IO performance.
>
> However, our box, installed with better Intel NICs, is still rebooting under
> heavy VM IO loads. The kernel call trace is copied below. Note the first
> line, a reference to the PV clocksource.
>
> Note these bug notices:
>
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=603727
>
> and this suggestion page:
>
> http://wiki.xensource.com/xenwiki/xenpm#head-253cbbe6cf12fa31e10022610cd7090aa980921f
>
> These pages (the last, in particular) describe IO jitter related to the
> clocksource. This would be consistent with the interrupt problems we have
> been experiencing.
>
> 1. I have been told to set clocksource=pit on dom0. How and where, in
> Squeeze, is this clocksource parameter set? At compile time, or at boot
> time? And if at boot time, in /etc/default/grub? How exactly, if in
> /etc/default/grub, is this parameter set? Lines in that file begin with
> capital letters:
>
> GRUB_TIMEOUT=5
> GRUB_CMDLINE_XEN="com1=9600,8n1 console=com1,vga noreboot"

Change this line to:

GRUB_CMDLINE_XEN="com1=9600,8n1 console=com1,vga clocksource=pit noreboot"

> GRUB_CMDLINE_LINUX="console=tty0 console=hvc0"
>
> 2. Is a package necessary for pit?

You shouldn't need one. Do as the wiki says and check:

xm dmesg | grep -i timer

> 3. Should clocksource=pit be set on domUs as well?

Based on what I am reading from the wiki, it looks like it is a Xen
hypervisor setting, so no.

Hope that helps.

Thanks,
Todd

--
Todd Deshane
http://www.linkedin.com/in/deshantm
http://www.xen.org/products/cloudxen.html
http://runningxen.com/
_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
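Putting Todd's suggestion together, /etc/default/grub on the Squeeze dom0 would end up looking something like the sketch below. This assumes the file matches the three lines quoted above; clocksource=pit belongs on the Xen hypervisor command line (GRUB_CMDLINE_XEN), not on the dom0 kernel line (GRUB_CMDLINE_LINUX):

    # /etc/default/grub on the dom0 -- a sketch based on the lines quoted above
    GRUB_TIMEOUT=5
    # clocksource=pit is a Xen hypervisor boot parameter, so it goes here:
    GRUB_CMDLINE_XEN="com1=9600,8n1 console=com1,vga clocksource=pit noreboot"
    # the dom0 kernel command line is left unchanged:
    GRUB_CMDLINE_LINUX="console=tty0 console=hvc0"

The new parameter only takes effect after regenerating the grub configuration and rebooting, after which the check Todd mentions confirms whether the hypervisor picked it up:

    update-grub
    reboot
    # after the reboot:
    xm dmesg | grep -i timer

On Xen 4.0 this should print a "Platform timer" line naming the timer in use; the exact wording varies between Xen versions.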