Pim van Riezen
2011-Feb-08 12:22 UTC
[Xen-devel] Xen hypervisor external denial of service vulnerability?
Good day,

In a scenario where we saw several dom0 nodes fall over due to a sustained SYN flood against a network range, we have been investigating issues with Xen under high network load. The results so far are not pretty. We recreated a lab setup that can reproduce the scenario with some reliability, although it takes a bit of trial and error to get crashes out of it.

SETUP:
2x Dell R710
- 4x 6-core AMD Opteron 6174
- 128GB memory
- Broadcom BCM5709
- LSI SAS2008 rev.02
- Emulex Saturn-X FC adapter
- CentOS 5.5 w/ gitco Xen 4.0.1

1x NexSan SATABeast FC RAID
1x Brocade FC switch
5x flood sources (Dell R210)

The dom0 machines are loaded with 50 PV images, each backed by an LVM partition on FC; half of them are set to start compiling a kernel in rc.local. There are also 2 HVM images on each machine doing the same.

Networking for all guests uses the bridged setup, attached to a specific VLAN that arrives tagged at the dom0. So the vifs end up in xenbr86, née xenbr0.86.

Grub conf for the dom0s:

kernel /xen.gz-4.0.1 dom0_mem=2048M max_cstate=0 cpuidle=off
module /vmlinuz-2.6.18-194.11.4.el5xen ro root=LABEL=/ elevator=deadline xencons=tty

The flooding always targets either the entire IP range the guests live in (for SYN floods) or a sub-range of about 50 IPs (for UDP floods), with random source addresses.

ISSUE:
When the packet rate gets into insane territory (gigabit link saturated or nearly so), the machine seems to start losing track of interrupts. Depending on the severity, this leads to CPU soft lockups on random cores. Under more dire circumstances, other hardware on the PCI bus starts timing out, and the kernel loses track of its storage. Usually the SAS controller is the first to go, but I've also seen timeouts on the FC controller.

THINGS TRIED:
1. Raising the Broadcom RX ring from 255 to 3000. No noticeable effect.
2. Downgrading to Xen 3.4.3. No effect.
3. Different Dell BIOS versions. No effect.
4. Lowering the number of guests -> effects get less serious. Not a serious option.
5. Using ipt_LIMIT in the FORWARD chain set to 10000/s -> effects get less serious with TCP SYN attacks. No effect with 28-byte UDP attacks.
6. Disabling HPET as per http://lists.xensource.com/archives/html/xen-devel/2010-09/msg00556.html, with cpuidle=0, and disabling irqbalance -> effects get less serious.

The changes in 6 stop the machine from completely crapping itself, but I still get soft lockups, although they seem to be limited to one of these two paths:

[<ffffffff8023f830>] invalidate_bh_lru+0x0/0x42
[<ffffffff8023f830>] invalidate_bh_lru+0x0/0x42
[<ffffffff8027458e>] smp_call_function_many+0x38/0x4c
[<ffffffff8023f830>] invalidate_bh_lru+0x0/0x42
[<ffffffff80274688>] smp_call_function+0x4e/0x5e
[<ffffffff8023f830>] invalidate_bh_lru+0x0/0x42
[<ffffffff8028fdd7>] on_each_cpu+0x10/0x2a
[<ffffffff802d7428>] kill_bdev+0x1b/0x30
[<ffffffff802d7a47>] __blkdev_put+0x4f/0x169
[<ffffffff80213492>] __fput+0xd3/0x1bd
[<ffffffff802243cb>] filp_close+0x5c/0x64
[<ffffffff8021e5d0>] sys_close+0x88/0xbd
[<ffffffff802602f9>] tracesys+0xab/0xb6

and

[<ffffffff8026f4f3>] raw_safe_halt+0x84/0xa8
[<ffffffff8026ca88>] xen_idle+0x38/0x4a
[<ffffffff8024af6c>] cpu_idle+0x97/0xba
[<ffffffff8064eb0f>] start_kernel+0x21f/0x224
[<ffffffff8064e1e5>] _sinittext+0x1e5/0x1eb

In some scenarios, an application running on the dom0 that relies on pthread_cond_timedwait seems to hang in all its threads on that specific call. This may be related to timing going wonky during the attack; not sure.

Is there anything more we can try?

Cheers,
Pim van Riezen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
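For readers trying to reproduce this, items 1, 5 and 6 of the list above might look roughly like the following on CentOS 5. This is a minimal sketch, not the commands actually used in the thread: the interface name eth0, the choice to limit only new TCP SYNs, and the helper names `run` and `apply_mitigations` are assumptions. The FORWARD rules only see bridged guest traffic when bridge netfilter (`net.bridge.bridge-nf-call-iptables`) is enabled, which the partial effect reported in item 5 suggests. By default the helper only prints the commands.

```shell
# Hypothetical helper: print the command (default) or execute it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Sketch of "things tried" items 1, 5 and 6 (assumed interface: eth0).
apply_mitigations() {
    iface=${1:-eth0}
    # Item 1: grow the NIC receive ring (the thread went from 255 to 3000).
    run ethtool -G "$iface" rx 3000
    # Item 5: accept at most 10000 new TCP SYNs per second in FORWARD, drop the rest.
    run iptables -A FORWARD -p tcp --syn -m limit --limit 10000/s -j ACCEPT
    run iptables -A FORWARD -p tcp --syn -j DROP
    # Item 6 (irqbalance part only): stop it now and keep it off after reboot.
    run service irqbalance stop
    run chkconfig irqbalance off
}
```

To actually apply it, run `DRY_RUN=0 apply_mitigations eth0` as root. Note that the DROP rule also discards legitimate connection attempts above the limit, and it does nothing for the UDP floods, consistent with item 5.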
Pim van Riezen
2011-Feb-08 12:39 UTC
Re: [Xen-devel] Xen hypervisor external denial of service vulnerability?
Addendum:

The Dells are actually R715.
The dom0 kernel is actually vmlinuz-2.6.18-194.32.1.el5xen

Cheers,
Pim
Pasi Kärkkäinen
2011-Feb-08 15:53 UTC
Re: [Xen-devel] Xen hypervisor external denial of service vulnerability?
On Tue, Feb 08, 2011 at 01:39:06PM +0100, Pim van Riezen wrote:
> Addendum:
>
> The Dells are actually R715.
> The dom0 kernel is actually vmlinuz-2.6.18-194.32.1.el5xen
>

Have you given dom0 a fixed amount of memory, and also increased the dom0 vCPU weight so that dom0 always gets enough CPU time to take care of things?

http://wiki.xensource.com/xenwiki/XenBestPractices

-- Pasi
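Pasi's weight suggestion maps onto the credit scheduler, where each domain gets a default weight of 256 and, under contention, receives CPU time in proportion to its weight. A minimal sketch, assuming the xm toolstack and the credit scheduler; `set_dom0_weight` is a hypothetical helper name, and the change only lasts until the host reboots:

```shell
# Hypothetical helper: raise Domain-0's credit-scheduler weight (default 256).
# 512 is the value tried later in this thread.
set_dom0_weight() {
    weight=${1:-512}
    xm sched-credit -d Domain-0 -w "$weight" || return 1
    # Print the resulting weight/cap so the change can be confirmed.
    xm sched-credit -d Domain-0
}
```

For example, `set_dom0_weight 512` doubles dom0's share relative to a guest left at the default weight.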
Pim van Riezen
2011-Feb-08 16:10 UTC
Re: [Xen-devel] Xen hypervisor external denial of service vulnerability?
On Feb 8, 2011, at 16:53, Pasi Kärkkäinen wrote:

> On Tue, Feb 08, 2011 at 01:39:06PM +0100, Pim van Riezen wrote:
>> Addendum:
>>
>> The Dells are actually R715.
>> The dom0 kernel is actually vmlinuz-2.6.18-194.32.1.el5xen
>>
>
> Have you given dom0 a fixed amount of memory, and also increased the dom0 vCPU weight
> so that dom0 always gets enough CPU time to take care of things?

Fixed dom0_mem, yes.
Weighting, seems not, but I just did a test run with the dom0 weight set to 512. I got 2 blocked-task warnings on one node (that seems to be a new development), and the same plus another raw_safe_halt soft lockup on the other:

Feb 8 17:07:50 handel kernel: INFO: task syslogd:9120 blocked for more than 120 seconds.
Feb 8 17:07:50 handel kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 8 17:07:50 handel kernel: syslogd D 00000ede44a71082 0 9120 1 9123 9089 (NOTLB)
Feb 8 17:07:50 handel kernel: ffff880078087d88 0000000000000282 0000000000000000 0000000000000001
Feb 8 17:07:50 handel kernel: 000000000000000a ffff8800798d70c0 ffff8800000320c0 0000000000022b41
Feb 8 17:07:50 handel kernel: ffff8800798d72a8 0000000000000000
Feb 8 17:07:50 handel kernel: Call Trace:
Feb 8 17:07:50 handel kernel: [<ffffffff88036d5a>] :jbd:log_wait_commit+0xa3/0xf5
Feb 8 17:07:50 handel kernel: [<ffffffff8029c48f>] autoremove_wake_function+0x0/0x2e
Feb 8 17:07:50 handel kernel: [<ffffffff8803178a>] :jbd:journal_stop+0x1cf/0x1ff
Feb 8 17:07:50 handel kernel: [<ffffffff8023119d>] __writeback_single_inode+0x1e9/0x328
Feb 8 17:07:50 handel kernel: [<ffffffff802d330d>] do_readv_writev+0x26e/0x291
Feb 8 17:07:51 handel kernel: [<ffffffff802e5b8b>] sync_inode+0x24/0x33
Feb 8 17:07:51 handel kernel: [<ffffffff8804c36d>] :ext3:ext3_sync_file+0xc9/0xdc
Feb 8 17:07:51 handel kernel: [<ffffffff80251e07>] do_fsync+0x52/0xa4
Feb 8 17:07:51 handel kernel: [<ffffffff802d3b11>] __do_fsync+0x23/0x36
Feb 8 17:07:51 handel kernel: [<ffffffff802602f9>] tracesys+0xab/0xb6
Feb 8 17:07:51 handel kernel:
Feb 8 17:07:51 handel kernel: INFO: task syslogd:9120 blocked for more than 120 seconds.
Feb 8 17:07:51 handel kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 8 17:07:51 handel kernel: syslogd D 00000ede44a71082 0 9120 1 9123 9089 (NOTLB)
Feb 8 17:07:51 handel kernel: ffff880078087d88 0000000000000282 0000000000000000 0000000000000001
Feb 8 17:07:51 handel kernel: 000000000000000a ffff8800798d70c0 ffff8800000320c0 0000000000022b41
Feb 8 17:07:51 handel kernel: ffff8800798d72a8 0000000000000000
Feb 8 17:07:51 handel kernel: Call Trace:
Feb 8 17:07:51 handel kernel: [<ffffffff88036d5a>] :jbd:log_wait_commit+0xa3/0xf5
Feb 8 17:07:51 handel kernel: [<ffffffff8029c48f>] autoremove_wake_function+0x0/0x2e
Feb 8 17:07:51 handel kernel: [<ffffffff8803178a>] :jbd:journal_stop+0x1cf/0x1ff
Feb 8 17:07:51 handel kernel: [<ffffffff8023119d>] __writeback_single_inode+0x1e9/0x328
Feb 8 17:07:51 handel kernel: [<ffffffff802d330d>] do_readv_writev+0x26e/0x291
Feb 8 17:07:51 handel kernel: [<ffffffff802e5b8b>] sync_inode+0x24/0x33
Feb 8 17:07:51 handel kernel: [<ffffffff8804c36d>] :ext3:ext3_sync_file+0xc9/0xdc
Feb 8 17:07:51 handel kernel: [<ffffffff80251e07>] do_fsync+0x52/0xa4
Feb 8 17:07:51 handel kernel: [<ffffffff802d3b11>] __do_fsync+0x23/0x36
Feb 8 17:07:51 handel kernel: [<ffffffff802602f9>] tracesys+0xab/0xb6
Feb 8 17:07:51 handel kernel:

Feb 8 17:03:45 telemann kernel: INFO: task syslogd:7704 blocked for more than 120 seconds.
Feb 8 17:03:45 telemann kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 8 17:03:45 telemann kernel: syslogd D 00000ee120ea024a 0 7704 1 7707 7673 (NOTLB)
Feb 8 17:03:45 telemann kernel: ffff8800754dfd88 0000000000000282 0000000000000000 0000000000000001
Feb 8 17:03:45 telemann kernel: 000000000000000a ffff88007e65e860 ffff88000001e820 0000000000021814
Feb 8 17:03:45 telemann kernel: ffff88007e65ea48 0000000000000000
Feb 8 17:03:45 telemann kernel: Call Trace:
Feb 8 17:03:45 telemann kernel: [<ffffffff88036d5a>] :jbd:log_wait_commit+0xa3/0xf5
Feb 8 17:03:45 telemann kernel: [<ffffffff8029c48f>] autoremove_wake_function+0x0/0x2e
Feb 8 17:03:45 telemann kernel: [<ffffffff8803178a>] :jbd:journal_stop+0x1cf/0x1ff
Feb 8 17:03:45 telemann kernel: [<ffffffff8023119d>] __writeback_single_inode+0x1e9/0x328
Feb 8 17:03:45 telemann kernel: [<ffffffff802d330d>] do_readv_writev+0x26e/0x291
Feb 8 17:07:35 telemann kernel: [<ffffffff802e5b8b>] sync_inode+0x24/0x33
Feb 8 17:07:35 telemann kernel: [<ffffffff8804c36d>] :ext3:ext3_sync_file+0xc9/0xdc
Feb 8 17:07:35 telemann kernel: [<ffffffff80251e07>] do_fsync+0x52/0xa4
Feb 8 17:07:35 telemann kernel: [<ffffffff802d3b11>] __do_fsync+0x23/0x36
Feb 8 17:07:35 telemann kernel: [<ffffffff802602f9>] tracesys+0xab/0xb6
Feb 8 17:07:35 telemann kernel:
Feb 8 17:07:35 telemann kernel: INFO: task syslogd:7704 blocked for more than 120 seconds.
Feb 8 17:07:35 telemann kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 8 17:07:35 telemann kernel: syslogd D 00000ee120ea024a 0 7704 1 7707 7673 (NOTLB)
Feb 8 17:07:35 telemann kernel: ffff8800754dfd88 0000000000000282 0000000000000000 0000000000000001
Feb 8 17:07:35 telemann kernel: 000000000000000a ffff88007e65e860 ffff88000001e820 0000000000021814
Feb 8 17:07:35 telemann kernel: ffff88007e65ea48 0000000000000000
Feb 8 17:07:35 telemann kernel: Call Trace:
Feb 8 17:07:35 telemann kernel: [<ffffffff88036d5a>] :jbd:log_wait_commit+0xa3/0xf5
Feb 8 17:07:35 telemann kernel: [<ffffffff8029c48f>] autoremove_wake_function+0x0/0x2e
Feb 8 17:07:35 telemann kernel: [<ffffffff8803178a>] :jbd:journal_stop+0x1cf/0x1ff
Feb 8 17:07:35 telemann kernel: [<ffffffff8023119d>] __writeback_single_inode+0x1e9/0x328
Feb 8 17:07:35 telemann kernel: [<ffffffff802d330d>] do_readv_writev+0x26e/0x291
Feb 8 17:07:35 telemann kernel: [<ffffffff802e5b8b>] sync_inode+0x24/0x33
Feb 8 17:07:35 telemann kernel: [<ffffffff8804c36d>] :ext3:ext3_sync_file+0xc9/0xdc
Feb 8 17:07:35 telemann kernel: [<ffffffff80251e07>] do_fsync+0x52/0xa4
Feb 8 17:07:35 telemann kernel: [<ffffffff802d3b11>] __do_fsync+0x23/0x36
Feb 8 17:07:35 telemann kernel: [<ffffffff802602f9>] tracesys+0xab/0xb6
Feb 8 17:07:35 telemann kernel:
Feb 8 17:07:35 telemann kernel: BUG: soft lockup - CPU#0 stuck for 287s! [swapper:0]
Feb 8 17:07:35 telemann kernel: CPU 0:
Feb 8 17:07:35 telemann kernel: Modules linked in: tun 8021q netloop netbk blktap blkbk bridge ipmi_devintf ipmi_si ipmi_msghandler dell_rbu autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc ip_conntrack_netbios_ns ipt_REJECT xt_state xt_tcpudp iptable_filter ipt_MASQUERADE iptable_nat ip_nat ip_conntrack nfnetlink ip_tables ip6table_filter ip6_tables x_tables ipv6 xfrm_nalgo crypto_api dm_round_robin dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi ac parport_pc lp parport 8250_pnp sr_mod cdrom sg pcspkr i2c_piix4 serio_raw 8250 i2c_core serial_core bnx2 amd64_edac_mod edac_mc dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod lpfc scsi_transport_fc ahci libata shpchp mpt2sas scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Feb 8 17:07:35 telemann kernel: Pid: 0, comm: swapper Not tainted 2.6.18-194.32.1.el5xen #1
Feb 8 17:07:36 telemann kernel: RIP: e030:[<ffffffff802063aa>] [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
Feb 8 17:07:36 telemann kernel: RSP: e02b:ffffffff80645f58 EFLAGS: 00000246
Feb 8 17:07:36 telemann kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff802063aa
Feb 8 17:07:36 telemann kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000001
Feb 8 17:07:36 telemann kernel: RBP: 0000000000000000 R08: 0000000000000038 R09: 00000001003cd738
Feb 8 17:07:36 telemann kernel: R10: ffff88007e6c3b00 R11: 0000000000000246 R12: 0000000000000000
Feb 8 17:07:36 telemann kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Feb 8 17:07:36 telemann kernel: FS: 00002ad0c0f9ba30(0000) GS:ffffffff805d3000(0000) knlGS:0000000000000000
Feb 8 17:07:36 telemann kernel: CS: e033 DS: 0000 ES: 0000
Feb 8 17:07:36 telemann kernel:
Feb 8 17:07:36 telemann kernel: Call Trace:
Feb 8 17:07:36 telemann kernel: [<ffffffff8026f4f3>] raw_safe_halt+0x84/0xa8
Feb 8 17:07:36 telemann kernel: [<ffffffff8026ca88>] xen_idle+0x38/0x4a
Feb 8 17:07:36 telemann kernel: [<ffffffff8024af6c>] cpu_idle+0x97/0xba
Feb 8 17:07:36 telemann kernel: [<ffffffff8064eb0f>] start_kernel+0x21f/0x224
Feb 8 17:07:36 telemann kernel: [<ffffffff8064e1e5>] _sinittext+0x1e5/0x1eb
Feb 8 17:07:36 telemann kernel:

Cheers,
Pim
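When repeating test runs like this one, it can help to tally the reports per host rather than reading the raw traces. A minimal sketch with a hypothetical helper `count_lockups`, assuming the classic syslog layout shown in these logs (month, day, time, then the host name in the fourth field):

```shell
# Hypothetical helper: count soft-lockup and hung-task reports per host.
# Usage: count_lockups /var/log/messages   (reads stdin when no file is given)
count_lockups() {
    awk '
        / kernel: BUG: soft lockup/          { soft[$4]++; hosts[$4] = 1 }
        / kernel: INFO: task .* blocked for/ { hung[$4]++; hosts[$4] = 1 }
        END {
            # Unset counters print as 0 with %d.
            for (h in hosts)
                printf "%s soft_lockups=%d hung_tasks=%d\n", h, soft[h], hung[h]
        }
    ' "$@" | sort
}
```

Only the header line of each report is counted, so the call-trace lines that follow it do not inflate the totals.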
Pim van Riezen
2011-Feb-08 16:28 UTC
Re: [Xen-devel] Xen hypervisor external denial of service vulnerability?
On Feb 8, 2011, at 17:10, Pim van Riezen wrote:

> On Feb 8, 2011, at 16:53, Pasi Kärkkäinen wrote:
>
>> Have you given dom0 a fixed amount of memory, and also increased the dom0 vCPU weight
>> so that dom0 always gets enough CPU time to take care of things?
>
> Fixed dom0_mem, yes.
> Weighting, seems not, but I just did a test run with the dom0 weight set to 512. I got 2 blocked-task warnings on one node (that seems to be a new development), and the same plus another raw_safe_halt soft lockup on the other:

Also tried pinning 2 CPUs for Domain-0. Still soft lockups.

Pim
Pasi Kärkkäinen
2011-Feb-08 16:51 UTC
Re: [Xen-devel] Xen hypervisor external denial of service vulnerability?
On Tue, Feb 08, 2011 at 05:28:35PM +0100, Pim van Riezen wrote:
> On Feb 8, 2011, at 17:10, Pim van Riezen wrote:
>
> Also tried pinning 2 CPUs for Domain-0. Still soft lockups.
>

Did you also make sure the VMs don't use the 2 pCPUs dedicated to dom0?
You have to explicitly configure each VM not to use those pCPUs.

-- Pasi
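One way to check Pasi's point is to look for guest vCPUs whose affinity still includes the dom0 pCPUs. A minimal sketch, assuming dom0 owns pCPUs 0 and 1 and that `xm vcpu-list` prints its usual `Name ID VCPU CPU State Time(s) CPU Affinity` columns, with `any cpu` for an unpinned vCPU; `vcpus_on_dom0_cpus` is a hypothetical helper that reads that output on stdin:

```shell
# Hypothetical helper: print guest vCPUs allowed on pCPUs lo..hi (default 0..1).
# Usage: xm vcpu-list | vcpus_on_dom0_cpus [lo] [hi]
vcpus_on_dom0_cpus() {
    awk -v lo="${1:-0}" -v hi="${2:-1}" '
        NR == 1 || $1 == "Domain-0" { next }    # skip header and dom0 itself
        {
            aff = $7                            # first word of "CPU Affinity"
            if (aff == "any") { print $1, "vcpu", $3, "any cpu"; next }
            # Affinity lists look like "2-23" or "0,4-7".
            n = split(aff, parts, ",")
            for (i = 1; i <= n; i++) {
                if (split(parts[i], r, "-") == 2) { a = r[1]; b = r[2] }
                else { a = parts[i]; b = parts[i] }
                # Report if the range [a,b] overlaps [lo,hi].
                if (a + 0 <= hi + 0 && b + 0 >= lo + 0) {
                    print $1, "vcpu", $3, aff
                    break
                }
            }
        }
    '
}
```

Empty output means no guest vCPU may be scheduled on the pCPUs reserved for dom0.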
Pim van Riezen
2011-Feb-08 17:08 UTC
Re: [Xen-devel] Xen hypervisor external denial of service vulnerability?
On Feb 8, 2011, at 17:51, Pasi Kärkkäinen wrote:

> On Tue, Feb 08, 2011 at 05:28:35PM +0100, Pim van Riezen wrote:
>> Also tried pinning 2 CPUs for Domain-0. Still soft lockups.
>>
>
> Did you also make sure the VMs don't use the 2 pCPUs dedicated to dom0?
> You have to explicitly configure each VM not to use those pCPUs.

That seems to have done the trick. Added to the Xen command line:

dom0_max_vcpus=2 dom0_vcpus_pin

Then tested after running this command:

xm list | ( read && read && cat ) | cut -f1 -d" " | while read guest; do xm vcpu-pin $guest 0 2-23; done

No soft lockups. Will do a longer test now. If something new comes up I will report back.

Cheers,
Pim
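The one-liner above pins only vCPU 0 of each guest and relies on Domain-0 being the first entry after the `xm list` header. A slightly more defensive variant, assuming the same 24-pCPU layout (2-23 for guests), could look like this; `pin_guests_off_dom0` is a hypothetical name, and `all` is the `xm vcpu-pin` keyword for every vCPU of a domain:

```shell
# Hypothetical helper: keep every vCPU of every running guest on the given
# pCPUs (default 2-23), leaving pCPUs 0-1 to dom0_max_vcpus=2 dom0_vcpus_pin.
pin_guests_off_dom0() {
    cpus=${1:-2-23}
    # Skip the header line and Domain-0 by name rather than by position.
    xm list | awk 'NR > 1 && $1 != "Domain-0" { print $1 }' |
    while read -r guest; do
        xm vcpu-pin "$guest" all "$cpus"
    done
}
```

Pins set this way only apply to the running domain; to keep a guest off pCPUs 0-1 across restarts, its xm config file can carry `cpus = "2-23"`.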
Pim van Riezen
2011-Feb-08 17:21 UTC
Re: [Xen-devel] Xen hypervisor external denial of service vulnerability?
On Feb 8, 2011, at 18:08, Pim van Riezen wrote:

> On Feb 8, 2011, at 17:51, Pasi Kärkkäinen wrote:
>>
>> Did you also make sure the VMs don't use the 2 pCPUs dedicated to dom0?
>> You have to explicitly configure each VM not to use those pCPUs.
>
> That seems to have done the trick.

Alas, I drew that conclusion too soon. After a new 10-minute run:

Feb 8 18:12:30 telemann kernel: INFO: task bash:12225 blocked for more than 120 seconds.
Feb 8 18:12:30 telemann kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 8 18:12:30 telemann kernel: bash D ffff88006ac7bd08 0 12225 1 8260 (L-TLB)
Feb 8 18:12:30 telemann kernel: ffff88006ac7bb88 0000000000000246 0000000300000000 ffff88007ec3a6d8
Feb 8 18:12:30 telemann kernel: 0000000000000009 ffff88006c16e820 ffff88007a5a9080 000000000008f03e
Feb 8 18:12:30 telemann kernel: ffff88006c16ea08 ffffffff8022f10c
Feb 8 18:12:30 telemann kernel: Call Trace:
Feb 8 18:12:30 telemann kernel: [<ffffffff8022f10c>] __wake_up+0x38/0x4f
Feb 8 18:12:30 telemann kernel: [<ffffffff880317ae>] :jbd:journal_stop+0x1f3/0x1ff
Feb 8 18:12:30 telemann kernel: [<ffffffff802994d1>] flush_cpu_workqueue+0x83/0xb5
Feb 8 18:12:30 telemann kernel: [<ffffffff8029c48f>] autoremove_wake_function+0x0/0x2e
Feb 8 18:12:30 telemann kernel: [<ffffffff80263914>] mutex_lock+0xd/0x1d
Feb 8 18:12:30 telemann kernel: [<ffffffff80299563>] flush_workqueue+0x60/0x87
Feb 8 18:12:41 telemann kernel: [<ffffffff80394af5>] release_dev+0x503/0x67b
Feb 8 18:12:55 telemann kernel: [<ffffffff8020b860>] release_pages+0x158/0x165
Feb 8 18:13:09 telemann kernel: [<ffffffff80255821>] tty_release+0x11/0x1a
Feb 8 18:13:23 telemann kernel: [<ffffffff80213492>] __fput+0xd3/0x1bd
Feb 8 18:13:38 telemann kernel: [<ffffffff802243cb>] filp_close+0x5c/0x64
Feb 8 18:13:51 telemann kernel: [<ffffffff8023a392>] put_files_struct+0x63/0xae
Feb 8 18:14:06 telemann kernel: [<ffffffff802160cd>] do_exit+0x31d/0x902
Feb 8 18:14:19 telemann kernel: [<ffffffff8024ae4d>] cpuset_exit+0x0/0x88
Feb 8 18:14:33 telemann kernel: [<ffffffff8022b920>] get_signal_to_deliver+0x477/0x4aa
Feb 8 18:14:49 telemann kernel: [<ffffffff8025d19e>] do_notify_resume+0x9c/0x7ba
Feb 8 18:15:01 telemann kernel: [<ffffffff80294ea1>] __group_send_sig_info+0xb9/0xc8
Feb 8 18:15:08 telemann kernel: [<ffffffff8025cb0b>] group_send_sig_info+0x62/0x6f
Feb 8 18:15:22 telemann kernel: [<ffffffff8029c48f>] autoremove_wake_function+0x0/0x2e
Feb 8 18:15:37 telemann kernel: [<ffffffff802afd73>] audit_syscall_entry+0x180/0x1b3
Feb 8 18:15:49 telemann kernel: [<ffffffff80245a48>] sys_rt_sigreturn+0x327/0x35a
Feb 8 18:16:03 telemann kernel: [<ffffffff802b0175>] audit_syscall_exit+0x336/0x362
Feb 8 18:16:17 telemann kernel: [<ffffffff8026042c>] int_signal+0x12/0x17
Feb 8 18:16:31 telemann kernel:
Feb 8 18:16:44 telemann kernel: INFO: task bash:12225 blocked for more than 120 seconds.
Feb 8 18:16:58 telemann kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 8 18:17:12 telemann kernel: bash D ffff88006ac7bd08 0 12225 1 8260 (L-TLB)
Feb 8 18:17:26 telemann kernel: ffff88006ac7bb88 0000000000000246 0000000300000000 ffff88007ec3a6d8
Feb 8 18:17:39 telemann kernel: 0000000000000009 ffff88006c16e820 ffff88007a5a9080 000000000008f03e
Feb 8 18:17:54 telemann kernel: ffff88006c16ea08 ffffffff8022f10c
Feb 8 18:18:08 telemann kernel: Call Trace:
Feb 8 18:18:21 telemann kernel: [<ffffffff8022f10c>] __wake_up+0x38/0x4f
Feb 8 18:18:34 telemann kernel: [<ffffffff880317ae>] :jbd:journal_stop+0x1f3/0x1ff
Feb 8 18:18:47 telemann kernel: [<ffffffff802994d1>] flush_cpu_workqueue+0x83/0xb5
Feb 8 18:18:58 telemann kernel: [<ffffffff8029c48f>] autoremove_wake_function+0x0/0x2e
Feb 8 18:18:58 telemann kernel: [<ffffffff80263914>] mutex_lock+0xd/0x1d
Feb 8 18:18:58 telemann kernel: [<ffffffff80299563>] flush_workqueue+0x60/0x87
Feb 8 18:18:58 telemann kernel: [<ffffffff80394af5>] release_dev+0x503/0x67b
Feb 8 18:18:58 telemann kernel: [<ffffffff8020b860>] release_pages+0x158/0x165
Feb 8 18:18:58 telemann kernel: [<ffffffff80255821>] tty_release+0x11/0x1a
Feb 8 18:18:58 telemann kernel: [<ffffffff80213492>] __fput+0xd3/0x1bd
Feb 8 18:18:58 telemann kernel: [<ffffffff802243cb>] filp_close+0x5c/0x64
Feb 8 18:18:58 telemann kernel: [<ffffffff8023a392>] put_files_struct+0x63/0xae
Feb 8 18:18:58 telemann kernel: [<ffffffff802160cd>] do_exit+0x31d/0x902
Feb 8 18:18:58 telemann kernel: [<ffffffff8024ae4d>] cpuset_exit+0x0/0x88
Feb 8 18:18:58 telemann kernel: [<ffffffff8022b920>] get_signal_to_deliver+0x477/0x4aa
Feb 8 18:18:58 telemann kernel: [<ffffffff8025d19e>] do_notify_resume+0x9c/0x7ba
Feb 8 18:18:58 telemann kernel: [<ffffffff80294ea1>] __group_send_sig_info+0xb9/0xc8
Feb 8 18:18:58 telemann kernel: [<ffffffff8025cb0b>] group_send_sig_info+0x62/0x6f
Feb 8 18:18:58 telemann kernel: [<ffffffff8029c48f>] autoremove_wake_function+0x0/0x2e
Feb 8 18:18:58 telemann kernel: [<ffffffff802afd73>] audit_syscall_entry+0x180/0x1b3
Feb 8 18:18:58 telemann kernel: [<ffffffff80245a48>] sys_rt_sigreturn+0x327/0x35a
Feb 8 18:18:58 telemann kernel: [<ffffffff802b0175>] audit_syscall_exit+0x336/0x362
Feb 8 18:18:59 telemann kernel: [<ffffffff8026042c>] int_signal+0x12/0x17
Feb 8 18:18:59 telemann kernel:
Feb 8 18:18:59 telemann kernel: INFO: task bash:12225 blocked for more than 120 seconds.
Feb 8 18:18:59 telemann kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 8 18:18:59 telemann kernel: bash D ffff88006ac7bd08 0 12225 1 8260 (L-TLB)
Feb 8 18:18:59 telemann kernel: ffff88006ac7bb88 0000000000000246 0000000300000000 ffff88007ec3a6d8
Feb 8 18:18:59 telemann kernel: 0000000000000009 ffff88006c16e820 ffff88007a5a9080 000000000008f03e
Feb 8 18:18:59 telemann kernel: ffff88006c16ea08 ffffffff8022f10c
Feb 8 18:18:59 telemann kernel: Call Trace:
Feb 8 18:18:59 telemann kernel: [<ffffffff8022f10c>] __wake_up+0x38/0x4f
Feb 8 18:18:59 telemann kernel: [<ffffffff880317ae>] :jbd:journal_stop+0x1f3/0x1ff
Feb 8 18:18:59 telemann kernel: [<ffffffff802994d1>] flush_cpu_workqueue+0x83/0xb5
Feb 8 18:18:59 telemann kernel: [<ffffffff8029c48f>] autoremove_wake_function+0x0/0x2e
Feb 8 18:18:59 telemann kernel: [<ffffffff80263914>] mutex_lock+0xd/0x1d
Feb 8 18:18:59 telemann kernel: [<ffffffff80299563>] flush_workqueue+0x60/0x87
Feb 8 18:18:59 telemann kernel: [<ffffffff80394af5>] release_dev+0x503/0x67b
Feb 8 18:18:59 telemann kernel: [<ffffffff8020b860>] release_pages+0x158/0x165
Feb 8 18:18:59 telemann kernel: [<ffffffff80255821>] tty_release+0x11/0x1a
Feb 8 18:18:59 telemann kernel: [<ffffffff80213492>] __fput+0xd3/0x1bd
Feb 8 18:18:59 telemann kernel: [<ffffffff802243cb>] filp_close+0x5c/0x64
Feb 8 18:18:59 telemann kernel: [<ffffffff8023a392>] put_files_struct+0x63/0xae
Feb 8 18:18:59 telemann kernel: [<ffffffff802160cd>] do_exit+0x31d/0x902
Feb 8 18:18:59 telemann kernel: [<ffffffff8024ae4d>] cpuset_exit+0x0/0x88
Feb 8 18:18:59 telemann kernel: [<ffffffff8022b920>] get_signal_to_deliver+0x477/0x4aa
Feb 8 18:18:59 telemann kernel: [<ffffffff8025d19e>] do_notify_resume+0x9c/0x7ba
Feb 8 18:18:59 telemann kernel: [<ffffffff80294ea1>] __group_send_sig_info+0xb9/0xc8
Feb 8 18:18:59 telemann kernel: [<ffffffff8025cb0b>] group_send_sig_info+0x62/0x6f
Feb 8 18:18:59 telemann kernel: [<ffffffff8029c48f>] autoremove_wake_function+0x0/0x2e
Feb 8 18:18:59 telemann kernel: [<ffffffff802afd73>] audit_syscall_entry+0x180/0x1b3
Feb 8 18:18:59 telemann kernel: [<ffffffff80245a48>] sys_rt_sigreturn+0x327/0x35a
Feb 8 18:18:59 telemann kernel: [<ffffffff802b0175>] audit_syscall_exit+0x336/0x362
Feb 8 18:18:59 telemann kernel: [<ffffffff8026042c>] int_signal+0x12/0x17
Feb 8 18:18:59 telemann kernel:
Feb 8 18:18:59 telemann kernel: INFO: task bash:12225 blocked for more than 120 seconds.
Feb 8 18:18:59 telemann kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 8 18:18:59 telemann kernel: bash D ffff88006ac7bd08 0 12225 1 8260 (L-TLB)
Feb 8 18:18:59 telemann kernel: ffff88006ac7bb88 0000000000000246 0000000300000000 ffff88007ec3a6d8
Feb 8 18:18:59 telemann kernel: 0000000000000009 ffff88006c16e820 ffff88007a5a9080 000000000008f03e
Feb 8 18:18:59 telemann kernel: ffff88006c16ea08 ffffffff8022f10c
Feb 8 18:18:59 telemann kernel: Call Trace:
Feb 8 18:18:59 telemann kernel: [<ffffffff8022f10c>] __wake_up+0x38/0x4f
Feb 8 18:18:59 telemann kernel: [<ffffffff880317ae>] :jbd:journal_stop+0x1f3/0x1ff
Feb 8 18:18:59 telemann kernel: [<ffffffff802994d1>] flush_cpu_workqueue+0x83/0xb5
Feb 8 18:18:59 telemann kernel: [<ffffffff8029c48f>] autoremove_wake_function+0x0/0x2e
Feb 8 18:18:59 telemann kernel: [<ffffffff80263914>] mutex_lock+0xd/0x1d
Feb 8 18:18:59 telemann kernel: [<ffffffff80299563>] flush_workqueue+0x60/0x87
Feb 8 18:18:59 telemann kernel: [<ffffffff80394af5>] release_dev+0x503/0x67b
Feb 8 18:18:59 telemann kernel: [<ffffffff8020b860>] release_pages+0x158/0x165
Feb 8 18:18:59 telemann kernel: [<ffffffff80255821>] tty_release+0x11/0x1a
Feb 8 18:18:59 telemann kernel: [<ffffffff80213492>] __fput+0xd3/0x1bd
Feb 8 18:18:59 telemann kernel: [<ffffffff802243cb>] filp_close+0x5c/0x64
Feb 8 18:18:59 telemann kernel: [<ffffffff8023a392>] put_files_struct+0x63/0xae
Feb 8 18:18:59 telemann kernel: [<ffffffff802160cd>] do_exit+0x31d/0x902
Feb 8 18:18:59 telemann kernel: [<ffffffff8024ae4d>] cpuset_exit+0x0/0x88
Feb 8 18:18:59 telemann kernel: [<ffffffff8022b920>] get_signal_to_deliver+0x477/0x4aa
Feb 8 18:18:59 telemann kernel: [<ffffffff8025d19e>] do_notify_resume+0x9c/0x7ba
Feb 8 18:19:00 telemann kernel: [<ffffffff80294ea1>] __group_send_sig_info+0xb9/0xc8
Feb 8 18:19:00 telemann kernel: [<ffffffff8025cb0b>] group_send_sig_info+0x62/0x6f
Feb 8 18:19:00 telemann kernel: [<ffffffff8029c48f>] autoremove_wake_function+0x0/0x2e
Feb 8 18:19:00 telemann kernel: [<ffffffff802afd73>] audit_syscall_entry+0x180/0x1b3
Feb 8 18:19:00 telemann kernel: [<ffffffff80245a48>] sys_rt_sigreturn+0x327/0x35a
Feb 8 18:19:00 telemann kernel: [<ffffffff802b0175>] audit_syscall_exit+0x336/0x362
Feb 8 18:19:00 telemann kernel: [<ffffffff8026042c>] int_signal+0x12/0x17
Feb 8 18:19:00 telemann kernel:

Feb 8 18:11:23 handel kernel: xenbr0: received tcn bpdu on port 1(eth0)
Feb 8 18:11:23 handel kernel: xenbr0: topology change detected, propagating
Feb 8 18:14:54 handel kernel: INFO: task syslogd:11299 blocked for more than 120 seconds.
Feb 8 18:14:54 handel kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 8 18:14:54 handel kernel: syslogd D 0000031e848fed46 0 11299 1 11302 11268 (NOTLB) Feb 8 18:14:54 handel kernel: ffff880079603d88 0000000000000282 0000000000000000 0000000000000001 Feb 8 18:14:54 handel kernel: 000000000000000a ffff88007e5b9100 ffff88000002b040 0000000000026ea9 Feb 8 18:14:54 handel kernel: ffff88007e5b92e8 0000000000000000 Feb 8 18:14:54 handel kernel: Call Trace: Feb 8 18:14:54 handel kernel: [<ffffffff88036d5a>] :jbd:log_wait_commit+0xa3/0xf5 Feb 8 18:14:54 handel kernel: [<ffffffff8029c48f>] autoremove_wake_function+0x0/0x2e Feb 8 18:14:54 handel kernel: [<ffffffff8803178a>] :jbd:journal_stop+0x1cf/0x1ff Feb 8 18:14:54 handel kernel: [<ffffffff8023119d>] __writeback_single_inode+0x1e9/0x328 Feb 8 18:19:15 handel kernel: [<ffffffff802d330d>] do_readv_writev+0x26e/0x291 Feb 8 18:19:15 handel kernel: [<ffffffff802e5b8b>] sync_inode+0x24/0x33 Feb 8 18:19:15 handel kernel: [<ffffffff8804c36d>] :ext3:ext3_sync_file+0xc9/0xdc Feb 8 18:19:15 handel kernel: [<ffffffff80251e07>] do_fsync+0x52/0xa4 Feb 8 18:19:15 handel kernel: [<ffffffff802d3b11>] __do_fsync+0x23/0x36 Feb 8 18:19:15 handel kernel: [<ffffffff802602f9>] tracesys+0xab/0xb6 Feb 8 18:19:15 handel kernel: Feb 8 18:19:15 handel kernel: INFO: task syslogd:11299 blocked for more than 120 seconds. Feb 8 18:19:15 handel kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
Feb 8 18:19:15 handel kernel: syslogd D 0000031e848fed46 0 11299 1 11302 11268 (NOTLB) Feb 8 18:19:15 handel kernel: ffff880079603d88 0000000000000282 0000000000000000 0000000000000001 Feb 8 18:19:15 handel kernel: 000000000000000a ffff88007e5b9100 ffff88000002b040 0000000000026ea9 Feb 8 18:19:15 handel kernel: ffff88007e5b92e8 0000000000000000 Feb 8 18:19:15 handel kernel: Call Trace: Feb 8 18:19:15 handel kernel: [<ffffffff88036d5a>] :jbd:log_wait_commit+0xa3/0xf5 Feb 8 18:19:15 handel kernel: [<ffffffff8029c48f>] autoremove_wake_function+0x0/0x2e Feb 8 18:19:15 handel kernel: [<ffffffff8803178a>] :jbd:journal_stop+0x1cf/0x1ff Feb 8 18:19:15 handel kernel: [<ffffffff8023119d>] __writeback_single_inode+0x1e9/0x328 Feb 8 18:19:15 handel kernel: [<ffffffff802d330d>] do_readv_writev+0x26e/0x291 Feb 8 18:19:15 handel kernel: [<ffffffff802e5b8b>] sync_inode+0x24/0x33 Feb 8 18:19:15 handel kernel: [<ffffffff8804c36d>] :ext3:ext3_sync_file+0xc9/0xdc Feb 8 18:19:15 handel kernel: [<ffffffff80251e07>] do_fsync+0x52/0xa4 Feb 8 18:19:15 handel kernel: [<ffffffff802d3b11>] __do_fsync+0x23/0x36 Feb 8 18:19:15 handel kernel: [<ffffffff802602f9>] tracesys+0xab/0xb6 Feb 8 18:19:15 handel kernel: Feb 8 18:19:15 handel kernel: INFO: task syslogd:11299 blocked for more than 120 seconds. Feb 8 18:19:15 handel kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
Feb 8 18:19:15 handel kernel: syslogd D 0000031e848fed46 0 11299 1 11302 11268 (NOTLB) Feb 8 18:19:15 handel kernel: ffff880079603d88 0000000000000282 0000000000000000 0000000000000001 Feb 8 18:19:15 handel kernel: 000000000000000a ffff88007e5b9100 ffff88000002b040 0000000000026ea9 Feb 8 18:19:15 handel kernel: ffff88007e5b92e8 0000000000000000 Feb 8 18:19:15 handel kernel: Call Trace: Feb 8 18:19:15 handel kernel: [<ffffffff88036d5a>] :jbd:log_wait_commit+0xa3/0xf5 Feb 8 18:19:15 handel kernel: [<ffffffff8029c48f>] autoremove_wake_function+0x0/0x2e Feb 8 18:19:16 handel kernel: [<ffffffff8803178a>] :jbd:journal_stop+0x1cf/0x1ff Feb 8 18:19:16 handel kernel: [<ffffffff8023119d>] __writeback_single_inode+0x1e9/0x328 Feb 8 18:19:16 handel kernel: [<ffffffff802d330d>] do_readv_writev+0x26e/0x291 Feb 8 18:19:16 handel kernel: [<ffffffff802e5b8b>] sync_inode+0x24/0x33 Feb 8 18:19:16 handel kernel: [<ffffffff8804c36d>] :ext3:ext3_sync_file+0xc9/0xdc Feb 8 18:19:16 handel kernel: [<ffffffff80251e07>] do_fsync+0x52/0xa4 Feb 8 18:19:16 handel kernel: [<ffffffff802d3b11>] __do_fsync+0x23/0x36 Feb 8 18:19:16 handel kernel: [<ffffffff802602f9>] tracesys+0xab/0xb6 Feb 8 18:19:16 handel kernel: Cheers, Pim _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
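Every report in these logs (`bash:12225` on telemann, `syslogd:11299` on handel) comes from the kernel's hung-task detector: roughly every `hung_task_timeout_secs` seconds (120 in these logs) it looks at tasks in uninterruptible `D` state and reports any whose context-switch count has not changed since its previous scan. The Python sketch below is a simplified userspace approximation of that test, not the kernel code itself. It assumes a kernel that exposes `voluntary_ctxt_switches` and `nonvoluntary_ctxt_switches` in `/proc/<pid>/status` (not guaranteed on the 2.6.18 dom0 kernel used here), and the function names `find_hung`, `read_snapshot` and `scan` are hypothetical.

```python
import os
import time
from typing import Dict, List, Tuple

# pid -> (one-letter state from /proc/<pid>/status, total context switches)
Snapshot = Dict[int, Tuple[str, int]]


def find_hung(prev: Snapshot, cur: Snapshot) -> List[int]:
    """Pids that are in D state now and have not context-switched since `prev`.

    Simplified version of the khungtaskd check: a task in uninterruptible
    sleep whose switch count is unchanged across one scan interval is
    reported as blocked. Pid reuse between snapshots is ignored.
    """
    hung = []
    for pid, (state, switches) in cur.items():
        if state == "D" and pid in prev and prev[pid][1] == switches:
            hung.append(pid)
    return sorted(hung)


def read_snapshot(proc: str = "/proc") -> Snapshot:
    """Collect state and context-switch totals for every process under `proc`."""
    snap: Snapshot = {}
    for name in os.listdir(proc):
        if not name.isdigit():
            continue
        fields: Dict[str, str] = {}
        try:
            with open(os.path.join(proc, name, "status")) as f:
                for line in f:
                    key, _, value = line.partition(":")
                    fields[key] = value.strip()
        except OSError:
            continue  # the process exited while we were scanning
        if "voluntary_ctxt_switches" not in fields:
            continue  # kernel does not expose switch counts; cannot judge
        switches = int(fields["voluntary_ctxt_switches"]) + int(
            fields.get("nonvoluntary_ctxt_switches", "0"))
        # "State:\tD (disk sleep)" -> "D"
        snap[int(name)] = (fields.get("State", "?")[:1], switches)
    return snap


def scan(timeout: int = 120) -> List[int]:
    """Two snapshots `timeout` seconds apart, mirroring one khungtaskd cycle."""
    before = read_snapshot()
    time.sleep(timeout)
    return find_hung(before, read_snapshot())
```

Note that writing `0` to `/proc/sys/kernel/hung_task_timeout_secs`, as the log message suggests, only disables the report; the task itself stays blocked in `D` state.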
Konrad Rzeszutek Wilk
2011-Feb-10 17:08 UTC
Re: [Xen-devel] Xen hypervisor external denial of service vulnerability?
On Tue, Feb 08, 2011 at 06:21:25PM +0100, Pim van Riezen wrote:
> 
> On Feb 8, 2011, at 18:08 , Pim van Riezen wrote:
> 
> > On Feb 8, 2011, at 17:51 , Pasi Kärkkäinen wrote:
> >>
> >> Did you also make sure VMs don't use those 2 pcpus dedicated for dom0?
> >> You have to explicitly configure each VM not to use those pcpus.
> >
> > That seems to have done the trick.
> 
> Alas, I was too soon in drawing a conclusion. After a new 10 minute run:

Did you try to run the 2.6.32 pvops type kernel? Asking b/c it looks like the issue is due to the fact that mutex lock is held for a very very long time. The spinlock implementation in 2.6.32 changed so it might provide a better solution.
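Pasi's suggestion quoted above, keeping guests off the two pcpus dedicated to dom0, has to be applied to every guest individually. Since xm domain configuration files use Python syntax, with a `cpus` option that takes a string such as `"0-3,5"`, a small helper can generate and apply that line in bulk. This is a sketch under assumptions not stated in the thread: dom0 owns the lowest-numbered pcpus (for example via the Xen boot options `dom0_max_vcpus=2 dom0_vcpus_pin`), pcpus are numbered contiguously, and `domu_cpus`/`pin_config` are hypothetical helper names.

```python
import re

# Matches an existing, uncommented `cpus = ...` assignment in an xm config.
CPUS_LINE = re.compile(r"^\s*cpus\s*=")


def domu_cpus(total_pcpus: int, dom0_pcpus: int) -> str:
    """xm-style cpu range covering every pcpu not reserved for dom0.

    Assumes dom0 is pinned to pcpus 0 .. dom0_pcpus-1.
    """
    if not 0 < dom0_pcpus < total_pcpus:
        raise ValueError("dom0 must leave at least one pcpu for guests")
    first, last = dom0_pcpus, total_pcpus - 1
    return str(first) if first == last else f"{first}-{last}"


def pin_config(config_text: str, cpus: str) -> str:
    """Drop any existing `cpus = ...` line and append `cpus = "<cpus>"`."""
    kept = [line for line in config_text.splitlines()
            if not CPUS_LINE.match(line)]
    kept.append(f'cpus = "{cpus}"')
    return "\n".join(kept) + "\n"
```

For a hypothetical 48-pcpu host, `domu_cpus(48, 2)` returns `"2-47"`. The `cpus` setting takes effect when a guest is next created; for guests that are already running, `xm vcpu-pin <domain> <vcpu> <cpus>` changes the affinity in place.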