Hi all!

I want to update my Debian Lenny Xen servers to Squeeze. I am testing with a new install. Everything installs OK, but when I install the multipath package I get a kernel crash.

I have searched a bit with Google but have not found any solution.

Is anybody on the list working with this configuration?

PS: If I boot the server without Xen, with a standard kernel, multipath and everything else work fine.

Best regards and thanks for the help,
Agustin
Henrik Langos
2011-Feb-11 14:18 UTC
Re: [Xen-users] Debian Squeeze, xen, multipath and iscsi
On Fri, Feb 11, 2011 at 02:35:53PM +0100, Agustin Lopez wrote:
> I want to update my Debian Lenny Xen servers to Squeeze. I am testing
> with a new install. Everything installs OK, but when I install the
> multipath package I get a kernel crash.
> [...]
> PS: If I boot the server without Xen, with a standard kernel, multipath
> and everything else work fine.

What exactly crashes? dom0? Do you get a kernel dump on the console?

I have pretty much the same setup here (iSCSI + multipath + Xen + Squeeze dom0 + Lenny/Etch PVM domUs) and I had some trouble with multipath and iSCSI being a little touchy.

Basically my dom0 kernel hates fast iSCSI logout/login sequences.

You'll have to give multipathd some time to cleanly remove multipath devices before you do another login.

Otherwise I get stuff like this, where kpartx (the thing that manages the device nodes for partitions) triggers some race condition:

Feb 10 06:46:43 xenhost03 kernel: [225060.039126] BUG: unable to handle kernel paging request at ffff88001558b010
Feb 10 06:46:43 xenhost03 kernel: [225060.039172] IP: [<ffffffff8100e428>] xen_set_pmd+0x15/0x2c
Feb 10 06:46:43 xenhost03 kernel: [225060.039210] PGD 1002067 PUD 1006067 PMD 18a067 PTE 801000001558b065
Feb 10 06:46:43 xenhost03 kernel: [225060.039253] Oops: 0003 [#1] SMP
Feb 10 06:46:43 xenhost03 kernel: [225060.039284] last sysfs file: /sys/devices/virtual/block/dm-6/dm/suspended
Feb 10 06:46:43 xenhost03 kernel: [225060.039319] CPU 0
Feb 10 06:46:43 xenhost03 kernel: [225060.039344] Modules linked in: tun dm_round_robin crc32c xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_physdev iptable_filter ip_tables x_tables bridge stp xen_evtchn xenfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_multipath scsi_dh loop snd_hda_intel snd_hda_codec snd_hwdep snd_pcm i915 drm_kms_helper drm snd_timer i2c_i801 evdev parport_pc psmouse serio_raw pcspkr i2c_algo_bit parport i2c_core snd soundcore video output snd_page_alloc button processor acpi_processor ext3 jbd mbcache dm_mod sd_mod crc_t10dif usbhid hid uhci_hcd ata_generic ata_piix libata ehci_hcd scsi_mod e1000e usbcore nls_base thermal thermal_sys [last unloaded: scsi_wait_scan]
Feb 10 06:46:43 xenhost03 kernel: [225060.039851] Pid: 9259, comm: kpartx_id Not tainted 2.6.32-5-xen-amd64 #1 To Be Filled By O.E.M.
Feb 10 06:46:43 xenhost03 kernel: [225060.039904] RIP: e030:[<ffffffff8100e428>] [<ffffffff8100e428>] xen_set_pmd+0x15/0x2c
Feb 10 06:46:43 xenhost03 kernel: [225060.039959] RSP: e02b:ffff880013ad3b18 EFLAGS: 00010246
Feb 10 06:46:43 xenhost03 kernel: [225060.039990] RAX: 0000000000000000 RBX: ffff88001558b010 RCX: ffff880000000000
Feb 10 06:46:43 xenhost03 kernel: [225060.040006] RDX: ffffea0000000000 RSI: 0000000001cc0000 RDI: ffff88001558b010
Feb 10 06:46:43 xenhost03 kernel: [225060.040006] RBP: 0000000000000000 R08: 0000000001cc0000 R09: ffff880073c03100
Feb 10 06:46:43 xenhost03 kernel: [225060.040006] R10: 0000000000000000 R11: ffff88002ce3bd78 R12: 000000000061c000
Feb 10 06:46:43 xenhost03 kernel: [225060.040006] R13: 0000000000400000 R14: ffff88001558b010 R15: ffff88002e156000
Feb 10 06:46:43 xenhost03 kernel: [225060.040006] FS: 00007f5094f64700(0000) GS:ffff880003630000(0000) knlGS:0000000000000000
Feb 10 06:46:43 xenhost03 kernel: [225060.040006] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Feb 10 06:46:43 xenhost03 kernel: [225060.040006] CR2: ffff88001558b010 CR3: 0000000011a2b000 CR4: 0000000000002660
Feb 10 06:46:43 xenhost03 kernel: [225060.040006] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 10 06:46:43 xenhost03 kernel: [225060.040006] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Feb 10 06:46:43 xenhost03 kernel: [225060.040006] Process kpartx_id (pid: 9259, threadinfo ffff880013ad2000, task ffff880002747810)
Feb 10 06:46:43 xenhost03 kernel: [225060.040006] Stack:
Feb 10 06:46:43 xenhost03 kernel: [225060.040006]  ffff880000000000 0000000000600000 0000000000400000 ffffffff810cf886
Feb 10 06:46:43 xenhost03 kernel: [225060.040006] <0> ffff880013ad3fd8 0000000017ab2067 ffff880002159180 0000000000000000
Feb 10 06:46:43 xenhost03 kernel: [225060.040636] <0> 0000000000000000 000000000061bfff 000000000061bfff 0000000001c00000
Feb 10 06:46:43 xenhost03 kernel: [225060.040636] Call Trace:
Feb 10 06:46:43 xenhost03 kernel: [225060.040636]  [<ffffffff810cf886>] ? free_pgd_range+0x226/0x3bf
Feb 10 06:46:43 xenhost03 kernel: [225060.040636]  [<ffffffff810cfabb>] ? free_pgtables+0x9c/0xbd
Feb 10 06:46:43 xenhost03 kernel: [225060.040636]  [<ffffffff810d129d>] ? exit_mmap+0xef/0x148
Feb 10 06:46:43 xenhost03 kernel: [225060.040636]  [<ffffffff8104cb09>] ? mmput+0x3c/0xdf
Feb 10 06:46:43 xenhost03 kernel: [225060.040636]  [<ffffffff810f44d6>] ? flush_old_exec+0x45c/0x548
Feb 10 06:46:43 xenhost03 kernel: [225060.040636]  [<ffffffff811270d0>] ? load_elf_binary+0x0/0x1954
Feb 10 06:46:43 xenhost03 kernel: [225060.040636]  [<ffffffff8112746d>] ? load_elf_binary+0x39d/0x1954
Feb 10 06:46:43 xenhost03 kernel: [225060.040636]  [<ffffffff810cc572>] ? follow_page+0x2ad/0x303
Feb 10 06:46:43 xenhost03 kernel: [225060.040636]  [<ffffffff810ce136>] ? __get_user_pages+0x3ea/0x47b
Feb 10 06:46:43 xenhost03 kernel: [225060.040636]  [<ffffffff810f4fcb>] ? get_arg_page+0x61/0x110
Feb 10 06:46:43 xenhost03 kernel: [225060.040636]  [<ffffffff811270d0>] ? load_elf_binary+0x0/0x1954
Feb 10 06:46:43 xenhost03 kernel: [225060.040636]  [<ffffffff810f3caa>] ? search_binary_handler+0xb4/0x245
Feb 10 06:46:43 xenhost03 kernel: [225060.040636]  [<ffffffff810f54a7>] ? do_execve+0x1e4/0x2c3
Feb 10 06:46:43 xenhost03 kernel: [225060.040636]  [<ffffffff81010500>] ? sys_execve+0x35/0x4c
Feb 10 06:46:43 xenhost03 kernel: [225060.040636]  [<ffffffff81011f9a>] ? stub_execve+0x6a/0xc0
Feb 10 06:46:43 xenhost03 kernel: [225060.040636] Code: fb ff ff e8 c6 f4 01 00 bf 01 00 00 00 e8 c9 ea ff ff 59 5e 5b c3 55 48 89 f5 53 48 89 fb 48 83 ec 08 e8 6e e3 ff ff 84 c0 75 08 <48> 89 2b 41 59 5b 5d c3 41 58 48 89 df 48 89 ee 5b 5d e9 7e ff
Feb 10 06:46:43 xenhost03 kernel: [225060.042981] RIP [<ffffffff8100e428>] xen_set_pmd+0x15/0x2c
Feb 10 06:46:43 xenhost03 kernel: [225060.042981] RSP <ffff880013ad3b18>
Feb 10 06:46:43 xenhost03 kernel: [225060.042981] CR2: ffff88001558b010
Feb 10 06:46:43 xenhost03 kernel: [225060.042981] ---[ end trace 9939eec096f5a2de ]---

Also I noticed dom0 lockups of more than a minute when starting HVM domUs while another domU was creating heavy I/O load. Those only disappeared when I gave my dom0 a fixed amount of RAM instead of ballooning it down.

Other than that I had no bad trouble. (Well, live migration of Lenny 32-bit domUs on a 64-bit dom0 doesn't work because the Lenny domU kernel is not good at that.)

I didn't do a new install of Squeeze though. I started with Lenny and upgraded to Squeeze.

cheers
-henrik
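[Editor's note: for reference, a minimal sketch of the pacing Henrik describes above, that is, logging out, letting multipathd and kpartx tear the device-mapper maps down, and only then logging back in. The portal address and the 10-second settle delay are illustrative assumptions, not values from this thread.]

    #!/bin/sh
    # Sketch only: avoid fast iSCSI logout/login sequences.
    # PORTAL and the delay below are made-up example values.
    PORTAL=192.168.0.1:3260

    # Log out of all sessions on this portal...
    iscsiadm -m node -p "$PORTAL" --logout

    # ...and give multipathd/kpartx time to remove the multipath maps
    # before logging in again.
    sleep 10
    multipath -ll    # should no longer list maps for the removed paths

    iscsiadm -m node -p "$PORTAL" --login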
Agustin Lopez
2011-Feb-16 11:43 UTC
Re: [Xen-users] Debian Squeeze, xen, multipath and iscsi
Hi!

Thanks for the answer. Below is the kernel message I am repeatedly getting in the log. The system crashes only with the interaction between Xen, iSCSI and multipath.

Again, the system is Debian Squeeze running on a Fujitsu PRIMERGY RX200 S4 with 8 cores, Intel(R) Xeon(R) CPU E5405 @ 2.00GHz. The iSCSI storage comes from an EMC array.

Any help, please?

Agustin

Modules linked in: dm_round_robin scsi_dh_emc crc32c ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bridge stp xen_evtchn xenfs dm_multipath dm_mod scsi_dh loop i2c_i801 usbhid ioatdma dca hid shpchp i2c_cor
Feb 16 12:08:01 ariete kernel: [  358.633003] Pid: 1672, comm: dmsetup Tainted: G D 2.6.32-5-xen-amd64 #1 PRIMERGY RX200 S4
Feb 16 12:08:01 ariete kernel: [  358.633003] RIP: e030:[<ffffffff8130cb16>] [<ffffffff8130cb16>] _spin_lock+0x13/0x1b
Feb 16 12:08:01 ariete kernel: [  358.633003] RSP: e02b:ffff8807dbccdb10 EFLAGS: 00000297
Feb 16 12:08:01 ariete kernel: [  358.633003] RAX: 0000000000000022 RBX: ffff8807dbccdb28 RCX: ffff8807dbccdb68
Feb 16 12:08:01 ariete kernel: [  358.633003] RDX: 0000000000000021 RSI: 0000000000000200 RDI: ffff8807dbe1c300
Feb 16 12:08:01 ariete kernel: [  358.633003] RBP: 0000000000000200 R08: 0000000000000008 R09: ffffffff814eb870
Feb 16 12:08:01 ariete kernel: [  358.633003] R10: 000000000000000b R11: ffff8807dbe1c280 R12: ffff8807dbe1c280
Feb 16 12:08:01 ariete kernel: [  358.633003] R13: 000000000000c580 R14: ffff8807dbccdb28 R15: ffffffff814eb830
Feb 16 12:08:01 ariete kernel: [  358.633003] FS: 00007fe9c607a7a0(0000) GS:ffff8800280c7000(0000) knlGS:0000000000000000
Feb 16 12:08:01 ariete kernel: [  358.633003] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Feb 16 12:08:01 ariete kernel: [  358.633003] CR2: 00007fe9c5803420 CR3: 0000000001001000 CR4: 0000000000002660
Feb 16 12:08:01 ariete kernel: [  358.633003] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 16 12:08:01 ariete kernel: [  358.633003] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Feb 16 12:08:01 ariete kernel: [  358.633003] Call Trace:
Feb 16 12:08:01 ariete kernel: [  358.633003]  [<ffffffff8100dd87>] ? xen_exit_mmap+0xf8/0x136
Feb 16 12:08:01 ariete kernel: [  358.633003]  [<ffffffff810d1208>] ? exit_mmap+0x5a/0x148
Feb 16 12:08:01 ariete kernel: [  358.633003]  [<ffffffff8104cb09>] ? mmput+0x3c/0xdf
Feb 16 12:08:01 ariete kernel: [  358.633003]  [<ffffffff81050702>] ? exit_mm+0x102/0x10d
Feb 16 12:08:01 ariete kernel: [  358.633003]  [<ffffffff8130ca72>] ? _spin_lock_irq+0x7/0x22
Feb 16 12:08:01 ariete kernel: [  358.633003]  [<ffffffff81052127>] ? do_exit+0x1f8/0x6c6
Feb 16 12:08:01 ariete kernel: [  358.642915]  [<ffffffff8100ecdf>] ? xen_restore_fl_direct_end+0x0/0x1
Feb 16 12:08:01 ariete kernel: [  358.642915]  [<ffffffff8130cb3a>] ? _spin_unlock_irqrestore+0xd/0xe
Feb 16 12:08:01 ariete kernel: [  358.642915]  [<ffffffff8104f3af>] ? release_console_sem+0x17e/0x1af
Feb 16 12:08:01 ariete kernel: [  358.642915]  [<ffffffff8130d9dd>] ? oops_end+0xaf/0xb4
Feb 16 12:08:01 ariete kernel: [  358.642915]  [<ffffffff810135f0>] ? do_invalid_op+0x8b/0x95
Feb 16 12:08:01 ariete kernel: [  358.642915]  [<ffffffff8100c694>] ? pin_pagetable_pfn+0x2d/0x36
Feb 16 12:08:01 ariete kernel: [  358.642915]  [<ffffffffa01bb9ea>] ? copy_params+0x71/0xb1 [dm_mod]
Feb 16 12:08:01 ariete kernel: [  358.642915]  [<ffffffff810baf07>] ? __alloc_pages_nodemask+0x11c/0x5f5
Feb 16 12:08:01 ariete kernel: [  358.642915]  [<ffffffff8101293b>] ? invalid_op+0x1b/0x20
Feb 16 12:08:01 ariete kernel: [  358.642915]  [<ffffffff8100c694>] ? pin_pagetable_pfn+0x2d/0x36
Feb 16 12:08:01 ariete kernel: [  358.642915]  [<ffffffff8100c690>] ? pin_pagetable_pfn+0x29/0x36
Feb 16 12:08:01 ariete kernel: [  358.642915]  [<ffffffff810cd4e2>] ? __pte_alloc+0x6b/0xc6
Feb 16 12:08:01 ariete kernel: [  358.642915]  [<ffffffff810cb394>] ? pmd_alloc+0x28/0x5b
Feb 16 12:08:01 ariete kernel: [  358.642915]  [<ffffffff810cd60b>] ? handle_mm_fault+0xce/0x80f
Feb 16 12:08:01 ariete kernel: [  358.642915]  [<ffffffff810fbc5c>] ? do_vfs_ioctl+0x48d/0x4cb
Feb 16 12:08:01 ariete kernel: [  358.642915]  [<ffffffff8130f016>] ? do_page_fault+0x2e0/0x2fc
Feb 16 12:08:01 ariete kernel: [  358.642915]  [<ffffffff8130ceb5>] ? page_fault+0x25/0x30

On 11/02/2011 15:18, Henrik Langos wrote:
> What exactly crashes? dom0? Do you get a kernel dump on the console?
>
> I have pretty much the same setup here (iSCSI + multipath + Xen +
> Squeeze dom0 + Lenny/Etch PVM domUs) and I had some trouble with
> multipath and iSCSI being a little touchy.
>
> Basically my dom0 kernel hates fast iSCSI logout/login sequences.
>
> You'll have to give multipathd some time to cleanly remove multipath
> devices before you do another login.
> [...]
davide.vaghetti@ing.unipi.it
2011-Apr-06 07:08 UTC
[Xen-users] Re: Debian Squeeze, xen, multipath and iscsi
Hi,

I had the very same problem. After almost going mad trying to fix the issue with different combinations of boot options, I followed Henrik's advice: let multipath be completely loaded __before__ the iSCSI daemon. That is, don't let open-iscsi start at boot time, or at least remove it from the relevant runlevels and start it via rc.local. In my environment that fixed the issue (the latest kernel update from Debian does not make any difference).

good luck
davide

--
Davide Vaghetti
Faculty of Engineering
University of Pisa - Italy
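[Editor's note: on Debian Squeeze's sysvinit, the reordering Davide describes can be done roughly as below. This is only a sketch; "open-iscsi" is the usual Squeeze init script name, adjust if your setup differs.]

    # Sketch only: take open-iscsi out of the normal boot sequence so that
    # multipathd (started by multipath-tools) is fully up before any iSCSI login.
    update-rc.d -f open-iscsi remove

    # Then start it late instead, e.g. by adding this line to /etc/rc.local
    # (before the final "exit 0"):
    #
    #   /etc/init.d/open-iscsi start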
Henrik Langos
2011-Apr-06 08:55 UTC
Re: [Xen-users] Re: Debian Squeeze, xen, multipath and iscsi
On Wed, Apr 06, 2011 at 12:08:37AM -0700, davide.vaghetti@ing.unipi.it wrote:
> I had the very same problem. After almost going mad trying to fix the
> issue with different combinations of boot options, I followed Henrik's
> advice: let multipath be completely loaded __before__ the iSCSI daemon.
> That is, don't let open-iscsi start at boot time, or at least remove it
> from the relevant runlevels and start it via rc.local. In my environment
> that fixed the issue (the latest kernel update from Debian does not make
> any difference).

Hi Davide,

I had to reboot my Xen host recently, and upon iSCSI login I repeatedly ran into that same problem. I don't do any automatic iSCSI logins during boot and I don't start any Xen guests on boot, so the system _was_ completely up and idle. Still, every time I did the iSCSI login (which currently logs in to 12 volumes via 2 paths each) I ended up with CPUs stuck for a minute, or with something plainly wrong like this:

[225060.039126] BUG: unable to handle kernel paging request at ffff88001558b010
[225060.039172] IP: [<ffffffff8100e428>] xen_set_pmd+0x15/0x2c
[225060.039210] PGD 1002067 PUD 1006067 PMD 18a067 PTE 801000001558b065
[225060.039253] Oops: 0003 [#1] SMP
[225060.039284] last sysfs file: /sys/devices/virtual/block/dm-6/dm/suspended
[225060.039319] CPU 0
...

where kpartx_id or udevd or any other part of the device-mapper ecosystem would trigger a memory management problem.

The "hold your breath and cross your fingers"-style workaround was to

- stop the multipath daemon, then
- do the iSCSI login, and then
- restart the multipath daemon,

like this:

/etc/init.d/multipath-tools stop
sleep 5
iscsiadm -m node -p '192.168.0.1:3260' --login
sleep 5
/etc/init.d/multipath-tools start

( while alternately praying and cursing. ;-) )

That way I was able to log in to all iSCSI targets without immediately triggering bugs / race conditions.

Actually I am not sure that I need multipathd at all. I have, after all, a pretty simple setup.

# multipath -ll
...
36090a068302e3e73a9d4041bd000e0b3 dm-11 EQLOGIC,100E-00
size=10G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=2 status=active
  |- 24:0:0:0 sdt 65:48 active ready running
  `- 23:0:0:0 sdo 8:224 active ready running
36090a068302e4ee710d5341bd000a04b dm-16 EQLOGIC,100E-00
size=38G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=2 status=active
  |- 20:0:0:0 sdl 8:176 active ready running
  `- 19:0:0:0 sdk 8:160 active ready running
...

My host has two gigabit interfaces that are dedicated to storage traffic. Each connects to a switch that connects to one of the controllers of my storage. There is no cross-connect between the switches. So if any one part fails, one path becomes unavailable, but there should be no need to reconfigure anything.

Could anybody hit me with a clue stick? ;-)

Did anybody try a different dom0 system? I am tempted to try XCP as dom0 instead of Debian, but I'd love to know if anybody else has already made that switch and what their experience was. Does anybody have experience with iSCSI and multipath on XCP?

cheers
-henrik
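[Editor's note: a hedged sketch of a gentler variant of the workaround above, logging in to one target at a time and letting udev/kpartx settle between logins instead of one bulk login to all sessions. The field parsed from "iscsiadm -m node" output and the pause values are assumptions, not something confirmed in this thread.]

    #!/bin/sh
    # Sketch: per-target iSCSI login with settle pauses between logins.
    # Assumes "iscsiadm -m node" prints "<portal>,<tpgt> <target-iqn>" per line.
    /etc/init.d/multipath-tools stop
    sleep 5

    iscsiadm -m node | awk '{print $2}' | sort -u | while read target; do
        iscsiadm -m node -T "$target" --login
        udevadm settle        # wait for udev/kpartx to finish creating device nodes
        sleep 2               # extra safety margin, value is a guess
    done

    /etc/init.d/multipath-tools start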
Davide Vaghetti
2011-Apr-11 11:24 UTC
Re: [Xen-users] Re: Debian Squeeze, xen, multipath and iscsi
On 04/06/2011 10:55 AM, Henrik Langos wrote:
> I had to reboot my Xen host recently, and upon iSCSI login I repeatedly
> ran into that same problem. I don't do any automatic iSCSI logins during
> boot and I don't start any Xen guests on boot, so the system _was_
> completely up and idle. Still, every time I did the iSCSI login (which
> currently logs in to 12 volumes via 2 paths each) I ended up with CPUs
> stuck for a minute, or with something plainly wrong like this:
> [...]

Too bad! And well... I had the same problems. Lately I restarted a node of my Xen cluster (Pacemaker/Corosync), and despite having disabled iSCSI at boot, one of the CPUs got stuck on multipath. In the end I removed even multipath (both multipath-tools and multipath-tools-boot) from the startup, and now it works. To make the iSCSI/multipath combination work I have to start multipath __after__ iSCSI (and, obviously, after Xen is fully started up). The good news is that it seems stable.

Let's make a deal: the first one to come up with a better solution will let the other know!

bye and thanks for sharing
davide

--
Dott. Davide Vaghetti
Centro Servizi Informatici, Facoltà di Ingegneria, Università di Pisa
PGP: http://keys.keysigning.org:11371/pks/lookup?op=get&search=0x7A1B3BA18C4E0A4D
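[Editor's note: to make Davide's final ordering concrete, a hedged sketch of what such a late, manually ordered start could look like on Squeeze, e.g. run from /etc/rc.local. The init script names are the usual Debian ones; the sleep values and the assumption that xend is already up by the time this runs are illustrative, not taken from the thread.]

    #!/bin/sh
    # Sketch of the "Xen first, then iSCSI, then multipath" ordering.
    # open-iscsi and multipath-tools must both have been removed from the
    # normal runlevels first (update-rc.d -f <script> remove).

    /etc/init.d/open-iscsi start        # sessions come up, sd* devices appear
    sleep 10                            # arbitrary settle time

    /etc/init.d/multipath-tools start   # multipathd builds the dm maps on top
    sleep 5
    multipath -ll                       # sanity check: are all paths present?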