Christian Gajan
2008-Apr-23 13:06 UTC
[Lustre-discuss] configuring lustre network get kernel panic
Hi,

I am trying to configure Lustre 1.6.4.3 + OFED 1.2.5.5 + RHEL5u1 (2.6.18-53.1.13).

The compilation and installation steps are OK:

- build kernel 2.6.18-53.1.13 + lustre patch OK
- boot with new kernel
- build OFED 1.2.5.5 with new kernel
- install OFED
- boot again with ofed drivers
- build lustre 1.6.4.3 rpm --with-o2ib=/usr/src/ofa-kernel-1.2.5.5 --with-linux=/usr/src/linux-2.6.18-53.1.13.el5.lustre-1.6.4.3
- install lustre rpm 1- lustre-ldiskfs 2- lustre-module 3- lustre

All of this completes without any warning.

When I begin to configure my Lustre network, I get a kernel panic:

# ifconfig ib1
ib1       Link encap:InfiniBand  HWaddr 80:00:04:05:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:192.168.1.16  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::203:ba00:100:5142/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:6 errors:0 dropped:0 overruns:0 frame:0
          TX packets:42 errors:0 dropped:2 overruns:0 carrier:0
          collisions:0 txqueuelen:128
          RX bytes:336 (336.0 b)  TX bytes:8186 (7.9 KiB)
# cat /etc/modprobe.conf
...
options lnet ip2nets="o2ib0(ib1) 192.168.1.[16-19]
# modprobe lnet
# lctl network configure
ko2iblnd: no version for "ib_fmr_pool_unmap" found: kernel tainted.
general protection fault: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
CPU 1
Modules linked in: ko2iblnd(U) lnet(U) libcfs(U) nfs(U) lockd(U)
 fscache(U) nfs_acl(U) autofs4(U) hidp(U) rfcomm(U) l2cap(U) bluetooth(U)
 sunrpc(U) rdma_ucm(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U)
 ib_uverbs(U) ib_umad(U) ib_ipath(U) mlx4_ib(U) mlx4_core(U) ib_ipoib(U)
 ib_cm(U) ib_sa(U) ipv6(U) dm_mirror(U) dm_multipath(U) dm_mod(U)
 video(U) sbs(U) backlight(U) i2c_ec(U) button(U) battery(U) asus_acpi(U)
 acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) qla2xxx(U)
 shpchp(U) scsi_transport_fc(U) ide_cd(U) cdrom(U) forcedeth(U)
 i2c_nforce2(U) i2c_core(U) k8temp(U) hwmon(U) tg3(U) k8_edac(U)
 edac_mc(U) serio_raw(U) ib_mthca(U) ib_mad(U) ib_core(U) pcspkr(U) sg(U)
 sata_nv(U) libata(U) sd_mod(U) scsi_mod(U) ext3(U) jbd(U) ehci_hcd(U)
 ohci_hcd(U) uhci_hcd(U)
Pid: 11797, comm: lctl Tainted: GF 2.6.18-53.1.13.el5_lustre.1.6.4.3.v2 #1
RIP: 0010:[<ffffffff88703f1a>] [<ffffffff88703f1a>] :ko2iblnd:kiblnd_map_tx_descs+0xea/0x180
RSP: 0000:ffff81022ba85808 EFLAGS: 00010282
RAX: ffffffff881446df RBX: ffffc20000071000 RCX: 0000000000000001
RDX: 0000000000001000 RSI: ffff81022be20000 RDI: ffff81023fd95000
RBP: ffff81023b9fc640 R08: ffff81022ba84000 R09: 000000000000003f
R10: ffff810107f60008 R11: 0000000000000100 R12: 0000000000000001
R13: 0000000000000001 R14: 0000000000000000 R15: ffff81023b9fc668
FS: 00002aaaaaaec360(0000) GS:ffff810107e99440(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000034e7e95770 CR3: 000000043d90e000 CR4: 00000000000006e0
Process lctl (pid: 11797, threadinfo ffff81022ba84000, task ffff81023a9247a0)
Stack: ffff81023b9fc340 ffff81023f9fba00 ffff81023b9fc340 ffff81023b9fc640
 ffff81023b9fc2c0 ffff81022c2c5746 ffff81022e2cdac0 ffffffff887075fa
 ffff81022ba85858 ffffffff886c7e0a 0000000000000000 ffffffff886f9018
Call Trace:
 [<ffffffff887075fa>] :ko2iblnd:kiblnd_startup+0x9fa/0xb10
 [<ffffffff886c7e0a>] :lnet:lnet_trimwhite+0x2a/0x60
 [<ffffffff886c5738>] :lnet:lnet_startup_lndnis+0x128/0x630
 [<ffffffff88692fe8>] :libcfs:cfs_alloc+0x28/0x60
 [<ffffffff886c635e>] :lnet:LNetNIInit+0xfe/0x1e9
 [<ffffffff8009d458>] ktime_get_ts+0x1a/0x4e
 [<ffffffff800625bf>] __down_read+0x12/0x92
 [<ffffffff800c360f>] zone_statistics+0x3e/0x6d
 [<ffffffff886d40b3>] :lnet:lnet_configure+0x33/0x60
 [<ffffffff88698790>] :libcfs:libcfs_ioctl+0x490/0x550
 [<ffffffff800c360f>] zone_statistics+0x3e/0x6d
 [<ffffffff8000afce>] __find_get_block+0x15c/0x16c
 [<ffffffff8011cf2e>] selinux_ipc_permission+0x0/0x2f
 [<ffffffff80128931>] constraint_expr_eval+0x298/0x45d
 [<ffffffff80128931>] constraint_expr_eval+0x298/0x45d
 [<ffffffff8012546c>] avtab_search_node+0x38/0x6a
 [<ffffffff80128d3f>] context_struct_compute_av+0x249/0x2ba
 [<ffffffff8011b7dc>] avc_alloc_node+0x3a/0x187
 [<ffffffff8011bb31>] avc_has_perm_noaudit+0x208/0x36b
 [<ffffffff8011c85f>] avc_has_perm+0x43/0x55
 [<ffffffff8011c85f>] avc_has_perm+0x43/0x55
 [<ffffffff8011d396>] inode_has_perm+0x56/0x63
 [<ffffffff88695b42>] :libcfs:libcfs_ioctl+0x142/0x170
 [<ffffffff8011d437>] file_has_perm+0x94/0xa3
 [<ffffffff80062bfd>] lock_kernel+0x1b/0x32
 [<ffffffff8003fc46>] do_ioctl+0x55/0x6b
 [<ffffffff8002fc81>] vfs_ioctl+0x248/0x261
 [<ffffffff8004a230>] sys_ioctl+0x59/0x78
 [<ffffffff8005b28d>] tracesys+0xd5/0xe0

Code: ff 50 08 eb 18 90 48 8b 05 d9 16 d3 f7 b9 01 00 00 00 ba 00
RIP [<ffffffff88703f1a>] :ko2iblnd:kiblnd_map_tx_descs+0xea/0x180
 RSP <ffff81022ba85808>
 <0>Kernel panic - not syncing: Fatal exception

Any idea whether there is a mistake in my procedure, or whether this is a known issue?

Regards,
Christian
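For readers triaging a trace like the one above, the following is a minimal sketch (not part of the original report) of how the faulting frame and the module-owned frames could be pulled out of the oops text. It assumes the 2.6.18-era x86_64 frame format shown in this message (`[<address>] :module:function+0xoff/0xsize`); the names `FRAME_RE`, `parse_frames`, `faulting_frame` and `SAMPLE` are hypothetical helpers introduced only for illustration.

```python
import re

# Frame format assumed from the trace in this thread, e.g.
#   [<ffffffff887075fa>] :ko2iblnd:kiblnd_startup+0x9fa/0xb10
#   [<ffffffff8009d458>] ktime_get_ts+0x1a/0x4e
# The ":module:" prefix is absent for symbols built into the kernel.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{16})>\]\s+"
    r"(?::(?P<module>\w+):)?"
    r"(?P<func>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def _to_frame(match):
    """Convert a regex match into a plain dict with integer offsets."""
    return {
        "addr": int(match.group("addr"), 16),
        "module": match.group("module"),  # None for built-in symbols
        "func": match.group("func"),
        "offset": int(match.group("off"), 16),
        "size": int(match.group("size"), 16),
    }


def parse_frames(oops_text):
    """Return every symbolic frame found in the text, in order."""
    return [_to_frame(m) for m in FRAME_RE.finditer(oops_text)]


def faulting_frame(oops_text):
    """Return the first symbolic frame at or after the first 'RIP' marker."""
    rip_at = oops_text.find("RIP")
    if rip_at < 0:
        return None
    match = FRAME_RE.search(oops_text, rip_at)
    return _to_frame(match) if match else None


# Short excerpt copied from the report above.
SAMPLE = """\
RIP: 0010:[<ffffffff88703f1a>] [<ffffffff88703f1a>] :ko2iblnd:kiblnd_map_tx_descs+0xea/0x180
Call Trace:
 [<ffffffff887075fa>] :ko2iblnd:kiblnd_startup+0x9fa/0xb10
 [<ffffffff886c5738>] :lnet:lnet_startup_lndnis+0x128/0x630
 [<ffffffff8009d458>] ktime_get_ts+0x1a/0x4e
"""

if __name__ == "__main__":
    frame = faulting_frame(SAMPLE)
    print(frame["module"], frame["func"], hex(frame["offset"]))
```

On the excerpt, the faulting frame is `kiblnd_map_tx_descs` in `ko2iblnd`, called from `kiblnd_startup`, which narrows the problem to the o2ib LND start-up path rather than to LNet or the core kernel.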
Sébastien Buisson
2008-Apr-23 13:19 UTC
[Lustre-discuss] configuring lustre network get kernel panic
Hi Christian,

We had the same problem a few months ago. See bugzilla 14988 for all the details, and comment #21 (https://bugzilla.lustre.org/show_bug.cgi?id=14988#c21) for the solution.

Regards,
Sebastien.
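The bugzilla comment referenced above is the place to look for the actual fix. Independently of it, the `no version for "ib_fmr_pool_unmap" found: kernel tainted.` line in the original report is the kernel's usual warning when a module carries no symbol version (CRC) for a symbol it imports, and it is likely the source of the `F` flag in `Tainted: GF`. The sketch below, with the hypothetical helper `unversioned_symbols`, only shows one way such lines could be collected from a console log before digging further; it does not diagnose the cause.

```python
import re

# Matches the module loader warning as it appears in the report, e.g.
#   ko2iblnd: no version for "ib_fmr_pool_unmap" found: kernel tainted.
NO_VERSION_RE = re.compile(
    r'(?P<module>[\w-]+): no version for "(?P<symbol>[^"]+)" found: kernel tainted\.'
)


def unversioned_symbols(log_text):
    """Return (module, symbol) pairs for each 'no version for' warning."""
    return [
        (m.group("module"), m.group("symbol"))
        for m in NO_VERSION_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    # Excerpt copied from the original report.
    log = (
        "# lctl network configure\n"
        'ko2iblnd: no version for "ib_fmr_pool_unmap" found: kernel tainted.\n'
        "general protection fault: 0000 [1] SMP\n"
    )
    for module, symbol in unversioned_symbols(log):
        print(f"{module} imports {symbol} without a symbol version")
```

A non-empty result suggests checking which OFED or kernel symbol tables the reporting module was built against, since a module built against one InfiniBand stack and loaded against another is a plausible (though here unconfirmed) way to end up with this warning.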