David Levi Hevroni
2008-Aug-04 17:59 UTC
[Lustre-discuss] Lustre 1.6.5.1 client with kernel 2.6.22.14
A few weeks ago I saw some discussion about using kernel 2.6.22.14. We tried using this kernel with Lustre 1.6.5.1 and OFED-1.3. When we mount Lustre via TCP it looks fine, but when we mount via IB we get a "general protection fault". Some details:

CentOS-5.2 with a patchless vanilla kernel 2.6.22.14 (we also tried the same kernel with the Lustre 1.6.5.1 patch, but it was not much different). We installed OFED-1.3; after a reboot it looked OK, and we tested ib_send_bw / ib_send_lat and they looked fine. Next we installed Lustre 1.6.5.1, configured with "./configure --with-linux=/usr/src/linux-2.6.22.14", and rebooted the system.

We modified /etc/modprobe.conf:

#lustre setting
options lnet networks=tcp

then:

mount -t lustre 192.168.1.20@tcp:/spfs /mnt/lustrefs

and it looks fine.

When we add IB by changing /etc/modprobe.conf to:

options lnet networks=o2ib,tcp

and reboot the system, we get the following error:

$ modprobe lnet      (O.K.)
$ lctl
lctl > network up

Message from syslogd@ at Tue Aug 5 00:35:30 2008 ...
grid06 kernel: general protection fault: 0000 [1] SMP
Segmentation fault

and when we look at /var/log/messages:

Aug 5 00:35:30 grid06 kernel: general protection fault: 0000 [1] SMP
Aug 5 00:35:30 grid06 kernel: CPU 1
Aug 5 00:35:30 grid06 kernel: Modules linked in: ko2iblnd rdma_cm iw_cm ib_addr lnet libcfs ib_uverbs ib_umad cxgb3 ib_ipath mlx4_ib mlx4_core ib_ipoib ib_cm ib_sa ib_mthca ib_mad ib_core
Aug 5 00:35:30 grid06 kernel: Pid: 7723, comm: lctl Not tainted 2.6.22.14 #1
Aug 5 00:35:30 grid06 kernel: RIP: 0010:[<ffffffff88161eea>] [<ffffffff88161eea>] :ko2iblnd:kiblnd_map_tx_descs+0x4a/0x160
Aug 5 00:35:30 grid06 kernel: RSP: 0000:ffff81010514f818 EFLAGS: 00010286
Aug 5 00:35:30 grid06 kernel: RAX: ffffffff8802a695 RBX: ffffc200014ae000 RCX: 0000000000000001
Aug 5 00:35:30 grid06 kernel: RDX: 0000000000001000 RSI: ffff8100f1d3a000 RDI: ffff81007d632000
Aug 5 00:35:30 grid06 kernel: RBP: ffff8101056b37c0 R08: 0000000000000000 R09: ffff81010514f798
Aug 5 00:35:30 grid06 kernel: R10: ffff81010514f7df R11: 0000000000003a98 R12: 0000000000000001
Aug 5 00:35:30 grid06 kernel: R13: 0000000000000001 R14: 0000000000000000 R15: ffff8101056b37e8
Aug 5 00:35:30 grid06 kernel: FS: 00002b3ae29f86e0(0000) GS:ffff810105d4b740(0000) knlGS:00000000f7d576c0
Aug 5 00:35:30 grid06 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Aug 5 00:35:30 grid06 kernel: CR2: 00002ab6525776d0 CR3: 00000000f2e93000 CR4: 00000000000006e0
Aug 5 00:35:30 grid06 kernel: Process lctl (pid: 7723, threadinfo ffff81010514e000, task ffff8100f5db2040)
Aug 5 00:35:30 grid06 kernel: Stack: ffff8100f178e7c0 ffff810105d69600 ffffffff8817b788 ffff8101056b37c0
Aug 5 00:35:30 grid06 kernel: ffff8101056b3e40 ffffffff88175558 ffff81010512f980 ffffffff88166eb3
Aug 5 00:35:30 grid06 kernel: ffffffff8810641f 0000000000000002 0000000000000000 0000000000000001
Aug 5 00:35:30 grid06 kernel: Call Trace:
Aug 5 00:35:30 grid06 kernel: [<ffffffff88166eb3>] :ko2iblnd:kiblnd_startup+0x2d3/0xa20
Aug 5 00:35:30 grid06 kernel: [<ffffffff8811a9f9>] :lnet:lnet_startup_lndnis+0xc9/0x6a0
Aug 5 00:35:30 grid06 kernel: [<ffffffff880fc0c8>] :libcfs:cfs_alloc+0x28/0x60
Aug 5 00:35:30 grid06 kernel: [<ffffffff8811b785>] :lnet:LNetNIInit+0x145/0x210
Aug 5 00:35:30 grid06 kernel: [<ffffffff8064a9c4>] __down_read+0x12/0x9a
Aug 5 00:35:30 grid06 kernel: [<ffffffff8812a15a>] :lnet:lnet_configure+0x4a/0x60
Aug 5 00:35:30 grid06 kernel: [<ffffffff8810145a>] :libcfs:libcfs_ioctl+0xba/0x5b0
Aug 5 00:35:30 grid06 kernel: [<ffffffff8063a056>] xs_send_kvec+0x80/0x89
Aug 5 00:35:30 grid06 kernel: [<ffffffff80638e55>] xprt_timer+0x0/0x7f
Aug 5 00:35:30 grid06 kernel: [<ffffffff8063cbd0>] rpc_wake_up_next+0x15c/0x163
Aug 5 00:35:30 grid06 kernel: [<ffffffff80638779>] __xprt_lock_write_next_cong+0x48/0x90
Aug 5 00:35:30 grid06 kernel: [<ffffffff802299b0>] find_busiest_group+0x252/0x684
Aug 5 00:35:30 grid06 kernel: [<ffffffff8064b08a>] __reacquire_kernel_lock+0x26/0x44
Aug 5 00:35:30 grid06 kernel: [<ffffffff80649456>] thread_return+0xac/0xe4
Aug 5 00:35:30 grid06 kernel: [<ffffffff8028a129>] __d_lookup+0xb0/0x100
Aug 5 00:35:30 grid06 kernel: [<ffffffff80281464>] do_lookup+0x63/0x1ae
Aug 5 00:35:30 grid06 kernel: [<ffffffff8028a5e5>] dput+0x26/0x115
Aug 5 00:35:30 grid06 kernel: [<ffffffff802836cf>] __link_path_walk+0xb9b/0xcf0
Aug 5 00:35:30 grid06 kernel: [<ffffffff80390975>] n_tty_chars_in_buffer+0x68/0x70
Aug 5 00:35:30 grid06 kernel: [<ffffffff80242810>] remove_wait_queue+0x12/0x45
Aug 5 00:35:30 grid06 kernel: [<ffffffff8028e42a>] mntput_no_expire+0x1c/0x79
Aug 5 00:35:30 grid06 kernel: [<ffffffff802838f2>] link_path_walk+0xce/0xe0
Aug 5 00:35:30 grid06 kernel: [<ffffffff8023aefb>] recalc_sigpending_and_wake+0x9/0x1a
Aug 5 00:35:30 grid06 kernel: [<ffffffff80351863>] __strncpy_from_user+0x17/0x41
Aug 5 00:35:30 grid06 kernel: [<ffffffff880fc0c8>] :libcfs:cfs_alloc+0x28/0x60
Aug 5 00:35:30 grid06 kernel: [<ffffffff88100acd>] :libcfs:libcfs_psdev_open+0x6d/0x2c0
Aug 5 00:35:30 grid06 kernel: [<ffffffff8027cf71>] exact_lock+0xc/0x14
Aug 5 00:35:30 grid06 kernel: [<ffffffff80649f1c>] mutex_lock+0xd/0x1e
Aug 5 00:35:30 grid06 kernel: [<ffffffff80393915>] misc_open+0x1b5/0x1c0
Aug 5 00:35:30 grid06 kernel: [<ffffffff8027d435>] chrdev_open+0x167/0x196
Aug 5 00:35:30 grid06 kernel: [<ffffffff880fec0f>] :libcfs:libcfs_ioctl+0xaf/0x160
Aug 5 00:35:30 grid06 kernel: [<ffffffff8022b7f9>] default_wake_function+0x0/0xe
Aug 5 00:35:30 grid06 kernel: [<ffffffff8064b032>] lock_kernel+0x1b/0x37
Aug 5 00:35:30 grid06 kernel: [<ffffffff880feb60>] :libcfs:libcfs_ioctl+0x0/0x160
Aug 5 00:35:30 grid06 kernel: [<ffffffff802858cd>] do_ioctl+0x9d/0xb6
Aug 5 00:35:30 grid06 kernel: [<ffffffff80285b29>] vfs_ioctl+0x243/0x25c
Aug 5 00:35:30 grid06 kernel: [<ffffffff80285b7e>] sys_ioctl+0x3c/0x5e
Aug 5 00:35:30 grid06 kernel: [<ffffffff8020935e>] system_call+0x7e/0x83
Aug 5 00:35:30 grid06 kernel:
Aug 5 00:35:30 grid06 kernel:
Aug 5 00:35:30 grid06 kernel: Code: ff 50 08 48 89 43 50 48 8b 45 28 48 89 58 08 48 89 03 4c 89
Aug 5 00:35:30 grid06 kernel: RIP [<ffffffff88161eea>] :ko2iblnd:kiblnd_map_tx_descs+0x4a/0x160
Aug 5 00:35:30 grid06 kernel: RSP <ffff81010514f818>

Is there something wrong in our configuration?

Thanks,
David Levi-Hevroni

Papp Tamas tompos at martos.bme.hu
Tue Jun 17 12:36:29 PDT 2008

Bernd Schubert wrote:
> Yeah, this is what I immediately thought when I saw your trace. The kernel
> developers somehow manage to change the interface to the cache functions
> in each kernel version (though not within the last-digit subversions).
> The trace makes me think these functions have been called with the wrong
> arguments. However, Lustre already has wrapper functions for this, and
> I guess the configure script did something wrong this time.
> Unless the Lustre developers step in, I will try to find some time
> tomorrow or on Thursday to check what's wrong.

Well, thank you very much. Has anybody else tried 2.6.22 and Lustre?

Bye,
tamas
Brian J. Murrell
2008-Aug-06 14:36 UTC
[Lustre-discuss] Lustre 1.6.5.1 client with kernel 2.6.22.14
On Mon, 2008-08-04 at 20:59 +0300, David Levi Hevroni wrote:
> I see few weeks ago some discuses about using kernel 2.6.22.14. We
> tried using this kernel with lustre 1.6.5.1 and OFED-1.3. When we
> mount lustre vi tcp it's look fine, but when we mount via IB we get
> "general protection fault", some details:

Can you file a bug about this in our bugzilla please?

b.
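For a bug report like the one requested here, it can help to list just the symbol frames from the oops in David's message. The Python sketch below is a hypothetical helper, not a Lustre or kernel tool; `parse_frames`, `lustre_frames` and `LUSTRE_MODULES` are illustrative names. It assumes the x86_64 frame format seen in this trace, `[<address>] :module:function+0xOFFSET/0xSIZE`, where the `:module:` prefix appears only for symbols in loadable modules and a pair such as `+0x4a/0x160` means offset 0x4a into a function 0x160 bytes long.

```python
import re

# Matches symbol frames as printed in this 2.6-era x86_64 oops, e.g.
#   [<ffffffff88166eb3>] :ko2iblnd:kiblnd_startup+0x2d3/0xa20
# Symbols in loadable modules carry a ":module:" prefix; core-kernel ones do not.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{16})>\]\s+"
    r"(?::(?P<module>\w+):)?"
    r"(?P<func>\w+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)

# Hypothetical filter set: the Lustre/LNet modules that appear in this trace.
LUSTRE_MODULES = {"ko2iblnd", "lnet", "libcfs"}


def parse_frames(oops_text):
    """Return (module, function, offset, size) tuples in order of appearance."""
    frames = []
    for m in FRAME_RE.finditer(oops_text):
        frames.append((
            # Assumption: frames without a module prefix are labelled "vmlinux".
            m.group("module") or "vmlinux",
            m.group("func"),
            int(m.group("off"), 16),
            int(m.group("size"), 16),
        ))
    return frames


def lustre_frames(frames):
    """Keep only frames whose module is one of LUSTRE_MODULES."""
    return [f for f in frames if f[0] in LUSTRE_MODULES]


# A few lines copied from the /var/log/messages excerpt in this thread.
sample = """
Aug 5 00:35:30 grid06 kernel: RIP: 0010:[<ffffffff88161eea>] [<ffffffff88161eea>] :ko2iblnd:kiblnd_map_tx_descs+0x4a/0x160
Aug 5 00:35:30 grid06 kernel: [<ffffffff88166eb3>] :ko2iblnd:kiblnd_startup+0x2d3/0xa20
Aug 5 00:35:30 grid06 kernel: [<ffffffff8811a9f9>] :lnet:lnet_startup_lndnis+0xc9/0x6a0
Aug 5 00:35:30 grid06 kernel: [<ffffffff8064a9c4>] __down_read+0x12/0x9a
"""

if __name__ == "__main__":
    for mod, func, off, size in lustre_frames(parse_frames(sample)):
        print(f"{mod}:{func} at offset {off:#x} of {size:#x}")
```

On this sample it prints the three ko2iblnd/lnet frames, starting with kiblnd_map_tx_descs from the RIP line; the bare first `[<...>]` on the RIP line has no symbol after it, so the regex skips it and matches the second one.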