Hello--

Nir Gal wrote:
>
> I am using SUSE Enterprise Linux 8 with service pack 2 or 3.
> Should I use the series called suse-2.4.21 or suse-2.4.21-2?

You should use our pre-packaged kernel RPM, unless you have an exceptionally good reason to use something else.

> Does anyone have good experience with Lustre on SUSE?

We do not currently supply a kernel patch for SUSE Enterprise Linux 8, although there should be no problems running the pre-packaged Lustre kernel from the web site. Each kernel patch that we maintain is a significant burden and time investment, so I'm afraid that we can't add one for SLES 8 without a support contract.

If there are simple things that we can do to make that kernel more suitable (for example, including more drivers in the configuration), just let us know.

Thanks--

-Phil
Chris Samuel
2006-May-19 07:36 UTC
Frequent RPM kernel NFSd panics? (was Re: [Lustre-discuss] SUSE SLES 8)
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Thu, 1 Apr 2004 05:45 am, Phil Schwan wrote:

> You should use our pre-packaged kernel RPM, unless you have an
> exceptionally good reason to use something else.

I was trying it on our existing storage node (NFS), which I was hoping to make one of two Lustre servers, but I've had to drop back to a normal Red Hat kernel (or in this case Fedora Legacy, as RH have dropped support for 7.3) because we've seen a number of panics in nfsd. The RPM version is 2.4.20-28.9_lustre.1.0.4smp.

They look like this:

Apr 1 09:06:37 stgnode kernel: kernel BUG in header file at line 319
Apr 1 09:06:37 stgnode kernel: ------------[ cut here ]------------
Apr 1 09:06:37 stgnode kernel: kernel BUG at panic.c:313!
Apr 1 09:06:37 stgnode kernel: invalid operand: 0000
Apr 1 09:06:37 stgnode kernel: nfsd autofs nfs lockd sunrpc e1000 gm usb-ohci usbcore ips sd_mod scsi_mod
Apr 1 09:06:37 stgnode kernel: CPU: 0
Apr 1 09:06:37 stgnode kernel: EIP: 0060:[<c011ec71>] Tainted: P
Apr 1 09:06:37 stgnode kernel: EFLAGS: 00010282
Apr 1 09:06:37 stgnode kernel:
Apr 1 09:06:37 stgnode kernel: EIP is at __out_of_line_bug [kernel] 0x11 (2.4.20-28.9_lustre.1.0.4smp)
Apr 1 09:06:37 stgnode kernel: eax: 00000026 ebx: e0439cc0 ecx: c03bec80 edx: f67ffd68
Apr 1 09:06:37 stgnode kernel: esi: f67ffe1c edi: e0439d23 ebp: f67ffdc4 esp: f67ffdbc
Apr 1 09:06:37 stgnode kernel: ds: 0068 es: 0068 ss: 0068
Apr 1 09:06:37 stgnode kernel: Process nfsd (pid: 1255, stackpage=f67fd000)
Apr 1 09:06:37 stgnode kernel: Stack: c0285180 0000013f f67ffdec c0164cc2 0000013f c015a2d5 c44762c0 f67ffe1c
Apr 1 09:06:37 stgnode kernel: 00000000 00000000 c44762c0 dabeb100 f67ffe08 c015b708 c44762c0 f67ffe1c
Apr 1 09:06:37 stgnode kernel: 00000000 001bc2a5 f58ac6b0 f67ffe34 c015b81b f67ffe1c c44762c0 00000000
Apr 1 09:06:37 stgnode kernel: Call Trace: [<c0164cc2>] d_alloc [kernel] 0x132 (0xf67ffdc8))
Apr 1 09:06:38 stgnode kernel: [<c015a2d5>] cached_lookup [kernel] 0x15 (0xf67ffdd0))
Apr 1 09:06:38 stgnode kernel: [<c015b708>] lookup_hash_it [kernel] 0x68 (0xf67ffdf0))
Apr 1 09:06:38 stgnode kernel: [<c015b81b>] lookup_one_len_it [kernel] 0x5b (0xf67ffe0c))
Apr 1 09:06:38 stgnode kernel: [<c015b846>] lookup_one_len [kernel] 0x16 (0xf67ffe38))
Apr 1 09:06:38 stgnode kernel: [<f9aeed80>] nfsd_lookup [nfsd] 0x350 (0xf67ffe50))
Apr 1 09:06:39 stgnode kernel: [<c011b1e5>] __wake_up [kernel] 0x45 (0xf67ffeb8))
Apr 1 09:06:39 stgnode kernel: [<f9ab680d>] svc_sock_enqueue [sunrpc] 0x18d (0xf67ffee0))
Apr 1 09:06:39 stgnode kernel: [<f9ab70dd>] svc_udp_recvfrom [sunrpc] 0x2ed (0xf67ffefc))
Apr 1 09:06:39 stgnode kernel: [<f9ab6b9c>] svc_sendto [sunrpc] 0x7c (0xf67fff04))
Apr 1 09:06:39 stgnode kernel: [<f9af5138>] nfsd3_proc_lookup [nfsd] 0xd8 (0xf67fff28))
Apr 1 09:06:39 stgnode kernel: [<f9afe6ac>] nfsd_procedures3 [nfsd] 0x6c (0xf67fff4c))
Apr 1 09:06:39 stgnode kernel: [<f9aeb598>] nfsd_dispatch [nfsd] 0xb8 (0xf67fff58))
Apr 1 09:06:39 stgnode kernel: [<f9ab6435>] svc_process_Rsmp_aee2c6cb [sunrpc] 0x3d5 (0xf67fff78))
Apr 1 09:06:39 stgnode kernel: [<f9afe6ac>] nfsd_procedures3 [nfsd] 0x6c (0xf67fff9c))
Apr 1 09:06:39 stgnode kernel: [<f9afdff8>] nfsd_version3 [nfsd] 0x0 (0xf67fffa0))
Apr 1 09:06:39 stgnode kernel: [<f9afe018>] nfsd_program [nfsd] 0x0 (0xf67fffa4))
Apr 1 09:06:39 stgnode kernel: [<f9aeb37f>] nfsd [nfsd] 0x1df (0xf67fffc0))
Apr 1 09:06:39 stgnode kernel: [<f9aeb1a0>] nfsd [nfsd] 0x0 (0xf67fffe0))
Apr 1 09:06:39 stgnode kernel: [<c0107545>] kernel_thread_helper [kernel] 0x5 (0xf67ffff0))
Apr 1 09:06:39 stgnode kernel:
Apr 1 09:06:39 stgnode kernel:
Apr 1 09:06:39 stgnode kernel: Code: 0f 0b 39 01 1d 60 28 c0 58 5a 90 8d 74 26 00 eb fe 8d b4 26

and here's another:

Mar 26 17:30:55 stgnode kernel: kernel BUG in header file at line 319
Mar 26 17:30:55 stgnode kernel: ------------[ cut here ]------------
Mar 26 17:30:55 stgnode kernel: kernel BUG at panic.c:313!
Mar 26 17:30:55 stgnode kernel: invalid operand: 0000
Mar 26 17:30:55 stgnode kernel: qla2300_conf qla2300 nfsd autofs nfs lockd sunrpc e1000 gm usb-ohci usbcore ips sd_mod scsi_mod
Mar 26 17:30:55 stgnode kernel: CPU: 1
Mar 26 17:30:55 stgnode kernel: EIP: 0060:[<c011ec71>] Tainted: P
Mar 26 17:30:55 stgnode kernel: EFLAGS: 00010282
Mar 26 17:30:55 stgnode kernel:
Mar 26 17:30:55 stgnode kernel: EIP is at __out_of_line_bug [kernel] 0x11 (2.4.20-28.9_lustre.1.0.4smp)
Mar 26 17:30:55 stgnode kernel: eax: 00000026 ebx: f7fdd240 ecx: c03bf640 edx: c7433d68
Mar 26 17:30:55 stgnode kernel: esi: c7433e1c edi: f7fdd2a3 ebp: c7433dc4 esp: c7433dbc
Mar 26 17:30:55 stgnode kernel: ds: 0068 es: 0068 ss: 0068
Mar 26 17:30:56 stgnode kernel: Process nfsd (pid: 1276, stackpage=c7431000)
Mar 26 17:30:56 stgnode kernel: Stack: c0285180 0000013f c7433dec c0164cc2 0000013f c015a2d5 f3f70a40 c7433e1c
Mar 26 17:30:56 stgnode kernel: 00000000 00000000 f3f70a40 e4800c80 c7433e08 c015b708 f3f70a40 c7433e1c
Mar 26 17:30:57 stgnode kernel: 00000000 0023605a c0ff5eb0 c7433e34 c015b81b c7433e1c f3f70a40 00000000
Mar 26 17:30:57 stgnode kernel: Call Trace: [<c0164cc2>] d_alloc [kernel] 0x132 (0xc7433dc8))
Mar 26 17:30:57 stgnode kernel: [<c015a2d5>] cached_lookup [kernel] 0x15 (0xc7433dd0))
Mar 26 17:30:58 stgnode kernel: [<c015b708>] lookup_hash_it [kernel] 0x68 (0xc7433df0))
Mar 26 17:30:58 stgnode kernel: [<c015b81b>] lookup_one_len_it [kernel] 0x5b (0xc7433e0c))
Mar 26 17:30:58 stgnode kernel: [<c015b846>] lookup_one_len [kernel] 0x16 (0xc7433e38))
Mar 26 17:30:58 stgnode kernel: [<f9aeed80>] nfsd_lookup [nfsd] 0x350 (0xc7433e50))
Mar 26 17:30:58 stgnode kernel: [<c011b1e5>] __wake_up [kernel] 0x45 (0xc7433eb8))
Mar 26 17:30:58 stgnode kernel: [<f9ab680d>] svc_sock_enqueue [sunrpc] 0x18d (0xc7433ee0))
Mar 26 17:30:58 stgnode kernel: [<f9ab70dd>] svc_udp_recvfrom [sunrpc] 0x2ed (0xc7433efc))
Mar 26 17:30:58 stgnode kernel: [<f9ab6b9c>] svc_sendto [sunrpc] 0x7c (0xc7433f04))
Mar 26 17:30:58 stgnode kernel: [<f9af5138>] nfsd3_proc_lookup [nfsd] 0xd8 (0xc7433f28))
Mar 26 17:30:59 stgnode kernel: [<f9afe6ac>] nfsd_procedures3 [nfsd] 0x6c (0xc7433f4c))
Mar 26 17:30:59 stgnode kernel: [<f9aeb598>] nfsd_dispatch [nfsd] 0xb8 (0xc7433f58))
Mar 26 17:30:59 stgnode kernel: [<f9ab6435>] svc_process_Rsmp_aee2c6cb [sunrpc] 0x3d5 (0xc7433f78))
Mar 26 17:30:59 stgnode kernel: [<f9afe6ac>] nfsd_procedures3 [nfsd] 0x6c (0xc7433f9c))
Mar 26 17:31:00 stgnode kernel: [<f9afdff8>] nfsd_version3 [nfsd] 0x0 (0xc7433fa0))
Mar 26 17:31:00 stgnode kernel: [<f9afe018>] nfsd_program [nfsd] 0x0 (0xc7433fa4))
Mar 26 17:31:00 stgnode kernel: [<f9aeb37f>] nfsd [nfsd] 0x1df (0xc7433fc0))
Mar 26 17:31:00 stgnode kernel: [<f9afdfe0>] nfsd_list [nfsd] 0x0 (0xc7433fd4))
Mar 26 17:31:00 stgnode kernel: [<f9aeb1a0>] nfsd [nfsd] 0x0 (0xc7433fe0))
Mar 26 17:31:00 stgnode kernel: [<c0107545>] kernel_thread_helper [kernel] 0x5 (0xc7433ff0))
Mar 26 17:31:00 stgnode kernel:
Mar 26 17:31:00 stgnode kernel:
Mar 26 17:31:00 stgnode kernel: Code: 0f 0b 39 01 1d 60 28 c0 58 5a 90 8d 74 26 00 eb fe 8d b4 26

At that point the nfsd threads on the server get stuck in disk wait, the cluster stops, and I have to reboot the storage node with the Big Red Switch. :-(

Before this we'd had one hang in almost a year of operation, but with the Lustre RPM kernel we've had a hang and those two panics in two weeks.

I took the opportunity of rebooting into a new kernel to pull out the FC card, in case my boss allows me to continue trialing Lustre on another box to see whether it's worth using here.

Any ideas?

--
Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
Victorian Partnership for Advanced Computing http://www.vpac.org/
Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQFAa6NtO2KABBYQAh8RAnI2AJ0Sx9wdNeJ5lXhcDjYltK39qkwEigCdH4xO
nqTFPL565eAPEj7nwlV5uRw=
=I8wi
-----END PGP SIGNATURE-----
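The two oopses in this thread look alike, and it can help to confirm that mechanically before reporting them. As a minimal sketch in Python (not a Lustre, ksymoops or kernel tool), the helper below pulls the module and symbol out of each Red Hat 2.4-style call-trace line and counts how many leading entries two traces share. The regex and the names call_chain and common_prefix are hypothetical, inferred only from the log lines quoted in this message; other kernels format their traces differently.

```python
import re

# Hypothetical pattern, inferred only from the call-trace lines quoted above, e.g.
#   [<c0164cc2>] d_alloc [kernel] 0x132 (0xf67ffdc8))
# Groups: 1 = address, 2 = symbol, 3 = module, 4 = offset within the symbol.
TRACE_ENTRY = re.compile(r"\[<([0-9a-f]{8})>\]\s+(\S+)\s+\[(\w+)\]\s+(0x[0-9a-f]+)")


def call_chain(log_text):
    """Return (module, symbol) pairs for every call-trace entry, in log order."""
    return [(m.group(3), m.group(2)) for m in TRACE_ENTRY.finditer(log_text)]


def common_prefix(a, b):
    """Count how many leading (module, symbol) entries two call chains share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# The first six call-trace lines of each oops quoted in this thread.
APR1 = """\
Apr 1 09:06:37 stgnode kernel: Call Trace: [<c0164cc2>] d_alloc [kernel] 0x132 (0xf67ffdc8))
Apr 1 09:06:38 stgnode kernel: [<c015a2d5>] cached_lookup [kernel] 0x15 (0xf67ffdd0))
Apr 1 09:06:38 stgnode kernel: [<c015b708>] lookup_hash_it [kernel] 0x68 (0xf67ffdf0))
Apr 1 09:06:38 stgnode kernel: [<c015b81b>] lookup_one_len_it [kernel] 0x5b (0xf67ffe0c))
Apr 1 09:06:38 stgnode kernel: [<c015b846>] lookup_one_len [kernel] 0x16 (0xf67ffe38))
Apr 1 09:06:38 stgnode kernel: [<f9aeed80>] nfsd_lookup [nfsd] 0x350 (0xf67ffe50))
"""

MAR26 = """\
Mar 26 17:30:57 stgnode kernel: Call Trace: [<c0164cc2>] d_alloc [kernel] 0x132 (0xc7433dc8))
Mar 26 17:30:57 stgnode kernel: [<c015a2d5>] cached_lookup [kernel] 0x15 (0xc7433dd0))
Mar 26 17:30:58 stgnode kernel: [<c015b708>] lookup_hash_it [kernel] 0x68 (0xc7433df0))
Mar 26 17:30:58 stgnode kernel: [<c015b81b>] lookup_one_len_it [kernel] 0x5b (0xc7433e0c))
Mar 26 17:30:58 stgnode kernel: [<c015b846>] lookup_one_len [kernel] 0x16 (0xc7433e38))
Mar 26 17:30:58 stgnode kernel: [<f9aeed80>] nfsd_lookup [nfsd] 0x350 (0xc7433e50))
"""

a, b = call_chain(APR1), call_chain(MAR26)
print(common_prefix(a, b))  # prints 6
print(a[0], a[-1])          # prints ('kernel', 'd_alloc') ('nfsd', 'nfsd_lookup')
```

Comparing (module, symbol) pairs rather than raw addresses keeps the check independent of stack addresses, which differ between the two oopses. Applied to the complete traces in this message, the chains agree entry for entry until near the bottom, where the 26 March trace has an extra nfsd_list entry.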