I tried hard to get the fc5 patches to work (not on FC5, though, although I did try with 2.6.16-rc6-git3+tux, which is basically what FC5 uses, minus dozens of miscellaneous patches); the patches apply well, but the result oopses frequently.

Using 1.5.91 with the same 2.6.12.6 kernel that was rock solid with Lustre 1.5.90 also seems to be rather unstable. It doesn't oops as much, but it seems to fail to recover properly from server shutdowns. Switching everything back to 1.5.90 works like a charm.

Is anyone having much luck with Lustre 1.5.91?

Thanks,

Brent

PS I'm using Ubuntu Dapper. To get that to work, I used the vanilla 2.6.12 kernel patches. Dapper hates 2.6.12 (it breaks udev, which now requires features found only in 2.6.15+), but you can get it to work by using the yaird initramfs tool (add "ramdisk = /usr/sbin/mkinitrd.yaird" to /etc/kernel-img.conf so one is generated automatically if you use make-kpkg) and putting critical modules (such as e1000, in my case) in /etc/modules. "apt-get install gcc-3.4" to compile your kernel and Lustre. 1.5.91 seems unstable, but 1.5.90 seems completely solid (so far). The TCP zero copy patch in Lustre 1.5.91 should be fine to use with 1.5.90. Let me know if you need any other tips for running on Dapper. You might be able to use the Lustre 1.5.91 patchless client on machines that don't need to be servers; I haven't tried that yet.
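In outline, the Dapper workaround above amounts to something like the following. This is a sketch only: the yaird package name and the exact make-kpkg invocation are assumptions, not commands taken from Brent's message.

# Install yaird and the older compiler (package names assumed):
apt-get install yaird gcc-3.4

# Have make-kpkg generate the initramfs with yaird, per the note above:
echo 'ramdisk = /usr/sbin/mkinitrd.yaird' >> /etc/kernel-img.conf

# Make sure critical drivers (e.g. e1000) load at boot:
echo 'e1000' >> /etc/modules

# Build the patched 2.6.12 kernel with gcc-3.4 (invocation assumed):
cd /usr/src/linux-2.6.12.6
MAKEFLAGS="CC=gcc-3.4" make-kpkg --initrd --rootcmd fakeroot kernel_image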
Hi, Brent

Which gcc version did you use? I suggest you use gcc-3.2 or 3.3 when you try FC5 + Lustre. You should try Lustre with kernel-2.6.15-1.2054_FC5; the FC5 patches in 1.5.91 were created based on this kernel.

thanks
wangdi

Brent A Nelson wrote:
> I tried hard to get the fc5 patches to work (not on FC5, though, although I
> did try with 2.6.16-rc6-git3+tux, which is basically what FC5 uses, minus
> dozens of miscellaneous patches); the patches apply well, but the result
> oopses frequently.
> [...]
I used gcc-3.4 (which worked fine with Lustre 1.5.90) and then gcc-3.3, which didn't seem to make a difference. I used kernel-2.6.15-1.2054_FC5's 2.6.15 plus 2.6.16-rc6 plus 2.6.16-rc6-git3 plus the tux patch (but not the rest of the patches or the Xen patches). The Lustre patches applied cleanly after that, except that the TCP-zero-copy patch seemed to have an unnecessary duplication, and I seemed to need fsprivate-2.6.patch.

I had also tried with a vanilla 2.6.16.5+tux and 2.6.16+tux, with extremely similar results.

Perhaps this just wasn't close enough to FC5, or ???

Thanks,

Brent

On Thu, 6 Jul 2006, wangdi wrote:
> Which gcc version did you use? I suggest you use gcc-3.2 or 3.3 when you
> try FC5 + Lustre. You should try Lustre with kernel-2.6.15-1.2054_FC5; the
> FC5 patches in 1.5.91 were created based on this kernel.
> [...]
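For anyone trying to reconstruct that tree, the patch stack Brent describes would be applied roughly as follows (a sketch; the patch file names are illustrative, not the exact ones used):

cd linux-2.6.15                      # base tree, as in kernel-2.6.15-1.2054_FC5
patch -p1 < patch-2.6.16-rc6         # mainline rc patch on the 2.6.15 release
patch -p1 < patch-2.6.16-rc6-git3    # git snapshot on top of rc6
patch -p1 < tux.patch                # the tux patch (file name assumed)
# ...then the Lustre kernel patches from the 1.5.91 series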
Hi, Brent

I just tried lustre-1.5.91 + kernel 2.6.15-1.2054_FC5 + gcc-3.3. It works fine here.

What kind of oops did you find? How do you reproduce it? Could you please post it here?

Btw: fsprivate-2.6.patch is not needed for the FC5 kernel, but you should apply the attached patch.

thanks
wangdi

Brent A Nelson wrote:
> I used gcc-3.4 (which worked fine with Lustre 1.5.90) and then gcc-3.3,
> which didn't seem to make a difference.
> [...]

-------------- next part --------------
--- lustre/llite/llite_internal.h.bak	2006-07-06 16:04:13.000000000 +0800
+++ lustre/llite/llite_internal.h	2006-07-06 18:56:13.000000000 +0800
@@ -34,7 +34,8 @@ struct lustre_intent_data {
 #endif
 
 #define LL_IT2STR(it) ((it) ? ldlm_it2str((it)->it_op) : "0")
-#if !defined(LUSTRE_KERNEL_VERSION) || (LUSTRE_KERNEL_VERSION < 46)
+#if !defined(LUSTRE_KERNEL_VERSION) || (LUSTRE_KERNEL_VERSION < 46 \
+     || LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,15))
 #define LUSTRE_FPRIVATE(file) ((file)->private_data)
 #else
 #if (LUSTRE_KERNEL_VERSION < 46)
I just built a kernel with all the FC5 non-xen patches applied and compiled it with gcc-3.3. I applied your patch and compiled 1.5.91 with gcc-3.3 (and I didn't need the fsprivate-2.6.patch).

It still has the recovery problems I saw previously. It seems that if only one client connects while Lustre is in its recovery window, the recovery never truly completes (i.e., /proc/fs/lustre/obdfilter/lustre-OST*/recovery_status still says recovering even after Lustre logs a recovery abort from timeout). I umounted my client and remounted it, which worked for a moment and the OSTs claimed recovery was complete, but then the MDT node claimed that it hadn't heard from the client in 231s and evicted it! Unmounting and remounting the client again, the MDT oopsed (see attached)!

Thanks,

Brent

On Thu, 6 Jul 2006, wangdi wrote:
> I just tried lustre-1.5.91 + kernel 2.6.15-1.2054_FC5 + gcc-3.3. It works
> fine here.
> [...]
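As a side note, the recovery state Brent refers to can be watched directly from /proc on the servers; the obdfilter path is taken verbatim from his message, while the MDS path is assumed by analogy:

# On the OSS nodes:
cat /proc/fs/lustre/obdfilter/lustre-OST*/recovery_status

# On the MDS node (path assumed by analogy):
cat /proc/fs/lustre/mds/lustre-MDT*/recovery_status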
-------------- next part --------------
ll_mdt_06 S F8C63580 5864 21001 1 21002 21000 (L-TLB)
Using defaults from ksymoops -t elf32-i386 -a i386
ea8e98f8 e7f161a2 e7f278e0 f8c63580 00000286 ea8e9900 c02ff769 c18ca520
00000530 0fe72612 000004f1 c0376ba0 eaf6ec30 eaf6ed58 000000fa 00139c80
00139c80 ffffffff c02fed1d c18cb378 c18cb378 00139c80 c012c87f eaf6ec30
Call Trace:
[<c02ff769>] _spin_lock_irqsave+0x9/0xd
[<c02fed1d>] schedule_timeout+0xad/0xc9
[<c012c87f>] process_timeout+0x0/0x5
[<fb095470>] ptlrpc_set_wait+0x3cf/0x607 [ptlrpc]
[<c011f245>] default_wake_function+0x0/0xc
[<fb094b55>] ptlrpc_expired_set+0x0/0x1b5 [ptlrpc]
[<fb094d27>] ptlrpc_interrupted_set+0x0/0x1da [ptlrpc]
[<fb094b55>] ptlrpc_expired_set+0x0/0x1b5 [ptlrpc]
[<fb094d27>] ptlrpc_interrupted_set+0x0/0x1da [ptlrpc]
[<faa0cd5a>] lov_create+0xb4a/0x1573 [lov]
[<f8c452d6>] cfs_alloc+0x3e/0x67 [libcfs]
[<faa269a6>] lov_alloc_memmd+0x151/0x944 [lov]
[<faa2854e>] lov_setstripe+0x783/0x8e5 [lov]
[<faa20a29>] lov_iocontrol+0xc51/0x18b1 [lov]
[<fa82088e>] mds_create_objects+0x3b9c/0x7684 [mds]
[<fa8256a0>] mds_finish_open+0x3c0/0xa96 [mds]
[<fa82b90a>] mds_open+0x4d59/0x5567 [mds]
[<f8c6d111>] entry_set_group_info+0x248/0x590 [lvfs]
[<f8c6d7f7>] upcall_cache_get_entry+0x39e/0xc07 [lvfs]
[<fa80e8dd>] mds_reint_rec+0x1c3/0x27d [mds]
[<fb0abccc>] lustre_msg_string+0x7c/0x784 [ptlrpc]
[<fa8317fa>] mds_open_unpack+0x3b3/0x44f [mds]
[<fa7dcf30>] mds_reint+0x700/0x7e1 [mds]
[<fa7f0f15>] mds_intent_policy+0xb50/0x13d5 [mds]
[<fb07267f>] ldlm_handle_enqueue+0x211d/0x4b9e [ptlrpc]
[<fa7f03c5>] mds_intent_policy+0x0/0x13d5 [mds]
[<fb036c7b>] ldlm_lock_enqueue+0x10a/0x6c8 [ptlrpc]
[<fb073713>] ldlm_handle_enqueue+0x31b1/0x4b9e [ptlrpc]
[<fb06dd33>] ldlm_server_blocking_ast+0x0/0x1055 [ptlrpc]
[<fa7e5e9a>] mds_handle+0x605b/0x89ed [mds]
[<fb0b6730>] ptlrpc_server_handle_request+0x1398/0x1b99 [ptlrpc]
[<fb0b6741>] ptlrpc_server_handle_request+0x13a9/0x1b99 [ptlrpc]
[<fb0b81f0>] ptlrpc_main+0xa10/0xb4c [ptlrpc]
[<c011f245>] default_wake_function+0x0/0xc
[<fb0b77d3>] ptlrpc_retry_rqbds+0x0/0xd [ptlrpc]
[<c0103c56>] ret_from_fork+0x6/0x14
[<fb0b77d3>] ptlrpc_retry_rqbds+0x0/0xd [ptlrpc]
[<fb0b77e0>] ptlrpc_main+0x0/0xb4c [ptlrpc]
[<c01023bd>] kernel_thread_helper+0x5/0xb
<1>LustreError: dumping log to /tmp/lustre-log.1152224767.21001
Warning (Oops_read): Code line not seen, dumping what data is available
Proc; ll_mdt_06
>>EIP; f8c63580 <END_OF_CODE+38758580/????>   <=====
Trace; c02ff769 <_spin_lock_irqsave+9/d>
Trace; c012c87f <process_timeout+0/5>
Trace; c011f245 <default_wake_function+0/c>
Trace; fb094d27 <END_OF_CODE+3ab89d27/????>
Trace; fb094d27 <END_OF_CODE+3ab89d27/????>
Trace; f8c452d6 <END_OF_CODE+3873a2d6/????>
Trace; faa2854e <END_OF_CODE+3a51d54e/????>
Trace; fa82088e <END_OF_CODE+3a31588e/????>
Trace; fa82b90a <END_OF_CODE+3a32090a/????>
Trace; f8c6d7f7 <END_OF_CODE+387627f7/????>
Trace; fb0abccc <END_OF_CODE+3aba0ccc/????>
Trace; fa7dcf30 <END_OF_CODE+3a2d1f30/????>
Trace; fb07267f <END_OF_CODE+3ab6767f/????>
Trace; fb036c7b <END_OF_CODE+3ab2bc7b/????>
Trace; fb06dd33 <END_OF_CODE+3ab62d33/????>
Trace; fb0b6730 <END_OF_CODE+3abab730/????>
Trace; fb0b81f0 <END_OF_CODE+3abad1f0/????>
Trace; fb0b77d3 <END_OF_CODE+3abac7d3/????>
Trace; fb0b77d3 <END_OF_CODE+3abac7d3/????>
Trace; c01023bd <kernel_thread_helper+5/b>
Hi, Brent

I just tried a recovery test (FC5 + Lustre) here. It works fine. But I did it in vmware, not on a real node. Could you please tell me in detail how to reproduce it?

Btw: can this be reproduced with the 2.6-rhel4 kernel, or only the FC5 kernel?

thanks
wangdi

Brent A Nelson wrote:
> It still has the recovery problems I saw previously. It seems that if only
> one client connects while Lustre is in its recovery window, the recovery
> never truly completes (i.e.,
> /proc/fs/lustre/obdfilter/lustre-OST*/recovery_status still says recovering
> even after Lustre logs a recovery abort from timeout). [...] Unmounting and
> remounting the client again, the MDT oopsed (see attached)!
> [...]
wangdi wrote:
> ------------------------------------------------------------------------
>
> --- lustre/llite/llite_internal.h.bak	2006-07-06 16:04:13.000000000 +0800
> +++ lustre/llite/llite_internal.h	2006-07-06 18:56:13.000000000 +0800
> @@ -34,7 +34,8 @@ struct lustre_intent_data {
>  #endif
>
>  #define LL_IT2STR(it) ((it) ? ldlm_it2str((it)->it_op) : "0")
> -#if !defined(LUSTRE_KERNEL_VERSION) || (LUSTRE_KERNEL_VERSION < 46)
> +#if !defined(LUSTRE_KERNEL_VERSION) || (LUSTRE_KERNEL_VERSION < 46 \
> +     || LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,15))
>  #define LUSTRE_FPRIVATE(file) ((file)->private_data)
>  #else
>  #if (LUSTRE_KERNEL_VERSION < 46)

36 #define LL_IT2STR(it) ((it) ? ldlm_it2str((it)->it_op) : "0")
37 #if !defined(LUSTRE_KERNEL_VERSION) || (LUSTRE_KERNEL_VERSION < 46)
38 #define LUSTRE_FPRIVATE(file) ((file)->private_data)
39 #else
40 #if (LUSTRE_KERNEL_VERSION < 46)
41 #define LUSTRE_FPRIVATE(file) ((file)->private_data)
42 #else
43 #define LUSTRE_FPRIVATE(file) ((file)->fs_private)
44 #endif
45 #endif

Look at lines 37-38 and lines 40-41; shouldn't these be cleaned up?

-- 
Qi Yong
System Software Engineer
Cluster File Systems, Inc.
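For reference: once the outer #else on line 39 is reached, LUSTRE_KERNEL_VERSION is defined and >= 46, so the inner test on line 40 can never be true and line 41 is dead code. One possible cleanup of lines 37-45, a sketch only and not an actual commit (wangdi's LINUX_VERSION_CODE condition could be folded into the first test in the same way), would be:

/* Unpatched kernels and pre-46 Lustre kernels keep Lustre state in
 * file->private_data; later Lustre-patched kernels use file->fs_private. */
#if !defined(LUSTRE_KERNEL_VERSION) || (LUSTRE_KERNEL_VERSION < 46)
#define LUSTRE_FPRIVATE(file) ((file)->private_data)
#else
#define LUSTRE_FPRIVATE(file) ((file)->fs_private)
#endif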
On Fri, 7 Jul 2006, wangdi wrote:
> I just tried a recovery test (FC5 + Lustre) here. It works fine. But I did
> it in vmware, not on a real node. Could you please tell me in detail how to
> reproduce it?

Try the following:

server1:
mkfs.lustre --mdt --mgs --failnode=server2 --reformat /dev/drbd4
mkfs.lustre --ost --failnode=server2 --mgsnode=server1 --mgsnode=server2 --reformat /dev/drbd0
mkfs.lustre --ost --failnode=server2 --mgsnode=server1 --mgsnode=server2 --reformat /dev/drbd1
mount -t lustre /dev/drbd4 /mnt/mdt
mount -t lustre /dev/drbd0 /mnt/ost0
mount -t lustre /dev/drbd1 /mnt/ost1

server2:
mkfs.lustre --ost --failnode=server1 --mgsnode=server1 --mgsnode=server2 --reformat /dev/drbd2
mkfs.lustre --ost --failnode=server1 --mgsnode=server1 --mgsnode=server2 --reformat /dev/drbd3
mount -t lustre /dev/drbd2 /mnt/ost2
mount -t lustre /dev/drbd3 /mnt/ost3

client1:
mount -t lustre server1:/lustre /lustre1

Write some stuff, do some du's, etc., then try unmounting everything and remounting everything. If that works, try some more reads, writes, and du's. If it still works, then I don't know what's going on, and I'll need to get you more details.

> Btw: can this be reproduced with the 2.6-rhel4 kernel, or only the FC5
> kernel?

I haven't tried a 2.6-rhel4 kernel. I did try my trusty 2.6.12.6 with vanilla 2.6.12 patches. It works solidly with 1.5.90, but I believe the results with 1.5.91 were quite similar to the FC5-like kernel. Use 1.5.91, and it fails quickly; use 1.5.90, and it happily recovers after a few moments, as it should, and I've never hit an oops.

Hmm, I wonder if I could use alien to convert your 2.6-rhel4 rpm to a deb and, presumably, manually create an initramfs (or initrd if the kernel is too old to support initramfs)...

Thanks,

Brent

PS I don't suppose ClusterFS is considering supporting Ubuntu at some point? Are there any, ahem, PAYING customers out there interested in Ubuntu support?
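The "write some stuff, do some du's" step above could be as simple as the following (an illustrative sketch; file names and sizes are arbitrary):

# Exercise the freshly mounted filesystem from client1:
dd if=/dev/zero of=/lustre1/testfile bs=1M count=100
du -sh /lustre1

# Then unmount everything on each node:
umount /lustre1                        # on client1
umount /mnt/ost2 /mnt/ost3             # on server2
umount /mnt/ost0 /mnt/ost1 /mnt/mdt    # on server1

# ...remount as above, and repeat the reads, writes, and du's.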
Hi, Brent

We ran the test as you described, and we did indeed get the error message when accessing the fs after remounting Lustre:

"MGS said: Lustre: MGS: haven't heard from 192.168.1.165@tcp in 232 seconds. Last request was at 1152512686. I think it's dead, and I am evicting it."

But although the message is there, the client can still work, and recovery_status also indicates that the recovery is complete. This error message is indeed a bug; we will fix it.

Thanks
wangdi

Brent A Nelson wrote:
> Try the following:
> [...]
> Write some stuff, do some du's, etc., then try unmounting everything and
> remounting everything. If that works, try some more reads, writes, and
> du's. If it still works, then I don't know what's going on, and I'll need
> to get you more details.
> [...]
Actually, that message just means that an old MGC client was lost (probably due to a node shutdown). When Lustre clients or servers are restarted, they create a new MGC uuid, so the MGS may believe there are two live MGCs on the same node (the new one and the old lost one). The eviction message is a healthy cleaning up of the old MGC.

wangdi wrote:
> We ran the test as you described, and we did indeed get the error message
> when accessing the fs after remounting Lustre:
>
> "MGS said: Lustre: MGS: haven't heard from 192.168.1.165@tcp in 232
> seconds. Last request was at 1152512686. I think it's dead, and I am
> evicting it."
>
> But although the message is there, the client can still work, and
> recovery_status also indicates that the recovery is complete. This error
> message is indeed a bug; we will fix it.
> [...]