I tried hard to get the fc5 patches to work (not on FC5, though, although I did try with 2.6.16-rc6-git3+tux, which is basically what FC5 uses, minus dozens of miscellaneous patches); the patches apply well, but the result oopses frequently.

Using 1.5.91 with the same 2.6.12.6 kernel that was rock solid with Lustre 1.5.90 also seems to be rather unstable. It doesn't oops as much, but it seems to fail to recover properly from server shutdowns. Switching everything back to 1.5.90 works like a charm.

Is anyone having much luck with Lustre 1.5.91?

Thanks,

Brent

PS I'm using Ubuntu Dapper. To get that to work, I used the vanilla 2.6.12 kernel patches. Dapper hates 2.6.12 (it breaks udev, which now requires features found only in 2.6.15+), but you can get it to work by using the yaird initramfs tool (add "ramdisk = /usr/sbin/mkinitrd.yaird" to /etc/kernel-img.conf so one is generated automatically if you use make-kpkg) and putting critical modules (such as e1000, in my case) in /etc/modules. "apt-get install gcc-3.4" to compile your kernel and Lustre. 1.5.91 seems unstable, but 1.5.90 seems completely solid (so far). The TCP zero copy patch in Lustre 1.5.91 should be fine to use with 1.5.90. Let me know if you need any other tips for running on Dapper. You might be able to use the Lustre 1.5.91 patchless client on machines that don't need to be servers; I haven't tried that yet.
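In outline, the Dapper workaround above amounts to something like the following. This is a sketch only: the yaird package name and the exact make-kpkg invocation are assumptions, not commands taken from Brent's message.

# Install yaird and the older compiler (package names assumed):
apt-get install yaird gcc-3.4

# Have make-kpkg generate the initramfs with yaird, per the note above:
echo 'ramdisk = /usr/sbin/mkinitrd.yaird' >> /etc/kernel-img.conf

# Make sure critical drivers (e.g. e1000) load at boot:
echo 'e1000' >> /etc/modules

# Build the patched 2.6.12 kernel with gcc-3.4 (invocation assumed):
cd /usr/src/linux-2.6.12.6
MAKEFLAGS="CC=gcc-3.4" make-kpkg --initrd --rootcmd fakeroot kernel_image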
Hi, Brent

Which gcc version did you use? I suggest you use gcc-3.2 or 3.3 when you try FC5 + Lustre. You should try Lustre with kernel-2.6.15-1.2054_FC5; the FC5 patches in 1.5.91 were created based on this kernel.

thanks
wangdi

Brent A Nelson wrote:
> I tried hard to get the fc5 patches to work (not on FC5, though, although I
> did try with 2.6.16-rc6-git3+tux, which is basically what FC5 uses, minus
> dozens of miscellaneous patches); the patches apply well, but the result
> oopses frequently.
> [...]
I used gcc-3.4 (which worked fine with Lustre 1.5.90) and then gcc-3.3, which didn't seem to make a difference. I used kernel-2.6.15-1.2054_FC5's 2.6.15 plus 2.6.16-rc6 plus 2.6.16-rc6-git3 plus the tux patch (but not the rest of the patches or the Xen patches). The Lustre patches applied cleanly after that, except that the TCP-zero-copy patch seemed to have an unnecessary duplication, and I seemed to need fsprivate-2.6.patch.

I had also tried with a vanilla 2.6.16.5+tux and 2.6.16+tux, with extremely similar results.

Perhaps this just wasn't close enough to FC5, or ???

Thanks,

Brent

On Thu, 6 Jul 2006, wangdi wrote:
> Which gcc version did you use? I suggest you use gcc-3.2 or 3.3 when you
> try FC5 + Lustre. You should try Lustre with kernel-2.6.15-1.2054_FC5; the
> FC5 patches in 1.5.91 were created based on this kernel.
> [...]
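For anyone trying to reconstruct that tree, the patch stack Brent describes would be applied roughly as follows (a sketch; the patch file names are illustrative, not the exact ones used):

cd linux-2.6.15                      # base tree, as in kernel-2.6.15-1.2054_FC5
patch -p1 < patch-2.6.16-rc6         # mainline rc patch on the 2.6.15 release
patch -p1 < patch-2.6.16-rc6-git3    # git snapshot on top of rc6
patch -p1 < tux.patch                # the tux patch (file name assumed)
# ...then the Lustre kernel patches from the 1.5.91 series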
Hi, Brent

I just tried lustre-1.5.91 + kernel 2.6.15-1.2054_FC5 + gcc-3.3. It works fine here.

What kind of oops did you find? How do you reproduce it? Could you please post it here?

Btw: fsprivate-2.6.patch is not needed for the FC5 kernel, but you should apply the attached patch.

thanks
wangdi

Brent A Nelson wrote:
> I used gcc-3.4 (which worked fine with Lustre 1.5.90) and then gcc-3.3,
> which didn't seem to make a difference.
> [...]

-------------- next part --------------
--- lustre/llite/llite_internal.h.bak	2006-07-06 16:04:13.000000000 +0800
+++ lustre/llite/llite_internal.h	2006-07-06 18:56:13.000000000 +0800
@@ -34,7 +34,8 @@ struct lustre_intent_data {
 #endif
 
 #define LL_IT2STR(it) ((it) ? ldlm_it2str((it)->it_op) : "0")
-#if !defined(LUSTRE_KERNEL_VERSION) || (LUSTRE_KERNEL_VERSION < 46)
+#if !defined(LUSTRE_KERNEL_VERSION) || (LUSTRE_KERNEL_VERSION < 46 \
+     || LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,15))
 #define LUSTRE_FPRIVATE(file) ((file)->private_data)
 #else
 #if (LUSTRE_KERNEL_VERSION < 46)
I just built a kernel with all the FC5 non-xen patches applied and compiled it with gcc-3.3. I applied your patch and compiled 1.5.91 with gcc-3.3 (and I didn't need the fsprivate-2.6.patch).

It still has the recovery problems I saw previously. It seems that if only one client connects while Lustre is in its recovery window, the recovery never truly completes (i.e., /proc/fs/lustre/obdfilter/lustre-OST*/recovery_status still says recovering even after Lustre logs a recovery abort from timeout). I umounted my client and remounted it, which worked for a moment and the OSTs claimed recovery was complete, but then the MDT node claimed that it hadn't heard from the client in 231s and evicted it! Unmounting and remounting the client again, the MDT oopsed (see attached)!

Thanks,

Brent

On Thu, 6 Jul 2006, wangdi wrote:
> I just tried lustre-1.5.91 + kernel 2.6.15-1.2054_FC5 + gcc-3.3. It works
> fine here.
> [...]
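As a side note, the recovery state Brent refers to can be watched directly from /proc on the servers; the obdfilter path is taken verbatim from his message, while the MDS path is assumed by analogy:

# On the OSS nodes:
cat /proc/fs/lustre/obdfilter/lustre-OST*/recovery_status

# On the MDS node (path assumed by analogy):
cat /proc/fs/lustre/mds/lustre-MDT*/recovery_status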
-------------- next part --------------
ll_mdt_06 S F8C63580 5864 21001 1 21002 21000 (L-TLB)
Using defaults from ksymoops -t elf32-i386 -a i386
ea8e98f8 e7f161a2 e7f278e0 f8c63580 00000286 ea8e9900 c02ff769 c18ca520
00000530 0fe72612 000004f1 c0376ba0 eaf6ec30 eaf6ed58 000000fa 00139c80
00139c80 ffffffff c02fed1d c18cb378 c18cb378 00139c80 c012c87f eaf6ec30
Call Trace:
[<c02ff769>] _spin_lock_irqsave+0x9/0xd
[<c02fed1d>] schedule_timeout+0xad/0xc9
[<c012c87f>] process_timeout+0x0/0x5
[<fb095470>] ptlrpc_set_wait+0x3cf/0x607 [ptlrpc]
[<c011f245>] default_wake_function+0x0/0xc
[<fb094b55>] ptlrpc_expired_set+0x0/0x1b5 [ptlrpc]
[<fb094d27>] ptlrpc_interrupted_set+0x0/0x1da [ptlrpc]
[<fb094b55>] ptlrpc_expired_set+0x0/0x1b5 [ptlrpc]
[<fb094d27>] ptlrpc_interrupted_set+0x0/0x1da [ptlrpc]
[<faa0cd5a>] lov_create+0xb4a/0x1573 [lov]
[<f8c452d6>] cfs_alloc+0x3e/0x67 [libcfs]
[<faa269a6>] lov_alloc_memmd+0x151/0x944 [lov]
[<faa2854e>] lov_setstripe+0x783/0x8e5 [lov]
[<faa20a29>] lov_iocontrol+0xc51/0x18b1 [lov]
[<fa82088e>] mds_create_objects+0x3b9c/0x7684 [mds]
[<fa8256a0>] mds_finish_open+0x3c0/0xa96 [mds]
[<fa82b90a>] mds_open+0x4d59/0x5567 [mds]
[<f8c6d111>] entry_set_group_info+0x248/0x590 [lvfs]
[<f8c6d7f7>] upcall_cache_get_entry+0x39e/0xc07 [lvfs]
[<fa80e8dd>] mds_reint_rec+0x1c3/0x27d [mds]
[<fb0abccc>] lustre_msg_string+0x7c/0x784 [ptlrpc]
[<fa8317fa>] mds_open_unpack+0x3b3/0x44f [mds]
[<fa7dcf30>] mds_reint+0x700/0x7e1 [mds]
[<fa7f0f15>] mds_intent_policy+0xb50/0x13d5 [mds]
[<fb07267f>] ldlm_handle_enqueue+0x211d/0x4b9e [ptlrpc]
[<fa7f03c5>] mds_intent_policy+0x0/0x13d5 [mds]
[<fb036c7b>] ldlm_lock_enqueue+0x10a/0x6c8 [ptlrpc]
[<fb073713>] ldlm_handle_enqueue+0x31b1/0x4b9e [ptlrpc]
[<fb06dd33>] ldlm_server_blocking_ast+0x0/0x1055 [ptlrpc]
[<fa7e5e9a>] mds_handle+0x605b/0x89ed [mds]
[<fb0b6730>] ptlrpc_server_handle_request+0x1398/0x1b99 [ptlrpc]
[<fb0b6741>] ptlrpc_server_handle_request+0x13a9/0x1b99 [ptlrpc]
[<fb0b81f0>] ptlrpc_main+0xa10/0xb4c [ptlrpc]
[<c011f245>] default_wake_function+0x0/0xc
[<fb0b77d3>] ptlrpc_retry_rqbds+0x0/0xd [ptlrpc]
[<c0103c56>] ret_from_fork+0x6/0x14
[<fb0b77d3>] ptlrpc_retry_rqbds+0x0/0xd [ptlrpc]
[<fb0b77e0>] ptlrpc_main+0x0/0xb4c [ptlrpc]
[<c01023bd>] kernel_thread_helper+0x5/0xb
<1>LustreError: dumping log to /tmp/lustre-log.1152224767.21001
Warning (Oops_read): Code line not seen, dumping what data is available
Proc; ll_mdt_06
>>EIP; f8c63580 <END_OF_CODE+38758580/????>   <=====
Trace; c02ff769 <_spin_lock_irqsave+9/d>
Trace; c012c87f <process_timeout+0/5>
Trace; c011f245 <default_wake_function+0/c>
Trace; fb094d27 <END_OF_CODE+3ab89d27/????>
Trace; fb094d27 <END_OF_CODE+3ab89d27/????>
Trace; f8c452d6 <END_OF_CODE+3873a2d6/????>
Trace; faa2854e <END_OF_CODE+3a51d54e/????>
Trace; fa82088e <END_OF_CODE+3a31588e/????>
Trace; fa82b90a <END_OF_CODE+3a32090a/????>
Trace; f8c6d7f7 <END_OF_CODE+387627f7/????>
Trace; fb0abccc <END_OF_CODE+3aba0ccc/????>
Trace; fa7dcf30 <END_OF_CODE+3a2d1f30/????>
Trace; fb07267f <END_OF_CODE+3ab6767f/????>
Trace; fb036c7b <END_OF_CODE+3ab2bc7b/????>
Trace; fb06dd33 <END_OF_CODE+3ab62d33/????>
Trace; fb0b6730 <END_OF_CODE+3abab730/????>
Trace; fb0b81f0 <END_OF_CODE+3abad1f0/????>
Trace; fb0b77d3 <END_OF_CODE+3abac7d3/????>
Trace; fb0b77d3 <END_OF_CODE+3abac7d3/????>
Trace; c01023bd <kernel_thread_helper+5/b>
Hi, Brent

I just tried a recovery test (FC5 + Lustre) here. It works fine. But I did it in vmware, not on a real node. Could you please tell me in detail how to reproduce it?

Btw: can this be reproduced with the 2.6-rhel4 kernel, or only the FC5 kernel?

thanks
wangdi

Brent A Nelson wrote:
> It still has the recovery problems I saw previously. It seems that if only
> one client connects while Lustre is in its recovery window, the recovery
> never truly completes (i.e.,
> /proc/fs/lustre/obdfilter/lustre-OST*/recovery_status still says recovering
> even after Lustre logs a recovery abort from timeout). [...] Unmounting and
> remounting the client again, the MDT oopsed (see attached)!
> [...]
wangdi wrote:
> ------------------------------------------------------------------------
>
> --- lustre/llite/llite_internal.h.bak	2006-07-06 16:04:13.000000000 +0800
> +++ lustre/llite/llite_internal.h	2006-07-06 18:56:13.000000000 +0800
> @@ -34,7 +34,8 @@ struct lustre_intent_data {
>  #endif
>
>  #define LL_IT2STR(it) ((it) ? ldlm_it2str((it)->it_op) : "0")
> -#if !defined(LUSTRE_KERNEL_VERSION) || (LUSTRE_KERNEL_VERSION < 46)
> +#if !defined(LUSTRE_KERNEL_VERSION) || (LUSTRE_KERNEL_VERSION < 46 \
> +     || LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,15))
>  #define LUSTRE_FPRIVATE(file) ((file)->private_data)
>  #else
>  #if (LUSTRE_KERNEL_VERSION < 46)

36 #define LL_IT2STR(it) ((it) ? ldlm_it2str((it)->it_op) : "0")
37 #if !defined(LUSTRE_KERNEL_VERSION) || (LUSTRE_KERNEL_VERSION < 46)
38 #define LUSTRE_FPRIVATE(file) ((file)->private_data)
39 #else
40 #if (LUSTRE_KERNEL_VERSION < 46)
41 #define LUSTRE_FPRIVATE(file) ((file)->private_data)
42 #else
43 #define LUSTRE_FPRIVATE(file) ((file)->fs_private)
44 #endif
45 #endif

Look at lines 37-38 and lines 40-41; shouldn't these be cleaned up?

-- 
Qi Yong
System Software Engineer
Cluster File Systems, Inc.
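For reference: once the outer #else on line 39 is reached, LUSTRE_KERNEL_VERSION is defined and >= 46, so the inner test on line 40 can never be true and line 41 is dead code. One possible cleanup of lines 37-45, a sketch only and not an actual commit (wangdi's LINUX_VERSION_CODE condition could be folded into the first test in the same way), would be:

/* Unpatched kernels and pre-46 Lustre kernels keep Lustre state in
 * file->private_data; later Lustre-patched kernels use file->fs_private. */
#if !defined(LUSTRE_KERNEL_VERSION) || (LUSTRE_KERNEL_VERSION < 46)
#define LUSTRE_FPRIVATE(file) ((file)->private_data)
#else
#define LUSTRE_FPRIVATE(file) ((file)->fs_private)
#endif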
On Fri, 7 Jul 2006, wangdi wrote:
> I just tried a recovery test (FC5 + Lustre) here. It works fine. But I did
> it in vmware, not on a real node. Could you please tell me in detail how to
> reproduce it?

Try the following:

server1:
mkfs.lustre --mdt --mgs --failnode=server2 --reformat /dev/drbd4
mkfs.lustre --ost --failnode=server2 --mgsnode=server1 --mgsnode=server2 --reformat /dev/drbd0
mkfs.lustre --ost --failnode=server2 --mgsnode=server1 --mgsnode=server2 --reformat /dev/drbd1
mount -t lustre /dev/drbd4 /mnt/mdt
mount -t lustre /dev/drbd0 /mnt/ost0
mount -t lustre /dev/drbd1 /mnt/ost1

server2:
mkfs.lustre --ost --failnode=server1 --mgsnode=server1 --mgsnode=server2 --reformat /dev/drbd2
mkfs.lustre --ost --failnode=server1 --mgsnode=server1 --mgsnode=server2 --reformat /dev/drbd3
mount -t lustre /dev/drbd2 /mnt/ost2
mount -t lustre /dev/drbd3 /mnt/ost3

client1:
mount -t lustre server1:/lustre /lustre1

Write some stuff, do some du's, etc., then try unmounting everything and remounting everything. If that works, try some more reads, writes, and du's. If it still works, then I don't know what's going on, and I'll need to get you more details.

> Btw: can this be reproduced with the 2.6-rhel4 kernel, or only the FC5
> kernel?

I haven't tried a 2.6-rhel4 kernel. I did try my trusty 2.6.12.6 with vanilla 2.6.12 patches. It works solidly with 1.5.90, but I believe the results with 1.5.91 were quite similar to the FC5-like kernel. Use 1.5.91, and it fails quickly; use 1.5.90, and it happily recovers after a few moments, as it should, and I've never hit an oops.

Hmm, I wonder if I could use alien to convert your 2.6-rhel4 rpm to a deb and, presumably, manually create an initramfs (or initrd if the kernel is too old to support initramfs)...

Thanks,

Brent

PS I don't suppose ClusterFS is considering supporting Ubuntu at some point? Are there any, ahem, PAYING customers out there interested in Ubuntu support?
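The "write some stuff, do some du's" step above could be as simple as the following (an illustrative sketch; file names and sizes are arbitrary):

# Exercise the freshly mounted filesystem from client1:
dd if=/dev/zero of=/lustre1/testfile bs=1M count=100
du -sh /lustre1

# Then unmount everything on each node:
umount /lustre1                        # on client1
umount /mnt/ost2 /mnt/ost3             # on server2
umount /mnt/ost0 /mnt/ost1 /mnt/mdt    # on server1

# ...remount as above, and repeat the reads, writes, and du's.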
Hi, Brent

We ran the test as you described, and we did indeed get the error message when accessing the fs after remounting Lustre:

"MGS said: Lustre: MGS: haven't heard from 192.168.1.165@tcp in 232 seconds. Last request was at 1152512686. I think it's dead, and I am evicting it."

But although the message is there, the client can still work, and recovery_status also indicates that the recovery is complete. This error message is indeed a bug; we will fix it.

Thanks
wangdi

Brent A Nelson wrote:
> Try the following:
> [...]
> Write some stuff, do some du's, etc., then try unmounting everything and
> remounting everything. If that works, try some more reads, writes, and
> du's. If it still works, then I don't know what's going on, and I'll need
> to get you more details.
> [...]
Actually, that message just means that an old MGC client was lost (probably due to a node shutdown). When Lustre clients or servers are restarted, they create a new MGC uuid, so the MGS may believe there are two live MGCs on the same node (the new one and the old lost one). The eviction message is a healthy cleaning up of the old MGC.

wangdi wrote:
> We ran the test as you described, and we did indeed get the error message
> when accessing the fs after remounting Lustre:
>
> "MGS said: Lustre: MGS: haven't heard from 192.168.1.165@tcp in 232
> seconds. Last request was at 1152512686. I think it's dead, and I am
> evicting it."
>
> But although the message is there, the client can still work, and
> recovery_status also indicates that the recovery is complete. This error
> message is indeed a bug; we will fix it.
> [...]