thr3ads.net - Lustre discuss - [Lustre-discuss] More newbie issues [Jul 2008]

Robert Healey

2008-Jul-25 21:22 UTC

[Lustre-discuss] More newbie issues

Greetings.  I managed to work around the problem I was having with
 >> Excerpt from dmesg:
 >> Jul 17 16:32:42 compute-4-10 kernel: LustreError:
 >> 3834:0:(ldlm_lock.c:430:__ldlm_handle2lock())
 >> ASSERTION(lock->l_resource != NULL) failed

 >This looks like bug 15269.

by moving from RHEL 5.2 on a v20z to RHEL 4.6 on Thumper, but ran into a
different problem which appears to be a deal breaker for me.  I forcibly
failed one of the drives in the raid-1 mirror for the MDT and the file
system promptly stopped responding to clients.  The rest of the machine
worked just fine.  A reboot of both the client + server cleared the
problem.  It's looking like Solaris/ZFS might be a better answer for me.

Thank you for the advice.

-- 
Bob Healey
Systems Administrator
Physics Department, RPI
healer at rpi.edu

Robert Healey

2008-Jul-28 12:28 UTC

[Lustre-discuss] More newbie issues

On further investigation, the Ethernet interface was accidentally 
removed while I was swapping drives.  Time for more testing, since the 
cluster is subject to frequent unexpected power cuts.  How does Lustre 
compare to ext3 in terms of having clients unexpectedly power cycled?

Bob Healey

Robert Healey wrote:
> Greetings.  I managed to work around the problem I was having with
>  >> Excerpt from dmesg:
>  >> Jul 17 16:32:42 compute-4-10 kernel: LustreError:
>  >> 3834:0:(ldlm_lock.c:430:__ldlm_handle2lock())
>  >> ASSERTION(lock->l_resource != NULL) failed
> 
>  >This looks like bug 15269.
> 
> by moving from RHEL 5.2 on a v20z to RHEL 4.6 on Thumper, but ran into a
> different problem which appears to be a deal breaker for me.  I forcibly
> failed one of the drives in the raid-1 mirror for the MDT and the file
> system promptly stopped responding to clients.  The rest of the machine
> worked just fine.  A reboot of both the client + server cleared the
> problem.  It's looking like Solaris/ZFS might be a better answer for me.
> 
> Thank you for the advice.
> 
-- 
Bob Healey
Systems Administrator
Physics Department, RPI
healer at rpi.edu

Robin Humble

2008-Jul-28 14:07 UTC

[Lustre-discuss] More newbie issues

On Fri, Jul 25, 2008 at 05:22:56PM -0400, Robert Healey wrote:
>Greetings.  I managed to work around the problem I was having with
> >> Excerpt from dmesg:
> >> Jul 17 16:32:42 compute-4-10 kernel: LustreError:
> >> 3834:0:(ldlm_lock.c:430:__ldlm_handle2lock())
> >> ASSERTION(lock->l_resource != NULL) failed
> >This looks like bug 15269.
>by moving from RHEL 5.2 on a v20z to RHEL 4.6 on Thumper, but ran into a
>different problem which appears to be a deal breaker for me.  I forcibly 
>failed one of the drives in the raid-1 mirror for the MDT and the file 
>system promptly stopped responding to clients.  The rest of the machine 
>worked just fine.  A reboot of both the client + server cleared the 
md doing failover shouldn't hang or stop anything. pulling disks on md
raid5 and raid1 arrays has worked fine for me in the past.

what does /proc/mdstat look like? dmesg?
how did you forcibly fail the disk?
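
For anyone checking the same thing: a degraded md array shows up in /proc/mdstat as an underscore in the member map (for example [_U] instead of [UU]). Below is a minimal Python sketch, not part of the original thread, that scans mdstat text for such arrays. The function name degraded_arrays and the SAMPLE text are illustrative assumptions (the sample is not output from the poster's machine), and the regex only handles plain mdN array names.

```python
import re

# Illustrative /proc/mdstat content for a two-disk raid1 with one failed member.
# This is a made-up sample, not output from the system discussed in the thread.
SAMPLE = """Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0](F)
      104320 blocks [2/1] [_U]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return {array_name: member_map} for arrays with a missing member."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            # Start of a new array stanza, e.g. "md0 : active raid1 ..."
            current = header.group(1)
            continue
        # The status line carries a member map such as [UU] or [_U];
        # an underscore marks a missing or failed member.
        member_map = re.search(r"\[([U_]+)\]", line)
        if current and member_map:
            if "_" in member_map.group(1):
                degraded[current] = member_map.group(1)
            current = None
    return degraded
```

On a live server the same function could be fed the real file, for example degraded_arrays(open("/proc/mdstat").read()); an empty result would suggest every array still has all its members.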
>problem.  It's looking like Solaris/ZFS might be a better answer for me.
they're not really comparable filesystems.
or do you mean you'll use Lustre's ZFS on OSS's? I didn't think that
was available yet...

cheers,
robin

Johann Lombardi

2008-Jul-28 16:08 UTC

[Lustre-discuss] More newbie issues

On Fri, Jul 25, 2008 at 05:22:56PM -0400, Robert Healey wrote:
> Greetings.  I managed to work around the problem I was having with
>  >> Excerpt from dmesg:
>  >> Jul 17 16:32:42 compute-4-10 kernel: LustreError:
>  >> 3834:0:(ldlm_lock.c:430:__ldlm_handle2lock())
>  >> ASSERTION(lock->l_resource != NULL) failed
> 
>  >This looks like bug 15269.
> 
> by moving from RHEL 5.2 on a v20z to RHEL 4.6 on Thumper, but ran into a
FYI, this problem is being worked on under bugzilla ticket 16496 (same as 15269)
and is related neither to the kernel version nor to some specific hardware.

HTH

Johann

Andreas Dilger

2008-Jul-31 06:50 UTC

[Lustre-discuss] More newbie issues

On Jul 28, 2008  08:28 -0400, Robert Healey wrote:
> On further investigation, the Ethernet interface was accidentally
> removed while I was swapping drives.  Time for more testing, since the 
> cluster is subject to frequent unexpected power cuts.  How does Lustre 
> compare to ext3 in terms of having clients unexpectedly power cycled?
Lustre uses ext3 back-end storage, so it behaves the same.  On the OSTs
the data is actually written synchronously so there is no real distinction
between the ext3 data={ordered,writeback} modes.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
