Greetings. I managed to work around the problem I was having with

>> Excerpt from dmesg:
>> Jul 17 16:32:42 compute-4-10 kernel: LustreError:
>> 3834:0:(ldlm_lock.c:430:__ldlm_handle2lock())
>> ASSERTION(lock->l_resource != NULL) failed
>
> This looks like bug 15269.

by moving from RHEL 5.2 on a v20z to RHEL 4.6 on Thumper, but I ran
into a different problem which appears to be a deal breaker for me. I
forcibly failed one of the drives in the RAID-1 mirror for the MDT, and
the file system promptly stopped responding to clients. The rest of the
machine worked just fine. A reboot of both the client and the server
cleared the problem. It's looking like Solaris/ZFS might be a better
answer for me.

Thank you for the advice.

-- 
Bob Healey
Systems Administrator
Physics Department, RPI
healer at rpi.edu
On further investigation, the Ethernet interface was accidentally
removed while I was swapping drives. Time for more testing, since the
cluster is subject to frequent unexpected power cuts. How does Lustre
compare to ext3 when clients are unexpectedly power cycled?

Bob Healey

Robert Healey wrote:
> Greetings. I managed to work around the problem I was having with
>
>>> Excerpt from dmesg:
>>> Jul 17 16:32:42 compute-4-10 kernel: LustreError:
>>> 3834:0:(ldlm_lock.c:430:__ldlm_handle2lock())
>>> ASSERTION(lock->l_resource != NULL) failed
>>
>> This looks like bug 15269.
>
> by moving from RHEL 5.2 on a v20z to RHEL 4.6 on Thumper, but I ran
> into a different problem which appears to be a deal breaker for me. I
> forcibly failed one of the drives in the RAID-1 mirror for the MDT,
> and the file system promptly stopped responding to clients. The rest
> of the machine worked just fine. A reboot of both the client and the
> server cleared the problem. It's looking like Solaris/ZFS might be a
> better answer for me.
>
> Thank you for the advice.

-- 
Bob Healey
Systems Administrator
Physics Department, RPI
healer at rpi.edu
On Fri, Jul 25, 2008 at 05:22:56PM -0400, Robert Healey wrote:
> Greetings. I managed to work around the problem I was having with
>
>>> Excerpt from dmesg:
>>> Jul 17 16:32:42 compute-4-10 kernel: LustreError:
>>> 3834:0:(ldlm_lock.c:430:__ldlm_handle2lock())
>>> ASSERTION(lock->l_resource != NULL) failed
>>
>> This looks like bug 15269.
>
> by moving from RHEL 5.2 on a v20z to RHEL 4.6 on Thumper, but I ran
> into a different problem which appears to be a deal breaker for me. I
> forcibly failed one of the drives in the RAID-1 mirror for the MDT,
> and the file system promptly stopped responding to clients. The rest
> of the machine worked just fine. A reboot of both the client and the
> server cleared the

md doing failover shouldn't hang or stop anything. pulling disks on md
raid5 and raid1's has worked fine for me in the past.

what does /proc/mdstat look like? dmesg?
how did you forcibly fail the disk?

> problem. It's looking like Solaris/ZFS might be a better answer for me.

they're not really comparable filesystems. or do you mean you'll use
Lustre's ZFS on OSS's? I didn't think that was available yet...

cheers,
robin
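
As a hedged aside on the /proc/mdstat question above: after a member is marked faulty, md normally flags it with "(F)" on the array line and shows a reduced device count such as "[2/1] [_U]" on the status line. The Python sketch below checks for both. The sample text, the name parse_mdstat, and the regular expressions are assumptions made for illustration; none of it is output from the machine discussed in this thread, and it only handles the common layout of the file.

```python
import re

# Illustrative sample only; NOT taken from the machine in this thread.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0](F)
      104320 blocks [2/1] [_U]

unused devices: <none>
"""

# "md0 : active raid1 sdb1[1] sda1[0](F)"
ARRAY_RE = re.compile(r"^(md\w+) : (\w+)\s*(.*)$")
# "[2/1] [_U]" means 2 devices expected, 1 currently active
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\] \[([U_]+)\]")
# "sda1[0](F)" is a member device with optional flags such as (F)
MEMBER_RE = re.compile(r"(\w+)\[\d+\]((?:\([A-Z]\))*)")


def parse_mdstat(text):
    """Return {array: {"state", "failed", "degraded"}} for each md array.

    Hypothetical helper for this example; covers only the common layout.
    """
    arrays = {}
    current = None
    for line in text.splitlines():
        m = ARRAY_RE.match(line)
        if m:
            name, state, rest = m.groups()
            failed = [dev for dev, flags in MEMBER_RE.findall(rest)
                      if "(F)" in flags]
            current = {"state": state, "failed": failed, "degraded": None}
            arrays[name] = current
            continue
        s = STATUS_RE.search(line)
        if s and current is not None and current["degraded"] is None:
            expected, active = int(s.group(1)), int(s.group(2))
            current["degraded"] = active < expected
    return arrays


if __name__ == "__main__":
    # On a real server, read open("/proc/mdstat").read() instead.
    for name, info in parse_mdstat(SAMPLE_MDSTAT).items():
        print(name, info)
```

A degraded array with the failed member listed is what one would expect to see if md handled the failure normally, in which case a client hang would point elsewhere, for example at the network.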
On Fri, Jul 25, 2008 at 05:22:56PM -0400, Robert Healey wrote:
> Greetings. I managed to work around the problem I was having with
>
>>> Excerpt from dmesg:
>>> Jul 17 16:32:42 compute-4-10 kernel: LustreError:
>>> 3834:0:(ldlm_lock.c:430:__ldlm_handle2lock())
>>> ASSERTION(lock->l_resource != NULL) failed
>>
>> This looks like bug 15269.
>
> by moving from RHEL 5.2 on a v20z to RHEL 4.6 on Thumper, ran into a

FYI, this problem is being worked on under bugzilla ticket 16496 (same
as 15269) and is related neither to the kernel version nor to any
specific hardware.

HTH

Johann
On Jul 28, 2008 08:28 -0400, Robert Healey wrote:
> On further investigation, the Ethernet interface was accidentally
> removed while I was swapping drives. Time for more testing, since the
> cluster is subject to frequent unexpected power cuts. How does Lustre
> compare to ext3 in terms of having clients unexpectedly power cycled?

Lustre uses ext3 back-end storage, so it behaves the same. On the OSTs
the data is actually written synchronously so there is no real
distinction between the ext3 data={ordered,writeback} modes.

Cheers, Andreas
-- 
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.