Hello,

after a crash (hardware failure) of an OSS node carrying two Lustre OST partitions, one partition (/dev/sdb) cannot be remounted after the restart. The second partition (/dev/sdc) mounts fine. What needs to be done in such a case? I tried moving the mountpoint because of the "file exists" message, but that does not help. Any pointers welcome.

Heiko

OST messages after the mount command "mount -t lustre /dev/sdb /mnt/data/ost3":

<snip>
Aug 13 11:18:53 sadosrd20 kjournald starting.  Commit interval 5 seconds
Aug 13 11:18:53 sadosrd20 LDISKFS FS on sdb, internal journal
Aug 13 11:18:53 sadosrd20 LDISKFS-fs: mounted filesystem with ordered data mode.
Aug 13 11:18:53 sadosrd20 LDISKFS-fs: file extents enabled
Aug 13 11:18:53 sadosrd20 LDISKFS-fs: mballoc enabled
Aug 13 11:18:54 sadosrd20 LustreError: 7247:0:(genops.c:246:class_newdev()) Device scia-OST0004 already exists, won't add
Aug 13 11:18:54 sadosrd20 LustreError: 7247:0:(obd_config.c:180:class_attach()) Cannot create device scia-OST0004 of type obdfilter : -17
Aug 13 11:18:54 sadosrd20 LustreError: 7247:0:(obd_config.c:1070:class_config_llog_handler()) Err -17 on cfg command:
Aug 13 11:18:54 sadosrd20 Lustre: cmd=cf001 0:scia-OST0004 1:obdfilter 2:scia-OST0004_UUID
Aug 13 11:18:54 sadosrd20 LustreError: 15c-8: MGC192.168.16.122@tcp: The configuration from log 'scia-OST0004' failed (-17). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
Aug 13 11:18:54 sadosrd20 LustreError: 7247:0:(obd_mount.c:1091:server_start_targets()) failed to start server scia-OST0004: -17
Aug 13 11:18:54 sadosrd20 LustreError: 7247:0:(obd_mount.c:1597:server_fill_super()) Unable to start targets: -17
Aug 13 11:18:54 sadosrd20 LustreError: 7247:0:(obd_mount.c:1382:server_put_super()) no obd scia-OST0004
Aug 13 11:18:55 sadosrd20 LDISKFS-fs: mballoc: 1 blocks 1 reqs (0 success)
Aug 13 11:18:55 sadosrd20 LDISKFS-fs: mballoc: 1 extents scanned, 0 goal hits, 1 2^N hits, 0 breaks, 0 lost
Aug 13 11:18:55 sadosrd20 LDISKFS-fs: mballoc: 1 generated and it took 7512
Aug 13 11:18:55 sadosrd20 LDISKFS-fs: mballoc: 256 preallocated, 0 discarded
Aug 13 11:18:55 sadosrd20 Lustre: server umount scia-OST0004 complete
Aug 13 11:18:55 sadosrd20 LustreError: 7247:0:(obd_mount.c:1951:lustre_fill_super()) Unable to mount (-17)
<snip>

OST parameters:
mkfs.lustre --param="failover.mode=failout" --fsname scia --ost --mkfsoptions='-i 2097152 -E stride=16 -b 4096' --mgsnode=mds1@tcp0 /dev/sdb
mkfs.lustre --param="failover.mode=failout" --fsname scia --ost --mkfsoptions='-i 2097152 -E stride=16 -b 4096' --mgsnode=mds1@tcp0 /dev/sdc

MDS parameters:
mkfs.lustre --fsname=scia --mdt --mgs --failnode=mds2 /dev/drbd0

Just for your info, the OST output of the OK partition after mounting:

Aug 13 11:26:58 sadosrd20 (fs/jbd/recovery.c, 255): journal_recover: JBD: recovery, exit status 0, recovered transactions 72449 to 74105
Aug 13 11:26:58 sadosrd20 (fs/jbd/recovery.c, 257): journal_recover: JBD: Replayed 7548 and revoked 0/0 blocks
Aug 13 11:27:00 sadosrd20 kjournald starting.  Commit interval 5 seconds
Aug 13 11:27:00 sadosrd20 LDISKFS FS on sdc, internal journal
Aug 13 11:27:00 sadosrd20 LDISKFS-fs: recovery complete.
Aug 13 11:27:00 sadosrd20 LDISKFS-fs: mounted filesystem with ordered data mode.
Aug 13 11:27:01 sadosrd20 kjournald starting.  Commit interval 5 seconds
Aug 13 11:27:01 sadosrd20 LDISKFS FS on sdc, internal journal
Aug 13 11:27:01 sadosrd20 LDISKFS-fs: mounted filesystem with ordered data mode.
Aug 13 11:27:01 sadosrd20 LDISKFS-fs: file extents enabled
Aug 13 11:27:01 sadosrd20 LDISKFS-fs: mballoc enabled
Aug 13 11:27:01 sadosrd20 Lustre: 7267:0:(filter.c:1732:filter_common_setup()) scia-OST0005: recovery disabled
Aug 13 11:27:01 sadosrd20 Lustre: 7267:0:(filter.c:744:filter_init_server_data()) scia-OST0005: recovery support OFF
Aug 13 11:27:01 sadosrd20 Lustre: OST scia-OST0005 now serving dev (scia-OST0005/ca6d322c-65d4-968c-4f25-3f37937678a8) with recovery disabled
Aug 13 11:27:01 sadosrd20 Lustre: Server scia-OST0005 on device /dev/sdc has started
Aug 13 11:27:06 sadosrd20 Lustre: scia-OST0005: received MDS connection from 192.168.16.122@tcp
Aug 13 11:27:06 sadosrd20 Lustre: 6414:0:(filter.c:2774:filter_destroy_precreated()) scia-OST0005: deleting orphan objects from 3073 to 3180
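For anyone who hits the same failure: -17 is -EEXIST ("file exists"), and here it comes from class_attach(), meaning the kernel already holds an obd device named scia-OST0004; it is not about the mountpoint directory. A few quick state checks before retrying the mount (a hedged sketch; the device and target names are taken from the thread, everything else is illustrative):

  # -17 = -EEXIST: the obd device scia-OST0004 already exists in the kernel,
  # so moving the mountpoint cannot help. Quick state checks:
  grep lustre /proc/mounts          # is the target still mounted somewhere?
  lctl dl | grep -i ost0004         # is a stale "UP obdfilter" device left behind?
  tunefs.lustre --dryrun /dev/sdb   # print the on-disk target config without changing it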
Hello again,

any idea what can be done in such a case?

Regards
Heiko

> Hello, after a crash (hardware failure) of an OSS node carrying two Lustre
> OST partitions, one partition (/dev/sdb) cannot be remounted after the
> restart. The second partition (/dev/sdc) mounts fine. [...]
Hello,

replying to myself: no, we couldn't get Lustre up again and had to reinstall from scratch. :-( Keeping fingers crossed now that we are running the production system...

What bugs us is this part of the message on the MDS:

Aug 13 11:18:54 sadosrd20 LustreError: 15c-8: MGC192.168.16.122@tcp: The configuration from log 'scia-OST0004' failed (-17). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.

Unfortunately there is no further information in the syslog.

Regards
Heiko

> Hello again, any idea what can be done in such a case?
>
> > Hello, after a crash (hardware failure) of an OSS node carrying two
> > Lustre OST partitions, one partition (/dev/sdb) cannot be remounted
> > after the restart. [...]
--
-----------------------------------------------------------------------
Dipl.-Ing. Heiko Schröter
Institute of Environmental Physics (IUP)   phone: +49-(0)421-218-4080
Institute of Remote Sensing (IFE)          fax:   +49-(0)421-218-4555
University of Bremen (FB1)
P.O. Box 330440                   email: schroete at iup.physik.uni-bremen.de
Otto-Hahn-Allee 1
28359 Bremen
Germany
-----------------------------------------------------------------------
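When the syslog one-liners are all there is, the in-kernel Lustre debug buffer usually holds more detail. A hedged sketch for 1.6-era systems (the /proc path and debug-mask handling may differ on other releases):

  # Raise the LNET/Lustre debug mask, reproduce the failure, dump the buffer.
  echo -1 > /proc/sys/lnet/debug               # enable all debug flags (very verbose)
  mount -t lustre /dev/sdb /mnt/data/ost3      # reproduce the -17 mount failure
  lctl dk /tmp/lustre-debug.log                # dump the kernel debug buffer to a file

The dump is noisy, but the lines around class_newdev()/class_attach() would show which earlier setup left the device registered.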
On Thu, Aug 14, 2008 at 08:40:05AM +0200, Heiko Schroeter wrote:
> What needs to be done in such a case ?
> I tried to move the mountpoint because of the "file exists" message but
> that does not help.
>
> Aug 13 11:18:54 sadosrd20 LustreError: 7247:0:(genops.c:246:class_newdev())
> Device scia-OST0004 already exists, won't add
> Aug 13 11:18:54 sadosrd20 LustreError: 7247:0:(obd_config.c:180:class_attach())
> Cannot create device scia-OST0004 of type obdfilter : -17

It seems that OST0004 is already up. Is this OST automatically mounted at boot time (via fstab)? Could you please run 'lctl dl'?

Johann
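For reference, 'lctl dl' lists the obd devices currently registered with the kernel. On this OSS the output might look roughly like the following (device numbers and UUIDs are made up for illustration):

  lctl dl
  #  0 UP mgc MGC192.168.16.122@tcp 1f2e3d4c-uuid 5
  #  1 UP ost OSS OSS_uuid 3
  #  2 UP obdfilter scia-OST0004 scia-OST0004_UUID 7
  #  3 UP obdfilter scia-OST0005 scia-OST0005_UUID 7

An "UP obdfilter scia-OST0004" entry without a matching line in /proc/mounts would mean a stale device left over from an incomplete teardown, which is exactly what the -17 (EEXIST) from class_attach() suggests.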
On Monday, 25 August 2008 13:54:46, Johann Lombardi wrote:

Sorry, the system had to be set up from scratch as we couldn't find a solution. This has already happened.

The mounts were done manually.
Unfortunately I did not record the exact output of 'lctl dl'.

Thanks anyway for your effort.

Regards
Heiko

> On Thu, Aug 14, 2008 at 08:40:05AM +0200, Heiko Schroeter wrote:
> > What needs to be done in such a case ? [...]
>
> It seems that OST0004 is already up. Is this OST automatically mounted at
> boot time (via fstab)? Could you please run 'lctl dl'?
>
> Johann
Darn. We are curious what happened now.

On Mon, Aug 25, 2008 at 9:35 AM, Heiko Schroeter
<schroete at iup.physik.uni-bremen.de> wrote:
> Sorry, the system had to be set up from scratch as we couldn't find a
> solution. This has already happened.
>
> The mounts were done manually.
> Unfortunately I did not record the exact output of 'lctl dl'. [...]
Since the new setup everything is running fine. Why?

Except my backbone, which keeps on itching when something is not quite figured out ;-)

When this sort of breakdown happens again, I do have a blank, pre-signed 'Blitz Holliday' form ready in my desk...

Seriously, I would also like to know what the cause (besides the power loss) is, and how to repair a Lustre system in such a case.

Heiko

> Darn. We are curious what happened now. [...]
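For the archives: one commonly documented remedy when a target's configuration log is stale or inconsistent (which a -17 during configuration-log replay can indicate) is to regenerate the logs with tunefs.lustre --writeconf. This was not tried in this thread, so treat it as a sketch and check it against the manual for your Lustre release; the mountpoints below are the thread's where known and hypothetical otherwise:

  # Regenerate the Lustre configuration logs (1.6-era procedure).
  # All clients and all targets must be unmounted first.
  umount /mnt/data/ost3                  # on each OSS, for every OST
  umount /mnt/data/mdt                   # on the MDS (hypothetical mountpoint)

  tunefs.lustre --writeconf /dev/drbd0   # on the MDS first
  tunefs.lustre --writeconf /dev/sdb     # then on each OST (repeat for /dev/sdc)

  # Remount in order: MDT first, then OSTs, then clients.
  mount -t lustre /dev/drbd0 /mnt/data/mdt
  mount -t lustre /dev/sdb /mnt/data/ost3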
LOL... I am in the same situation: I want to see what problems other people have so I can try to help them and avoid the same problems myself. I am a big proponent of "your problems are my problems" :-)

On Tue, Aug 26, 2008 at 9:03 AM, Heiko Schroeter
<schroete at iup.physik.uni-bremen.de> wrote:
> Since the new setup everything is running fine. Why? [...]
>
> Seriously, I would also like to know what the cause (besides the power
> loss) is, and how to repair a Lustre system in such a case.
Mag Gam wrote:
> LOL... I am in the same situation: I want to see what problems other
> people have so I can try to help them and avoid the same problems myself.
> I am a big proponent of "your problems are my problems" :-)

When we first implemented Lustre, I had several learning curves with OSTs going down and drives failing. Eventually we settled on backing up the Lustre filesystem to a backup array for the case that an OST would fail. It does take some work, but we find that rebuilding Lustre after an OST failure works for us.

--
Jeremy Mann
jeremy at biochem.uthscsa.edu

University of Texas Health Science Center
Bioinformatics Core Facility
(210) 567-2672
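A minimal sketch of the kind of file-level OST backup Jeremy describes, assuming the target can be taken offline and mounted as plain ldiskfs (the /backup destination and the mountpoint are hypothetical):

  # File-level backup of an offline OST, mounted read-only as plain ldiskfs.
  mount -t ldiskfs -o ro /dev/sdb /mnt/ost_ldiskfs

  # Archive the object tree; --sparse keeps sparse objects small in the archive.
  tar czf /backup/scia-OST0004-$(date +%Y%m%d).tgz --sparse -C /mnt/ost_ldiskfs .

  umount /mnt/ost_ldiskfs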
On Tuesday, 26 August 2008 15:20:01, Mag Gam wrote:

I did respond to the list (and myself) about this unresolved issue and the re-rack last week (19.8.). Unfortunately I don't have time for more investigation, because we have moved on to our production system and I had to get things going. And _this_ bugs me.

So, I don't know about you, but I do have my forms ready...

Heiko

> LOL... I am in the same situation: I want to see what problems other
> people have so I can try to help them and avoid the same problems myself.
> I am a big proponent of "your problems are my problems" :-) [...]
On Tuesday, 26 August 2008 15:35:38, Jeremy Mann wrote:
> Mag Gam wrote:
> > LOL... I am in the same situation [...]
>
> When we first implemented Lustre, I had several learning curves with OSTs
> going down and drives failing. Eventually we settled on backing up the
> Lustre filesystem to a backup array for the case that an OST would fail.
> It does take some work, but we find that rebuilding Lustre after an OST
> failure works for us.

Hm, we have 52 TB of OST space so far, and more than 25 TB are coming within the next days. So no backup space there...

I can live with an OST breaking down. Since we stripe each file onto a single OST and the Lustre filesystem works as a 'fast' data archive, there is no problem recreating the data of a single OST. But in this case the faulty OST couldn't be removed from the Lustre system at all! (Or at least we couldn't do it...) That is the deeper reason why we did a new setup. Sorry if I haven't said this earlier.

Heiko
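For context, the one-file-per-OST striping Heiko describes, and the usual way to take a dead OST out of service so that new objects stop landing on it, might look like this (option syntax varies between lfs/lctl versions; /mnt/scia/archive is a hypothetical client path, the target name is from the thread):

  # Stripe every new file in the archive over exactly one OST (stripe count 1),
  # so losing one OST loses whole files rather than stripes of many files.
  lfs setstripe -c 1 /mnt/scia/archive
  lfs getstripe /mnt/scia/archive        # verify the layout

  # Permanently deactivate a dead OST so the MDS stops creating objects on it
  # (run on the MGS node; 1.6-era syntax, check the manual for your release):
  lctl conf_param scia-OST0004.osc.active=0

Deactivating only marks the OST inactive in the configuration; files with objects on it return I/O errors until the target is restored or its data is recreated elsewhere.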