Replying to myself here:
I did manage to get my disk mounted to run a new benchmark test.
I could not find a way around the "mount.lustre: mount /dev/sdg1 at /srv/lustre/mds/crew5-MDT0000 failed: Address already in use; The target service's index is already in use. (/dev/sdg1)" error.
Even rebooting both the OSS and MDS did not help.
Since this was just a test of the hardware with a different stripe-size
setting on an LSI 8888ELP RAID card (128 kB instead of the default 64 kB),
I re-created both the OST and the MDT using a new, unique fsname on
the same hardware.
This worked like a charm.
Someday I will have to figure out what to do with the "cast-off"
MDT names that I apparently can no longer use.
Any comments, observations, or suggestions are appreciated.
Later,
megan
On Aug 14, 12:55 pm, "Ms. Megan Larko" <dobsonu... at gmail.com> wrote:
> Hello,
>
> As part of my continuing effort to benchmark Lustre and determine what may
> be best suited for our needs here, I have re-created, at the LSI 8888ELP
> card level, some of the arrays from my earlier benchmark posts. The card
> now presents /dev/sdf (998999 MB, 128 kB stripe size) and /dev/sdg
> (6992995 MB, 128 kB stripe size) to my OSS. Both sdf and sdg formatted
> fine with Lustre and mounted without issue on the OSS.
> Recycling the MDTs on the MGS seems to have been a problem. When I tried
> to mount the MDT on the MGS after mounting the new OSTs, the mounts
> completed without error, but the bonnie benchmark test, run as before,
> hung every time.
>
> Sample of errors in MGS file /var/log/messages:
> Aug 13 12:39:30 mds1 kernel: Lustre: crew5-OST0001-osc: Connection to service crew5-OST0001 via nid 172.18.0.15@o2ib was lost; in progress operations using this service will wait for recovery to complete.
> Aug 13 12:39:30 mds1 kernel: LustreError: 167-0: This client was evicted by crew5-OST0001; in progress operations using this service will fail.
> Aug 13 12:39:30 mds1 kernel: Lustre: crew5-OST0001-osc: Connection restored to service crew5-OST0001 using nid 172.18.0.15@o2ib.
> Aug 13 12:39:30 mds1 kernel: Lustre: MDS crew5-MDT0000: crew5-OST0001_UUID now active, resetting orphans
> Aug 13 12:42:42 mds1 kernel: Lustre: 3406:0:(ldlm_lib.c:519:target_handle_reconnect()) crew5-MDT0000: 50b043bb-0e8c-7a5b-b0fe-6bdb67d21e0b reconnecting
> Aug 13 12:42:42 mds1 kernel: Lustre: 3406:0:(ldlm_lib.c:519:target_handle_reconnect()) Skipped 24 previous similar messages
> Aug 13 12:42:42 mds1 kernel: Lustre: 3406:0:(ldlm_lib.c:747:target_handle_connect()) crew5-MDT0000: refuse reconnection from 50b043bb-0e8c-7a5b-b0fe-6bdb67d21... at 172.18.0.14@o2ib to 0xffff81006994d000; still busy with 2 active RPCs
> Aug 13 12:42:42 mds1 kernel: Lustre: 3406:0:(ldlm_lib.c:747:target_handle_connect()) Skipped 24 previous similar messages
> Aug 13 12:42:42 mds1 kernel: LustreError: 3406:0:(ldlm_lib.c:1442:target_send_reply_msg()) @@@ processing error (-16) req@ffff81007b621400 x600107/t0 o38->50b043bb-0e8c-7a5b-b0fe-6bdb67d21e0b@NET_0x50000ac12000e_UUID:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0
> Aug 13 12:42:42 mds1 kernel: LustreError: 3406:0:(ldlm_lib.c:1442:target_send_reply_msg()) Skipped 24 previous similar messages
> Aug 13 12:43:40 mds1 kernel: LustreError: 11-0: an error occurred while communicating with 172.18.0.15@o2ib. The ost_connect operation failed with -19
> Aug 13 12:43:40 mds1 kernel: LustreError: Skipped 7 previous similar messages
> Aug 13 12:47:50 mds1 kernel: Lustre: crew5-OST0001-osc: Connection to service crew5-OST0001 via nid 172.18.0.15@o2ib was lost; in progress operations using this service will wait for recovery to complete.
>
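The hex string in the o38 request line above, NET_0x50000ac12000e_UUID, appears to be the client's LNet NID printed as a raw 64-bit number. Here is a minimal Python sketch that decodes it. It assumes the usual LNet layout: the low 32 bits hold the address, and the high 32 bits hold the network, with the LND type in the upper 16 bits and the network number in the lower 16. The helper name decode_nid is hypothetical and not part of any Lustre tool. The decoded address is 172.18.0.14, which matches the "refuse reconnection ... 172.18.0.14@o2ib" line. That match suggests LND type 5 is what this log labels o2ib, and that the refused peer is neither the MDS (172.18.0.10) nor the OSS (172.18.0.15).

```python
import ipaddress
from typing import Tuple


# Hypothetical helper: split a raw 64-bit LNet NID (as printed in
# "NET_0x..._UUID") into (LND type, network number, IPv4 address).
# Assumed layout: bits 0-31 address, bits 32-47 net number, bits 48-63 LND type.
def decode_nid(nid: int) -> Tuple[int, int, str]:
    addr = nid & 0xFFFFFFFF            # low 32 bits: IPv4 address
    net = (nid >> 32) & 0xFFFFFFFF     # high 32 bits: network
    lnd_type = (net >> 16) & 0xFFFF    # upper half of network: LND type
    net_num = net & 0xFFFF             # lower half of network: net number
    return lnd_type, net_num, str(ipaddress.IPv4Address(addr))


if __name__ == "__main__":
    # Value taken from the "o38->...@NET_0x50000ac12000e_UUID" log line
    print(decode_nid(int("50000ac12000e", 16)))
```

Running it prints (5, 0, '172.18.0.14').
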
> Sample of errors on OSS file /var/log/messages:
> Aug 13 12:39:30 oss4 kernel: Lustre: crew5-OST0001: received MDS connection from 172.18.0.10@o2ib
> Aug 13 12:43:57 oss4 kernel: Lustre: crew5-OST0001: haven't heard from client crew5-mdtlov_UUID (at 172.18.0.10@o2ib) in 267 seconds. I think it's dead, and I am evicting it.
> Aug 13 12:46:27 oss4 kernel: LustreError: 137-5: UUID 'crew5-OST0000_UUID' is not available for connect (no target)
> Aug 13 12:46:27 oss4 kernel: LustreError: Skipped 51 previous similar messages
> Aug 13 12:46:27 oss4 kernel: LustreError: 4151:0:(ldlm_lib.c:1442:target_send_reply_msg()) @@@ processing error (-19) req@ffff81042fa56800 x600171/t0 o8-><?>@<?>:-1 lens 240/0 ref 0 fl Interpret:/0/0 rc -19/0
> Aug 13 12:46:27 oss4 kernel: LustreError: 4151:0:(ldlm_lib.c:1442:target_send_reply_msg()) Skipped 52 previous similar messages
> Aug 13 12:47:50 oss4 kernel: Lustre: crew5-OST0001: received MDS connection from 172.18.0.10@o2ib
>
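The negative numbers in these logs ("processing error (-16)", "rc -19/0", "failed with -19") look like negated Linux errno values, which is the usual kernel convention. The minimal Python sketch below translates them with the standard errno and os modules. It assumes a Linux host, since errno numbers differ between platforms. The helper name explain_codes and its regular expression are hypothetical. On Linux, -16 is EBUSY, which fits "still busy with 2 active RPCs". -19 is ENODEV, which fits "(no target)" for crew5-OST0000. Separately, "Address already in use" from mount.lustre is the standard strerror text for EADDRINUSE.

```python
import errno
import os
import re

# Hypothetical pattern: matches "rc -N", "with -N" and "error (-N" in Lustre log lines.
RC_PATTERN = re.compile(r"(?:rc |with |error \()(-\d+)")


def explain_codes(line):
    """Return (code, errno name, strerror text) for each distinct negative code."""
    codes = []
    for match in RC_PATTERN.finditer(line):
        code = int(match.group(1))
        if code < 0 and code not in codes:  # keep first occurrence only
            codes.append(code)
    # Assumes Linux errno numbering; the message text depends on the C library.
    return [(c, errno.errorcode.get(-c, "?"), os.strerror(-c)) for c in codes]


if __name__ == "__main__":
    sample = ("@@@ processing error (-19) req@ffff81042fa56800 x600171/t0 "
              "o8-><?>@<?>:-1 lens 240/0 ref 0 fl Interpret:/0/0 rc -19/0")
    for code, name, text in explain_codes(sample):
        print(code, name, text)
```
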
> All lctl pings were successful. Additionally, files on the Lustre disks
> of our live system, which uses the same MGS, were all fine, with no errors
> in the log file.
>
> I thought that changing the stripe size (in kB) and reformatting the OST
> without also reformatting the MDT might be the problem. So I unmounted the
> OST and the MDT and reformatted the MDT on the MGS. All okay. The OST
> remounts without error, but the MDT on the MGS will not remount:
>
> [root@mds1 ~]# mount -t lustre /dev/sdg1 /srv/lustre/mds/crew5-MDT0000
> mount.lustre: mount /dev/sdg1 at /srv/lustre/mds/crew5-MDT0000 failed: Address already in use
> The target service's index is already in use. (/dev/sdg1)
>
> Again, the live systems on the MGS are still fine. A web search for the
> error suggested I try "tunefs.lustre --reformat --index=0
> --writeconf=/dev/sdg1", but I could not find a syntax for that command
> that would run (I tried various index= values and adding /dev/sdg1 at the
> end of the line, but it failed each time and only reprinted the help text
> without indicating which part of what I typed could not be parsed).
>
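On the syntax question: as I understand the tunefs.lustre usage, --writeconf is a bare flag that takes no value, and the device is a separate final argument. That would explain why "--writeconf=/dev/sdg1" only reprints the help. The minimal Python sketch below builds, but does not run, such a command line. The helper name writeconf_argv is hypothetical. The --dryrun flag is included on the assumption that tunefs.lustre supports it for printing changes without writing to disk. The --reformat and --index options from the web suggestion are deliberately left out. As I recall the Lustre manual, a writeconf regenerates the configuration logs. It is normally done with the whole filesystem unmounted, on the MDT first and then on every OST, so this alone may not be the whole fix.

```python
import shlex
from typing import List


# Hypothetical helper: build (not execute) a tunefs.lustre command line.
# --writeconf is a bare flag; the block device is its own last argument.
def writeconf_argv(device: str, dry_run: bool = True) -> List[str]:
    argv = ["tunefs.lustre"]
    if dry_run:
        # Assumption: --dryrun only reports what would change on the target.
        argv.append("--dryrun")
    argv += ["--writeconf", device]
    return argv


if __name__ == "__main__":
    # Print a shell-quoted command for review instead of running it.
    print(shlex.join(writeconf_argv("/dev/sdg1")))
```

Running it prints "tunefs.lustre --dryrun --writeconf /dev/sdg1". Once the dry run looks right, you would drop --dryrun.
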
> My current thought is that a 128 kB stripe size on the LSI 8888ELP card
> cannot be tested on my Lustre 1.6.4 system. That does not match what I
> have read about Lustre, but it seems to be what is happening on my
> systems. I will test one more time with the stripe size back at 64 kB.
>
> megan
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-disc... at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss