Hello,

I have a setup with a Lustre server and Lustre clients using o2ib. It works.
I decided to add more clients; unfortunately, the new clients do not have an IB
card. So I added this option on the server:

options lnet networks="o2ib,tcp0"

/usr/local/lustre/sbin/lctl list_nids
10.0.0.1@o2ib
192.168.0.1@tcp

However, a client using tcp complains about:

mount -t lustre 192.168.0.1@tcp:/spfs /mnt/lustre/
mount.lustre: mount 192.168.0.1@tcp:/spfs at /mnt/lustre failed: No such file or directory
Is the MGS specification correct?
Is the filesystem name correct?
If upgrading, is the copied client log valid? (see upgrade docs)

This is from dmesg:

LustreError: 15342:0:(events.c:454:ptlrpc_uuid_to_peer()) No NID found for 10.0.0.1@o2ib
LustreError: 15342:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot find peer 10.0.0.1@o2ib!
LustreError: 15342:0:(ldlm_lib.c:321:client_obd_setup()) can't add initial connection
LustreError: 17831:0:(connection.c:144:ptlrpc_put_connection()) NULL connection
LustreError: 15342:0:(obd_config.c:336:class_setup()) setup spfs-MDT0000-mdc-ffff8801d1d67c00 failed (-2)
LustreError: 15342:0:(obd_config.c:1074:class_config_llog_handler()) Err -2 on cfg command:
Lustre: cmd=cf003 0:spfs-MDT0000-mdc 1:spfs-MDT0000_UUID 2:10.0.0.1@o2ib
LustreError: 15c-8: MGC192.168.0.1@tcp: The configuration from log 'spfs-client' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 15314:0:(llite_lib.c:1063:ll_fill_super()) Unable to process log: -2
LustreError: 15314:0:(obd_config.c:403:class_cleanup()) Device 2 not setup
LustreError: 15314:0:(ldlm_request.c:984:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
LustreError: 15314:0:(ldlm_request.c:1593:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
Lustre: client ffff8801d1d67c00 umount complete
LustreError: 15314:0:(obd_mount.c:1957:lustre_fill_super()) Unable to mount (-2)

Is there a way I can upgrade the singlehomed server to a multihomed server?
Do I really need to set up a router? How does it work? Is there any slowdown
due to routing?

--
Lukáš Hejtmánek
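For reference, on a multihomed server the lnet "networks" option can also pin each LNET network to a specific interface. A minimal sketch of the relevant modprobe.conf lines follows; the interface names ib0 and eth0 are assumptions and may differ on the actual hardware:

# on the multihomed server: o2ib over the IB interface, tcp over the Ethernet interface
options lnet networks="o2ib(ib0),tcp0(eth0)"

# on an Ethernet-only client
options lnet networks="tcp0(eth0)"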
Hi,

Do you have just one Lustre server which serves as OSS and MDS/MGS?

Can you paste the output from `lctl ping <server_nid>` run on the client?

Does the Ethernet client have only one interface, or are there more?

Did you also set the lnet option (in modprobe.conf) on the clients?

Can you send the output from `lctl list_nids` run on the server(s),
and also the output from `tunefs.lustre --print /dev/<lustre_target>` run on
the server?

Cheers

Wojciech

--
Wojciech Turek

Assistant System Manager
High Performance Computing Service
University of Cambridge
Email: wjt27 at cam.ac.uk
Tel: (+)44 1223 763517
On Thu, Jan 08, 2009 at 04:46:39PM +0000, Wojciech Turek wrote:
> Do you have just one Lustre server which serves as OSS and MDS/MGS?

Yes, only one Lustre server, which serves as two OSSs and one MDS/MGS.

> Can you paste the output from `lctl ping <server_nid>` run on the client?

./lctl ping 192.168.0.1@tcp
12345-0@lo
12345-10.0.0.1@o2ib
12345-192.168.0.1@tcp

> Does the Ethernet client have only one interface, or are there more?

Only one.

> Did you also set the lnet option (in modprobe.conf) on the clients?

No, lnet has no options set on the client.

> Can you send the output from `lctl list_nids` run on the server(s)

# /usr/local/lustre/sbin/lctl list_nids
10.0.0.1@o2ib
192.168.0.1@tcp

> And also the output from `tunefs.lustre --print /dev/<lustre_target>` run on
> the server

# /usr/local/lustre/sbin/tunefs.lustre --print /dev/Scratch_VG/Scratch_1
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

   Read previous values:
Target:     spfs-MDT0000
Index:      0
Lustre FS:  spfs
Mount type: ldiskfs
Flags:      0x5 (MDT MGS )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters:

   Permanent disk data:
Target:     spfs-MDT0000
Index:      0
Lustre FS:  spfs
Mount type: ldiskfs
Flags:      0x5 (MDT MGS )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters:

exiting before disk write.

# /usr/local/lustre/sbin/tunefs.lustre --print /dev/Scratch_VG/Scratch_2
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

   Read previous values:
Target:     spfs-OST0000
Index:      0
Lustre FS:  spfs
Mount type: ldiskfs
Flags:      0x2 (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.0.0.1@o2ib

   Permanent disk data:
Target:     spfs-OST0000
Index:      0
Lustre FS:  spfs
Mount type: ldiskfs
Flags:      0x2 (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.0.0.1@o2ib

exiting before disk write.

# /usr/local/lustre/sbin/tunefs.lustre --print /dev/Scratch_VG/Scratch_3
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

   Read previous values:
Target:     spfs-OST0001
Index:      1
Lustre FS:  spfs
Mount type: ldiskfs
Flags:      0x2 (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.0.0.1@o2ib

   Permanent disk data:
Target:     spfs-OST0001
Index:      1
Lustre FS:  spfs
Mount type: ldiskfs
Flags:      0x2 (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.0.0.1@o2ib

exiting before disk write.
--
Lukáš Hejtmánek
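The tunefs output above already hints at the problem: the targets were formatted with mgsnode=10.0.0.1@o2ib only, so the client configuration log ('spfs-client' in the dmesg output) still points clients at the o2ib NID, which a TCP-only client cannot reach. If you want to confirm what the client log actually contains, one possible approach (a sketch only, run against the unmounted MGS/MDT device; llog_reader ships with the Lustre utilities) is:

# dump the client configuration log out of the MGS/MDT device with debugfs
debugfs -c -R 'dump CONFIGS/spfs-client /tmp/spfs-client' /dev/Scratch_VG/Scratch_1
# decode the records, which include the NIDs the clients are told to use
llog_reader /tmp/spfs-client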
Hi,

You need to add the new Ethernet NID to the Lustre target config logs.

Stop your Lustre file system (umount everything), then run this on all OST(s) and the MDT:

tunefs.lustre --erase-param --mgsnode=10.0.0.1@o2ib,192.168.0.1@tcp --writeconf /dev/Scratch_VG/Scratch_1
tunefs.lustre --erase-param --mgsnode=10.0.0.1@o2ib,192.168.0.1@tcp --writeconf /dev/Scratch_VG/Scratch_2
tunefs.lustre --erase-param --mgsnode=10.0.0.1@o2ib,192.168.0.1@tcp --writeconf /dev/Scratch_VG/Scratch_3

The above commands erase the current Lustre configuration logs from the Lustre targets
and write a new configuration.

Mount the MDT, the OSTs and the client and let me know how it works for you.
I also recommend adding a modprobe.conf line on the clients; although this is
not necessary in your case, it will make the configuration more sane:

options lnet networks=tcp(eth0)

Cheers

Wojciech Turek
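Putting Wojciech's steps in order, a sketch of the whole writeconf pass could look like the following. The device paths are the ones from this thread; the mount points (/mnt/lustre, /mnt/spfs-*) are assumptions and should be replaced with the real ones. The usual rule is to unmount the clients first and remount the MDT/MGS before the OSTs:

# on every client
umount /mnt/lustre

# on the server: unmount the OSTs, then the MDT
umount /mnt/spfs-ost0
umount /mnt/spfs-ost1
umount /mnt/spfs-mdt

# rewrite the config logs so the MGS is known on both NIDs
tunefs.lustre --erase-param --mgsnode=10.0.0.1@o2ib,192.168.0.1@tcp --writeconf /dev/Scratch_VG/Scratch_1
tunefs.lustre --erase-param --mgsnode=10.0.0.1@o2ib,192.168.0.1@tcp --writeconf /dev/Scratch_VG/Scratch_2
tunefs.lustre --erase-param --mgsnode=10.0.0.1@o2ib,192.168.0.1@tcp --writeconf /dev/Scratch_VG/Scratch_3

# remount: MDT/MGS first, then the OSTs, then the clients
mount -t lustre /dev/Scratch_VG/Scratch_1 /mnt/spfs-mdt
mount -t lustre /dev/Scratch_VG/Scratch_2 /mnt/spfs-ost0
mount -t lustre /dev/Scratch_VG/Scratch_3 /mnt/spfs-ost1
mount -t lustre 192.168.0.1@tcp:/spfs /mnt/lustre      # Ethernet client
mount -t lustre 10.0.0.1@o2ib:/spfs /mnt/lustre        # IB client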
Hello,

> You need to add the new Ethernet NID to the Lustre target config logs.
> Stop your Lustre file system (umount everything), then run this on all OST(s) and the MDT:
> [...]

I did it. Unfortunately, the TCP client still does not work. After remounting,
the IB client works. The TCP client is able to mount and list the file system,
but it is unable to read, write or create files. Dmesg shows these errors:

Lustre: 17857:0:(import.c:396:import_select_connection()) spfs-OST0000-osc-ffff8800e2492800: tried all connections, increasing latency to 26s
Lustre: 17857:0:(import.c:396:import_select_connection()) Skipped 1 previous similar message
LustreError: 11-0: an error occurred while communicating with 192.168.0.1@tcp. The ost_connect operation failed with -16
LustreError: Skipped 1 previous similar message

This is how it looks now:

/usr/local/lustre/sbin/tunefs.lustre --print /dev/Scratch_VG/Scratch_1
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

   Read previous values:
Target:     spfs-MDT0000
Index:      0
Lustre FS:  spfs
Mount type: ldiskfs
Flags:      0x5 (MDT MGS )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: mgsnode=10.0.0.1@o2ib,192.168.0.1@tcp

   Permanent disk data:
Target:     spfs-MDT0000
Index:      0
Lustre FS:  spfs
Mount type: ldiskfs
Flags:      0x5 (MDT MGS )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: mgsnode=10.0.0.1@o2ib,192.168.0.1@tcp

exiting before disk write.

/usr/local/lustre/sbin/tunefs.lustre --print /dev/Scratch_VG/Scratch_2
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

   Read previous values:
Target:     spfs-OST0000
Index:      0
Lustre FS:  spfs
Mount type: ldiskfs
Flags:      0x2 (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.0.0.1@o2ib,192.168.0.1@tcp

   Permanent disk data:
Target:     spfs-OST0000
Index:      0
Lustre FS:  spfs
Mount type: ldiskfs
Flags:      0x2 (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.0.0.1@o2ib,192.168.0.1@tcp

exiting before disk write.

/usr/local/lustre/sbin/tunefs.lustre --print /dev/Scratch_VG/Scratch_3
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

   Read previous values:
Target:     spfs-OST0001
Index:      1
Lustre FS:  spfs
Mount type: ldiskfs
Flags:      0x2 (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.0.0.1@o2ib,192.168.0.1@tcp

   Permanent disk data:
Target:     spfs-OST0001
Index:      1
Lustre FS:  spfs
Mount type: ldiskfs
Flags:      0x2 (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.0.0.1@o2ib,192.168.0.1@tcp

exiting before disk write.

--
Lukáš Hejtmánek
Hi,

I see you get errno -16.

-16 = EBUSY. This says the client reconnected to a server which is already working
on a different request from this client. After the old RPC from this client has
finished, the client will be reconnected.

Did you cleanly stop Lustre? The procedure I use is:

1) on the client
umount /mnt/<lustre mount point>
lustre_rmmod
2) on the OSS
umount /mnt/<ost mount point>
lustre_rmmod
3) on the MDS
umount /mnt/<mdt mount point>
lustre_rmmod

Make sure that your Lustre isn't in the recovery state. Please run the following
commands on the OSS and MDS:

cat /proc/fs/lustre/obdfilter/*/recovery_status
cat /proc/fs/lustre/mds/*/recovery_status

If you see COMPLETE or INACTIVE there, it means that Lustre isn't in recovery mode.

Please could you paste here the output from this command run on the Lustre client:

lctl list_nids

and from this command run on the server:

lctl ping <client_nid>

Cheers,

Wojciech
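Putting Wojciech's checks together, a sketch of the verification pass looks like this; the Ethernet client's address is site-specific and shown only as a placeholder:

# on the TCP client: show the client's own NID and verify it can reach the server over tcp
lctl list_nids
lctl ping 192.168.0.1@tcp

# on the server: verify both NIDs are configured and that the client answers
lctl list_nids
lctl ping <client_ip>@tcp

# on the server: make sure no target is still in recovery
cat /proc/fs/lustre/obdfilter/*/recovery_status
cat /proc/fs/lustre/mds/*/recovery_status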
Hello,

> -16 = EBUSY. This says the client reconnected to a server which is already working
> on a different request from this client.
>
> Did you cleanly stop Lustre?

I did not restart the TCP client. After restarting it, it is OK now.

Thanks a lot!

--
Lukáš Hejtmánek