Hello,

I have a setup with a Lustre server and Lustre clients using o2ib. It works. I decided to add more clients; unfortunately, the new clients do not have IB cards. So I added this option on the server:

options lnet networks="o2ib,tcp0"

/usr/local/lustre/sbin/lctl list_nids
10.0.0.1@o2ib
192.168.0.1@tcp

However, a client using tcp complains on mount:

mount -t lustre 192.168.0.1@tcp:/spfs /mnt/lustre/
mount.lustre: mount 192.168.0.1@tcp:/spfs at /mnt/lustre failed: No such file or directory
Is the MGS specification correct?
Is the filesystem name correct?
If upgrading, is the copied client log valid? (see upgrade docs)

This is from dmesg:

LustreError: 15342:0:(events.c:454:ptlrpc_uuid_to_peer()) No NID found for 10.0.0.1@o2ib
LustreError: 15342:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot find peer 10.0.0.1@o2ib!
LustreError: 15342:0:(ldlm_lib.c:321:client_obd_setup()) can't add initial connection
LustreError: 17831:0:(connection.c:144:ptlrpc_put_connection()) NULL connection
LustreError: 15342:0:(obd_config.c:336:class_setup()) setup spfs-MDT0000-mdc-ffff8801d1d67c00 failed (-2)
LustreError: 15342:0:(obd_config.c:1074:class_config_llog_handler()) Err -2 on cfg command:
Lustre: cmd=cf003 0:spfs-MDT0000-mdc 1:spfs-MDT0000_UUID 2:10.0.0.1@o2ib
LustreError: 15c-8: MGC192.168.0.1@tcp: The configuration from log 'spfs-client' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 15314:0:(llite_lib.c:1063:ll_fill_super()) Unable to process log: -2
LustreError: 15314:0:(obd_config.c:403:class_cleanup()) Device 2 not setup
LustreError: 15314:0:(ldlm_request.c:984:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
LustreError: 15314:0:(ldlm_request.c:1593:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
Lustre: client ffff8801d1d67c00 umount complete
LustreError: 15314:0:(obd_mount.c:1957:lustre_fill_super()) Unable to mount (-2)

Is there a way I can upgrade the single-homed server to a multihomed server? Do I really need to set up a router? How does it work? Is there any slowdown due to routing?

--
Lukáš Hejtmánek
Hi,

Do you have just one Lustre server, which serves as both OSS and MDS/MGS?
Can you paste the output of `lctl ping <server_nid>` run on the client?
Does the Ethernet client have only one interface, or are there more?
Did you also set the lnet option (in modprobe.conf) on the clients?
Can you send the output of `lctl list_nids` run on the server(s),
and also the output of `tunefs.lustre --print /dev/<lustre_target>` run on the server?

Cheers,

Wojciech

--
Wojciech Turek
Assistant System Manager
High Performance Computing Service
University of Cambridge
Email: wjt27 at cam.ac.uk
Tel: (+)44 1223 763517
On Thu, Jan 08, 2009 at 04:46:39PM +0000, Wojciech Turek wrote:
> Do you have just one lustre server which serves as OSS and MDS/MGS?

Yes, only one Lustre server, which serves as two OSSs and one MDS/MGS.

> Can you paste output from `lctl ping <server_nid>` run on client?

./lctl ping 192.168.0.1@tcp
12345-0@lo
12345-10.0.0.1@o2ib
12345-192.168.0.1@tcp

> Does the ethernet client have only one interface or is there more?

Only one.

> Did you also set lnet option (in modprobe.conf) on the clients?

No, lnet has no options on the client.

> Can you send output from `lctl list_nids` run on server(s)

# /usr/local/lustre/sbin/lctl list_nids
10.0.0.1@o2ib
192.168.0.1@tcp

> And also output from `tunefs.lustre --print /dev/<lustre_target>` run on the server

# /usr/local/lustre/sbin/tunefs.lustre --print /dev/Scratch_VG/Scratch_1
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

Read previous values:
Target:     spfs-MDT0000
Index:      0
Lustre FS:  spfs
Mount type: ldiskfs
Flags:      0x5
            (MDT MGS )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters:

Permanent disk data:
Target:     spfs-MDT0000
Index:      0
Lustre FS:  spfs
Mount type: ldiskfs
Flags:      0x5
            (MDT MGS )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters:

exiting before disk write.

# /usr/local/lustre/sbin/tunefs.lustre --print /dev/Scratch_VG/Scratch_2
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

Read previous values:
Target:     spfs-OST0000
Index:      0
Lustre FS:  spfs
Mount type: ldiskfs
Flags:      0x2
            (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.0.0.1@o2ib

Permanent disk data:
Target:     spfs-OST0000
Index:      0
Lustre FS:  spfs
Mount type: ldiskfs
Flags:      0x2
            (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.0.0.1@o2ib

exiting before disk write.

# /usr/local/lustre/sbin/tunefs.lustre --print /dev/Scratch_VG/Scratch_3
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

Read previous values:
Target:     spfs-OST0001
Index:      1
Lustre FS:  spfs
Mount type: ldiskfs
Flags:      0x2
            (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.0.0.1@o2ib

Permanent disk data:
Target:     spfs-OST0001
Index:      1
Lustre FS:  spfs
Mount type: ldiskfs
Flags:      0x2
            (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.0.0.1@o2ib

exiting before disk write.

--
Lukáš Hejtmánek
Hi,
You need to add the new Ethernet NID to the Lustre target configuration logs.
Stop your Lustre file system (umount everything), then run this on the MDT and all OSTs:

tunefs.lustre --erase-param --mgsnode=10.0.0.1@o2ib,192.168.0.1@tcp --writeconf /dev/Scratch_VG/Scratch_1
tunefs.lustre --erase-param --mgsnode=10.0.0.1@o2ib,192.168.0.1@tcp --writeconf /dev/Scratch_VG/Scratch_2
tunefs.lustre --erase-param --mgsnode=10.0.0.1@o2ib,192.168.0.1@tcp --writeconf /dev/Scratch_VG/Scratch_3

The commands above erase the targets' current configuration parameters, record both MGS NIDs, and (because of --writeconf) cause new configuration logs to be written the next time the targets are mounted.
Mount the MDT first, then the OSTs, then the client, and let me know how it works for you.
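For reference, the whole sequence on your combined MGS/MDS/OSS node might look roughly like the sketch below. The target mount points (/mnt/spfs-mdt and so on) are assumptions, so substitute whatever your init scripts actually use; the device paths come from your tunefs.lustre output.

# unmount clients first, then the targets, and unload the modules
umount /mnt/lustre                      # on each client (assumed mount point)
umount /mnt/spfs-ost1 /mnt/spfs-ost0    # on the OSS (assumed mount points)
umount /mnt/spfs-mdt                    # on the MDS (assumed mount point)
lustre_rmmod                            # on every node

# rewrite the config logs with both MGS NIDs (run on the server)
tunefs.lustre --erase-param --mgsnode=10.0.0.1@o2ib,192.168.0.1@tcp --writeconf /dev/Scratch_VG/Scratch_1
tunefs.lustre --erase-param --mgsnode=10.0.0.1@o2ib,192.168.0.1@tcp --writeconf /dev/Scratch_VG/Scratch_2
tunefs.lustre --erase-param --mgsnode=10.0.0.1@o2ib,192.168.0.1@tcp --writeconf /dev/Scratch_VG/Scratch_3

# remount in order: MGS/MDT, then the OSTs, then the clients
mount -t lustre /dev/Scratch_VG/Scratch_1 /mnt/spfs-mdt
mount -t lustre /dev/Scratch_VG/Scratch_2 /mnt/spfs-ost0
mount -t lustre /dev/Scratch_VG/Scratch_3 /mnt/spfs-ost1
mount -t lustre 192.168.0.1@tcp:/spfs /mnt/lustre       # tcp client
mount -t lustre 10.0.0.1@o2ib:/spfs /mnt/lustre         # IB client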
I also recommend adding an lnet line to modprobe.conf on the clients. Although it is not strictly necessary in your case, it makes the configuration more explicit:
options lnet networks=tcp(eth0)
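To give a concrete picture, the lnet lines across the cluster could then look roughly like this. The interface names ib0 and eth0 are assumptions for your hosts, and the explicit interface binding is optional (your current server line without it already works):

# server (already multihomed), e.g. in /etc/modprobe.conf
options lnet networks="o2ib(ib0),tcp0(eth0)"

# existing IB-only clients
options lnet networks="o2ib(ib0)"

# new Ethernet-only clients
options lnet networks="tcp0(eth0)"

Binding each network type to a specific interface mainly avoids lnet picking the wrong interface on multi-NIC nodes.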
Cheers
Wojciech Turek
Hello,

> You need to add the new Ethernet NID to the Lustre target configuration logs.
> [...]
> Mount the MDT first, then the OSTs, then the client, and let me know how it works for you.

I did it. Unfortunately, the TCP client still does not work. After the remount, the IB client works.

The TCP client is able to mount and list the file system, but it is unable to read, write, or create files. dmesg shows these errors:

Lustre: 17857:0:(import.c:396:import_select_connection()) spfs-OST0000-osc-ffff8800e2492800: tried all connections, increasing latency to 26s
Lustre: 17857:0:(import.c:396:import_select_connection()) Skipped 1 previous similar message
LustreError: 11-0: an error occurred while communicating with 192.168.0.1@tcp. The ost_connect operation failed with -16
LustreError: Skipped 1 previous similar message

This is how it looks now:

/usr/local/lustre/sbin/tunefs.lustre --print /dev/Scratch_VG/Scratch_1
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

Read previous values:
Target:     spfs-MDT0000
Index:      0
Lustre FS:  spfs
Mount type: ldiskfs
Flags:      0x5
            (MDT MGS )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: mgsnode=10.0.0.1@o2ib,192.168.0.1@tcp

Permanent disk data:
Target:     spfs-MDT0000
Index:      0
Lustre FS:  spfs
Mount type: ldiskfs
Flags:      0x5
            (MDT MGS )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: mgsnode=10.0.0.1@o2ib,192.168.0.1@tcp

exiting before disk write.

/usr/local/lustre/sbin/tunefs.lustre --print /dev/Scratch_VG/Scratch_2
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

Read previous values:
Target:     spfs-OST0000
Index:      0
Lustre FS:  spfs
Mount type: ldiskfs
Flags:      0x2
            (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.0.0.1@o2ib,192.168.0.1@tcp

Permanent disk data:
Target:     spfs-OST0000
Index:      0
Lustre FS:  spfs
Mount type: ldiskfs
Flags:      0x2
            (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.0.0.1@o2ib,192.168.0.1@tcp

exiting before disk write.

/usr/local/lustre/sbin/tunefs.lustre --print /dev/Scratch_VG/Scratch_3
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

Read previous values:
Target:     spfs-OST0001
Index:      1
Lustre FS:  spfs
Mount type: ldiskfs
Flags:      0x2
            (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.0.0.1@o2ib,192.168.0.1@tcp

Permanent disk data:
Target:     spfs-OST0001
Index:      1
Lustre FS:  spfs
Mount type: ldiskfs
Flags:      0x2
            (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.0.0.1@o2ib,192.168.0.1@tcp

exiting before disk write.

--
Lukáš Hejtmánek
Hi,

I see you get errno -16:

--
-16 = EBUSY. This means the client reconnected to a server that was still working on a different request from the same client. Once the old RPC from this client finishes, the client will be reconnected.
--

Did you cleanly stop Lustre? The procedure I use is:
1) On the client:
umount /mnt/<lustre_mountpoint>
lustre_rmmod

2) On the OSS:
umount /mnt/<ost_mountpoint>
lustre_rmmod

3) On the MDS:
umount /mnt/<mdt_mountpoint>
lustre_rmmod
Make sure that your Lustre file system isn't in the recovery state. Please run the following commands on the OSS and the MDS:

cat /proc/fs/lustre/obdfilter/*/recovery_status
cat /proc/fs/lustre/mds/*/recovery_status

If you see COMPLETE or INACTIVE there, Lustre isn't in recovery mode.
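Since your OSS and MDS are the same node, a quick way to check all targets at once might be the one-liner below (a sketch assuming the 1.6-style /proc paths used above):

grep -H "status:" /proc/fs/lustre/obdfilter/*/recovery_status /proc/fs/lustre/mds/*/recovery_status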
Could you please also paste the output of this command run on the Lustre client:

lctl list_nids

and of this command run on the server:

lctl ping <client_nid>
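For comparison, a rough sketch of what a healthy tcp-only client should produce; the client address 192.168.0.50 is an assumption, substitute the real one:

# on the client
lctl list_nids
192.168.0.50@tcp

# on the server
lctl ping 192.168.0.50@tcp
12345-0@lo
12345-192.168.0.50@tcp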
Cheers,
Wojciech
Hello,

> Did you cleanly stop Lustre? The procedure I use is:

I had not restarted the TCP client. After restarting it, it is OK now.

Thanks a lot!

--
Lukáš Hejtmánek