Do you have entries in /etc/hosts which map the real host names (host1, host3) to 127.0.0.1?

host1 and host3 need to resolve to the correct IP addresses for all concerned, or Portals gets confused.

Hope that helps--

-Phil

On Wed, 2004-08-11 at 15:56, mwah@ncsu.edu wrote:

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.clusterfs.com
https://lists.clusterfs.com/mailman/listinfo/lustre-discuss
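Phil's diagnosis can be checked mechanically on each node. The following is a minimal shell sketch, not a Lustre tool. The function names are hypothetical. It assumes, as the 0x7f000001 / 127.0.0.1 pairing in the log suggests, that a TCP NID is the node's IPv4 address stored as a 32-bit integer. It only reads a hosts-format file, so names served by DNS or NIS are not covered.

```shell
#!/bin/sh
# Hypothetical helpers for spotting the /etc/hosts problem described above.

# Decode a TCP NID as printed in the log (ASSUMPTION: NID = IPv4 address
# as a 32-bit integer, e.g. 0x7f000001 -> 127.0.0.1).
nid_to_ip() {
    n=$(( $1 ))
    printf '%d.%d.%d.%d\n' $(( (n >> 24) & 255 )) $(( (n >> 16) & 255 )) \
        $(( (n >> 8) & 255 )) $(( n & 255 ))
}

# Print every host name that a hosts-format file maps to a 127.x address,
# stopping at a trailing "#" comment on each line.
loopback_names() {
    awk '$1 ~ /^127\./ {
             for (i = 2; i <= NF && $i !~ /^#/; i++) print $i
         }' "${1:-/etc/hosts}"
}

# Return non-zero if a cluster host name is mapped to loopback.
check_host() {
    name=$1
    hosts=${2:-/etc/hosts}
    if loopback_names "$hosts" | grep -qxF "$name"; then
        echo "$name: mapped to loopback in $hosts" >&2
        return 1
    fi
    echo "$name: ok"
}
```

Run `check_host host1 && check_host host3` on every node. Under the NID assumption, the log above reports NID 0x7f000001 (127.0.0.1) where 0x80de030b (`nid_to_ip 0x80de030b` gives 128.222.3.11) was expected, which is consistent with host1 resolving to loopback somewhere. `getent hosts host1` shows what the resolver actually returns, including DNS.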
Hi,

I'm using Lustre 1.0.4 with the pre-patched kernel on both nodes. The following is what I used to create the xml file, where host1 serves as the MDS and OST server while host3 is the client.

    #!/bin/sh

    # Create nodes
    lmc -o local.xml --add node --node host1 --nid host1 --nettype tcp
    lmc -m local.xml --add net --node host1 --nid host1 --nettype tcp
    lmc -m local.xml --add net --node host3 --nid host3 --nettype tcp

    # Configure MDS
    lmc -m local.xml --format --add mds --node host1 --mds mds1 --fstype ext3 --dev /home/lustre/mds1 --size 10485760

    # Configure LOVs and OSTs
    lmc -m local.xml --add lov --lov lov1 --mds mds1 --stripe_sz 65536 --stripe_cnt 0 --stripe_pattern 0
    lmc -m local.xml --add lov --lov lov2 --mds mds1 --stripe_sz 65536 --stripe_cnt 0 --stripe_pattern 0
    lmc -m local.xml --add ost --node host1 --lov lov1 --ost ost1 --fstype ext3 --dev /home/lustre/ost1 --size 10485760
    lmc -m local.xml --add ost --node host1 --lov lov1 --ost ost2 --fstype ext3 --dev /home/lustre/ost2 --size 10485760
    lmc -m local.xml --add ost --node host1 --lov lov2 --ost ost3 --fstype ext3 --dev /home/lustre/ost3 --size 10485760

    # Configure client
    lmc -m local.xml --add net --node client --nid '*' --nettype tcp
    lmc -m local.xml --add mtpt --node client --path /mnt/lustre --mds mds1 --lov lov1

Then I ran the following on host1:

    lconf --maxlevel 40
    lconf --reformat --gdb --node host1 local.xml
    lconf --minlevel 50

And ran the following on host3:

    lconf --node client local.xml

I see the following error messages:

host3:

    LustreError: 2116:(socknal_cb.c:2119:ksocknal_hello()) Connected to nid 0x7f000001 0:127.0.0.1, but expecting 0x80de030b 0:127.0.0.1
    LustreError: 2116:(socknal_cb.c:2436:ksocknal_autoconnect()) Deleting packet type 1 len 168 (0x7f000001 0:127.0.0.1->0x80de030b 0:127.0.0.1)
    LustreError: 2253:(client.c:812:ptlrpc_expire_one_request()) @@@ timeout req@ce973c00 x1/t0 o8->ost1_UUID@NID_radish1_UUID:6 lens 168/64 ref 1 fl RPC:/0/0 rc 0/0
    LustreError: 2253:(lov_obd.c:147:lov_connect()) Target ost1_UUID connect error -110
    LustreError: 2253:(mdc_request.c:838:mdc_init_ea_size()) MDC failed connect to LOV lov1 (-110)
    LustreError: 2115:(socknal_cb.c:2119:ksocknal_hello()) Connected to nid 0x7f000001 0:127.0.0.1, but expecting 0x80de030b 0:127.0.0.1
    LustreError: 2115:(socknal_cb.c:2436:ksocknal_autoconnect()) Deleting packet type 1 len 168 (0x7f000001 0:127.0.0.1->0x80de030b 0:127.0.0.1)
    LustreError: 2253:(client.c:812:ptlrpc_expire_one_request()) @@@ timeout req@ce973600 x2/t0 o38->mds1_UUID@NID_radish1_UUID:12 lens 168/64 ref 1 fl RPC:/0/0 rc 0/0
    LustreError: 2253:(llite_lib.c:115:lustre_common_fill_super()) cannot connect to MDC_host3_mds1_MNT_client: rc = -110

host1:

    LustreError: 5421:(socknal.c:842:ksocknal_create_conn()) Closed 6 stale conns to nid 0x7f000001 ip <ip address>
    Lustre: 5388:(socknal_cb.c:1544:ksocknal_process_receive()) [f4fa8800] EOF from 0x7f000001 ip <ip address>:32771
    Lustre: 5388:(socknal_cb.c:1544:ksocknal_process_receive()) [e1c7e800] EOF from 0x7f000001 ip <ip address>:32772

Any ideas?

Thanks,
Mark
Yes, that helps! I'm getting the hang of operating this. It is very flexible, and configuring it is quite easy using lmc to produce the xml file.

Some questions:

1) It is easy enough to add OSTs to the configuration, but what if we lost the original xml file?

2) Which utility can we use to determine the current configuration? For example, which host is the OST, the MDS, or a client?

3) When we add OSTs to a LOV that is being mounted by a client, what is the proper way to resize? I'm currently doing lconf cleanup on the client and then lconf --node client again. Is that right?

Thanks!
-Mark
On Aug 13, 2004 12:59 -0400, mwah@ncsu.edu wrote:
> 1) It is easy enough to add OSTs to the configuration, but what if we lost
> the original xml file?

Keep a backup. The most important part isn't the .xml file but rather the lmc shell script that created it. If necessary, that script is easy to recreate as long as you know the names assigned to the OSTs and MDSes (the client name is irrelevant; normally we just call it "client" in any case).

> 2) Which utility can we use to determine the current configuration? For
> example, which host is the OST, the MDS, or a client?

It is possible to read the xml file with a bit of effort, though the lmc shell script is of course easier. You can also find the names easily from a running system in /proc/fs/lustre; it takes more effort if Lustre isn't running.

> 3) When we add OSTs to a LOV that is being mounted by a client, what is the
> proper way to resize? I'm currently doing lconf cleanup on the client and
> then lconf --node client again. Is that right?

Correct. Note that you also need to run "lconf --write-conf <new xml>" on the unmounted MDS device in order to add the OST to the client configuration, assuming you are using the "zero conf" method of configuring clients. Hmm, that might only apply to 1.2 and later; I'm not sure what version you are running.

We will support dynamic OST addition and removal in a later version of Lustre; it is already in the late development/testing stages.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://members.shaw.ca/adilger/   http://members.shaw.ca/golinux/
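The three answers above can be sketched as a dry-run shell script that only prints commands. This is an illustration under stated assumptions, not a supported procedure. The function names are hypothetical. Names and sizes (local.xml, host1, lov1, /home/lustre/<ost>, 10485760) are copied from the lmc script earlier in the thread. The /proc/fs/lustre/<type>/<device> layout is a guess. The `--write-conf` spelling is taken verbatim from the reply, and `--cleanup` stands in for "lconf cleanup"; check both against `lconf --help` for your version.

```shell
#!/bin/sh
# Dry-run sketch: every function only prints; nothing touches Lustre.

# 1) Re-emit the OST lines of the lmc script from the OST names alone.
#    Usage: regen_ost_cmds NODE LOV OST...
regen_ost_cmds() {
    node=$1
    lov=$2
    shift 2
    for ost in "$@"; do
        echo "lmc -m local.xml --add ost --node $node --lov $lov --ost $ost" \
             "--fstype ext3 --dev /home/lustre/$ost --size 10485760"
    done
}

# 2) List "<type> <name>" pairs from a running node, ASSUMING each device
#    appears as a subdirectory of /proc/fs/lustre/<type>/.
list_lustre_names() {
    root=${1:-/proc/fs/lustre}
    for typedir in "$root"/*/; do
        [ -d "$typedir" ] || continue
        for dev in "$typedir"*/; do
            [ -d "$dev" ] || continue
            echo "$(basename "$typedir") $(basename "$dev")"
        done
    done
}

# 3) Print one plausible order of steps after adding OSTs, following the
#    reply above (the MDS step is for "zero conf" clients, possibly 1.2+).
print_add_ost_steps() {
    xml=$1
    echo "# on the client"
    echo "lconf --cleanup --node client $xml"
    echo "# on the MDS node, with the MDS device unmounted"
    echo "lconf --write-conf $xml"
    echo "# on the client again"
    echo "lconf --node client $xml"
}
```

For example, `regen_ost_cmds host1 lov1 ost1 ost2` reproduces the first two `--add ost` lines of the original script, which is the point of keeping the names rather than the generated xml.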