>> If Server1 (running the OST/MDS) crashes, can I read the MDS/OST file
>> system from Server2 (with Server1 completely dead or powered off)?
>>
>> Can I start another MDS server on Server2 to read the OST data files?
>> I have tried to launch (on Server2) "lconf --gdb --node Server2
>> <config.xml>" but it doesn't work.

> It is a little bit more complicated than just running lconf on Server2.
> First, you need to maintain the list of active servers. You can use a
> shared XML file or an LDAP server. This list has to be updated whenever
> you change the active server.
> Then, you also need to set up an upcall script on the client side.
> This script will be invoked when the client gets communication errors.
> The upcall will look up the current active node for the failed service
> and will run lconf to complete recovery. See the archive of this mailing
> list for upcall script examples.

What does it mean "to maintain the list of active servers"?
I just use a shared XML file (config.xml) that I copy from Server1 to
Server2. I'm not using LDAP, so I specify the active node on the lconf
command line:

[Server2]# lconf --node Server2 --select ost1-test=Server2 --select mds-test=Server2 config.xml

(my Server1 MDS/OST have died...) ...but it doesn't work!!!
My updated config script:

# Create nodes
rm -f config.xml
lmc -o config.xml --add node --node Server1
lmc -m config.xml --add net --node Server1 --nid Server1 --nettype tcp
lmc -m config.xml --add node --node Server2
lmc -m config.xml --add net --node Server2 --nid Server2 --nettype tcp

# Configure MDS
lmc -m config.xml --add mds --node Server1 --mds mds-test --failover --fstype ext3 --dev /dev/sdb5
lmc -m config.xml --add mds --node Server2 --mds mds-test --failover --fstype ext3 --dev /dev/sdb5

# Configure OST
lmc -m config.xml --add ost --node Server1 --ost ost1-test --failover --fstype ext3 --dev /dev/sdc1
lmc -m config.xml --add ost --node Server2 --ost ost1-test --failover --fstype ext3 --dev /dev/sdc1

# Configure client
lmc -m config.xml --add mtpt --node Server1 --path /mnt/lustre --mds mds-test --ost ost1-test
lmc -m config.xml --add mtpt --node Server2 --path /mnt/lustre --mds mds-test --ost ost1-test

The problem is: how can I set up an upcall script on the client side?
Must I add --lustre_upcall in the "Configure client" part of the script?
In the archive there is a simple script example, but it comes without
explanations (what do --tgt_uuid or --conn_uuid mean?)

Regards,
Alessandro
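To make the question concrete, here is a minimal sketch of a client-side upcall script. It is a hypothetical illustration, not the official script: the meaning attributed to the positional arguments (event type, then the UUIDs passed on to lconf's --tgt_uuid, --client_uuid and --conn_uuid flags) is inferred from the example script circulating in this thread, and the hard-coded --select value assumes the thread's Server2 failover scenario.

```shell
#!/bin/bash
# Hypothetical upcall sketch. Lustre invokes the upcall with an event type
# followed by event-specific arguments; for FAILED_IMPORT the example
# script in this thread forwards them as:
#   $1 = tgt_uuid    - UUID of the failed target service (MDS or OST)
#   $2 = client_uuid - UUID of the local client device for that target
#   $3 = conn_uuid   - UUID of the failed connection
# LCONF and SELECT are overridable so the script can be dry-run; the
# SELECT default is this thread's scenario, not a general value.
LCONF="${LCONF:-lconf}"
SELECT="${SELECT:-mds-test=Server2,ost1-test=Server2}"

handle_upcall() {
    local type="$1"; shift
    case "$type" in
        FAILED_IMPORT)
            # Pass --select here too, or the recovering client will
            # still look for the service on the dead node.
            "$LCONF" --recover \
                     --select "$SELECT" \
                     --tgt_uuid "$1" \
                     --client_uuid "$2" \
                     --conn_uuid "$3"
            ;;
    esac
}

# Dry run: substitute echo for lconf to show the command that would run.
LCONF=echo
handle_upcall FAILED_IMPORT ost1-test_UUID client_UUID conn_UUID
```

The dry-run trick (LCONF=echo) is only for inspecting the generated command; on a real client the script would be installed and passed via --lustre_upcall.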
Alessandro,

The upcall script looks familiar. ;-)

The problem is that you need to pass the --select info to the upcall
script's lconf command as well, otherwise the client will not know about
the location change of the servers. You can do this by:

a) regenerating the XML file, but this time specifying the preferred node
as the first definition for the device(s) in question, with the other
node second. The config implicitly selects the first node defined as
being associated with a device as the preferred server.

b) storing the select info in some shared fashion, e.g. a common config
file which is used to generate the --select option to lconf.

c) storing the config in LDAP and using the lactive command to update the
preferred server setting.

Fergal.

--
Fergal.McCarthy@HP.com

(The contents of this message and any attachments to it are confidential
and may be legally privileged. If you have received this message in error
you should delete it from your system immediately and advise the sender.
To any recipient of this message within HP, unless otherwise stated, you
should consider this message and attachments as "HP CONFIDENTIAL".)

-----Original Message-----
From: lustre-discuss-admin@lists.clusterfs.com
[mailto:lustre-discuss-admin@lists.clusterfs.com] On Behalf Of Alessandro
Sent: 30 March 2005 15:57
To: lustre-discuss@lists.clusterfs.com
Subject: Re: [Lustre-discuss] MDS died

Thanks, sorry for my inattention... (I've typed "lconf --help" 100
times... :-))

Now I'm wondering: given "... If you use multiple --select statements I
don't believe they are cumulative so possibly only the last set of info
is used, meaning that you may only be seeing the MDS try to start up with
no OST or vice versa...", what can I do if the MDS and OST die at the
same time (on the same machine)?
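Option (b) above can be sketched in a few lines of shell. This is an illustration only: the file path and the one-"service=node"-pair-per-line format are assumptions, not a Lustre convention, and the file would have to live on shared storage or be copied to every node whenever the active server changes.

```shell
#!/bin/bash
# Sketch of option (b): keep the active-server list in a small shared
# file, one "service=node" pair per line, and generate the --select
# argument from it at recovery time. ACTIVE_LIST is a hypothetical path.
ACTIVE_LIST="${ACTIVE_LIST:-/etc/lustre/active_servers}"

build_select() {
    # Join the service=node lines into one comma-separated value, so a
    # single --select statement covers every failed-over service (per
    # Fergal's note that multiple --select statements may not accumulate).
    paste -sd, "$ACTIVE_LIST"
}

# Example: after failing both services over to Server2 the file contains
#   mds-test=Server2
#   ost1-test=Server2
# and the client (or the upcall script) would run:
#   lconf --node Server2 --select "$(build_select)" config.xml
```

Both the manual lconf invocation and the upcall script can then read the same file, so the select info stays consistent across nodes.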
In other words, if my Server1 is powered off (where the MDS and OST
reside, plus client1), how can I access the shared storage partition from
my Server2 (which runs only client2 while Server1 is up)???

The failover method doesn't work...
A new config script only for Server2 (with MDS, OST and client) doesn't
work either...
Can you help me?

Reminder info (useful for other people, perhaps...):
Server1: MDS, OST, client1
Server2: client2

My config script:

# Create nodes
rm -f config.xml
lmc -o config.xml --add node --node Server1
lmc -m config.xml --add net --node Server1 --nid Server1 --nettype tcp
lmc -m config.xml --add node --node Server2
lmc -m config.xml --add net --node Server2 --nid Server2 --nettype tcp

# Configure MDS
lmc -m config.xml --add mds --node Server1 --mds mds-test --failover --fstype ext3 --dev /dev/sdb5
lmc -m config.xml --add mds --node Server2 --mds mds-test --failover --fstype ext3 --dev /dev/sdb5

# Configure OST
lmc -m config.xml --add ost --node Server1 --ost ost1-test --failover --fstype ext3 --dev /dev/sdc1
lmc -m config.xml --add ost --node Server2 --ost ost1-test --failover --fstype ext3 --dev /dev/sdc1

# Configure client
lmc -m config.xml --add mtpt --node Server1 --path /mnt/lustre --mds mds-test --ost ost1-test
lmc -m config.xml --add mtpt --node Server2 --path /mnt/lustre --mds mds-test --ost ost1-test

Starting the system:

Server1# lconf --node Server1 config.xml
Server2# lconf --node Server2 config.xml

What I tried (but it doesn't work):

Server1 poweroff
Server2# lconf --node Server2 --select mds-test=Server2,ost1-test=Server2 --lustre_upcall=/home/tests/lustre/upcall_script config.xml

My upcall_script (do you know this?):
#!/bin/bash
# Get the upcall type
UPCALL_TYPE="${1}"
shift
case "${UPCALL_TYPE}" in
FAILED_IMPORT)
    lconf --recover \
          --tgt_uuid ${1} \
          --client_uuid ${2} \
          --conn_uuid ${3}
    ;;
esac

Cheers,
Alessandro

----- Original Message -----
From: "Mc Carthy, Fergal" <fergal.mccarthy@hp.com>
To: "Alessandro" <alessandro_avallone@hotmail.com>;
<lustre-discuss@lists.clusterfs.com>
Sent: Tuesday, March 29, 2005 5:53 PM
Subject: RE: [Lustre-discuss] MDS died

Alessandro,

I think that you may need to specify the active server info using a
single select statement, i.e.

lconf ... --select ost1-test=Server2,mds-test=Server2 ...

This is what the lconf --help output shows:

% lconf --help | grep select
--select     service=nodeA,service2=nodeB (default=[])

If you use multiple --select statements I don't believe they are
cumulative, so possibly only the last set of info is used, meaning that
you may only be seeing the MDS try to start up with no OST or vice versa.

Fergal.

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.clusterfs.com
https://lists.clusterfs.com/mailman/listinfo/lustre-discuss
Alessandro,

I think that you may need to specify the active server info using a
single select statement, i.e.

lconf ... --select ost1-test=Server2,mds-test=Server2 ...

This is what the lconf --help output shows:

% lconf --help | grep select
--select     service=nodeA,service2=nodeB (default=[])

If you use multiple --select statements I don't believe they are
cumulative, so possibly only the last set of info is used, meaning that
you may only be seeing the MDS try to start up with no OST or vice versa.

Fergal.

--
Fergal.McCarthy@HP.com

-----Original Message-----
From: lustre-discuss-admin@lists.clusterfs.com
[mailto:lustre-discuss-admin@lists.clusterfs.com] On Behalf Of Alessandro
Sent: 29 March 2005 16:26
To: lustre-discuss@lists.clusterfs.com
Subject: Re: [Lustre-discuss] MDS died

[quoted text snipped]
> My test system is
> Server1 running OST/MDS/client1
> Server2 running client2

FYI, there is a known memory deadlock issue when running a client and an
OST on the same node. See the archive of lustre-discuss for more
information.

> If Server1 (running the OST/MDS) crashes, can I read the MDS/OST file
> system from Server2 (with Server1 completely dead or powered off)?
>
> Can I start another MDS server on Server2 to read the OST data files?
> I have tried to launch (on Server2) "lconf --gdb --node Server2
> <config.xml>" but it doesn't work.

It is a little bit more complicated than just running lconf on Server2.
First, you need to maintain the list of active servers. You can use a
shared XML file or an LDAP server. This list has to be updated whenever
you change the active server.
Then, you also need to set up an upcall script on the client side. This
script will be invoked when the client gets communication errors. The
upcall will look up the current active node for the failed service and
will run lconf to complete recovery. See the archive of this mailing list
for upcall script examples.

> # Configure MDS
> lmc -m config.xml --add mds --node Server1 --mds mds-test --failover --fstype ext3 --dev /dev/sdb5
> lmc -m config.xml --add mds --node Server2 --mds mds-test --failover --fstype ext3 --dev /dev/sdb5
>
> # Configure OST
> lmc -m config.xml --add ost --node Server1 --ost ost1-test --fstype ext3 --dev /dev/sdc1

You have to do the same thing as for the MDS config. The Lustre wiki
explains how to set up an OST/MDS failover config:

https://wiki.clusterfs.com/lustre/LustreFailover

Johann
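For readers following along, "the same thing as for the MDS config" applied to the quoted OST lines would look something like the sketch below. The device path and service names are taken from the configs posted in this thread; this is illustrative, and the wiki page above is the authoritative reference.

```shell
# Configure OST with failover: declare the same OST service on both
# nodes, each entry marked --failover, so either node can serve the
# shared /dev/sdc1 device (mirroring the two-line MDS config).
lmc -m config.xml --add ost --node Server1 --ost ost1-test --failover --fstype ext3 --dev /dev/sdc1
lmc -m config.xml --add ost --node Server2 --ost ost1-test --failover --fstype ext3 --dev /dev/sdc1
```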
Hi,

First, thanks to all who helped me understand and evaluate the Lustre
file system. I have one last question.

My test system is:
Server1 running OST/MDS/client1
Server2 running client2

The MDS and OST are on shared devices (/dev/sdb5 and /dev/sdc1).

Server1# lconf --gdb --node Server1 config.xml
Server2# lconf --gdb --node Server2 config.xml

and all is OK!

If Server1 (running the OST/MDS) crashes, can I read the MDS/OST file
system from Server2 (with Server1 completely dead or powered off)?

Can I start another MDS server on Server2 to read the OST data files?
I have tried to launch (on Server2) "lconf --gdb --node Server2
<config.xml>" but it doesn't work.

Useful info, my config script:

# Create nodes
rm -f config.xml
lmc -o config.xml --add node --node Server1
lmc -m config.xml --add net --node Server1 --nid Server1 --nettype tcp
lmc -m config.xml --add node --node Server2
lmc -m config.xml --add net --node Server2 --nid Server2 --nettype tcp

# Configure MDS
lmc -m config.xml --add mds --node Server1 --mds mds-test --failover --fstype ext3 --dev /dev/sdb5
lmc -m config.xml --add mds --node Server2 --mds mds-test --failover --fstype ext3 --dev /dev/sdb5

# Configure OST
lmc -m config.xml --add ost --node Server1 --ost ost1-test --fstype ext3 --dev /dev/sdc1

# Configure client
lmc -m config.xml --add mtpt --node Server1 --path /mnt/lustre --mds mds-test --ost ost1-test
lmc -m config.xml --add mtpt --node Server2 --path /mnt/lustre --mds mds-test --ost ost1-test