Felix, Evan J
2006-May-19 07:36 UTC
[Lustre-discuss] lustre-1.4.6 on RHEL4 => lconf doesn't return
Can you tell us whether anything shows up in your syslog or in dmesg that might give us more information? It almost seems as though your device is timing out or not responding. A little more information about what the devices are would also be helpful.

Evan
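For reference, a rough way to collect that information while lconf is hanging (a sketch only, assuming the stock RHEL4 syslog setup where kernel messages land in /var/log/messages):

# most recent kernel ring buffer entries
dmesg | tail -50

# any Lustre-related lines already written to syslog
grep -i lustre /var/log/messages | tail -50

# follow the log live from a second terminal while lconf runs
tail -f /var/log/messages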
Manqing Liu
2006-May-19 07:36 UTC
[Lustre-discuss] lustre-1.4.6 on RHEL4 => lconf doesn't return
When I run "lconf --reformat config.xml", it hangs at the very end of the setup. Any suggestions? Thanks!

[root@rac2 Lustre]# uname -a
Linux rac2 2.6.9-22.0.2.EL_lustre.1.4.6smp #1 SMP Sun Feb 19 01:03:36 EST 2006 i686 i686 i386 GNU/Linux

[root@rac2 Lustre]# rpm -aq | grep lust
lustre-modules-1.4.6-2.6.9_22.0.2.EL_lustre.1.4.6smp
lvm2-cluster-2.01.14-1.0.RHEL4
kernel-smp-2.6.9-22.0.2.EL_lustre.1.4.6
lustre-1.4.6-2.6.9_22.0.2.EL_lustre.1.4.6smp
system-config-cluster-1.0.16-1.0
lustre-debuginfo-1.4.6-2.6.9_22.0.2.EL_lustre.1.4.6smp

== Simple config.sh
#!/bin/bash
rm -f config.xml
lmc -o config.xml --add net --node rac2 --nid rac2 --nettype tcp
lmc -m config.xml --add mds --node rac2 --mds mds1 --fstype ext3 --dev /dev/ipsan/sda
lmc -m config.xml --add lov --lov lov1 --mds mds1 --stripe_sz 1048576 --stripe_cnt 1 --stripe_pattern 0
lmc -m config.xml --add ost --node rac2 --lov lov1 --ost ost1 --fstype ext3 --dev /dev/ipsan/sdc
lmc -m config.xml --add ost --node rac2 --lov lov1 --ost ost2 --fstype ext3 --dev /dev/ipsan/sdd
lmc -m config.xml --add mtpt --node rac2 --path /mnt/lustre --mds mds1 --lov lov1

[root@rac2 Lustre]# sh config.sh
[root@rac2 Lustre]# lconf --verbose --reformat config.xml
....
record> End recording log rac2 on mds1
+ /usr/sbin/lctl ignore_errors cfg_device $mds1 cleanup detach quit
+ losetup /dev/loop0
+ losetup /dev/loop1
+ losetup /dev/loop2
+ losetup /dev/loop3
+ losetup /dev/loop4
+ losetup /dev/loop5
+ losetup /dev/loop6
+ losetup /dev/loop7
changing mtime of LOGS to 1141241563
+ mktemp /tmp/lustre-cmd.XXXXXXXX
+ debugfs -w -R "mi /LOGS" </tmp/lustre-cmd.xHj12969 /dev/ipsan/sda
MDSDEV: mds1 mds1_UUID /dev/ipsan/sda ldiskfs 0 no
+ /usr/sbin/lctl attach mdt MDT MDT_UUID quit
+ /usr/sbin/lctl cfg_device MDT setup quit
+ dumpe2fs -f -h /dev/ipsan/sda
no external journal found for /dev/ipsan/sda
MDS mount options: errors=remount-ro
+ /usr/sbin/lctl attach mds mds1 mds1_UUID quit
+ /usr/sbin/lctl cfg_device mds1 setup /dev/ipsan/sda ldiskfs mds1 errors=remount-ro quit

It hangs here without returning.
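One thing worth noting about this configuration: config.sh passes the bare hostname rac2 as the --nid, so the NID Lustre ends up with is whatever address rac2 resolves to on this node. A minimal check of that resolution (plain system tools only; this is a sketch, not anything lconf itself runs):

# which address does the node name resolve to locally?
getent hosts rac2

# that address should belong to the interface Lustre is meant to use;
# compare it against the interfaces listed by:
ifconfig -a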
wddi_1976
2006-May-19 07:36 UTC
[Lustre-discuss] lustre-1.4.6 on RHEL4 => lconf doesn't return
Could you tell us what is in your /etc/hosts? The problem may be there.
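If it helps, the specific thing to look for in that file is which address the node name is mapped to. With the interfaces from the ifconfig listing in this thread, the two candidates would be (a sketch, not the actual contents of the file):

# /etc/hosts entry pointing the node name at the public interface (eth0):
#   172.30.33.242   rac2
#
# versus an entry pointing it at the storage/bond0 network:
#   192.168.1.242   rac2
#
# quick check of what the file currently says for the node name:
grep -w rac2 /etc/hosts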
Manqing Liu
2006-May-19 07:36 UTC
[Lustre-discuss] lustre-1.4.6 on RHEL4 => lconf doesn't return
Thanks; dmesg shows some connection errors. It could be a problem with my network setup.

Right now I am running the MDS, OST, and client on the same host, which has four interfaces: bond0 (eth2 and eth3) is used to connect to the iSCSI target for shared storage, and eth0 is a public interface. I am not using eth1 yet.

[root@rac2 ~]# ifconfig
bond0   Link encap:Ethernet  HWaddr 00:0E:0C:37:19:4A
        inet addr:192.168.1.242  Bcast:192.168.1.255  Mask:255.255.255.0

eth0    Link encap:Ethernet  HWaddr 00:0D:60:D5:DC:D2
        inet addr:172.30.33.242  Bcast:172.30.33.255  Mask:255.255.255.0

eth1    Link encap:Ethernet  HWaddr 00:0D:60:D5:DC:D3
        inet addr:192.168.2.242  Bcast:192.168.2.255  Mask:255.255.255.0

eth2    Link encap:Ethernet  HWaddr 00:0E:0C:37:19:4A
        inet6 addr: fe80::20e:cff:fe37:194a/64 Scope:Link

eth3    Link encap:Ethernet  HWaddr 00:0E:0C:37:19:4A
        inet6 addr: fe80::20e:cff:fe37:194a/64 Scope:Link

lo      Link encap:Local Loopback
        inet addr:127.0.0.1  Mask:255.0.0.0

....
LDISKFS-fs: mounted filesystem with ordered data mode.
Lustre: 5358:0:(mds_fs.c:239:mds_init_server_data()) mds1: initializing new last_rcvd
Lustre: MDT mds1 now serving /dev/ipsan/sda (dd4c1750-6e3a-4342-b4e1-a94174c2fd8d) with recovery enabled
Lustre: MDT mds1 has stopped.
loop: loaded (max 8 devices)
kjournald starting. Commit interval 5 seconds
LDISKFS FS on sda, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
LustreError: Refusing connection from 172.30.33.242 for 172.30.33.242@tcp: No matching NI
LustreError: 3305:0:(socklnd_cb.c:1476:ksocknal_recv_hello()) Error -104 reading HELLO from 172.30.33.242
LustreError: Connection to 172.30.33.242@tcp at host 172.30.33.242 on port 988 was reset: is it running a compatible version of Lustre and is 172.30.33.242@tcp one of its NIDs?
LustreError: 3305:0:(socklnd_cb.c:396:ksocknal_txlist_done()) Deleting packet type 1 len 240 192.168.1.242@tcp->172.30.33.242@tcp
LustreError: 3305:0:(events.c:54:request_out_callback()) @@@ type 4, status -5 req@f7ea8e00 x1/t0 o8->ost1_UUID@rac2_UUID:6 lens 240/272 ref 2 fl Rpc:/0/0 rc 0/0
LustreError: 5614:0:(client.c:951:ptlrpc_expire_one_request()) @@@ timeout (sent at 1141319106, 0s ago) req@f7ea8e00 x1/t0 o8->ost1_UUID@rac2_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: Refusing connection from 172.30.33.242 for 172.30.33.242@tcp: No matching NI
LustreError: 3306:0:(socklnd_cb.c:1476:ksocknal_recv_hello()) Error -104 reading HELLO from 172.30.33.242
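The pattern in those LustreError lines is worth spelling out: the "Deleting packet ... 192.168.1.242@tcp->172.30.33.242@tcp" line shows that LNET came up locally with the bond0 address as its NID, while connections are being aimed at the eth0 address, and "No matching NI" is the node refusing a connection addressed to a NID it does not own. A hedged way to confirm the mismatch (lctl list_nids is assumed to be available in this 1.4.6 lctl; if it is not, the /etc/hosts comparison alone still shows it):

# the NID(s) LNET actually configured on this node
# (assumption: this lctl build supports list_nids)
lctl list_nids

# the address the config/NID is derived from, for comparison
grep -w rac2 /etc/hosts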
Manqing Liu
2006-May-19 07:36 UTC
[Lustre-discuss] lustre-1.4.6 on RHEL4 => lconf doesn't return
Yes, I finally got it working. I changed the /etc/hosts entry from:

172.30.33.242   rac2    ==> this is the public network

to:

192.168.1.242   rac2    ==> this is the iSCSI network

Thanks for helping me out.
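For anyone hitting the same thing later, a quick way to confirm the change took effect before redoing the setup (a sketch; the lconf command is the same one used earlier in the thread):

# rac2 should now resolve to the address LNET is actually using
getent hosts rac2        # expect: 192.168.1.242   rac2

# then rerun the setup
lconf --reformat config.xml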
Mc Carthy, Fergal
2006-May-19 07:36 UTC
[Lustre-discuss] lustre-1.4.6 on RHEL4 => lconf doesn't return
Have you set up your /etc/modprobe.conf entries correctly? That is, do you have a line saying

options lnet networks=tcp0(eth0)

since it appears from the dmesg output that you are using the eth0 address? And are you using that address in your XML config for rac2?

Fergal.

--
Fergal.McCarthy@HP.com
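As a sketch of what that looks like in practice (the networks= line is exactly the one suggested above; whether eth0 or bond0 is the right interface depends on which network Lustre should actually use, and module options only take effect the next time the lnet module is loaded):

# /etc/modprobe.conf
# bind LNET to a specific interface instead of letting it pick one;
# substitute bond0 for eth0 if the storage network is the one Lustre
# should ride on
options lnet networks=tcp0(eth0)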
Manqing Liu
2006-May-19 07:36 UTC
[Lustre-discuss] lustre-1.4.6 on RHEL4 => lconf doesn't return
When I run "lconf --reformat config.xml", it hangs at the very end of the setup. Any suggestions? Thanks!

The setup and the full lconf --verbose output are the same as in my earlier message above. The last line printed before the hang is:

+ /usr/sbin/lctl cfg_device mds1 setup /dev/ipsan/sda ldiskfs mds1 errors=remount-ro quit

It hangs here without returning. Interrupting it with Ctrl-C gives the following trace:

Traceback (most recent call last):
  File "/usr/sbin/lconf", line 2827, in ?
    main()
  File "/usr/sbin/lconf", line 2820, in main
    doHost(lustreDB, node_list)
  File "/usr/sbin/lconf", line 2264, in doHost
    for_each_profile(node_db, prof_list, doSetup)
  File "/usr/sbin/lconf", line 2044, in for_each_profile
    operation(services)
  File "/usr/sbin/lconf", line 2064, in doSetup
    n.prepare()
  File "/usr/sbin/lconf", line 1321, in prepare
    setup ="%s %s %s %s %s" %(blkdev, self.fstype, self.name,
  File "/usr/sbin/lconf", line 397, in newdev
    self.setup(name, setup)
  File "/usr/sbin/lconf", line 376, in setup
    self.run(cmds)
  File "/usr/sbin/lconf", line 278, in run
    ready = select.select([outfd,errfd],[],[]) # Wait for input
KeyboardInterrupt
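That traceback shows lconf blocked in select(), waiting for output from the lctl child it spawned for the device setup step. A rough way to confirm what it is stuck on while the hang is in progress (ps and grep are standard; strace may need to be installed separately, and <pid> is a placeholder to fill in):

# find the lctl process lconf is waiting on
ps axf | grep -A2 '[l]conf'

# see what that lctl process is doing (replace <pid> with the real pid)
strace -p <pid>

# and check the kernel log at the same time
dmesg | tail -20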