Jon Yeargers
2012-Sep-17 14:57 UTC
[Lustre-discuss] Client hangs on 'simple' lustre setup
Issue: I'm trying to assess the (possible) use of Lustre for our group. To this end I've been trying to create a simple system to explore the nuances. I can't seem to get past the 'llmount.sh' test with any degree of success.

What I've done: Each system (throwaway PCs with 70 GB HD, 2 GB RAM) is formatted with CentOS 6.2. I then update everything and install the Lustre kernel from downloads.whamcloud.com and add on the various (appropriate) lustre and e2fs RPM files. Systems are rebooted and tested with 'llmount.sh' (and then cleared with 'llmountcleanup.sh'). All is well to this point.

First I create an MDS/MDT system via:

/usr/sbin/mkfs.lustre --mgs --mdt --fsname=lustre --device-size=2000000 --param sys.timeout=20 --mountfsoptions=errors=remount-ro,user_xattr,acl --param lov.stripesize=1048576 --param lov.stripecount=0 --param mdt.identity_upcall=/usr/sbin/l_getidentity --backfstype ldiskfs --reformat /tmp/lustre-mdt1

and then

mkdir -p /mnt/mds1
mount -t lustre -o loop,user_xattr,acl /tmp/lustre-mdt1 /mnt/mds1

Next I take 3 systems and create a 2 GB loop mount via:

/usr/sbin/mkfs.lustre --ost --fsname=lustre --device-size=2000000 --param sys.timeout=20 --mgsnode=lustre_MDS0@tcp --backfstype ldiskfs --reformat /tmp/lustre-ost1

mkdir -p /mnt/ost1
mount -t lustre -o loop /tmp/lustre-ost1 /mnt/ost1

The logs on the MDT box show the OSS boxes connecting up. All appears OK.

Last I create a client and attach to the MDT box:

mkdir -p /mnt/lustre
mount -t lustre -o user_xattr,acl,flock luster_MDS0@tcp:/lustre /mnt/lustre

Again, the log on the MDT box shows the client connection. Appears to be successful.

Here's where the issues (appear to) start. If I do a 'df -h' on the client it hangs after showing the system drives. If I attempt to create files (via 'dd') on the lustre mount the session hangs and the job can't be killed. Rebooting the client is the only solution.

I can create and use a client on the MDS/MGS box. Doing so from any other machine will hang.

From the MDS box:

[root@lustre_mds0 lustre]# lctl dl
  0 UP mgs MGS MGS 13
  1 UP mgc MGC10.127.24.42@tcp 7923c008-a0de-1c87-f21a-4a5ab48abb96 5
  2 UP lov lustre-MDT0000-mdtlov lustre-MDT0000-mdtlov_UUID 4
  3 UP mdt lustre-MDT0000 lustre-MDT0000_UUID 7
  4 UP mds mdd_obd-lustre-MDT0000 mdd_obd_uuid-lustre-MDT0000 3
  5 UP osc lustre-OST0000-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 5
  6 UP osc lustre-OST0001-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 5
  7 UP lov lustre-clilov-ffff8800631c8000 b6b66579-1f44-90e5-ae63-e778d4ed6ac5 4
  8 UP lmv lustre-clilmv-ffff8800631c8000 b6b66579-1f44-90e5-ae63-e778d4ed6ac5 4
  9 UP mdc lustre-MDT0000-mdc-ffff8800631c8000 b6b66579-1f44-90e5-ae63-e778d4ed6ac5 5
 10 UP osc lustre-OST0000-osc-ffff8800631c8000 b6b66579-1f44-90e5-ae63-e778d4ed6ac5 5
 11 UP osc lustre-OST0001-osc-ffff8800631c8000 b6b66579-1f44-90e5-ae63-e778d4ed6ac5 5
 12 UP osc lustre-OST0002-osc-ffff8800631c8000 b6b66579-1f44-90e5-ae63-e778d4ed6ac5 5
 13 UP osc lustre-OST0002-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 5

[root@lustre_mds0 lustre]# lfs df -h
UUID                  bytes    Used  Available  Use%  Mounted on
lustre-MDT0000_UUID    1.4G   83.9M       1.3G    6%  /mnt/lustre[MDT:0]
lustre-OST0000_UUID    1.9G    1.1G     716.5M   61%  /mnt/lustre[OST:0]
lustre-OST0001_UUID    1.9G    1.1G     728.5M   60%  /mnt/lustre[OST:1]
lustre-OST0002_UUID    1.9G    1.1G     728.5M   60%  /mnt/lustre[OST:2]
filesystem summary:    5.6G    3.2G       2.1G   60%  /mnt/lustre

All appears normal.
Doing this from another (identical) client:

[root@lfstest0 lustre]# lctl dl
  0 UP mgc MGC10.127.24.42@tcp 272a8405-8512-e9de-f532-feb5b7d6f9b1 5
  1 UP lov lustre-clilov-ffff880070eee400 0cb7fd2e-ade0-dab3-c4b9-6b7956ef9720 4
  2 UP lmv lustre-clilmv-ffff880070eee400 0cb7fd2e-ade0-dab3-c4b9-6b7956ef9720 4
  3 UP mdc lustre-MDT0000-mdc-ffff880070eee400 0cb7fd2e-ade0-dab3-c4b9-6b7956ef9720 5
  4 UP osc lustre-OST0000-osc-ffff880070eee400 0cb7fd2e-ade0-dab3-c4b9-6b7956ef9720 5
  5 UP osc lustre-OST0001-osc-ffff880070eee400 0cb7fd2e-ade0-dab3-c4b9-6b7956ef9720 5
  6 UP osc lustre-OST0002-osc-ffff880070eee400 0cb7fd2e-ade0-dab3-c4b9-6b7956ef9720 5

[root@lfstest0 lustre]# lfs df
UUID                 1K-blocks     Used  Available  Use%  Mounted on
lustre-MDT0000_UUID    1499596    85888    1313708    6%  /mnt/lustre[MDT:0]
OST0000             : inactive device
lustre-OST0001_UUID    1968528  1122468     745996   60%  /mnt/lustre[OST:1]
OST0002             : inactive device
filesystem summary:    1968528  1122468     745996   60%  /mnt/luster

Doing a 'dd' or 'touch' or even 'df' from this machine will hang it.

EDIT: each system has all other systems defined in /etc/hosts and entries in iptables to provide access.

All systems have identical setup:

[root@lfstest0 lustre]# rpm -qa | grep lustre
lustre-ldiskfs-3.3.0-2.6.32_279.2.1.el6_lustre.gc46c389.x86_64.x86_64
lustre-2.1.3-2.6.32_279.2.1.el6_lustre.gc46c389.x86_64.x86_64
kernel-2.6.32-279.2.1.el6_lustre.gc46c389.x86_64
lustre-modules-2.1.3-2.6.32_279.2.1.el6_lustre.gc46c389.x86_64.x86_64
lustre-tests-2.1.3-2.6.32_279.2.1.el6_lustre.gc46c389.x86_64.x86_64

[root@lfstest0 lustre]# uname -a
Linux lfstest0 2.6.32-279.2.1.el6_lustre.gc46c389.x86_64 #1 SMP Mon Aug 13 11:00:10 PDT 2012 x86_64 x86_64 x86_64 GNU/Linux

[root@lfstest0 lustre]# rpm -qa | grep e2fs
e2fsprogs-libs-1.41.90.wc2-7.el6.x86_64
e2fsprogs-1.41.90.wc2-7.el6.x86_64

SO: I'm clearly making several mistakes. Any pointers as to where to start correcting them?
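For reference, a minimal sanity check of LNET reachability in a setup like this is to ping every server NID from the hanging client. The MGS NID below is taken from the lctl dl output above; the OSS addresses are placeholders to fill in:

  # on the client: show its own NID(s)
  lctl list_nids

  # ping the MGS/MDS and each OSS by NID; a hang or error here points at
  # networking (e.g. port 988/tcp blocked) rather than at Lustre itself
  lctl ping 10.127.24.42@tcp
  lctl ping <oss0_ip>@tcp
  lctl ping <oss1_ip>@tcp
  lctl ping <oss2_ip>@tcp

An OSS that cannot be reached from the client would also be consistent with the "inactive device" entries in the lfs df output above.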
Colin Faber
2012-Sep-17 15:05 UTC
[Lustre-discuss] Client hangs on 'simple' lustre setup
Hi Jon,

If you could provide the logging from your clients and mds (as well as possibly oss's) that'll help in determining the problem.

-cf
Jon Yeargers
2012-Sep-17 15:33 UTC
[Lustre-discuss] Client hangs on 'simple' lustre setup
What's the preferred method for posting large files to this list?

Does Lustre have specific log files that I should look for, or just anything in /var/log/messages that seems appropriate?
Colin Faber
2012-Sep-17 15:37 UTC
[Lustre-discuss] Client hangs on 'simple' lustre setup
Hi,

Ideally just post the logging directly around the event and not the entire syslog =)

/var/log/messages right around the time you try and mount / ls, and dmesg | tail -n 75 or so. If it's larger, pastebin is a great resource to reference so you're not polluting people's inboxes with large logs.

-cf
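As a rough sketch of what that could look like, run on the client and the MDS right after reproducing the hang (the output file name, grep filter and line counts below are only illustrative defaults, not anything specific to this cluster):

  # kernel ring buffer -- Lustre/LNet messages land here
  dmesg | tail -n 75

  # syslog entries around the mount / ls attempt
  grep -i lustre /var/log/messages | tail -n 100

  # optionally, dump the Lustre kernel debug log covering the same window
  lctl dk /tmp/lustre-debug.txt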
Jon Yeargers
2012-Sep-17 15:51 UTC
[Lustre-discuss] Client hangs on 'simple' lustre setup
This is from the MDS machine: I have rebooted the client, mounted the /lustre partition and entered "ls /mnt/lustre". The client is hung and remains so. (NOTE: much of this log is from earlier today - you can see previous attempts to make this process work)

Sep 17 06:16:52 lustre_mds0 kernel: Lustre: Lustre: Build Version: RC3--PRISTINE-2.6.32-279.2.1.el6_lustre.gc46c389.x86_64
Sep 17 06:16:53 lustre_mds0 kernel: Lustre: Added LNI 10.127.24.42@tcp [8/256/0/180]
Sep 17 06:16:53 lustre_mds0 kernel: Lustre: Accept secure, port 988
Sep 17 06:16:54 lustre_mds0 kernel: Lustre: Lustre OSC module (ffffffffa07f2b20).
Sep 17 06:16:54 lustre_mds0 kernel: Lustre: Lustre LOV module (ffffffffa08714c0).
Sep 17 06:16:54 lustre_mds0 kernel: Lustre: Lustre client module (ffffffffa095e260).
Sep 17 06:17:17 lustre_mds0 kernel: LDISKFS-fs warning (device loop0): ldiskfs_fill_super: extents feature not enabled on this filesystem, use tune2fs.
Sep 17 06:17:17 lustre_mds0 kernel: LDISKFS-fs (loop0): warning: maximal mount count reached, running e2fsck is recommended
Sep 17 06:17:17 lustre_mds0 kernel: LDISKFS-fs (loop0): mounted filesystem with ordered data mode. Opts:
Sep 17 06:18:12 lustre_mds0 kernel: LDISKFS-fs warning (device loop0): ldiskfs_fill_super: extents feature not enabled on this filesystem, use tune2fs.
Sep 17 06:18:12 lustre_mds0 kernel: LDISKFS-fs (loop0): mounted filesystem with ordered data mode. Opts:
Sep 17 06:20:49 lustre_mds0 kernel: LDISKFS-fs warning (device loop0): ldiskfs_fill_super: extents feature not enabled on this filesystem, use tune2fs.
Sep 17 06:20:49 lustre_mds0 kernel: LDISKFS-fs (loop0): mounted filesystem with ordered data mode. Opts:
Sep 17 06:20:49 lustre_mds0 kernel: LDISKFS-fs warning (device loop0): ldiskfs_fill_super: extents feature not enabled on this filesystem, use tune2fs.
Sep 17 06:20:49 lustre_mds0 kernel: LDISKFS-fs (loop0): mounted filesystem with ordered data mode. Opts:
Sep 17 06:20:49 lustre_mds0 kernel: Lustre: MGS MGS started
Sep 17 06:20:49 lustre_mds0 kernel: Lustre: 1228:0:(ldlm_lib.c:946:target_handle_connect()) MGS: connection from 7923c008-a0de-1c87-f21a-4a5ab48abb96@0@lo t0 exp (null) cur 1347888049 last 0
Sep 17 06:20:49 lustre_mds0 kernel: Lustre: MGC10.127.24.42@tcp: Reactivating import
Sep 17 06:20:49 lustre_mds0 kernel: Lustre: MGS: Regenerating lustre-MDTffff log by user request.
Sep 17 06:20:49 lustre_mds0 kernel: Lustre: Setting parameter lustre-MDT0000-mdtlov.lov.stripesize in log lustre-MDT0000
Sep 17 06:20:49 lustre_mds0 kernel: Lustre: Enabling ACL
Sep 17 06:20:49 lustre_mds0 kernel: Lustre: Enabling user_xattr
Sep 17 06:20:50 lustre_mds0 kernel: Lustre: lustre-MDT0000: new disk, initializing
Sep 17 06:20:50 lustre_mds0 kernel: Lustre: 1240:0:(mdt_lproc.c:416:lprocfs_wr_identity_upcall()) lustre-MDT0000: identity upcall set to /usr/sbin/l_getidentity
Sep 17 06:21:04 lustre_mds0 kernel: Lustre: 1228:0:(ldlm_lib.c:946:target_handle_connect()) MGS: connection from b05bf03e-633e-604d-54ba-9cc47b0b160d@10.127.56.72@tcp t0 exp (null) cur 1347888064 last 0
Sep 17 06:21:04 lustre_mds0 kernel: Lustre: MGS: Regenerating lustre-OSTffff log by user request.
Sep 17 06:21:10 lustre_mds0 kernel: Lustre: MDS mdd_obd-lustre-MDT0000: lustre-OST0000_UUID now active, resetting orphans
Sep 17 06:22:44 lustre_mds0 kernel: Lustre: 1228:0:(ldlm_lib.c:946:target_handle_connect()) MGS: connection from c72ffd03-3f98-3fc6-c8bc-648b4c266970@10.127.56.87@tcp t0 exp (null) cur 1347888164 last 0
Sep 17 06:22:44 lustre_mds0 kernel: Lustre: MGS: Regenerating lustre-OSTffff log by user request.
Sep 17 06:22:47 lustre_mds0 kernel: Lustre: MDS mdd_obd-lustre-MDT0000: lustre-OST0001_UUID now active, resetting orphans
Sep 17 06:24:42 lustre_mds0 kernel: LustreError: 152-6: Ignoring deprecated mount option 'acl'.
Sep 17 06:24:42 lustre_mds0 kernel: Lustre: 1253:0:(ldlm_lib.c:946:target_handle_connect()) lustre-MDT0000: connection from 8c097db6-d726-216f-0c84-77cbead03807@0@lo t0 exp (null) cur 1347888282 last 0
Sep 17 06:24:42 lustre_mds0 kernel: Lustre: Mounted lustre-client
Sep 17 06:27:21 lustre_mds0 kernel: Lustre: Modifying parameter lustre-MDT0000-mdtlov.lov.stripesize in log lustre-MDT0000
Sep 17 06:27:21 lustre_mds0 kernel: Lustre: Skipped 4 previous similar messages
Sep 17 06:27:45 lustre_mds0 kernel: Lustre: ctl-lustre-MDT0000: super-sequence allocation rc = 0 [0x0000000200000400-0x0000000240000400):0:0
Sep 17 06:34:24 lustre_mds0 kernel: Lustre: 1227:0:(ldlm_lib.c:946:target_handle_connect()) MGS: connection from 9077ab07-a949-368f-d946-b4d8dc6addea@10.127.56.88@tcp t0 exp (null) cur 1347888864 last 0
Sep 17 06:34:24 lustre_mds0 kernel: Lustre: MGS: Regenerating lustre-OSTffff log by user request.
Sep 17 06:34:28 lustre_mds0 kernel: Lustre: MDS mdd_obd-lustre-MDT0000: lustre-OST0002_UUID now active, resetting orphans
Sep 17 06:35:01 lustre_mds0 kernel: LustreError: 1352:0:(ldlm_request.c:1172:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
Sep 17 06:35:01 lustre_mds0 kernel: LustreError: 1352:0:(ldlm_request.c:1799:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
Sep 17 06:35:01 lustre_mds0 kernel: Lustre: Unmounted lustre-client
Sep 17 06:35:27 lustre_mds0 kernel: LustreError: 152-6: Ignoring deprecated mount option 'acl'.
Sep 17 06:35:27 lustre_mds0 kernel: Lustre: 1252:0:(ldlm_lib.c:946:target_handle_connect()) lustre-MDT0000: connection from b6b66579-1f44-90e5-ae63-e778d4ed6ac5@0@lo t0 exp (null) cur 1347888927 last 0
Sep 17 06:35:27 lustre_mds0 kernel: Lustre: Mounted lustre-client
Sep 17 06:57:08 lustre_mds0 kernel: Lustre: 1227:0:(ldlm_lib.c:946:target_handle_connect()) MGS: connection from 35c69b86-f9d2-e274-7ac5-5380a17f9c32@10.127.24.35@tcp t0 exp (null) cur 1347890228 last 0
Sep 17 07:31:54 lustre_mds0 kernel: hrtimer: interrupt took 10250792 ns
Sep 17 07:34:23 lustre_mds0 kernel: Lustre: 1328:0:(client.c:1780:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1347892452/real 1347892452] req@ffff880078a04800 x1413362811414993/t0(0) o104->lustre-MDT0000@10.127.24.35@tcp:15/16 lens 296/192 e 0 to 1 dl 1347892463 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Sep 17 07:34:23 lustre_mds0 kernel: LustreError: 138-a: lustre-MDT0000: A client on nid 10.127.24.35@tcp was evicted due to a lock blocking callback time out: rc -107
Sep 17 07:34:39 lustre_mds0 kernel: Lustre: MGS: haven't heard from client 35c69b86-f9d2-e274-7ac5-5380a17f9c32 (at 10.127.24.35@tcp) in 50 seconds. I think it's dead, and I am evicting it. exp ffff88007bad8000, cur 1347892479 expire 1347892449 last 1347892429
Sep 17 07:39:56 lustre_mds0 kernel: Lustre: 1228:0:(ldlm_lib.c:946:target_handle_connect()) MGS: connection from 94b20a97-ffc1-174f-b2fa-84612be87730@10.127.24.35@tcp t0 exp (null) cur 1347892796 last 0
Sep 17 07:39:56 lustre_mds0 kernel: Lustre: 1228:0:(ldlm_lib.c:946:target_handle_connect()) Skipped 1 previous similar message
Sep 17 07:41:09 lustre_mds0 kernel: Lustre: MGS: haven't heard from client 94b20a97-ffc1-174f-b2fa-84612be87730 (at 10.127.24.35@tcp) in 48 seconds. I think it's dead, and I am evicting it. exp ffff88002f159c00, cur 1347892869 expire 1347892839 last 1347892821
Sep 17 07:41:13 lustre_mds0 kernel: Lustre: lustre-MDT0000: haven't heard from client 98c74908-5348-2392-0f61-6a7f6d3a0376 (at 10.127.24.35@tcp) in 52 seconds. I think it's dead, and I am evicting it. exp ffff8800765d6000, cur 1347892873 expire 1347892843 last 1347892821
Sep 17 07:43:41 lustre_mds0 kernel: Lustre: 1228:0:(ldlm_lib.c:946:target_handle_connect()) MGS: connection from 272a8405-8512-e9de-f532-feb5b7d6f9b1@10.127.56.82@tcp t0 exp (null) cur 1347893021 last 0
Sep 17 07:43:41 lustre_mds0 kernel: Lustre: 1228:0:(ldlm_lib.c:946:target_handle_connect()) Skipped 1 previous similar message
Sep 17 08:45:08 lustre_mds0 kernel: Lustre: 1228:0:(ldlm_lib.c:946:target_handle_connect()) MGS: connection from 5b114849-9a41-b016-4de2-10b780351206@10.127.24.35@tcp t0 exp (null) cur 1347896708 last 0
Sep 17 08:45:08 lustre_mds0 kernel: Lustre: 1228:0:(ldlm_lib.c:946:target_handle_connect()) Skipped 1 previous similar message
Sep 17 08:45:08 lustre_mds0 kernel: Lustre: 1252:0:(ldlm_lib.c:946:target_handle_connect()) lustre-MDT0000: connection from 46730388-647a-16f4-b349-df0a8df6a759@10.127.24.35@tcp t0 exp (null) cur 1347896708 last 0

[root@lustre_mds0 lustre]#
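The evictions above for a lock blocking callback timeout (rc -107), together with the fact that only remote clients hang, are often a sign that the servers cannot reach the client back on the Lustre port (988/tcp, per the "Accept secure, port 988" line). Since iptables is in use on these nodes, one quick check is to confirm 988/tcp is allowed in both directions on every server and client; the rules below are illustrative, not taken from this cluster's configuration:

  # on each MDS/OSS/client: is anything filtering the Lustre port?
  iptables -L -n | grep 988

  # permissive test rule to allow incoming Lustre traffic
  iptables -I INPUT -p tcp --dport 988 -j ACCEPT

  # or, purely as a temporary experiment, stop the firewall and retry the ls/df
  service iptables stop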