Hi, I could use some help. I installed Lustre on 3 computers.

mdt/mgs:
************************************************************************
[root@x-math20 ~]# mkfs.lustre --reformat --fsname spfs --mdt --mgs /dev/hdb

   Permanent disk data:
Target:     spfs-MDTffff
Index:      unassigned
Lustre FS:  spfs
Mount type: ldiskfs
Flags:      0x75
            (MDT MGS needs_index first_time update )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters:

device size = 19092MB
formatting backing filesystem ldiskfs on /dev/hdb
        target name  spfs-MDTffff
        4k blocks    0
        options      -J size=400 -i 4096 -I 512 -q -O dir_index -F
mkfs_cmd = mkfs.ext2 -j -b 4096 -L spfs-MDTffff -J size=400 -i 4096 -I 512 -q -O dir_index -F /dev/hdb
Writing CONFIGS/mountdata
[root@x-math20 ~]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/hda1             19228276   4855244  13396284  27% /
none                    127432         0    127432   0% /dev/shm
/dev/hdb              17105436    455152  15672728   3% /mnt/test/mdt
[root@x-math20 ~]# cat /proc/fs/lustre/devices
  0 UP mgs MGS MGS 5
  1 UP mgc MGC132.66.176.211@tcp 5f5ba729-6412-3843-2229-1310a0b48f71 5
  2 UP mdt MDS MDS_uuid 3
  3 UP lov spfs-mdtlov spfs-mdtlov_UUID 4
  4 UP mds spfs-MDT0000 spfs-MDT0000_UUID 3
[root@x-math20 ~]#
*************************************************************end mdt******************************

So you can see that the MGS is up, and on the OSTs I get an error. Please help...

ost:
**********************************************************************
[root@x-mathr11 ~]# mkfs.lustre --reformat --fsname spfs --ost --mgsnode=132.66.176.211@tcp0 /dev/hdb1

   Permanent disk data:
Target:     spfs-OSTffff
Index:      unassigned
Lustre FS:  spfs
Mount type: ldiskfs
Flags:      0x72
            (OST needs_index first_time update )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=132.66.176.211@tcp

device size = 19594MB
formatting backing filesystem ldiskfs on /dev/hdb1
        target name  spfs-OSTffff
        4k blocks    0
        options      -J size=400 -i 16384 -I 256 -q -O dir_index -F
mkfs_cmd = mkfs.ext2 -j -b 4096 -L spfs-OSTffff -J size=400 -i 16384 -I 256 -q -O dir_index -F /dev/hdb1
Writing CONFIGS/mountdata
[root@x-mathr11 ~]# /CONFIGS/mountdata
-bash: /CONFIGS/mountdata: No such file or directory
[root@x-mathr11 ~]# mount -t lustre /dev/hdb1 /mnt/test/ost1
mount.lustre: mount /dev/hdb1 at /mnt/test/ost1 failed: Input/output error
Is the MGS running?
***********************************************end ost********************************

Can anyone point out the problem?
Thanks, Avi.
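For context, the sequence being attempted here is the standard Lustre 1.6 bring-up: format and mount the combined MGS/MDT first, then format each OST pointing at the MGS NID and mount it, and finally mount the filesystem on a client. (The "/CONFIGS/mountdata" command in the transcript is a red herring: CONFIGS/mountdata is a file that mkfs.lustre writes inside the newly formatted target, not something to execute.) A minimal sketch of the intended sequence, assuming the device names and NID from the transcript above:

    # on the MGS/MDT node (132.66.176.211)
    mkfs.lustre --reformat --fsname=spfs --mdt --mgs /dev/hdb
    mount -t lustre /dev/hdb /mnt/test/mdt

    # on each OSS, pointing at the MGS NID (note the "=" in --mgsnode)
    mkfs.lustre --reformat --fsname=spfs --ost --mgsnode=132.66.176.211@tcp0 /dev/hdb1
    mount -t lustre /dev/hdb1 /mnt/test/ost1

    # on a client
    mount -t lustre 132.66.176.211@tcp0:/spfs /mnt/spfs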
On the OSS, can you ping the MDS/MGS using this command:

lctl ping 132.66.176.211@tcp0

If it doesn't ping, list the NIDs on each node by running

lctl list_nids

and tell me what comes back.

-Aaron

On Dec 23, 2007, at 9:22 AM, Avi Gershon wrote:
> Hi, I could use some help. I installed Lustre on 3 computers.
> [full mkfs/mount transcript quoted above trimmed]
> Can anyone point out the problem?
> Thanks, Avi.

Aaron Knister
Associate Systems Administrator/Web Designer
Center for Research on Environment and Water

(301) 595-7001
aaron@iges.org
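When LNET connectivity is healthy, lctl ping returns the peer's NIDs instead of an error; roughly like this (a sketch of Lustre 1.6-era output, the exact format may vary by version):

    [root@x-mathr11 ~]# lctl ping 132.66.176.211@tcp0
    12345-0@lo
    12345-132.66.176.211@tcp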
node 1: 132.66.176.212
node 2: 132.66.176.215

[root@x-math20 ~]# ssh 132.66.176.215
root@132.66.176.215's password:
ssh(21957) Permission denied, please try again.
root@132.66.176.215's password:
Last login: Sun Dec 23 14:32:51 2007 from x-math20.tau.ac.il
[root@x-mathr11 ~]# lctl ping 132.66.176.211@tcp0
failed to ping 132.66.176.211@tcp: Input/output error
[root@x-mathr11 ~]# lctl list_nids
132.66.176.215@tcp
[root@x-mathr11 ~]# ssh 132.66.176.212
The authenticity of host '132.66.176.212 (132.66.176.212)' can't be established.
RSA1 key fingerprint is 85:2a:c1:47:84:b7:b5:a6:cd:c4:57:86:af:ce:7e:74.
Are you sure you want to continue connecting (yes/no)? yes
ssh(11526) Warning: Permanently added '132.66.176.212' (RSA1) to the list of known hosts.
root@132.66.176.212's password:
Last login: Sun Dec 23 15:24:41 2007 from x-math20.tau.ac.il
[root@localhost ~]# lctl ping 132.66.176.211@tcp0
failed to ping 132.66.176.211@tcp: Input/output error
[root@localhost ~]# lctl list_nids
132.66.176.212@tcp
[root@localhost ~]#

Thanks for helping!!
Avi

On Dec 23, 2007 5:32 PM, Aaron Knister <aaron@iges.org> wrote:
> On the OSS, can you ping the MDS/MGS using this command:
> lctl ping 132.66.176.211@tcp0
> If it doesn't ping, list the NIDs on each node by running lctl list_nids and tell me what comes back.
> -Aaron
> [earlier transcript trimmed]
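Since lctl list_nids works on both OSS nodes, LNET itself is loaded on them; the failure is in reaching the MGS over TCP. LNET's socket LND listens on TCP port 988 by default, so a quick sanity check on the MGS is whether the acceptor is actually listening (a sketch; netstat options may vary by distribution):

    # on 132.66.176.211
    netstat -tln | grep 988
    # expect a LISTEN line such as:  tcp  0  0 0.0.0.0:988  0.0.0.0:*  LISTEN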
Can you check the firewall on each of those machines (iptables -L) and paste that here? Also, is this network dedicated to Lustre? Lustre can easily saturate a network interface under load, to the point that it becomes difficult to log in to a node if it only has one interface. I'd recommend using a different interface if you can.

On Dec 23, 2007, at 11:03 AM, Avi Gershon wrote:
> node 1: 132.66.176.212
> node 2: 132.66.176.215
> [root@x-mathr11 ~]# lctl ping 132.66.176.211@tcp0
> failed to ping 132.66.176.211@tcp: Input/output error
> [root@x-mathr11 ~]# lctl list_nids
> 132.66.176.215@tcp
> [root@localhost ~]# lctl ping 132.66.176.211@tcp0
> failed to ping 132.66.176.211@tcp: Input/output error
> [root@localhost ~]# lctl list_nids
> 132.66.176.212@tcp
> [rest of transcript trimmed]
> Thanks for helping!!
> Avi

Aaron Knister
Associate Systems Administrator/Web Designer
Center for Research on Environment and Water

(301) 595-7001
aaron@iges.org
Hi, thanks for the quick response.
****************************************************************************
ssh root@132.66.176.212
Scientific Linux CERN SLC release 4.6 (Beryllium)
root@132.66.176.212's password:
Last login: Sun Dec 23 18:02:33 2007 from x-mathr11.tau.ac.il
[root@localhost ~]# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
[root@localhost ~]#
***************************************************************************
ssh root@132.66.176.215
[root@x-mathr11 ~]# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
RH-Firewall-1-INPUT  all  --  anywhere             anywhere

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
RH-Firewall-1-INPUT  all  --  anywhere             anywhere

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain RH-Firewall-1-INPUT (2 references)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere
ACCEPT     icmp --  anywhere             anywhere            icmp any
ACCEPT     ipv6-crypt-- anywhere         anywhere
ACCEPT     ipv6-auth--  anywhere         anywhere
ACCEPT     udp  --  anywhere             224.0.0.251         udp dpt:5353
ACCEPT     udp  --  anywhere             anywhere            udp dpt:ipp
ACCEPT     all  --  anywhere             anywhere            state RELATED,ESTABLISHED
ACCEPT     tcp  --  anywhere             anywhere            state NEW tcp dpts:30000:30101
ACCEPT     tcp  --  anywhere             anywhere            state NEW tcp dpt:ssh
ACCEPT     udp  --  anywhere             anywhere            state NEW udp dpt:afs3-callback
REJECT     all  --  anywhere             anywhere            reject-with icmp-host-prohibited
[root@x-mathr11 ~]#
**********************************************************************************************

What do you mean by "is this network dedicated to Lustre?" I run only Lustre on these machines, since I would like to have a working configuration. Can you recommend an interface?

Thanks!
Avi

On Dec 23, 2007 9:27 PM, Aaron Knister <aaron@iges.org> wrote:
> Can you check the firewall on each of those machines (iptables -L) and paste that here? Also, is this network dedicated to Lustre? [...]
> [earlier transcript trimmed]
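Note that the RH-Firewall-1-INPUT chain on x-mathr11 ends in a catch-all REJECT and has no rule for LNET's default TCP port (988), so incoming LNET connections to that node will be refused. A sketch of a fix, assuming the default acceptor port and that the same should be applied on every node with a restrictive firewall:

    # insert an ACCEPT for LNET ahead of the final REJECT rule
    iptables -I RH-Firewall-1-INPUT -p tcp --dport 988 -j ACCEPT
    # persist across reboots on RHEL-style systems:
    service iptables save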
Hello to everyone, and happy new year!
I think I have reduced my problem to this: "lctl ping 132.66.176.211@tcp0" doesn't work for me, for some strange reason, as you can see:
***********************************************************************************
[root@x-math20 ~]# lctl ping 132.66.176.211@tcp0
failed to ping 132.66.176.211@tcp: Input/output error
[root@x-math20 ~]# ping 132.66.176.211
PING 132.66.176.211 (132.66.176.211) 56(84) bytes of data.
64 bytes from 132.66.176.211: icmp_seq=0 ttl=64 time=0.152 ms
64 bytes from 132.66.176.211: icmp_seq=1 ttl=64 time=0.130 ms
64 bytes from 132.66.176.211: icmp_seq=2 ttl=64 time=0.131 ms

--- 132.66.176.211 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2018ms
rtt min/avg/max/mdev = 0.130/0.137/0.152/0.016 ms, pipe 2
[root@x-math20 ~]#
*****************************************************************************************

On 12/24/07, Avi Gershon <gershonavi@gmail.com> wrote:
> Hi, here are the "iptables -L" results:
>
> NODE 1 132.66.176.212: all chains (INPUT, FORWARD, OUTPUT) policy ACCEPT, no rules.
> MDT 132.66.176.211: all chains (INPUT, FORWARD, OUTPUT) policy ACCEPT, no rules.
> NODE 2 132.66.176.215: INPUT and FORWARD feed the RH-Firewall-1-INPUT chain posted earlier, ending in "REJECT all -- anywhere anywhere reject-with icmp-host-prohibited".
>
> One more thing... do you use the TCP protocol, or do you use UDP?
>
> Regards, Avi
> P.S. I think this is the beginning of a beautiful friendship.. :-)
>
> On Dec 24, 2007 5:29 PM, Aaron Knister <aaron@iges.org> wrote:
> > That sounds like quite a task! Could you show me the contents of your firewall rules on the systems mentioned below? (iptables -L on each.) That would help to diagnose the problem further.
> > -Aaron
> > On Dec 24, 2007, at 1:21 AM, Yan Benhammou wrote:
> > > Hi Aaron, and thank you for your fast answers.
> > > We are working (Avi, Meny and me) on the Israeli GRID and we need to create a single huge file system for this GRID.
> > > Cheers, Yan
> > [earlier transcript trimmed]
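This narrows things down nicely: a plain ping only exercises ICMP, while lctl ping opens a TCP connection to the peer's LNET acceptor (port 988 by default; the tcp LND uses TCP only, not UDP). ICMP succeeding while lctl ping fails therefore points at the TCP path: a firewall, the acceptor not listening, or LNET bound to the wrong interface. A quick way to test the TCP path directly (a sketch, assuming telnet is installed):

    telnet 132.66.176.211 988
    # "Connected to ..."      -> the acceptor is reachable
    # "Connection refused"    -> nothing is listening on 988
    # hang/timeout            -> traffic is being filtered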
On the host x-math20, could you run "lctl list_nids" and also "ifconfig -a"? I want to see if LNET is listening on the correct interface. Oh, could you also post the contents of your /etc/modprobe.conf.

Thanks!

-Aaron

On Jan 2, 2008, at 4:42 AM, Avi Gershon wrote:
> Hello to everyone, and happy new year!
> I think I have reduced my problem to this: "lctl ping 132.66.176.211@tcp0" doesn't work for me, for some strange reason, as you can see:
> [root@x-math20 ~]# lctl ping 132.66.176.211@tcp0
> failed to ping 132.66.176.211@tcp: Input/output error
> [root@x-math20 ~]# ping 132.66.176.211
> PING 132.66.176.211 (132.66.176.211) 56(84) bytes of data.
> [rest of transcript trimmed]

Aaron Knister
Associate Systems Analyst
Center for Ocean-Land-Atmosphere Studies

(301) 595-7000
aaron@iges.org
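The reason for checking modprobe.conf is that the "options lnet networks=..." line controls which interface LNET binds to. With a single NIC, plain "tcp0" is usually fine; to pin LNET to a specific interface, the syntax is as follows (a sketch, assuming eth0 carries the 132.66.176.x addresses):

    # /etc/modprobe.conf
    options lnet networks="tcp0(eth0)"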
Hi, I get this:
***************************************************************************
[root@x-math20 ~]# lctl list_nids
132.66.176.211@tcp
[root@x-math20 ~]# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:02:B3:2D:A6:BF
          inet addr:132.66.176.211  Bcast:132.66.255.255  Mask:255.255.0.0
          inet6 addr: fe80::202:b3ff:fe2d:a6bf/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:9448397 errors:0 dropped:0 overruns:0 frame:0
          TX packets:194259 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1171910501 (1.0 GiB)  TX bytes:40500450 (38.6 MiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:8180 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8180 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:3335243 (3.1 MiB)  TX bytes:3335243 (3.1 MiB)

sit0      Link encap:IPv6-in-IPv4
          NOARP  MTU:1480  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

[root@x-math20 ~]# cat /etc/modprobe.conf
alias eth0 e100
alias usb-controller uhci-hcd
alias scsi_hostadapter ata_piix
alias lustre llite
options lnet networks=tcp0
[root@x-math20 ~]#
***************************************************************************

On 1/2/08, Aaron Knister <aaron@iges.org> wrote:
> On the host x-math20, could you run "lctl list_nids" and also "ifconfig -a"? I want to see if LNET is listening on the correct interface. Oh, could you also post the contents of your /etc/modprobe.conf.
> Thanks!
> -Aaron
> [earlier transcript trimmed]
That all looks ok. From x-math20 could you run "lctl ping 132.66.176.212@tcp0"?

On Jan 2, 2008, at 8:36 AM, Avi Gershon wrote:

> Hi, I get this:
> ***************************************************************************
> [root@x-math20 ~]# lctl list_nids
> 132.66.176.211@tcp
> [root@x-math20 ~]# ifconfig -a
> eth0      Link encap:Ethernet  HWaddr 00:02:B3:2D:A6:BF
>           inet addr:132.66.176.211  Bcast:132.66.255.255  Mask:255.255.0.0
>           inet6 addr: fe80::202:b3ff:fe2d:a6bf/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:9448397 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:194259 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:1171910501 (1.0 GiB)  TX bytes:40500450 (38.6 MiB)
>
> lo        Link encap:Local Loopback
>           inet addr:127.0.0.1  Mask:255.0.0.0
>           inet6 addr: ::1/128 Scope:Host
>           UP LOOPBACK RUNNING  MTU:16436  Metric:1
>           RX packets:8180 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:8180 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:3335243 (3.1 MiB)  TX bytes:3335243 (3.1 MiB)
>
> sit0      Link encap:IPv6-in-IPv4
>           NOARP  MTU:1480  Metric:1
>           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
>
> [root@x-math20 ~]# cat /etc/modprobe.conf
> alias eth0 e100
> alias usb-controller uhci-hcd
> alias scsi_hostadapter ata_piix
> alias lustre llite
> options lnet networks=tcp0
> [root@x-math20 ~]#
> ***************************************************************************
>
> On 1/2/08, Aaron Knister <aaron@iges.org> wrote:
> > On the host x-math20 could you run an "lctl list_nids" and also an
> > "ifconfig -a"? I want to see if lnet is listening on the correct
> > interface. Oh, could you also post the contents of your
> > /etc/modprobe.conf.
> >
> > Thanks!
> >
> > -Aaron
> >
> > On Jan 2, 2008, at 4:42 AM, Avi Gershon wrote:
> >
> > > Hello to everyone, and happy new year.
> > > I think I have reduced my problem to this: "lctl ping
> > > 132.66.176.211@tcp0" doesn't work for me, for some strange reason,
> > > as you can see:
> > > *******************************************************************
> > > [root@x-math20 ~]# lctl ping 132.66.176.211@tcp0
> > > failed to ping 132.66.176.211@tcp: Input/output error
> > > [root@x-math20 ~]# ping 132.66.176.211
> > > PING 132.66.176.211 (132.66.176.211) 56(84) bytes of data.
> > > 64 bytes from 132.66.176.211: icmp_seq=0 ttl=64 time=0.152 ms
> > > 64 bytes from 132.66.176.211: icmp_seq=1 ttl=64 time=0.130 ms
> > > 64 bytes from 132.66.176.211: icmp_seq=2 ttl=64 time=0.131 ms
> > > --- 132.66.176.211 ping statistics ---
> > > 3 packets transmitted, 3 received, 0% packet loss, time 2018ms
> > > rtt min/avg/max/mdev = 0.130/0.137/0.152/0.016 ms, pipe 2
> > > [root@x-math20 ~]#
> > > *******************************************************************
> > >
> > > On 12/24/07, Avi Gershon <gershonavi@gmail.com> wrote:
> > > > Hi,
> > > > here are the "iptables -L" results:
> > > >
> > > > NODE 1 132.66.176.212
> > > > Scientific Linux CERN SLC release 4.6 (Beryllium)
> > > > [root@localhost ~]# iptables -L
> > > > Chain INPUT (policy ACCEPT)
> > > > target     prot opt source               destination
> > > >
> > > > Chain FORWARD (policy ACCEPT)
> > > > target     prot opt source               destination
> > > >
> > > > Chain OUTPUT (policy ACCEPT)
> > > > target     prot opt source               destination
> > > > ************************************************************
> > > > MDT 132.66.176.211
> > > > [root@x-math20 ~]# iptables -L
> > > > Chain INPUT (policy ACCEPT)
> > > > target     prot opt source               destination
> > > >
> > > > Chain FORWARD (policy ACCEPT)
> > > > target     prot opt source               destination
> > > >
> > > > Chain OUTPUT (policy ACCEPT)
> > > > target     prot opt source               destination
> > > > ************************************************************
> > > > NODE 2 132.66.176.215
> > > > [root@x-mathr11 ~]# iptables -L
> > > > Chain INPUT (policy ACCEPT)
> > > > target     prot opt source               destination
> > > > RH-Firewall-1-INPUT  all  --  anywhere             anywhere
> > > >
> > > > Chain FORWARD (policy ACCEPT)
> > > > target     prot opt source               destination
> > > > RH-Firewall-1-INPUT  all  --  anywhere             anywhere
> > > >
> > > > Chain OUTPUT (policy ACCEPT)
> > > > target     prot opt source               destination
> > > >
> > > > Chain RH-Firewall-1-INPUT (2 references)
> > > > target     prot opt source               destination
> > > > ACCEPT     all  --  anywhere             anywhere
> > > > ACCEPT     icmp --  anywhere             anywhere            icmp any
> > > > ACCEPT     ipv6-crypt--  anywhere        anywhere
> > > > ACCEPT     ipv6-auth--  anywhere         anywhere
> > > > ACCEPT     udp  --  anywhere             224.0.0.251         udp dpt:5353
> > > > ACCEPT     udp  --  anywhere             anywhere            udp dpt:ipp
> > > > ACCEPT     all  --  anywhere             anywhere            state RELATED,ESTABLISHED
> > > > ACCEPT     tcp  --  anywhere             anywhere            state NEW tcp dpts:30000:30101
> > > > ACCEPT     tcp  --  anywhere             anywhere            state NEW tcp dpt:ssh
> > > > ACCEPT     udp  --  anywhere             anywhere            state NEW udp dpt:afs3-callback
> > > > REJECT     all  --  anywhere             anywhere            reject-with icmp-host-prohibited
> > > > [root@x-mathr11 ~]#
> > > > ************************************************************
> > > > One more thing: do you use the TCP protocol, or do you use UDP?
> > > >
> > > > Regards, Avi
> > > > P.S. I think this is the beginning of a beautiful friendship.. :-)
> > > >
> > > > On Dec 24, 2007 5:29 PM, Aaron Knister <aaron@iges.org> wrote:
> > > > > That sounds like quite a task! Could you show me the contents
> > > > > of your firewall rules on the systems mentioned below?
> > > > > (iptables -L) on each. That would help to diagnose the problem
> > > > > further.
> > > > >
> > > > > -Aaron
> > > > >
> > > > > On Dec 24, 2007, at 1:21 AM, Yan Benhammou wrote:
> > > > >
> > > > > > Hi Aaron, and thank you for your fast answers.
> > > > > > We are working (Avi, Meny and me) on the Israeli GRID and we
> > > > > > need to create a single huge file system for this GRID.
> > > > > > cheers
> > > > > > Yan
> > > > > >
> > > > > > From: Aaron Knister [mailto:aaron@iges.org]
> > > > > > Sent: Sun 12/23/2007 8:27 PM
> > > > > > To: Avi Gershon
> > > > > > Cc: lustre-discuss@clusterfs.com; Yan Benhammou; Meny Ben moshe
> > > > > > Subject: Re: [Lustre-discuss] help needed.
> > > > > >
> > > > > > Can you check the firewall on each of those machines
> > > > > > (iptables -L) and paste that here. Also, is this network
> > > > > > dedicated to Lustre? Lustre can easily saturate a network
> > > > > > interface under load, to the point that it becomes difficult
> > > > > > to log in to a node if it only has one interface. I'd
> > > > > > recommend using a different interface if you can.
> > > > > >
> > > > > > On Dec 23, 2007, at 11:03 AM, Avi Gershon wrote:
> > > > > >
> > > > > > > node 1 132.66.176.212
> > > > > > > node 2 132.66.176.215
> > > > > > >
> > > > > > > [root@x-mathr11 ~]# lctl ping 132.66.176.211@tcp0
> > > > > > > failed to ping 132.66.176.211@tcp: Input/output error
> > > > > > > [root@x-mathr11 ~]# lctl list_nids
> > > > > > > 132.66.176.215@tcp
> > > > > > >
> > > > > > > [root@localhost ~]# lctl ping 132.66.176.211@tcp0
> > > > > > > failed to ping 132.66.176.211@tcp: Input/output error
> > > > > > > [root@localhost ~]# lctl list_nids
> > > > > > > 132.66.176.212@tcp
> > > > > > >
> > > > > > > thanks for helping!!
> > > > > > > Avi

Aaron Knister
Associate Systems Analyst
Center for Ocean-Land-Atmosphere Studies

(301) 595-7000
aaron@iges.org
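The iptables listings quoted above likely matter for x-mathr11 (132.66.176.215): its RH-Firewall-1-INPUT chain ends in a REJECT, and Lustre's LNET TCP acceptor listens on port 988 by default, which none of the listed rules allow. A minimal sketch of opening that port (assuming the stock RHEL-style chain shown above; "-I" places the rule ahead of the REJECT):

[root@x-mathr11 ~]# iptables -I RH-Firewall-1-INPUT -p tcp --dport 988 -j ACCEPT
[root@x-mathr11 ~]# service iptables save

The same rule would be needed on any other node that keeps a similar firewall in front of Lustre traffic.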
No, that doesn't work either :-(
Thanks for answering so fast.
Avi

On 1/2/08, Aaron Knister <aaron@iges.org> wrote:
> That all looks ok. From x-math20 could you run "lctl ping
> 132.66.176.212@tcp0"?
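One more variable worth ruling out at this point is interface selection. With the bare "options lnet networks=tcp0" shown earlier, LNET picks an interface on its own; it can instead be pinned explicitly to the e100 NIC. A minimal sketch for /etc/modprobe.conf (assuming the interface really is eth0, as the ifconfig output suggests; the Lustre modules must be unloaded and reloaded, here via the lustre_rmmod helper script that ships with Lustre, for the change to take effect):

options lnet networks=tcp0(eth0)

[root@x-math20 ~]# lustre_rmmod
[root@x-math20 ~]# modprobe lustre
[root@x-math20 ~]# lctl list_nids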
Can you run dmesg and send me any Lustre-related errors? Also, what's the output of "getenforce"?

-Aaron

On Jan 2, 2008, at 1:47 PM, Avi Gershon wrote:

> No, that doesn't work either :-(
> Thanks for answering so fast.
> Avi

Aaron Knister
Associate Systems Analyst
Center for Ocean-Land-Atmosphere Studies

(301) 595-7000
aaron@iges.org
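Since dmesg on these boxes is chatty, it may help to filter for just the Lustre/LNET and SELinux audit lines when collecting what Aaron asked for; a small sketch (the exact tags can vary by kernel build):

[root@x-math20 ~]# dmesg | grep -i -E 'lustre|lnet|avc'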
Hi,
dmesg:
***********************************************************************************************
Lustre: 2092:0:(module.c:382:init_libcfs_module()) maximum lustre stack 8192
Lustre: OBD class driver, info@clusterfs.com
        Lustre Version: 1.6.3
        Build Version: 1.6.3-19691231190000-PRISTINE-.cache.build.BUILD.lustre-kernel-2.6.9.lustre.linux$
LustreError: 2092:0:(socklnd.c:2466:ksocknal_enumerate_interfaces()) Can't find any usable interfaces
LustreError: 105-4: Error -100 starting up LNI tcp
LustreError: 2092:0:(events.c:654:ptlrpc_init_portals()) network initialisation failed
LustreError: 2711:0:(acceptor.c:442:lnet_acceptor()) Error -11 reading connection request from 132.66.17$
LustreError: 2711:0:(acceptor.c:442:lnet_acceptor()) Error -11 reading connection request from 132.66.17$
audit(1197995576.670:57): avc: denied { rawip_send } for pid=2711 comm="acceptor_988" saddr=132.66.17$
audit(1197995672.933:58): avc: denied { rawip_recv } for saddr=132.66.176.215 src=1023 daddr=132.66.1$
audit(1197995673.143:59): avc: denied { rawip_recv } for saddr=132.66.176.215 src=1023 daddr=132.66.1$
audit(1197995673.563:60): avc: denied { rawip_recv } for saddr=132.66.176.215 src=1023 daddr=132.66.1$
audit(1197995674.403:61): avc: denied { rawip_recv } for saddr=132.66.176.215 src=1023 daddr=132.66.1$
***********************************************************************************************
getenforce:

[root@x-math20 ~]# getenforce
Enforcing

thanks Avi

On 1/2/08, Aaron Knister <aaron@iges.org> wrote:
> Can you run dmesg and send me any Lustre-related errors? Also, what's
> the output of "getenforce"?
>
> -Aaron
SELinux is killing your lustre setup. See this article on how to disable it:
http://www.redhat.com/docs/manuals/enterprise/RHEL-4-Manual/selinux-guide/rhlcommon-section-0068.html#RHLCOMMON-SECTION-0094

A reboot will be required. That should do the trick.

-Aaron

On Jan 3, 2008, at 4:22 AM, Avi Gershon wrote:

> Hi,
> dmesg:
> ***********************************************************************************************
> Lustre: 2092:0:(module.c:382:init_libcfs_module()) maximum lustre stack 8192
> Lustre: OBD class driver, info at clusterfs.com
> Lustre Version: 1.6.3
> Build Version: 1.6.3-19691231190000-PRISTINE-.cache.build.BUILD.lustre-kernel-2.6.9.lustre.linux$
> LustreError: 2092:0:(socklnd.c:2466:ksocknal_enumerate_interfaces()) Can't find any usable interfaces
> LustreError: 105-4: Error -100 starting up LNI tcp
> LustreError: 2092:0:(events.c:654:ptlrpc_init_portals()) network initialisation failed
> LustreError: 2711:0:(acceptor.c:442:lnet_acceptor()) Error -11 reading connection request from 132.66.17$
> LustreError: 2711:0:(acceptor.c:442:lnet_acceptor()) Error -11 reading connection request from 132.66.17$
> audit(1197995576.670:57): avc: denied { rawip_send } for pid=2711 comm="acceptor_988" saddr=132.66.17$
> audit(1197995672.933:58): avc: denied { rawip_recv } for saddr=132.66.176.215 src=1023 daddr=132.66.1$
> audit(1197995673.143:59): avc: denied { rawip_recv } for saddr=132.66.176.215 src=1023 daddr=132.66.1$
> audit(1197995673.563:60): avc: denied { rawip_recv } for saddr=132.66.176.215 src=1023 daddr=132.66.1$
> audit(1197995674.403:61): avc: denied { rawip_recv } for saddr=132.66.176.215 src=1023 daddr=132.66.1$
> ******************************************************************************************************
> getenforce:
>
> [root at x-math20 ~]# getenforce
> Enforcing
>
> thanks Avi
>
> On 1/2/08, Aaron Knister <aaron at iges.org> wrote:
> Can you run dmesg and send me any lustre related errors? Also what's
> the output of "getenforce"?
>
> -Aaron
>
> On Jan 2, 2008, at 1:47 PM, Avi Gershon wrote:
>
>> No, that doesn't work either :-( ..
>> thanks for answering so fast
>> Avi
>>
>> On 1/2/08, Aaron Knister <aaron at iges.org> wrote:
>> That all looks ok. From x-math20 could you run "lctl ping
>> 132.66.176.212 at tcp0"?
>>
>> On Jan 2, 2008, at 8:36 AM, Avi Gershon wrote:
>>
>>> Hi, I get this:
>>> ***************************************************************************
>>> [root at x-math20 ~]# lctl list_nids
>>> 132.66.176.211 at tcp
>>> [root at x-math20 ~]# ifconfig -a
>>> eth0      Link encap:Ethernet  HWaddr 00:02:B3:2D:A6:BF
>>>           inet addr:132.66.176.211  Bcast:132.66.255.255  Mask:255.255.0.0
>>>           inet6 addr: fe80::202:b3ff:fe2d:a6bf/64 Scope:Link
>>>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>>           RX packets:9448397 errors:0 dropped:0 overruns:0 frame:0
>>>           TX packets:194259 errors:0 dropped:0 overruns:0 carrier:0
>>>           collisions:0 txqueuelen:1000
>>>           RX bytes:1171910501 (1.0 GiB)  TX bytes:40500450 (38.6 MiB)
>>>
>>> lo        Link encap:Local Loopback
>>>           inet addr:127.0.0.1  Mask:255.0.0.0
>>>           inet6 addr: ::1/128 Scope:Host
>>>           UP LOOPBACK RUNNING  MTU:16436  Metric:1
>>>           RX packets:8180 errors:0 dropped:0 overruns:0 frame:0
>>>           TX packets:8180 errors:0 dropped:0 overruns:0 carrier:0
>>>           collisions:0 txqueuelen:0
>>>           RX bytes:3335243 (3.1 MiB)  TX bytes:3335243 (3.1 MiB)
>>>
>>> sit0      Link encap:IPv6-in-IPv4
>>>           NOARP  MTU:1480  Metric:1
>>>           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>           collisions:0 txqueuelen:0
>>>           RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
>>>
>>> [root at x-math20 ~]# cat /etc/modprobe.conf
>>> alias eth0 e100
>>> alias usb-controller uhci-hcd
>>> alias scsi_hostadapter ata_piix
>>> alias lustre llite
>>> options lnet networks=tcp0
>>> [root at x-math20 ~]#
>>> ***********************************************************************************************************
>>>
>>> On 1/2/08, Aaron Knister <aaron at iges.org> wrote:
>>> On the host x-math20 could you run an "lctl list_nids" and also an
>>> "ifconfig -a"? I want to see if lnet is listening on the correct
>>> interface. Oh, could you also post the contents of your
>>> /etc/modprobe.conf.
>>>
>>> Thanks!
>>>
>>> -Aaron
>>>
>>> On Jan 2, 2008, at 4:42 AM, Avi Gershon wrote:
>>>
>>>> Hello to everyone, and happy new year.
>>>> I think I have reduced my problem to this: "lctl ping
>>>> 132.66.176.211 at tcp0" doesn't work for me, for some strange reason,
>>>> as you can see:
>>>> ***********************************************************************************
>>>> [root at x-math20 ~]# lctl ping 132.66.176.211 at tcp0
>>>> failed to ping 132.66.176.211 at tcp: Input/output error
>>>> [root at x-math20 ~]# ping 132.66.176.211
>>>> PING 132.66.176.211 (132.66.176.211) 56(84) bytes of data.
>>>> 64 bytes from 132.66.176.211: icmp_seq=0 ttl=64 time=0.152 ms
>>>> 64 bytes from 132.66.176.211: icmp_seq=1 ttl=64 time=0.130 ms
>>>> 64 bytes from 132.66.176.211: icmp_seq=2 ttl=64 time=0.131 ms
>>>> --- 132.66.176.211 ping statistics ---
>>>> 3 packets transmitted, 3 received, 0% packet loss, time 2018ms
>>>> rtt min/avg/max/mdev = 0.130/0.137/0.152/0.016 ms, pipe 2
>>>> [root at x-math20 ~]#
>>>> *****************************************************************************************
>>>>
>>>> On 12/24/07, Avi Gershon <gershonavi at gmail.com> wrote:
>>>> Hi,
>>>> here are the "iptables -L" results:
>>>>
>>>> NODE 1  132.66.176.212
>>>> Scientific Linux CERN SLC release 4.6 (Beryllium)
>>>> [root at localhost ~]# iptables -L
>>>> Chain INPUT (policy ACCEPT)
>>>> target     prot opt source               destination
>>>>
>>>> Chain FORWARD (policy ACCEPT)
>>>> target     prot opt source               destination
>>>>
>>>> Chain OUTPUT (policy ACCEPT)
>>>> target     prot opt source               destination
>>>> ************************************************************************************************
>>>> MDT  132.66.176.211
>>>>
>>>> [root at x-math20 ~]# iptables -L
>>>> Chain INPUT (policy ACCEPT)
>>>> target     prot opt source               destination
>>>>
>>>> Chain FORWARD (policy ACCEPT)
>>>> target     prot opt source               destination
>>>>
>>>> Chain OUTPUT (policy ACCEPT)
>>>> target     prot opt source               destination
>>>> *************************************************************************
>>>> NODE 2  132.66.176.215
>>>>
>>>> [root at x-mathr11 ~]# iptables -L
>>>> Chain INPUT (policy ACCEPT)
>>>> target     prot opt source               destination
>>>> RH-Firewall-1-INPUT  all  --  anywhere             anywhere
>>>>
>>>> Chain FORWARD (policy ACCEPT)
>>>> target     prot opt source               destination
>>>> RH-Firewall-1-INPUT  all  --  anywhere             anywhere
>>>>
>>>> Chain OUTPUT (policy ACCEPT)
>>>> target     prot opt source               destination
>>>>
>>>> Chain RH-Firewall-1-INPUT (2 references)
>>>> target     prot opt source               destination
>>>> ACCEPT     all  --  anywhere             anywhere
>>>> ACCEPT     icmp --  anywhere             anywhere            icmp any
>>>> ACCEPT     ipv6-crypt--  anywhere         anywhere
>>>> ACCEPT     ipv6-auth--  anywhere          anywhere
>>>> ACCEPT     udp  --  anywhere             224.0.0.251         udp dpt:5353
>>>> ACCEPT     udp  --  anywhere             anywhere            udp dpt:ipp
>>>> ACCEPT     all  --  anywhere             anywhere            state RELATED,ESTABLISHED
>>>> ACCEPT     tcp  --  anywhere             anywhere            state NEW tcp dpts:30000:30101
>>>> ACCEPT     tcp  --  anywhere             anywhere            state NEW tcp dpt:ssh
>>>> ACCEPT     udp  --  anywhere             anywhere            state NEW udp dpt:afs3-callback
>>>> REJECT     all  --  anywhere             anywhere            reject-with icmp-host-prohibited
>>>> [root at x-mathr11 ~]#
>>>> ************************************************************
>>>> One more thing... do you use the TCP protocol, or UDP?
>>>>
>>>> Regards, Avi
>>>> P.S. I think this is the beginning of a beautiful friendship.. :-)
>>>>
>>>> On Dec 24, 2007 5:29 PM, Aaron Knister <aaron at iges.org> wrote:
>>>> > That sounds like quite a task! Could you show me the contents of your
>>>> > firewall rules on the systems mentioned below? (iptables -L) on each.
>>>> > That would help to diagnose the problem further.
>>>> >
>>>> > -Aaron
>>>> >
>>>> > On Dec 24, 2007, at 1:21 AM, Yan Benhammou wrote:
>>>> >
>>>> > > Hi Aaron, and thank you for your fast answers.
>>>> > > We are working (Avi, Meny and me) on the Israeli GRID, and we need
>>>> > > to create a single huge file system for this GRID.
>>>> > > cheers
>>>> > > Yan


Aaron Knister
Associate Systems Analyst
Center for Ocean-Land-Atmosphere Studies

(301) 595-7000
aaron at iges.org
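The iptables listings quoted above also show why "lctl ping" can fail while ordinary ICMP ping succeeds: x-mathr11's RH-Firewall-1-INPUT chain accepts new TCP connections only on ssh and ports 30000:30101 and then rejects everything else, while LNET's TCP acceptor listens on port 988 by default (hence the "acceptor_988" thread in the dmesg output quoted earlier). A minimal diagnostic sketch follows, assuming the default acceptor port and a RHEL-style firewall; the rule position (9) is only an illustration, and a real NID is typed with "@" where the list archive renders " at ":

# Confirm which NID LNET is actually using on each node
lctl list_nids

# LNET-level ping of the MGS from an OSS or client node
lctl ping 132.66.176.211@tcp0

# If ICMP ping works but lctl ping fails, open the Lustre acceptor port
# (988/tcp) ahead of the chain's final REJECT rule; list the rule
# numbers first, then pick the right insertion point (9 here is a guess)
iptables -L RH-Firewall-1-INPUT --line-numbers
iptables -I RH-Firewall-1-INPUT 9 -p tcp --dport 988 -j ACCEPT
service iptables save

# If LNET binds to the wrong interface, it can be pinned explicitly in
# /etc/modprobe.conf, e.g.:  options lnet networks=tcp0(eth0)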
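For completeness, Aaron's SELinux diagnosis above can be confirmed and applied from the shell. A short sketch, assuming a RHEL4/SLC4 machine like the ones in this thread; the "avc: denied { rawip_send }" and "{ rawip_recv }" audit lines in the quoted dmesg are the telltale sign that SELinux, rather than the network, is blocking the socklnd traffic:

# Show the current mode; "Enforcing" means SELinux is active
getenforce

# Switch to permissive mode immediately (does not survive a reboot)
setenforce 0

# Make the change permanent: set SELINUX=disabled (or permissive) in
# /etc/selinux/config, then reboot
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config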
Hi,
well, I just wanted to say thanks!! The command I used was:

setenforce 0

and I can do lctl ping! I'm going to try my lustre now! (fingers crossed)

Regards, Avi

On 1/3/08, Aaron Knister <aaron at iges.org> wrote:
>
> SELinux is killing your lustre setup. See this article on how to disable it:
> http://www.redhat.com/docs/manuals/enterprise/RHEL-4-Manual/selinux-guide/rhlcommon-section-0068.html#RHLCOMMON-SECTION-0094
>
> A reboot will be required. That should do the trick.
>
> -Aaron
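With "lctl ping" working, the remaining steps are the ones the thread started with: mounting the targets, then the clients. A sketch of the sequence, reusing the device names, mount points, fsname (spfs) and MGS NID from earlier in the thread, and again typing "@" where the archive renders " at "; it assumes the targets were formatted as shown in the earlier messages:

# On the MGS/MDT node (x-math20)
mount -t lustre /dev/hdb /mnt/test/mdt

# On each OST node (formatted with --ost --mgsnode=132.66.176.211@tcp0)
mount -t lustre /dev/hdb1 /mnt/test/ost1

# On a client, mount the whole filesystem by MGS NID and fsname
mkdir -p /mnt/spfs
mount -t lustre 132.66.176.211@tcp0:/spfs /mnt/spfs

# Quick sanity checks once mounted
lctl dl
df -h /mnt/spfs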