Yujun Wu
2008-Jun-19 00:04 UTC
[Lustre-discuss] Help with problem mounting Lustre from another network
Hello,

Could somebody please give me a hint on this?

This is my first try with Lustre. I installed everything on a single node following the Lustre quick start guide, http://wiki.lustre.org/index.php?title=Lustre_Quick_Start, with the new version 1.6.5. When I mounted the client on the same node, everything worked fine.

Later, I tried to mount the client from a separate network and got the following error:

>mount -t lustre 128.227.89.181@tcp:/testfs /mnt/testfs
mount.lustre: mount 128.227.89.181@tcp:/testfs at /mnt/testfs failed: Cannot send after transport endpoint shutdown

The error message on the server side is:

Jun 18 19:52:04 olivine kernel: LustreError: 5682:0:(socklnd_cb.c:2166:ksocknal_recv_hello()) Error -11 reading HELLO from 128.227.221.35
Jun 18 19:52:04 olivine kernel: audit(1213833124.489:54): avc: denied { rawip_send } for pid=5682 comm="socknal_cd02" saddr=128.227.89.181 src=988 daddr=128.227.221.35 dest=1023 netif=eth0 scontext=system_u:object_r:unlabeled_t tcontext=system_u:object_r:netif_eth0_t tclass=netif

This is the result using lctl:

>lctl ping 128.227.89.181@tcp
failed to ping 128.227.89.181@tcp: Input/output error

This is the relevant configuration in /etc/modprobe.conf on both the client node and the server node (MDS+OSTs):

# Networking options, see /sys/module/lnet/parameters
options lnet networks=tcp
# (the llite module has been renamed to lustre)
# end Lustre modules

The server's IP address is 128.227.89.181 and the client's IP address is 128.227.221.35. They are on two different networks.

Thanks in advance for any help you can give.

Regards,
Yujun
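As a minimal sketch of the usual first diagnostics for this kind of LNET-over-TCP problem (not part of the original post; the eth0 interface name is an assumption), the NIDs each node is actually using can be listed and pinged, and LNET can be pinned to a specific interface if a node has several:

    # On each node, confirm which NID(s) LNET has configured
    lctl list_nids

    # From the client, try to reach the server's NID directly
    lctl ping 128.227.89.181@tcp

    # /etc/modprobe.conf: pin LNET to a specific interface if there are several
    options lnet networks=tcp0(eth0)

If lctl ping already fails, the problem is at the LNET/networking layer rather than in the filesystem configuration, which is consistent with the errors shown above.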
Brian J. Murrell
2008-Jun-19 03:12 UTC
[Lustre-discuss] Help with problem mounting Lustre from another network
On Wed, 2008-06-18 at 20:04 -0400, Yujun Wu wrote:
> Hello,

Hi,

> The error message from the server side is:
>
> Jun 18 19:52:04 olivine kernel: LustreError: 5682:0:(socklnd_cb.c:2166:ksocknal_recv_hello()) Error -11 reading HELLO from 128.227.221.35
> Jun 18 19:52:04 olivine kernel: audit(1213833124.489:54): avc: denied { rawip_send } for pid=5682 comm="socknal_cd02" saddr=128.227.89.181 src=988 daddr=128.227.221.35 dest=1023 netif=eth0 scontext=system_u:object_r:unlabeled_t tcontext=system_u:object_r:netif_eth0_t tclass=netif

I am sooooo glad you included this kernel "audit" message. You need to disable SELinux, AppArmor, or whatever MAC/RBAC tools you are running on your Lustre machines.

b.
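For reference, a minimal sketch of how SELinux is typically taken out of the picture on an RHEL/CentOS-style node (adapt to your distribution; not part of Brian's message):

    getenforce        # check the current mode (Enforcing/Permissive/Disabled)
    setenforce 0      # switch to permissive mode immediately, until the next reboot

    # To make the change permanent, set this in /etc/selinux/config and reboot:
    SELINUX=disabled

Switching to permissive mode first is a reasonable test: if the mount starts working, the SELinux denial in the audit log was indeed the blocker.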
Yujun Wu
2008-Jun-19 19:36 UTC
[Lustre-discuss] Help with problem mounting Lustre from another network
Hello Brian,

Thanks for your info. Yes, after disabling SELinux and adding the accept option for lnet (a tip from a local colleague), everything works fine.

Thanks again for your help.

Regards,
Yujun
Murrell" <Brian.Murrell at Sun.COM> > Subject: Re: [Lustre-discuss] Help with problem mounting Lustre from > another network > To: lustre-discuss at lists.lustre.org > Message-ID: <1213845163.18266.115.camel at pc.ilinx> > Content-Type: text/plain; charset="us-ascii" > > On Wed, 2008-06-18 at 20:04 -0400, Yujun Wu wrote: > > Hello, > > Hi, > > > The error message from the server side is: > > > > Jun 18 19:52:04 olivine > > kernel: LustreError: 5682:0:(socklnd_cb.c:2166:ksocknal_recv_hello()) Error > > -11 reading HELLO from 128.227.221.35 > > Jun 18 19:52:04 olivine kernel: audit(1213833124.489:54): avc: denied { > > rawip_send } for pid=5682 comm="socknal_cd02" saddr=128.227.89.181 > > src=988 daddr=128.227.221.35 dest=1023 netif=eth0 > > scontext=system_u:object_r:unlabeled_t > > tcontext=system_u:object_r:netif_eth0_t tclass=netif > > I am sooooo glad you included this kernel "audit" message. You need to > disable selinux or apparmor or whatever MAC/RBAC tools you are running > on your Lustre machines. > > b. > > -------------- next part -------------- > A non-text attachment was scrubbed... > Name: not available > Type: application/pgp-signature > Size: 189 bytes > Desc: This is a digitally signed message part > Url : http://lists.lustre.org/pipermail/lustre-discuss/attachments/20080618/dab98c82/attachment-0001.bin > > ------------------------------ > > Message: 3 > Date: Wed, 18 Jun 2008 23:31:53 -0600 > From: Andreas Dilger <adilger at sun.com> > Subject: Re: [Lustre-discuss] How do I recover files from partial > lustre disk? > To: megan <dobsonunit at gmail.com> > Cc: Lustre User Discussion Mailing List > <lustre-discuss at lists.lustre.org> > Message-ID: <20080619053153.GM3726 at webber.adilger.int> > Content-Type: text/plain; charset=iso-8859-1 > > On Jun 18, 2008 14:33 -0700, megan wrote: > > shell-prompt> mount -t lustre /dev/md1 /srv/lustre/mds/crew4-MDT0000 > > > > No errors so far. > > > > shell-prompt> lctl > > dl (Found my nids of failed JBODs) > > device 14 > > deactivate > > > > device 16 > > deactivate > > > > quit > > > > On one of our servers, I mounted the lustre disk /crew4. > > The disk will hang a UNIX df or ls command. > > You actually need to do the "deactivate" step on the client. Then > "ls" will get EIO on the file, and "df" will return data only from > the available OSTs. > > > However.... > > lfs find --ost crew4-OST0001_UUID --ost crew4-OST0003_UUID --ost crew4- > > OST0004_UUID -print /crew4 > > > > Did indeed provide a list of files. I saved the list to a text > > file. I will next see if I am able to copy a single file to a new > > location. > > > > Thank you again Andreas for this incredibly useful information. Do > > you/Sun do paid Lustre consulting by any chance? > > Yes, in fact we do... > > > On Jun 18, 12:48?am, Andreas Dilger <adil... at sun.com> wrote: > > > On Jun 16, 2008 ?15:37 -0700, megan wrote: > > > > > > > I am using Lustre 2.6.18-53.1.13.el5_lustre.1.6.4.3smp kernel on a > > > > CentOS 5 linux x86_64 linux box. > > > > We had a hardware problem that caused the underlying ext3 partition > > > > table to completely blow up. ?This is resulting in only three of five > > > > OSTs being mountable. ? The main lustre disk of this unit cannot be > > > > mounted because the MDS knows that two of its parts are missing. > > > > > > It should be possible to mount a Lustre filesystem with OSTs that > > > are not available. ?However, access to files on the unavailable > > > OSTs will cause the process to wait on OST recovery. 
> > > > > > > > > > > > > The underlying set-up is JBOD hw that is passed to the linux OS, via > > > > an LSI 8888ELP card in this case, as a simple device, ie. sde, > > > > sdf,... ? ?The simple devices were partitioned using parted and > > > > formatted ext3 then lustre was built on top of the five ext3 units. > > > > There was no striping done across units/JBODS. ? Three of the five > > > > units passed an e2fsck and an lfsck. ?Those remaining units are > > > > mounted as such: > > > > /dev/sdc ? ? ? ? ? ? ? 13T ?6.3T ?5.7T ?53% /srv/lustre/OST/crew4- > > > > OST0003 > > > > /dev/sdd ? ? ? ? ? ? ? 13T ?6.3T ?5.7T ?53% /srv/lustre/OST/crew4- > > > > OST0004 > > > > /dev/sdf ? ? ? ? ? ? ? 13T ?6.2T ?5.8T ?52% /srv/lustre/OST/crew4- > > > > OST0001 > > > > > > > Being that it is unlikely that we shall be able to recover the > > > > underlying ext3 on the other two units, is there some method by which > > > > I might try to rescue the data from these last three units mounted > > > > currently on the OSS? > > > > > > > Any and all suggestion genuinely appreciated. > > > > > > The recoverability of your data depends heavily on the striping of > > > the individual files (i.e. the default striping). ?If your files have > > > a default stripe_count = 1, then you can probably recover 3/5 of the > > > files in the filesystem. ?If your default stripe_count = 2, then you > > > can probably only recover 1/5 of the files, and if you have a higher > > > stripe_count you probably can''t recover any files. > > > > > > What you need to do is to mount one of the clients and mark the > > > corresponding OSTs inactive with: > > > > > > ? ? ? ? lctl dl ? ?# get device numbers for OSC 0000 and OSC 0002 > > > ? ? ? ? lctl --device N deactivate > > > > > > Then, instead of the clients waiting for the OSTs to recover the > > > client will get an IO error when it accesses files on the failed OSTs. > > > > > > To get a list of the files that are on the good OSTs run: > > > > > > ? ? ? ? lfs find --ost crew4-OST0001_UUID --ost crew4-OST0003_UUID > > > ? ? ? ? ? ? ? ? ?--ost crew4-OST0004_UUID {mountpoint} > > > > > > Cheers, Andreas > > > -- > > > Andreas Dilger > > > Sr. Staff Engineer, Lustre Group > > > Sun Microsystems of Canada, Inc. > > > > > > _______________________________________________ > > > Lustre-discuss mailing list > > > Lustre-disc... at lists.lustre.orghttp://lists.lustre.org/mailman/listinfo/lustre-discuss > > _______________________________________________ > > Lustre-discuss mailing list > > Lustre-discuss at lists.lustre.org > > http://lists.lustre.org/mailman/listinfo/lustre-discuss > > Cheers, Andreas > -- > Andreas Dilger > Sr. Staff Engineer, Lustre Group > Sun Microsystems of Canada, Inc. > > > > ------------------------------ > > Message: 4 > Date: Thu, 19 Jun 2008 15:26:49 +0800 > From: "Changer Van" <changerv at gmail.com> > Subject: [Lustre-discuss] lustre 1.4 with ibhost stack issue > To: lustre-discuss at clusterfs.com > Message-ID: > <9fa3c2e50806190026p6351959eu31e6b9011cb50f5b at mail.gmail.com> > Content-Type: text/plain; charset="iso-8859-1" > > Hi all, > > I installed lustre 1.4 with voltaire ibhost stack on RHEL4. > The kernel version is 2.6.9-55.0.9.el_lustre.1.4.11.1custom. > > The system was stopped by stopping voltaireibhost process > when it is going down. > > Stopping voltaireibhost... [ press enter twice ] > > I had to press enter twice to bring it down > and got the following messages on screen: > > IPOIB_UD: The del command > IPOIB_UD: Thread going out ... 
> IPOIB_UD: leave del command > rmmod ... > ... > IPOIB_UD: unregister units > IPOIB_UD: destroys pool > > I also had to press enter twice to bring the system up. > Then I turned off the init.d service of the voltaireibhost > and started it manually after system reboot. It was fine. > > What is wrong with this lustre machine? > > Any suggestion would be greatly appreciated. > > -- > Regards, > Changer > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: http://lists.lustre.org/pipermail/lustre-discuss/attachments/20080619/528a8874/attachment-0001.html > > ------------------------------ > > Message: 5 > Date: Thu, 19 Jun 2008 12:44:56 +0400 > From: Nikita Danilov <Nikita.Danilov at Sun.COM> > Subject: Re: [Lustre-discuss] Lustre and memory-mapped I/O > To: "Huang, Eric" <eric.huang at intel.com> > Cc: lustre-discuss at clusterfs.com > Message-ID: <18522.7304.410714.44837 at gargle.gargle.HOWL> > Content-Type: text/plain; charset=us-ascii > > Huang, Eric writes: > > Hello, > > > > > Does Lustre support memory mapped I/O and direct I/O? > > yes, it supports both. Can you run your application under strace to see > how exactly it fails to create a directory? > > > > > I am trying to run Nastran using Lustre but it always reported failure > > to create a directory. Since Nastran does a lot of memory mapped I/O, I > > was wondering if it was the cause. > > > > I guess a good question to ask is that does Lustre support all POSIX > > file system operations? > > > > Thanks a lot. > > > > Eric > > Nikita. > > > ------------------------------ > > Message: 6 > Date: Thu, 19 Jun 2008 02:56:44 -0600 > From: Andreas Dilger <adilger at sun.com> > Subject: Re: [Lustre-discuss] Lustre and memory-mapped I/O > To: "Huang, Eric" <eric.huang at intel.com> > Cc: lustre-discuss at clusterfs.com > Message-ID: <20080619085644.GR3726 at webber.adilger.int> > Content-Type: text/plain; charset=us-ascii > > On Jun 17, 2008 22:39 -0700, Huang, Eric wrote: > > Does Lustre support memory mapped I/O and direct I/O? > > Yes, it does support both of these. > > > I am trying to run Nastran using Lustre but it always reported failure > > to create a directory. Since Nastran does a lot of memory mapped I/O, I > > was wondering if it was the cause. > > ??? I''m not sure how mmap and direct I/O relate to creating a directory? > > > I guess a good question to ask is that does Lustre support all POSIX > > file system operations? > > Yes. > > Cheers, Andreas > -- > Andreas Dilger > Sr. Staff Engineer, Lustre Group > Sun Microsystems of Canada, Inc. > > > > ------------------------------ > > Message: 7 > Date: Thu, 19 Jun 2008 06:19:46 -0400 > From: Charles Taylor <taylor at hpc.ufl.edu> > Subject: Re: [Lustre-discuss] Lustre 1.6.5 install problem > To: Johnlya <johnlya at gmail.com> > Cc: lustre-discuss at clusterfs.com > Message-ID: <CE9459CB-6FEA-4AD5-A4B4-319B4F09D9B9 at hpc.ufl.edu> > Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes > > Lustre doesn''t know where your ib modules symbols are. When you > configured lustre (in the build sense) you pointed it to a patched > kernel tree. In that directory is a Module.symvers file devoid of ib > module symbols. You should also have a Module.symvers in your /usr/ > src/ofa_kernel directory (assuming you built OFED as well). So... > > cat /usr/src/ofa_kernel/Module.symvers >> <patched_kernel_dir>/ > Module.symvers > > and run "make install" again and it should be happy. For a 2.6.9 > kernel, you probably need OFED 1.2. 
> > Charlie Taylor > UF HPC Center > > On Jun 18, 2008, at 5:55 AM, Johnlya wrote: > > > Install step is: > > rpm -Uvh --nodeps e2fsprogs-devel-1.40.7.sun3-0redhat.x86_64.rpm > > rpm -Uvh e2fsprogs-1.40.7.sun3-0redhat.x86_64.rpm > > cd ../PyXML/ > > tar -zxvf PyXML-0.8.4.tar.gz > > cd PyXML-0.8.4 > > python setup.py build > > python setup.py install > > cd ../../Expect > > rpm -ivh expect-5.42.1-1.src.rpm > > cd ../1.6.5/ > > rpm -ivh kernel-lustre-smp-2.6.9-67.0.7.EL_lustre.1.6.5.x86_64.rpm > > rpm -ivh lustre-ldiskfs-3.0.4-2.6.9_67.0.7.EL_lustre. > > 1.6.5smp.x86_64.rpm > > rpm -ivh lustre-modules-1.6.5-2.6.9_67.0.7.EL_lustre. > > 1.6.5smp.x86_64.rpm > > > > when install lustre-modules, it displays warning: > > WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/ > > lustre/ko2iblnd.ko needs unknown symbol ib_create_cq > > WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/ > > lustre/ko2iblnd.ko needs unknown symbol rdma_resolve_addr > > WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/ > > lustre/ko2iblnd.ko needs unknown symbol ib_dereg_mr > > WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/ > > lustre/ko2iblnd.ko needs unknown symbol rdma_reject > > WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/ > > lustre/ko2iblnd.ko needs unknown symbol rdma_disconnect > > WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/ > > lustre/ko2iblnd.ko needs unknown symbol rdma_resolve_route > > WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/ > > lustre/ko2iblnd.ko needs unknown symbol rdma_bind_addr > > WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/ > > lustre/ko2iblnd.ko needs unknown symbol rdma_create_qp > > WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/ > > lustre/ko2iblnd.ko needs unknown symbol ib_destroy_cq > > WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/ > > lustre/ko2iblnd.ko needs unknown symbol rdma_create_id > > WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/ > > lustre/ko2iblnd.ko needs unknown symbol rdma_listen > > WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/ > > lustre/ko2iblnd.ko needs unknown symbol rdma_destroy_qp > > WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/ > > lustre/ko2iblnd.ko needs unknown symbol ib_get_dma_mr > > WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/ > > lustre/ko2iblnd.ko needs unknown symbol ib_alloc_pd > > WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/ > > lustre/ko2iblnd.ko needs unknown symbol rdma_connect > > WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/ > > lustre/ko2iblnd.ko needs unknown symbol ib_modify_qp > > WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/ > > lustre/ko2iblnd.ko needs unknown symbol rdma_destroy_id > > WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/ > > lustre/ko2iblnd.ko needs unknown symbol rdma_accept > > WARNING: /lib/modules/2.6.9-67.0.7.EL_lustre.1.6.5smp/kernel/net/ > > lustre/ko2iblnd.ko needs unknown symbol ib_dealloc_pd > > > > Please tell me why? 
> > Thank you > > _______________________________________________ > > Lustre-discuss mailing list > > Lustre-discuss at lists.lustre.org > > http://lists.lustre.org/mailman/listinfo/lustre-discuss > > > > ------------------------------ > > _______________________________________________ > Lustre-discuss mailing list > Lustre-discuss at lists.lustre.org > http://lists.lustre.org/mailman/listinfo/lustre-discuss > > > End of Lustre-discuss Digest, Vol 29, Issue 34 > ********************************************** >