Brian Andrus
2010-Mar-15 17:53 UTC
[Lustre-discuss] adding 1.8.2 lustre client to 1.6 install with infiniband
Hello,

Scene: We have lustre 1.6 set up and running over tcp and ib. Running CentOS 5.1, separate networks. I have a new node I want to install with the newer kernel (2.6.18-164.11.1.el5). I have installed the stock kernel and the appropriate ib modules, and am running openib on it. I have installed the client modules and client tools (lustre-client-modules-1.8.2-2.6.18_164.11.1.el5_lustre.1.8.2 and lustre-client-1.8.2-2.6.18_164.11.1.el5_lustre.1.8.2), downloaded as RPMs from the lustre website.

My difficulty: I CAN mount over TCP without a problem. I CANNOT mount over infiniband. I get:

--------------------------------
# mount -t lustre nas-ib-1-2@o2ib:/scratch /scratch
mount.lustre: mount nas-ib-1-2@o2ib:/scratch at /scratch failed: Cannot send after transport endpoint shutdown
---------------------------------
# cat /etc/modprobe.conf
alias scsi_hostadapter aacraid
alias scsi_hostadapter1 ata_piix
alias eth0 e1000e
alias ib0 ib_ipoib
options lnet ip2nets="o2ib0(ib0) 192.168.*.*; tcp(eth0) 10.1.*.*"
-----------------------------------------
# mount -t lustre nas-1-2@tcp:/scratch /scratch
# df -h /scratch
Filesystem            Size  Used Avail Use% Mounted on
nas-1-2@tcp:/scratch   22T  8.0T   13T  39% /scratch
--------------------------------------------
# tail /var/log/messages
Mar 15 10:49:01 compute-1-1 kernel: LustreError: 6539:0:(lib-move.c:2436:LNetPut()) Error sending PUT to 12345-192.168.1.95@tcp: -113
Mar 15 10:49:01 compute-1-1 kernel: LustreError: 6539:0:(events.c:66:request_out_callback()) @@@ type 4, status -113 req@ffff81025c861400 x1330300123611200/t0 o250->MGS@MGC192.168.1.95@tcp_0:26/25 lens 368/584 e 0 to 1 dl 1268675346 ref 2 fl Rpc:N/0/0 rc 0/0
Mar 15 10:49:01 compute-1-1 kernel: LustreError: 7075:0:(client.c:848:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@ffff81025c861000 x1330300123611201/t0 o101->MGS@MGC192.168.1.95@tcp_0:26/25 lens 296/544 e 0 to 1 dl 0 ref 1 fl Rpc:/0/0 rc 0/0
Mar 15 10:49:01 compute-1-1 kernel: LustreError: 15c-8: MGC192.168.1.95@tcp: The configuration from log 'scratch-client' failed (-108). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
Mar 15 10:49:01 compute-1-1 kernel: LustreError: 7075:0:(llite_lib.c:1176:ll_fill_super()) Unable to process log: -108
Mar 15 10:49:01 compute-1-1 kernel: LustreError: 7075:0:(obd_mount.c:2042:lustre_fill_super()) Unable to mount (-108)
-------------------------------------

Any ideas on troubleshooting this would be greatly appreciated.

Brian Andrus
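
For readers following the ip2nets line in /etc/modprobe.conf above, here is a rough Python sketch of how such a rule set selects networks from a node's local IP addresses. It is a simplified model, not the LNet parser: it assumes only exact octets and `*` wildcards (no ranges or bracket expressions), and the function names and the sample local addresses are hypothetical, chosen only for illustration.

```python
def parse_ip2nets(spec):
    """Split an ip2nets string into (network, interface, patterns) tuples.

    Simplified: entries are separated by ';', each entry is
    'net(iface) pattern [pattern ...]'.
    """
    rules = []
    for entry in spec.split(";"):
        parts = entry.split()
        if not parts:
            continue
        net_part, patterns = parts[0], parts[1:]
        net, _, iface = net_part.partition("(")
        iface = iface.rstrip(")") or None  # None when no interface is named
        rules.append((net, iface, patterns))
    return rules


def octets_match(pattern, ip):
    """Compare a dotted pattern to a dotted IPv4 address, octet by octet."""
    p_octets = pattern.split(".")
    ip_octets = ip.split(".")
    if len(p_octets) != 4 or len(ip_octets) != 4:
        return False
    return all(p == "*" or p == o for p, o in zip(p_octets, ip_octets))


def networks_for(spec, local_ips):
    """Return the (network, interface) pairs whose patterns match a local IP."""
    selected = []
    for net, iface, patterns in parse_ip2nets(spec):
        if any(octets_match(p, ip) for p in patterns for ip in local_ips):
            selected.append((net, iface))
    return selected


if __name__ == "__main__":
    # The rule string is taken from the modprobe.conf shown above.
    spec = "o2ib0(ib0) 192.168.*.*; tcp(eth0) 10.1.*.*"
    # Hypothetical local addresses; substitute the node's real ones.
    print(networks_for(spec, ["10.1.1.50", "192.168.1.50"]))
    print(networks_for(spec, ["10.1.1.50"]))
```

Under this model, the o2ib0 network is selected only when some local address matches 192.168.*.*, so a node with only a 10.1.x.x address would end up with tcp alone. Whether that is what is happening on the node in this post is not established by the logs shown.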