Michael Robbert
2010-Apr-12 23:33 UTC
[Lustre-discuss] Lustre module not loading on client mount
I am trying to configure a Lustre 1.8.2 client on a CentOS 5.4 machine. I have compiled from source into RPMs, and all four RPMs are installed (lustre, -modules, -tests, and -source). The lustre module loads fine manually with "modprobe lustre", but I cannot get the filesystems to mount automatically at boot. I have added the following to /etc/modprobe.conf:

options lnet networks=o2ib0(ib0)

and these are the entries in my /etc/fstab:

172.16.34.1@o2ib:/home /lustre/home lustre auto,_netdev 1 2
172.16.34.1@o2ib:/scratch /lustre/scratch lustre auto,_netdev 1 2

I have a similar setup with a Lustre 1.6.7.2 client running on RHEL 4.5, and it loads fine there.

What am I missing?

Thanks,
Mike Robbert
Kit Westneat
2010-Apr-13 04:07 UTC
[Lustre-discuss] Lustre module not loading on client mount
Hey Mike,

Are there any messages in dmesg on boot? I've seen it on occasion where the IB takes a second to actually start. If that's the case, you might need to add mounts to rc.local, or try to get openibd to start earlier.

- Kit
Michael Robbert
2010-Apr-14 17:42 UTC
[Lustre-discuss] Lustre module not loading on client mount
Kit,

I thought that it might be a timing issue, but I added mount commands to rc.local and it didn't help. The odd thing is that it does seem to work on subsequent reboots. I haven't done extensive testing to see whether that works all the time. The other odd thing is that if the filesystems don't mount on boot, a manual mount command does not work without doing "modprobe lustre" first. This is what I see in that case:

[root@compute-2-1 ~]# mount -a
mount.lustre: mount 172.16.34.1@o2ib:/home at /lustre/home failed: No such device
Are the lustre modules loaded?
Check /etc/modprobe.conf and /proc/filesystems
Note 'alias lustre llite' should be removed from modprobe.conf
mount.lustre: mount 172.16.34.1@o2ib:/scratch at /lustre/scratch failed: No such device
Are the lustre modules loaded?
Check /etc/modprobe.conf and /proc/filesystems
Note 'alias lustre llite' should be removed from modprobe.conf

Here are some dmesg entries from a boot that does not mount the filesystems:

ADDRCONF(NETDEV_UP): eth0: link is not ready
bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
ADDRCONF(NETDEV_UP): ib0: link is not ready
ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
Lustre: OBD class driver, http://www.lustre.org/
Lustre: Lustre Version: 1.8.2
Lustre: Build Version: 1.8.2-20100122190848-PRISTINE-2.6.18-164.15.1.el5
ko2iblnd: disagrees about version of symbol ib_fmr_pool_unmap
ko2iblnd: Unknown symbol ib_fmr_pool_unmap
... Lots more ko2iblnd errors here (is this part of the problem or a red herring?) ...
ko2iblnd: disagrees about version of symbol ib_fmr_pool_map_phys
ko2iblnd: Unknown symbol ib_fmr_pool_map_phys
LustreError: 3288:0:(api-ni.c:1043:lnet_startup_lndnis()) Can't load LND o2ib, module ko2iblnd, rc=256
LustreError: 3288:0:(events.c:729:ptlrpc_init_portals()) network initialisation failed
LustreError: 165-2: Nothing registered for client mount! Is the 'lustre' module loaded?
LustreError: 3381:0:(obd_mount.c:2042:lustre_fill_super()) Unable to mount (-19)

Thanks,
Mike
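The "No such device" error (-19) in the output above means the kernel has no filesystem type named lustre registered, which is what mount.lustre is hinting at when it points to /proc/filesystems. A minimal sketch of a pre-mount check is shown here; the function name fs_registered and its optional second argument are illustrative assumptions, not part of Lustre or of anything posted in this thread.

```shell
# Return 0 if filesystem type $1 is listed in the table file $2.
# $2 defaults to /proc/filesystems; it is a parameter only so the
# function can be exercised against a sample file (assumption).
fs_registered() {
    fstype=$1
    table=${2:-/proc/filesystems}
    # Lines look like "nodev<TAB>lustre" or "<TAB>ext3", so the
    # filesystem name is always the last field.
    awk -v t="$fstype" '$NF == t { found = 1 } END { exit !found }' "$table"
}

# Possible use from rc.local (commented out; adjust before use):
#   fs_registered lustre || modprobe lustre
#   mount -a -t lustre
```

If modprobe itself fails, as in the dmesg excerpt above, this check only moves the failure earlier; it does not repair the underlying module problem.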
Nathan Dauchy
2010-Apr-14 18:00 UTC
[Lustre-discuss] Lustre module not loading on client mount
Michael Robbert wrote:
> Kit,
> I thought that it may be a timing issue, but I added mount commands to rc.local and it didn't help.

Robert,

I'm not sure of the root cause of your mount problems, but we were also hitting a timing problem when mounting file systems over Infiniband at boot time. To avoid it, since the IB may still not be initialized when rc.local runs, the solution I used was to add the following to the "start)" section of /etc/rc.d/init.d/netfs. You could put something similar in rc.local if you prefer.

# Spin until we find an "ACTIVE" IB device
if [ -d /sys/class/infiniband ]; then
    tries=1
    maxtries=10
    delay=5
    while [ $tries -le $maxtries ]; do
        # Stop waiting as soon as any port reports ACTIVE
        grep -q ACTIVE /sys/class/infiniband/*/ports/*/state 2>/dev/null && break
        logger -s -t netfs "WARNING: No \"ACTIVE\" Infiniband ports found: try $tries/$maxtries, sleep $delay"
        sleep $delay
        (( tries++ ))
        [ $tries -gt $maxtries ] && logger -s -t netfs "ERROR: No \"ACTIVE\" Infiniband ports found."
    done
fi

Hope this helps!

-Nathan
Kit Westneat
2010-Apr-15 04:21 UTC
[Lustre-discuss] Lustre module not loading on client mount
Hey Mike,

That's pretty odd; it looks like the o2ib module has a symbol mismatch with the OFED driver. I'm surprised it works at all. Can you send the dmesg output after modprobe lustre + mounting, as well as the lctl list_nids output?

Thanks,
Kit
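The symbol mismatch shows up in the kernel log as "disagrees about version of symbol" lines. As a rough sketch, the helper below summarizes which modules logged such lines and how many times; the function name modversion_mismatches is an illustrative assumption, and it only reads log text, so it says nothing about which side of the mismatch is stale.

```shell
# Read kernel log text on stdin and print "<module> <count>" for every
# module that logged a symbol version mismatch.
# Example (assumed usage):  dmesg | modversion_mismatches
modversion_mismatches() {
    awk '/disagrees about version of symbol/ {
        # The module name is the word just before "disagrees",
        # with a trailing colon; this also tolerates a timestamp prefix.
        for (i = 1; i < NF; i++) {
            if ($(i + 1) == "disagrees") {
                m = $i
                sub(/:$/, "", m)
                n[m]++
                break
            }
        }
    } END { for (m in n) print m, n[m] }'
}
```

Comparing "modinfo -F vermagic ko2iblnd" with "uname -r" is a reasonable next step, although vermagic can match while individual symbol versions still differ.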
Michael Robbert
2010-Apr-15 19:56 UTC
[Lustre-discuss] Lustre module not loading on client mount
I think that I've discovered the problem: it is the OFED Roll that I'm using. When a node is first built, it recompiles the OFED modules for the current kernel. I'm still deciphering the actual sequence of events, but I think that I need to add a reboot at the end of the process.

Mike
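If the first-boot rebuild theory is right, modules on disk would have been rewritten after the node came up. One hedged way to look for that is to list module files newer than a reference file. This is only a sketch: the function name modules_newer_than and the stamp file in the usage comment are hypothetical, and nothing here was confirmed in the thread as the actual sequence of events.

```shell
# List kernel module files under directory $2 that were modified
# after reference file $1.
modules_newer_than() {
    ref=$1
    dir=$2
    find "$dir" -type f -name '*.ko' -newer "$ref"
}

# Possible use (commented out; /var/run/boot.stamp is a hypothetical
# file that would have to be touched early in boot):
#   modules_newer_than /var/run/boot.stamp "/lib/modules/$(uname -r)"
```

Any output would suggest that modules were replaced after boot, in which case a reboot (or a full unload and reload of the IB and Lustre stack) is the simplest way to get a consistent set loaded.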