Hello to Everyone!

I have a question to which I think I know the answer, but I am seeking confirmation (re-assurance?).

I have built a RHEL 6.2 system with lustre-2.1.2. I am using the rpms from the Whamcloud site for linux kernel 2.6.32-220.17.1.el6_lustre.x86_64 along with the version-matching lustre, lustre-modules, lustre-ldiskfs, and kernel-devel. I also have from the Whamcloud site kernel-ib-1.8.5-2.6.32-220.17.1.el6_lustre.x86_64 and the related kernel-ib-devel for same.

The lustre file system works properly for TCP.

I would like to use InfiniBand. The system has a new Mellanox card for which mlxn1 firmware and drivers were installed. After this was done (I cannot speak to before) the IB network will come up on boot and copy and ping in a traditional network fashion.

Hard Part: I would like to run the lustre file system on the IB (ib0). I re-created the lustre network to use /etc/modprobe.d/lustre.conf pointing to o2ib in place of tcp0. I rebuilt the mgs/mdt and all osts to use the IB network (the mgs/mds --failnode=[new_IB_addr] and the osts point to mgs on IB net). When I "modprobe lustre" to start the system I receive error messages stating that there are Input/Output errors on lustre modules fld.ko, fid.ko, mdc.ko, osc.ko, and lov.ko. The lustre.ko cannot be started. A look in /var/log/messages reveals many "Unknown symbol" and "Disagrees about version of symbol" messages from the ko2iblnd module.

A "modprobe --dump-modversions /path/to/kernel/ko2iblnd.ko" shows it pointing to the Module.symvers of the lustre kernel.

Am I correct in thinking that because of the specific Mellanox IB hardware I have (with its own /usr/src/ofa_kernel/Module.symvers file), I have to build Lustre-2.1.2 from tarball to use the "configure --with-o2ib=/usr/src/ofa_kernel...." mandating that this system use the ofa_kernel-1.8.5 modules and not the OFED 1.8.5 from the kernel-ib rpms to which Lustre defaults in the Linux kernel?

Is a rebuild of lustre from source mandatory, or is there a way in which I may point to the appropriate symbols needed by the ko2iblnd.ko?

Enjoy the Thanksgiving holiday for those U.S. readers. To everyone else in the world, have a great weekend!

Megan Larko
Hewlett-Packard
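One way to see exactly which symbols disagree is to compare the CRCs that ko2iblnd.ko expects against the ones a given Module.symvers exports. The following is only a minimal sketch: the function name `check_symvers` is made up for illustration, and it assumes the usual layouts, namely "CRC symbol" per line from `modprobe --dump-modversions` and "CRC symbol module export-type" per line in Module.symvers.

```shell
#!/bin/sh
# check_symvers (hypothetical helper): compare expected symbol CRCs against
# the CRCs exported by a Module.symvers file.
#   $1: saved output of `modprobe --dump-modversions <module.ko>`
#       (assumed layout: CRC <whitespace> symbol)
#   $2: a Module.symvers file
#       (assumed layout: CRC <tab> symbol <tab> module <tab> export-type)
check_symvers() {
    awk '
        # First file: remember the CRC the module was built against.
        NR == FNR { want[$2] = $1; next }
        # Second file: report symbols whose exported CRC differs.
        ($2 in want) {
            seen[$2] = 1
            if (want[$2] != $1)
                printf "MISMATCH %s module=%s symvers=%s\n", $2, want[$2], $1
        }
        # Symbols the module needs that this symvers file does not export.
        END {
            for (s in want)
                if (!(s in seen))
                    printf "MISSING %s\n", s
        }
    ' "$1" "$2" | sort
}
```

A possible invocation, using the paths mentioned in the post: save `modprobe --dump-modversions /path/to/kernel/ko2iblnd.ko` to a file such as wanted.txt, then run `check_symvers wanted.txt /usr/src/ofa_kernel/Module.symvers`. MISMATCH lines correspond to "disagrees about version of symbol". MISSING only means that particular file does not export the symbol, so core-kernel symbols may be listed when checking against an OFED-only Module.symvers.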
Hi Megan,

One thing to check is whether the existing IB drivers are installed on your system. They will conflict with the MLX ones. I'm not sure how Intel is building against IB these days, but if they're using stock and you're trying to use MLX, you're going to run into these symbol errors. If that's the case, then recompiling against the correct driver set is the fix here.

-cf

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss at lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
I've had to rebuild against the Mellanox OFED every time I change Lustre or OFED versions. It's a bit of a catch-22 situation because you have to build the Mellanox OFED against the Lustre kernel, install the Mellanox OFED, then rebuild the Lustre modules against the Mellanox OFED.

The procedure I use is as follows...

* install upgraded Lustre kernel and kernel-devel rpms
* rebuild Mellanox OFED against Lustre kernel
  - mount -o loop MLNX_OFED.iso /root/mnt
  - /root/mnt/docs/mlnx_add_kernel_support.sh -i /root/MLNX_OFED.iso
* install Mellanox OFED from rebuilt MLNX_OFED.iso
* install kernel-ib-devel from rebuilt MLNX_OFED.iso

Now rebuild the lustre-modules RPM to get a ko2iblnd.ko which is compatible with the Mellanox kernel-ib drivers...

* cd /usr/src/lustre-x.x.x
* ./configure --with-o2ib=/usr/src/openib
* make rpms

Ron.

-----Original Message-----
From: lustre-discuss-bounces at lists.lustre.org [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Ms. Megan Larko
Sent: November 20, 2012 4:21 PM
To: Lustre User Discussion Mailing List
Subject: [Lustre-discuss] lo2iblnd and Mellanox IB question
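Ron's procedure can be wrapped in a small script so the order of operations is harder to get wrong. This is a sketch under stated assumptions, not a tested build recipe: the `run`, `rebuild_ofed`, and `rebuild_lustre` helpers and the variable names are hypothetical, the paths are the ones from his message (including the elided lustre-x.x.x), and the two RPM install steps are left manual because the location of the rebuilt ISO is not given. It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Dry run by default: print each command instead of executing it.
DRY_RUN=${DRY_RUN:-1}
ISO=${ISO:-/root/MLNX_OFED.iso}                  # path from the post
MNT=${MNT:-/root/mnt}                            # path from the post
LUSTRE_SRC=${LUSTRE_SRC:-/usr/src/lustre-x.x.x}  # version elided in the post
O2IB_PATH=${O2IB_PATH:-/usr/src/openib}          # path from the post

# run (hypothetical helper): echo the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1: rebuild the Mellanox OFED against the installed Lustre kernel.
rebuild_ofed() {
    run mount -o loop "$ISO" "$MNT" || return 1
    run "$MNT/docs/mlnx_add_kernel_support.sh" -i "$ISO"
}

# Step 2 (manual): install the Mellanox OFED and kernel-ib-devel
# from the rebuilt ISO before continuing.

# Step 3: rebuild the Lustre modules against the Mellanox OFED.
rebuild_lustre() {
    run cd "$LUSTRE_SRC" || return 1
    run ./configure --with-o2ib="$O2IB_PATH" || return 1
    run make rpms
}
```

Calling `rebuild_ofed` and then `rebuild_lustre` with the defaults prints the five commands in order; setting `DRY_RUN=0` (as root, after the Lustre kernel and kernel-devel rpms are installed) would execute them.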
Megan,

You will have to rebuild Lustre from source. Furthermore, you will have to have the Mellanox IB driver source installed so the Lustre build process can grab the necessary bits from the Mellanox source.

The issue you are seeing is exactly what you think it is. The Whamcloud builds use the RHEL in-kernel IB driver. I have even had issues with MDS/OSS boxes running RHEL in-kernel IB and clients running Mellanox or OFED IB drivers. Even though IB is a "standard", you really need to have everything, from core to edge, talking the same driver.

I recently did nearly the same config you have: RHEL 6.2 x86_64, MLX OFED, Lustre 2.1.3.

You could opt to run your Mellanox IB HCA using the RHEL in-kernel IB drivers and not have to recompile anything.

--Jeff

--
Jeff Johnson
Co-Founder
Aeon Computing
jeff.johnson at aeoncomputing.com
www.aeoncomputing.com
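To tell which IB stack a node would actually load (relevant to Jeff's point about keeping servers and clients on the same driver), one rough check is to look at where the ib_core module file lives. The sketch below is an assumption-laden heuristic rather than an official method: `ib_stack_origin` is a hypothetical helper, and it assumes the common convention that in-tree modules sit under .../kernel/drivers/infiniband/ while externally built stacks are installed under an updates/ or extra/ directory.

```shell
#!/bin/sh
# ib_stack_origin (hypothetical helper): classify a module file path.
# Assumed convention: out-of-tree modules live under updates/ or extra/,
# in-tree IB modules under kernel/drivers/infiniband/.
ib_stack_origin() {
    case "$1" in
        # Check updates/ and extra/ first, since such paths can also
        # contain kernel/drivers/infiniband/ further down.
        */updates/*|*/extra/*)         echo "external" ;;
        */kernel/drivers/infiniband/*) echo "in-kernel" ;;
        *)                             echo "unknown" ;;
    esac
}
```

On a live system this could be fed from modinfo, for example `ib_stack_origin "$(modinfo -F filename ib_core)"`, and run on each server and client to confirm they agree.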
Thanks, especially to Colin and to Jeff. Yup. I suspected that I would have to rebuild the Lustre 2.1.2 I have to make use of the Mellanox IB. Colin, I appreciate the check; I did not have conflicting IB drivers. Jeff, I will heed your advice and I will start my rebuild after the (U.S.) holiday weekend. An enjoyable weekend to one and all! megan