Jason Brooks
2012-Dec-28 23:54 UTC
[Lustre-discuss] problem with installing lustre and ofed
Hello,

I am having trouble installing the server modules for Lustre 2.1.4 while using Mellanox's OFED distribution, so that we can use InfiniBand. Would you folks look at my procedure and results below and let me know what you think? Thanks very much!

The Mellanox OFED installation builds and installs some kernel modules too, so I used this method to ensure OFED compiled against the correct kernel. This is on CentOS 6.3.

1. Download all Lustre RPMs from Whamcloud.
2. Install kernel, kernel-firmware, kernel-headers, and kernel-devel.
   * In this case, these are the RPM files with "2.6.32-279.14.1.el6_lustre.x86_64" in their names.
3. Reboot into this Lustre kernel.
4. Install the remaining RPMs.
5. Download OFED from Mellanox ("MLNX_OFED_LINUX-1.5.3-3.1.0-rhel6.3-x86_64.iso").
   * Build the Mellanox OFED bits using the Lustre kernel and kernel-devel info.
   * Install Mellanox OFED.
6. Reboot.
7. Upon reboot, if I do NOT have o2ib3 in my LNet networks parameters, I can modprobe lnet and lustre.
8. If I DO have o2ib3 present in the LNet parameters, running "modprobe lustre" gets me:

    ib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/fld.ko): Input/output error
    WARNING: Error inserting fid (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/fid.ko): Input/output error
    WARNING: Error inserting mdc (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/mdc.ko): Input/output error
    WARNING: Error inserting osc (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/osc.ko): Input/output error
    WARNING: Error inserting lov (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/lov.ko): Input/output error
    FATAL: Error inserting lustre (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/lustre.ko): Input/output error

dmesg shows:

    ko2iblnd: disagrees about version of symbol ib_fmr_pool_unmap
    ko2iblnd: Unknown symbol ib_fmr_pool_unmap
    ko2iblnd: disagrees about version of symbol ib_create_cq
    ko2iblnd: Unknown symbol ib_create_cq
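The "disagrees about version of symbol" lines mean ko2iblnd was built expecting different symbol checksums (MODVERSIONS CRCs) than the InfiniBand core modules that are actually loaded export. A minimal Python sketch for confirming which symbols are affected, assuming you save the output of "modprobe --dump-modversions $(modinfo -n ko2iblnd)" to one file and use the Module.symvers of the OFED tree (for example /usr/src/ofa_kernel/Module.symvers) as the other. The function names are hypothetical, and the parser only assumes that both inputs start each line with a hex CRC followed by the symbol name.

```python
"""Find InfiniBand symbols whose MODVERSIONS CRCs disagree.

Hypothetical helper. Both inputs are assumed to have lines that start with
'0x<crc>' followed by the symbol name:
  - 'modprobe --dump-modversions <module.ko>' output (what the module expects)
  - a Module.symvers file (what a build tree exports)
"""


def parse_crc_table(text):
    """Map symbol name -> CRC, skipping lines that don't look like '0x<crc> <symbol>'."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith("0x"):
            table[fields[1]] = int(fields[0], 16)
    return table


def mismatched_symbols(expected, exported, prefix="ib_"):
    """Symbols starting with `prefix` known to both sides but with different CRCs.

    The kernel logs each such symbol as 'disagrees about version of symbol X'
    and then refuses to resolve it, which is why 'Unknown symbol X' follows.
    """
    return sorted(
        sym for sym, crc in expected.items()
        if sym.startswith(prefix) and sym in exported and exported[sym] != crc
    )
```

An empty result for the ib_ symbols would suggest the mismatch lies elsewhere (for example, ko2iblnd being compared against the wrong Module.symvers).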
Jeff Johnson
2012-Dec-29 00:45 UTC
Jason,

The prebuilt server-side Lustre packages from Whamcloud are built against RHEL/CentOS kernel sources with kernel-ib active in them. This means that all of the prebuilt Lustre server packages are already tied to RHEL's kernel-ib.

To accomplish your stated goal, you'll have to start with a stock, non-Whamcloud kernel (plus headers, devel, etc.). Then compile and install the OFED version of your choice. Once you have that, you can build Lustre from source, where it will compile against OFED and the installed kernel.

--Jeff

---------------------------
Jeff Johnson
Co-Founder
Aeon Computing

jeff.johnson at aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101   f: 858-412-3845

4170 Morena Boulevard, Suite D - San Diego, CA 92117

/* Follow us on Twitter - @AeonComputing */
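Building Lustre from source this way means pointing its configure script at both the kernel tree and the OFED tree. A small Python sketch that assembles such a command line, assuming Lustre's --with-linux option for the kernel source and the --with-o2ib option for the OFED tree (the latter is the flag Megan uses later in this thread). The /usr/src/kernels location, the example paths, and the helper names are illustrative assumptions.

```python
"""Assemble a Lustre './configure' command for a given kernel and OFED tree.

Sketch only: /usr/src/kernels is the usual RHEL/CentOS kernel-devel location,
and the OFED directory is whatever your OFED build actually used.
"""
import os
import platform
import shlex


def lustre_configure_cmd(ofed_dir, kernel_release=None, kernel_src_root="/usr/src/kernels"):
    """Return the argv list for configuring Lustre against one kernel and one OFED tree."""
    release = kernel_release or platform.release()  # default: the running kernel
    return [
        "./configure",
        "--with-linux=" + os.path.join(kernel_src_root, release),
        "--with-o2ib=" + ofed_dir,
    ]


def ofed_symvers_present(ofed_dir):
    """True if the OFED tree contains a Module.symvers file."""
    return os.path.isfile(os.path.join(ofed_dir, "Module.symvers"))


if __name__ == "__main__":
    # Print a shell-quoted command for the running kernel (paths are examples).
    print(shlex.join(lustre_configure_cmd("/usr/src/ofa_kernel")))
```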
Jason Brooks
2012-Dec-29 01:06 UTC
Hello,

Good to know that kernel-ib comes with the stock Lustre install.

What about the rest of the OFED tools? I mean things like ibdiagnet, ibstatus, etc. (I will look at the contents of the other RPMs and see what I can learn.)
Ken Hornstein
2012-Dec-29 04:56 UTC
I think Jeff missed a few steps. If you want the _server-side_ packages, what you need to do is:

- Install a Lustre-patched kernel, including the devel packages (you can use the ones from Whamcloud if they're suitable).
- Build your OFED against that kernel and install it.
- Compile Lustre against the Lustre-patched kernel and the OFED. This is the tricky part; you need to make sure to tell Lustre to link against the right OFED package.

There are Lustre build scripts that actually automate all of this; last time I checked, they were only available in the git tree, NOT in the source tarball. Those build scripts are a bit of a pain to use, and I find that I always have to tweak them a bit. But once you figure them all out, it makes things easier.

Now as for the userspace utilities ... well, you need to make sure they're not too far off from the kernel. How far is "too far"? Good question. I don't think they're guaranteed to work when they don't match, but in my limited experience minor version differences are OK.

--Ken
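Ken's rule of thumb about userspace tools versus kernel modules could be encoded as a crude check. A minimal Python sketch, assuming version strings in the style of the Mellanox ISO name (for example "1.5.3-3.1.0") and treating only differences after the first two numeric components as "minor". That cutoff is an assumption chosen for illustration, not a documented OFED compatibility rule, and the helper names are hypothetical.

```python
"""Crude 'close enough' check between userspace and kernel OFED versions."""
import re


def version_tuple(version):
    """Extract the leading dotted numeric part, e.g. '1.5.3-3.1.0' -> (1, 5, 3)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if not match:
        raise ValueError("no numeric version in %r" % version)
    return tuple(int(part) for part in match.group(0).split("."))


def probably_compatible(userspace, kernel, significant=2):
    """Rule of thumb: compatible if the first `significant` components match.

    Lower `significant` to be more permissive, raise it to be stricter.
    """
    return version_tuple(userspace)[:significant] == version_tuple(kernel)[:significant]
```

A mismatch reported here is only a hint to test more carefully, not proof that the tools will fail.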
Brian J. Murrell
2012-Dec-31 16:32 UTC
On Fri, 2012-12-28 at 15:54 -0800, Jason Brooks wrote:
> Hello,

Hi,

> I am having trouble installing the server modules for lustre 2.1.4
> and use mellanox's OFED distribution

Is there a particular need for the Mellanox OFED distribution? The Red Hat EL 6 kernel comes stock with the InfiniBand drivers and stack already baked in, and we leverage that and build our Lustre modules RPM against it.

So unless there is something particular that you need that is only in the Mellanox OFED distribution and is not already in EL6's kernels, you should be able to just use the binary kernel and lustre-modules RPMs that we supply and have working InfiniBand support.

Cheers,
b.
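One way to tell which InfiniBand stack a node will actually load is to check where modprobe resolves ib_core. In-tree EL6 drivers live under /lib/modules/<release>/kernel/, while externally built stacks such as Mellanox OFED are typically installed under updates/ or extra/, which RHEL's depmod configuration normally searches ahead of the in-tree modules. A minimal Python sketch, assuming "modinfo -n" is available and that the OFED install follows that directory convention; the helper names are hypothetical.

```python
"""Classify where a kernel module file came from, based on its path."""
import platform
import subprocess


def module_origin(path, release):
    """Return 'in-tree', 'out-of-tree', or 'unknown' for a module file path."""
    prefix = "/lib/modules/%s/" % release
    if not path.startswith(prefix):
        return "unknown"
    top = path[len(prefix):].split("/", 1)[0]
    if top == "kernel":
        return "in-tree"  # shipped with the distribution kernel package
    if top in ("updates", "extra", "weak-updates"):
        return "out-of-tree"  # e.g. an OFED or Lustre package build
    return "unknown"


def ib_core_origin():
    """Ask modinfo which ib_core file would be loaded for the running kernel."""
    result = subprocess.run(["modinfo", "-n", "ib_core"],
                            capture_output=True, text=True, check=True)
    return module_origin(result.stdout.strip(), platform.release())
```

If ib_core_origin() reports "out-of-tree" while the Lustre RPMs in use were built against the in-tree kernel-ib, that combination matches the symbol-version errors in Jason's original message.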
Michael Shuey
2013-Jan-01 04:40 UTC
Red Hat's OFED tends to lag Mellanox's. They're pretty current on bug fixes, but support for the latest hardware is usually 3-6 months behind; it took about 4 months to bring in drivers for our most recent FDR system. Also, support for Mellanox's advanced features (e.g., MXM, FCA) is often missing.

--
Mike Shuey
Ms. Megan Larko
2013-Jan-02 17:11 UTC
Greetings Jason,

As you have most likely discovered, the Mellanox (MLNX) OFED stack needs to be built against the Lustre Linux kernel to use InfiniBand. I worked on such an issue recently. The Whamcloud Linux kernel 2.1.2-2.6.32_220.17.1.el6_lustre would not work optimally with our Mellanox InfiniBand (IB) drivers. We got MLNX version 1.8.5 to match our Mellanox hardware and had to do the dance already described to you in this list:

1. Download all of the appropriate (Whamcloud) Lustre Linux kernel, header, and devel RPMs.
2. Boot into the Lustre kernel.
3. In our /usr/src/lustre-2.1.2 directory, build Lustre against the Mellanox "Module.symvers" information (which is why you see the "Input/output" errors on fid.ko, mdc.ko, osc.ko, lov.ko and, because of the aforementioned items, lustre.ko). The MLNX version 1.8.5 that we needed was in the /usr/src/ofa_kernel directory (with the Module.symvers, etc.). We used the defaults other than o2ib, so our command in the /usr/src/lustre-2.1.2 directory looked like "./configure --with-o2ib=/usr/src/ofa_kernel".
4. Next we issued "make".
5. Next we chose to run "make rpms" so that we would have RPMs for re-building our cluster.

We had to do this for *both* our Lustre servers and Lustre clients (using the lustre-client Whamcloud kernel, headers, ...). So we had the servers and the clients communicating properly over the MLNX IB fabric.

In /etc/modprobe.d we used a lustre.conf file to explicitly direct the system to use the o2ib network when starting Lustre at boot. Without the above actions, ko2iblnd would not load.

Just confirming that you need to build against Mellanox OFED on both servers and clients to use MLNX IB with the Lustre cluster file system.

megan
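The lustre.conf file in /etc/modprobe.d typically carries an "options lnet networks=..." line naming each LNet network and the interface it binds to. A minimal Python sketch that renders such a line from (network, interface) pairs; the o2ib3 network name comes from Jason's message, while the ib0 and eth0 interface names and the helper name are hypothetical examples.

```python
"""Render an LNet 'networks' option line for /etc/modprobe.d/lustre.conf."""


def lnet_options_line(networks):
    """networks: list of (lnet_network, interface) pairs, e.g. [("o2ib3", "ib0")]."""
    if not networks:
        raise ValueError("at least one LNet network is required")
    spec = ",".join("%s(%s)" % (net, iface) for net, iface in networks)
    return 'options lnet networks="%s"' % spec


if __name__ == "__main__":
    print(lnet_options_line([("o2ib3", "ib0")]))
    # options lnet networks="o2ib3(ib0)"
```

Listing a TCP network as well, e.g. [("o2ib3", "ib0"), ("tcp0", "eth0")], is one way to keep LNet reachable while the InfiniBand side is still being debugged.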