Hi all,

I was experimenting with the 1.8.2 patchless client and bumped into a problem using InfiniBand.

* Server kickstarted with RHEL 5.3
* Installed lustre-client-1.8.2-2.6.18_164.11.1.el5_lustre.1.8.2.x86_64.rpm
* Installed lustre-client-modules-1.8.2-2.6.18_164.11.1.el5_lustre.1.8.2.x86_64.rpm
* Performed "chkconfig --level 2345 openibd on" so ib0 is working
* Performed "rsync -alP /lib/modules/2.6.18-164.11.1.el5 /lib/modules/2.6.18-128.el5"
* Performed "depmod -a"
* /etc/modprobe.conf has "options lnet networks=o2ib0(ib0),tcp1(eth2),tcp2(eth3)"
* /etc/fstab has "10.103.34.42@o2ib0:/spfs /lustre1_fifo lustre rw,noauto,_netdev 0 0"
* Performed "modprobe lnet"
* Performed "lctl net up", which failed with "LNET configure error 100: Network is down"
* Bundle of errors in /var/log/messages; a portion of the tail:

  Feb 16 16:20:43 bg8mo33sn kernel: ko2iblnd: disagrees about version of symbol ib_destroy_fmr_pool
  Feb 16 16:20:43 bg8mo33sn kernel: ko2iblnd: Unknown symbol ib_destroy_fmr_pool
  Feb 16 16:20:43 bg8mo33sn modprobe: FATAL: Error inserting ko2iblnd (/lib/modules/2.6.18-128.el5/2.6.18-164.11.1.el5/kernel/net/lustre/ko2iblnd.ko): Unknown symbol in module, or unknown parameter (see dmesg)
  Feb 16 16:20:43 bg8mo33sn kernel: ko2iblnd: disagrees about version of symbol rdma_destroy_id
  Feb 16 16:20:44 bg8mo33sn kernel: ko2iblnd: Unknown symbol rdma_destroy_id
  Feb 16 16:20:44 bg8mo33sn kernel: ko2iblnd: disagrees about version of symbol rdma_accept
  Feb 16 16:20:44 bg8mo33sn kernel: ko2iblnd: Unknown symbol rdma_accept
  Feb 16 16:20:44 bg8mo33sn kernel: ko2iblnd: disagrees about version of symbol ib_dealloc_pd
  Feb 16 16:20:44 bg8mo33sn kernel: ko2iblnd: Unknown symbol ib_dealloc_pd
  Feb 16 16:20:44 bg8mo33sn kernel: ko2iblnd: disagrees about version of symbol ib_fmr_pool_map_phys
  Feb 16 16:20:44 bg8mo33sn kernel: ko2iblnd: Unknown symbol ib_fmr_pool_map_phys
  Feb 16 16:20:44 bg8mo33sn kernel: LustreError: 4768:0:(api-ni.c:1043:lnet_startup_lndnis()) Can't load LND o2ib, module ko2iblnd, rc=256

* In 1.8.1.1 there was kernel-ib-1.4.2-2.6.18_128.7.1.el5.x86_64.rpm. No problem with 1.8.1.1.
* In 1.8.2 there is no specific RPM to handle IB? Using the stock openib from RHEL worked for the patched kernel installation.
* Are there any other tricks to make this work with the patchless client?

Thanks in advance...

Steve

Stephen Chu
AT&T Labs CSO
C5-3C03
200 Laurel Ave
Middletown, NJ
stephenchu at att.com
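
The "disagrees about version of symbol" lines in the log excerpt are the kernel refusing a module whose symbol versions do not match the running kernel. A minimal sketch for summarizing them follows; it assumes the syslog line format shown in this thread, and the function name `mismatched_symbols` and the `SAMPLE` text are illustrative only, not part of any Lustre tool.

```python
import re

# Matches the symbol-version complaint as it appears in the
# /var/log/messages excerpt quoted in this thread (format is an assumption).
DISAGREES = re.compile(
    r"kernel: (\S+): disagrees about version of symbol (\S+)"
)


def mismatched_symbols(log_text):
    """Map each module name to the symbols it disagreed about, in log order."""
    found = {}
    for line in log_text.splitlines():
        match = DISAGREES.search(line)
        if match is None:
            continue  # e.g. the paired "Unknown symbol" lines
        module, symbol = match.groups()
        symbols = found.setdefault(module, [])
        if symbol not in symbols:
            symbols.append(symbol)
    return found


# A few lines copied from the log excerpt in the original post.
SAMPLE = """\
Feb 16 16:20:43 bg8mo33sn kernel: ko2iblnd: disagrees about version of symbol ib_destroy_fmr_pool
Feb 16 16:20:43 bg8mo33sn kernel: ko2iblnd: Unknown symbol ib_destroy_fmr_pool
Feb 16 16:20:43 bg8mo33sn kernel: ko2iblnd: disagrees about version of symbol rdma_destroy_id
Feb 16 16:20:44 bg8mo33sn kernel: ko2iblnd: Unknown symbol rdma_destroy_id
"""

if __name__ == "__main__":
    for module, symbols in mismatched_symbols(SAMPLE).items():
        print(module, ", ".join(symbols))
```

Every symbol reported here belongs to the InfiniBand/RDMA stack, which suggests the mismatch is between ko2iblnd and the IB modules of the running kernel rather than a problem in LNET itself.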
On Tue, 2010-02-16 at 11:28 -0500, CHU, STEPHEN H (ATTSI) wrote:
> Hi all,

Hi,

> * Server kickstarted with RHEL 5.3

So what is the actual version of the kernel you are running?

> * Installed lustre-client-1.8.2-2.6.18_164.11.1.el5_lustre.1.8.2.x86_64.rpm
> * Installed lustre-client-modules-1.8.2-2.6.18_164.11.1.el5_lustre.1.8.2.x86_64.rpm
...
> * Performed "rsync -alP /lib/modules/2.6.18-164.11.1.el5 /lib/modules/2.6.18-128.el5"

What was the purpose of this? Generally speaking, modules from one kernel are not usable in other kernels. I assume, since you installed lustre-client-modules-1.8.2-2.6.18_164.11.1.el5_lustre.1.8.2.x86_64.rpm, that you are using the RH kernel versioned 2.6.18_164.11.1.el5, yes?

> Feb 16 16:20:43 bg8mo33sn kernel: ko2iblnd: disagrees about version of
> symbol ib_destroy_fmr_pool

You have a mismatch between the lustre modules you are using and the kernel.

> * In 1.8.1.1 there was kernel-ib-1.4.2-2.6.18_128.7.1.el5.x86_64.rpm.
>   No problem with 1.8.1.1.

Right. This was OFED 1.4.2 from the OFA.

> * In 1.8.2 there is no specific RPM to handle IB? Using the stock openib
>   from RHEL worked for the patched kernel installation.

Yes, in 1.8.2 we are using the OFED stack as shipped by RedHat, which is a 1.4[.1rc3 IIRC] stack.

b.
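
The mismatch described in this reply can be checked before loading any module by comparing the kernel release embedded in the RPM name with the running kernel. The sketch below assumes the naming pattern seen in this thread, where the kernel release appears in the package name with "-" replaced by "_" and ends at the "elN" tag; the function names are hypothetical, and the regular expression is not a general RPM name parser.

```python
import platform
import re

# Assumption: the kernel release is embedded as <version>_<build>.elN[flavor],
# e.g. "2.6.18_164.11.1.el5" inside the lustre-client-modules package name.
KERNEL_IN_RPM = re.compile(r"(\d+\.\d+\.\d+)_(\d+(?:\.\d+)*\.el\d+[a-z]*)")


def kernel_release_from_rpm(rpm_name):
    """Return the kernel release a Lustre modules RPM appears to target."""
    match = KERNEL_IN_RPM.search(rpm_name)
    if match is None:
        raise ValueError("no kernel release found in %r" % rpm_name)
    version, build = match.groups()
    return "%s-%s" % (version, build)


def modules_match_kernel(rpm_name, running_release=None):
    """True if the RPM's target kernel equals the running kernel release."""
    if running_release is None:
        running_release = platform.release()  # same string as `uname -r`
    return kernel_release_from_rpm(rpm_name) == running_release


if __name__ == "__main__":
    rpm = ("lustre-client-modules-1.8.2-"
           "2.6.18_164.11.1.el5_lustre.1.8.2.x86_64.rpm")
    print(kernel_release_from_rpm(rpm))
    print(modules_match_kernel(rpm, "2.6.18-128.el5"))
```

An exact string comparison is deliberately strict: it flags the RHEL 5.3 kernel 2.6.18-128.el5 as a mismatch for the 1.8.2 modules, which is the situation reported in the original post.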
Hi Brian,

Thanks for the quick reply.

We are using RHEL 5.3 kernel-2.6.18-128.el5. Since we are not using the latest kernel, we have, in the past, used rsync to reconcile the differences, and it worked fine. This is how we get the patchless clients to work when the kernel and the patchless client modules are out of sync. This time it didn't seem to work out.

With the rsync method we were able to retain a specific version of RHEL (the version used in production; rolling to a newer version is based on specific schedules) and still move forward with testing the newest lustre releases. Of course this can only go so far. We will have to go with the patched clients if this does not work out.

Any insights will be appreciated. Thanks.

Steve
On Tuesday 16 February 2010, CHU, STEPHEN H (ATTSI) wrote:
> Hi Brian,
>
> Thanks for the quick reply.
>
> We are using RHEL 5.3 kernel-2.6.18-128.el5. Since we are not using the
> latest kernel, we have, in the past, used rsync to reconcile the
> differences and it worked fine. This is how we can get the patchless
> clients to work between out of sync kernel and the patchless client
> modules. This time it didn't seem to work out.
>
> With the rsync method we were able to retain the use of a specific version
> of RHEL (specific version being used in production; rolling to newer
> version will be based on specific schedules) and continue to move forward
> with test and try out the newest lustre releases. And of course this can
> only go so far. We will have to go with the patched clients if this is not
> working out.
>
> Any insights will be appreciated. Thanks.
>
> Steve

The problem might be that you're mixing 2.6.18-128.* with 2.6.18-164.*; if you stick with versions 2.6.18-128.* you should be fine.

At least this is working for our patchless clients:

[root@client001 ~] # rpm -qa | grep -e lustre -e kernel | sort
kernel-headers-2.6.18-164.11.1.el5
kernel-xen-2.6.18-164.11.1.el5
kernel-xen-devel-2.6.18-164.11.1.el5
lustre-client-1.8.1.1-2.6.18_164.el5xen_01
lustre-client-modules-1.8.1.1-2.6.18_164.el5xen_01

We're using virtual machines, so the lustre modules are compiled for kernel-xen-2.6.18-164.el5, but those exact same modules worked fine with kernel-xen-2.6.18-164.6.1.el5 and kernel-xen-2.6.18-164.9.1.el5, and now with kernel-xen-2.6.18-164.11.1.el5.

Regards,
--
Ricardo J. Barberis
Senior SysAdmin - I+D
Dattatec.com :: Web Hosting Solutions
Your Hosting Made Simple..!
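
The pattern reported here, modules built for one 2.6.18-164 kernel loading on later 2.6.18-164.x errata kernels but not across 128 and 164, can be expressed as a comparison of the leading build number. The sketch below is only a heuristic under that assumption; the helper names are hypothetical, the release strings omit the xen flavor for brevity, and a matching series is no guarantee that symbol versions agree.

```python
import re

# Splits an el-style kernel release such as "2.6.18-164.11.1.el5" into
# upstream version, leading build number, errata suffix, and dist tag.
RELEASE = re.compile(r"^(\d+\.\d+\.\d+)-(\d+)((?:\.\d+)*)\.(el\d+\w*)$")


def kernel_series(release):
    """Return (version, build, dist), ignoring the errata suffix."""
    match = RELEASE.match(release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    version, build, _errata, dist = match.groups()
    return version, int(build), dist


def same_series(release_a, release_b):
    """Heuristic: do both releases share version, build number, and dist?"""
    return kernel_series(release_a) == kernel_series(release_b)


if __name__ == "__main__":
    # Modules built for -164 on a later -164 errata kernel.
    print(same_series("2.6.18-164.el5", "2.6.18-164.11.1.el5"))
    # The combination from the original post.
    print(same_series("2.6.18-128.el5", "2.6.18-164.11.1.el5"))
```

A check like this can flag the 128-versus-164 mix early, but loading the module on a test machine remains the only real confirmation.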
On Tue, 2010-02-16 at 14:18 -0500, CHU, STEPHEN H (ATTSI) wrote:
> Hi Brian,

Hi,

> We are using RHEL 5.3 kernel-2.6.18-128.el5. Since we are not using the
> latest kernel, we have, in the past, used rsync to reconcile the
> differences and it worked fine.

So just to be clear, you are attempting to use the lustre modules built for RedHat kernel 2.6.18_164.11.1.el5 with kernel 2.6.18-128.el5, right?

> This is how we can get the patchless clients to work between out of sync
> kernel and the patchless client modules.

Sometimes, perhaps. I don't think there are any guarantees though, and you appear to have run into a case where it just isn't going to work.

> We will have to go with the patched clients if this is not working out.

TBH, I'm not sure how patched clients are going to help here either, given that you seem to have a need to remain on kernel 2.6.18-128.el5 (or else you would have just upgraded it to match the 1.8.2 lustre modules) and the kernel that was released with 1.8.2 is 2.6.18_164.11.1.el5 for both patched and patchless releases.

If you are willing to upgrade the kernel to our patched 2.6.18_164.11.1.el5 (which you MUST do to use the patched client), why not just upgrade to the stock RH 2.6.18_164.11.1.el5 and use the 1.8.2 patchless client modules with the kernel they were meant for?

b.
Ricardo,

Thanks. I knew I had to sync up the kernel versions. I brought up RHEL 5.4, which contains kernel-2.6.18-164.el5. I then did "rsync -alP 2.6.18-164.11.1.el5 2.6.18-164.el5" and "depmod -a". Everything worked again. It seems we won't be able to use the 1.8.2 patchless client setup unless we move to RHEL 5.4.

Steve
Hi Brian,

For RHEL 5.3, yes, I was attempting to use the 1.8.2 lustre modules with kernel 2.6.18-128.el5. I had not thought of upgrading the -128 kernel to 2.6.18_164.11.1.el5 as you suggested. I shall experiment with that and see how it works. Thanks for the reminder.

Steve
On Wednesday 17 February 2010, CHU, STEPHEN H (ATTSI) wrote:
> Ricardo,
>
> Thanks. I knew I had to sync up the kernel versions. I brought up RHEL 5.4
> which contains kernel-2.6.18-164.el5. I then do "rsync -alP
> 2.6.18-164.11.1.el5 2.6.18-164.el5" and "depmod -a". Everything worked
> again. Seems like we won't be able to use 1.8.2 patchless client setup
> unless we move to RHEL 5.4.
>
> Steve

Well, you could recompile the lustre 1.8.2 source rpm (*) against a 2.6.18-128 kernel instead of upgrading all of your clients to RHEL 5.4.

(*) lustre-client-source-1.8.2-2.6.18_164.11.1.el5_lustre.1.8.2.i686.rpm, if I'm not mistaken.

I haven't tried it, but it should be painless.

Best regards,
--
Ricardo J. Barberis
Senior SysAdmin - I+D
Dattatec.com :: Web Hosting Solutions
Your Hosting Made Simple..!