Hi,

I don't know if this is a bug, a misconfig, or something else.

What I have is:
  server  = 1.6.4.1+vanilla 2.6.18.8 (mgs+2*ost+mdt all on a single server)
  clients = cvs.20080116+2.6.23.12

I mounted the server from several clients and several hours later noticed the top display below. dmesg shows some Lustre errors (also below). Can someone comment on what could be going on?

Thanks,
Ron

top - 18:28:09 up 5 days, 3:36, 1 user, load average: 12.00, 12.00, 11.94
Tasks: 168 total, 13 running, 136 sleeping, 0 stopped, 19 zombie
Cpu(s): 0.0% us, 37.5% sy, 0.0% ni, 62.5% id, 0.0% wa, 0.0% hi, 0.0% si
Mem: 16468196k total, 526828k used, 15941368k free, 42996k buffers
Swap: 4192924k total, 0k used, 4192924k free, 294916k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 1533 root      20   0     0    0    0 R  100  0.0 308:54.05 ll_cfg_requeue
32071 root      20   0     0    0    0 R  100  0.0 308:15.95 socknal_reaper
32073 root      20   0     0    0    0 R  100  0.0 308:48.90 ptlrpcd
    1 root      20   0  4832  588  492 R    0  0.0   0:02.48 init
    2 root      15  -5     0    0    0 S    0  0.0   0:00.00 kthreadd

Lustre: OBD class driver, info@clusterfs.com
Lustre Version: 1.6.4.50
Build Version: b1_6-20080210103536-CHANGED-.usr.src.linux-2.6.23.12-2.6.23.12
Lustre: Added LNI 192.168.241.42@tcp [8/256]
Lustre: Accept secure, port 988
Lustre: Lustre Client File System; info@clusterfs.com
Lustre: Binding irq 17 to CPU 0 with cmd: echo 1 > /proc/irq/17/smp_affinity
Lustre: MGC192.168.241.247@tcp: Reactivating import
Lustre: setting import datafs-OST0002_UUID INACTIVE by administrator request
Lustre: datafs-OST0002-osc-ffff810241ad7800.osc: set parameter active=0
LustreError: 32181:0:(lov_obd.c:230:lov_connect_obd()) not connecting OSC datafs-OST0002_UUID; administratively disabled
Lustre: Client datafs-client has started
Lustre: Request x7684 sent from MGC192.168.241.247@tcp to NID 192.168.241.247@tcp 15s ago has timed out (limit 15s).
LustreError: 166-1: MGC192.168.241.247@tcp: Connection to service MGS via nid 192.168.241.247@tcp was lost; in progress operations using this service will fail.
LustreError: 32073:0:(import.c:212:ptlrpc_invalidate_import()) MGS: rc = -110 waiting for callback (1 != 0)
LustreError: 32073:0:(import.c:216:ptlrpc_invalidate_import()) @@@ still on sending list req@ffff81040fa14600 x7684/t0 o400->MGS@192.168.241.247@tcp:26/25 lens 128/256 e 0 to 11 dl 1202843837 ref 1 fl Rpc:EXN/0/0 rc -4/0
Lustre: Request x7685 sent from datafs-MDT0000-mdc-ffff810241ad7800 to NID 192.168.241.247@tcp 115s ago has timed out (limit 15s).
Lustre: datafs-MDT0000-mdc-ffff810241ad7800: Connection to service datafs-MDT0000 via nid 192.168.241.247@tcp was lost; in progress operations using this service will wait for recovery to complete.
Lustre: MGC192.168.241.247@tcp: Reactivating import
Lustre: MGC192.168.241.247@tcp: Connection restored to service MGS using nid 192.168.241.247@tcp.
LustreError: 32059:0:(events.c:116:reply_in_callback()) ASSERTION(ev->mlength == lustre_msg_early_size()) failed
LustreError: 32059:0:(tracefile.c:432:libcfs_assertion_failed()) LBUG

Call Trace:
 [<ffffffff88000b53>] :libcfs:lbug_with_loc+0x73/0xc0
 [<ffffffff88007bd4>] :libcfs:libcfs_assertion_failed+0x54/0x60
 [<ffffffff8815c746>] :ptlrpc:reply_in_callback+0x426/0x430
 [<ffffffff88027f35>] :lnet:lnet_enq_event_locked+0xc5/0xf0
 [<ffffffff88028475>] :lnet:lnet_finalize+0x1e5/0x270
 [<ffffffff880625d9>] :ksocklnd:ksocknal_process_receive+0x469/0xab0
 [<ffffffff88060350>] :ksocklnd:ksocknal_tx_done+0x80/0x1e0
 [<ffffffff8806301c>] :ksocklnd:ksocknal_scheduler+0x12c/0x7e0
 [<ffffffff8024e850>] autoremove_wake_function+0x0/0x30
 [<ffffffff8024e850>] autoremove_wake_function+0x0/0x30
 [<ffffffff8020c918>] child_rip+0xa/0x12
 [<ffffffff88062ef0>] :ksocklnd:ksocknal_scheduler+0x0/0x7e0
 [<ffffffff8020c90e>] child_rip+0x0/0x12

LustreError: dumping log to /tmp/lustre-log.1202843942.32059
Lustre: Request x7707 sent from MGC192.168.241.247@tcp to NID 192.168.241.247@tcp 15s ago has timed out (limit 15s).
Lustre: Skipped 2 previous similar messages
Eric Barton
2008-Feb-13 12:37 UTC
[Lustre-discuss] [Lustre-devel] lustre client goes wacky?
Ron, I'm sending this to lustre-discuss, which is a more suitable forum.
Nathaniel Rutman
2008-Feb-13 16:41 UTC
[Lustre-discuss] [Lustre-devel] lustre client goes wacky?
The clients you pulled from CVS have a feature called adaptive timeouts which apparently are having an issue with your 1.6.4.1 servers. Eric, can you make sure our interoperability is working?

Moving this thread to lustre-discuss; devel is more for architecture/coding stuff.
Yes, there seem to be some problems. I filed bug 14881 to track this. Ron, thanks for reporting this. In the meantime, please don't use the CVS version with your 1.6.4 server until 14881 gets fixed.

-- Eric
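To check which Lustre version a client is actually running, the kernel log is one place to look: the client modules print a `Lustre Version:` line when they load, as in Ron's dmesg output (1.6.4.50 for his CVS build, against 1.6.4.1 on the server). A minimal sketch in shell; the function name `lustre_client_version` is made up for illustration, and comparing the result against the server's release is left to the reader:

```sh
# Hypothetical helper: print the Lustre version string found in kernel log
# text read from stdin (for example, the output of dmesg).
lustre_client_version() {
    # Match a line such as "Lustre Version: 1.6.4.50", print its last field,
    # and stop at the first match.
    awk '/Lustre Version:/ { print $NF; exit }'
}
```

It would typically be used as `dmesg | lustre_client_version` on the client.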
Craig Tierney
2008-Feb-13 21:49 UTC
[Lustre-discuss] Trying to build lustre with unsupported kernel
I am trying to build Lustre against an unsupported kernel (2.6.20.20), and I am having some issues with it.

First, I patched the kernel. Quilt had some difficulties with some of the 2.6.18-vanilla patches, but I added them by hand. I was able to build and boot the kernel.

Then I tried to build the Lustre package (1.6.4.2). I did the following:

# ./configure --prefix=/opt/lustre/1.6.4.2--2.6.20.20-lustre \
    --with-linux=/usr/src/kernels/linux-2.6.20.20-lustre

I got most of the way through the configure process. I saw the line:

checking for /usr/src/kernels/linux-2.6.20.20-lustre/include/linux/lustre_version.h... yes

which is a good sign that the configure script thought I had patched the kernel. However, when it got to configuring ldiskfs, I had problems. The configure script didn't have any option for the 2.6.20.20 kernel. The following message was displayed:

checking which ldiskfs series to use... configure: WARNING: Unknown kernel version 2.6.20.20, fix ldiskfs/configure.ac

OK, so I modified configure.ac to include support for 2.6.20.20. I just told it to use the 2.6.18-vanilla case (to see what would happen). However, I haven't gotten any further. To build properly, I need to regenerate the configure file from configure.ac, and I am not sure of the proper way to do that. When I run make in the ldiskfs directory, it does re-create the configure file, but not successfully: many of the macros (is that the right term?) in configure.ac are not expanded properly.

Here is the output from make, run from the root directory:

[root@wupdate lustre-1.6.4.2]# make
test -d CVS || exit 0; \
list=""; for mod in $list; do \
perl ./build/kabi -v archive $HOME/nonfree $mod || exit $?; \
done
make all-recursive
make[1]: Entering directory `/usr/src/lustre/lustre-1.6.4.2'
Making all in ldiskfs
make[2]: Entering directory `/usr/src/lustre/lustre-1.6.4.2/ldiskfs'
cd . && /bin/sh /usr/src/lustre/lustre-1.6.4.2/ldiskfs/missing --run aclocal-1.7
cd . && \
/bin/sh /usr/src/lustre/lustre-1.6.4.2/ldiskfs/missing --run automake-1.7 --foreign autoMakefile
cd . && /bin/sh /usr/src/lustre/lustre-1.6.4.2/ldiskfs/missing --run autoconf
/bin/sh ./config.status --recheck
running /bin/sh ./configure --prefix=/opt/lustre/1.6.4.2--2.6.20.20-lustre --with-linux=/usr/src/kernels/linux-2.6.20.20-lustre --with-lustre-hack --with-sockets --cache-file=/dev/null --srcdir=. --no-create --no-recursion
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking target system type... x86_64-unknown-linux-gnu
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking for gcc... gcc
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ANSI C... none needed
checking for style of include used by make... GNU
checking dependency style of gcc... none
./configure: line 2865: LB_CANONICAL_SYSTEM: command not found
./configure: line 2866: LB_INCLUDE_RULES: command not found
./configure: line 2867: LB_PROG_CC: command not found
checking whether to build kernel modules... yes
./configure: line 2905: LB_PROG_LINUX: command not found
./configure: line 2906: LB_LINUX_MODPOST: command not found
./configure: line 2909: LB_CONFIG_HEADERS: command not found
checking whether to enable quilt for making ldiskfs... yes
checking for patch... /usr/bin/patch
checking for quilt... no
./configure: line 3071: LB_DEFINE_LDISKFS_OPTIONS: command not found
checking which ldiskfs series to use... configure: WARNING: Unknown kernel version , fix ldiskfs/configure.ac
./configure: line 3096: LB_CONFIG_FILES: command not found
configure: creating ./config.status
cd . && /bin/sh ./config.status autoMakefile
config.status: error: invalid argument: autoMakefile

What is the proper way to rebuild the configure file?

Thanks,
Craig

--
Craig Tierney (craig.tierney at noaa.gov)
David Simas
2008-Feb-13 22:26 UTC
[Lustre-discuss] Trying to build lustre with unsupported kernel
On Wed, Feb 13, 2008 at 02:49:28PM -0700, Craig Tierney wrote:
> I am trying to build lustre against an unsupported
> kernel (2.6.20.20). I am having some issues with it.
>
> ./configure: line 3096: LB_CONFIG_FILES: command not found
> configure: creating ./config.status
> cd . && /bin/sh ./config.status autoMakefile
> config.status: error: invalid argument: autoMakefile
>
> What is the proper way to rebuild the configure file?

You probably don't need them all, but the usual sequence of commands is

    aclocal
    autoconf
    autoheader
    automake -a

You can find this documented in many places on the web. Search for something like "GNU build system".

David S.
Andreas Dilger
2008-Feb-13 22:33 UTC
[Lustre-discuss] Trying to build lustre with unsupported kernel
On Feb 13, 2008 14:49 -0700, Craig Tierney wrote:
> I am trying to build lustre against an unsupported
> kernel (2.6.20.20). I am having some issues with it.

Do you really need this kernel on the servers, or could it be a supported patched kernel on the servers (e.g. RHEL5 2.6.18) and the 2.6.20.20 kernel patchless on the clients?

Making a new kernel patch series is not for the faint of heart, and is generally not worthwhile for an "older" kernel like this. We have patches for 2.6.22 in our CVS and in bugzilla, so any work to get 2.6.20 on the server would be lost effort.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Patrick Winnertz
2008-Feb-13 22:34 UTC
[Lustre-discuss] Trying to build lustre with unsupported kernel
Hello,

On Wednesday, 13 February 2008 22:49:28, Craig Tierney wrote:
> Ok, So I went and I modified configure.ac to include support for
> 2.6.20.20. I just told it to use the 2.6.18-vanilla case (to see what
> would happen). However, I haven't gotten any further. To build
> properly, I need to regenerate the configure file from configure.ac. I
> am not sure the proper way to do that. when I run make in the ldiskfs
> directory, that does re-create the configure file. However, it does not
> do it successfully. Many of the macros (is that the right term) in the
> configure.ac file are not expanded properly.

> What is the proper way to rebuild the configure file?

I would suggest having a look at the Debian pkg-lustre team's repository:
http://svn.debian.org/wsvn/pkg-lustre/trunk/

Maybe there is something useful for you there. (We also build Lustre for kernel 2.6.22 at the moment.)

Greetings
Winnie

--
Patrick Winnertz
Tel.: +49 (0) 2161 / 4643 - 0
credativ GmbH, HRB Mönchengladbach 12080
Hohenzollernstr. 133, 41061 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz
Andreas Dilger
2008-Feb-13 22:41 UTC
[Lustre-discuss] Trying to build lustre with unsupported kernel
On Feb 13, 2008 14:26 -0800, David Simas wrote:
> On Wed, Feb 13, 2008 at 02:49:28PM -0700, Craig Tierney wrote:
> > I am trying to build lustre against an unsupported
> > kernel (2.6.20.20). I am having some issues with it.
> >
> > ./configure: line 3096: LB_CONFIG_FILES: command not found
> > configure: creating ./config.status
> > cd . && /bin/sh ./config.status autoMakefile
> > config.status: error: invalid argument: autoMakefile
> >
> > What is the proper way to rebuild the configure file?
>
> You probably don't need them all, but the usual sequence of commands is
>
> aclocal
> autoconf
> autoheader
> automake -a

This is all wrapped in the lustre "autogen.sh" script.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
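The `LB_...: command not found` errors in Craig's output are most likely what a generated `configure` looks like when the Lustre macro definitions were not visible at regeneration time, so the macro names were copied through as literal text. After regenerating with `autogen.sh`, a quick scan for leftover macro names can confirm the result. A rough sketch in shell, assuming the macros of interest all begin with `LB_` as in the output above; `unexpanded_macros` is a made-up helper name, not part of Lustre:

```sh
# Hypothetical helper: list lines of a generated configure script that still
# consist of a literal LB_ macro name, i.e. macros that were never expanded.
unexpanded_macros() {
    # -n prints line numbers. The pattern matches a macro name that is alone
    # on its line or directly followed by an opening parenthesis, so shell
    # variable assignments such as LB_FOO=1 are not reported.
    grep -nE '^[[:space:]]*LB_[A-Z_]+[[:space:]]*(\(|$)' "$1"
}

# Typical use, from the top of the Lustre source tree (not run here):
#   sh ./autogen.sh
#   unexpanded_macros ldiskfs/configure || echo "no unexpanded LB_ macros"
```

Since `grep` exits non-zero when nothing matches, an empty result doubles as the success case in the usage comment.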
Craig Tierney
2008-Feb-13 23:14 UTC
[Lustre-discuss] Trying to build lustre with unsupported kernel
Andreas Dilger wrote:
> On Feb 13, 2008 14:49 -0700, Craig Tierney wrote:
>> I am trying to build lustre against an unsupported
>> kernel (2.6.20.20). I am having some issues with it.
>
> Do you really need this kernel on the servers, or could it be a
> supported patched kernel on the servers (e.g. RHEL5 2.6.18) and
> the 2.6.20.20 kernel patchless on the clients?
>
> Making new kernel patch series is not for the faint of heart,
> and is generally not worthwhile for an "older" kernel like this.
> We have patches for 2.6.22 in our CVS and in bugzilla, so any
> work to get 2.6.20 on the server would be lost effort.

Thanks for all of the quick replies. I am happy to try 2.6.22. Red Hat kernels don't work for me (or at least, debugging their problems isn't worth the effort).

However, if I patch a 2.6.22 kernel from CVS, won't I have to use the CVS tree for the latest Lustre dev version? That is, I cannot just use the patches against 1.6.4.2, because I would be dealing with the same problem as before (autogen.sh).

Thanks,
Craig

--
Craig Tierney (craig.tierney at noaa.gov)
Craig Tierney
2008-Feb-13 23:44 UTC
[Lustre-discuss] Trying to build lustre with unsupported kernel
Craig Tierney wrote:
> However, if I patch a 2.6.22 kernel from CVS, wont I have to
> be using the CVS tree for the latest Lustre dev version? IE
> I cannot just use the patches against 1.6.4.2 because I would
> be dealing with the same problem as before (autogen.sh).

Please ignore this last comment. I see what needs to be done.

Craig

--
Craig Tierney (craig.tierney at noaa.gov)