Andreas Dilger
2006-May-19 07:36 UTC
[Lustre-discuss] question about __alloc_pages: 0-order allocation failed (gfp=0x20/0)
On Oct 08, 2004 17:57 +0200, Martin Vogt wrote:
> No. These are separate machines.
> The error is new btw. Before that (1.4b) I could run 350 iterations
> without this bug. (Then I stopped the benchmark, 350 iterations
> are close to infinity :-)
>
> >>when I run IOZone in an endless loop I have one client which
> >>after 10 iterations stops working.
> >>
> >>I get this error in dmesg:
> >>
> >>__alloc_pages: 0-order allocation failed (gfp=0x20/0)
> >>__alloc_pages: 0-order allocation failed (gfp=0x20/0)
> >>__alloc_pages: 0-order allocation failed (gfp=0x20/0)
> >>__alloc_pages: 0-order allocation failed (gfp=0x20/0)
> >>__alloc_pages: 0-order allocation failed (gfp=0x20/0)
> >>
> >>All the clients are the same (RAM/CPU, etc...)
> >>After the next iteration the iozone binary is started, lustre re-imports
> >>its OSTs on the client with this error message and the binary runs.

Martin, can you please set "sysctl -w vm.vm_gfp_debug=1" on your system
(depends what kernel you are running) and send me the stack traces it
generates when getting the allocation failures. That would give me an
idea of what is failing. The lustre allocations should normally print
an error message to the syslog if they ever fail, so it would be good
to know what is going on here. Thanks.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://members.shaw.ca/adilger/
http://members.shaw.ca/golinux/
Martin Vogt
2006-May-19 07:36 UTC
[Lustre-discuss] Re: question about __alloc_pages: 0-order allocation failed (gfp=0x20/0)
On Oct 08, 2004 17:57 +0200, Martin Vogt wrote:
>> No. These are separate machines.
>> The error is new btw. Before that (1.4b) I could run 350 iterations
>> without this bug. (Then I stopped the benchmark, 350 iterations
>> are close to infinity :-)
>>
>>>>when I run IOZone in an endless loop I have one client which
>>>>after 10 iterations stops working.
>>>>
>>>>I get this error in dmesg:
>>>>
>>>>__alloc_pages: 0-order allocation failed (gfp=0x20/0)
>>>>__alloc_pages: 0-order allocation failed (gfp=0x20/0)
>>>>__alloc_pages: 0-order allocation failed (gfp=0x20/0)
>>>>__alloc_pages: 0-order allocation failed (gfp=0x20/0)
>>>>__alloc_pages: 0-order allocation failed (gfp=0x20/0)
>>>>
>>>>All the clients are the same (RAM/CPU, etc...)
>>>>After the next iteration the iozone binary is started, lustre re-imports
>>>>its OSTs on the client with this error message and the binary runs.
>
>Martin, can you please set "sysctl -w vm.vm_gfp_debug=1" on your system
>(depends what kernel you are running) and send me the stack traces it
>generates when getting the allocation failures. That would give me an
>idea of what is failing. The lustre allocations should normally print
>an error message to the syslog if they ever fail, so it would be good
>to know what is going on here. Thanks.

Hello,

I piped the trace through ksymoops; the trace is at the end of this mail.

But the oops was not Lustre's fault (I think).

The trace shows that VMware modules were loaded. I had tested VMware on
this machine, then reinstalled 1.3.2 without rebuilding the VMware kernel
modules, and was thus using the old ones. I think this leads to the oops.

After removing vmnet/vmmon, iozone runs again and there are no oopses
(15 iterations).

But I still have these errors:

>LustreError: 1661:0:(client.c:816:ptlrpc_expire_one_request()) @@@ timeout (sent at 1097573837) req@e7692800 x910748/t657940 o4->media-ost3_UUID@NID_192.168.9.13_UUID:6 lens 288/248 ref 3 fl ?phase?:R/4/0 rc 0/0

After some time the upcall script is executed and the client continues.
Is it OK that I get these error messages on a regular basis?

I attached the oops; maybe the VMware modules trigger something.

regards,

Martin

media4:/usr/local/lustre # ksymoops </tmp/a.txt
ksymoops 2.4.9 on i686 2.4.24-lustrevogt. Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.24-lustrevogt/ (default)
     -m /boot/System.map-2.4.24-lustrevogt (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

f48479bc f48479f8 c013aa22 000004ed c1010030 00000001 0000000c c02d4bfc
c02d4db4 00000000 00000020 00000001 00000246 c2a1bdb8 00000000 f4847a00
c013aa57 f4847a20 c0137896 c2a1bdc0 00000003 00000020 c2a1bdb8 00000246
Call Trace:
 [<c013aa22>] [<c013aa57>] [<c0137896>] [<c0137ddc>] [<c0227e4a>] [<f8a14112>]
 [<f8a13d89>] [<f8a14112>] [<f8a13d89>] [<f8a139c8>] [<c010aad3>] [<c010acf5>]
 [<c010d398>] [<c02280c4>] [<c0228230>] [<f8a9163a>] [<f8a90e65>] [<c0228230>]
 [<f8a93c73>] [<c022c20e>] [<c0237d62>] [<c022c45d>] [<c0246821>] [<c0246ac4>]
 [<c01379cf>] [<c025e8c2>] [<c0258ca7>] [<c025b4d0>] [<c0257630>] [<c010ad42>]
 [<c025f9cf>] [<c0260038>] [<c0243612>] [<c0243a32>] [<c022cc45>] [<c022cd1d>]
 [<c022cec1>] [<c01232f7>] [<c010ad42>] [<c010d398>] [<c026ca40>] [<c022499e>]
 [<f8b71a1a>] [<f8b71c2a>] [<f8b74bbe>] [<f8b75609>] [<c01075c3>] [<f8b75470>]
Warning (Oops_read): Code line not seen, dumping what data is available

Trace; c013aa22 <__alloc_pages+272/280>
Trace; c013aa57 <__get_free_pages+27/30>
Trace; c0137896 <kmem_cache_grow+c6/280>
Trace; c0137ddc <kmalloc+8c/160>
Trace; c0227e4a <alloc_skb+ba/1f0>
Trace; f8a14112 <[e1000]e1000_alloc_rx_buffers+a2/120>
Trace; f8a13d89 <[e1000]e1000_clean_rx_irq+169/450>
Trace; f8a14112 <[e1000]e1000_alloc_rx_buffers+a2/120>
Trace; f8a13d89 <[e1000]e1000_clean_rx_irq+169/450>
Trace; f8a139c8 <[e1000]e1000_intr+28/80>
Trace; c010aad3 <handle_IRQ_event+63/a0>
Trace; c010acf5 <do_IRQ+a5/100>
Trace; c010d398 <call_do_IRQ+5/d>
Trace; c02280c4 <kfree_skbmem+4/70>
Trace; c0228230 <__kfree_skb+100/150>
Trace; f8a9163a <[vmnet]VNetHubReceive+8c/98>
Trace; f8a90e65 <[vmnet]VNetSend+33/62>
Trace; c0228230 <__kfree_skb+100/150>
Trace; f8a93c73 <[vmnet]VNetBridgeReceiveFromDev+1b7/1c4>
Trace; c022c20e <dev_queue_xmit_nit+9e/f0>
Trace; c0237d62 <qdisc_restart+122/1a0>
Trace; c022c45d <dev_queue_xmit+16d/320>
Trace; c0246821 <ip_output+121/1c0>
Trace; c0246ac4 <ip_queue_xmit+204/570>
Trace; c01379cf <kmem_cache_grow+1ff/280>
Trace; c025e8c2 <tcp_v4_send_check+82/d0>
Trace; c0258ca7 <tcp_transmit_skb+3a7/5f0>
Trace; c025b4d0 <tcp_send_ack+80/c0>
Trace; c0257630 <tcp_rcv_established+720/850>
Trace; c010ad42 <do_IRQ+f2/100>
Trace; c025f9cf <tcp_v4_do_rcv+11f/130>
Trace; c0260038 <tcp_v4_rcv+658/720>
Trace; c0243612 <ip_local_deliver+1a2/1d0>
Trace; c0243a32 <ip_rcv+3f2/440>
Trace; c022cc45 <netif_receive_skb+1c5/1f0>
Trace; c022cd1d <process_backlog+ad/160>
Trace; c022cec1 <net_rx_action+f1/180>
Trace; c01232f7 <do_softirq+87/f0>
Trace; c010ad42 <do_IRQ+f2/100>
Trace; c010d398 <call_do_IRQ+5/d>
Trace; c026ca40 <inet_recvmsg+0/50>
Trace; c022499e <sock_recvmsg+2e/c0>
Trace; f8b71a1a <[ksocknal]ksocknal_recv_kiov+ea/250>
Trace; f8b71c2a <[ksocknal]ksocknal_receive+aa/280>
Trace; f8b74bbe <[ksocknal]ksocknal_process_receive+ce/780>
Trace; f8b75609 <[ksocknal]ksocknal_scheduler+199/640>
Trace; c01075c3 <arch_kernel_thread+23/30>
Trace; f8b75470 <[ksocknal]ksocknal_scheduler+0/640>
Andreas Dilger
2006-May-19 07:36 UTC
[Lustre-discuss] Re: question about __alloc_pages: 0-order allocation failed (gfp=0x20/0)
On Oct 12, 2004 12:17 +0200, Martin Vogt wrote:
> Andreas Dilger wrote:
> >Martin, can you please set "sysctl -w vm.vm_gfp_debug=1" on your system
> >(depends what kernel you are running) and send me the stack traces it
> >generates when getting the allocation failures. That would give me an
> >idea of what is failing. The lustre allocations should normally print
> >an error message to the syslog if they ever fail, so it would be good
> >to know what is going on here. Thanks.
>
> I piped the trace through ksymoops; the trace is at the end of this mail.
>
> But the oops was not Lustre's fault (I think).
>
> The trace shows that VMware modules were loaded.
> I had tested VMware on this machine, then reinstalled 1.3.2
> without rebuilding the VMware kernel modules, and was thus
> using the old ones. I think this leads to the oops.
>
> After removing vmnet/vmmon, iozone runs again and there are no oopses
> (15 iterations).

Just FYI, it isn't really an "oops", just a stack dump for the currently
running process when there is an allocation failure (this is what the
vm.vm_gfp_debug=1 setting does for us). It tells us that the allocation
failure is coming from alloc_skb() in the e1000 receive path, though
I'm not sure why that is happening. Do all of the stack traces look
similar? This is sort of "statistical" debugging, since any allocation
could be the one that fails, but if you examine enough of them it is
likely that the one that happens the most would become clear.

It is interesting to hear that removing the vmware modules avoids
this problem. Does vmware do anything with incoming network traffic
(e.g. filter it for forwarding to the virtual machine)?

> But I still have these errors:
>
> >LustreError: 1661:0:(client.c:816:ptlrpc_expire_one_request()) @@@ timeout (sent at 1097573837) req@e7692800 x910748/t657940 o4->media-ost3_UUID@NID_192.168.9.13_UUID:6 lens 288/248 ref 3 fl ?phase?:R/4/0 rc 0/0
>
> After some time the upcall script is executed and the client continues.
> Is it OK that I get these error messages on a regular basis?

It's not really OK. It means that your client is timing out in its
RPCs to the OST, possibly because these allocation failures are
causing the OST to drop incoming messages. If the OSTs are running in
"recoverable" mode (configured with --failover in the lmc script) then
the clients will resend these RPCs.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://members.shaw.ca/adilger/
http://members.shaw.ca/golinux/
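
To judge how regular the ptlrpc_expire_one_request timeouts discussed above are, and which OST they point at, the syslog lines can be tallied with a short script. The following is only a sketch: the regular expression is fitted to the single log line quoted in this thread, and the field names (xid, transno, opcode, target) are my reading of that line, not something confirmed from the Lustre sources.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Pattern fitted to the one timeout line quoted in this thread.
# Group names are assumptions about what each field means.
TIMEOUT_RE = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\)"
    r".*?timeout \(sent at (?P<sent>\d+)\)"
    r".*?x(?P<xid>\d+)/t(?P<transno>\d+)"
    r"\s+o(?P<opcode>\d+)->(?P<target>[^@\s]+)@"
)


def parse_timeout(line):
    """Return a dict of fields for a timeout line, or None if it does not match."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    rec = m.groupdict()
    # "sent at" is taken to be seconds since the Unix epoch.
    rec["sent"] = datetime.fromtimestamp(int(rec["sent"]), tz=timezone.utc)
    return rec


def timeouts_per_target(lines):
    """Count timeout lines per target name."""
    counts = Counter()
    for line in lines:
        rec = parse_timeout(line)
        if rec is not None:
            counts[rec["target"]] += 1
    return counts


# The log line from this thread, joined onto one line.
SAMPLE = (
    "LustreError: 1661:0:(client.c:816:ptlrpc_expire_one_request()) @@@ "
    "timeout (sent at 1097573837) req@e7692800 x910748/t657940 "
    "o4->media-ost3_UUID@NID_192.168.9.13_UUID:6 lens 288/248 ref 3 fl "
    "?phase?:R/4/0 rc 0/0"
)

if __name__ == "__main__":
    rec = parse_timeout(SAMPLE)
    print(rec["sent"].isoformat(), rec["target"], "opcode", rec["opcode"])
    print(timeouts_per_target([SAMPLE, "some unrelated syslog line"]))
```

Feeding it the lines of the client's syslog would show whether the timeouts cluster on one OST or are spread across all of them, which helps separate a single misbehaving server from a client-side or network-wide problem.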
Martin Vogt
2006-May-19 07:36 UTC
[Lustre-discuss] question about __alloc_pages: 0-order allocation failed (gfp=0x20/0)
Capps@iozone.org wrote:
>Martin,
>
> Are you running Iozone on an OST ?
>
>Enjoy,
>Don Capps

Hello Don,

No. These are separate machines.
The error is new btw. Before that (1.4b) I could run 350 iterations
without this bug. (Then I stopped the benchmark, 350 iterations
are close to infinity :-)

Clients on an OST do not work; I think I read this somewhere.

regards,

Martin

>----- Original Message -----
>From: "Martin Vogt" <vogt@itwm.fraunhofer.de>
>To: <lustre-discuss@lists.clusterfs.com>
>Sent: Friday, October 08, 2004 4:57 AM
>Subject: [Lustre-discuss] question about __alloc_pages: 0-order allocation failed (gfp=0x20/0)
>
>>Hello,
>>
>>when I run IOZone in an endless loop I have one client which
>>after 10 iterations stops working.
>>
>>I get this error in dmesg:
>>
>>__alloc_pages: 0-order allocation failed (gfp=0x20/0)
>>__alloc_pages: 0-order allocation failed (gfp=0x20/0)
>>__alloc_pages: 0-order allocation failed (gfp=0x20/0)
>>__alloc_pages: 0-order allocation failed (gfp=0x20/0)
>>__alloc_pages: 0-order allocation failed (gfp=0x20/0)
>>
>>All the clients are the same (RAM/CPU, etc...)
>>After the next iteration the iozone binary is started, lustre re-imports
>>its OSTs on the client with this error message and the binary runs.
>>
>>regards,
>>
>>Martin
Martin Vogt
2006-May-19 07:36 UTC
[Lustre-discuss] Re: question about __alloc_pages: 0-order allocation failed (gfp=0x20/0)
Andreas Dilger wrote:
>On Oct 12, 2004 12:17 +0200, Martin Vogt wrote
>
>>I piped the trace through ksymoops; the trace is at the end of this mail.
>>
>>But the oops was not Lustre's fault (I think).
>>
>>The trace shows that VMware modules were loaded.
>>I had tested VMware on this machine, then reinstalled 1.3.2
>>without rebuilding the VMware kernel modules, and was thus
>>using the old ones. I think this leads to the oops.
>>
>>After removing vmnet/vmmon, iozone runs again and there are no oopses
>>(15 iterations).
>
>Just FYI, it isn't really an "oops", just a stack dump for the currently
>running process when there is an allocation failure (this is what the
>vm.vm_gfp_debug=1 setting does for us). It tells us that the allocation
>failure is coming from alloc_skb() in the e1000 receive path, though
>I'm not sure why that is happening. Do all of the stack traces look
>similar? This is sort of "statistical" debugging, since any allocation
>could be the one that fails, but if you examine enough of them it is
>likely that the one that happens the most would become clear.
>
>It is interesting to hear that removing the vmware modules avoids
>this problem. Does vmware do anything with incoming network traffic
>(e.g. filter it for forwarding to the virtual machine)?

I don't know, but I saw that the vmware modules were not necessary, and
after removing them I had no stack dumps/traces. Before that (1.4b) the
vmware modules were properly compiled against the lustre 1.4b kernel and
I had no stack dumps.

Now I'm running lustre-1.3.3 and I have no stack dumps (but still the
error below).

>>But I still have these errors:
>>
>>>LustreError: 1661:0:(client.c:816:ptlrpc_expire_one_request()) @@@ timeout (sent at 1097573837) req@e7692800 x910748/t657940 o4->media-ost3_UUID@NID_192.168.9.13_UUID:6 lens 288/248 ref 3 fl ?phase?:R/4/0 rc 0/0
>>
>>After some time the upcall script is executed and the client continues.
>>Is it OK that I get these error messages on a regular basis?
>
>It's not really OK. It means that your client is timing out in its
>RPCs to the OST, possibly because these allocation failures are
>causing the OST to drop incoming messages.

No, this error comes from a run where I have no allocation failures or
stack dumps. It appears regularly under network load (e.g. when running
iozone in an endless loop).

These are the module parameters for the NIC (otherwise I made no changes):

>options e1000 RxIntDelay=0,0,0 RxDescriptors=1024,1024,1024 TxDescriptors=1024,1024,1024 TxIntDelay=0,0,0

I have had the "ptlrpc_expire_one_request" error from the beginning of
lustre testing:

- on 2.6.x SLES9 kernels (lustre 2.4.x)
- on redhat EL kernels, pre-built and self-built
- on SuSE 2.4.21 kernels (from lnxi)
- on my patched SuSE kernels (2.4.24)

The combination gcc 3.3.3 + linux-2.4.24 works best, though. The error is
not "fatal": usually the client hangs for some time and then it "recovers".

On the redhat kernel (from clusterfs), version 1.2.4, I had these errors.
After 59 iterations (6 hours of testing) of a parallel iozone I got this
LustreError:

>Aug 27 05:24:07 media4 kernel: LustreError: 1765:0:(socknal_cb.c:1058:ksocknal_sendmsg()) PORTALS: out of memory at socknal_cb.c:1058 (tried to alloc 'ltx' = 136)
>Aug 27 05:24:07 media4 kernel: LustreError: 1765:0:(socknal_cb.c:1058:ksocknal_sendmsg()) PORTALS: out of memory at socknal_cb.c:1058 (tried to alloc 'ltx' = 136)

This error was with the newest Intel drivers. (The drivers from RH 3.0
don't know the PCI ID of the NICs, so I cannot use them.)

Maybe it is a bug in the e1000 drivers?

When I had the allocation failure I made 4 dumps with ksymoops.
Here is another one (in this one there is no vmware module included).
All of the dumps looked similar, except the part in the "middle".
Sometimes it was vmware; here it is sunrpc.

regards,

Martin

Trace; c013aa22 <__alloc_pages+272/280>
Trace; c013aa57 <__get_free_pages+27/30>
Trace; c0137896 <kmem_cache_grow+c6/280>
Trace; c0137ddc <kmalloc+8c/160>
Trace; c0227e4a <alloc_skb+ba/1f0>
Trace; f8a14112 <[e1000]e1000_alloc_rx_buffers+a2/120>
Trace; f8a13d89 <[e1000]e1000_clean_rx_irq+169/450>
Trace; f8a14112 <[e1000]e1000_alloc_rx_buffers+a2/120>
Trace; f8a13d89 <[e1000]e1000_clean_rx_irq+169/450>
Trace; f8a139c8 <[e1000]e1000_intr+28/80>
Trace; c010aad3 <handle_IRQ_event+63/a0>
Trace; c010acf5 <do_IRQ+a5/100>
Trace; c010d398 <call_do_IRQ+5/d>
Trace; c02280c4 <kfree_skbmem+4/70>
Trace; c0228230 <__kfree_skb+100/150>
Trace; f8a9163a <[sunrpc]xdr_partial_copy_from_skb+ea/200>
Trace; f8a90e65 <[sunrpc].text.lock.pmap_clnt+12/3d>
Trace; c0228230 <__kfree_skb+100/150>
Trace; f8a93c73 <[sunrpc]__kstrtab_nlm_debug+fc3/2c50>
Trace; c022c20e <dev_queue_xmit_nit+9e/f0>
Trace; c0237d62 <qdisc_restart+122/1a0>
Trace; c022c45d <dev_queue_xmit+16d/320>
Trace; c0246821 <ip_output+121/1c0>
Trace; c0246ac4 <ip_queue_xmit+204/570>
Trace; c01379cf <kmem_cache_grow+1ff/280>
Trace; c025e8c2 <tcp_v4_send_check+82/d0>
Trace; c0258ca7 <tcp_transmit_skb+3a7/5f0>
Trace; c025b4d0 <tcp_send_ack+80/c0>
Trace; c0257630 <tcp_rcv_established+720/850>
Trace; c010ad42 <do_IRQ+f2/100>
Trace; c025f9cf <tcp_v4_do_rcv+11f/130>
Trace; c0260038 <tcp_v4_rcv+658/720>
Trace; c0243612 <ip_local_deliver+1a2/1d0>
Trace; c0243a32 <ip_rcv+3f2/440>
Trace; c022cc45 <netif_receive_skb+1c5/1f0>
Trace; c022cd1d <process_backlog+ad/160>
Trace; c022cec1 <net_rx_action+f1/180>
Trace; c01232f7 <do_softirq+87/f0>
Trace; c010ad42 <do_IRQ+f2/100>
Trace; c010d398 <call_do_IRQ+5/d>
Trace; c026ca40 <inet_recvmsg+0/50>
Trace; c022499e <sock_recvmsg+2e/c0>
Trace; f8b71a1a <[obdclass]llog_print_cb+41a/6e0>
Trace; f8b71c2a <[obdclass]llog_print_cb+62a/6e0>

Here is another one. But this is not done against a proper System.map:

Trace; c013aa22 <__alloc_pages+272/280>
Trace; c013aa57 <__get_free_pages+27/30>
Trace; c0137896 <kmem_cache_grow+c6/280>
Trace; c0137ddc <kmalloc+8c/160>
Trace; c0227e4a <alloc_skb+ba/1f0>
Trace; f8a14112 <[e1000]e1000_alloc_rx_buffers+a2/120>
Trace; f8a13d89 <[e1000]e1000_clean_rx_irq+169/450>
Trace; c010ad42 <do_IRQ+f2/100>
Trace; f8a139c8 <[e1000]e1000_intr+28/80>
Trace; c010aad3 <handle_IRQ_event+63/a0>
Trace; c010acf5 <do_IRQ+a5/100>
Trace; c010d398 <call_do_IRQ+5/d>
Trace; c0220018 <md_ioctl+478/7d0>
Trace; c027cb52 <packet_rcv_spkt+82/220>
Trace; c022c20e <dev_queue_xmit_nit+9e/f0>
Trace; c0237d62 <qdisc_restart+122/1a0>
Trace; c022c45d <dev_queue_xmit+16d/320>
Trace; c0246821 <ip_output+121/1c0>
Trace; c0246ac4 <ip_queue_xmit+204/570>
Trace; c01379cf <kmem_cache_grow+1ff/280>
Trace; c025e8c2 <tcp_v4_send_check+82/d0>
Trace; c0258ca7 <tcp_transmit_skb+3a7/5f0>
Trace; c025b4d0 <tcp_send_ack+80/c0>
Trace; c0257630 <tcp_rcv_established+720/850>
Trace; c010ad42 <do_IRQ+f2/100>
Trace; c025f9cf <tcp_v4_do_rcv+11f/130>
Trace; c0260038 <tcp_v4_rcv+658/720>
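
The "statistical" comparison of several dumps suggested earlier in the thread can be done mechanically rather than by eye. The sketch below parses ksymoops "Trace;" lines and, for each dump, reports the first frame that belongs to a loadable module, i.e. the nearest module-owned caller of the failed allocation. The function names are my own, and the two sample dumps are short excerpts of the traces posted above, not complete dumps.

```python
import re
from collections import Counter

# Matches ksymoops output lines such as
#   Trace; f8a14112 <[e1000]e1000_alloc_rx_buffers+a2/120>
# The "[module]" prefix is absent for symbols in the core kernel.
TRACE_RE = re.compile(
    r"Trace;\s+[0-9a-f]{8}\s+<(?:\[(?P<module>[^\]]+)\])?(?P<symbol>[^+>]+)\+"
)


def frames(dump):
    """Return (module, symbol) pairs in trace order; module is None for core kernel."""
    return [(m.group("module"), m.group("symbol")) for m in TRACE_RE.finditer(dump)]


def first_module_frame(dump):
    """Return the first frame owned by a module, or None if there is none."""
    for module, symbol in frames(dump):
        if module is not None:
            return module, symbol
    return None


def tally(dumps):
    """Count, across dumps, which module frame sits closest to the allocation."""
    counts = Counter()
    for dump in dumps:
        frame = first_module_frame(dump)
        if frame is not None:
            counts[frame] += 1
    return counts


# Excerpts (first lines only) of two dumps from this thread.
DUMP_A = """\
Trace; c013aa22 <__alloc_pages+272/280>
Trace; c0137ddc <kmalloc+8c/160>
Trace; c0227e4a <alloc_skb+ba/1f0>
Trace; f8a14112 <[e1000]e1000_alloc_rx_buffers+a2/120>
Trace; f8a9163a <[vmnet]VNetHubReceive+8c/98>
"""
DUMP_B = """\
Trace; c013aa22 <__alloc_pages+272/280>
Trace; c0227e4a <alloc_skb+ba/1f0>
Trace; f8a14112 <[e1000]e1000_alloc_rx_buffers+a2/120>
Trace; f8a9163a <[sunrpc]xdr_partial_copy_from_skb+ea/200>
"""

if __name__ == "__main__":
    for frame, count in tally([DUMP_A, DUMP_B]).most_common():
        print(count, frame)
```

With the dumps in this thread the tally lands on the e1000 receive-buffer refill path every time, while the frames further down (vmnet, sunrpc) differ from dump to dump.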
Kian_Chang_Low@veritasdgc.com
2006-May-19 07:36 UTC
[Lustre-discuss] Lustre with Infiniband
Hi,

Is there a version of Lustre that is IB (infiniband) ready? And is that
native IB NAL (network abstraction layer), or sockets NAL over IPoIB or
SDP (socket direct protocol)?

Thanks,
Kian Chang.
Hi Kian--

On 11/3/2004 3:35, Kian_Chang_Low@veritasdgc.com wrote:
>
> Is there a version of Lustre that is IB (infiniband) ready? And is that
> native IB NAL (network abstraction layer), or sockets NAL over IPoIB or
> SDP (socket direct protocol)?

Lustre 1.4.0 includes two NALs: one for the Mellanox/TopSpin/OpenIB-gen-1
stack, and one for the Infinicon stack. Both are "beta" quality, not ready
for production use, and not updated to the latest version of the IB stack.
Both NALs use verbs, not SDP or IPoIB.

You can use IP over IB with the normal socknal. Lustre 1.4.1 will ship by
default with a setting that improves IPoIB performance. It causes the
socknal to issue I/O to the TCP/IP stack differently.

Thanks--

-Phil
Martin Vogt
2006-May-19 07:36 UTC
[Lustre-discuss] question about __alloc_pages: 0-order allocation failed (gfp=0x20/0)
Hello,

when I run IOZone in an endless loop I have one client which
after 10 iterations stops working.

I get this error in dmesg:

__alloc_pages: 0-order allocation failed (gfp=0x20/0)
__alloc_pages: 0-order allocation failed (gfp=0x20/0)
__alloc_pages: 0-order allocation failed (gfp=0x20/0)
__alloc_pages: 0-order allocation failed (gfp=0x20/0)
__alloc_pages: 0-order allocation failed (gfp=0x20/0)

All the clients are the same (RAM/CPU, etc...)
After the next iteration the iozone binary is started, lustre re-imports
its OSTs on the client with this error message and the binary runs.

regards,

Martin
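
The dmesg line itself already says something about the failing allocation: the order (0, i.e. a single page) and the GFP mask. The sketch below decodes the mask using the flag values as I recall them from include/linux/mm.h in 2.4-series kernels; that table is an assumption and should be checked against the source of the kernel actually running. The number after the slash is passed through without interpretation.

```python
import re

# Flag bits as I recall them from 2.4-series include/linux/mm.h.
# This table is an assumption; verify it against your kernel source.
GFP_FLAGS_2_4 = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_HIGHIO",
    0x100: "__GFP_FS",
}

FAIL_RE = re.compile(
    r"__alloc_pages: (\d+)-order allocation failed "
    r"\(gfp=0x([0-9a-fA-F]+)/(\d+)\)"
)


def decode_alloc_failure(line):
    """Decode one '__alloc_pages: N-order allocation failed' line, or return None."""
    m = FAIL_RE.search(line)
    if m is None:
        return None
    order = int(m.group(1))
    mask = int(m.group(2), 16)
    return {
        "order": order,
        "pages": 1 << order,          # an order-N request asks for 2**N pages
        "gfp_mask": mask,
        "flags": [name for bit, name in sorted(GFP_FLAGS_2_4.items()) if mask & bit],
        "may_sleep": bool(mask & 0x10),  # __GFP_WAIT set means the caller may block
        "suffix": m.group(3),            # left uninterpreted
    }


if __name__ == "__main__":
    print(decode_alloc_failure(
        "__alloc_pages: 0-order allocation failed (gfp=0x20/0)"
    ))
```

If the table is right, 0x20 is __GFP_HIGH on its own, which in those kernels is what GFP_ATOMIC expands to: a request that may not sleep. That would be consistent with the stack traces in this thread, where alloc_skb() is reached from the e1000 interrupt handler.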