Hi all,
What can cause a client to receive a "o2iblnd no resources" message
from an OSS?
---------------------------------------------------------------------------
Feb 1 15:24:24 node-5-8 kernel: LustreError:
1893:0:(o2iblnd_cb.c:2448:kiblnd_rejected()) 10.10.60.3@o2ib rejected:
o2iblnd no resources
---------------------------------------------------------------------------
I suspect an out-of-memory problem, and indeed the OSS logs are filled
up with the following:
---------------------------------------------------------------------------
ib_cm/3: page allocation failure. order:4, mode:0xd0
Call Trace:<ffffffff8015c847>{__alloc_pages+777}
<ffffffff801727e9>{alloc_page_interleave+61}
<ffffffff8015c8e0>{__get_free_pages+11}
<ffffffff8015facd>{kmem_getpages+36}
<ffffffff80160262>{cache_alloc_refill+609}
<ffffffff8015ff30>{__kmalloc+123}
<ffffffffa014ee75>{:ib_mthca:mthca_alloc_qp_common+668}
<ffffffffa014f42d>{:ib_mthca:mthca_alloc_qp+178}
<ffffffffa0153e3a>{:ib_mthca:mthca_create_qp+311}
<ffffffffa00d5b1b>{:ib_core:ib_create_qp+20}
<ffffffffa021a5f9>{:rdma_cm:rdma_create_qp+43}
<ffffffff8024b7b5>{dma_pool_free+245}
<ffffffffa014b257>{:ib_mthca:mthca_init_cq+1073}
<ffffffffa01540cf>{:ib_mthca:mthca_create_cq+282}
<ffffffff801727e9>{alloc_page_interleave+61}
<ffffffffa0400c10>{:ko2iblnd:kiblnd_cq_completion+0}
<ffffffffa0400d50>{:ko2iblnd:kiblnd_cq_event+0}
<ffffffffa00d5cc1>{:ib_core:ib_create_cq+33}
<ffffffffa03f56bd>{:ko2iblnd:kiblnd_create_conn+3565}
<ffffffffa0276f38>{:libcfs:cfs_alloc+40}
<ffffffffa03fe457>{:ko2iblnd:kiblnd_passive_connect+2215}
<ffffffffa00d8595>{:ib_core:ib_find_cached_gid+244}
<ffffffffa021a278>{:rdma_cm:cma_acquire_dev+293}
<ffffffffa03ff540>{:ko2iblnd:kiblnd_cm_callback+64}
<ffffffffa03ff500>{:ko2iblnd:kiblnd_cm_callback+0}
<ffffffffa021b19a>{:rdma_cm:cma_req_handler+863}
<ffffffff801e8427>{alloc_layer+67}
<ffffffff801e8645>{idr_get_new_above_int+423}
<ffffffffa00fa0ab>{:ib_cm:cm_process_work+101}
<ffffffffa00faa57>{:ib_cm:cm_req_handler+2398}
<ffffffffa00fae3c>{:ib_cm:cm_work_handler+0}
<ffffffffa00fae6a>{:ib_cm:cm_work_handler+46}
<ffffffff80146fca>{worker_thread+419}
<ffffffff80133566>{default_wake_function+0}
<ffffffff801335b7>{__wake_up_common+67}
<ffffffff80133566>{default_wake_function+0}
<ffffffff8014ad18>{keventd_create_kthread+0}
<ffffffff80146e27>{worker_thread+0}
<ffffffff8014ad18>{keventd_create_kthread+0}
<ffffffff8014acef>{kthread+200}
<ffffffff80110de3>{child_rip+8}
<ffffffff8014ad18>{keventd_create_kthread+0}
<ffffffff8014ac27>{kthread+0}
<ffffffff80110ddb>{child_rip+0}
Mem-info:
Node 0 DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
cpu 1 hot: low 2, high 6, batch 1
cpu 1 cold: low 0, high 2, batch 1
cpu 2 hot: low 2, high 6, batch 1
cpu 2 cold: low 0, high 2, batch 1
cpu 3 hot: low 2, high 6, batch 1
cpu 3 cold: low 0, high 2, batch 1
Node 0 Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16
cpu 2 hot: low 32, high 96, batch 16
cpu 2 cold: low 0, high 32, batch 16
cpu 3 hot: low 32, high 96, batch 16
cpu 3 cold: low 0, high 32, batch 16
Node 0 HighMem per-cpu: empty
Free pages: 35336kB (0kB HighMem)
Active:534156 inactive:127091 dirty:1072 writeback:0 unstable:0 free:8834
slab:146612 mapped:26222 pagetables:1035
Node 0 DMA free:9832kB min:52kB low:64kB high:76kB active:0kB inactive:0kB
present:16384kB pages_scanned:37 all_unreclaimable? yes
protections[]: 0 510200 510200
Node 0 Normal free:25504kB min:16328kB low:20408kB high:24492kB active:2136624kB
inactive:508364kB present:4964352kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
Node 0 HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB
present:0kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
Node 0 DMA: 2*4kB 2*8kB 1*16kB 0*32kB 1*64kB 0*128kB 0*256kB 1*512kB 1*1024kB
0*2048kB 2*4096kB = 9832kB
Node 0 Normal: 1284*4kB 2290*8kB 126*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 25504kB
Node 0 HighMem: empty
Swap cache: add 111, delete 111, find 23/36, race 0+0
Free swap: 4096360kB
1245184 pages of RAM
235840 reserved pages
659867 pages shared
0 pages swap cached
---------------------------------------------------------------------------
IB links are up and working on both the client and the OSS:
---------------------------------------------------------------------------
client# ibstatus
Infiniband device 'mthca0' port 1 status:
default gid: fe80:0000:0000:0000:0005:ad00:0008:af71
base lid: 0x83
sm lid: 0x130
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 20 Gb/sec (4X DDR)
oss# ibstatus
Infiniband device 'mthca0' port 1 status:
default gid: fe80:0000:0000:0000:0005:ad00:0008:cb11
base lid: 0x126
sm lid: 0x130
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 20 Gb/sec (4X DDR)
---------------------------------------------------------------------------
And the Subnet Manager doesn't show any unusual errors or skyrocketing
counters (I use OFED 1.2, kernel 2.6.9-55.0.9.EL_lustre.1.6.4.1smp).
What I don't really get is that most clients can access files on this
OSS without any issue; besides, my limited understanding of kernel
memory management leads me to believe that this OSS is not out of
memory:
---------------------------------------------------------------------------
# cat /proc/meminfo
MemTotal: 4037380 kB
MemFree: 31688 kB
Buffers: 1333536 kB
Cached: 1231900 kB
SwapCached: 0 kB
Active: 2138948 kB
Inactive: 507720 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 4037380 kB
LowFree: 31688 kB
SwapTotal: 4096564 kB
SwapFree: 4096360 kB
Dirty: 6868 kB
Writeback: 0 kB
Mapped: 106984 kB
Slab: 588200 kB
CommitLimit: 6115252 kB
Committed_AS: 860508 kB
PageTables: 4304 kB
VmallocTotal: 536870911 kB
VmallocUsed: 274788 kB
VmallocChunk: 536596091 kB
HugePages_Total: 0
HugePages_Free: 0
Hugepagesize: 2048 kB
---------------------------------------------------------------------------
This only appeared recently, after several weeks of continuous use of the
filesystem without any problem. Is there anything like a memory leak
somewhere? Any help diagnosing the problem would be greatly appreciated.
Thanks!
--
Kilian
Hi Kilian,

I think it's because o2iblnd uses fragmented RDMA by default (up to 256
fragments), so we have to set max_send_wr to (concurrent_sends * (256 + 1))
when creating the QP with rdma_create_qp(). That takes a lot of resources
and can sometimes drive a busy server out of memory.

To resolve this, we have to use FMR to map the fragmented buffers to a
virtually contiguous I/O address; that way there is always only one
fragment per RDMA.

Here is a patch for this problem (using FMR in o2iblnd):
https://bugzilla.lustre.org/attachment.cgi?id=15144

Regards,
Liang
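P.S. To give a rough idea of where that cost goes, here is a minimal sketch
of the QP sizing (an illustration only, not the actual o2iblnd code;
IBLND_MAX_RDMA_FRAGS, concurrent_sends and the function name are stand-ins
for the real symbols and module parameters):
---------------------------------------------------------------------------
/*
 * Sketch only -- not the actual o2iblnd code.  IBLND_MAX_RDMA_FRAGS and
 * 'concurrent_sends' stand in for the real o2iblnd symbols/module options.
 */
#include <rdma/ib_verbs.h>
#include <rdma/rdma_cm.h>

#define IBLND_MAX_RDMA_FRAGS 256	/* assumed: max fragments per bulk RDMA */

static int sketch_create_qp(struct rdma_cm_id *cmid, struct ib_pd *pd,
			    struct ib_cq *cq, int concurrent_sends)
{
	struct ib_qp_init_attr attr = {
		.send_cq     = cq,
		.recv_cq     = cq,
		.qp_type     = IB_QPT_RC,
		.sq_sig_type = IB_SIGNAL_REQ_WR,
	};

	/*
	 * One work request per fragment plus one for the message itself:
	 * 257 send WRs per in-flight RDMA.  The HCA driver (mthca here)
	 * backs the work queue with a large contiguous kernel allocation,
	 * which is where the order:4 kmalloc in the trace above fails.
	 */
	attr.cap.max_send_wr  = concurrent_sends * (IBLND_MAX_RDMA_FRAGS + 1);
	attr.cap.max_recv_wr  = concurrent_sends;	/* illustrative only */
	attr.cap.max_send_sge = 1;
	attr.cap.max_recv_sge = 1;

	/*
	 * With FMR the fragments are mapped to one virtually contiguous
	 * I/O address, so a single WR per RDMA is enough and the queues
	 * (and the allocation behind them) become much smaller.
	 */
	return rdma_create_qp(cmid, pd, &attr);
}
---------------------------------------------------------------------------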
On Sat, Feb 02, 2008 at 03:39:09PM +0800, Liang Zhen wrote:
> Here is a patch for this problem (using FMR in o2iblnd):
> https://bugzilla.lustre.org/attachment.cgi?id=15144

This is an experimental patch - nodes with the patch applied are not
interoperable with those without it. Please don't propagate the patch to
production systems.

Isaac
On Saturday 02 February 2008 00:42:47 Isaac Huang wrote:
> > Here is a patch for this problem (using FMR in o2iblnd):
> > https://bugzilla.lustre.org/attachment.cgi?id=15144
>
> This is an experimental patch - nodes with the patch applied are not
> interoperable with those without it. Please don't propagate the patch to
> production systems.

Thanks for the explanation. Since the problem indeed occurs on a production
system, I'd rather keep experimental patches out of the way.

I assume that adding more RAM to the OSSes is likely to solve this problem,
right? If that's the case, I'd probably go this way until the FMR patch is
landed.

Thanks,
--
Kilian
Hi Liang,

On Friday 01 February 2008 23:39:09 you wrote:
> I think it's because o2iblnd uses fragmented RDMA by default (up to 256
> fragments), so we have to set max_send_wr to (concurrent_sends * (256 + 1))
> when creating the QP with rdma_create_qp(). That takes a lot of resources
> and can sometimes drive a busy server out of memory.

By the way, is there a way to free some of this memory to resolve the
problem temporarily, without having to restart the OSS?

Thanks,
--
Kilian
On Sat, Feb 02, 2008 at 12:29:07PM -0800, Kilian CAVALOTTI wrote:
> I assume that adding more RAM to the OSSes is likely to solve this problem,
> right? If that's the case, I'd probably go this way until the FMR patch is
> landed.

It depends on the architecture of the OSSes - o2iblnd, and I believe OFED
too, can't use memory in ZONE_HIGHMEM. For example, on x86_64, where
ZONE_HIGHMEM is empty, adding more RAM will certainly help.

Isaac
On Sunday 03 February 2008 06:30:16 am Isaac Huang wrote:
> It depends on the architecture of the OSSes - o2iblnd, and I believe OFED
> too, can't use memory in ZONE_HIGHMEM. For example, on x86_64, where
> ZONE_HIGHMEM is empty, adding more RAM will certainly help.

Good to know, thanks.

On the strange side, this "no resources" message only appears on one client.
It gets it from pretty much all of our 8 OSSes, while all the other 276
clients can still access the filesystem (and hence all 8 OSSes) without a
single problem. Rebooting the problematic client doesn't help either.

Does that sound like something logic can explain? I would assume that if the
OSSes were out of memory, this would affect all the clients indiscriminately,
right?

Thanks,
--
Kilian