lejeczek
2017-Oct-23 15:59 UTC
[Gluster-users] problems running a vol over IPoIB, and qemu off it?
hi people

I wonder if anybody has experienced problems with volumes in replica mode that run across IPoIB links while libvirt stores qcow images on such a volume?

I wonder if maybe devel could confirm it should just work, so I would know whether to blame the hardware/InfiniBand instead.

I have a direct IPoIB link between two hosts, a Gluster replica volume, and libvirt stores its disk images there. I start a guest on hostA and I get the below on hostB (which is the IB subnet manager):

[Mon Oct 23 16:43:32 2017] Workqueue: ipoib_wq ipoib_cm_tx_start [ib_ipoib]
[Mon Oct 23 16:43:32 2017]  0000000000008010 00000000553c90b1 ffff880c1c6eb818 ffffffff816a3db1
[Mon Oct 23 16:43:32 2017]  ffff880c1c6eb8a8 ffffffff81188810 0000000000000000 ffff88042ffdb000
[Mon Oct 23 16:43:32 2017]  0000000000000004 0000000000008010 ffff880c1c6eb8a8 00000000553c90b1
[Mon Oct 23 16:43:32 2017] Call Trace:
[Mon Oct 23 16:43:32 2017]  [<ffffffff816a3db1>] dump_stack+0x19/0x1b
[Mon Oct 23 16:43:32 2017]  [<ffffffff81188810>] warn_alloc_failed+0x110/0x180
[Mon Oct 23 16:43:32 2017]  [<ffffffff8169fd8a>] __alloc_pages_slowpath+0x6b6/0x724
[Mon Oct 23 16:43:32 2017]  [<ffffffff8118cd85>] __alloc_pages_nodemask+0x405/0x420
[Mon Oct 23 16:43:32 2017]  [<ffffffff81030f8f>] dma_generic_alloc_coherent+0x8f/0x140
[Mon Oct 23 16:43:32 2017]  [<ffffffff81065c0d>] gart_alloc_coherent+0x2d/0x40
[Mon Oct 23 16:43:32 2017]  [<ffffffffc012e4d3>] mlx4_buf_direct_alloc.isra.6+0xd3/0x1a0 [mlx4_core]
[Mon Oct 23 16:43:32 2017]  [<ffffffffc012e76b>] mlx4_buf_alloc+0x1cb/0x240 [mlx4_core]
[Mon Oct 23 16:43:32 2017]  [<ffffffffc04dd85e>] create_qp_common.isra.31+0x62e/0x10d0 [mlx4_ib]
[Mon Oct 23 16:43:32 2017]  [<ffffffffc04de44e>] mlx4_ib_create_qp+0x14e/0x480 [mlx4_ib]
[Mon Oct 23 16:43:32 2017]  [<ffffffffc06df20c>] ? ipoib_cm_tx_init+0x5c/0x400 [ib_ipoib]
[Mon Oct 23 16:43:32 2017]  [<ffffffffc0639c3a>] ib_create_qp+0x7a/0x2f0 [ib_core]
[Mon Oct 23 16:43:32 2017]  [<ffffffffc06df2b3>] ipoib_cm_tx_init+0x103/0x400 [ib_ipoib]
[Mon Oct 23 16:43:32 2017]  [<ffffffffc06e1608>] ipoib_cm_tx_start+0x268/0x3f0 [ib_ipoib]
[Mon Oct 23 16:43:32 2017]  [<ffffffff810a881a>] process_one_work+0x17a/0x440
[Mon Oct 23 16:43:32 2017]  [<ffffffff810a94e6>] worker_thread+0x126/0x3c0
[Mon Oct 23 16:43:32 2017]  [<ffffffff810a93c0>] ? manage_workers.isra.24+0x2a0/0x2a0
[Mon Oct 23 16:43:32 2017]  [<ffffffff810b098f>] kthread+0xcf/0xe0
[Mon Oct 23 16:43:32 2017]  [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
[Mon Oct 23 16:43:32 2017]  [<ffffffff816b4f58>] ret_from_fork+0x58/0x90
[Mon Oct 23 16:43:32 2017]  [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
[Mon Oct 23 16:43:32 2017] Mem-Info:
[Mon Oct 23 16:43:32 2017] active_anon:2389656 inactive_anon:17792 isolated_anon:0
 active_file:14294829 inactive_file:14609973 isolated_file:0
 unevictable:24185 dirty:11846 writeback:9907 unstable:0
 slab_reclaimable:1024309 slab_unreclaimable:127961
 mapped:74895 shmem:28096 pagetables:30088 bounce:0
 free:142329 free_pcp:249 free_cma:0
[Mon Oct 23 16:43:32 2017] Node 0 DMA free:15320kB min:24kB low:28kB high:36kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15984kB managed:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:64kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes

To clarify - other volumes which use that IPoIB link do not seem to cause that, or any other problem.
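For reference, a minimal sketch of the kind of setup described above; the host names (hostA-ib, hostB-ib), brick paths, volume name and mount point are hypothetical and not taken from this thread:

# create a two-way replica volume whose bricks are reached via the IPoIB addresses
gluster volume create vmstore replica 2 hostA-ib:/bricks/vmstore/brick hostB-ib:/bricks/vmstore/brick
gluster volume set vmstore group virt        # optional predefined tuning profile for VM images
gluster volume start vmstore

# FUSE-mount the volume and let libvirt keep its qcow2 images on it
mount -t glusterfs hostA-ib:/vmstore /var/lib/libvirt/images/vmstore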
Mohammed Rafi K C
2017-Oct-24 04:45 UTC
[Gluster-users] problems running a vol over IPoIB, and qemu off it?
The backtrace you have provided here suggests that the issue could be with the Mellanox driver, though the question is still valid for users of IPoIB.

Regards
Rafi KC

On 10/23/2017 09:29 PM, lejeczek wrote:
> hi people
>
> I wonder if anybody has experienced problems with volumes in replica mode
> that run across IPoIB links while libvirt stores qcow images on such a
> volume?
>
> [kernel call trace and Mem-Info snipped; see the original message above]
>
> To clarify - other volumes which use that IPoIB link do not seem to
> cause that, or any other problem.
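A hedged diagnostic sketch prompted by the backtrace: the failure is a higher-order DMA-coherent allocation inside mlx4_core while IPoIB connected mode builds a QP, so checking the IPoIB mode/MTU and memory fragmentation on hostB might help narrow down whether the Mellanox driver or memory pressure is to blame. The interface name ib0 and the datagram fallback below are assumptions, not steps confirmed in this thread:

# which IPoIB mode is in use? connected mode needs larger QP buffers than datagram mode
cat /sys/class/net/ib0/mode
cat /sys/class/net/ib0/mtu

# how fragmented is memory on hostB? the trace shows a higher-order allocation failing,
# so a shortage of free high-order blocks would match the symptom
cat /proc/buddyinfo
sysctl vm.min_free_kbytes

# possible experiment (an assumption, not a confirmed fix): switch to datagram mode
# with the standard 2044-byte MTU and see whether the warning goes away
echo datagram > /sys/class/net/ib0/mode
ip link set ib0 mtu 2044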