I ran volume start force, and the self-heal daemon is now up on the node
that was down.
But bitrot has now triggered a crawl on all nodes. Why is it crawling the
disks again if the process was already running?
[output from bitd.log]
[2017-04-13 06:01:23.930089] I [glusterfsd-mgmt.c:1778:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2017-04-26 06:51:46.998935] I [MSGID: 100030] [glusterfsd.c:2460:main] 0-/usr/local/sbin/glusterfs: Started running /usr/local/sbin/glusterfs version 3.10.1 (args: /usr/local/sbin/glusterfs -s localhost --volfile-id gluster/bitd -p /var/lib/glusterd/bitd/run/bitd.pid -l /var/log/glusterfs/bitd.log -S /var/run/gluster/02f1dd346d47b9006f9bf64e347338fd.socket --global-timer-wheel)
[2017-04-26 06:51:47.002732] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
On Tue, Apr 25, 2017 at 11:01 PM, Amudhan P <amudhan83 at gmail.com>
wrote:
> Yes, I have enabled the bitrot process, and it is currently running the
> signer process on some nodes.
>
> Disabling and re-enabling bitrot doesn't make a difference; it will start
> the crawl process again, right?
>
>
> On Tuesday, April 25, 2017, Atin Mukherjee <amukherj at redhat.com>
wrote:
> >
> >
> > On Tue, Apr 25, 2017 at 9:22 PM, Amudhan P <amudhan83 at
gmail.com> wrote:
> >>
> >> Hi Pranith,
> >> If I restart the glusterd service on that node alone, will it work? I
> >> ask because I suspect that doing volume start force will trigger the
> >> bitrot process to crawl the disks on all nodes.
> >
> > Have you enabled bitrot? If not, the process will not exist. As a
> > workaround, you can always disable this option before executing volume
> > start force. Please note that volume start force doesn't affect any
> > running processes.
> >
> >>
> >> Yes, rebalance fix-layout is in progress.
> >> Regards,
> >> Amudhan
> >>
> >> On Tue, Apr 25, 2017 at 9:15 PM, Pranith Kumar Karampuri <
> pkarampu at redhat.com> wrote:
> >>>
> >>> You can restart the process using:
> >>> gluster volume start <volname> force
> >>>
> >>> Did shd on this node heal a lot of data? Based on the kind of memory
> >>> usage it showed, it seems like there is a leak.
> >>>
> >>>
> >>> Sunil,
> >>> Could you check whether there are any leaks in this particular
> >>> version that we might have missed in our testing?
> >>>
> >>> On Tue, Apr 25, 2017 at 8:37 PM, Amudhan P <amudhan83 at
gmail.com>
> wrote:
> >>>>
> >>>> Hi,
> >>>> On one of my nodes, the glustershd process was killed due to OOM,
> >>>> and this happened on only one node out of a 40-node cluster.
> >>>> The node is running Ubuntu 16.04.2.
> >>>> dmesg output:
> >>>> [Mon Apr 24 17:21:38 2017] nrpe invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0
> >>>> [Mon Apr 24 17:21:38 2017] nrpe cpuset=/ mems_allowed=0
> >>>> [Mon Apr 24 17:21:38 2017] CPU: 0 PID: 12626 Comm: nrpe Not tainted 4.4.0-62-generic #83-Ubuntu
> >>>> [Mon Apr 24 17:21:38 2017]  0000000000000286 00000000fc26b170 ffff88048bf27af0 ffffffff813f7c63
> >>>> [Mon Apr 24 17:21:38 2017]  ffff88048bf27cc8 ffff88082a663c00 ffff88048bf27b60 ffffffff8120ad4e
> >>>> [Mon Apr 24 17:21:38 2017]  ffff88087781a870 ffff88087781a860 ffffea0011285a80 0000000100000001
> >>>> [Mon Apr 24 17:21:38 2017] Call Trace:
> >>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff813f7c63>] dump_stack+0x63/0x90
> >>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff8120ad4e>] dump_header+0x5a/0x1c5
> >>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff811926c2>] oom_kill_process+0x202/0x3c0
> >>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff81192ae9>] out_of_memory+0x219/0x460
> >>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff81198a5d>] __alloc_pages_slowpath.constprop.88+0x8fd/0xa70
> >>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff81198e56>] __alloc_pages_nodemask+0x286/0x2a0
> >>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff81198f0b>] alloc_kmem_pages_node+0x4b/0xc0
> >>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff8107ea5e>] copy_process+0x1be/0x1b70
> >>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff8122d013>] ? __fd_install+0x33/0xe0
> >>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff81713d01>] ? release_sock+0x111/0x160
> >>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff810805a0>] _do_fork+0x80/0x360
> >>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff8122429c>] ? SyS_select+0xcc/0x110
> >>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff81080929>] SyS_clone+0x19/0x20
> >>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff818385f2>] entry_SYSCALL_64_fastpath+0x16/0x71
> >>>> [Mon Apr 24 17:21:38 2017] Mem-Info:
> >>>> [Mon Apr 24 17:21:38 2017] active_anon:553952 inactive_anon:206987 isolated_anon:0
> >>>>  active_file:3410764 inactive_file:3460179 isolated_file:0
> >>>>  unevictable:4914 dirty:212868 writeback:0 unstable:0
> >>>>  slab_reclaimable:386621 slab_unreclaimable:31829
> >>>>  mapped:6112 shmem:211 pagetables:6178 bounce:0
> >>>>  free:82623 free_pcp:213 free_cma:0
> >>>> [Mon Apr 24 17:21:38 2017] Node 0 DMA free:15880kB min:32kB low:40kB high:48kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15964kB managed:15880kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
> >>>> [Mon Apr 24 17:21:38 2017] lowmem_reserve[]: 0 1868 31944 31944 31944
> >>>> [Mon Apr 24 17:21:38 2017] Node 0 DMA32 free:133096kB min:3948kB low:4932kB high:5920kB active_anon:170764kB inactive_anon:206296kB active_file:394236kB inactive_file:525288kB unevictable:980kB isolated(anon):0kB isolated(file):0kB present:2033596kB managed:1952976kB mlocked:980kB dirty:1552kB writeback:0kB mapped:3904kB shmem:724kB slab_reclaimable:502176kB slab_unreclaimable:8916kB kernel_stack:1952kB pagetables:1408kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> >>>> [Mon Apr 24 17:21:38 2017] lowmem_reserve[]: 0 0 30076 30076 30076
> >>>> [Mon Apr 24 17:21:38 2017] Node 0 Normal free:181516kB min:63600kB low:79500kB high:95400kB active_anon:2045044kB inactive_anon:621652kB active_file:13248820kB inactive_file:13315428kB unevictable:18676kB isolated(anon):0kB isolated(file):0kB present:31322112kB managed:30798036kB mlocked:18676kB dirty:849920kB writeback:0kB mapped:20544kB shmem:120kB slab_reclaimable:1044308kB slab_unreclaimable:118400kB kernel_stack:33792kB pagetables:23304kB unstable:0kB bounce:0kB free_pcp:852kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> >>>> [Mon Apr 24 17:21:38 2017] lowmem_reserve[]: 0 0 0 0 0
> >>>> [Mon Apr 24 17:21:38 2017] Node 0 DMA: 0*4kB 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15880kB
> >>>> [Mon Apr 24 17:21:38 2017] Node 0 DMA32: 18416*4kB (UME) 7480*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 133504kB
> >>>> [Mon Apr 24 17:21:38 2017] Node 0 Normal: 44972*4kB (UMEH) 13*8kB (EH) 13*16kB (H) 13*32kB (H) 8*64kB (H) 2*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 181384kB
> >>>> [Mon Apr 24 17:21:38 2017] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
> >>>> [Mon Apr 24 17:21:38 2017] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> >>>> [Mon Apr 24 17:21:38 2017] 6878703 total pagecache pages
> >>>> [Mon Apr 24 17:21:38 2017] 2484 pages in swap cache
> >>>> [Mon Apr 24 17:21:38 2017] Swap cache stats: add 3533870, delete 3531386, find 3743168/4627884
> >>>> [Mon Apr 24 17:21:38 2017] Free swap = 14976740kB
> >>>> [Mon Apr 24 17:21:38 2017] Total swap = 15623164kB
> >>>> [Mon Apr 24 17:21:38 2017] 8342918 pages RAM
> >>>> [Mon Apr 24 17:21:38 2017] 0 pages HighMem/MovableOnly
> >>>> [Mon Apr 24 17:21:38 2017] 151195 pages reserved
> >>>> [Mon Apr 24 17:21:38 2017] 0 pages cma reserved
> >>>> [Mon Apr 24 17:21:38 2017] 0 pages hwpoisoned
> >>>> [Mon Apr 24 17:21:38 2017] [ pid ]   uid  tgid  total_vm     rss nr_ptes nr_pmds swapents oom_score_adj name
> >>>> [Mon Apr 24 17:21:38 2017] [  566]     0   566     15064     460      33       3     1108             0 systemd-journal
> >>>> [Mon Apr 24 17:21:38 2017] [  602]     0   602     23693     182      16       3        0             0 lvmetad
> >>>> [Mon Apr 24 17:21:38 2017] [  613]     0   613     11241     589      21       3      264         -1000 systemd-udevd
> >>>> [Mon Apr 24 17:21:38 2017] [ 1381]   100  1381     25081     440      19       3       25             0 systemd-timesyn
> >>>> [Mon Apr 24 17:21:38 2017] [ 1447]     0  1447      1100     307       7       3        0             0 acpid
> >>>> [Mon Apr 24 17:21:38 2017] [ 1449]     0  1449      7252     374      21       3       47             0 cron
> >>>> [Mon Apr 24 17:21:38 2017] [ 1451]     0  1451     77253     994      19       3       10             0 lxcfs
> >>>> [Mon Apr 24 17:21:38 2017] [ 1483]     0  1483      6511     413      18       3       42             0 atd
> >>>> [Mon Apr 24 17:21:38 2017] [ 1505]     0  1505      7157     286      18       3       36             0 systemd-logind
> >>>> [Mon Apr 24 17:21:38 2017] [ 1508]   104  1508     64099     376      27       4      712             0 rsyslogd
> >>>> [Mon Apr 24 17:21:38 2017] [ 1510]   107  1510     10723     497      25       3       45          -900 dbus-daemon
> >>>> [Mon Apr 24 17:21:38 2017] [ 1521]     0  1521     68970     178      38       3      170             0 accounts-daemon
> >>>> [Mon Apr 24 17:21:38 2017] [ 1526]     0  1526      6548     785      16       3       63             0 smartd
> >>>> [Mon Apr 24 17:21:38 2017] [ 1528]     0  1528     54412     146      31       5     1806             0 snapd
> >>>> [Mon Apr 24 17:21:38 2017] [ 1578]     0  1578      3416     335      11       3       24             0 mdadm
> >>>> [Mon Apr 24 17:21:38 2017] [ 1595]     0  1595     16380     470      35       3      157         -1000 sshd
> >>>> [Mon Apr 24 17:21:38 2017] [ 1610]     0  1610     69295     303      40       4       57             0 polkitd
> >>>> [Mon Apr 24 17:21:38 2017] [ 1618]     0  1618      1306      31       8       3        0             0 iscsid
> >>>> [Mon Apr 24 17:21:38 2017] [ 1619]     0  1619      1431     877       8       3        0           -17 iscsid
> >>>> [Mon Apr 24 17:21:38 2017] [ 1624]     0  1624    126363    8027     122       4    22441             0 glusterd
> >>>> [Mon Apr 24 17:21:38 2017] [ 1688]     0  1688      4884     430      15       3       46             0 irqbalance
> >>>> [Mon Apr 24 17:21:38 2017] [ 1699]     0  1699      3985     348      13       3        0             0 agetty
> >>>> [Mon Apr 24 17:21:38 2017] [ 7001]     0  7001    500631   27874     145       5     3356             0 glusterfsd
> >>>> [Mon Apr 24 17:21:38 2017] [ 8136]     0  8136    500631   28760     141       5     2390             0 glusterfsd
> >>>> [Mon Apr 24 17:21:38 2017] [ 9280]     0  9280    533529   27752     135       5     3200             0 glusterfsd
> >>>> [Mon Apr 24 17:21:38 2017] [12626]   111 12626      5991     420      16       3      113             0 nrpe
> >>>> [Mon Apr 24 17:21:38 2017] [14342]     0 14342    533529   28377     135       5     2176             0 glusterfsd
> >>>> [Mon Apr 24 17:21:38 2017] [14361]     0 14361    534063   29190     136       5     1972             0 glusterfsd
> >>>> [Mon Apr 24 17:21:38 2017] [14380]     0 14380    533529   28104     136       6     2437             0 glusterfsd
> >>>> [Mon Apr 24 17:21:38 2017] [14399]     0 14399    533529   27552     131       5     2808             0 glusterfsd
> >>>> [Mon Apr 24 17:21:38 2017] [14418]     0 14418    533529   29588     138       5     2697             0 glusterfsd
> >>>> [Mon Apr 24 17:21:38 2017] [14437]     0 14437    517080   28671     146       5     2170             0 glusterfsd
> >>>> [Mon Apr 24 17:21:38 2017] [14456]     0 14456    533529   28083     139       5     3359             0 glusterfsd
> >>>> [Mon Apr 24 17:21:38 2017] [14475]     0 14475    533529   28054     134       5     2954             0 glusterfsd
> >>>> [Mon Apr 24 17:21:38 2017] [14494]     0 14494    533529   28594     135       5     2311             0 glusterfsd
> >>>> [Mon Apr 24 17:21:38 2017] [14513]     0 14513    533529   28911     138       5     2833             0 glusterfsd
> >>>> [Mon Apr 24 17:21:38 2017] [14532]     0 14532    533529   28259     134       6     3145             0 glusterfsd
> >>>> [Mon Apr 24 17:21:38 2017] [14551]     0 14551    533529   27875     138       5     2267             0 glusterfsd
> >>>> [Mon Apr 24 17:21:38 2017] [14570]     0 14570    484716   28247     142       5     2875             0 glusterfsd
> >>>> [Mon Apr 24 17:21:38 2017] [27646]     0 27646   3697561  202086    2830      17    16528             0 glusterfs
> >>>> [Mon Apr 24 17:21:38 2017] [27655]     0 27655    787371   29588     197       6    25472             0 glusterfs
> >>>> [Mon Apr 24 17:21:38 2017] [27665]     0 27665    689585     605     108       6     7008             0 glusterfs
> >>>> [Mon Apr 24 17:21:38 2017] [29878]     0 29878    193833   36054     241       4    41182             0 glusterfs
> >>>> [Mon Apr 24 17:21:38 2017] Out of memory: Kill process 27646 (glusterfs) score 17 or sacrifice child
> >>>> [Mon Apr 24 17:21:38 2017] Killed process 27646 (glusterfs) total-vm:14790244kB, anon-rss:795040kB, file-rss:13304kB
> >>>> /var/log/glusterfs/glusterd.log
> >>>> [2017-04-24 11:53:51.359603] I [MSGID: 106006] [glusterd-svc-mgmt.c:327:glusterd_svc_common_rpc_notify] 0-management: glustershd has disconnected from glusterd.
> >>>> What could have gone wrong?
> >>>> Regards,
> >>>> Amudhan
> >>>>
> >>>> _______________________________________________
> >>>> Gluster-users mailing list
> >>>> Gluster-users at gluster.org
> >>>> http://lists.gluster.org/mailman/listinfo/gluster-users
> >>>
> >>>
> >>>
> >>> --
> >>> Pranith
> >>
> >>
> >
> >
>
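A quick consistency check on the OOM report quoted above: in the per-process table, total_vm and rss are counted in 4 kB pages, while the kill summary line is in kB, and the figures for the killed PID 27646 line up exactly. A minimal Python sketch (the variable names are mine; the numbers are copied from the dmesg output):

```python
# Figures from the dmesg process table row for PID 27646 (4 kB pages).
PAGE_KB = 4                # x86-64 base page size in kB
total_vm_pages = 3697561   # total_vm column
rss_pages = 202086         # rss column

# Figures from the "Killed process 27646" summary line (kB).
total_vm_kb = 14790244
anon_rss_kb = 795040
file_rss_kb = 13304

# total_vm in pages converts exactly to the reported total-vm in kB ...
assert total_vm_pages * PAGE_KB == total_vm_kb

# ... and rss in pages equals anon-rss + file-rss from the kill message.
assert rss_pages * PAGE_KB == anon_rss_kb + file_rss_kb

print(rss_pages * PAGE_KB // 1024, "MB resident at kill time")
# -> 789 MB resident at kill time
```

Roughly 790 MB resident, almost all of it anonymous memory, plus 16528 pages already swapped out, against a ~14 GB virtual address space; that growth pattern is consistent with the suspicion above that the process (glustershd, per the glusterd.log excerpt) was leaking memory rather than caching file data.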