Our domain controllers have run samba 4.4.3 since it was released. We
didn't upgrade because it was so stable :)
I recently decided that release was too old and upgraded to 4.8.2
(then current). Our domain controllers were crashing on 4.8.2, so I
upgraded to 4.8.3 as soon as it was released, but this has not resolved
the issue.
When the issue occurs, the DC becomes unresponsive and has to be
power cycled. The crashes have occurred approximately once a week since
the upgrade, after two years of stability. The following was in
/var/log/messages at the time of the last crash:
Jul 16 14:02:26 soda samba[1472]: [2018/07/16 14:02:26.119996, 0] ../source4/dsdb/kcc/kcc_periodic.c:693(samba_kcc_done)
Jul 16 14:02:26 soda samba[1472]: ../source4/dsdb/kcc/kcc_periodic.c:693: Failed samba_kcc - NT_STATUS_IO_TIMEOUT
Jul 16 14:02:30 soda samba[27799]: DsCrackNames: Unsupported operation requested: FFFFFFF8DsCrackNames: Unsupported operation requested: FFFFFFF8DsCrackNames: Unsupported operation requested: FFFFFFF8DsCrackNames: Unsupported operation requested: FFFFFFF8../librpc/rpc/dcerpc_util.c:264: ERROR: pad length mismatch. Calculated 44 got 0
Jul 16 14:02:41 soda winbindd[1484]: [2018/07/16 14:02:41.104516, 0] ../source3/rpc_server/rpc_ncacn_np.c:1022(rpc_pipe_open_external)
Jul 16 14:02:41 soda winbindd[1484]: Failed to bind external pipe.
Jul 16 14:02:41 soda winbindd[1484]: [2018/07/16 14:02:41.801048, 0] ../source3/winbindd/winbindd_cm.c:1847(wb_open_internal_pipe)
Jul 16 14:02:41 soda winbindd[1484]: open_internal_pipe: Could not connect to samr pipe: NT_STATUS_IO_TIMEOUT
Jul 16 14:05:18 soda samba[1474]: [2018/07/16 14:05:18.228534, 0] ../source4/dsdb/dns/dns_update.c:330(dnsupdate_nameupdate_done)
Jul 16 14:05:18 soda samba[1474]: ../source4/dsdb/dns/dns_update.c:330: Failed DNS update - with error code 110
Jul 16 14:05:18 soda samba[1474]: [2018/07/16 14:05:18.371712, 0] ../source4/dsdb/dns/dns_update.c:353(dnsupdate_spnupdate_done)
Jul 16 14:05:18 soda samba[1474]: ../source4/dsdb/dns/dns_update.c:353: Failed SPN update - with error code 110
Jul 16 14:07:01 soda kernel: possible SYN flooding on port 88. Sending cookies.
Jul 16 14:07:26 soda samba[1472]: [2018/07/16 14:07:26.452894, 0] ../source4/dsdb/kcc/kcc_periodic.c:693(samba_kcc_done)
Jul 16 14:07:26 soda samba[1472]: ../source4/dsdb/kcc/kcc_periodic.c:693: Failed samba_kcc - NT_STATUS_IO_TIMEOUT
Jul 16 14:09:47 soda kernel: possible SYN flooding on port 88. Sending cookies.
Jul 16 14:10:12 soda winbindd[1484]: [2018/07/16 14:10:12.545854, 0] ../source3/rpc_server/rpc_ncacn_np.c:1022(rpc_pipe_open_external)
Jul 16 14:10:13 soda winbindd[1484]: Failed to bind external pipe.
Jul 16 14:10:13 soda winbindd[1484]: [2018/07/16 14:10:13.483540, 0] ../source3/winbindd/winbindd_cm.c:1847(wb_open_internal_pipe)
Jul 16 14:10:13 soda winbindd[1484]: open_internal_pipe: Could not connect to samr pipe: NT_STATUS_IO_TIMEOUT
Jul 16 14:10:30 soda samba[1460]: DsCrackNames: Unsupported operation requested: FFFFFFF8DsCrackNames: Unsupported operation requested: FFFFFFF8DsCrackNames: Unsupported operation requested: FFFFFFF8DsCrackNames: Unsupported operation requested: FFFFFFF8IRPC callback failed for DsReplicaSync - NT_STATUS_IO_TIMEOUT
Jul 16 14:12:30 soda samba[1472]: [2018/07/16 14:12:29.374518, 0] ../source4/dsdb/kcc/kcc_periodic.c:693(samba_kcc_done)
Jul 16 14:12:30 soda samba[1472]: ../source4/dsdb/kcc/kcc_periodic.c:693: Failed samba_kcc - NT_STATUS_IO_TIMEOUT
Jul 16 14:14:35 soda kernel: samba invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0, oom_score_adj=0
Jul 16 14:14:36 soda kernel: samba cpuset=/ mems_allowed=0
Jul 16 14:14:36 soda kernel: Pid: 27742, comm: samba Not tainted 2.6.32-696.30.1.el6.x86_64 #1
Jul 16 14:14:36 soda kernel: Call Trace:
Jul 16 14:14:36 soda kernel: [<ffffffff81134630>] ? dump_header+0x90/0x1b0
Jul 16 14:14:36 soda kernel: [<ffffffff81240312>] ? security_real_capable_noaudit+0x42/0x70
Jul 16 14:14:36 soda kernel: [<ffffffff81134ab2>] ? oom_kill_process+0x82/0x2a0
Jul 16 14:14:36 soda kernel: [<ffffffff811349f1>] ? select_bad_process+0xe1/0x120
Jul 16 14:14:36 soda kernel: [<ffffffff81134ef0>] ? out_of_memory+0x220/0x3c0
Jul 16 14:14:36 soda kernel: [<ffffffff811418e1>] ? __alloc_pages_nodemask+0x941/0x960
Jul 16 14:14:36 soda kernel: [<ffffffff81060d0c>] ? __wake_up_common+0x5c/0x90
Jul 16 14:14:36 soda kernel: [<ffffffff8117aefa>] ? alloc_pages_vma+0x9a/0x150
Jul 16 14:14:36 soda kernel: [<ffffffff8116e2b2>] ? read_swap_cache_async+0xf2/0x160
Jul 16 14:14:36 soda kernel: [<ffffffff8116ee09>] ? valid_swaphandles+0x69/0x160
Jul 16 14:14:36 soda kernel: [<ffffffff8116e3a7>] ? swapin_readahead+0x87/0xc0
Jul 16 14:14:36 soda kernel: [<ffffffff8115d175>] ? handle_pte_fault+0x6c5/0xac0
Jul 16 14:14:36 soda kernel: [<ffffffff8117167d>] ? free_swap_and_cache+0x5d/0x120
Jul 16 14:14:36 soda kernel: [<ffffffff8115d81a>] ? handle_mm_fault+0x2aa/0x3f0
Jul 16 14:14:36 soda kernel: [<ffffffff81053671>] ? __do_page_fault+0x141/0x500
Jul 16 14:14:36 soda kernel: [<ffffffff81067b50>] ? __dequeue_entity+0x30/0x50
Jul 16 14:14:36 soda kernel: [<ffffffff8155f19e>] ? apic_timer_interrupt+0xe/0x20
Jul 16 14:14:36 soda kernel: [<ffffffff8155f19e>] ? apic_timer_interrupt+0xe/0x20
Jul 16 14:14:36 soda kernel: [<ffffffff8155a2be>] ? do_page_fault+0x3e/0xa0
Jul 16 14:14:36 soda kernel: [<ffffffff81557265>] ? page_fault+0x25/0x30
After this, the kernel keeps killing processes until the machine is
power cycled. Nothing relevant is logged to log.samba or
log.smbd.
This is a CentOS 6 VM on ESXi named soda. The issue occurred once on another
CentOS 6 domain controller with 4.8.2, but it has not recurred on that DC
since the move to 4.8.3. There are three additional domain controllers
(two CentOS 6, one CentOS 7) that have not had the issue at all under
4.8.2 or 4.8.3. The DC where it is still happening holds the FSMO roles
and is the most heavily used of the VMs, so the additional load may
expose the issue more than on the other DCs.
In normal use the VM has 1.5GB of swap free.
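Since log.samba and log.smbd show nothing before the OOM killer fires, one way to find out which process is consuming the memory is to sample the resident set size (VmRSS from /proc/<pid>/status) of the Samba processes at a fixed interval and append the result to a file. The following is a minimal sketch, assuming Linux /proc and any Python 2.6+ or 3 interpreter. The watch list (samba, smbd, winbindd, taken from the process names in the log above), the script name and the interval are illustrative assumptions, not anything provided by Samba:

```python
import os
import sys
import time

# Hypothetical watch list: the process names seen in the log excerpt above.
WATCHED = set(["samba", "smbd", "winbindd"])


def parse_status(text):
    """Return (name, rss_kb) parsed from the text of /proc/<pid>/status."""
    name, rss_kb = None, 0
    for line in text.splitlines():
        if line.startswith("Name:"):
            name = line.split(None, 1)[1].strip()
        elif line.startswith("VmRSS:"):
            # Line looks like "VmRSS:    123456 kB"; kernel threads omit it.
            rss_kb = int(line.split()[1])
    return name, rss_kb


def snapshot():
    """Return (pid, name, rss_kb) for watched processes, largest RSS first."""
    rows = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open("/proc/%s/status" % entry) as f:
                name, rss_kb = parse_status(f.read())
        except (IOError, OSError):
            continue  # the process exited between listdir() and open()
        if name in WATCHED:
            rows.append((int(entry), name, rss_kb))
    rows.sort(key=lambda r: r[2], reverse=True)
    return rows


def main(interval):
    while True:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        for pid, name, rss_kb in snapshot()[:5]:  # five largest only
            print("%s pid=%d %s rss=%dkB" % (stamp, pid, name, rss_kb))
        sys.stdout.flush()  # push each sample to the output file immediately
        time.sleep(interval)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(int(sys.argv[1]))  # sampling interval in seconds
    else:
        sys.stderr.write("usage: %s SECONDS\n" % sys.argv[0])
```

Run it as, for example, `nohup python samba_rss.py 60 >> /var/log/samba_rss.log &` (both file names are hypothetical). The last samples written before a crash should show whether a single PID's RSS climbs steadily, and its PID can then be matched against the samba[...] entries in /var/log/messages.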
Any advice on finding the root cause of the issue and resolving it would
be appreciated.
William