Hello,

I'm running Samba 4.5.0 and bind-9.8.2-0.47.rc1.el6_8.1. One DC of four, the PDC, is orders of magnitude slower running /usr/local/samba/sbin/samba_dnsupdate --verbose --all-names, and while that command runs on that DC it seems to block all queries. The load average is usually under 0.5. The DC was unsafely halted, which could have corrupted something, but a dbcheck with samba-tool came back clean apart from the expected cleanup after upgrading to 4.5.0. Are there any caches or similar that I could try clearing for BIND?

Usually at least once a day memory usage climbs from the typical ~1 GB to everything the box has (8 GB physical and 10 GB swap), requiring a forceful restart, so there appears to be a memory leak as well. When memory usage is high it comes from the smbd processes, which I wouldn't expect to correlate with BIND. Perhaps, rather than a memory leak, the blocking seen with DNS queries is also blocking SMB clients, resulting in a pile-up of connections and high memory usage? The load under this condition is very high, but that is due to heavy I/O and CPU usage from swapping. I saw similar behavior with 4.4.5, although it was fine for the first couple of weeks after the upgrade.

Thanks,
Arthur

This e-mail and any attachments may contain CONFIDENTIAL information, including PROTECTED HEALTH INFORMATION. If you are not the intended recipient, any use or disclosure of this information is STRICTLY PROHIBITED; you are requested to delete this e-mail and any attachments, notify the sender immediately, and notify the Mediture Privacy Officer at privacyofficer@mediture.com.
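One way to test the connection-pile-up theory is to sample smbd memory alongside established SMB sessions over time. A minimal sketch (run repeatedly, e.g. from cron, on the affected DC; it assumes only standard procps/net-tools commands):

```shell
# Sketch: one sample of total smbd resident memory plus the count of
# established SMB connections; repeated samples show whether RSS
# growth tracks a connection pile-up rather than a leak.
rss_kb=$(ps -C smbd -o rss= | awk '{sum += $1} END {print sum + 0}')
conns=$(netstat -an | grep ':445 ' | grep -c ESTABLISHED)
printf '%s smbd_rss_kb=%s smb_established=%s\n' "$(date '+%F %T')" "$rss_kb" "$conns"
```

If the two columns rise together during the DNS stalls, a pile-up is the more likely explanation than a leak in any single process.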
I'm hoping the issue is just load balancing, but I'm not sure. I can't seem to get the traffic balanced across two DCs.

I ran this script on all Linux nodes to balance the traffic:
#!/usr/bin/perl
use strict;
use warnings;

my $primary_name_server;
my $random = int(rand(10));

open(my $resolv_conf_fh, '<', '/etc/resolv.conf')
    or die("Unable to open /etc/resolv.conf for reading: $!");
while (<$resolv_conf_fh>) {
    chomp;
    if ($_ =~ /nameserver (.*)/) {
        $primary_name_server = $1;
        last;
    }
}
close($resolv_conf_fh);

if (   !defined($primary_name_server)
    || $primary_name_server eq '192.168.168.64'
    || $primary_name_server eq '192.168.168.65') {
    open(my $resolv_conf_fh, '>', '/etc/resolv.conf')
        or die("Unable to open /etc/resolv.conf for writing: $!");
    print $resolv_conf_fh "search mediture.dom\n";
    print $resolv_conf_fh "options rotate timeout:1\n";
    if ($random >= 4) {
        print $resolv_conf_fh "nameserver 192.168.168.64\n";
        print $resolv_conf_fh "nameserver 192.168.168.65\n";
    } else {
        print $resolv_conf_fh "nameserver 192.168.168.65\n";
        print $resolv_conf_fh "nameserver 192.168.168.64\n";
    }
    close($resolv_conf_fh);

    if (-f '/usr/bin/wbinfo') {
        open(my $krb5_conf_fh, '>', '/etc/krb5.conf')
            or die("Unable to open /etc/krb5.conf for writing: $!");
        # Note the newline before the closing paren: the "{" that opens
        # the realm subsection must end its line for krb5.conf to parse.
        print $krb5_conf_fh q([logging]
 default = FILE:/var/log/krb5libs.log
 kdc = FILE:/var/log/krb5kdc.log
 admin_server = FILE:/var/log/kadmind.log
 default_realm = MEDITURE.DOM

[libdefaults]
 default_realm = MEDITURE.DOM
 dns_lookup_realm = false
 dns_lookup_kdc = false
 ticket_lifetime = 24h
 renew_lifetime = 7d
 forwardable = true
 default_keytab_name = FILE:/etc/krb5.keytab

[realms]
 MEDITURE.DOM = {
);
        if ($random >= 4) {
            print $krb5_conf_fh " kdc = dc01.mediture.dom\n";
            print $krb5_conf_fh " kdc = dc03.mediture.dom\n";
            print $krb5_conf_fh " kdc = dc02.mediture.dom\n";
            print $krb5_conf_fh " kdc = dc04.mediture.dom\n";
        } else {
            print $krb5_conf_fh " kdc = dc03.mediture.dom\n";
            print $krb5_conf_fh " kdc = dc01.mediture.dom\n";
            print $krb5_conf_fh " kdc = dc04.mediture.dom\n";
            print $krb5_conf_fh " kdc = dc02.mediture.dom\n";
        }
        print $krb5_conf_fh q( default_realm = MEDITURE.DOM
}

[domain_realm]
 mediture.dom = MEDITURE.DOM
 .mediture.dom = MEDITURE.DOM
);
        close($krb5_conf_fh);

        open(my $smb_conf_fh, '>', '/etc/samba/smb.conf')
            or die("Unable to open /etc/samba/smb.conf for writing: $!");
        print $smb_conf_fh q([global]
#--authconfig--start-line--
workgroup = MEDITURE
password server = );
        if ($random >= 4) {
            print $smb_conf_fh 'dc01.mediture.dom ';
            print $smb_conf_fh 'dc03.mediture.dom ';
            print $smb_conf_fh 'dc02.mediture.dom ';
            print $smb_conf_fh 'dc04.mediture.dom';
        } else {
            print $smb_conf_fh 'dc03.mediture.dom ';
            print $smb_conf_fh 'dc01.mediture.dom ';
            print $smb_conf_fh 'dc04.mediture.dom ';
            print $smb_conf_fh 'dc02.mediture.dom';
        }
        print $smb_conf_fh q(
realm = MEDITURE.DOM
security = ads

template homedir = /home/%U
template shell = /bin/bash

winbind use default domain = true
#--authconfig--end-line--
server string = Samba Server Version %v

# logs split per machine
log file = /var/log/samba/log.%m
# max 50KB per log file, then rotate
max log size = 50

passdb backend = tdbsam

winbind refresh tickets = yes
winbind offline logon = yes
winbind use default domain = yes
winbind nss info = rfc2307
winbind enum users = yes
winbind enum groups = yes
winbind nested groups = yes

kerberos method = secrets and keytab

idmap config *: backend = tdb
idmap config *: range = 90000001-100000000

idmap config MEDITURE: backend = ad
idmap config MEDITURE: range = 10000-49999
idmap config MEDITURE: schema mode = rfc2307
);
        close($smb_conf_fh);
    }
}
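One thing worth noting about the script itself: the split it produces isn't 50/50. `int(rand(10))` yields 0 through 9, and the `>= 4` branch matches six of those ten values, so roughly 60% of nodes end up listing dc01/dc02 first:

```shell
# int(rand(10)) produces 0..9; ">= 4" matches 4..9, six of the ten
# values, so the two orderings split roughly 60/40, not 50/50.
seq 0 9 | awk '$1 >= 4' | wc -l    # prints 6
```

Using `>= 5` instead would give an even split, though a 60/40 skew alone wouldn't explain the imbalance shown below.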
I also have AD sites set up and have manually configured SRV records to perform load balancing.
$ dig +short srv _ldap._tcp.vsc._sites.dc._msdcs.mediture.dom
0 50 389 dc02.mediture.dom.
0 25 389 dc04.mediture.dom.
0 100 389 dc01.mediture.dom.
0 100 389 dc03.mediture.dom.

$ dig +short srv _ldap._tcp.aws._sites.dc._msdcs.mediture.dom
0 25 389 dc02.mediture.dom.
0 100 389 dc04.mediture.dom.
0 50 389 dc01.mediture.dom.
0 50 389 dc03.mediture.dom.

$ dig +short srv _ldap._tcp.epo._sites.dc._msdcs.mediture.dom
0 25 389 dc04.mediture.dom.
0 100 389 DC02.mediture.dom.
0 50 389 dc01.mediture.dom.
0 50 389 dc03.mediture.dom.

$ dig +short srv _ldap._tcp.Default-First-Site-Name._sites.dc._msdcs.mediture.dom
0 100 389 dc01.mediture.dom.
0 100 389 dc03.mediture.dom.

$ dig +short srv _ldap._tcp.vsc._sites.mediture.dom
0 100 389 dc01.mediture.dom.
0 100 389 dc03.mediture.dom.
0 50 389 dc02.mediture.dom.
0 25 389 dc04.mediture.dom.

$ dig +short srv _ldap._tcp.aws._sites.mediture.dom
0 100 389 dc04.mediture.dom.
0 50 389 dc01.mediture.dom.
0 50 389 dc03.mediture.dom.
0 25 3268 dc02.mediture.dom.

$ dig +short srv _ldap._tcp.epo._sites.mediture.dom
0 25 389 dc04.mediture.dom.
0 100 389 dc02.mediture.dom.
0 50 389 dc01.mediture.dom.
0 50 389 dc03.mediture.dom.

$ dig +short srv _ldap._tcp.Default-First-Site-Name._sites.mediture.dom
0 100 389 dc04.mediture.dom.
0 100 389 dc01.mediture.dom.
0 100 389 dc02.mediture.dom.
0 100 389 dc03.mediture.dom.
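For what it's worth, RFC 2782 weights only shape selection among records of equal priority, proportionally to weight over the total. Using the weights from the vsc _msdcs output above (100, 50, 100, 25), a weight-honouring client would distribute roughly like this:

```shell
# Expected per-DC selection share at the vsc site if clients honour
# SRV weights (weights taken from the dig output: 100/50/100/25).
awk 'BEGIN {
    w["dc01"] = 100; w["dc02"] = 50; w["dc03"] = 100; w["dc04"] = 25
    for (h in w) total += w[h]
    for (h in w) printf "%s %.1f%%\n", h, 100 * w[h] / total
}' | sort
```

That works out to dc01 36.4%, dc02 18.2%, dc03 36.4%, dc04 9.1%. Note, though, that many client libraries ignore SRV weights entirely; the glibc resolver driven by resolv.conf never consults SRV records at all, so for the Linux nodes the explicit `password server` ordering in smb.conf may dominate in practice.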
I'm not seeing balanced traffic though.
[root@dc01 ~]# netstat -an | grep 445 | grep -c ESTABLISHED
164
[root@dc03 ~]# netstat -an | grep 445 | grep -c ESTABLISHED
10

[root@dc01 ~]# netstat -an | grep 88 | grep -c ESTABLISHED
20
[root@dc03 ~]# netstat -an | grep 88 | grep -c ESTABLISHED
2

[root@dc01 ~]# netstat -an | grep 389 | grep -c ESTABLISHED
175
[root@dc03 ~]# netstat -an | grep 389 | grep -c ESTABLISHED
23

[root@dc01 ~]# netstat -an | grep 636 | grep -c ESTABLISHED
3
[root@dc03 ~]# netstat -an | grep 636 | grep -c ESTABLISHED
7

[root@dc01 ~]# netstat -an | grep 53 | grep -c ESTABLISHED
42
[root@dc03 ~]# netstat -an | grep 53 | grep -c ESTABLISHED
6
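A caveat on the counts above: `grep 445` matches those digits anywhere in the line, including inside IP addresses (e.g. 192.168.44.5), so the numbers can be inflated. Matching the local-port field exactly is tighter; a sketch assuming the usual Linux netstat column layout:

```shell
# Count established sessions whose *local* port is 445, rather than
# any line that happens to contain the digits "445" somewhere.
netstat -ant | awk '$4 ~ /:445$/ && $6 == "ESTABLISHED"' | wc -l
```

The same filter with `:88$`, `:389$`, etc. gives cleaner per-service counts, though the imbalance here is large enough that the conclusion likely stands either way.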
I only have a handful of Windows instances joined to the domain at that
site, VSC, but over 100 Linux nodes.
Thanks,
Arthur
On 09/29/2016 10:16 AM, Arthur Ramsey wrote:
> [...]
I got core dumps while the issue was happening. Here are the backtraces: http://pastebin.com/N0e2fsSQ. It seems to be TDB contention?

Thanks,
Arthur

On 10/7/2016 11:12 AM, Arthur Ramsey wrote:
> [...]