An update on the situation, as we've been testing this internally all
day:
We experimented with forking the process, and the solution we came up
with was to change the script being called in dhcpd.conf.
Instead of directly calling /usr/local/bin/dhcp-dyndns.sh, dhcpd now
calls another script (dubbed asyncdyndns.sh), which runs dhcp-dyndns.sh
in the background and allows dhcpd to continue on its way. After testing
this implementation on our test network, the problem was resolved, and
the dhcp-dyndns script still runs, albeit in parallel with dhcpd now.
The script is below:
```
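#!/bin/bash
# Wrapper called by dhcpd in place of dhcp-dyndns.sh.

# Wait one second, then run the given command with its arguments.
runWithDelay() {
    sleep 1
    "$@"
}

# Background the real script so dhcpd is not blocked while it runs.
runWithDelay /usr/local/bin/dhcp-dyndns.sh "$@" &
echo "Running DynamicDNS Script"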
```
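To see why the wrapper unblocks the caller, the same pattern can be tried
in isolation. This is only a sketch: slow_job is a made-up stand-in for
dhcp-dyndns.sh (it just sleeps for two seconds), and run_async and the
marker file are names invented for the demo.

```shell
#!/bin/bash
# Demo only: slow_job stands in for /usr/local/bin/dhcp-dyndns.sh.
marker=$(mktemp)

slow_job() {
    sleep 2                       # simulate a slow DNS update
    echo "finished" > "$marker"   # record that the job completed
}

run_async() {
    # Start the command in the background and return right away.
    "$@" &
}

start=$SECONDS
run_async slow_job
elapsed=$(( SECONDS - start ))    # time the caller was actually held up
echo "wrapper returned after ${elapsed}s"

wait                              # demo only: let the background job finish
result=$(cat "$marker")
echo "job result: $result"
rm -f "$marker"
```

The call to run_async returns almost immediately, while the job still
completes in the background. The final wait is only there so the demo can
print the result; the real wrapper does not wait.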
We'll be testing on the production network tomorrow and I'll report
how the test goes!
Oh, it's also worth noting one other change I noticed: the
dhcp-dyndns.cc file generated for Kerberos is now owned and accessed
by dhcpd rather than root. I'm not sure whether that affects anything,
but the script was able to renew and generate the ticket, so it seems
to work fine.
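If the ownership change does turn out to matter, a quick check like the
following could help confirm who owns the cache file and whether the
current user can read it. This is a rough sketch: check_cache is a
hypothetical helper, it relies on GNU stat, and since the full path to
the cache isn't given here, the demo runs against a temporary file.

```shell
#!/bin/bash
# Hypothetical helper: report the owner of a file and whether the
# user running this script can read it.
check_cache() {
    local cache=$1
    if [ ! -e "$cache" ]; then
        echo "missing"
        return 1
    fi
    local owner
    owner=$(stat -c '%U' "$cache")   # GNU stat; BSD stat uses other flags
    if [ -r "$cache" ]; then
        echo "owner=$owner readable=yes"
    else
        echo "owner=$owner readable=no"
    fi
}

# Demo against a temporary file (substitute the real cache path).
demo=$(mktemp)
check_cache "$demo"
rm -f "$demo"
```

Running it as the same user dhcpd runs as gives the most meaningful
answer, since that is the user that has to read the cache.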
Regards,
Ralph
On Tue, Feb 23, 2021 at 3:39 PM ralph strebbing
<blackbirdralph at gmail.com> wrote:
> Hi All,
>
> So a couple of weeks back I was working with Rowland to diagnose and
> repair some issues with the dynamic DHCP script that allows DHCP to
> update AD DNS entries. This worked awesome until I finished adding
> the static entries and flipped the switch.
>
> Once we attempted to move the new server into production, things
> looked like they were working fine, but after about an hour we started
> getting reports of devices not working. Looking deeper, we found that
> random devices around the network weren't being issued an IP address.
> We switched things back quickly and have since been working to
> reproduce the issue.
>
> We've finally reproduced it on a replica of our live network, and
> have figured out that it's being caused by the dyndhcp.sh script that
> updates AD DNS. More precisely, after I removed the script hooks from
> /etc/dhcp/dhcpd.conf, the devices we were testing with immediately
> grabbed an address. When I ran the script manually, it took
> approximately 16 seconds to execute, which I theorize causes the
> devices to time out on the current DHCP request and send another
> one.
>
> My question boils down to this:
> Is there a way to run that script asynchronously so that the DHCP
> server itself isn't being backed up with requests the devices won't
> acknowledge by the time it can answer? If not, can the script be
> optimized?
> Also, is the time it takes to execute correlated with the sheer number
> of DNS/reverse DNS entries we have? If so, what can we do to ensure a
> scalable solution, given that our DNS entries will only continue to
> grow from here? A note on this since I started writing this post: I've
> trimmed the list of DNS entries to test the script execution time, and
> it still takes ~15 seconds to execute after trimming the dynamic
> clients. (We have 80+ static entries for devices that have multiple
> hostnames, e.g. pbx -> pbx1, but we also have pbx2 that we move the
> IP of pbx to in case of failover, etc.)
>
> Thanks for any advice, and I look forward to hearing others' thoughts
> on the matter!
>
> Regards,
> Ralph S.