GomoR
2021-Feb-05 08:11 UTC
Suspected mbuf leak with Nginx + sendfile + TLS in 12.2-STABLE
On 2021-02-04 19:33, John Baldwin wrote:
> None of the sendfile or KTLS changes from Netflix are in 12, they are only
> in 13 and later.

I thought about that possibility, thank you for the clarification.

>> Don't transmit mbufs that aren't yet ready on TOE sockets.
>> This includes mbufs waiting for data from sendfile() I/O requests, or
>> mbufs awaiting encryption for KTLS.
>> https://github.com/freebsd/freebsd-src/commit/14c77f30b201bf76119d59678e72051c093333c2
>
> This patch only applies to Chelsio T5/T6 NICs when using TOE (TCP offload)
> and doesn't affect freeing mbufs, it just fixes a race when the NIC could
> potentially send random garbage if it sends the mbuf before the scheduled
> disk I/O to populate it with data from disk has completed.

Understood.

>> NIC is:
>> ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver>
>>
>> What can we do to help you find the root cause?
>
> The first step I would do if possible would be to bisect between the last
> known working version and the version that is known to be broken to
> determine which commit introduced the problem. One thing that could help
> here is to see if you can reproduce the problem using a 12.2 kernel on a
> 12.1 world + ports. If you can, then you can limit your bisecting to just
> building new kernels which will make that process quicker.

Thank you for the tip, I'll try that path and let you know.

> You might also see if using a different NIC shows the same problem. If
> not, then it might point to a regression in the NIC driver (or perhaps in
> iflib as ix uses iflib I believe).

Unfortunately, that is not a possibility here.

I did some other tests and found where the problem arises. We use the proxy_pass directive in Nginx: traffic comes in through one public interface (ix0) and is proxied through a second interface (ix1) towards a remote machine. Changing the Nginx configuration so that everything goes through ix0 only does not cause the issue.

So it has something to do with passing packets between two NICs. I'll keep you posted.

Regards,
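
The bisection suggested in the quoted text can be pictured as a binary search over an ordered list of kernel revisions. The sketch below is only an illustration: the revision labels and the `is_broken` callback are hypothetical stand-ins for "build this kernel, install it, and check whether mbufs leak", and it assumes the first revision is known good, the last is known bad, and the bug stays present once introduced.

```python
from typing import Callable, Sequence


def first_bad(revisions: Sequence[str], is_broken: Callable[[str], bool]) -> str:
    """Return the earliest revision for which is_broken() is true.

    Assumes revisions[0] is known good, revisions[-1] is known bad, and
    every revision after the first bad one is also bad.
    """
    good, bad = 0, len(revisions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_broken(revisions[mid]):
            bad = mid   # regression is at mid or earlier
        else:
            good = mid  # regression is after mid
    return revisions[bad]


# Hypothetical data: ten placeholder revisions, regression introduced at "r6".
revs = [f"r{i}" for i in range(10)]
tested = []


def fake_test(rev: str) -> bool:
    """Stand-in for building and testing one kernel (not a real check)."""
    tested.append(rev)
    return int(rev[1:]) >= 6


culprit = first_bad(revs, fake_test)
print(culprit, "found after", len(tested), "kernel builds")
```

Each iteration halves the remaining range, which is why restricting the search to kernel-only builds (rather than full world rebuilds) shortens the whole process.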
GomoR
2021-Feb-05 10:54 UTC
Suspected mbuf leak with Nginx + sendfile + TLS in 12.2-STABLE
On 2021-02-05 09:11, GomoR wrote:
>> The first step I would do if possible would be to bisect between the last
>> known working version and the version that is known to be broken to
>> determine which commit introduced the problem. One thing that could help
>> here is to see if you can reproduce the problem using a 12.2 kernel on a
>> 12.1 world + ports. If you can, then you can limit your bisecting to just
>> building new kernels which will make that process quicker.

We reinstalled our system from scratch with FreeBSD 12.1-RELEASE, then installed just enough of our software stack to reproduce the issue. There is no problem with a stock 12.1-RELEASE kernel, but the problem arises after an installkernel of the latest 12.2-STABLE.

We then turned off all our customizations, including some specific sysctl.conf values. The bug was no longer triggered. After narrowing down our sysctl values, we identified the faulty one:

kern.ipc.maxsockbuf=157286400

This value is 75 times the default (2097152). Restoring the default fixes the issue. After some tests, the bug appears to be triggered starting at around 64 times the default value (134217728).

There was no issue with this setting in 12.1-RELEASE, but there is in 12.2-RELEASE.

Do you have any insight into why it causes these mbuf problems? In the meantime we have our solution, but we are willing to help determine whether this is a kernel bug or whether it is simply a bad idea to set maxsockbuf to such a high value.

Best regards,
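
For reference, the figures quoted above can be checked with a few lines of arithmetic. This is only a sketch of the numbers in the post: the constant names are made up for illustration, and the remark that the observed threshold equals 2^27 is an arithmetic observation, not a diagnosis of the underlying cause.

```python
# Values taken from the post; the names are illustrative only.
DEFAULT_MAXSOCKBUF = 2_097_152     # default kern.ipc.maxsockbuf quoted in the post
CONFIGURED = 157_286_400           # value from the poster's sysctl.conf
OBSERVED_THRESHOLD = 134_217_728   # approximate value at which the bug starts

MIB = 1024 * 1024


def describe(value: int) -> str:
    """Express a maxsockbuf value in MiB and as a multiple of the default."""
    multiple = value / DEFAULT_MAXSOCKBUF
    return f"{value} bytes = {value // MIB} MiB = {multiple:g} x default"


for label, value in [
    ("default", DEFAULT_MAXSOCKBUF),
    ("configured", CONFIGURED),
    ("observed threshold", OBSERVED_THRESHOLD),
]:
    print(f"{label}: {describe(value)}")

# The observed threshold happens to be an exact power of two.
threshold_is_power_of_two = OBSERVED_THRESHOLD == 2 ** 27
print("threshold == 2**27:", threshold_is_power_of_two)
```

In other words, the configured value is 150 MiB, and the problem seems to begin near 128 MiB.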