thr3ads.net - freebsd stable - Suspected mbuf leak with Nginx + sendfile + TLS in 12.2-STABLE [Feb 2021]

GomoR

2021-Feb-04 16:08 UTC

Suspected mbuf leak with Nginx + sendfile + TLS in 12.2-STABLE

Dear FreeBSD community,

we are encountering a DoS condition on our production machines.
Our use case is an Nginx reverse proxy serving large files via HTTPS.
This problem arose when switching kernel and userland from 12.1-RELEASE
to 12.2-RELEASE. Ports were not upgraded (at first).

Each time a user downloads a file, the mbuf and mbuf_cluster counts rise
towards the maximum limit within seconds. The values reported by
'netstat -m' are as follows:

Normal situation:

mbuf:                   256, 26031105,   16767,    5974,428087938,   0,   0
mbuf_cluster:          2048, 8135232,   18408,    2704,101644203,   0,   0

Warning situation:

mbuf:                   256, 26031105, 2981516,  151205,1109483561,   0,   0
mbuf_cluster:          2048, 8135232, 2983155,    4201,319714617,   0,   0
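
To make the two snapshots above easier to compare, here is a minimal Python sketch that parses one such line and computes how much of the zone limit is in use. It assumes the columns are size, limit, used, free, requests, fail and sleep (the layout printed by vmstat -z); that column order is an assumption, not something stated in the post, and the function names are made up for illustration.

```python
def parse_zone_line(line):
    """Parse one 'name: size, limit, used, free, ...' line into a dict.

    Assumed column order: size, limit, used, free, requests, fail, sleep.
    """
    name, _, rest = line.partition(":")
    fields = [int(f) for f in rest.split(",")]
    size, limit, used, free = fields[:4]
    return {"name": name.strip(), "size": size, "limit": limit,
            "used": used, "free": free}


def utilization(zone):
    """Fraction of the zone limit currently in use (0.0 if there is no limit)."""
    return zone["used"] / zone["limit"] if zone["limit"] else 0.0


# The two mbuf_cluster snapshots quoted in the post.
normal = parse_zone_line(
    "mbuf_cluster:          2048, 8135232,   18408,    2704,101644203,   0,   0")
warning = parse_zone_line(
    "mbuf_cluster:          2048, 8135232, 2983155,    4201,319714617,   0,   0")

for label, zone in (("normal", normal), ("warning", warning)):
    print(f"{label}: {zone['name']} used={zone['used']} "
          f"limit={zone['limit']} ({utilization(zone):.1%})")
```

Sampling the same line every second during a download and watching the used column would show how fast the count climbs towards the limit.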

We found a patch related to sendfile + KTLS + mbufs at the link below
and updated to -STABLE to apply it:

Don't transmit mbufs that aren't yet ready on TOE sockets.
This includes mbufs waiting for data from sendfile() I/O requests, or
mbufs awaiting encryption for KTLS.
https://github.com/freebsd/freebsd-src/commit/14c77f30b201bf76119d59678e72051c093333c2

Unfortunately for us, applying it didn't solve the issue.

When we stop the download early, the mbufs are freed. Past a certain
threshold, however, we must reboot the server: it still answers ping,
but it is no longer possible to connect over SSH, for instance.

We also tried setting the following loader.conf values, none of which
fixed the problem:

hw.ix.enable_msix=0
hw.pci.enable_msix=0
hw.pci.enable_msi=0
net.inet.tcp.tso=0
hw.ix.flow_control=0

We also updated Nginx & OpenSSL to the latest versions and tried
compiling Nginx against the OpenSSL library shipped with FreeBSD.
Neither changed anything.

Versions:

openssl-1.1.1i,1
nginx-1.18.0_45,2

# ldd /usr/local/sbin/nginx
/usr/local/sbin/nginx:
         libcrypt.so.5 => /lib/libcrypt.so.5 (0x800323000)
         libpcre.so.1 => /usr/local/lib/libpcre.so.1 (0x800344000)
         libssl.so.11 => /usr/local/lib/libssl.so.11 (0x8003e7000)
         libcrypto.so.11 => /usr/local/lib/libcrypto.so.11 (0x80047e000)
         libz.so.6 => /lib/libz.so.6 (0x800772000)
         libc.so.7 => /lib/libc.so.7 (0x80078e000)
         libthr.so.3 => /lib/libthr.so.3 (0x800b84000)

NIC is:
ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver>

What can we do to help you find the root cause?

Best regards,

P.S.: adding jhb@ in Cc at bapt@'s suggestion

John Baldwin

2021-Feb-04 18:33 UTC

Suspected mbuf leak with Nginx + sendfile + TLS in 12.2-STABLE

On 2/4/21 8:08 AM, GomoR wrote:
> Dear FreeBSD community,
> 
> we are encountering a DoS condition on our production machines.
> Our use case is an Nginx reverse proxy serving large files via HTTPS.
> This problem arose when switching kernel and userland from 12.1-RELEASE
> to 12.2-RELEASE. Ports were not upgraded (at first).
> 
> Each time a user downloads a file, the mbuf and mbuf_cluster counts rise
> towards the maximum limit within seconds. The values reported by
> 'netstat -m' are as follows:
> 
> Normal situation:
> 
> mbuf:                   256, 26031105,   16767,    5974,428087938,   0,   0
> mbuf_cluster:          2048, 8135232,   18408,    2704,101644203,   0,   0
> 
> Warning situation:
> 
> mbuf:                   256, 26031105, 2981516,  151205,1109483561,   0,   0
> mbuf_cluster:          2048, 8135232, 2983155,    4201,319714617,   0,   0
> 
> We found a patch related to sendfile + KTLS + mbufs at the link below
> and updated to -STABLE to apply it:

None of the sendfile or KTLS changes from Netflix are in 12; they are only
in 13 and later.

> Don't transmit mbufs that aren't yet ready on TOE sockets.
> This includes mbufs waiting for data from sendfile() I/O requests, or
> mbufs awaiting encryption for KTLS.
> https://github.com/freebsd/freebsd-src/commit/14c77f30b201bf76119d59678e72051c093333c2

This patch only applies to Chelsio T5/T6 NICs when using TOE (TCP offload)
and doesn't affect freeing mbufs. It just fixes a race in which the NIC
could send random garbage if it transmits the mbuf before the disk I/O
scheduled to populate it with data has completed.

> NIC is:
> ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver>
> 
> What can we do to help you find the root cause?

The first step I would take, if possible, is to bisect between the last
known working version and the version that is known to be broken to
determine which commit introduced the problem.  One thing that could help
here is to see if you can reproduce the problem using a 12.2 kernel on a
12.1 world + ports.  If you can, then you can limit your bisecting to just
building new kernels, which will make that process quicker.
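
The bisection described above is a binary search over the commit range. A minimal Python sketch of the idea follows, assuming an ordered list of revisions whose first entry is known good and whose last entry is known bad. The names first_bad and is_bad are hypothetical; in practice is_bad would stand for "build and boot that kernel, run a download, and check whether the mbuf count climbs", and a tool such as git bisect does the bookkeeping.

```python
def first_bad(revisions, is_bad):
    """Return the first revision for which is_bad() is true.

    Assumes revisions is ordered oldest to newest, revisions[0] is known
    good, revisions[-1] is known bad, and every revision after the
    offending commit is also bad.
    """
    lo, hi = 0, len(revisions) - 1      # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid                    # regression is at mid or earlier
        else:
            lo = mid                    # regression is after mid
    return revisions[hi]


# Toy example: 100 numbered revisions, regression introduced at number 37.
tested = []

def toy_is_bad(rev):
    tested.append(rev)                  # record each kernel "built"
    return rev >= 37

culprit = first_bad(list(range(100)), toy_is_bad)
print("first bad revision:", culprit, "after", len(tested), "tests")
```

Each test halves the remaining range, which is why restricting the work to kernel builds shortens the whole process noticeably.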

You might also see if using a different NIC shows the same problem.  If
not, then it might point to a regression in the NIC driver (or perhaps in
iflib, since I believe ix uses iflib).

-- 
John Baldwin
