Jonathan Barber
2007-Aug-07 16:22 UTC
[Fedora-directory-users] FDS SSL performance tuning query
Hello all,

We currently have an FDS instance running on RHEL4 with a small number of entries (6,000). We also have a Linux compute cluster of 100 nodes which uses LDAP for user account data (via libnss_ldap).

nss_ldap on the cluster is configured to use SSL, and everything is fine most of the time. However, occasionally, when a large job is started on the cluster, the number of connections increases from 100/minute to 1600/minute (about 26/sec).

This causes the server to become generally unresponsive, and FDS especially so (as judged by the time required to retrieve the DSE via TLS). This is a right pain, as it causes our Samba PDC to time out and everything goes wrong very quickly.

I can reproducibly impact FDS performance by running:

$ getent passwd | cut -d: -f 1 | while read i; do id $i; done

across the cluster. When SSL is off, the command runs fine and doesn't impact other searches.

As a short-term measure, we've disabled LDAPS on the cluster nodes, which is fine as users don't log into them, but we had planned to expand the use of LDAP to cover more hosts (Macs and Linux) that require a confidential channel for authentication. So this experience is giving us some trepidation about moving forward with that plan.

Our system is configured following the guidance of the wiki [0], with a maximum of 16834 available file descriptors and 50M of cache (more than enough to hold the DB), and the ratio of cache hits/misses looks good with little paging out. Running logconv.pl on the access logs doesn't show any unindexed searches, so that isn't an issue.

Our server CPU is a 3GHz Xeon with 1GB of RAM, and looking at the performance of NSS 3.2 [1], I would expect the machine to be able to set up and tear down many more connections than we are currently seeing. Indeed, running the test described in [1] with the nss-3.11.4 binaries, I get over 1200 connections per second [2], so it certainly doesn't seem to be a problem with NSS.

This suggests to me that the problem lies in FDS somewhere. So, does anyone have any suggestions as to how to improve the SSL/TLS performance of FDS, or can anyone point me at tuning docs for the SSL side of FDS?

Cheers.

[0] http://directory.fedoraproject.org/wiki/Performance_Tuning
[1] http://www.mozilla.org/projects/security/pki/nss/nss-3.2-performance-results
[2] server$ ./selfserv -n "Server-Cert" -p 6000
    client$ time ./strsclnt -p 6000 server -c 1000
    strsclnt: -- SSL: Server Certificate Validated.
    strsclnt: 0 cache hits; 1 cache misses, 0 cache not reusable
    strsclnt: 999 cache hits; 1 cache misses, 0 cache not reusable

    real 0m0.605s
    user 0m0.795s
    sys  0m0.226s

--
Jonathan Barber
High Performance Computing Analyst
Tel. +44 (0) 1382 386389
Richard Megginson
2007-Aug-07 16:26 UTC
Re: [Fedora-directory-users] FDS SSL performance tuning query
Jonathan Barber wrote:
> This suggests to me that the problem lies in FDS somewhere. So, does
> anyone have any suggestions as to how to improve the SSL/TLS performance
> of FDS, or point me at tuning docs for the SSL side of FDS?

I don't know. But opening and closing SSL connections is pretty expensive, with all of the TLS/SSL protocol operations. Could you configure the client machines to use LDAP (not LDAPS) and use the LDAP startTLS operation to start the TLS session on the non-secure port? This might allow the server to process the connection and TLS session creation more efficiently.
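For readers who want to try the startTLS suggestion, the nss_ldap client configuration might look roughly like the sketch below. The hostname ldap.example.com is a placeholder, and the file location (/etc/ldap.conf, as commonly used by nss_ldap on RHEL4) is an assumption about the client setup. The `tls_checkpeer no` line mirrors the setting the original poster reports using.

```conf
# /etc/ldap.conf (nss_ldap): illustrative sketch, hostname is a placeholder

# startTLS on the standard LDAP port (389):
uri ldap://ldap.example.com/
ssl start_tls

# LDAPS on the secure port (636) would instead be:
#uri ldaps://ldap.example.com/
#ssl on

# As reported by the original poster; disables server certificate checking.
tls_checkpeer no
```

Only one of the two `uri`/`ssl` pairs should be active at a time, so switching between the two modes for a comparison is a two-line change per client.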
Jonathan Barber
2007-Aug-07 16:32 UTC
Re: [Fedora-directory-users] FDS SSL performance tuning query
On Tue, Aug 07, 2007 at 05:22:19PM +0100, Jonathan Barber wrote:
> I can reproducibly impact FDS performance by running:
> $ getent passwd | cut -d: -f 1 | while read i; do id $i; done
>
> across the cluster. When SSL is off, the command runs fine and doesn't
> impact other searches.

Having done a bit more digging, I discovered that whilst the cluster is banging on port 389, searches from other hosts to port 389 and TLS on port 389 are fine, but SSL on port 636 hangs. So it looks like it could be port specific?

Very confused...
Rob Crittenden
2007-Aug-07 16:32 UTC
Re: [Fedora-directory-users] FDS SSL performance tuning query
Jonathan Barber wrote:
> Hello all, currently we have a FDS instance running on RHEL4 with a
> small number of entries (6,000), we also have a linux compute cluster of
> 100 nodes which uses LDAP for user account data (via libnss_ldap).

SNIP

> [2] server$ ./selfserv -n "Server-Cert" -p 6000
>     client$ time ./strsclnt -p 6000 server -c 1000
>     strsclnt: -- SSL: Server Certificate Validated.
>     strsclnt: 0 cache hits; 1 cache misses, 0 cache not reusable
>     strsclnt: 999 cache hits; 1 cache misses, 0 cache not reusable
>
>     real 0m0.605s
>     user 0m0.795s
>     sys  0m0.226s

Your SSL test is probably not representative of the real world. It did just one full handshake. You may want to look at the -P and -N options of strsclnt.

It may be that each getent is doing a full handshake.

rob
Jonathan Barber
2007-Aug-07 16:38 UTC
Re: [Fedora-directory-users] FDS SSL performance tuning query
On Tue, Aug 07, 2007 at 10:26:46AM -0600, Richard Megginson wrote:
> I don't know. But opening and closing SSL connections is pretty
> expensive, with all of the TLS/SSL protocol operations. Is it possible
> you could configure the client machines to use LDAP (not LDAPS) and use
> the LDAP startTLS operation to start up the TLS session on the
> non-secure port? This might allow the server to process the connection
> + TLS session creation more efficiently.

I'll give it a go and see how it works. I had assumed SSL would be less expensive than a startTLS.

Do you have any benchmarks (even rough numbers) available as to how many connections FDS can cope with over TLS/SSL vs. plain LDAP? I've read Howard Chu's presentation (http://highlandsun.com/hyc/SambaXP.pdf) but it doesn't compare against SSL, and I didn't do any SSL benchmarks with FDS when I evaluated LDAP servers. I don't have any real feeling for how many TLS/SSL connections you get compared to plain TCP/IP.

Ta.
Richard Megginson
2007-Aug-07 16:43 UTC
Re: [Fedora-directory-users] FDS SSL performance tuning query
Jonathan Barber wrote:
> I'll give it a go and see how it works. I had assumed SSL would be less
> expensive than a start TLS.

It might be in toto, but if you use startTLS you at least spread out the expense: the TCP connection and its resources are created first, then the TLS handshake and session resources.

> Do you have any benchmarks (even rough numbers) available as to how many
> connections FDS can cope with over TLS/SSL vs. plain LDAP?

We don't have anything like that.
Jonathan Barber
2007-Aug-07 16:44 UTC
Re: [Fedora-directory-users] FDS SSL performance tuning query
On Tue, Aug 07, 2007 at 12:32:54PM -0400, Rob Crittenden wrote:
> Your SSL test is probably not representative of the real world. It did
> just one full handshake. You may want to look at the -P and -N options
> of strsclnt. It may be that each getent is doing a full handshake.

I considered that, but in my situation it is the server that is being bogged down rather than the clients, and I don't use client certs to authenticate. So, as I understand it, the server doesn't do any validation and the burden should be on the client and not the server. Additionally, I have "tls_checkpeer no" set in my clients' nss_ldap config.

Running the same test with the -N option on strsclnt took ~30 seconds.
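To put the figures quoted in this thread side by side, the arithmetic can be sketched in shell. The `rate` helper is a hypothetical name introduced here for illustration; the inputs are the numbers reported in the thread (1000 connections in 0.605 s with session reuse, roughly 30 s for the -N run, and a peak of 1600 connections per minute on the directory server).

```shell
#!/bin/sh
# rate COUNT SECONDS -> connections per second, rounded to a whole number.
# Hypothetical helper for illustration only.
rate() {
    awk -v n="$1" -v t="$2" 'BEGIN { printf "%.0f\n", n / t }'
}

# Figures quoted in the thread:
rate 1000 0.605   # strsclnt, 999 of 1000 connections reusing a cached session
rate 1000 30      # strsclnt -N, "~30 seconds" as reported
rate 1600 60      # peak connection rate seen on the directory server
```

If these figures are representative, the observed peak is in the same range as the -N rate and far below the cached-session rate, which would be consistent with Rob's suggestion that each lookup performs a full handshake. That is an inference from three data points, not a measurement of FDS itself.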
Satish Chetty
2007-Aug-07 21:44 UTC
Re: [Fedora-directory-users] FDS SSL performance tuning query
Jonathan Barber wrote:
> I can reproducibly impact FDS performance by running:
> $ getent passwd | cut -d: -f 1 | while read i; do id $i; done

When you are seeing slow performance on the SSL port, are you also seeing slow performance on 389 (the non-SSL port) at the same time (for example, with an ldapsearch for objectclass=posixaccount)?

Also, is it possible that the handshakes and subsequent data transfer are taking so long that the OS queues up the requests? How does the performance compare when you run getent from localhost vs. getent from a client machine?

-Satish.
David Boreham
2007-Aug-08 00:22 UTC
Re: [Fedora-directory-users] FDS SSL performance tuning query
>> I can reproducibly impact FDS performance by running:
>> $ getent passwd | cut -d: -f 1 | while read i; do id $i; done

Since you can reproduce the syndrome at will, and in a steady state, just run that command above and then go run 'pstack <fds_pid>' on the server machine. That'll dump the thread stacks and tell you where the server is spending its time (assuming, which I am, that we've already established that the server is for some reason burning CPU when subjected to this load).
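A single stack dump can be misleading, so one possible way to apply this suggestion is to take a few dumps some seconds apart and compare them. The sketch below is an assumption about how one might do that: `sample_stacks` and the `STACK_CMD` override are hypothetical names introduced here, and only the `pstack <pid>` invocation itself comes from the message above.

```shell
#!/bin/sh
# sample_stacks PID [COUNT] [INTERVAL_SECONDS]
# Dumps the thread stacks of PID several times so that recurring frames
# stand out. STACK_CMD (hypothetical override) defaults to pstack.
sample_stacks() {
    pid=$1
    count=${2:-3}
    interval=${3:-5}
    i=1
    while [ "$i" -le "$count" ]; do
        echo "=== sample $i of $count at $(date) ==="
        "${STACK_CMD:-pstack}" "$pid"
        i=$((i + 1))
        # Pause between samples, but not after the last one.
        if [ "$i" -le "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Example (the PID is a placeholder):
#   sample_stacks 12345 3 5 > /tmp/fds-stacks.txt
```

On Red Hat systems pstack attaches to the process through gdb and stops it while the stacks are read, so on a busy production server it may be prudent to keep the sample count small.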
Andrey Ivanov
2007-Aug-08 07:26 UTC
Re: [Fedora-directory-users] FDS SSL performance tuning query
Hi,

JB> nss_ldap on the cluster is configured to use SSL, and everything is fine
JB> most of the time. However, occasionally, when a large job is started on
JB> the cluster, the number of connections increases from 100/minute to
JB> 1600/minute (26/sec).

JB> I can reproducibly impact FDS performance by running:
JB> $ getent passwd | cut -d: -f 1 | while read i; do id $i; done

To substantially reduce the number of LDAP (or NIS) requests, we use nscd (the Name Service Caching Daemon). The result is that the number of LDAP requests easily decreases by an order of magnitude... Give it a try and tune /etc/nscd.conf :)

Andrey Ivanov
tel +33-(0)1-69-33-99-24
fax +33-(0)1-69-33-99-55

Direction des Systemes d'Information
Ecole Polytechnique
91128 Palaiseau CEDEX
France
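As a rough illustration of the kind of tuning meant here, an /etc/nscd.conf fragment for the passwd and group maps might look like the following. The directive names are standard nscd.conf settings; the time-to-live values are illustrative assumptions, not recommendations from this thread, and should be chosen according to how quickly account changes need to become visible.

```conf
# /etc/nscd.conf: illustrative fragment, TTL values are assumptions

enable-cache            passwd  yes
positive-time-to-live   passwd  600    # seconds to keep successful lookups
negative-time-to-live   passwd  20     # seconds to keep "not found" answers

enable-cache            group   yes
positive-time-to-live   group   600
negative-time-to-live   group   20
```

Shorter time-to-live values trade some of the reduction in LDAP traffic for fresher data. Running `nscd -i passwd` after changing an account asks the daemon to invalidate the passwd cache immediately.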
David Bogen
2007-Aug-08 15:38 UTC
Re: [Fedora-directory-users] FDS SSL performance tuning query
We use SSL connections (LDAPS) almost exclusively and have easily handled over 7000 SSL connections per minute without extensive tuning of FDS. That particular server is a RHEL4 box running an AMD Opteron with 4GB of RAM.

Even a crusty old PIII (1.2GHz) running RHEL3 has handled over 1000 SSL connections per minute from a high-performance cluster, though I suspect that the upper limit of that system isn't too far above that number, and we are moving beyond it to another 64-bit system.

Our experience has shown start_tls to be noticeably slower than straight SSL; slow enough that the difference is noticeable to people and not just to measurements. I would recommend going with straight SSL and not messing around with start_tls.

If your connections are limited at 1600/minute, I wonder if you aren't perhaps hitting a limitation elsewhere in your system, as our experience seems to indicate that FDS can handle the load you are throwing at it.

David

--
David Bogen :: (608) 263-0168
Unix SysAdmin :: IceCube Project
david.bogen@icecube.wisc.edu
David Boreham
2007-Aug-08 15:59 UTC
Re: [Fedora-directory-users] FDS SSL performance tuning query
David Bogen wrote:
> Our experience has shown start_tls to be noticeably slower than straight
> ssl; slow enough that the difference is noticeable to people and not
> just to measurements. I would recommend going with straight SSL and not
> messing around with start_tls

Interesting observation given that the code path is essentially identical.
Jonathan Barber
2007-Aug-09 07:27 UTC
Re: [Fedora-directory-users] FDS SSL performance tuning query
On Wed, Aug 08, 2007 at 10:38:58AM -0500, David Bogen wrote:
> Our experience has shown start_tls to be noticeably slower than straight
> ssl; slow enough that the difference is noticeable to people and not
> just to measurements. I would recommend going with straight SSL and not
> messing around with start_tls.
>
> If your connections are limited at 1600/minute I wonder if you aren't
> perhaps hitting a limitation elsewhere in your system as our experience
> seems to indicate that FDS can handle the load you are throwing at it.

That was with just one of the NSS strsclnt clients running; with two clients (from different hosts to a third server) I get 2000/s aggregate. So I guess I'm reaching an upper bound on the client.

To add another data point to the mess, I now have a situation where, if the cluster's libnss_ldap is talking plain LDAP, LDAP and TLS work fine, but I can see hangs in LDAPS. Using the OpenLDAP ldapsearch client with debugging and function tracing turned on (-d 1) shows blocking at the point where the client says:

    ...
    TLS trace: SSL_connect:SSLv2/v3 write client hello A

with the next message (immediately after I stop hammering the LDAP server) being:

    TLS trace: SSL_connect:SSLv3 read server hello A
    ...

Which is just weird. LDAP/TLS works fine during this period.

However, the cluster has been more heavily loaded during this period of testing, so it's entirely possible that I'm not able to generate enough LDAP requests to provoke the behaviour I was seeing the other day.

It's encouraging to hear that it is possible to get more SSL/TLS connections, though.

David Boreham is correct in the assumption that the LDAP server CPU is pegged at 99% when this is all going on. Running pstack hasn't got me anything yet, as it just shows gdb running at 99% CPU, and the LDAP server goes unresponsive and needs to be restarted. Given that this is a production service, I'm somewhat averse to leaving it in that state for too long :)

Setting up a test environment is going to take me a while, as I have training and planned work over the next couple of weeks. I'll try to report back on any progress I make.

Thanks for everyone's suggestions.
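The comparison described in this message (plain LDAP and startTLS on one port, LDAPS on the other) could be scripted along the following lines. This is a sketch under stated assumptions: `probe` and the `LDAPSEARCH` override are hypothetical names introduced here, ldap.example.com is a placeholder, and the probe simply reads the root DSE with the OpenLDAP ldapsearch client and reports elapsed whole seconds.

```shell
#!/bin/sh
# probe LABEL URI [extra ldapsearch options...]
# Reads the root DSE once and prints "LABEL ok|FAILED <seconds>s".
# LDAPSEARCH (hypothetical override) defaults to the OpenLDAP client.
probe() {
    label=$1
    uri=$2
    shift 2
    start=$(date +%s)
    if "${LDAPSEARCH:-ldapsearch}" -x -H "$uri" -b "" -s base "$@" >/dev/null 2>&1; then
        status=ok
    else
        status=FAILED
    fi
    end=$(date +%s)
    echo "$label $status $((end - start))s"
}

# Example invocations (placeholder hostname):
#   probe ldap     ldap://ldap.example.com
#   probe starttls ldap://ldap.example.com -ZZ
#   probe ldaps    ldaps://ldap.example.com
```

Running the three probes in a loop while the cluster load is applied would show which access method stalls and for how long. A hung connection blocks the probe until the client gives up or is interrupted, which is itself the symptom being looked for.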
Jonathan Barber
2007-Aug-09 07:34 UTC
Re: [Fedora-directory-users] FDS SSL performance tuning query
On Wed, Aug 08, 2007 at 09:26:08AM +0200, Andrey Ivanov wrote:
> To reduce substantially the number of LDAP (or NIS) requests we use
> the nscd (Name Service Caching Daemon). The result is that the number
> of LDAP requests is decreased easily by one order of magnitude... Give
> it a try and tune the /etc/nscd.conf :)

I have considered nscd, but I've had bad experiences with it in the past when we ran NIS, usually due to entries being changed and the new entry not being seen on all the clients at the same time. We could avoid that problem by setting the nscd timeout low enough, but it's another piece of client config that I'd rather avoid.
Gordon Messmer
2007-Aug-15 17:44 UTC
Re: [Fedora-directory-users] FDS SSL performance tuning query
Jonathan Barber wrote:
> Hello all, currently we have a FDS instance running on RHEL4 with a
> small number of entries (6,000), we also have a linux compute cluster of
> 100 nodes which uses LDAP for user account data (via libnss_ldap).
>
> nss_ldap on the cluster is configured to use SSL, and everything is fine
> most of the time. However, occasionally, when a large job is started on
> the cluster, the number of connections increases from 100/minute to
> 1600/minute (26/sec).

Use nscd. I know you said that you'd rather avoid it, but the performance penalty of LDAP without nscd is significant, to say the least. Even plain LDAP is terribly expensive, since each process that needs NSS info will go through the connection process for its lookups. At best, that means you'll see a high rate of connections on your directory server, which will drive its load up. Worse, you're likely to see the connection rate kept very high *and* a large number of open connections on all of your hosts (since each process that does a lookup will keep its connection open).

Using nscd, most lookups will be done locally, and connections will be pooled. Lookup latency will be reduced on your clients, and both the connection rate and the number of open connections on your server will diminish greatly. The cost of nscd is that data may not update on your clients immediately, but how often are you going to change a user's uidNumber or homeDirectory? I've never seen the cost of nscd outweigh the tremendous benefits.

> I can reproducably, impact on FDS performance by running:
> $ getent passwd | cut -d: -f 1 | while read i; do id $i; done

Well, yes. If you're doing roughly 6000 connections over SSL on 100 machines concurrently, it's going to impact performance in a bad way. If you only get 30 SSL connections per second (as seems likely given that 1000 connections takes ~30 seconds, per a later message in this thread), there may be a flaw in FDS. I haven't tested its SSL connection rate personally, so I have nothing against which to compare. (Since I use Kerberos for authentication, I don't use SSL.) In any case, I'd expect that job to take more than 30 minutes. If you want to increase SSL connections per second, you can use an SSL accelerator to proxy the SSL connections.

> Our system is configured following the guidance of the wiki [0]

> [0] http://directory.fedoraproject.org/wiki/Performance_Tuning

That document *sucks*. First, it sucks because it is terribly incomplete, leaving out some rather critical tuning parameters. Second, it sucks because it mixes OS tuning and directory server tuning with little discussion of what each change accomplishes, and how to correlate that with expected use. Third, it sucks because the data is outdated and may actually affect performance *adversely* on Linux if followed completely. Fourth, it sucks because many of the settings recommended (specifically, the limits.conf and profile modifications) won't affect the directory server at all. Fifth, it sucks because better documentation exists, and this document distracts users from it:

http://www.redhat.com/docs/manuals/dir-server/ag/7.1/dsmanage.html

Personally, I'd delete the wiki document entirely, and link directly to this chapter of the Admin Guide. (But I'd note that Sun's documentation is much better at discussing exactly how to size the caches.)

> , with a
> maximum of 16834 available file descriptors and 50M of cache (more than
> enough to hold the DB) - and the ratio of cache hits/misses look good
> with little paging out.

That's good, but since performance problems only show up with SSL connections, your database and entry caches probably aren't the issue. (Another set of items the wiki doesn't cover: the difference between the database cache and the entry cache, and how to size each.)
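As a rough check on the "more than 30 minutes" estimate earlier in this message, the arithmetic can be sketched as below. It assumes, as a simplification, that each id invocation costs exactly one new SSL connection (in practice group lookups may add more) and that the server sustains the ~30 connections per second quoted above. The variable names are only for illustration.

```shell
#!/bin/sh
# Back-of-the-envelope duration of the "id every user on every node" test.
# Assumption: one SSL connection per id call; real lookups may need more.
users=6000        # directory entries, from the original post
nodes=100         # cluster nodes running the loop concurrently
conn_per_sec=30   # SSL connection rate quoted in this thread

total=$((users * nodes))
seconds=$((total / conn_per_sec))
minutes=$((seconds / 60))

echo "total connections: $total"
echo "estimated duration: ${seconds}s (about ${minutes} min)"
```

Under these assumptions the run works out to several hours rather than minutes, which is consistent with the expectation that it would take well over half an hour.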