thr3ads.net - Fedora directory users - [Fedora-directory-users] FDS 1.1 Transport endpoint is not connected [Feb 2008]

Richard Hesse

2008-Feb-12 03:23 UTC

[Fedora-directory-users] FDS 1.1 Transport endpoint is not connected

Started to play with FDS 1.1 for some dogfood testing. After running for 10-15
minutes, the server stopped responding to network requests and went silent. The
process was running, the error log was updating with the ldbm event loop, but no
socket requests were fulfilled. Checking the access log, I saw this:

[12/Feb/2008:01:47:58 +0000] conn=71108 op=-1 fd=79 closed error 107 (Transport endpoint is not connected) - Network file descriptor is not connected.
[12/Feb/2008:01:47:59 +0000] conn=71007 op=60 fd=69 closed - B4
[12/Feb/2008:01:48:00 +0000] conn=71003 op=48 fd=68 closed - B4
[12/Feb/2008:01:48:01 +0000] conn=71017 op=47 fd=72 closed - B4
[12/Feb/2008:01:48:06 +0000] conn=71102 op=2 fd=66 closed - B4
[12/Feb/2008:01:48:07 +0000] conn=71103 op=2 fd=70 closed - B4
[12/Feb/2008:01:48:07 +0000] conn=71040 op=10 fd=76 closed - B4

Any ideas or suggestions on how to approach troubleshooting this issue would be
greatly appreciated.

Thanks.

-richard

Richard Megginson

2008-Feb-12 03:43 UTC

Re: [Fedora-directory-users] FDS 1.1 Transport endpoint is not connected

Richard Hesse wrote:
> Any ideas or suggestions on how to approach troubleshooting this issue
> would be greatly appreciated.

B4 means SLAPD_DISCONNECT_BER_FLUSH. This usually means the client has
reset or closed the connection while the server was attempting to send a
response.

http://www.redhat.com/docs/manuals/dir-server/cli/8.0/Configuration_Command_File_Reference-Access_Log_and_Connection_Code_Reference-Common_Connection_Codes.html

Do you have a firewall or some other network device?
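
For anyone following along, the close codes in an access log excerpt like the one quoted at the top of this thread can be tallied with a few lines of Python. This is only a rough sketch: the regular expression is an assumption based on the handful of "closed" lines shown here, not on the full access log format, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches the tail of a "closed" access-log line as quoted in this thread, e.g.
#   conn=71007 op=60 fd=69 closed - B4
#   conn=71108 op=-1 fd=79 closed error 107 (...) - Network file descriptor ...
CLOSED = re.compile(
    r"conn=(?P<conn>\d+) op=(?P<op>-?\d+) fd=(?P<fd>\d+) closed"
    r"(?: error (?P<errno>\d+) \([^)]*\))? - (?P<reason>.+)$"
)


def tally_close_reasons(lines):
    """Count close reasons; lines carrying an errno are keyed by that errno."""
    counts = Counter()
    for line in lines:
        m = CLOSED.search(line)
        if m is None:
            continue  # not a "closed" line in the shape we know about
        key = f"errno {m['errno']}" if m["errno"] else m["reason"].strip()
        counts[key] += 1
    return counts


sample = [
    "[12/Feb/2008:01:47:58 +0000] conn=71108 op=-1 fd=79 closed error 107 "
    "(Transport endpoint is not connected) - Network file descriptor is not connected.",
    "[12/Feb/2008:01:47:59 +0000] conn=71007 op=60 fd=69 closed - B4",
    "[12/Feb/2008:01:48:00 +0000] conn=71003 op=48 fd=68 closed - B4",
]
print(tally_close_reasons(sample))
```

Grouping by conn number instead of by reason would be a small change if the goal is to trace the closes back to a particular client.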

Richard Hesse

2008-Feb-12 18:44 UTC

RE: [Fedora-directory-users] FDS 1.1 Transport endpoint is not connected

There's a load balancer acting as the client to the DS (proxying client
requests). I think that's a red herring though. Any search requests
sent directly to the DS, bypassing the LB, would fail. I think I even tried
requests locally from the server and they still failed. I can't be sure
about that last statement, it was a long day.

What about the network file descriptor is not connected error?

Thanks.

-richard

Richard Megginson

2008-Feb-12 20:32 UTC

Re: [Fedora-directory-users] FDS 1.1 Transport endpoint is not connected

Richard Hesse wrote:
> There's a load balancer acting as the client to the DS (proxying
> client requests). I think that's a red herring though. Any search
> requests sent directly to the DS, bypassing the LB, would fail. I think I even
> tried requests locally from the server and they still failed. I can't
> be sure about that last statement, it was a long day.

What are all of these closed connections from, e.g. conn=71007,
conn=71003, etc.? Are they from the load balancer?

I'm not really sure how to proceed to diagnose this from the directory
server because events like these usually indicate something is happening
at the TCP/IP layer.

I would be really interested to see if you continued to have problems if
you shut off the load balancer completely and just contacted the
directory server via the loopback interface.

> What about the network file descriptor is not connected error?

It's similar to the B4: it means there was a problem with the
connection to the client.

Richard Hesse

2008-Feb-14 19:21 UTC

Re: [Fedora-directory-users] FDS 1.1 Transport endpoint is not connected

Actually, it turns out that debug logging was putting too much disk load on the
server, and the process fell behind and stopped servicing socket requests.
Thanks for your help, Richard.

-richard


Richard Hesse

2008-Feb-15 20:04 UTC

Re: [Fedora-directory-users] FDS 1.1 Transport endpoint is not connected

Eh, sorry about this, but it appears that my original hunch was correct. The 1.1
DS instance did indeed hang again recently. I was able to check a localhost
query and that failed, too. So the problem definitely appears to be a hang in
the FDS code somewhere. The question is, how do I go about debugging this?
Strace doesn't show much at all. Enabling debug trace logging kills the
server. Any ideas? Thanks.

-richard


Rich Megginson

2008-Feb-15 20:23 UTC

Re: [Fedora-directory-users] FDS 1.1 Transport endpoint is not connected

Richard Hesse wrote:
> Eh sorry about this but it appears that my original hunch was correct.
> The 1.1 DS instance did indeed hang again recently. I was able to
> check a localhost query and that failed, too. So the problem
> definitely appears to be a hang in the FDS code somewhere. The
> question is, how do I go about debugging this? Strace doesn't show
> much at all. Enabling debug trace logging kills the server. Any ideas?
> Thanks.

What sort of application(s) are you using to generate a load against the
directory server?

What does logconv.pl /var/log/dirsrv/slapd-instance/access say?

If TRACE level logging is too expensive, you might try log level 8
(Connection management):
http://directory.fedoraproject.org/wiki/FAQ#Troubleshooting

Richard Hesse

2008-Feb-15 20:38 UTC

Re: [Fedora-directory-users] FDS 1.1 Transport endpoint is not connected

Thanks Richard, I'll give connection management a whirl.

Here's the log parser output (nice util btw):

----------- Access Log Output ------------

Restarts:                     0

Total Connections:            4820
Peak Concurrent Connections:  19
Total Operations:             18017
Total Results:                18129
Overall Performance:          100.0%

Searches:                     9960
Modifications:                6
Adds:                         0
Deletes:                      0
Mod RDNs:                     0

Persistent Searches:          0
Internal Operations:          0
Entry Operations:             0
Extended Operations:          3224
Abandoned Requests:           0
Smart Referrals Received:     0

VLV Operations:               0
VLV Unindexed Searches:       0
SORT Operations:              0
SSL Connections:              1613

Entire Search Base Queries:   820
Unindexed Searches:           0

FDs Taken:                    4828
FDs Returned:                 4817
Highest FD Taken:             109

Broken Pipes:                 0
Connections Reset By Peer:    0
Resource Unavailable:         17
     -  17   (T1) Idle Timeout Exceeded

Binds:                        4827
Unbinds:                      65

 LDAP v2 Binds:               0
 LDAP v3 Binds:               4827
 SSL Client Binds:            0
 Failed SSL Client Binds:     0
 SASL Binds:                  0

 Directory Manager Binds:     0
 Anonymous Binds:             4813
 Other Binds:                 14
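
A couple of the counters above are easier to compare once they are turned into numbers. The snippet below is a rough sketch that assumes the summary keeps the simple "Label: value" layout shown here; the function name is made up for illustration, and whether either gap has anything to do with the hang is not something this thread establishes.

```python
def parse_summary(text):
    """Parse 'Label:   123' lines from a logconv.pl-style summary into ints."""
    stats = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        value = value.strip()
        # Skip headings, percentages and anything else that is not a plain integer.
        if sep and value.isdigit():
            stats[label.strip()] = int(value)
    return stats


# A few lines copied from the summary above.
sample = """\
Total Connections:            4820
Total Operations:             18017
Overall Performance:          100.0%
FDs Taken:                    4828
FDs Returned:                 4817
Binds:                        4827
Unbinds:                      65
"""

stats = parse_summary(sample)
fds_outstanding = stats["FDs Taken"] - stats["FDs Returned"]
binds_without_unbind = stats["Binds"] - stats["Unbinds"]
print("FDs taken but not yet returned:", fds_outstanding)
print("Binds with no matching unbind:", binds_without_unbind)
```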



Rich Megginson

2008-Feb-15 20:53 UTC

Re: [Fedora-directory-users] FDS 1.1 Transport endpoint is not connected

Richard Hesse wrote:
> Thanks Richard, I'll give connection management a whirl.

What is the application which is generating this load?

Richard Hesse

2008-Feb-15 21:59 UTC

Re: [Fedora-directory-users] FDS 1.1 Transport endpoint is not connected

nsswitch posix users/groups, ssh, sudo, puppet (config management), and
internally written applications.

-richard

On 2/15/08 12:53 PM, "Rich Megginson" <rmeggins@redhat.com>
wrote:
> What is the application which is generating this load?

Rich Megginson

2008-Feb-15 22:11 UTC

Re: [Fedora-directory-users] FDS 1.1 Transport endpoint is not connected

Richard Hesse wrote:
> nsswitch posix users/groups,

Are you using nscd?

> ssh, sudo, puppet (config management), and
> internally written applications.

Richard Hesse

2008-Feb-15 22:50 UTC

Re: [Fedora-directory-users] FDS 1.1 Transport endpoint is not connected

Yes, every host (except the ldap hosts) runs nscd. The ldap servers are not
configured to use directory data for anything.

-richard


Rich Megginson

2008-Feb-19 18:23 UTC

Re: [Fedora-directory-users] FDS 1.1 Transport endpoint is not connected

Richard Hesse wrote:
> Yes, every host (except the ldap hosts) runs nscd. The ldap servers are not
> configured to use directory data for anything.

I just don't know. I've not seen this before. I suppose you could try
checking your kernel TCP/IP settings, and increasing the number of file
descriptors used:
http://directory.fedoraproject.org/wiki/Performance_Tuning

Richard Hesse

2008-Feb-19 23:02 UTC

Re: [Fedora-directory-users] FDS 1.1 Transport endpoint is not connected

Not much new to report. The server hung again and the only thing in the
error log with connection tracing is this:

[18/Feb/2008:13:14:03 +0000] - PR_Write(41818752) Netscape Portable Runtime error -5961 (TCP connection reset by peer.)
[18/Feb/2008:13:14:03 +0000] - ber_flush failed, error 104 (Connection reset by peer)

Which doesn't look like much. As for network tuning, it's already been done.
Max descriptors is set to 32768.

Are there any gdb commands I can run while the server is in a hung state?
I'm going to try running strace while the process is working, and hope for a
hang. Maybe that will give us some more info.

-richard

Rich Megginson

2008-Feb-20 00:04 UTC

Re: [Fedora-directory-users] FDS 1.1 Transport endpoint is not connected

Richard Hesse wrote:
> Not much new to report. The server hung again and the only thing in the
> error log with connection tracing is this:
>
> [18/Feb/2008:13:14:03 +0000] - PR_Write(41818752) Netscape Portable Runtime error -5961 (TCP connection reset by peer.)
> [18/Feb/2008:13:14:03 +0000] - ber_flush failed, error 104 (Connection reset by peer)
>
> Which doesn't look like much.

Well, it tells me that the server was attempting to write to a socket,
and got an error. -5961 is PR_CONNECT_RESET_ERROR, which can occur if
the system call returns either EPIPE or ECONNRESET. And error 104 is
indeed ECONNRESET:

/usr/include/asm-generic/errno.h:#define        ECONNRESET      104     /* Connection reset by peer */
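
The same lookup can be done without grepping the headers. This is a small sketch using Python's standard errno and os modules; the helper name is made up for illustration. The numeric values are platform-specific (104 and 107 are the Linux values seen in this thread), so the example looks the codes up by name.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for a numeric errno on this platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# ECONNRESET is the "error 104" from the error log on Linux; ENOTCONN is the
# "error 107 (Transport endpoint is not connected)" from the access log.
for code in (errno.ECONNRESET, errno.ENOTCONN, errno.EPIPE):
    name, message = describe_errno(code)
    print(code, name, message)
```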

AFAICT, this can happen if the client shuts down the socket (for any
number of reasons) but the server is still attempting to send data. In
this case, the client will respond with a TCP RST. I'm not sure how or
why this could happen. I'm open to other causes for ECONNRESET.
What would be really, really interesting is if we could narrow this down
to a particular client application and run ethereal on the connection.

Are you using SSL?

> As for network tuning, it's already been done.
> Max descriptors is set to 32768.
>
> Are there any gdb commands I can run while the server is in a hung state?

Sure. Whatever the cause of the ECONNRESET, it should not cause the
server to hang, and it would be interesting to find out what it's
doing. You'll have to install the fedora-ds-base-debuginfo package.
Attach to the process:

gdb /usr/sbin/ns-slapd <pid of process>

Then, dump the thread stacks:

(gdb) thread apply all bt

If you want the output to go to a file, set the gdb log file and turn
logging on first, before doing the thread apply, e.g.

(gdb) set logging file stack.txt
(gdb) set logging on
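
If the hang tends to happen when nobody is at a terminal, the same stack dump can be scripted. The following is only a sketch, not something suggested in this thread: it assumes gdb is on the PATH, that the caller is allowed to attach to the process, and that the debuginfo package is installed. The function names and the default output path are made up for illustration; -p, -batch and -ex are standard gdb command-line options.

```python
import subprocess


def gdb_backtrace_argv(pid):
    """Build a gdb command line that attaches, dumps all thread stacks, and exits."""
    return [
        "gdb",
        "-p", str(pid),                # attach to the running process
        "-batch",                      # run the commands below, then exit
        "-ex", "thread apply all bt",  # same command as the interactive session
    ]


def dump_stacks(pid, out_path="stack.txt"):
    """Run gdb against a live process and write its output to out_path."""
    with open(out_path, "w") as out:
        subprocess.run(
            gdb_backtrace_argv(pid),
            stdout=out,
            stderr=subprocess.STDOUT,
            check=True,
        )


# Show the command that would be run for a hypothetical PID.
print(" ".join(gdb_backtrace_argv(12345)))
```

Calling dump_stacks with the PID of the hung ns-slapd would then leave the backtraces in stack.txt.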


Richard Hesse

2008-Feb-20 23:17 UTC

Re: [Fedora-directory-users] FDS 1.1 Transport endpoint is not connected

Yeah, we're using SSL and TLS so ethereal/tcpdump isn't going to yield much
info. The process hung again and strace didn't provide too much information
other than this:

futex(0x20b9260, FUTEX_WAIT, 2, NULL)

Would that give you a place to start looking?

-richard


On 2/19/08 4:04 PM, "Rich Megginson" <rmeggins@redhat.com>
wrote:
> Richard Hesse wrote:
>> Not much new to report. The server hung again and the only thing in the
>> error log with connection tracing is this:
>>
>> [18/Feb/2008:13:14:03 +0000] - PR_Write(41818752) Netscape Portable
Runtime
>> error -5961 (TCP connection reset by peer.)
>> [18/Feb/2008:13:14:03 +0000] - ber_flush failed, error 104 (Connection
reset
>> by peer)
>>
>> Which doesn''t look like much.
> Well, it tells me that the server was attempting to write to a socket,
> and got an error.  -5961 is PR_CONNECT_RESET_ERROR which can occur if
> the system call returns either EPIPE or ECONNRESET.  And error 104 is
> indeed ECONNRESET.
> /usr/include/asm-generic/errno.h:#define        ECONNRESET      104
> /* Connection reset by peer */
>
> AFAICT, this can happen if the client shuts down the socket (for any
> number of reasons) but the server is still attempting to send data.  In
> this case, the client will respond with a TCP RST.  I'm not sure how or
> why this could happen.  I'm open to other causes for ECONNRESET.
> What would be really, really interesting is if we could narrow this down
> to a particular client application and run ethereal on the connection.
>
> Are you using SSL?
>> As for network tuning, it's already been done.
>>
>> Max descriptors is set to 32768.
>>
>> Are there any gdb commands I can run while the server is in a hung state?
>>
> Sure.  For whatever the cause of the ECONNRESET, it should not cause the
> server to hang, and it would be interesting to find out what it's
> doing.  You'll have to install the fedora-ds-base-debuginfo package.
> Attach to the process - gdb /usr/sbin/ns-slapd <pid of process>
> Then, dump the thread stacks -
>
> (gdb) thread apply all bt
>
> If you want the output to go to a file, redirect gdb logging to a file
> first, before doing the thread apply, e.g.
>
> (gdb) set logging file stack.txt
> (gdb) set logging on
>
>
>> I'm going to try running strace while the process is working, and hope for a
>> hang. Maybe that will give us some more info.
>>
>> -richard
>>
>> On 2/19/08 10:23 AM, "Rich Megginson" <rmeggins@redhat.com> wrote:
>>
>>
>>> Richard Hesse wrote:
>>>
>>>> Yes, every host (except the ldap hosts) runs nscd. The ldap servers are not
>>>> configured to use directory data for anything.
>>>>
>>>>
>>> I just don't know.  I've not seen this before.  I suppose you could try
>>> checking your kernel TCP/IP settings, and increasing the number of file
>>> descriptors used -
>>> http://directory.fedoraproject.org/wiki/Performance_Tuning
>>>
>>>> -richard
>>>>
>>>>
>>>> On 2/15/08 2:11 PM, "Rich Megginson" <rmeggins@redhat.com> wrote:
>>>>
>>>>
>>>>
>>>>> Richard Hesse wrote:
>>>>>
>>>>>
>>>>>> nsswitch posix users/groups,
>>>>>>
>>>>>>
>>>>> Are you using nscd?
>>>>>
>>>>>
>>>>>> ssh, sudo, puppet (config management), and
>>>>>> internally written applications.
>>>>>>
>>>>>> -richard
>>>>>>
>>>>>> On 2/15/08 12:53 PM, "Rich Megginson" <rmeggins@redhat.com> wrote:
>>>>>>
>>>>>>> What is the application which is generating this load?
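The ECONNRESET explanation quoted above (peer closes, server keeps writing, peer answers with RST) can be reproduced on a loopback socket. This is only a sketch to illustrate the errno, not anything taken from the directory server code: the function name `provoke_reset` is made up for the example, and using SO_LINGER with a zero timeout is just one convenient way to force a client to send RST instead of FIN.

```python
import errno
import socket
import struct
import time


def provoke_reset():
    """Return the errno a server sees when it writes to a peer that sent RST."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())
    server_side, _ = listener.accept()
    try:
        # l_onoff=1, l_linger=0 makes close() abort the connection with RST.
        client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                          struct.pack("ii", 1, 0))
        client.close()
        # Keep writing, as the server in this thread did, until the kernel
        # reports the failure (give up after two seconds).
        deadline = time.monotonic() + 2.0
        while time.monotonic() < deadline:
            try:
                server_side.send(b"x")
            except OSError as exc:
                return exc.errno
            time.sleep(0.01)
        return None
    finally:
        server_side.close()
        listener.close()


if __name__ == "__main__":
    code = provoke_reset()
    print(code, errno.errorcode.get(code))
```

Depending on timing the write fails with either ECONNRESET or EPIPE, which matches the two system-call errors that map to PR_CONNECT_RESET_ERROR in the message above.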

Rich Megginson

2008-Feb-20 23:39 UTC

Re: [Fedora-directory-users] FDS 1.1 Transport endpoint is not connected

Richard Hesse wrote:
> Yeah, we're using SSL and TLS so ethereal/tcpdump isn't going to yield much
> info.

It would give us the TCP/IP protocol data, so we could see what clients
and servers are sending the FIN and RST.  It's not so much the LDAP data
I care about, although ssltap might be useful for that.

> The process hung again and strace didn't provide too much information
> other than this:
>
> futex(0x20b9260, FUTEX_WAIT, 2, NULL)
>
> Would that give you a place to start looking?
>

That does suggest a possible deadlock.

> -richard
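If the futex wait does point at a deadlock, the thread dump suggested earlier in the thread is the usual way to confirm it. The same gdb steps can be run non-interactively so they are quick to repeat the next time the server hangs. The sketch below is an illustration only: `gdb_backtrace_command` and `dump_thread_stacks` are hypothetical helper names, and it assumes gdb is on the PATH and that you have permission to attach to the ns-slapd process.

```python
import subprocess


def gdb_backtrace_command(pid, gdb="gdb"):
    """Build a batch-mode gdb command that prints every thread's stack."""
    # -p attaches to a running process, -batch exits after the commands run,
    # -ex supplies the same command typed at the (gdb) prompt.
    return [gdb, "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def dump_thread_stacks(pid, out_path="stack.txt"):
    """Attach to pid, write all thread backtraces to out_path, return gdb's exit code."""
    with open(out_path, "w") as out:
        # stderr is merged into the file so gdb warnings (for example about
        # missing debuginfo) are kept next to the stacks.
        result = subprocess.run(gdb_backtrace_command(pid), stdout=out,
                                stderr=subprocess.STDOUT, check=False)
    return result.returncode
```

Calling `dump_thread_stacks(<pid of ns-slapd>)` while the server is hung should leave the stacks in stack.txt. Several threads parked in lock or condition-variable waits would be consistent with the deadlock theory.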

Rich Megginson

2008-Feb-26 03:31 UTC

Re: [Fedora-directory-users] FDS 1.1 Transport endpoint is not connected

Richard Hesse wrote:
> Yeah, we're using SSL and TLS so ethereal/tcpdump isn't going to yield much
> info. The process hung again and strace didn't provide too much information
> other than this:
>
> futex(0x20b9260, FUTEX_WAIT, 2, NULL)
>
> Would that give you a place to start looking?
>

Try logconv.pl -V /var/log/dirsrv/slapd-instancename/access

> -richard
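For a quick look before running logconv.pl, the close records in the access log can be tallied by hand. The sketch below is not a reimplementation of logconv.pl. It only counts how each connection was closed, using the line format from the access log excerpt at the start of this thread. The regular expression and the `tally_closes` name are assumptions made for this example.

```python
import re
from collections import Counter

# Matches e.g. "conn=71007 op=60 fd=69 closed - B4" and the longer
# "closed error 107 (Transport endpoint is not connected) - ..." form.
CLOSE_RE = re.compile(
    r"conn=(?P<conn>\d+) op=(?P<op>-?\d+) fd=(?P<fd>\d+) closed"
    r"(?: error (?P<errno>\d+) \((?P<errmsg>[^)]*)\))? - (?P<reason>.+)$"
)


def tally_closes(lines):
    """Count closed connections, keyed by errno when present, else by close code."""
    counts = Counter()
    for line in lines:
        match = CLOSE_RE.search(line.strip())
        if not match:
            continue  # not a connection-close record
        if match.group("errno"):
            counts["error " + match.group("errno")] += 1
        else:
            counts[match.group("reason")] += 1
    return counts


# Lines copied from the first message in this thread (first one unwrapped).
SAMPLE = [
    "[12/Feb/2008:01:47:58 +0000] conn=71108 op=-1 fd=79 closed error 107 "
    "(Transport endpoint is not connected) - Network file descriptor is not connected.",
    "[12/Feb/2008:01:47:59 +0000] conn=71007 op=60 fd=69 closed - B4",
    "[12/Feb/2008:01:48:00 +0000] conn=71003 op=48 fd=68 closed - B4",
    "[12/Feb/2008:01:48:01 +0000] conn=71017 op=47 fd=72 closed - B4",
]

if __name__ == "__main__":
    for key, count in tally_closes(SAMPLE).most_common():
        print(key, count)
```

To run it against a real log, pass an open file object for the access log in place of `SAMPLE`. A large share of B4 closes would fit the ber_flush failures reported in the error log.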
