Hi,

I have a reproducible hang on my sol10u2 Apache 2.0.59 web server. It hangs during a Subversion (svn) export operation. It does not hang on the Solaris 9 web servers.

I think DTrace would be perfect to debug this; truss just shows the process sleeping in pollsys() (output below). I have the latest DTraceToolkit and ran dexplorer, but I need some help analyzing the results for the root cause. Are there any scripts that give kernel tuning tips a la SE Toolkit?

I have applied my usual TCP tuning (listed below the truss output; it's what I use on Solaris 9). I know Solaris 10 has updated kernel parameters, but I never expected simple operations to hang like this. The system logs are clean.

19359: read(14, " 9D3 [ 5A4 /11 yB080 ZB4".., 4096) = 4096
19359: read(14, " .DD /A5 oF5 b OD6EECA I".., 4096) = 4096
19359: read(14, "05AD9C83C4B9DCC2D9918C06".., 4096) = 4096
19359: read(14, "C393CD9D a02 / 9 y88FF0F".., 4096) = 4096
19359: read(14, " %CBCCE5A2E11499A1 K K /".., 4096) = 4096
19359: read(14, "DEDDBFFFF9F5 w\0 U | jE0".., 4096) = 4096
19359: read(14, "FF gFC8A _F0 7 ~C7 ?F8 C".., 4096) = 4096
19359: writev(7, 0x08046D40, 4) Err#11 EAGAIN
19359: pollsys(0x08044AF0, 1, 0x08044AD0, 0x00000000) = 1
19359: writev(7, 0x08046D40, 4) = 65536
19359: pollsys(0x08044AF0, 1, 0x08044AD0, 0x00000000) (sleeping...)

god at irt-web-05:DTT 10:11am 89 # /usr/local/bin/tcptune.sh show
ip_ignore_redirect 0
tcp_conn_grace_period 0
tcp_conn_req_max_q 128
tcp_conn_req_max_q0 1024
tcp_conn_req_min 1
tcp_cwnd_max 1048576
tcp_fin_wait_2_flush_interval 675000
tcp_ip_abort_cinterval 180000
tcp_ip_abort_interval 480000
tcp_keepalive_interval 7200000
tcp_recv_hiwat 49152
tcp_rexmit_interval_initial 3000
tcp_rexmit_interval_max 60000
tcp_rexmit_interval_min 400
tcp_slow_start_initial 4
tcp_time_wait_interval 60000
tcp_xmit_hiwat 49152

Thanks!
Fletch.
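The truss excerpt above follows a recognizable pattern: a writev() on descriptor 7 fails with EAGAIN, pollsys() then returns 1, the retried writev() writes 65536 bytes, and the process goes back to sleep in pollsys(). A minimal Python sketch for summarizing a capture like this is below. The `summarize` helper and its regular expression are illustrative assumptions, not part of truss or the DTraceToolkit, and they only understand the three result forms that appear in this excerpt.

```python
import re

# Assumed line shape, taken from the excerpt: "PID: call(args) result",
# where result is "= N", "Err#N NAME", or "(sleeping...)".
TRUSS_LINE = re.compile(
    r"^(\d+):\s+(\w+)\((.*)\)\s+(=\s.*|Err#.*|\(sleeping\.\.\.\))$"
)

def summarize(lines):
    """Return (calls per syscall, failed calls, last call seen)."""
    counts = {}
    errors = []
    last = None
    for line in lines:
        match = TRUSS_LINE.match(line.strip())
        if match is None:
            continue  # skip blank lines and anything in another format
        call, result = match.group(2), match.group(4)
        counts[call] = counts.get(call, 0) + 1
        if result.startswith("Err#"):
            errors.append((call, result))
        last = (call, result)
    return counts, errors, last

# Tail of the truss output posted above.
SAMPLE = r'''
19359: read(14, "FF gFC8A _F0 7 ~C7 ?F8 C".., 4096) = 4096
19359: writev(7, 0x08046D40, 4) Err#11 EAGAIN
19359: pollsys(0x08044AF0, 1, 0x08044AD0, 0x00000000) = 1
19359: writev(7, 0x08046D40, 4) = 65536
19359: pollsys(0x08044AF0, 1, 0x08044AD0, 0x00000000) (sleeping...)
'''

counts, errors, last = summarize(SAMPLE.splitlines())
print("calls:", counts)
print("failed:", errors)
print("last:", last)
```

The last entry is the one of interest for a hang: it names the system call the process was in when the capture stopped.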
On Mon, 7 Aug 2006, Fletcher Cocquyt wrote:

> I have a reproducible hang on my sol10u2 apache 2.0.59 web server.

Is this a system or web server hang? Assuming the web server is hung, what does pstack print when you run it against a hung process?

- Ryan

--
UNIX Administrator
http://prefetch.net
Apache hang - pstack below:

god at irt-web-xyz:httpd-2.0.59 2:03pm 129 # pstack 6092
6092: /opt/httpd-2.0.59/bin/httpd -k start
 bfb906d7 pollsys (8044ad0, 1, 8044ab0, 0)
 bfb3a722 poll (8044ad0, 1, 493e0, 493e0, 7, 40004) + 52
 bfdba1e7 apr_poll (8044b40, 1, 8044b3c, 493e0, 0, 0) + ab
 bfdba7f7 apr_wait_for_io_or_timeout (0, 8242718, 0, 8046d30) + 6f
 bfdb0ec2 apr_socket_sendv (8242718, 8046d30, 2, 8044be8) + 4a
 bfdb125a apr_sendv (8242718, 8046d30, 2, 8044be8, 8271a18, 8242718) + 22
 080b426e writev_it_all (4, 2050f, 8044c64, 8271a18, 0, 0) + 3e
 080b5832 core_output_filter (8242bc8, 8271a18, 82466f0, bfdcb9b8) + 9a6
 0808da31 chunk_filter (8313848, 8271a18, 852cbc0, 1000) + 169
 080b0126 ap_content_length_filter (8265e38, 8271a18, 8046ea8, bfdb8aab) + aa
 08091353 ap_byterange_filter (8265e20, 8271a18, 1f91b, bf816b47) + 73
 080adf81 ap_filter_flush (8271a18, 8265e20, 82466f0, 1355) + 11
 bfe16126 apr_brigade_write (8271a18, 80adf70, 8265e20, 921f0e0, 1f91b, 92d7aa8) + 7a
 bf8dd62b brigade_write_fn (85a7270, 921f0e0, 8046fa8, 0) + 2f
 bf813a92 svn_stream_write (85a7278, 921f0e0, 8046fa8, 1f91b) + 26
 bf8161a7 encode_data (862dda8, 92c04c0, 8046ffc, 8577ad0) + 8f
 bf813a92 svn_stream_write (85a7288, 92c04c0, 8046ffc, fffffffe, 8490800, 84907e8) + 26
 bf863080 window_handler (92e34c8, 8577ad0, 0, bf8de3d3) + 28c
 bf8de416 window_handler (92e34c8, 85a7260, 84ae838, bf8c88fc) + 56
 bf864c11 svn_txdelta_send_txstream (84c3c48, bf8de3c0, 85a7260, 836d578) + 51
 bf8ae87f update_entry (0, 0, 836d5c0, 84fbdc0, 8329550, 836d5b0) + 533
 bf8ae28e delta_dirs (0, 83294a8, 8329550, 83294a0, 0, 8329468) + 3f6
 bf8ae727 update_entry (0, 0, 83294a8, 832d408, 82b8798, 83294a0) + 3db
 bf8ae28e delta_dirs (824bac0, 827a5a0, 82b8798, bf8b1ce9, 1, 8265158) + 3f6
 bf8aee76 svn_repos_finish_report (827a518, 8265158, 8282240, 827a6b0) + 3c2
 bf8defa6 dav_svn__update_report (8271710, 82713e0, 8265e20, 827149a) + 976
 bf8e1443 dav_svn_deliver_report (8265190, 8271710, 82713e0, 8265e20, 8265190, bf99d50c) + 20b
 bf996c76 dav_method_report (ffffffff, 0, 0, bfdb8aab, 8271348, 8271170) + be
 bf997f52 dav_handler (8265190, 82664e8, 0, 82664b8) + 906
 080a24ea ap_run_handler (8265190, 8265190, 8047838, 80a2847, 82651a8, 64) + 32
 080a28b1 ap_invoke_handler (8265190, 0, 8047868, 80ae861) + ad
 080920b5 ap_process_request (8265190, 4, 8265190, 0) + 135
 0808db9d ap_process_http_connection (82427f0, 82466f0, 80478c8, 80abe85) + e9
 080abbf6 ap_run_process_connection (82427f0, 8242718, 82426e0, 82427f0, 0, 0) + 32
 080a0cfd child_main (10, 1, 1, 0) + 345
 080a0f27 make_child (8047a00, 8047a08, 8047a38, 80a164f, 3f1, 8235440) + d7
 080a0fb8 startup_children (3f1, 8235440, 8047978, 8074d65, fc8, 5) + 40
 080a164f ap_mpm_run (80f9150, 8133238, 80fe880, 80fe880) + 68f
 080a6963 main (3, 8047ad8, 8047ae8) + 5cb
 0806d63c _start (3, 8047bd8, 8047bf4, 8047bf7, 0, 8047bfd) + 80

-----Original Message-----
From: dtrace-discuss-bounces at opensolaris.org [mailto:dtrace-discuss-bounces at opensolaris.org] On Behalf Of Matty
Sent: Monday, August 07, 2006 1:57 PM
To: Solaris Dtrace List
Subject: Re: [dtrace-discuss] dtrace apache hang on sol10u2

> Is this a system or web server hang? Assuming the web server is hung,
> what does pstack print when you run it against a hung process?

_______________________________________________
dtrace-discuss mailing list
dtrace-discuss at opensolaris.org
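One detail in the stack above can be decoded directly: the third word shown for the poll frame, 493e0. poll(2) takes its timeout in milliseconds as the third argument, so that word is the timeout if pstack is showing the real arguments. The Python sketch below parses a frame and converts the value. `parse_frame` is a hypothetical helper written for this thread, and it assumes the usual caveat for 32-bit x86 pstack output: the words in parentheses are raw stack contents, so only as many as the function really takes are meaningful.

```python
import re

# Assumed frame shape: "<addr> <function> (<hex words>) [+ <offset>]".
FRAME = re.compile(
    r"^\s*([0-9a-f]{8})\s+(\w+)\s+\(([^)]*)\)(?:\s+\+\s+[0-9a-f]+)?\s*$"
)

def parse_frame(line):
    """Return (function name, list of stack words as ints), or None."""
    match = FRAME.match(line)
    if match is None:
        return None
    words = [int(w, 16) for w in match.group(3).split(",") if w.strip()]
    return match.group(2), words

# The poll frame exactly as posted above.
POLL_FRAME = " bfb3a722 poll (8044ad0, 1, 493e0, 493e0, 7, 40004) + 52"

func, words = parse_frame(POLL_FRAME)
timeout_ms = words[2]  # poll(fds, nfds, timeout): third argument, milliseconds
print(f"{func}: timeout = {timeout_ms} ms = {timeout_ms / 1000:.0f} s")
```

Under that reading, the child is waiting up to 300 seconds for the client socket to become writable, not blocked indefinitely. That would be consistent with an httpd `Timeout 300` setting, but the thread does not show httpd.conf, so the link to that directive is an assumption.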
Fletcher Cocquyt
2006-Aug-09 22:28 UTC
[dtrace-discuss] apache hangs on sol10u2 not sol10u1
So I tarred up the Apache install and dropped it on a sol10u1 server. It runs perfectly clean: no hanging web/svn operations.

Maybe I'm guilty of jumping on U2 too quickly, before the first round of patches? I still have the Sun case open, but I think I'll need to downgrade to U1 because of operational time constraints.

-----Original Message-----
From: dtrace-discuss-bounces at opensolaris.org [mailto:dtrace-discuss-bounces at opensolaris.org] On Behalf Of Fletcher Cocquyt
Sent: Monday, August 07, 2006 2:05 PM
To: 'Matty'; 'Solaris Dtrace List'
Subject: RE: [dtrace-discuss] dtrace apache hang on sol10u2

Apache hang - pstack below:

[pstack output for PID 6092 snipped]
Fletcher Cocquyt
2006-Aug-14 19:15 UTC
[dtrace-discuss] apache hangs on sol10u2 and sol10u1 (after patches)
The plot thickens: the hangs reappeared after applying the recommended patches to sol10u1.

At this point I have reinstalled sol10u1, checkpointed the non-hanging web server (no patches) with a Flash archive, and have a raidctl mirror sync in progress. Once the mirror is synced up, my plan is to:

1) break the mirror
2) apply the kernel patch, or the next patch in the recommended list
3) reboot and re-run the web test
4) if it hangs, report to Sun (on the open case) that this patch breaks the web server
5) otherwise, repeat steps 2-4 with the next patch in the list

Hopefully this will isolate what I suspect is a bad patch.

Stay tuned...
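The isolation plan above is a linear scan over the patch list: apply one patch, test, and stop at the first patch after which the hang appears. A minimal Python sketch of that loop is below. Everything in it is hypothetical: `apply_patch` and `test_hangs` stand in for the manual steps (patch and reboot, then re-run the svn export test), and the patch names are placeholders, not real Sun patch IDs.

```python
from typing import Callable, Iterable, Optional

def find_bad_patch(
    patches: Iterable[str],
    apply_patch: Callable[[str], None],
    test_hangs: Callable[[], bool],
) -> Optional[str]:
    """Apply patches in order; return the first one after which the test hangs."""
    for patch in patches:
        apply_patch(patch)   # steps 2-3: apply the patch and reboot
        if test_hangs():     # step 3: re-run the web test
            return patch     # step 4: this is the patch to report
    return None              # no patch in the list reproduced the hang

# Simulated run with placeholder names; "patch-B" is arbitrarily made the culprit.
applied = []

def fake_apply(patch: str) -> None:
    applied.append(patch)

def fake_test() -> bool:
    return "patch-B" in applied

culprit = find_bad_patch(["patch-A", "patch-B", "patch-C"], fake_apply, fake_test)
print("first patch that reproduces the hang:", culprit)
print("patches applied so far:", applied)
```

One limit of this approach is that it blames the patch that completes the failing combination. If the hang needs two patches together, the scan only identifies the later of the two, so the full list of patches applied up to that point is worth including in the report.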