Cyril Plisko
2008-Sep-21 13:29 UTC
[dtrace-discuss] Weird dtrace USDT problem with shared objects
Hello ! I am helping some guys with migration to DTrace in their (quite big) application. during the migration they hit very strange problem. Here is the background: 1. The application is 64 bit (amd64). 2. The application manages storage and uses config_admin(3CFGADM) to do it dynamically. 3. The application is heavily instrumented with USDT probes (thousands of them). The issue we observe is as that when the application is running under dtrace as soon as one of the APIs from config_admin(3CFGADM) is called the application hangs. top shows everything is idle, but dtrace, which consumes around 24% of CPU (entirely in the kernel, according to top). At this point it can last for hours without making any visible progress. We''ve tried different things to pinpoint the problem. For example, recompiling everything to 32bit doesn''t exhibit this issue at al. Also if we ran the the application with only small number of probes enabled (couple of dozens) it manages to get over config_admin call, albeit _very_ slowly. When stuck, as soon as the dtrace process is terminated, the application resumes and continues as expected. Judging by other means we feel that the problem manifests itself when config_admin loads /usr/lib/64/libsun_fc.so.1 object. This object comes as the final link of the following chain: /usr/lib/amd64/libcfgadm.so.1 /usr/lib/cfgadm/amd64/fp.so.1 /usr/lib/64/libHBAAPI.so.1 /usr/lib/64/libsun_fc.so.1 A number of times we''ve managed to crash the application when terminated the dtrace with Ctrl-C. When this happened the stack always showed the rtld_db_dlactivity at the top of the stack (see below). I am scratching my head trying to figure out where the problem is, or what is the possible workaround can be. Is there any way to diagnose the problem further ? I''d appreciate any help. cc5r1:/var/core# mdb core.1746 Loading modules: [ libumem.so.1 libc.so.1 libuutil.so.1 libavl.so.1 libnvpair.so.1 libsysevent.so.1 ld.so.1 ]> $Cfffffd7fd11786b0 ld.so.1`rtld_db_dlactivity() fffffd7fd11786f0 ld.so.1`leave+0x3d() fffffd7fd1178760 ld.so.1`call_init+0x126() fffffd7fd1178790 ld.so.1`load_completion+0x10e() fffffd7fd1178810 ld.so.1`dlmopen_intn+0x223() fffffd7fd1178830 ld.so.1`dlmopen_check+0xa6() fffffd7fd1178860 ld.so.1`dlopen+0x43() fffffd7fd1178ba0 libHBAAPI.so.1`HBA_LoadLibrary+0x287() fffffd7fd1179840 fp.so.1`findMatchingAdapterPort+0x3a() fffffd7fd1179b10 fp.so.1`fca_change_state+0x38() fffffd7fd1179e50 fp.so.1`cfga_change_state+0x334() fffffd7fd117ab40 libcfgadm.so.1`config_change_state+0x11e() fffffd7fd117abd0 iopd.so`__1cLConfigAdminLchangeState6MnKcfga_cmd_t_ipkpcbbI_nKcfga_err_t__+0x177() fffffd7fd117ac50 iopd.so`__1cLConfigAdminLchangeState6MnKcfga_cmd_t_pnPFixLenStrVector4UbE__bbI_nKcfga_err_t__+0xd4() fffffd7fd117ac80 iopd.so`__1cLConfigAdminRunconfigureStatic6Mb_nKcfga_err_t__+0x3f() fffffd7fd117acd0 iopd.so`__1cLConfigAdminRunconfigureStatic6Mpkc_nKcfga_err_t__+0x7f() fffffd7fd117ad50 iopd.so`__1cLConfigAdminFretry6Mm0A_Mpkc_nKcfga_err_t_iII2_b_+0x96() fffffd7fd117b150 iopd.so`__1cPiop_smon_serverIiop_init6MpnPsmon_iop_init_s__nNcllf_status_e__+0x13d() fffffd7fd117bcf0 iopd.so`__1cPiop_smon_serverGserver6MpcLpnJdoor_desc_I_v_+0x61d() fffffd7fd117fd70 libcommon.so.1`server_proc+0xc3() 0000000000000000 libc.so.1`__door_return+0x4d() -- Regards, Cyril
Chad Mynhier
2008-Sep-21 16:52 UTC
[dtrace-discuss] Weird dtrace USDT problem with shared objects
On Sun, Sep 21, 2008 at 9:29 AM, Cyril Plisko <cyril.plisko at mountall.com> wrote:> Hello ! > > I am helping some guys with migration to DTrace in their (quite big) > application. during the migration they hit very strange problem.Cyril, Something you could try is running dtrace with the environment variable DTRACE_DEBUG set to 1. This will give you copious amounts of debugging data. (If you''ve never looked at this debugging data, try looking at it for something very simple, like "dtrace -n ''pid$target::main:entry'' -c ls" to get a feel for what the output looks like.) The output you see when the application hangs is likely to be a good starting point for debugging this.> > > A number of times we''ve managed to crash the application when > terminated the dtrace with Ctrl-C. When this happened the stack always > showed the rtld_db_dlactivity at the top of the stack (see below). >DTrace currently uses breakpoints in a few places, rtld_db_activity being one of them. What''s likely happening is that your dtrace invocation isn''t cleaning up after itself when you kill it. The application hits the breakpoint and crashes. See http://bugs.opensolaris.org/view_bug.do?bug_id=6593259 and related bugs. (I''m currently working on this bug.) Chad
Cyril Plisko
2008-Sep-22 11:48 UTC
[dtrace-discuss] Weird dtrace USDT problem with shared objects
On Sun, Sep 21, 2008 at 7:52 PM, Chad Mynhier <cmynhier at gmail.com> wrote:> On Sun, Sep 21, 2008 at 9:29 AM, Cyril Plisko <cyril.plisko at mountall.com> wrote: >> Hello ! >> >> I am helping some guys with migration to DTrace in their (quite big) >> application. during the migration they hit very strange problem. > > Cyril, > > Something you could try is running dtrace with the environment > variable DTRACE_DEBUG set to 1. This will give you copious amounts of > debugging data. (If you''ve never looked at this debugging data, try > looking at it for something very simple, like "dtrace -n > ''pid$target::main:entry'' -c ls" to get a feel for what the output > looks like.) The output you see when the application hangs is likely > to be a good starting point for debugging this. >Chad, many thanks for the hint. I''ve tried DTRACE_DEBUG and when the application hangs it (dtrace) runs through an endless of the next messages. libdtrace DEBUG: lookup of ___SUNW_dof succeeded for libc.so.1 libdtrace DEBUG: lookup of __SUNW_dof succeeded for iopd.so libdtrace DEBUG: lookup of __SUNW_dof succeeded for libemc.so.1 libdtrace DEBUG: lookup of __SUNW_dof succeeded for libthrpool.so.1 libdtrace DEBUG: lookup of __SUNW_dof succeeded for librdm.so.1 libdtrace DEBUG: lookup of __SUNW_dof succeeded for libtcpcomm.so.1 libdtrace DEBUG: lookup of __SUNW_dof succeeded for libcore_if2.so.1 libdtrace DEBUG: lookup of __SUNW_dof succeeded for lsi_inf.so libdtrace DEBUG: lookup of __SUNW_dof succeeded for san_tap.so libdtrace DEBUG: lookup of __SUNW_dof succeeded for booter libdtrace DEBUG: lookup of __SUNW_dof succeeded for libcommon.so.1 libdtrace DEBUG: lookup of __SUNW_dof succeeded for libCrun.so.1 All the shared objects (except libc.so.1 and libCrun.so.1 of course) here are internal to application in hand. I admit I have little idea what does that mean, or what should I do next to understand the problem. Any guidance will be heartly appreciated.>> >> >> A number of times we''ve managed to crash the application when >> terminated the dtrace with Ctrl-C. When this happened the stack always >> showed the rtld_db_dlactivity at the top of the stack (see below). >> > > DTrace currently uses breakpoints in a few places, rtld_db_activity > being one of them. What''s likely happening is that your dtrace > invocation isn''t cleaning up after itself when you kill it. The > application hits the breakpoint and crashes. See > http://bugs.opensolaris.org/view_bug.do?bug_id=6593259 and related > bugs. (I''m currently working on this bug.) > > Chad >-- Regards, Cyril
Chad Mynhier
2008-Sep-22 20:54 UTC
[dtrace-discuss] Weird dtrace USDT problem with shared objects
On Mon, Sep 22, 2008 at 7:48 AM, Cyril Plisko <cyril.plisko at mountall.com> wrote:> On Sun, Sep 21, 2008 at 7:52 PM, Chad Mynhier <cmynhier at gmail.com> wrote: >> On Sun, Sep 21, 2008 at 9:29 AM, Cyril Plisko <cyril.plisko at mountall.com> wrote: >>> Hello ! >>> >>> I am helping some guys with migration to DTrace in their (quite big) >>> application. during the migration they hit very strange problem. >> >> Cyril, >> >> Something you could try is running dtrace with the environment >> variable DTRACE_DEBUG set to 1. > > Chad, > > many thanks for the hint. I''ve tried DTRACE_DEBUG and when the > application hangs it (dtrace) runs through an endless of the next > messages. > > > libdtrace DEBUG: lookup of ___SUNW_dof succeeded for libc.so.1 > libdtrace DEBUG: lookup of __SUNW_dof succeeded for iopd.so > libdtrace DEBUG: lookup of __SUNW_dof succeeded for libemc.so.1Cyril, Would you mind putting the debugging output somewhere I could look at it? I don''t know that I''d be able to figure out what''s going on, but I might be able to get some more information out of it. Chad