Manfred Mücke
2009-Aug-07 15:38 UTC
[dtrace-discuss] Problem tracing Fortran function with pid provider
Hello, I have a fortran program and want to monitor entry/return events of a specific function. No big thing (I thought). The function is called DCNTRL in the source code. Checking the binary gives: > nm myprog | grep FUNC | grep -i DCNTRL [15968] | 6564704| 30193|FUNC |GLOB |0 |12 |dcntrl_ OK, its internal name is dcntrl_. Let''s instrument the respective pid entry/return probes: > dtrace -n ''pid$target::dcntrl_:entry'' -n ''pid$target::dcntrl_:return'' -c myprog dtrace: invalid probe specifier pid$target::dcntrl_:return: probe description pid17186::dcntrl_:return does not match any probes Huh, DTrace does not find the return probe!? But I''m sure the function exist, so let''s put some pressure on DTrace (-Z): > dtrace -Z -n ''pid$target::dcntrl_:entry'' -n ''pid$target::dcntrl_:return'' -c myprog dtrace: description ''pid$target::dcntrl*:entry'' matched 1 probe dtrace: description ''pid$target::dcntrl*:return'' matched 0 probes First question: How comes DTrace can instrument the entry probe, but not the return probe of that function? Now let''s count events a the instrumented probes: > dtrace -Z -n ''pid$target::dcntrl_:entry {@f[probefunc,probename]=count()}'' -n ''pid$target::dcntrl_:return {@f[probefunc,probename]=count()}'' -c myprog dcntrl_ entry 1 Hm, seems like that function was only called once. Fact is, it was called a couple of times (verified), but the probe seems only to fire the first time. I conclude that I''m unable to both instrument and observe the function. Does anyone have an idea, how I could proceed? Thanks for your help, Manfred PS: I explicitly switched inlining off for that function. (-inline=no%dcntrl). Just for reference, I attach the first and last lines of the function:> mdb myprog > dcntrl_::disdcntrl_: pushq %rbp dcntrl_+1: movq %rsp,%rbp dcntrl_+4: pushq %rbx dcntrl_+5: pushq %r12 dcntrl_+7: pushq %r13 dcntrl_+9: pushq %r14 dcntrl_+0xb: pushq %r15 [..] dcntrl_+0x75e6: popq %r15 dcntrl_+0x75e8: popq %r14 dcntrl_+0x75ea: popq %r13 dcntrl_+0x75ec: popq %r12 dcntrl_+0x75ee: popq %rbx dcntrl_+0x75ef: leave dcntrl_+0x75f0: ret -- This message posted from opensolaris.org
Jonathan Adams
2009-Aug-07 17:22 UTC
[dtrace-discuss] Problem tracing Fortran function with pid provider
On Fri, Aug 07, 2009 at 08:38:56AM -0700, Manfred M?cke wrote:> Hello, > > I have a fortran program and want to monitor entry/return events of a specific function. No big thing (I thought). The function is called DCNTRL in the source code. Checking the binary gives: > > > nm myprog | grep FUNC | grep -i DCNTRL > [15968] | 6564704| 30193|FUNC |GLOB |0 |12 |dcntrl_ > > OK, its internal name is dcntrl_. Let''s instrument the respective pid entry/return probes: > > > dtrace -n ''pid$target::dcntrl_:entry'' -n ''pid$target::dcntrl_:return'' -c myprog > dtrace: invalid probe specifier pid$target::dcntrl_:return: probe description pid17186::dcntrl_:return does not match any probes > > Huh, DTrace does not find the return probe!? But I''m sure the function exist, so let''s put some pressure on DTrace (-Z): > > > dtrace -Z -n ''pid$target::dcntrl_:entry'' -n ''pid$target::dcntrl_:return'' -c myprog > dtrace: description ''pid$target::dcntrl*:entry'' matched 1 probe > dtrace: description ''pid$target::dcntrl*:return'' matched 0 probes > > First question: How comes DTrace can instrument the entry probe, but > not the return probe of that function?If dtrace thinks something fishy is going on, it can refuse to instrument the return sites. This is mainly due to issues with jump-tables. If you set DTRACE_DEBUG= in your environment, is there anything of interest? Look for "grabbed pid NNNN" as a starting place.> Now let''s count events a the instrumented probes: > > > dtrace -Z -n ''pid$target::dcntrl_:entry {@f[probefunc,probename]=count()}'' -n ''pid$target::dcntrl_:return {@f[probefunc,probename]=count()}'' -c myprog > > dcntrl_ entry 1 > > Hm, seems like that function was only called once. Fact is, it was > called a couple of times (verified), but the probe seems only to fire > the first time. > > I conclude that I''m unable to both instrument and observe the > function. Does anyone have an idea, how I could proceed? > > Thanks for your help, Manfred > > PS: I explicitly switched inlining off for that function. > (-inline=no%dcntrl). Just for reference, I attach the first and last > lines of the function: > > > mdb myprog > > dcntrl_::dis > dcntrl_: pushq %rbp > dcntrl_+1: movq %rsp,%rbp > dcntrl_+4: pushq %rbx > dcntrl_+5: pushq %r12 > dcntrl_+7: pushq %r13 > dcntrl_+9: pushq %r14 > dcntrl_+0xb: pushq %r15 > [..] > dcntrl_+0x75e6: popq %r15 > dcntrl_+0x75e8: popq %r14 > dcntrl_+0x75ea: popq %r13 > dcntrl_+0x75ec: popq %r12 > dcntrl_+0x75ee: popq %rbx > dcntrl_+0x75ef: leave > dcntrl_+0x75f0: retOne workaround might be to instrument pidNNNN::dcntrl:75f0 but that will change with recompilation. (And you''d need to get any other rets) Cheers, - jonathan
Manfred Mücke
2009-Aug-10 07:22 UTC
[dtrace-discuss] Problem tracing Fortran function with pid provider
> If dtrace thinks something fishy is going on, it can refuse to instrument > the return sites. This is mainly due to issues with jump-tables. > If you set DTRACE_DEBUG= in your environment, is there anything of interest? > Look for "grabbed pid NNNN" as a starting place.Thanks for the hint, I was not aware of DTRACE_DEBUG. I found the following, which confirms your assumption, that it might be an issue with jump-tables: libdtrace DEBUG: creating probe pid17864:myprog:dcntrl_:return libdtrace DEBUG: error at 0x851 (assuming jump table) I searched the forum but haven''t found an explanation under what conditions dtrace assumes a jump-table. Are there any (Fortran) compiler settings which could help avoiding this issue? In what context should I read the given address 0x851? Neither the function dcntrl_ nor the calling function has an opcode starting at this address. Thanks, Manfred -- This message posted from opensolaris.org
Manfred Mücke
2009-Aug-10 08:29 UTC
[dtrace-discuss] Problem tracing Fortran function with pid provider
> One workaround might be to instrument > pidNNNN::dcntrl:75f0 > but that will change with recompilation. (And you''d > need to get any other rets)I tried that, but got the same error from dtrace as when using a return probe. Just that when using pid offset probes, dtrace refuses to instrument both function entry and return!> dtrace -Z -n ''pid$target::dcntrl_:0 {@f[probefunc,probename]=count()}'' -n ''pid$target::dcntrl_:75f0 {@f[probefunc,probename]=count()}'' -c myproglibdtrace DEBUG: creating probe pid18000:myprog:dcntrl_:0 libdtrace DEBUG: error at 0x8b0 (assuming jump table) libdtrace DEBUG: creating probe pid18000:myprog:dcntrl_:75f0 libdtrace DEBUG: error at 0x8b0 (assuming jump table) Looks like this function is not too fond of being dtraced. \Manfred -- This message posted from opensolaris.org
Jonathan Adams
2009-Aug-10 16:40 UTC
[dtrace-discuss] Problem tracing Fortran function with pid provider
On Mon, Aug 10, 2009 at 12:22:12AM -0700, Manfred M?cke wrote:> > If dtrace thinks something fishy is going on, it can refuse to instrument > > the return sites. This is mainly due to issues with jump-tables. > > If you set DTRACE_DEBUG= in your environment, is there anything of interest? > > Look for "grabbed pid NNNN" as a starting place. > > Thanks for the hint, I was not aware of DTRACE_DEBUG. I found the following, which confirms your assumption, that it might be an issue with jump-tables: > > libdtrace DEBUG: creating probe pid17864:myprog:dcntrl_:return > libdtrace DEBUG: error at 0x851 (assuming jump table) > > I searched the forum but haven''t found an explanation under what > conditions dtrace assumes a jump-table. Are there any (Fortran) > compiler settings which could help avoiding this issue?That error is from usr/src/lib/libdtrace/i386/dt_isadep.c: 78 dt_pid_has_jump_table(struct ps_prochandle *P, dtrace_hdl_t *dtp, 79 uint8_t *text, fasttrap_probe_spec_t *ftp, const GElf_Sym *symp) 80 { ... 91 for (i = 0; i < ftp->ftps_size; i += size) { 92 size = dt_instr_size(&text[i], dtp, pid, symp->st_value + i, 93 dmodel); 95 /* 96 * Assume the worst if we hit an illegal instruction. 97 */ 98 if (size <= 0) { 99 dt_dprintf("error at %#lx (assuming jump table)\n", i); 100 return (1); 101 } ... 115 } 116 117 return (0); 118 } This means that there is some instruction in the stream that dtrace''s disassembler is either getting the size wrong on (causing us to get out of alignment, causing us at a later point to fail on a bad instruction), or we''re hitting an instruction we don''t understand. Could you send me the output of: "echo dcntrl_::dis | mdb ./myprog" which uses the same disassembler. Cheers, - jonathan> In what context should I read the given address 0x851? Neither the > function dcntrl_ nor the calling function has an opcode starting at > this address.It''s an offset
Manfred Mücke
2009-Dec-09 09:00 UTC
[dtrace-discuss] Problem tracing Fortran function with pid provider
Hi, finally I found the time to look again at the issue described above. I was told (thanks Diane) that f95 generates jump tables when hitting SELECT CASE statements. Checking my code shows that there is no SELECT CASE statement anywhere near the function in question. Ruling out SELECT CASE statements leaves to possibilities: either there is some other Fortran construct resulting in jump tables, or the issue we face is not really about jump tables. Checking http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libdtrace/i386/dt_isadep.c it becomes clear, that the message "assuming jump table" is potentially misleading. It really covers everything resulting in an illegal opcode: 93 /* 94 * Assume the worst if we hit an illegal instruction. 95 */ 96 if (size <= 0) { 97 dt_dprintf("error at %#lx (assuming jump table)\n", i); 98 return (1); 99 } dt_instr_size returns -1 in case the instruction can not be disassembled or if a FASTTRAP is found. 476 if (dtrace_disx86(&x86dis, cpu_mode) != 0) 477 return (-1); 478 479 /* 480 * If the instruction was a single-byte breakpoint, there may be 481 * another debugger attached to this process. The original instruction 482 * can''t be recovered so this must fail. 483 */ 484 if (x86dis.d86_len == 1 && instr[0] == FASTTRAP_INSTR) 485 return (-1); Disassembling the binary using mdb reveals that indeed an illegal instruction is the source of the problem:> dcntrl_+875::disdcntrl_+0x85b: movl %edx,%eax dcntrl_+0x85d: subl %ecx,%eax dcntrl_+0x85f: incl %eax dcntrl_+0x861: cmpl %edx,%ecx dcntrl_+0x863: jle +0x4 <dcntrl_+0x867> dcntrl_+0x865: xorl %eax,%eax dcntrl_+0x867: cmpl $0x8,%eax dcntrl_+0x86a: jl +0x50 <dcntrl_+0x8ba> dcntrl_+0x86c: movl -0x9b8(%rbp),%eax dcntrl_+0x872: addl $-0x7,%eax dcntrl_+0x875: ***ERROR--unknown op code*** dcntrl_+0x877: movw (%rax),%es dcntrl_+0x879: addl %eax,(%rax) dcntrl_+0x87b: addb %ah,0xf(%rsi) dcntrl_+0x87e: pushq %rdi dcntrl_+0x87f: shlb $0x13,0xf(%rsi) dcntrl_+0x883: ***ERROR--unknown op code*** dcntrl_+0x884: movlpd %xmm0,0x8(%rsi) dcntrl_+0x889: movlpd %xmm0,0x10(%rsi) dcntrl_+0x88e: movlpd %xmm0,0x18(%rsi) dcntrl_+0x893: movlpd %xmm0,0x20(%rsi) Can anyone comment on this situation? This is a binary generated by f95 from f77 source code. It seems odd that f95 should generate illegal instructions. Maybe this is a Fortran/x86-specific construct unknown to mdb? For the ones able to read it, here is the binary dump:> dcntrl_+86c,100::dump -qfg10 1 2 3 4 5 6 7 8 9 a b \/ d e f 4477e0: c0 3b ca 7e 02 33 c0 83 f8 08 7c 4e 8b 85 48 f6 4477f0: ff ff 83 c0 f9 0f 0d 8e 00 01 00 00 66 0f 57 c0 447800: 66 0f 13 06 66 0f 13 46 08 66 0f 13 46 10 66 0f 447810: 13 46 18 66 0f 13 46 20 66 0f 13 46 28 66 0f 13 447820: 46 30 66 0f 13 46 38 48 83 c6 40 83 c1 08 3b c8 447830: 7e c3 3b 8d 48 f6 ff ff 7f 24 66 0f 57 c0 66 90 447840: 66 0f 13 06 48 83 c6 08 ff c1 3b 8d 48 f6 ff ff 447850: 7e ee eb 0a c7 05 96 91 5b 00 02 00 00 00 f2 0f 447860: 2c 05 da 91 5b 00 89 05 58 ba 62 05 83 3d 1d be 447870: 62 05 00 75 19 48 8d 3d 48 ba 62 05 48 33 c0 e8 447880: fc c9 41 00 c7 05 36 ba 62 05 01 00 00 00 83 3d 447890: 57 b8 9c 05 02 7d 15 66 0f 12 05 91 91 5b 00 66 4478a0: 0f 13 85 e8 f6 ff ff e9 94 00 00 00 48 63 05 35 4478b0: b8 9c 05 48 89 85 b0 f9 ff ff 48 c7 85 d0 f9 ff 4478c0: ff 80 1f 93 00 48 8d 85 50 f8 ff ff 48 89 85 d8 4478d0: f9 ff ff 48 c7 85 c0 f9 ff ff 30 7a 9d 00 c7 85 4478e0: b8 f9 ff ff 00 00 00 00 48 8d bd b0 f9 ff ff 48 Thanks, Manfred -- This message posted from opensolaris.org
Manfred Mücke
2009-Dec-11 16:23 UTC
[dtrace-discuss] Problem tracing Fortran function with pid provider
Hi, thanks to a very helpful compiler geek at Sun, I made some progress :-) . The dtrace error message ("assuming jump table") is misleading and the issue has actually nothing to do with jump tables. When instrumenting user functions, dtrace steps through all instructions of that very function in order to verify that there are no jump tables. Doing so, it uses dt_instr_size to check each instruction''s size. In case dt_instr_size returns -1, dtrace refuses to instrument the function (issueing the error message "assuming jump table"). The two reasons triggering a return value of -1 are: 1) the instruction is not in the instruction set (checked via dtrace_disx86) or, 2) the instruction is a FASTTRAP_INSTR (meaning that another debugger has already instrumented the function). If there is no other debugger active, the error therefore really means "encountered illegal instruction in function". Digging deeper, I checked the reported code location with mdb:>> dcntrl_+857::dis > dcntrl_+0x85d: addl $-0x7,%eax > dcntrl_+0x860: ***ERROR--unknown op code*** > dcntrl_+0x862: movw (%rax),%es > dcntrl_+0x864: addl %eax,(%rax) > dcntrl_+0x866: addb %ah,0xf(%rsi) > dcntrl_+0x869: pushq %rdi > dcntrl_+0x86a: shlb $0x13,0xf(%rsi) > dcntrl_+0x86e: ***ERROR--unknown op code*** > dcntrl_+0x86f: movlpd %xmm0,0x8(%rsi) > dcntrl_+0x874: movlpd %xmm0,0x10(%rsi)There are unknown op-codes reported at exactly the locations dtrace complains about. Discussing this, it became clear that the probability of illegal op-codes is really very low. So I cross checked with dbx, which surprisingly showed the following:> (dbx) dis dcntrl+0x85d > 0x0000000000447bb5: dcntrl+0x085d: addl $0xfffffffffffffff9,%eax > 0x0000000000447bb8: dcntrl+0x0860: prefetchw 0x0000000000000100(%rsi) > 0x0000000000447bbf: dcntrl+0x0867: xorpd %xmm0,%xmm0 > 0x0000000000447bc3: dcntrl+0x086b: movlpd %xmm0,(%rsi) > 0x0000000000447bc7: dcntrl+0x086f: movlpd %xmm0,0x0000000000000008(%rsi) > 0x0000000000447bcc: dcntrl+0x0874: movlpd %xmm0,0x0000000000000010(%rsi)Observe that at the location where mdb reported an illegal opcode, dbx shows a prefetchw opcode. Obviously mdb is not aware of this operation. I believe the fact that two debuggers, using different disassembly functionality on the very same system is an interesting discussion in itself. I deduce from the above observation that dtrace somehow follows the same wisdom as mdb (instead of dbx) in disassembling user functions. There is a possible workaround (which is however not satisfying in the long run: If we instruct the compiler to generate less architecture-specific code (-xtarget=generic) chances increase that the binary contains no instructions offensive to dtrace. So I recompiled my application with f95 -xtarget=generic -m64 and veryfied that mdb does not report any illegal opcode in the function of interest. Indeed, the illegal op-codes disappeared. However, instrumenting the application with dtrace still does not yield the desired result. Checking with DTRACE_DEBUG: libdtrace DEBUG: creating probe pid10266:charmm:dcntrl_:return libdtrace DEBUG: error at 0 (assuming jump table) While the error location has changed, the error itself remains. Even more, checking the location with mdb does not reveal an illegal opcode: > dcntrl_::dis dcntrl_: pushq %rbp dcntrl_+1: movq %rsp,%rbp dcntrl_+4: pushq %rbx dcntrl_+5: pushq %r12 Summary: some mystery resolved, but instrumentation still defunct. Any help is appreciated! Thanks, Manfred PS: In case someone wants to know SunOS 5.10 Generic_141445-09 i86pc i386 i86pc version of "/opt/sunstudio12.1/bin/cc": Sun C 5.10 SunOS_i386 Patch 142363-02 2009/09/21 version of "/opt/sunstudio12.1/bin/f90": Sun Fortran 95 8.4 SunOS_i386 Patch 141852-01 2009/07/24 version of "/opt/sunstudio12.1/bin/dbx": Sun DBX Debugger 7.7 SunOS_i386 2009/06/03 -- This message posted from opensolaris.org