Manfred Mücke
2009-Aug-07 15:38 UTC
[dtrace-discuss] Problem tracing Fortran function with pid provider
Hello,
I have a fortran program and want to monitor entry/return events of a specific
function. No big thing (I thought). The function is called DCNTRL in the source
code. Checking the binary gives:
> nm myprog | grep FUNC | grep -i DCNTRL
[15968] | 6564704| 30193|FUNC |GLOB |0 |12 |dcntrl_
OK, its internal name is dcntrl_. Let''s instrument the respective pid
entry/return probes:
> dtrace -n ''pid$target::dcntrl_:entry'' -n
''pid$target::dcntrl_:return'' -c myprog
dtrace: invalid probe specifier pid$target::dcntrl_:return: probe description
pid17186::dcntrl_:return does not match any probes
Huh, DTrace does not find the return probe!? But I''m sure the function
exist, so let''s put some pressure on DTrace (-Z):
> dtrace -Z -n ''pid$target::dcntrl_:entry'' -n
''pid$target::dcntrl_:return'' -c myprog
dtrace: description ''pid$target::dcntrl*:entry'' matched 1
probe
dtrace: description ''pid$target::dcntrl*:return'' matched 0
probes
First question: How comes DTrace can instrument the entry probe, but not the
return probe of that function?
Now let''s count events a the instrumented probes:
> dtrace -Z -n ''pid$target::dcntrl_:entry
{@f[probefunc,probename]=count()}'' -n
''pid$target::dcntrl_:return {@f[probefunc,probename]=count()}''
-c myprog
dcntrl_ entry
1
Hm, seems like that function was only called once. Fact is, it was called a
couple of times (verified), but the probe seems only to fire the first time.
I conclude that I''m unable to both instrument and observe the function.
Does anyone have an idea, how I could proceed?
Thanks for your help, Manfred
PS: I explicitly switched inlining off for that function. (-inline=no%dcntrl).
Just for reference, I attach the first and last lines of the function:
> mdb myprog
> dcntrl_::dis
dcntrl_: pushq %rbp
dcntrl_+1: movq %rsp,%rbp
dcntrl_+4: pushq %rbx
dcntrl_+5: pushq %r12
dcntrl_+7: pushq %r13
dcntrl_+9: pushq %r14
dcntrl_+0xb: pushq %r15
[..]
dcntrl_+0x75e6: popq %r15
dcntrl_+0x75e8: popq %r14
dcntrl_+0x75ea: popq %r13
dcntrl_+0x75ec: popq %r12
dcntrl_+0x75ee: popq %rbx
dcntrl_+0x75ef: leave
dcntrl_+0x75f0: ret
--
This message posted from opensolaris.org
Jonathan Adams
2009-Aug-07 17:22 UTC
[dtrace-discuss] Problem tracing Fortran function with pid provider
On Fri, Aug 07, 2009 at 08:38:56AM -0700, Manfred M?cke wrote:> Hello, > > I have a fortran program and want to monitor entry/return events of a specific function. No big thing (I thought). The function is called DCNTRL in the source code. Checking the binary gives: > > > nm myprog | grep FUNC | grep -i DCNTRL > [15968] | 6564704| 30193|FUNC |GLOB |0 |12 |dcntrl_ > > OK, its internal name is dcntrl_. Let''s instrument the respective pid entry/return probes: > > > dtrace -n ''pid$target::dcntrl_:entry'' -n ''pid$target::dcntrl_:return'' -c myprog > dtrace: invalid probe specifier pid$target::dcntrl_:return: probe description pid17186::dcntrl_:return does not match any probes > > Huh, DTrace does not find the return probe!? But I''m sure the function exist, so let''s put some pressure on DTrace (-Z): > > > dtrace -Z -n ''pid$target::dcntrl_:entry'' -n ''pid$target::dcntrl_:return'' -c myprog > dtrace: description ''pid$target::dcntrl*:entry'' matched 1 probe > dtrace: description ''pid$target::dcntrl*:return'' matched 0 probes > > First question: How comes DTrace can instrument the entry probe, but > not the return probe of that function?If dtrace thinks something fishy is going on, it can refuse to instrument the return sites. This is mainly due to issues with jump-tables. If you set DTRACE_DEBUG= in your environment, is there anything of interest? Look for "grabbed pid NNNN" as a starting place.> Now let''s count events a the instrumented probes: > > > dtrace -Z -n ''pid$target::dcntrl_:entry {@f[probefunc,probename]=count()}'' -n ''pid$target::dcntrl_:return {@f[probefunc,probename]=count()}'' -c myprog > > dcntrl_ entry 1 > > Hm, seems like that function was only called once. Fact is, it was > called a couple of times (verified), but the probe seems only to fire > the first time. > > I conclude that I''m unable to both instrument and observe the > function. Does anyone have an idea, how I could proceed? > > Thanks for your help, Manfred > > PS: I explicitly switched inlining off for that function. > (-inline=no%dcntrl). Just for reference, I attach the first and last > lines of the function: > > > mdb myprog > > dcntrl_::dis > dcntrl_: pushq %rbp > dcntrl_+1: movq %rsp,%rbp > dcntrl_+4: pushq %rbx > dcntrl_+5: pushq %r12 > dcntrl_+7: pushq %r13 > dcntrl_+9: pushq %r14 > dcntrl_+0xb: pushq %r15 > [..] > dcntrl_+0x75e6: popq %r15 > dcntrl_+0x75e8: popq %r14 > dcntrl_+0x75ea: popq %r13 > dcntrl_+0x75ec: popq %r12 > dcntrl_+0x75ee: popq %rbx > dcntrl_+0x75ef: leave > dcntrl_+0x75f0: retOne workaround might be to instrument pidNNNN::dcntrl:75f0 but that will change with recompilation. (And you''d need to get any other rets) Cheers, - jonathan
Manfred Mücke
2009-Aug-10 07:22 UTC
[dtrace-discuss] Problem tracing Fortran function with pid provider
> If dtrace thinks something fishy is going on, it can refuse to instrument > the return sites. This is mainly due to issues with jump-tables. > If you set DTRACE_DEBUG= in your environment, is there anything of interest? > Look for "grabbed pid NNNN" as a starting place.Thanks for the hint, I was not aware of DTRACE_DEBUG. I found the following, which confirms your assumption, that it might be an issue with jump-tables: libdtrace DEBUG: creating probe pid17864:myprog:dcntrl_:return libdtrace DEBUG: error at 0x851 (assuming jump table) I searched the forum but haven''t found an explanation under what conditions dtrace assumes a jump-table. Are there any (Fortran) compiler settings which could help avoiding this issue? In what context should I read the given address 0x851? Neither the function dcntrl_ nor the calling function has an opcode starting at this address. Thanks, Manfred -- This message posted from opensolaris.org
Manfred Mücke
2009-Aug-10 08:29 UTC
[dtrace-discuss] Problem tracing Fortran function with pid provider
> One workaround might be to instrument > pidNNNN::dcntrl:75f0 > but that will change with recompilation. (And you''d > need to get any other rets)I tried that, but got the same error from dtrace as when using a return probe. Just that when using pid offset probes, dtrace refuses to instrument both function entry and return!> dtrace -Z -n ''pid$target::dcntrl_:0 {@f[probefunc,probename]=count()}'' -n ''pid$target::dcntrl_:75f0 {@f[probefunc,probename]=count()}'' -c myproglibdtrace DEBUG: creating probe pid18000:myprog:dcntrl_:0 libdtrace DEBUG: error at 0x8b0 (assuming jump table) libdtrace DEBUG: creating probe pid18000:myprog:dcntrl_:75f0 libdtrace DEBUG: error at 0x8b0 (assuming jump table) Looks like this function is not too fond of being dtraced. \Manfred -- This message posted from opensolaris.org
Jonathan Adams
2009-Aug-10 16:40 UTC
[dtrace-discuss] Problem tracing Fortran function with pid provider
On Mon, Aug 10, 2009 at 12:22:12AM -0700, Manfred M?cke wrote:> > If dtrace thinks something fishy is going on, it can refuse to instrument > > the return sites. This is mainly due to issues with jump-tables. > > If you set DTRACE_DEBUG= in your environment, is there anything of interest? > > Look for "grabbed pid NNNN" as a starting place. > > Thanks for the hint, I was not aware of DTRACE_DEBUG. I found the following, which confirms your assumption, that it might be an issue with jump-tables: > > libdtrace DEBUG: creating probe pid17864:myprog:dcntrl_:return > libdtrace DEBUG: error at 0x851 (assuming jump table) > > I searched the forum but haven''t found an explanation under what > conditions dtrace assumes a jump-table. Are there any (Fortran) > compiler settings which could help avoiding this issue?That error is from usr/src/lib/libdtrace/i386/dt_isadep.c: 78 dt_pid_has_jump_table(struct ps_prochandle *P, dtrace_hdl_t *dtp, 79 uint8_t *text, fasttrap_probe_spec_t *ftp, const GElf_Sym *symp) 80 { ... 91 for (i = 0; i < ftp->ftps_size; i += size) { 92 size = dt_instr_size(&text[i], dtp, pid, symp->st_value + i, 93 dmodel); 95 /* 96 * Assume the worst if we hit an illegal instruction. 97 */ 98 if (size <= 0) { 99 dt_dprintf("error at %#lx (assuming jump table)\n", i); 100 return (1); 101 } ... 115 } 116 117 return (0); 118 } This means that there is some instruction in the stream that dtrace''s disassembler is either getting the size wrong on (causing us to get out of alignment, causing us at a later point to fail on a bad instruction), or we''re hitting an instruction we don''t understand. Could you send me the output of: "echo dcntrl_::dis | mdb ./myprog" which uses the same disassembler. Cheers, - jonathan> In what context should I read the given address 0x851? Neither the > function dcntrl_ nor the calling function has an opcode starting at > this address.It''s an offset
Manfred Mücke
2009-Dec-09 09:00 UTC
[dtrace-discuss] Problem tracing Fortran function with pid provider
Hi,
finally I found the time to look again at the issue described above.
I was told (thanks Diane) that f95 generates jump tables when hitting SELECT
CASE statements. Checking my code shows that there is no SELECT CASE statement
anywhere near the function in question. Ruling out SELECT CASE statements leaves
to possibilities: either there is some other Fortran construct resulting in jump
tables, or the issue we face is not really about jump tables.
Checking
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libdtrace/i386/dt_isadep.c
it becomes clear, that the message "assuming jump table" is
potentially misleading. It really covers everything resulting in an illegal
opcode:
93 /*
94 * Assume the worst if we hit an illegal instruction.
95 */
96 if (size <= 0) {
97 dt_dprintf("error at %#lx (assuming jump
table)\n", i);
98 return (1);
99 }
dt_instr_size returns -1 in case the instruction can not be disassembled or if a
FASTTRAP is found.
476 if (dtrace_disx86(&x86dis, cpu_mode) != 0)
477 return (-1);
478
479 /*
480 * If the instruction was a single-byte breakpoint, there may be
481 * another debugger attached to this process. The original
instruction
482 * can''t be recovered so this must fail.
483 */
484 if (x86dis.d86_len == 1 && instr[0] == FASTTRAP_INSTR)
485 return (-1);
Disassembling the binary using mdb reveals that indeed an illegal instruction is
the source of the problem:
> dcntrl_+875::dis
dcntrl_+0x85b: movl %edx,%eax
dcntrl_+0x85d: subl %ecx,%eax
dcntrl_+0x85f: incl %eax
dcntrl_+0x861: cmpl %edx,%ecx
dcntrl_+0x863: jle +0x4 <dcntrl_+0x867>
dcntrl_+0x865: xorl %eax,%eax
dcntrl_+0x867: cmpl $0x8,%eax
dcntrl_+0x86a: jl +0x50 <dcntrl_+0x8ba>
dcntrl_+0x86c: movl -0x9b8(%rbp),%eax
dcntrl_+0x872: addl $-0x7,%eax
dcntrl_+0x875: ***ERROR--unknown op code***
dcntrl_+0x877: movw (%rax),%es
dcntrl_+0x879: addl %eax,(%rax)
dcntrl_+0x87b: addb %ah,0xf(%rsi)
dcntrl_+0x87e: pushq %rdi
dcntrl_+0x87f: shlb $0x13,0xf(%rsi)
dcntrl_+0x883: ***ERROR--unknown op code***
dcntrl_+0x884: movlpd %xmm0,0x8(%rsi)
dcntrl_+0x889: movlpd %xmm0,0x10(%rsi)
dcntrl_+0x88e: movlpd %xmm0,0x18(%rsi)
dcntrl_+0x893: movlpd %xmm0,0x20(%rsi)
Can anyone comment on this situation? This is a binary generated by f95 from f77
source code. It seems odd that f95 should generate illegal instructions. Maybe
this is a Fortran/x86-specific construct unknown to mdb?
For the ones able to read it, here is the binary dump:
> dcntrl_+86c,100::dump -qfg1
0 1 2 3 4 5 6 7 8 9 a b \/ d e f
4477e0: c0 3b ca 7e 02 33 c0 83 f8 08 7c 4e 8b 85 48 f6
4477f0: ff ff 83 c0 f9 0f 0d 8e 00 01 00 00 66 0f 57 c0
447800: 66 0f 13 06 66 0f 13 46 08 66 0f 13 46 10 66 0f
447810: 13 46 18 66 0f 13 46 20 66 0f 13 46 28 66 0f 13
447820: 46 30 66 0f 13 46 38 48 83 c6 40 83 c1 08 3b c8
447830: 7e c3 3b 8d 48 f6 ff ff 7f 24 66 0f 57 c0 66 90
447840: 66 0f 13 06 48 83 c6 08 ff c1 3b 8d 48 f6 ff ff
447850: 7e ee eb 0a c7 05 96 91 5b 00 02 00 00 00 f2 0f
447860: 2c 05 da 91 5b 00 89 05 58 ba 62 05 83 3d 1d be
447870: 62 05 00 75 19 48 8d 3d 48 ba 62 05 48 33 c0 e8
447880: fc c9 41 00 c7 05 36 ba 62 05 01 00 00 00 83 3d
447890: 57 b8 9c 05 02 7d 15 66 0f 12 05 91 91 5b 00 66
4478a0: 0f 13 85 e8 f6 ff ff e9 94 00 00 00 48 63 05 35
4478b0: b8 9c 05 48 89 85 b0 f9 ff ff 48 c7 85 d0 f9 ff
4478c0: ff 80 1f 93 00 48 8d 85 50 f8 ff ff 48 89 85 d8
4478d0: f9 ff ff 48 c7 85 c0 f9 ff ff 30 7a 9d 00 c7 85
4478e0: b8 f9 ff ff 00 00 00 00 48 8d bd b0 f9 ff ff 48
Thanks, Manfred
--
This message posted from opensolaris.org
Manfred Mücke
2009-Dec-11 16:23 UTC
[dtrace-discuss] Problem tracing Fortran function with pid provider
Hi,
thanks to a very helpful compiler geek at Sun, I made some progress :-) .
The dtrace error message ("assuming jump table") is misleading and the
issue has actually nothing to do with jump tables. When instrumenting user
functions, dtrace steps through all instructions of that very function in order
to verify that there are no jump tables. Doing so, it uses dt_instr_size to
check each instruction''s size. In case dt_instr_size returns -1, dtrace
refuses to instrument the function (issueing the error message "assuming
jump table"). The two reasons triggering a return value of -1 are:
1) the instruction is not in the instruction set (checked via dtrace_disx86)
or,
2) the instruction is a FASTTRAP_INSTR (meaning that another debugger has
already instrumented the function).
If there is no other debugger active, the error therefore really means
"encountered illegal instruction in function". Digging deeper, I
checked the reported code location with mdb:
>> dcntrl_+857::dis
> dcntrl_+0x85d: addl $-0x7,%eax
> dcntrl_+0x860: ***ERROR--unknown op code***
> dcntrl_+0x862: movw (%rax),%es
> dcntrl_+0x864: addl %eax,(%rax)
> dcntrl_+0x866: addb %ah,0xf(%rsi)
> dcntrl_+0x869: pushq %rdi
> dcntrl_+0x86a: shlb $0x13,0xf(%rsi)
> dcntrl_+0x86e: ***ERROR--unknown op code***
> dcntrl_+0x86f: movlpd %xmm0,0x8(%rsi)
> dcntrl_+0x874: movlpd %xmm0,0x10(%rsi)
There are unknown op-codes reported at exactly the locations dtrace complains
about. Discussing this, it became clear that the probability of illegal op-codes
is really very low. So I cross checked with dbx, which surprisingly showed the
following:
> (dbx) dis dcntrl+0x85d
> 0x0000000000447bb5: dcntrl+0x085d: addl $0xfffffffffffffff9,%eax
> 0x0000000000447bb8: dcntrl+0x0860: prefetchw 0x0000000000000100(%rsi)
> 0x0000000000447bbf: dcntrl+0x0867: xorpd %xmm0,%xmm0
> 0x0000000000447bc3: dcntrl+0x086b: movlpd %xmm0,(%rsi)
> 0x0000000000447bc7: dcntrl+0x086f: movlpd
%xmm0,0x0000000000000008(%rsi)
> 0x0000000000447bcc: dcntrl+0x0874: movlpd
%xmm0,0x0000000000000010(%rsi)
Observe that at the location where mdb reported an illegal opcode, dbx shows a
prefetchw opcode. Obviously mdb is not aware of this operation. I believe the
fact that two debuggers, using different disassembly functionality on the very
same system is an interesting discussion in itself.
I deduce from the above observation that dtrace somehow follows the same wisdom
as mdb (instead of dbx) in disassembling user functions.
There is a possible workaround (which is however not satisfying in the long run:
If we instruct the compiler to generate less architecture-specific code
(-xtarget=generic) chances increase that the binary contains no instructions
offensive to dtrace.
So I recompiled my application with f95 -xtarget=generic -m64 and veryfied that
mdb does not report any illegal opcode in the function of interest. Indeed, the
illegal op-codes disappeared.
However, instrumenting the application with dtrace still does not yield the
desired result. Checking with DTRACE_DEBUG:
libdtrace DEBUG: creating probe pid10266:charmm:dcntrl_:return
libdtrace DEBUG: error at 0 (assuming jump table)
While the error location has changed, the error itself remains. Even more,
checking the location with mdb does not reveal an illegal opcode:
> dcntrl_::dis
dcntrl_: pushq %rbp
dcntrl_+1: movq %rsp,%rbp
dcntrl_+4: pushq %rbx
dcntrl_+5: pushq %r12
Summary: some mystery resolved, but instrumentation still defunct.
Any help is appreciated!
Thanks, Manfred
PS: In case someone wants to know
SunOS 5.10 Generic_141445-09 i86pc i386 i86pc
version of "/opt/sunstudio12.1/bin/cc": Sun C 5.10 SunOS_i386 Patch
142363-02 2009/09/21
version of "/opt/sunstudio12.1/bin/f90": Sun Fortran 95 8.4 SunOS_i386
Patch 141852-01 2009/07/24
version of "/opt/sunstudio12.1/bin/dbx": Sun DBX Debugger 7.7
SunOS_i386 2009/06/03
--
This message posted from opensolaris.org