Hi,

I'm working on a project where I would like to use DTrace to plug into printf* calls being made by a daemon and print out what those printfs would be printing were the daemon's output stream not getting dumped into the bit bucket. I can connect to the daemon and grab the printf calls and print the format string, but how do I deal with the printf varargs? Not only do I not know how many args there are, I don't know which ones are numbers (and can be used directly), and which ones are pointers/strings (and must be copyin'ed or copyinstr'ed). I don't see any way in D to process the format string to find the answers. Any clever tricks for handling something like this?

* The printf calls aren't actually printf() calls. They are calls to an internal routine that functions similarly to printf(), with a caveat. Unless debugging is turned on (by setting an env var), the contents of the printfs won't be executed. It does a cheap test and returns if no debug output is requested. My real objective is to use DTrace to be able to dynamically turn on debugging by plugging into the printf calls themselves, so that restarting the daemon to get debug output is no longer required. This is an old and large source base, so I'm trying to avoid massive code overhauls to reach my goal.

Thanks,
Daniel
Hi Daniel,

It sounds like what you want is is-enabled USDT probes:

  http://blogs.sun.com/ahl/entry/user_land_tracing_gets_better

Specifically, I suggest you do something like this:

---8<--- dprintf.d ---8<---
provider debug {
	probe debug(char *);
};
--->8--- dprintf.d --->8---

---8<--- dprintf.h ---8<---
#define	dprintf(...) \
	{ if (DEBUG_DEBUG_ENABLED()) _dprintf(__VA_ARGS__); }
--->8--- dprintf.h --->8---

---8<--- dprintf.c ---8<---
#include <alloca.h>
#include <stdarg.h>
#include <stdio.h>

void
_dprintf(const char *fmt, ...)
{
	va_list ap;
	char buf[512], *bufp = buf;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, sizeof (buf), fmt, ap);
	va_end(ap);

	/* The message didn't fit: format it again into a large enough buffer. */
	if (n >= sizeof (buf)) {
		bufp = alloca(n + 1);
		va_start(ap, fmt);
		n = vsprintf(bufp, fmt, ap);
		va_end(ap);
	}

	DEBUG_DEBUG(bufp);
}
--->8--- dprintf.c --->8---

Then from your code you can call dprintf() and incur basically no overhead. You can use a very simple D script to format the output:

---8<---
#!/usr/sbin/dtrace -s

#pragma D option quiet

debug*:::debug
{
	printf("%s\n", copyinstr(arg0));
}
--->8---

I'd actually like to see something like this ship as a Solaris library...

Adam

On Mon, Jul 23, 2007 at 12:37:57PM -0700, Daniel Templeton wrote:
> I can connect to the daemon and grab the printf calls and print the
> format string, but how do I deal with the printf varargs?
> [...]
> _______________________________________________
> dtrace-discuss mailing list
> dtrace-discuss at opensolaris.org

-- 
Adam Leventhal, Solaris Kernel Development       http://blogs.sun.com/ahl
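For readers who want to try the shape of this pattern without a DTrace build, here is a minimal sketch in plain C. It is not Adam's code: DEBUG_DEBUG_ENABLED() and DEBUG_DEBUG() are replaced by hypothetical stubs (in a real build they would presumably come from the header DTrace generates for the provider), the names DPRINTF, dprintf_impl, debug_enabled, last_message and expensive are made up for illustration, and malloc() stands in for alloca(). The point it tries to show is that the macro arguments are not evaluated at all while the probe is disabled.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-ins for the DTrace-provided macros, so this file
 * builds without DTrace.  debug_enabled plays the role of "a consumer has
 * enabled the probe"; last_message records what the probe would receive.
 */
static int debug_enabled = 0;
static char last_message[1024];

#define	DEBUG_DEBUG_ENABLED()	(debug_enabled)
#define	DEBUG_DEBUG(s) \
	((void) snprintf(last_message, sizeof (last_message), "%s", (s)))

/* Format the message into a buffer and fire the (stubbed) probe. */
static void
dprintf_impl(const char *fmt, ...)
{
	va_list ap;
	char buf[512], *bufp = buf;
	int n;

	assert(fmt != NULL);

	va_start(ap, fmt);
	n = vsnprintf(buf, sizeof (buf), fmt, ap);
	va_end(ap);

	if (n < 0)
		return;		/* formatting error: nothing to report */

	if ((size_t)n >= sizeof (buf)) {
		/* Truncated: retry with a buffer of exactly the right size. */
		bufp = malloc((size_t)n + 1);
		if (bufp == NULL)
			return;
		va_start(ap, fmt);
		(void) vsnprintf(bufp, (size_t)n + 1, fmt, ap);
		va_end(ap);
	}

	DEBUG_DEBUG(bufp);

	if (bufp != buf)
		free(bufp);
}

/*
 * do { } while (0) makes the macro a single statement, so it is safe in
 * an unbraced if/else.  Arguments are evaluated only when the enabled
 * check succeeds.
 */
#define	DPRINTF(...) \
	do { \
		if (DEBUG_DEBUG_ENABLED()) \
			dprintf_impl(__VA_ARGS__); \
	} while (0)

/* Helper used to observe whether macro arguments get evaluated. */
static int expensive_calls = 0;

static int
expensive(void)
{
	return (++expensive_calls);
}
```

Calling DPRINTF("value=%d", expensive()) while debug_enabled is 0 leaves expensive_calls untouched; after setting debug_enabled to 1 the same call runs expensive() once and hands the formatted string to the stub.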
Adam,

Yep. That's exactly what I'm looking for. I've read through the links you sent, but I'm still not 100% clear on linking DTrace into the object file, mostly because the project on which I'm working is a little complex.

Our build process is essentially:

  cc src1a.c
  cc src1b.c
  ...
  ar libsrc1.a src1a.o src1b.o ...
  cc src2a.c
  cc src2b.c
  ...
  ar libsrc2.a src2a.o src2b.o ...
  ...
  cc src98a.c
  cc src98b.c
  ...
  cc -o bin1 libsrc1 libsrc2 ... src98a.o src98b.o ...
  cc src99a.c
  cc src99b.c
  ...
  cc -o bin2 libsrc1 libsrc2 ... src99a.o src99b.o ...
  ...

Can I use dtrace -G on archives? I would assume not. It appears to need to be run against all the object files that compose the binary. Is that true? How would I correctly modify this build process to make it work with USDT probes? I have only one probe provider and header file, but pretty much every file in the system uses the USDT macros defined there.

Thanks,
Daniel
Hey Daniel,

You can't use dtrace -G on archives. You'll have to pull the archives apart or build them as shared libraries (with their own USDT providers).

Is there a specific reason why you're using archives?

Adam

On Mon, Jul 23, 2007 at 03:01:32PM -0700, Daniel Templeton wrote:
> Can I use dtrace -G on archives? I would assume not. It appears to
> need to be run against all the object files that compose the binary.
> [...]

-- 
Adam Leventhal, Solaris Kernel Development       http://blogs.sun.com/ahl
Adam,

It's legacy. I suspect the original reason was that there are multiple binaries built from the same object files, so instead of listing all the object files for every binary, they organized things into archives. They used archives instead of shared libraries because of a perceived performance hit for shared libraries, and probably to simplify the end-user experience. Regardless, it's not something I'm going to be able to change without a big fight.

Given that the archives are here to stay, let me ask a few questions to clarify my best approach:

0) Does dtrace -G have to be run on all the object files in the binary?
1) Does dtrace -G have to be run on all the object files at once?
2) Can I run it on the object files in groups, such as right before the archives get created, and then include the resulting object file in the archive?
3) What happens if dtrace -G gets run on the same object file twice?

Thanks for helping me sort through this mess of a build environment. :)

Daniel
On Tue, Jul 24, 2007 at 08:31:56AM -0700, Daniel Templeton wrote:
> Given that the archives are here to stay, let me ask a few questions to
> clarify my best approach:
>
> 0) Does dtrace -G have to be run on all the object files in the binary?

It has to be run on every object file that has USDT probes or is-enabled tracepoints; in this case, that means every file that calls dprintf().

> 1) Does dtrace -G have to be run on all the object files at once?

That's an interesting thought. Right now, you can't do that because DTrace only allows one DOF container for a given ELF load object. This was done to prevent duplicate registration, but allowing multiple containers per object is arguably a useful feature and there are other ways to prevent duplicate registration.

> 2) Can I run it on the object files in groups, such as right before the
> archives get created, and then include the resulting object file in the
> archive?

That pretty much amounts to the same thing as (1), unfortunately.

> 3) What happens if dtrace -G gets run on the same object file twice?

Running dtrace -G has the result of identifying calls to tracepoints, recording their locations, and modifying them to be nops. A subsequent run will identify and record the locations, but make no modifications. This is necessary to support incremental build environments.

I'd suggest you just unpack the archives (ar -x) and use the raw object files.

Adam

-- 
Adam Leventhal, Solaris Kernel Development       http://blogs.sun.com/ahl
Adam,

Thanks for the help. I found a way to cheat together a functional build, but I have absolutely no idea how I'm going to automate this. The issues are deep and many. :)

On to a new problem. Here's my script:

---
#!/usr/sbin/dtrace -s

#pragma D option quiet

dsge$1:::enter
{
	printf("--> %s() {\n", copyinstr(arg0));
}

dsge$1:::exit
{
	printf("<-- %s() %s %d }\n", copyinstr(arg0), copyinstr(arg1), arg2);
}

dsge$1:::info
{
	printf("%s\n", copyinstr(arg0));
}
---

It works fine, mostly. In certain places, I get output like the following:

  dtrace: error on enabled probe ID 1397 (ID 12048:
  dsge4164:sge_execd:mconf_get_simulate_hosts:exit): invalid address
  (0x5e1b87) in action #2 at DIF offset 28

Clearly, dtrace couldn't resolve one of the char *'s. It happens every time this probe gets called. It also happens more randomly with some other probes. Is dtrace finding pointer problems in the source code, or is this problem related to memory getting paged around? In other words, should I be debugging my project or the DTrace end of things?

Thanks!
Daniel
One more interesting tidbit. The char * that dtrace doesn't like is coming from a __FILE__ macro in the source code:

  #define DRETURN(ret) \
      DSGE_EXIT(SGE_FUNC, __FILE__, __LINE__); \
      return ret

Most of the time it works, but for some functions it doesn't. I'm using Studio 11 on AMD64.

Daniel
Hi Daniel,

Since __FILE__ is expanded by the C preprocessor, which runs very early in the translation of a source file into an object file, can you just run the C preprocessor on the files that dtrace does not like, to see what the input to the compiler looks like?

  http://developers.sun.com/sunstudio/documentation/ss11/mr/man1/cc.1.html

  -E    Runs the source file through the preprocessor only and sends
        the output to stdout. The preprocessor is built directly into
        the compiler, except in -Xs mode, where /usr/ccs/lib/cpp is
        invoked. Includes the preprocessor line numbering information.
        See also -P option.

Rayson

On 7/24/07, Daniel Templeton <Dan.Templeton at sun.com> wrote:
> One more interesting tidbit. The char * that dtrace doesn't like is
> coming from a __FILE__ macro in the source code:
> [...]
On Tue, Jul 24, 2007 at 01:28:21PM -0700, Daniel Templeton wrote:
> It works fine, mostly. In certain places, I get output like the following:
>
> dtrace: error on enabled probe ID 1397 (ID 12048:
> dsge4164:sge_execd:mconf_get_simulate_hosts:exit): invalid address
> (0x5e1b87) in action #2 at DIF offset 28
>
> Clearly, dtrace couldn't resolve one of the char *'s. It happens every
> time this probe gets called. It also happens more randomly with some
> other probes. Is dtrace finding pointer problems in the source code, or
> is this problem related to memory getting paged around? In other words,
> should I be debugging my project or the DTrace end of things?

I'd wager the problem is that the string at 0x5e1b87 hasn't been paged in, since DTrace is the first thing touching it. This is a long-standing issue and one that's not easily resolved in DTrace. If you could arrange to 'touch' those strings, either once during your program's initialization or just before the DTrace probe, it should fix the problem.

You can confirm that this is the issue by using mdb to print the address as a string:

    $ mdb -p `pgrep myapp`
    > 0x5e1b87/s
    ...

Adam

-- 
Adam Leventhal, Solaris Kernel Development       http://blogs.sun.com/ahl
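One way the 'touch' suggestion above might look in C is sketched below. The helper names touch_string and debug_touch_init are hypothetical, and whether a single touch at startup is enough (rather than one just before each probe) is an assumption that depends on whether the page stays resident. Note also that each source file may carry its own copy of its __FILE__ literal, so a hook like this would likely be needed per file.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Read every byte of a NUL-terminated string through a volatile pointer
 * so the compiler cannot optimize the loads away.  Reading the bytes
 * forces the page(s) holding the string to be faulted in.
 * Returns the string length.
 */
static size_t
touch_string(const char *s)
{
	const volatile char *p = s;
	size_t len = 0;

	assert(s != NULL);

	while (p[len] != '\0')
		len++;

	return (len);
}

/*
 * Hypothetical initialization hook: touch this file's __FILE__ literal
 * once, before any probe that passes it to DTrace can fire.
 */
static size_t
debug_touch_init(void)
{
	return (touch_string(__FILE__));
}
```

A program could call debug_touch_init() from each instrumented file early in startup, or call touch_string() on the argument immediately before the probe macro if the startup touch turns out not to be sufficient.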
Adam,

OK. I now have it working. I'm still having some issues, though. The USDT probes are included through macros. My first pass at one particular macro looked like this:

  #define DRETURN_VOID \
      DSGE_EXIT(SGE_FUNC, xaybzc, SGE_FILE, __LINE__); \
      if (rmon_condition(xaybzc, TRACE)) \
          rmon_mexit(SGE_FUNC, __FILE__, __LINE__); \
      return

and it worked. Lines 2-4 are now extraneous because DTrace has subsumed their role, so I tried to remove them:

  #define DRETURN_VOID \
      DSGE_EXIT(SGE_FUNC, xaybzc, SGE_FILE, __LINE__)

but that didn't work. When I tried to run dtrace -G, it told me that it couldn't link my dtrace script for one of my object files. After much frustration, I ended up with:

  #define DRETURN_VOID \
      DSGE_EXIT(SGE_FUNC, xaybzc, SGE_FILE, __LINE__); \
      sleep(0); \
      return

which works, but is a hack. What's going on? Note that the final missing semicolon is intentional. The macro is intended to be used like a function call. The SGE_FUNC, SGE_FILE, and xaybzc variables are defined in another macro.

Next problem. With the above hack, I have one daemon working, and now I'm working on the other three. They are all being compiled from the same source base and are reusing compiled objects where possible. I am following exactly the same procedure with the other three daemons, but when I run dtrace -l against the running binary, I get:

  # dtrace -l -n dsge5536:::
     ID   PROVIDER            MODULE                          FUNCTION NAME
  dtrace: failed to match dsge5536:::: No probe matches description

Here are the relevant Makefile lines from the working daemon:

sge_execd: $(EXECD_OBJS) $(EXECD_ADD_OBJS) $(EXECD_LIB_DEPENDS) dsge_execd.o
	$(LD_WRAPPER) $(CC) -o $@ $(LFLAGS) dsge_execd.o $(EXECD_OBJS) $(EXECD_ADD_OBJS) $(EXECD_ADD_LIBS) $(SLIBS) $(LIBS) $(LOADAVGLIBS) $(SECLIBS_STATIC)

dsge_execd.o: $(EXECD_OBJS) $(EXECD_ADD_OBJS)
	/usr/sbin/dtrace -G -s $(RMONLIB_DIR)/debug.d -o dsge_execd.o $(EXECD_OBJS) $(EXECD_ADD_OBJS)

and from one of the not-working daemons, sge_schedd:

sge_schedd: $(SCHEDD_OBJS) $(SCHEDD_ADD_OBJS) $(SCHEDD_LIB_DEPENDS) dsge_schedd.o
	$(LD_WRAPPER) $(CC) -o $@ $(LFLAGS) dsge_schedd.o $(SCHEDD_OBJS) $(SCHEDD_ADD_OBJS) $(SCHEDD_ADD_LIBS) $(SLIBS) $(LIBS) $(LOADAVGLIBS)

dsge_schedd.o: $(SCHEDD_OBJS) $(SCHEDD_ADD_OBJS)
	/usr/sbin/dtrace -G -s $(RMONLIB_DIR)/debug.d -o dsge_schedd.o $(SCHEDD_OBJS) $(SCHEDD_ADD_OBJS)

Below is the relevant portion of the make output for sge_schedd. My .d scripts are attached. Any insights would be appreciated.

Thanks!
Daniel --- /usr/sbin/dtrace -G -s ../libs/rmon/debug.d -o dsge_schedd.o sge_schedd.o scheduler.o sge_category.o sge_process_events.o category.o read_write_complex.o parse_job_cull.o parse_qsub.o sge_options.o sge_orders.o sge_pe_schedd.o shutdown.o sig_handlers.o unparse_job_cull.o usage.o sge_mt_init.o sge_select_queue.o valid_queue_user.o sort_hosts.o sge_complex_schedd.o sge_job_schedd.o sge_range_schedd.o sge_pe_schedd.o sge_qeti.o debit.o subordinate_schedd.o load_correction.o suspend_thresholds.o schedd_monitor.o scale_usage.o sgeee.o sge_urgency.o sge_resource_utilization.o sge_serf.o sge_support.o schedd_message.o sge_orders.o sge_schedd_text.o sge_ssi.o sge_sharetree_printing.o sge_interactive_sched.o sge_resource_quota_schedd.o sge_mirror.o sge_job_mirror.o sge_ja_task_mirror.o sge_pe_task_mirror.o sge_host_mirror.o sge_queue_mirror.o sge_sharetree_mirror.o sge_sched_conf_mirror.o sge_event_client.o pack_job_delivery.o sge_gdi_request.o sge_gdi_ctx.o sge_gdi2.o sge_qexec.o qm_name.o sge_security.o sge_qtcsh.o version.o config.o cull_parse_util.o parse.o sge_attr.o sge_calendar.o sge_centry.o sge_conf.o sge_cqueue.o sge_cqueue_verify.o sge_ckpt.o sge_cuser.o sge_event.o sge_gqueue.o sge_host.o sge_hgroup.o sge_href.o sge_id.o sge_ja_task.o sge_job.o sge_resource_quota.o sge_load.o sge_order.o sge_mailrec.o sge_manop.o sge_mesobj.o sge_object.o sge_path_alias.o sge_pe.o sge_pe_task.o sge_qinstance.o sge_qinstance_state.o sge_qinstance_type.o sge_qref.o sge_range.o sge_report.o sge_schedd_conf.o sge_sharetree.o sge_str.o sge_subordinate.o sge_sub_object.o sge_suser.o sge_usage.o sge_ulong.o sge_userprj.o sge_userset.o sge_utility.o str2nm_converter.o sge_var.o sge_eval_expression.o sge_answer.o sge_feature.o sge_all_listsL.o cull_list.o cull_hash.o cull_where.o cull_parse.o cull_what.o cull_what_elem.o cull_what_print.o cull_multitype.o cull_db.o cull_sort.o cull_dump_scan.o cull_lerrno.o cull_pack.o cull_tree.o cull_file.o cull_state.o cull_xml.o 
cull_packL.o pack.o cl_tcp_framework.o cl_ssl_framework.o cl_communication.o cl_xml_parsing.o cl_connection_list.o cl_app_message_queue.o cl_message_list.o cl_host_list.o cl_host_alias_list.o cl_endpoint_list.o cl_handle_list.o cl_application_error_list.o cl_commlib.o cl_util.o cl_errors.o cl_raw_list.o cl_log_list.o cl_string_list.o cl_thread.o cl_thread_list.o sge_loadmem.o sge_getloadavg.o sge_monitor.o config_file.o setup_path.o sge_afsutil.o sge_arch.o sge_bitfield.o sge_bootstrap.o sge_dstring.o sge_edit.o sge_hostname.o sge_htable.o sge_io.o sge_language.o sge_log.o sge_nprocs.o sge_os.o sge_parse_num_par.o sge_profiling.o sge_prog.o sge_signal.o sge_spool.o sge_stdio.o sge_stdlib.o sge_string.o sge_time.o sge_tmpnam.o sge_uidgid.o sge_unistd.o sge_env.o sge_error_class.o sge_csp_path.o sge_lock.o sge_mtutil.o rmon_macros.o rmon_monitoring_level.o cc -o sge_schedd -xarch=amd64 -xildoff -L/usr/local/BerkeleyDB.4.4//lib/ -L. -R \$ORIGIN/../../lib/sol-amd64 -L/usr/local/ssl/lib dsge_schedd.o sge_schedd.o scheduler.o sge_category.o sge_process_events.o category.o read_write_complex.o parse_job_cull.o parse_qsub.o sge_options.o sge_orders.o sge_pe_schedd.o shutdown.o sig_handlers.o unparse_job_cull.o usage.o sge_mt_init.o sge_select_queue.o valid_queue_user.o sort_hosts.o sge_complex_schedd.o sge_job_schedd.o sge_range_schedd.o sge_pe_schedd.o sge_qeti.o debit.o subordinate_schedd.o load_correction.o suspend_thresholds.o schedd_monitor.o scale_usage.o sgeee.o sge_urgency.o sge_resource_utilization.o sge_serf.o sge_support.o schedd_message.o sge_orders.o sge_schedd_text.o sge_ssi.o sge_sharetree_printing.o sge_interactive_sched.o sge_resource_quota_schedd.o sge_mirror.o sge_job_mirror.o sge_ja_task_mirror.o sge_pe_task_mirror.o sge_host_mirror.o sge_queue_mirror.o sge_sharetree_mirror.o sge_sched_conf_mirror.o sge_event_client.o pack_job_delivery.o sge_gdi_request.o sge_gdi_ctx.o sge_gdi2.o sge_qexec.o qm_name.o sge_security.o sge_qtcsh.o version.o config.o 
cull_parse_util.o parse.o sge_attr.o sge_calendar.o sge_centry.o sge_conf.o sge_cqueue.o sge_cqueue_verify.o sge_ckpt.o sge_cuser.o sge_event.o sge_gqueue.o sge_host.o sge_hgroup.o sge_href.o sge_id.o sge_ja_task.o sge_job.o sge_resource_quota.o sge_load.o sge_order.o sge_mailrec.o sge_manop.o sge_mesobj.o sge_object.o sge_path_alias.o sge_pe.o sge_pe_task.o sge_qinstance.o sge_qinstance_state.o sge_qinstance_type.o sge_qref.o sge_range.o sge_report.o sge_schedd_conf.o sge_sharetree.o sge_str.o sge_subordinate.o sge_sub_object.o sge_suser.o sge_usage.o sge_ulong.o sge_userprj.o sge_userset.o sge_utility.o str2nm_converter.o sge_var.o sge_eval_expression.o sge_answer.o sge_feature.o sge_all_listsL.o cull_list.o cull_hash.o cull_where.o cull_parse.o cull_what.o cull_what_elem.o cull_what_print.o cull_multitype.o cull_db.o cull_sort.o cull_dump_scan.o cull_lerrno.o cull_pack.o cull_tree.o cull_file.o cull_state.o cull_xml.o cull_packL.o pack.o cl_tcp_framework.o cl_ssl_framework.o cl_communication.o cl_xml_parsing.o cl_connection_list.o cl_app_message_queue.o cl_message_list.o cl_host_list.o cl_host_alias_list.o cl_endpoint_list.o cl_handle_list.o cl_application_error_list.o cl_commlib.o cl_util.o cl_errors.o cl_raw_list.o cl_log_list.o cl_string_list.o cl_thread.o cl_thread_list.o sge_loadmem.o sge_getloadavg.o sge_monitor.o config_file.o setup_path.o sge_afsutil.o sge_arch.o sge_bitfield.o sge_bootstrap.o sge_dstring.o sge_edit.o sge_hostname.o sge_htable.o sge_io.o sge_language.o sge_log.o sge_nprocs.o sge_os.o sge_parse_num_par.o sge_profiling.o sge_prog.o sge_signal.o sge_spool.o sge_stdio.o sge_stdlib.o sge_string.o sge_time.o sge_tmpnam.o sge_uidgid.o sge_unistd.o sge_env.o sge_error_class.o sge_csp_path.o sge_lock.o sge_mtutil.o rmon_macros.o rmon_monitoring_level.o -ldl -lsocket -lnsl -lm -lpthread -lkstat ld: warning: file sge_pe_schedd.o: attempted multiple inclusion of file ld: warning: file sge_orders.o: attempted multiple inclusion of file Adam Leventhal 
wrote:
> On Tue, Jul 24, 2007 at 01:28:21PM -0700, Daniel Templeton wrote:
>
>> It works fine, mostly. In certain places, I get output like the following:
>>
>> dtrace: error on enabled probe ID 1397 (ID 12048:
>> dsge4164:sge_execd:mconf_get_simulate_hosts:exit): invalid address
>> (0x5e1b87) in action #2 at DIF offset 28
>>
>> Clearly, dtrace couldn't resolve one of the char *'s. It happens every
>> time this probe gets called. It also happens more randomly with some
>> other probes. Is dtrace finding pointer problems in the source code, or
>> is this problem related to memory getting paged around? In other words,
>> should I be debugging my project or the DTrace end of things?
>
> I'd wager the problem is that the string at 0x5e1b87 hasn't been paged in
> since DTrace is the first thing touching it. This is a long-standing issue
> and one that's not easily resolved in DTrace. If you could arrange to
> 'touch' those strings either once during your program's initialization or
> before the DTrace probe it should fix the problem.
>
> You can confirm that this is the issue by using mdb to print the address as
> a string:
>
> $ mdb -p `pgrep myapp`
> > 0x5e1b87/s
> ...
>
> Adam

-------------- next part --------------
A non-text attachment was scrubbed...
Name: dl.d
Type: text/x-dsrc
Size: 382 bytes
Desc: not available
URL: <http://mail.opensolaris.org/pipermail/dtrace-discuss/attachments/20070803/49bfab17/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: debug.d
Type: text/x-dsrc
Size: 140 bytes
Desc: not available
URL: <http://mail.opensolaris.org/pipermail/dtrace-discuss/attachments/20070803/49bfab17/attachment-0001.bin>
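The "touch" suggested in the quoted reply could be done along these lines. This is only a minimal sketch: touch_string() is a hypothetical helper, not part of the daemon or of DTrace, and it assumes that reading a string once from the process itself is enough to have its pages resident when the probe fires. Nothing stops the system from paging the string out again later, so it is a best-effort measure.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (an assumption for illustration, not an existing
 * API): walk the string through a volatile pointer so the compiler must
 * really load every byte, which faults the backing pages in.
 * Returns the length of the string, not counting the terminator.
 */
static size_t touch_string(const char *s)
{
    const volatile char *p = s;
    size_t n = 0;

    assert(s != NULL);
    while (p[n] != '\0')
        n++;
    return n;
}
```

It could be called once at start-up for each constant string passed to a probe, or immediately before the probe macro for strings built at run time.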
On Fri, Aug 03, 2007 at 08:54:34AM -0700, Daniel Templeton wrote:
> but that didn't work. When I tried to run dtrace -G it told me that it
> couldn't link my dtrace script for one of my object files. After much
> frustration, I ended up with:

Can you post the output from dtrace -G?

> Next problem. With the above hack, I have one daemon working, and now
> I'm working on the other three. They are all being compiled from the
> same source base and are reusing compiled objects where possible. I am
> following exactly the same procedure with the other three daemons, but
> when I run dtrace -l against the running binary, I get:
>
> # dtrace -l -n dsge5536:::
>    ID   PROVIDER   MODULE   FUNCTION NAME
> dtrace: failed to match dsge5536:::: No probe matches description

Are you sure you have the correct pid? Could you take a core of that process using gcore(1) and send it to me privately?

It may be worth recompiling the object files for the second binary just to verify that that's not the problem.

Adam

--
Adam Leventhal, Solaris Kernel Development http://blogs.sun.com/ahl
Adam,

The dtrace -G output is below. To cause the failure, I change the DRETURN_VOID macro from:

#define DRETURN_VOID \
    DSGE_EXIT(SGE_FUNC, xaybzc, SGE_FILE, __LINE__); \
    sleep(0); \
    return

to:

#define DRETURN_VOID \
    DSGE_EXIT(SGE_FUNC, xaybzc, SGE_FILE, __LINE__); \
    return

I am sure that I'm using the correct PIDs. :) I now have 2 out of four daemons working. For the daemons that work, I am able to build them in any order, so I'm certain that the order for building the daemons doesn't matter. I will send you the core files for the not-working and working daemons momentarily.

Daniel

---

/usr/sbin/dtrace -G -s ../libs/rmon/debug.d -o dsge_execd.o execd.o sge_load_sensor.o dispatcher.o exec_job.o execd_ck_to_do.o execd_get_new_conf.o execd_job_exec.o execd_kill_execd.o execd_signal_queue.o execd_ticket.o get_path.o job_report_execd.o load_avg.o pdc.o procfs.o ptf.o reaper_execd.o sge_report_execd.o setup_execd.o tmpdir.o usage.o sge_options.o admin_mail.o config_file.o mail.o read_object.o read_write_job.o sge_dirent.o execution_states.o sig_handlers.o startprog.o shutdown.o sge_mt_init.o pack_job_delivery.o sge_gdi_request.o sge_gdi_ctx.o sge_gdi2.o sge_qexec.o qm_name.o sge_security.o sge_qtcsh.o version.o config.o cull_parse_util.o parse.o sge_attr.o sge_calendar.o sge_centry.o sge_conf.o sge_cqueue.o sge_cqueue_verify.o sge_ckpt.o sge_cuser.o sge_event.o sge_gqueue.o sge_host.o sge_hgroup.o sge_href.o sge_id.o sge_ja_task.o sge_job.o sge_resource_quota.o sge_load.o sge_order.o sge_mailrec.o sge_manop.o sge_mesobj.o sge_object.o sge_path_alias.o sge_pe.o sge_pe_task.o sge_qinstance.o sge_qinstance_state.o sge_qinstance_type.o sge_qref.o sge_range.o sge_report.o sge_schedd_conf.o sge_sharetree.o sge_str.o sge_subordinate.o sge_sub_object.o sge_suser.o sge_usage.o sge_ulong.o sge_userprj.o sge_userset.o sge_utility.o str2nm_converter.o sge_var.o sge_eval_expression.o sge_answer.o sge_feature.o sge_all_listsL.o cull_list.o cull_hash.o cull_where.o cull_parse.o cull_what.o cull_what_elem.o cull_what_print.o cull_multitype.o cull_db.o cull_sort.o cull_dump_scan.o cull_lerrno.o cull_pack.o cull_tree.o cull_file.o cull_state.o cull_xml.o cull_packL.o pack.o cl_tcp_framework.o cl_ssl_framework.o cl_communication.o cl_xml_parsing.o cl_connection_list.o cl_app_message_queue.o cl_message_list.o cl_host_list.o cl_host_alias_list.o cl_endpoint_list.o cl_handle_list.o cl_application_error_list.o cl_commlib.o cl_util.o cl_errors.o cl_raw_list.o cl_log_list.o cl_string_list.o cl_thread.o cl_thread_list.o sge_loadmem.o sge_getloadavg.o sge_monitor.o config_file.o setup_path.o sge_afsutil.o sge_arch.o sge_bitfield.o sge_bootstrap.o sge_dstring.o sge_edit.o sge_hostname.o sge_htable.o sge_io.o sge_language.o sge_log.o sge_nprocs.o sge_os.o sge_parse_num_par.o sge_profiling.o sge_prog.o sge_signal.o sge_spool.o sge_stdio.o sge_stdlib.o sge_string.o sge_time.o sge_tmpnam.o sge_uidgid.o sge_unistd.o sge_env.o sge_error_class.o sge_csp_path.o sge_lock.o sge_mtutil.o rmon_macros.o rmon_monitoring_level.o
dtrace: failed to link script ../libs/rmon/debug.d: an error was encountered while processing sge_gdi_ctx.o

Adam Leventhal wrote:
> Can you post the output from dtrace -G?
>
> Are you sure you have the correct pid? Could you take a core of that
> process using gcore(1) and send it to me privately?
Can you send me sge_gdi_ctx.o? Are you using gcc on SPARC by any chance?

Adam

On Fri, Aug 03, 2007 at 10:32:52AM -0700, Daniel Templeton wrote:
> dtrace: failed to link script ../libs/rmon/debug.d: an error was
> encountered while processing sge_gdi_ctx.o

--
Adam Leventhal, Solaris Kernel Development http://blogs.sun.com/ahl
Nope. Studio 11 on AMD64. I will send the .c and .o files privately.

Daniel

Adam Leventhal wrote:
> Can you send me sge_gdi_ctx.o? Are you using gcc on SPARC by any chance?
>
> Adam
Daniel,

The problem you're seeing is that the new compilers are being rather clever with a tail-call optimization (I asked about SPARC and gcc because there's a different type of tail-call cleverness there that I'm trying to avoid fixing ;-).

I've filed this bug to track the issue:

  6589130 dtrace -G fails for certain tail-calls on x86

---8<---

[ahl 8.3.2007]

Consider the following object code:

sge_gdi_ctx_class_error+0x10f:  41 5e           popq   %r14
sge_gdi_ctx_class_error+0x111:  41 5d           popq   %r13
sge_gdi_ctx_class_error+0x113:  41 5c           popq   %r12
sge_gdi_ctx_class_error+0x115:  5d              popq   %rbp
sge_gdi_ctx_class_error+0x116:  5b              popq   %rbx
sge_gdi_ctx_class_error+0x117:  e9 00 00 00 00  jmp    +0x0

The code in dt_modtext() needs to allow for jmp instructions as well as calls. For a jmp it will need to turn it into a ret followed by a bunch of nops.

This would also be a good opportunity to provide a better error message identifying the site of the problematic instruction.

A workaround for this is to force the compiler to not tail-call the DTrace probe dummy function by adding some code after it.

*** (#1 of 1): 2007-08-03 11:18:23 PDT adam.leventhal at sun.com

--->8---

Adam

--
Adam Leventhal, Solaris Kernel Development http://blogs.sun.com/ahl
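To make the proposed fix in the bug report concrete, here is a rough sketch of the byte-level rewrite it describes. It is not the libdtrace source: patch_probe_site() and the constant names are made up for illustration, and it assumes that an ordinary 5-byte call to the probe dummy function is simply overwritten with nops. The opcode values themselves (e8 for call rel32, e9 for jmp rel32, c3 for ret, 90 for nop) are the standard x86 encodings.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum {
    X86_CALL_REL32 = 0xe8,  /* call with a 32-bit displacement */
    X86_JMP_REL32  = 0xe9,  /* jmp with a 32-bit displacement */
    X86_RET        = 0xc3,
    X86_NOP        = 0x90,
    PROBE_SITE_LEN = 5      /* one opcode byte plus four displacement bytes */
};

/*
 * Hypothetical stand-in for the fix-up the bug report asks for.
 * Returns 0 if the site was rewritten, -1 if the instruction at the
 * site is neither a call nor a jmp.
 */
static int patch_probe_site(uint8_t *site)
{
    assert(site != NULL);

    switch (site[0]) {
    case X86_CALL_REL32:
        /* Normal probe call: execution falls through the nops. */
        memset(site, X86_NOP, PROBE_SITE_LEN);
        return 0;
    case X86_JMP_REL32:
        /*
         * Tail call: the callee's ret would have returned to our
         * caller, so the patched site has to return by itself.
         */
        site[0] = X86_RET;
        memset(site + 1, X86_NOP, PROBE_SITE_LEN - 1);
        return 0;
    default:
        return -1;
    }
}
```

Applied to the last instruction in the listing above, e9 00 00 00 00 would become c3 90 90 90 90 under these assumptions.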
Adam,

Great! At least there's an explanation for it. Right now I'm using sleep(0) as the code to prevent the tail-calling. Can you recommend a better choice, i.e. lower impact?

Thanks,
Daniel

Adam Leventhal wrote:
> A workaround for this is to force the compiler to not tail-call the
> DTrace probe dummy function by adding some code after it.
On Fri, Aug 03, 2007 at 11:22:47AM -0700, Daniel Templeton wrote:
> Great! At least there's an explanation for it. Right now I'm using
> sleep(0) as the code to prevent the tail-calling. Can you recommend a
> better choice, i.e. lower impact?

Something like this should be a bit cheaper:

    ...
    {
        volatile int dummy;
        ...
        dummy = 0;
    }

Adam

--
Adam Leventhal, Solaris Kernel Development http://blogs.sun.com/ahl