I am not succeeding in doing this.

opensolaris.sh has options (D and F), and bldenv also has an option (-d). I tried setting either, neither, and both. Incidentally, if bldenv is run with -d it outputs a blurb mentioning that a debug build is configured. If this option is not given on the command line, the blurb says it is a release build, regardless of the debug flag settings in opensolaris.sh.

I built a full nightly, an incremental nightly, and a subset build (make clean; make) from .../usr/src/lib. No cigar: the resulting *.so has no debug information. (Not stripped, but no debug info.)

I figured I'd ask before I start looking for the needle in the haystack; maybe I am missing something obvious.

Another thing: ztest will load its libraries from the default library location (typically /usr/lib), which is probably not what a developer would want, especially if a random zfs version happens to be already installed on the dev system.

Fortunately, this one has an easy solution: set the env variable LD_LIBRARY_PATH to the preferred load location (typically somewhere in the dev workspace). A key point: the location must end in a semicolon.

The semicolon causes the location specified to be searched _before_ the default.

Perhaps the build environment should set this, or use "YP," to force library searches to start inside the build output area where the .so files go.

Also, as per the docs, the opensolaris.sh "t" option is supposed to "build and use" the .../usr/src/tools. Well, it just uses the location, but does not build the tools if they are not already there. No big deal, they can be built from the aforementioned dir.

-- This message posted from opensolaris.org
I've run into this same issue recently and I'm working with someone in the Sun Studio group to help me figure it out, but so far, no luck. I thought that the only thing I needed to do was to make sure that the compiler and the linker were being executed with the -g option, but I've done that and I still can't get dbx to recognize the dynamic libraries as having debugger info.

So if I get the answer, I'll post it here. In the meantime, if anyone else knows the trick, I'd appreciate learning it.

Lori

On 06/11/09 14:12, Steve Gonczi wrote:
> I am not succeeding in doing this.
> [...]
Lori Alt wrote:
> I've run into this same issue recently and I'm working with someone in the Sun Studio group to help me figure it out, but so far, no luck. I thought that the only thing I needed to do was to make sure that the compiler and the linker were being executed with the -g option, but I've done that and I still can't get dbx to recognize the dynamic libraries as having debugger info.
>
> So if I get the answer, I'll post it here. In the meantime, if anyone else knows the trick, I'd appreciate learning it.

Are the binaries still getting stripped? Run

    file path/to/libzpool

It should tell you if it's stripped or not.

/* Shameless self promotion

Our project (OSUNIX) is a work-in-progress, but makes all this a lot easier. In theory, to do this for osunix you'd just have to change a simple build configuration file and run pmerge -1 libzpool. You would probably want to make a snapshot and a new boot environment before this, which in the future should be automatic. You could also test any patches by changing only one line in the build script (this is convenient for testing webrevs, etc.).

Handling this cleanly was in my short lightning talk at CommunityOne. After our next release I'll post a tutorial on this.
*/

./C
On 06/11/09 16:28, C. Bergström wrote:
> Are the binaries still getting stripped? Run file path/to/libzpool.
> It should tell you if it's stripped or not.

In my case, the binaries aren't stripped. I had checked on that.

But I did get an answer from the Sun Studio project: the problem I've been seeing (which might not be the same problem that Steve reports, but could be) appears to be a known bug (6823053). I don't know the prospects for a fix, but there's a workaround, which is to modify the compiler flags to change this:

    -xdebugformat=stabs

to this:

    -xdebugformat=dwarf

Also make sure the compiler is called with -g.

This worked for me. But I have no idea what other effects this change may have, so use at your own risk.

Lori
Thanks for the replies. I tried switching to dwarf: no success.

I see what is happening. The last step in the scheme of the build system is running ctfconvert on the various *.o files, predictably converting the debug info to ctf format.

The ctf format, AFAIK, is primarily for mdb. Mdb is a fine tool for assembler/kernel debugging. Source debugging is preferable for some work, at least for me.

I do not know if there is an easy fix for this issue. My solution is to hack in an option, probably an env variable, to disable the ctfconvert pass. Ctf does not appear to be useful for any of the source debuggers (dbx or gdb).

BTW, switching to dwarf reveals what appears to be a build anomaly. Apparently, ctfconvert runs twice on at least one of the .o files. This seems to cause no problem with the stabs format, but it breaks the build when dwarf is used and it tries to convert lib/libnsl/amd64/pics/rpc_comdata1.o twice.
Steve Gonczi wrote:
> Thanks for the replies. I tried switching to dwarf: no success.
>
> I see what is happening. The last step in the scheme of the build system is running ctfconvert on the various *.o files, predictably converting the debug info to ctf format.

After you run

    bldenv -d ./opensolaris.sh

you can manually change a few things in the env (not normally recommended):

- export CTFCONVERT=/opt/onbld/bin/i386/ctfconvert
- export CTFMERGE=/opt/onbld/bin/i386/ctfmerge
- export CTFSTABS=/opt/onbld/bin/i386/ctfstabs
+ export CTFCONVERT=/bin/true
+ export CTFMERGE=/bin/true
+ export CTFSTABS=/bin/true

This way the build continues, but ctf* doesn't actually do anything.

./C
Awesome, thanks. Much easier than what I was going to do. I am going to go back to using stabs for this, I think.
This proves to be unexpectedly difficult to solve.

I find it hard to fathom that nobody at Sun runs dbx on the various usermode components of the OS, specifically the ztest usermode exerciser.

I believe this forum is the right place to ask this question:

Is there a working [easy] way to build dbx-debuggable userland components without hacking the build environment?

I was hoping to get a definitive answer from someone who does this on a daily basis.
Steve Gonczi wrote:
> This proves to be unexpectedly difficult to solve.
>
> I find it hard to fathom that nobody at Sun runs dbx on the various usermode components of the OS, specifically the ztest usermode exerciser.
>
> I believe this forum is the right place to ask this question:
>
> Is there a working [easy] way to build dbx-debuggable userland components without hacking the build environment?
>
> I was hoping to get a definitive answer from someone who does this on a daily basis.

I feel your pain. I've been fighting this battle for a week. I do Solaris debugging on a daily basis, but only recently started working on the zfs userland components. I too am surprised that this isn't getting wider attention. (I should say that I don't think this problem is limited to the zfs userland components. I think it affects all userland code.)

I hope someone comes up with an answer to your question, because all I can do is offer more details about how I hacked the environment to get this to work. But I DID get it to work. If you're interested, I can give you all the steps I used (I left out the gory details in my earlier mail because I hoped that just setting the debug format to "dwarf" would do the trick). Let me know and I'll write them down (it's not THAT bad. It's a kludge, but it's an easy kludge to try.)

In the meantime, I'm going to do what I can to raise the visibility of this problem. The bug that appears to be at the root of at least my dbx problems is 6823053. I'm going to raise the priority of it and add some comments to it.

Lori
By all means, please share your hacks.

I am not sure why switching to dwarf would make any difference. Given that stabs is the native Sun format, dbx can surely handle it. I could see the point if you are using a recent gdb, maybe.

For me, the ctf* utilities are the likely problem, but it seems I cannot disable running these, because then some of the dynamically generated header files do not get built.
On 06/12/09 11:48, Steve Gonczi wrote:
> By all means, please share your hacks.
>
> I am not sure why switching to dwarf would make any difference, given that stabs is the native Sun format, dbx can surely handle it.

From what I've learned, the bug isn't in dbx. Yes, it can read stabs. But somehow the compilation process isn't generating them correctly.

> I could see the point if you are using a recent gdb maybe.
>
> For me, the ctf* utilities are the likely problem, but seems I can not disable running these because then some of the dynamically generated header files do not get built.

First, I did a full build of the entire workspace.

Now cd to the <workspace>/usr/src/lib/libzfs directory (you will need to go to the libzpool directory).

    % make clobber
    % make install > doit 2>&1

Now edit the file "doit" to

a) remove all lines beginning with "+"

b) modify the lines that simply name a directory to prefix them with "cd", i.e., this line:

    /builds/lalt/onnv/usr/src/lib/libzfs/i386

becomes:

    cd /builds/lalt/onnv/usr/src/lib/libzfs/i386

c) globally replace the string

    -xdebugformat=stabs

with

    -xdebugformat=dwarf

Now, while your current directory is <workspace>/usr/src/lib/libzfs, execute "doit" as a script.

At this point, I can run a program under dbx which accesses libzfs.so.1 (you might have to set LD_LIBRARY_PATH to make sure you're using the one you just built, not whatever is in /lib) and debug functions in the libzfs.so.1 library. That is, I no longer get messages like:

    (dbx) stop in zfs_send
    dbx: warning: 'zfs_send' has no debugger info -- will trigger on first instruction

I think you may be right that it's the ctf* functions (which run silently during post-processing) that are run by "make" that end up removing the debugger info. Doing the compiles and links explicitly with a script like this suppresses that step.

Yes, I'm sure that there are cleaner ways to get around this, but I'm tired of experimenting and this works for me.

Lori
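
The three edits to the captured make output described in this message could probably be scripted rather than done by hand. The following is only a sketch: the function name fix_doit is made up for illustration, and it assumes the directory lines in the capture are bare absolute paths containing no whitespace.

```shell
# Hypothetical helper (not part of the ON build tools): apply the three
# edits described in the message to the captured "doit" file and print
# the result on stdout.
fix_doit() {
    # a) drop lines beginning with "+"
    # b) turn lines that consist only of an absolute path into "cd <path>"
    # c) switch the debug format from stabs to dwarf
    sed -e '/^+/d' \
        -e 's|^\(/[^[:space:]]*\)$|cd \1|' \
        -e 's/-xdebugformat=stabs/-xdebugformat=dwarf/g' \
        "$1"
}
```

Under those assumptions, something like "fix_doit doit > doit.sh" followed by "sh doit.sh" from the library directory would replay the compiles and links without the post-processing that make normally runs.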
One problem with this build system is that perhaps it is not fully implemented. Its doc states that a goal was to allow building various sub-components by switching to the sub-component's predictable directory and building said sub-component from there, by typing make/dmake [all].

This fully works for some components (e.g. tools) and "sort of" works for others. E.g., you can go to usr/src/lib and type "make" or "make libzfs" and something will run without complaining. But dependencies do not get picked up: if you wipe out any *.o files in ../usr/src/lib/libzpool/amd64/pics, none of these will get rebuilt. Yet if you do an incremental nightly build, they will be.

I am experimenting with the previously mentioned permutations, like forcing dwarf format and disabling the ctf conversion. The procedure is to first do a full nightly build with the "standard" build settings, then wipe out the desired .o files and do an incremental nightly build with the various hacked settings.
Thank you for posting this. Let me recap your key insights:

1) A successful full nightly build is a prerequisite.

2) You can build a specific library from its lib/libxxx directory via make clobber; make install. (I did notice make clean did not work but make clobber did. I was unaware of the "install" target.)

3) dwarf works better than stabs.

Any reason why redefining the preferred debug format to dwarf in Makefile.master before step 2 would not work?
I did the full nightly, then edited the Makefile.master debug setting so it generates dwarf debug info. Killed the CTF* utilities as outlined previously in this thread.

Recompiled libzpool and libzfs from their usr/src/lib/libz* directories (make clean|clobber, usually try both, then make or make install, also try both).

After the successful compile, set LD_LIBRARY_PATH to where the freshly baked libzpool lives. Do not forget the semicolon after the search path; it must be prefixed by \, otherwise the shell consumes it.

I did not need to capture and edit the make output.

Things are mostly working now. It is beyond me why this would not work with stabs. The only remaining thing is to turn off optimizing, so I can look at local variables.

Thanks to all of you who helped out.

Cheers,

Steve
On Fri, Jun 12, 2009 at 03:04:55PM -0700, Steve Gonczi wrote:
> I did the full nightly, then edited the Makefile.master debug setting so it generates dwarf debug info. Killed the CTF* utilities as outlined previously in this thread.
> Recompiled libzpool and libzfs from their usr/src/lib/libz* directories ( make clean|clobber, usually try both, then make or make install - also try both).
> After the successful compile, set LD_LIBRARY_PATH to where the freshly baked libzpool lives. Do not forget the semicolon after the search path, must be prefixed by \ else the shell strips it.
>
> I did not need to capture, and edit the make output.
>
> Things are mostly working now. It is beyond me why this would not work with stabs. The only remaining thing is to turn off optimizing, so I can look at local variables.

Just jumping in here after the fact, but I thought I'd give some background:

Most of the engineers working in ON got started with kernel programming on Solaris, and so have a large amount of background with MDB and assembly-level debugging. The CTF data provides all of the structure printing you might need using MDB, and the experience with assembly provides the "map stack trace to C program locations", along with figuring out which local variables are where, etc.

Since most kernel engineers aren't familiar with DBX, there's not a huge community clamoring for the ability to use it.

The CTF tools process the debugging information (stabs or dwarf) in order to generate the C type information MDB uses. By default, they also strip all other debugging information from the binaries, since it bloats the binary sizes and isn't wanted in our shipping products.

Unfortunately, there is no easy way to add the '-g' flag to disable that stripping, except for:

    CTFCVTFLAGS='-i -L VERSION -g' CTFMGRFLAGS='-g' dmake install

I think that a set of simple environment settings which would enable this for a build, or part thereof, would be useful, so you could do something like:

    (after a full nightly build)
    cd usr/src/lib/libzpool
    dmake clobber
    KEEP_DEBUG_INFO=yes DEBUG_TYPE=dwarf dmake install

and have both CTF and dwarf info. That would be a good RFE.

Cheers,
- jonathan
Hi Jonathan,

Thanks for providing us with a Sun developer's perspective, and for the new info on config tweaks.

I have gained a tiny bit more insight on this:

The debug (-g) option is already set in all the compiles, given that the requisite bldenv is set up as debug. It is currently unclear whether the flag needs to be set in opensolaris.sh or given as a bldenv command line option, but setting both certainly works.

Because bldenv sets the make option "-e", any macro set in Makefile.master can be overridden simply by setting it in the current build shell. This means no editing of Makefile.master is necessary.

E.g. in my environment (tcsh) I simply issue:

    setenv DEBUGFORMAT -xdebugformat=dwarf;
    setenv COPTFLAG;
    setenv COPTFLAG64;

This effectively shuts off all optimizations, and switches to DWARF output for any subsequent debug userland builds.

In addition, run dbx via ddd --dbx, and we have ourselves a neat GUI source debug environment. Gdb does not seem to work with the Sun toolchain. For gdb aficionados, dbx commands can be munged into a gdb-like command set via .dbxrc macros.

Absolutely no disrespect to mdb, I love the thing for kernel/assembler debugging.

However, stepping through source code, walking the stack frames, and seeing local variables effortlessly is still a significant plus.

What I would ultimately like to see is a 2 machine kernel source debug environment similar to the Linux kgdb, or the IBM AIX kdbx environment. Who knows, maybe an Ethernet gdb kernel stub could be implemented in mdb.
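
For anyone using sh, ksh or bash rather than tcsh, the overrides in this message might look roughly like the following. This is a sketch, not a tested recipe: it assumes, as the message describes, that the shell was started by bldenv so that make runs with -e and takes these values in place of the ones in Makefile.master.

```shell
# Bourne-shell equivalent of the tcsh setenv lines in the message.
# The variable names are the ones quoted there; nothing else is assumed.
DEBUGFORMAT=-xdebugformat=dwarf   # emit DWARF instead of stabs
COPTFLAG=                         # empty: no optimization flags (32-bit)
COPTFLAG64=                       # empty: no optimization flags (64-bit)
export DEBUGFORMAT COPTFLAG COPTFLAG64
```

Exporting the empty values matters here: an unset variable would leave the Makefile.master default in place, whereas an exported empty one replaces it.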
On Mon, Jun 15, 2009 at 12:10:12PM -0700, Steve Gonczi wrote:
> Absolutely no disrespect to mdb, I love the thing for kernel/assembler debugging.
>
> However, stepping through source code, walking the stack frames, seeing local variables effortlessly is still a significant plus.
>
> What I would ultimately like to see is a 2 machine kernel source debug environment similar to the Linux kgdb, or the IBM AIX kdbx environment. Who knows, maybe an Ethernet gdb kernel stub could be implemented in mdb..

My experience with the gdb protocol has been very negative; it's very poorly designed, and prone to breakage.

While some kind of kmdb(1)-over-ethernet might be very cool, you're not likely to see a lot of excitement from Solaris engineering for a source-level kernel debugger; in general, it saves you a very minor amount of your total debugging time, since it's straightforward (if slightly time consuming) to backtrack from the assembly. Figuring out how the data structures are corrupted and how they got that way is where you spend most of your time.

Also, given that we've discovered compiler bugs on innumerable occasions, there's not a lot of trust that the line-number/local variable information provided by the compiler is accurate enough, and turning off optimization to get local variables changes the compiler output significantly compared to what our customers are actually running, and can mask many subtle bugs.

Source-level debugging also adds a dependency from "binaries running on system" to "corresponding source on other system", which will be hard to get right in practice.

So in summary, getting good source-level kernel debugging is viewed as a large time investment for dubious gain. We'd rather have smarter mdb dcmds or (for example) a better surface syntax for MDB than something which won't actually make the bug analysis we do on a day-to-day basis measurably easier.

That's not to say you'd be wasting your time trying to get something like this up and running, or that it wouldn't be useful to get better DBX support for userland debugging. Just understand that (especially for kernel source-level support) the arguments have already happened several times in the past, and you're not likely to change the consensus.

(I could see DBX/gdb support for device driver writers being something that someone might find useful. But Solaris engineering probably wouldn't find it so.)

Cheers,
- jonathan
> My experience with the gdb protocol has been very negative; it's very poorly designed, and prone to breakage.

And, with a little luck, nobody from the GNU camp is reading this list :-)

> While some kind of kmdb(1)-over-ethernet might be very cool...

Yes. Given that trying to get a KMDB session to work through LOM is such a pain, and many new x86 hardware platforms do not have a serial port, that would be wonderful.

> straightforward (if slightly time consuming) to backtrack from the assembly.

I have one significant pain point with mdb; perhaps it is just my ignorance:

I find it difficult to reconstruct stack frames, i.e. local variables, on 64 bit amd. A bit off topic, but is there an easy way to obtain at least the rbp values for the prior stack frames?

E.g. the AIX kdb has an option for its stack backtrace to print register state for each stack frame on the stack.

To be fair, kudos to the folks at Sun for recognizing the need for field supportability, and giving us mdb. Also, dtrace is sheer goodness, and I do not have _much_ to complain about.

Cheers,

Steve
On Mon, Jun 15, 2009 at 02:35:10PM -0700, Steve Gonczi wrote:
> > My experience with the gdb protocol has been very negative; it's very poorly designed, and prone to breakage.
>
> And, with a little luck, nobody from the GNU camp is reading this list :-)

I think my experience is all well-known; the lack of versioning or even verifying that the architectures both sides are using match, etc. It's just the way the protocol grew.

> > While some kind of kmdb(1)-over-ethernet might be very cool...
>
> Yes.. given that trying to get a KMDB session to work through LOM is such a pain and many new x86 hardware platforms do not have a serial port, that would be wonderful.
>
> > straightforward (if slightly time consuming) to backtrack from the assembly.
>
> I have one significant pain point with mdb, perhaps it is just my ignorance:
>
> I find it difficult to reconstruct stack frames, ie. local variables on 64 bit amd. A bit off topic, but is there an easy way to obtain at least the rbp values for the prior stack frames?

Yeah, for any given line of ::findstack -v or $C, you have:

    > ffffff014bdab900::findstack -v
    stack pointer for thread ffffff014bdab900: ffffff00041dbcc0
    [ ffffff00041dbcc0 _resume_from_idle+0xf1() ]
      ^^^^^^^^^^^^^^^^ %rsp ^^^ PC
      ffffff00041dbcf0 swtch+0x147()
      ^ rbp            ^ PC (frame 1)
      ffffff00041dbd50 cv_wait_sig_swap_core+0x170(ffffff014bdabade, ffffff014bdabae0, 0)
      ^ rbp            ^ PC (frame 2)
      ffffff00041dbd70 cv_wait_sig_swap+0x18(ffffff014bdabade, ffffff014bdabae0)
      ^ rbp            ^ PC (frame 3)
      ffffff00041dbde0 cv_waituntil_sig+0x13c(ffffff014bdabade, ffffff014bdabae0, 0
      ^ rbp            ^ PC (frame 4)
      ...

etc. The main problem in AMD64 is backtracking the callee-saved variables, but at least the function arguments are saved for kernel calls.

> E.g. the AIX kdb has an option for their stack backtrace to print register state for each stack frame on the stack.
>
> To be fair, kudos to the folks at SUN for recognizing the need of field supportability, and giving us mdb. Also, dtrace is sheer goodness, and I do not have _much_ to complain about.

Understood.

Cheers,
- jonathan
I see.

The first number (which I always assumed was just the numeric equivalent of the function+offset shown in the next column) is the frame pointer for the associated stack frame.

Presumably, the first line findstack prints is a typo: it calls the value "stack pointer", but I think it is really the "frame pointer", i.e. it is rbp, not rsp.

Thank you very much for the info.

Steve
On Mon, Jun 15, 2009 at 04:58:27PM -0700, Steve Gonczi wrote:
> I see.
>
> The first number (which I always assumed was just the numeric equivalent of the function+offset shown in the next column) is the frame pointer for the associated stack frame.
>
> Presumably, the first line findstack prints is a typo: it calls the value "stack pointer" I think it is really "frame pointer" ie it is rbp, not rsp.

It's really a matter of terminology; it's called a "stack pointer" because it's a value you pass to "$C" or "::stack" to get a stack out. On SPARC, the stack pointer is *always* pointing to a stack frame, since that's where the register window is saved (register windows are *FUN*), and the term has stuck, even though on x86 "frame pointer" might be a better term.

Cheers,
- jonathan
I managed to get the ztest debug compile working, using the technique written up in this thread.

One thing is still not working: trying to apply the same methodology to a userland command (e.g. fashioning a debuggable zpool command) does not work.

No matter what I do, my freshly recompiled zpool command insists on loading its libraries from the system locations (/lib and /usr/lib) and not my workspace location (where the debug libraries are).

I have LD_LIBRARY_PATH set to where the debug libs are (actually 2 locations separated by ":" and terminated with a "\;"). Using the same settings I use for ztest works like a charm there.

If you have managed to debug any of the usermode commands, please share your experience.

I continue looking at this; my current theory is that maybe the library search location is fixed during the build.

Steve
On Thu, Jun 18, 2009 at 10:57:45AM -0700, Steve Gonczi wrote:
> I managed to get the ztest debug compile, using the technique written up in this thread.
>
> One thing is still not working: Trying to apply the same methodology to an userland command (e.g. fashioning a debug-able zpool command) does not work.
>
> No matter what I do, my freshly recompiled zpool command insists on loading its libraries from the system locations ( /lib and /usr/lib) and not my workspace location (where the debug libraries are).
>
> I have a LD_LIBRARY_PATH set to where the debug lib's are ( actually 2 locations separated by ":" and terminated with a "\;". Using the same settings I use for ztest, works like a charm there.

What does:

    ldd /path/to/zpool

output? You should generally use:

    LD_LIBRARY_PATH_32=/path1:/path2

for 32-bit libraries and binaries, and

    LD_LIBRARY_PATH_64=/path1/64:/path2/64

for 64-bit ones (where 64 could also be "amd64" or "sparcv9", depending upon your ISA).

There should be no ';'s in LD_LIBRARY_PATH.

> If you have managed to debug any of the usermode commands, please share your experience.
>
> I continue looking at this, my current theory is maybe the library search location is made fixed during the build.

Unless:

    elfdump -d /path/to/zpool

contains a "RUNPATH" or "RPATH" line, the search location is not fixed. The default build environment does not set a runpath.

Is your zpool binary setuid? That will turn off LD_LIBRARY_PATH* searching.

Cheers,
- jonathan
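
The per-bitness variables and the setuid question from this message can be wrapped in two small shell helpers. This is a sketch under stated assumptions: the function names are invented for illustration, and the directories passed in are whatever your workspace uses (the thread does not name them).

```shell
# Hypothetical helper: export per-bitness search paths for two library
# directories. $3 is the 64-bit subdirectory name ("64", "amd64" or
# "sparcv9", depending on the ISA). No semicolons are used.
set_lib_paths() {
    LD_LIBRARY_PATH_32="$1:$2"
    LD_LIBRARY_PATH_64="$1/$3:$2/$3"
    export LD_LIBRARY_PATH_32 LD_LIBRARY_PATH_64
}

# Hypothetical helper: succeed if the file has the setuid bit set, in
# which case (per the message) LD_LIBRARY_PATH* searching is turned off.
is_setuid() {
    [ -u "$1" ]
}
```

For example, "set_lib_paths /path1 /path2 amd64" followed by "ldd /path/to/zpool" should then show whether the workspace copies are being picked up.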
The problem turns out to be 32/64 bit.

The zpool command uses 32 bit libraries, and I was only setting the 64 bit locations. The minute I did a "file" on the libraries zpool was loading, this became obvious.

Thanks for pointing out the 32/64 LD_LIBRARY_PATH variants, and the elfdump trick to see if a search location was set.

I got the semicolon idea from the ld man pages. (Perhaps I was wrongly assuming that it also had some effect on the run-time library path search order.)

Paraphrasing the man page: for ld processing, including a semicolon in LD_LIBRARY_PATH causes the path(s) preceding the semicolon to be searched first (before the -L paths). Without the semicolon, the entire LD_LIBRARY_PATH is searched _after_ processing the -L options.

Cheers,

Steve
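
The 32/64-bit distinction that "file" reports comes from one byte in the ELF header, so it can also be checked directly. The sketch below is an illustration only: elf_class is a made-up name, and it assumes the argument really is an ELF file (it does not verify the magic number).

```shell
# Hypothetical helper: print 32 or 64 according to the EI_CLASS byte,
# which sits at offset 4 of the ELF header (1 = 32-bit, 2 = 64-bit).
elf_class() {
    class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
    case $class in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo unknown; return 1 ;;
    esac
}
```

Running it over the libraries listed by ldd for a binary would show at a glance whether LD_LIBRARY_PATH_32 or LD_LIBRARY_PATH_64 is the variable that matters for it.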
Jonathan Adams wrote:
> On Fri, Jun 12, 2009 at 03:04:55PM -0700, Steve Gonczi wrote:
>> [...]
>
> Just jumping in here after the fact, but I thought I'd give some background:
>
> Most of the engineers working in ON got started with kernel programming on Solaris, and so have a large amount of background with MDB and assembly-level debugging. The CTF data provides all of the structure printing you might need using MDB, and the experience with assembly provides the "map stack trace to C program locations", along with figuring out which local variables are where, etc.
>
> Since most kernel engineers aren't familiar with DBX, there's not a huge community clamoring for the ability to use it.
>
> The CTF tools process the debugging information (stabs or dwarf) in order to generate the C type information MDB uses.

A small nit on this thread and also a wild thought.

1) It's stab (not stabs; I've been corrected on this before). There are several Sun doc pages that need updating as well.

2) I'm unaware of an open source compiler and debugger with dwarf3 support, but dwarf3 is supposed to handle compression well. [1] How this compares to CTF would have to be tested.

[1] http://reality.sgiweb.org/davea/dwarf3features.pdf

./C

---
OSUNIX - Built from the best of OpenSolaris Technology
http://www.osunix.org
Hello,

1) Ztest expects to run from /usr/bin. It has a hard-coded assumption as to its location.

2) To debug it with some success, it should be linked with the -mt option, because it uses multiple LWPs. Currently, it is not being built with that.

Cheers,

Steve
Here is another interesting wrinkle:

Looking at /usr/bin and /usr/sbin, I notice that a whole bunch of seemingly unrelated utilities appear to be just hard links to a shared file. (ls -il reveals a shared inode number, the same size, and the same link count for groups of them.)

I am guessing that some groups of these utilities go through common front end code, which then dispatches to the correct bits based on argv0. Could someone confirm this?

This would explain why recompiling ztest and zdb and copying the new bits into /usr/bin and /usr/sbin respectively resulted in a whole bunch of my utilities "becoming zdb".
On 02.07.09 19:11, Steve Gonczi wrote:
> Here is another, interesting wrinkle:
>
> Looking at /usr/bin and /usr/sbin, I notice that a whole bunch of seemingly unrelated
> utilities appear to be just hard links to a shared file. ( ls -il reveals a shared
> inode number, same size, and the same link count to groups of them).
>
> I am guessing that some groups of these utilities go through common front end code,
> that then dispatches to the correct bits based on argv0. Could someone confirm this?
>
> This would explain why recompiling ztest and zdb and copying the new bits into /usr/bin
> and /usr/sbin respectively resulted in a whole bunch of my utilities "becoming zdb".

This is isaexec. You should put your recompiled zdb/ztest into the appropriate directory, such as /usr/bin/i86, /usr/bin/amd64, /usr/sbin/i86, /usr/sbin/amd64, /usr/bin/sparcv9, /usr/sbin/sparcv9, etc.

Victor
Steve Gonczi wrote:
> Here is another, interesting wrinkle:
>
> Looking at /usr/bin and /usr/sbin, I notice that a whole bunch of seemingly unrelated
> utilities appear to be just hard links to a shared file. ( ls -il reveals a shared
> inode number, same size, and the same link count to groups of them).

Lots are hardlinks to /usr/lib/isaexec. For why, see the isaexec man page.

> I am guessing that some groups of these utilities go through common front end code,
> that then dispatches to the correct bits based on argv0. Could someone confirm this?

Some do that, but most of them will be the correct "bitness" versions, i.e. 32 vs 64 bit.

> This would explain why recompiling ztest and zdb and copying the new bits into /usr/bin
> and /usr/sbin respectively resulted in a whole bunch of my utilities "becoming zdb".

zdb is in fact an isaexec link.

-- Darren J Moffat
Steve Gonczi wrote:
> Here is another, interesting wrinkle:
>
> Looking at /usr/bin and /usr/sbin, I notice that a whole bunch of seemingly unrelated
> utilities appear to be just hard links to a shared file. ( ls -il reveals a shared
> inode number, same size, and the same link count to groups of them).
>
> I am guessing that some groups of these utilities go through common front end code,
> that then dispatches to the correct bits based on argv0. Could someone confirm this?
>
> This would explain why recompiling ztest and zdb and copying the new bits into /usr/bin
> and /usr/sbin respectively resulted in a whole bunch of my utilities "becoming zdb".

They are links to isaexec, which is a wrapper executable that runs the most appropriate version for the current machine. So zdb points to isaexec, which then runs, for example, i86/zdb or amd64/zdb depending on the machine's architecture.

-tim
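The dispatch described above (look at the invoked name, then pick an ISA-specific subdirectory) can be sketched in a few lines. This is an illustration of the idea only, not the isaexec source: the real wrapper is a C program that gets the list of supported instruction sets from the system, whereas here the list is simply passed in, and the function name is hypothetical.

```python
import os


def resolve_isa_binary(argv0, isa_list):
    """Return the first executable <dir>/<isa>/<name>, trying ISAs in order.

    argv0 is the path the wrapper was invoked as, e.g. /usr/sbin/zdb.
    isa_list is ordered from most to least preferred, e.g. ["amd64", "i86"].
    Returns None when no ISA-specific binary is found.
    """
    directory, name = os.path.split(os.path.abspath(argv0))
    for isa in isa_list:
        candidate = os.path.join(directory, isa, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

With this model it is easy to see why overwriting /usr/sbin/zdb itself goes wrong: that name is just another link to the shared wrapper, so writing through it changes every utility linked to the same file, while the binaries that actually run live one directory down.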
OK, thanks to everyone who commented on the isaexec topic.

I am running ztest under dbx control now (added the -mt option as well). After running for a while, it dies with the message:

Running: ztest
(process id 103450)
child died with signal 5.

Bringing up the core:

Corefile specified executable: "/usr/bin/amd64/ztest"
core file header read successfully
t@530 (l@530) terminated by signal TRAP (breakpoint trap)
0xfffffd7fff3c8e29: rtld_db_dlactivity+0x0001: movq %rsp,%rbp
Current function is gzip_compress
46 if (z_compress_level(d_start, &dstlen, s_start, s_len, n) != Z_OK) {
(dbx 11) where
current thread: t@530
[1] rtld_db_dlactivity(0xfffffd7fff3fb1e0, 0x3, 0x1, 0xfffffd7fff3feac8, 0xfffffd7fff3c8e28, 0xfffffd7fff360800), at 0xfffffd7fff3c8e29
[2] 0x40(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x40
[3] lm_move(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff3ca5f5
[4] relocate_lmc(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff3bc0e4
[5] elf_lazy_load(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff3c08de
[6] _lookup_sym(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff3bf96c
[7] lookup_sym(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff3bfdf3
[8] elf_bndr(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff3d70b1
[9] elf_rtbndr(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff3bb4d4
[10] 0xfffffd7fff350030(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff350030
[11] 0xfffffd7fff350030(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff350030
=>[12] gzip_compress(s_start = 0x40ef800, d_start = 0x4a23400, s_len = 1024U, d_len = 512U, n = 3), line 46 in "gzip.c"
[13] zio_compress_data(cpfunc = 7, src = 0x40ef800, srcsize = 1024U, destp = 0xfffffd7fec082ee8, destsizep = 0xfffffd7fec082ed8, destbufsizep = 0xfffffd7fec082ed0), line 115 in "zio_compress.c"
[14] zio_write_bp_init(zio = 0x6305640), line 907 in "zio.c"
[15] zio_execute(zio = 0x6305640), line 1051 in "zio.c"
[16] taskq_thread(arg = 0x4aad00), line 157 in "taskq.c"
[17] _thrp_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffab4eb85
[18] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffab4ee40

I have no breakpoints set in the code. Ztest runs to completion when not under dbx control, but when run from the debugger it always terminates with the above stack. I tried to catch or ignore the signal, or all signals, from dbx; it makes no difference.

Any suggestions as to what is happening here? I am considering 2 possibilities:

1) dbx is reacting to ztest threads getting killed.
2) This is a bona fide crash, but dbx is interpreting it incorrectly.

TIA for any insights.

Steve
This issue is beginning to look like a consequence of ztest killing some of its threads. I think signal 5 is really CLD_STOPPED (same number as SIGTRAP). Does anyone have a suggestion for how to get around this in dbx? What I would like is for dbx not to terminate when a thread gets killed. I tried "ignore 5" (no effect).
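As a side note on the "child died with signal 5" message: signal number 5 is SIGTRAP on Solaris and Linux, which matches the "terminated by signal TRAP" line in the core output. The sketch below is not dbx-specific and does not explain the failure. It only shows, with hypothetical helper names, how a parent process can observe a child being terminated by SIGTRAP and turn the number back into a name.

```python
import signal
import subprocess
import sys


def describe_child_exit(returncode):
    """Describe a subprocess return code; negative values mean death by signal."""
    if returncode < 0:
        sig = signal.Signals(-returncode)
        return "child died with signal %d (%s)" % (sig.value, sig.name)
    return "child exited with status %d" % returncode


def run_child_that_traps():
    """Start a child that sends itself SIGTRAP and return its return code."""
    code = "import os, signal; os.kill(os.getpid(), signal.SIGTRAP)"
    return subprocess.run([sys.executable, "-c", code]).returncode
```

Calling describe_child_exit(run_child_that_traps()) reports the signal by number and by name, which is a quick way to confirm what a bare "signal 5" refers to on a given system.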