shengliang zhang
2009-Nov-28 05:36 UTC
[dtrace-discuss] how to use dtrace the process of building the device tree in Solaris?
Hi Experts, I want to trace the process of building the device tree in Solaris. How to implement this use dtrace? Thanks, Eric -- This message posted from opensolaris.org
Steve Scargall
2009-Nov-29 22:41 UTC
[dtrace-discuss] how to use dtrace the process of building the device tree in Solaris?
shengliang zhang wrote:> I want to trace the process of building the device tree in Solaris. > How to implement this use dtrace? >Firstly this is an OpenSolaris forum so if your question relates to Solaris 10 then use internal channels. Hint: If you''re not on Solaris 10u6 or higher then you really need to be. There were several put-back''s in to S10u6 specifically around libdevinfo, specifically mt_config_fini(), which greatly improved performance in this area. Secondly, how deep in to the mechanics do you want to go? Is there a problem or are you just curious about the mechanics? I''ll give you a starting place and see how you get on..... Lets assume you want to see what happens to the device tree using "devfsadm -Cv". You can actually learn a lot using truss - "truss -leafdDE -o /var/tmp/truss.devfsadm-Cv.out devfsadm -Cv". The more interesting bit is this: PID/TID -d -D -E Command = return ------- ------ ------ ------ -------------------------------------------------------- (...snip...) 1919/1: 0.1080 0.0002 0.0000 open("/devices/pseudo/devinfo at 0:devinfo", O_RDONLY) = 6 1919/1: 0.1082 0.0002 0.0000 ioctl(6, DINFOIDENT, 0x00000000) = 57311 1919/1: 0.3840 0.2758 0.0077 ioctl(6, 0x20DF2F, 0x080473F0) = 155964 1919/1: 0.3842 0.0002 0.0000 brk(0x080EC110) = 0 1919/1: 0.3842 0.0000 0.0000 brk(0x080FE110) = 0 1919/1: 0.3848 0.0006 0.0004 ioctl(6, DINFOUSRLD, 0x080D6000) = 159744 1919/1: 0.3848 0.0000 0.0000 close(6) = 0 This is where we take a snapshot of the device tree, load it in to user-land and then manipulate it. Note in some older releases of OpenSolaris and Solaris 10 you may see the following instead: 16456/1: 0.2428 0.0015 0.0001 open("/devices/pseudo/devinfo at 0:devinfo", O_RDONLY) = 3 16456/1: 0.2908 0.0480 0.0016 ioctl(3, *0xDF81*, 0xFFBFF688) = 0 16456/1: 0.2911 0.0003 0.0000 close(3) = 0 16456/1: 0.2914 0.0003 0.0001 open("/devices/pseudo/devinfo at 0:devinfo", O_RDONLY) = 3 16456/1: 0.2916 0.0002 0.0000 ioctl(3, 0xDF82, 0x00000000) = 57311 16456/1: 0.4522 0.1606 0.0077 ioctl(3, 0xDF2F, 0xFFBFF214) = 118056 16456/1: 0.4525 0.0003 0.0000 brk(0x00102000) = 0 16456/1: 0.4530 0.0005 0.0002 ioctl(3, *0xDF80*, 0x000E2000) = 122880 16456/1: 0.4532 0.0002 0.0000 close(3) = 0 Refer to http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/sys/devinfo_impl.h for info on the 2nd argument to the ioctl() calls. 46 #define DIIOC (0xdf<<8) 73 #define DINFOUSRLD (DIIOC | 0x80) /* copy snapshot to usrland */ 74 #define DINFOLODRV (DIIOC | 0x81) /* force load a driver */ 75 #define DINFOIDENT (DIIOC | 0x82) /* identify the driver */ Note 0xDF2F is an OR of the following: 60 #define DINFOFORCE (DIIOC | 0x20) /* force load all drivers */ 53 #define DINFOSUBTREE (DIIOC | 0x01) /* include subtree */ 54 #define DINFOMINOR (DIIOC | 0x02) /* include minor data */ 55 #define DINFOPROP (DIIOC | 0x04) /* include properties */ 56 #define DINFOPATH (DIIOC | 0x08) /* include i/o pathing information */ Now go and read the libdevinfo(3LIB) man page. This tells you most of what you want you need to know before delving deeper with DTrace. Also read the source code as it''s reasonably well commented. In OSol and latish releases of S10 you should have the fbt:devinfo:: probes. Find out what you''ve got to play with by listing them: $ dtrace -ln ''fbt:devinfo::'' If, for example, you wanted to see the flow of all the ioctl and devinfo calls that devfsadm calls during its execution use: $ pfexec dtrace -F -o /var/tmp/dtrace.devfsadm-Cv.out \ -n ''syscall::ioctl:entry /execname == "devfsadm"/ {printf("-> %s(%lx,%lx,%lx,%lx,%lx)\n", probefunc,arg0,arg1,arg2,arg3,arg4); self->interested=1;}'' \ -n ''syscall::ioctl:return /self->interested/ {}'' \ -n ''fbt:devinfo:: /self->interested/ {}'' \ -n ''dtrace:::END {self->interest=0;}'' \ -c "devfsadm -Cv" That output, even on a small system, is going to be quite large so you''d be better to do some aggregation to get an idea of where you want to go next. Some nice quantize() graphs on the delta between entry and return probes would be a good idea. You can then drill down on the ones which take the longest. The graphs also give you an idea of what values you need when creating the predicate on the delta in the return probe ie: Provider:Module:Function:return /delta >= some_value/ {do something useful} Most people are only interested in this subject if devfsadm is taking a long time. If this is you, chances are the time will be consumed in one or both of the following places: 3737/1: 0.2940 0.0049 0.0011 open("/devices/pseudo/devinfo at 0:devinfo", O_RDONLY) = 3 *3737/1: 4.0493 3.7553 0.0018 ioctl(3, 0xDF81, 0xFFBFF5C0) = 0 <<<<* 3737/1: 4.0496 0.0003 0.0000 close(3) = 0 3737/1: 4.0499 0.0003 0.0001 open("/devices/pseudo/devinfo at 0:devinfo", O_RDONLY) = 3 3737/1: 4.0502 0.0003 0.0000 ioctl(3, 0xDF82, 0x00000000) = 57311 *3737/1: 7.9125 3.8623 0.0092 ioctl(3, 0xDF2F, 0xFFBFF14C) = 125472 **<<<<* Note that the elapsed time of the ioctl call is very short, but the delta is high. This is because the system call was kicked off CPU waiting for a event to happen. The event is not the ioctl itself but something which happens as a result of the ioctl, typically somewhere within one of the drivers. Using "on-off-cpu.d" you''ll be able to get the stack which caused the thread to be kicked off cpu. From that you can then include further probes and predicates to look deeper. You''ll most likely find that this thread is waiting on a mt_config_fini()->cv_wait(). The condition variable is issued because we have one or more config_grand_children()->mt_config_fini() threads that need to complete. You''ll need a threadlist (or crash dump) to determine how many threads we''re waiting for and why. Hint: use the pstop(), system(), and prun() commands within your D script to allow you to collect the data you want at the appropriate time. The original thread cannot continue until all it''s grand children have completed (ie: mtc_thr_count is zero). This is why it''s kicked off cpu. The most common cause for devfsadm taking a long time is when systems have large numbers disk device targets configured, but not used. The reason is that for each target the driver will send an SCSI INQUIRY packet which has a 250millisecond timeout. It doesn''t take long to rack up multiple seconds, or even minutes, of delay. The easiest way to see this particular case in action is using scsi.d: $ ./scsi.d-1.18 mpt 1 Hit Control C to interrupt 00007.657554003 mpt1:-> 0x12 INQUIRY address 00:00, lba 0x000000, len 0x000030, control 0x00 timeout 60 CDBP 60028113348 1 sched(0) cdb(6) 120000003000 00007.907823387 mpt1:<- 0x12 INQUIRY address 00:00, lba 0x000000, len 0x000030, control 0x00 timeout 60 CDBP 60028113348, reason 0x1 (INCOMPLETE) pkt_state 0x1 state 0x0 Success *Time 250390us* 00007.907969358 mpt1:-> 0x12 INQUIRY address 01:00, lba 0x000000, len 0x000030, control 0x00 timeout 60 CDBP 60028113348 1 sched(0) cdb(6) 120000003000 00008.158201728 mpt1:<- 0x12 INQUIRY address 01:00, lba 0x000000, len 0x000030, control 0x00 timeout 60 CDBP 60028113348, reason 0x1 (INCOMPLETE) pkt_state 0x1 state 0x0 Success *Time 250321us* 00008.158333640 mpt1:-> 0x12 INQUIRY address 02:00, lba 0x000000, len 0x000030, control 0x00 timeout 60 CDBP 60028113348 1 sched(0) cdb(6) 120000003000 00008.408558541 mpt1:<- 0x12 INQUIRY address 02:00, lba 0x000000, len 0x000030, control 0x00 timeout 60 CDBP 60028113348, reason 0x1 (INCOMPLETE) pkt_state 0x1 state 0x0 Success *Time 250313us* (...snip...) If this is you, then reduce the number of available targets configured in the appropriate driver (sd.conf, ssd.conf, mpt.conf, etc) You should have enough info now to go diving in to devinfo. Best of luck. HTH Steve -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.opensolaris.org/pipermail/dtrace-discuss/attachments/20091129/77bd965e/attachment.html>
shengliang zhang
2009-Nov-30 02:37 UTC
[dtrace-discuss] how to use dtrace the process of building the device tree in Solaris?
thanks, it is very helpful for me. BTW, The following steps explain the device tree construction. But it does not handle the pseudo device. So I also want to know how the pseudo device node is added on the device tree. The system goes through the following steps to build the tree: 1. The CPU is initialized and searches for firmware. 2. The main firmware (OpenBoot, Basic Input/Output System (BIOS), or Bootconf) initializes and creates the device tree with known or self-identifying hardware. 3. When the main firmware finds compatible firmware on a device, the main firmware initializes the device and retrieves the device''s properties. 4. The firmware locates and boots the operating system. 5. The kernel starts at the root node of the tree, searches for a matching device driver, and binds that driver to the device. 6. If the device is a nexus, the kernel looks for child devices that have not been detected by the firmware. The kernel adds any child devices to the tree below the nexus node. 7.The kernel repeats the process from Step 5 until no further device nodes need to be created. -- This message posted from opensolaris.org