Hello,
Our system is a Sun Fire V890 running the Solaris 10 08/07 release. The problem
we are experiencing is a process that hogs the CPU.
The pattern I see using 'truss -f -p pid' is as follows:
 
lwp_sigmask(SIG_SETMASK, 0x00000000, 0x00000000) = 0xFFBFFEFF [0x0000FFFF]
27082:  pollsys(0xFFBFD560, 17, 0xFFBFF5C8, 0x00000000) = 1
27082:  lwp_sigmask(SIG_SETMASK, 0xFFBFFAFF, 0x0000FFF7) = 0xFFBFFEFF [0x0000FFFF]
27082:  sigaction(SIGPIPE, 0xFFBFF388, 0xFFBFF428)      = 0
27082:  pollsys(0xFFBFD3F8, 1, 0xFFBFF460, 0x00000000)  = 1
27082:  sigaction(SIGPIPE, 0xFFBFF388, 0xFFBFF428)      = 0
27082:  recvfrom(386, "\r\n\r\n", 8, 2, 0xFFBFF4B4, 0xFFBFF4C4) = 4
27082:  time()                                          = 1245355435
27082:  time()                                          = 1245355435
27082:  time()                                          = 1245355435
On the dev server, which does NOT have the problem, the truss output is as follows:
9050:  lwp_sigmask(SIG_SETMASK, 0xFFBFFEFF, 0x0000FFF7) = 0xFFBFFEFF [0x0000FFFF]
19050:  lwp_sigmask(SIG_SETMASK, 0xFFBFFAFF, 0x0000FFF7) = 0xFFBFFEFF [0x0000FFFF]
19050:  pollsys(0xFFBFEC10, 7, 0xFFBFF048, 0x00000000)  = 0
19050:  lwp_sigmask(SIG_SETMASK, 0x00000000, 0x00000000) = 0xFFBFFEFF [0x0000FFFF]
19050:  pollsys(0xFFBFD308, 13, 0xFFBFF370, 0x00000000) = 1
19050:  lwp_sigmask(SIG_SETMASK, 0xFFBFFAFF, 0x0000FFF7) = 0xFFBFFEFF [0x0000FFFF]
19050:  sigaction(SIGPIPE, 0xFFBFEF98, 0xFFBFF038)      = 0
19050:  write(115, " 0 0 7 5 1   - 1   2 0  ".., 79)    = 79
19050:  sigaction(SIGPIPE, 0xFFBFEF98, 0xFFBFF038)      = 0
19050:  time()                                          = 1245355630
19050:  time()                                          = 1245355630
19050:  time()                                          = 1245355630
It looks like the difference comes after sigaction: the problematic process
appears to be stuck in an endless loop, while the normal one goes on to call write.
I tried to capture the difference with dtrace as follows:
#!/usr/sbin/dtrace -qs
proc:::signal-send
/args[2] == SIGPIPE/
{
        printf("SIGPIPE was sent by %s pid = %d \n",
            args[1]->pr_fname, args[1]->pr_pid);
}
This doesn't work as SIGPIPE is not a signal. Could anyone suggest the
right approach?
Thanks in advance,
zhu
> > I tried dtrace to capture the difference as following:
> > #!/usr/sbin/dtrace -qs
> > proc:::signal-send
> > /args[2] == SIGPIPE/
> > {
> >         printf ( "SIGPIPE was sent by %s pid = %d \n",
> >         args[1]->pr_fname,args[1]->pr_pid);
> > }
> >
> > This doesn't work as SIGPIPE is not a signal. Could anyone suggest the right
> > approach?
> >
> > Thanks in advance,

Not sure what you mean by "SIGPIPE is not a signal", but you might want to
investigate "proc::psig:signal-handle".

Rennie
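
For anyone following the thread, here is a minimal sketch of how the two probes could be combined. It is not from the original posts and has not been run on the system described. It assumes SIGPIPE has the value 13 (as on Solaris) and uses the literal number in case the symbolic name does not resolve in the script. Note that in signal-send, args[1] describes the process receiving the signal, while execname and pid describe the sender.

```d
#!/usr/sbin/dtrace -qs

/* Assumption: SIGPIPE is signal number 13, written as a literal below. */

/* Fires in the context of the sender when a signal is posted. */
proc:::signal-send
/args[2] == 13/
{
        printf("SIGPIPE sent by %s (pid %d) to %s (pid %d)\n",
            execname, pid, args[1]->pr_fname, args[1]->pr_pid);
}

/* Fires in the context of the thread about to run its handler. */
proc:::signal-handle
/args[0] == 13/
{
        printf("SIGPIPE handled by %s (pid %d)\n", execname, pid);
}
```

If neither probe fires while the process is spinning, that would suggest no SIGPIPE is actually being delivered and that the sigaction calls in the truss output only change the signal disposition.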