Kristof Maris - Sun Microsystems Belgium
2006-Dec-04 14:02 UTC
[dtrace-discuss] lsof vs Dtrace
Hi,

Can someone on this alias help me with this customer's question?

Kind Regards,
Kristof

-------- Original Message --------
Subject: RE: Case# 37735291
Date: Mon, 04 Dec 2006 14:13:35 +0100
From: ABRAHAM.HOEKSEMA at fortisinvestments.com
To: 'Kristof.Maris at Sun.COM' <Kristof.Maris at Sun.COM>
CC: CHRISTOPHE.GAUGE at fortisinvestments.com

Hi Kristof,

What Christophe wants to know, and me too, is:

1. Which port(s) are in use by process <pid>?
   With "lsof" we could do this using: lsof -p <pid> -a -i 4
2. Which process is using port <port>?
   With "lsof" we can use: lsof -i 4:<port>

The problem with Solaris 10 is actually a problem with lsof: on Solaris 10, lsof is not able to properly show or poll the port numbers. This is a known issue, and no workaround for lsof is available.

What we would like to know is whether there is an alternative tool or command that gives access to this information. One tool I can think of is dtrace, but the question is how? Netstat is another one, but unfortunately it does not show any process information.

Maybe you have other ideas.

Cheers,
Bram
Kristof Maris - Sun Microsystems Belgium wrote:
> 1. Which port(s) are in use by process <pid>?
> With "lsof" we could do this using: lsof -p <pid> -a -i 4

pfiles
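For question 1, the port numbers can be pulled out of pfiles output with a small filter. What follows is a minimal sketch rather than a tested Solaris tool: the function name ports_from_pfiles is made up for illustration, and it assumes that socket entries in pfiles output carry the port as a trailing "port: N" pair, which is the same layout the grep in the port-lookup script later in this thread relies on.

```shell
#!/bin/sh
# Hypothetical helper: read pfiles-style text on stdin and print each
# distinct port number found on "sockname:" lines (the local endpoints).
# Assumed line layout: "sockname: AF_INET 0.0.0.0  port: 22".
ports_from_pfiles() {
    awk '
        $1 == "sockname:" {
            for (i = 2; i < NF; i++)
                if ($i == "port:" && !seen[$(i + 1)]++)
                    print $(i + 1)
        }
    '
}

# On a Solaris host this would be fed from the real command:
#   pfiles <pid> | ports_from_pfiles
```

Only "sockname:" lines are read, so remote ("peername:") ports are ignored; whether that is what you want depends on the question being asked.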
pfiles is not the answer to questions like: "Which process is using port <port>?"

Anything else which can be used as lsof replacement?

This message posted from opensolaris.org
Andreas Koppenhoefer wrote:
> pfiles is not the answer to questions like:
> "Which process is using port <port>?"
>
> Anything else which can be used as lsof replacement?

Not unless it (finally) arrived while I wasn't looking :) Get your request added to bug 4616466.

- Jeremy
Hi guys

1: The lsof problem with Solaris 10 has a solution: you need to add #include <sys/types32.h> to machine.h and compile.

2: There is an "lsof -i" replacement called "PCP" - see http://www.unix.ms/pcp/

Is bug 4616466 still valid?

Thanks
Shi
Tori Valdez writes:
> Hi guys
>
> 1: The lsof problem with Solaris 10 has a solution: You need to include #include <sys/types32.h> in machine.h and compile.
>
> 2: There is an "lsof -i" replacement called "PCP" - see http://www.unix.ms/pcp/

Interesting, though lsof does a lot more than just TCP.

> Is bug 4616466 still valid?

Not sure what "valid" means in this context, but it's still an open RFE. It'll be harder to implement than the submitter suggested because no _one_ process necessarily has any given socket open. It's zero, one, or many processes.

--
James Carlson, Solaris Networking              <james.d.carlson at sun.com>
Sun Microsystems / 1 Network Drive         71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677
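The point that a socket may be held by zero, one, or many processes means that any port lookup has to return a list of pids rather than a single one. A minimal sketch of that grouping follows, assuming the data has already been reduced to "pid port" pairs; that intermediate format and the name pids_by_port are made up for illustration and are not something any Solaris tool emits.

```shell
#!/bin/sh
# Hypothetical helper: read "pid port" pairs on stdin and print one line
# per port listing every pid that holds it, in first-seen order.
pids_by_port() {
    awk '
        NF >= 2 {
            if ($2 in pids) {
                pids[$2] = pids[$2] "," $1
            } else {
                order[++n] = $2
                pids[$2] = $1
            }
        }
        END {
            for (i = 1; i <= n; i++)
                print order[i] ": " pids[order[i]]
        }
    '
}
```

A port with no holder simply never appears in the output, which covers the "zero" case.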
I think you can compile lsof from source code and run it under Solaris. If I find the link for how to do it, I'll post it here later.
A few comments on what has already been posted:

pfiles
It is true that you can find the same information that lsof produces by using a script built on the Solaris 'pfiles' command. One such example is "PCP" (http://www.unix.ms/pcp/). I have to warn against this. _pfiles will stop the process while it is being investigated. Many processes will not like this._ I've screwed up quite a few daemons by executing a 'pfiles' against them while they were running. Bottom line: pfiles is not a solution to the problem on a production system. Period!

lsof
Yes, lsof exists for Solaris and you can get a pre-compiled binary, for example from www.sunfreeware.com. However:
(1) It does not work inside a zone.
(2) It is not stable because it is written against parts of Solaris that are undocumented/evolving/unsupported.
Personally I can live with (2), but not with (1), because these days just about everything on Solaris is running inside a zone (at least when you are talking about servers in a corporate landscape).

So unless any of the DTrace probes eventually come up with something that can generate this information, I do not see any way forward for Solaris on this one.

Solaris needs a solution for this.
pfiles should be rewritten to not stop processes. I had to go look at the code to make sure you were right on this. If lsof can gather open file info without stopping processes, why can't pfiles do that? Or, if there is some additional reliability/consistency to be gained by stopping the process, perhaps pfiles could be modified to have an option that causes it to stop the process, but by default doesn't, just to be safe.

But I'm now off subject for the DTrace list.

IMHO,
Chip

> -----Original Message-----
> From: dtrace-discuss-bounces at opensolaris.org [mailto:dtrace-discuss-bounces at opensolaris.org] On Behalf Of Lars Bruun-Hansen
> Sent: Friday, December 12, 2008 5:05 AM
> To: dtrace-discuss at opensolaris.org
> Subject: Re: [dtrace-discuss] lsof vs Dtrace
>
> A few comments to what has already been posted:
> [...]
Chip Bennett writes:
> Pfiles should be rewritten to not stop processes. I had to go look at
> the code to make sure you were right on this. If lsof can gather open
> file info without stopping processes, why can't pfiles do that.

lsof does it because it reads the volatile kernel structures on the running system. Often that works because things aren't changing right at the moment when you look at them. But it's also possible that you get back garbage.

pfiles stops the process because it uses the debugging interfaces, just as (say) mdb or gdb.

> Or, if there is some additional reliability/consistency to be gained by
> stopping the process, perhaps pfiles could be modified to have an option
> that causes it stop the process, but by default doesn't, just to be
> safe.

The long-term answer, I think, is that we need stable kernel interfaces that will provide the information that lsof needs to work.

--
James Carlson, Solaris Networking              <james.d.carlson at sun.com>
Sun Microsystems / 35 Network Drive        71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677
On Fri, Dec 12, 2008 at 09:45:08AM -0500, James Carlson wrote:
> Chip Bennett writes:
> > Pfiles should be rewritten to not stop processes. I had to go look at
> > the code to make sure you were right on this. If lsof can gather open
> > file info without stopping processes, why can't pfiles do that.
>
> lsof does it because it reads the volatile kernel structures on the
> running system. Often that works because things aren't changing right
> at the moment when you look at them. But it's also possible that you
> get back garbage.
>
> pfiles stops the process because it uses the debugging interfaces,
> just as (say) mdb or gdb.

But not kmdb. mdb -k can do everything that pfiles can do but without stopping processes. The only problem with the kmdb approach is that not all sockets are (were, now that Volo has integrated?) associated with file structs, so finding open sockets for kernel-land services required quite a bit more work than merely walking the process table.

I have a very out of date script lying around that did just that.

But the point is that kmdb scripting and lsof are on the same footing.

> > Or, if there is some additional reliability/consistency to be gained by
> > stopping the process, perhaps pfiles could be modified to have an option
> > that causes it stop the process, but by default doesn't, just to be
> > safe.
>
> The long-term answer, I think, is that we need stable kernel
> interfaces that will provide the information that lsof needs to work.

I agree.

Nico
--
Nicolas Williams
2008-Dec-12 16:56 UTC
[dtrace-discuss] Scripted kmdb-based lsof replacement (Re: lsof vs Dtrace)
On Fri, Dec 12, 2008 at 10:28:09AM -0600, Nicolas Williams wrote:
> I have a very out of date script lying around that did just that.

I found it, but all it does is walk the proc table. Still, it's way faster than pfiles and almost[*] as good as pfiles. It seems to work on OpenSolaris 2008.11. I wonder if it will still work on a system with Volo. Also, I wonder if Volo will make it possible to more easily find kernel-land sockets.

The script is attached. It does run way, way faster than pfiles:

% ptime pfexec ./pfilesmdb.ksh \* > /dev/null

real        2.355
user        0.319
sys         0.028
% ptree|awk '{print $1}'|ptime pfexec ksh -c "xargs pfiles" > /dev/null

real       37.620
user        0.828
sys        21.126
%

[*] My script is based entirely on parsing the output of

    print '::walk proc p|::eval "<p::ps; <p::pfiles"!cat'|mdb -k

which means that the current directory and other such references to files, directories, sockets, ... that do not come from an open file descriptor will not be reported. Adding support for those shouldn't be hard though! Please feel free to do it.

Nico
--
-------------- next part --------------
#!/bin/ksh

PROG=${0##*/}

usage () {
    cat <<EOF
Usage: $PROG <prefix-notation boolean expression>
    -a expr1 expr2          expr1 AND expr2
    -o expr1 expr2          expr1 OR expr2
    expr1 expr2 .. exprN    expr1 AND expr2 .. AND exprN

    Terms are of the form <attribute>=<pattern>, or just '*' to match
    any file descriptor.

    Attributes include:

        ftype   (REG, DIR, FIFO, SOCK, CHR, ...)
        fpath   (path, if any)
        fd      (file descriptor number)
        af      (address family)
        addr    (address [source or destination])
        saddr   (source address)
        daddr   (destination address)
        port    (port [source or destination])
        sport   (source port)
        dport   (destination port)

    <pattern> is any Korn Shell glob pattern.

    Example: Look for IPv4 sockets with source or destination
    addresses in 10.0.0.0/8

        $PROG af=AF_INET addr=10.*

    Example: Look for any regular file in /home

        $PROG ftype=REG fpath=/home/\*
EOF
    exit 1
}

# Split one ::pfiles output line ($fdline) into the globals that fdgrep tests.
parsefd () {
    typeset addr port

    set -- $fdline
    [[ $# -gt 0 ]] || return 1

    fd=$1
    ftype=$2
    vnode=$3
    # Reset per-descriptor globals so nothing leaks from the previous line.
    # (Assumed: the archived copy shows only the bare variable names here.)
    fpath= af= saddr= sport= daddr= dport=
    shift 3

    if [[ "$ftype" != SOCK ]]
    then
        fpath=$1
        return 0
    fi

    # SOCK lines: "socket: <af> [addr [port]]" then optionally "remote: ..."
    while [[ $# -gt 0 ]]
    do
        which=$1
        af=$2
        addr= port=
        shift 2
        for i in addr port
        do
            [[ $# -eq 0 || "$1" = remote: ]] && break
            eval "${i}=\$1"
            shift
        done
        case $which in
        socket:) saddr=$addr; sport=$port;;
        remote:) daddr=$addr; dport=$port;;
        esac
        [[ $af = AF_UNIX ]] && fpath=${saddr:-${daddr}}
    done
    return 0
}

fdgrep () {
    typeset attr pat

    # Recursive, simple-minded prefix-notation boolean expr parser/evaluator
    while [[ $# -gt 0 ]]
    do
        # OR expr1 expr2
        if [[ "$1" = '-o' ]]
        then
            shift
            if fdgrep "$1"
            then
                return 0
            else
                shift
                fdgrep "$@"
                return $?
            fi
        fi
        # AND expr1 expr2
        if [[ "$1" = '-a' ]]
        then
            shift
            if fdgrep "$1"
            then
                shift
                fdgrep "$@"
                return $?
            else
                return 1
            fi
        fi
        # Implicit AND expr1 expr2 .. exprN
        if [[ $# -gt 1 ]]
        then
            if fdgrep "$1"
            then
                shift
                continue
            else
                return 1
            fi
        # A bare '*' matches any file descriptor
        elif [[ "$1" = \* ]]
        then
            return 0
        # Path match: a bare term starting with / is a glob on the path
        elif [[ "$1" = /* ]]
        then
            [[ -n "$fpath" ]] || return 1
            eval "[[ \"\$fpath\" = $1 ]]"
            return $?
        # A bare upper-case word is a file type (REG, SOCK, ...)
        elif [[ "$1" = +([A-Z]) ]]
        then
            [[ $ftype = $1 ]]
            return $?
        else
            attr=${1%%=*}
            pat=${1#*=}
            case "$attr" in
            ftype) [[ $ftype = $pat ]]; return $?;;
            fpath) [[ "$fpath" = $pat ]]; return $?;;
            fd)    [[ $fd = $pat ]]; return $?;;
            af)    [[ $af = $pat ]]; return $?;;
            addr)  [[ "$saddr" = $pat || $daddr = $pat ]]; return $?;;
            saddr) [[ "$saddr" = $pat ]]; return $?;;
            daddr) [[ "$daddr" = $pat ]]; return $?;;
            port)  [[ "$sport" = $pat || $dport = $pat ]]; return $?;;
            sport) [[ "$sport" = $pat ]]; return $?;;
            dport) [[ "$dport" = $pat ]]; return $?;;
            *) usage;;
            esac
        fi
    done
}

False () {
    return 127
}

#typeset -ft parsefd fdgrep
#set -x

[[ $# -eq 0 ]] && usage

header_printed=False
print '::walk proc p|::eval "<p::ps; <p::pfiles"!cat'|mdb -k 2>/dev/null|while read line
do
    case "$line" in
    S*)
        pshead="$line"
        read psline;;
    FD*)
        fdhead="$line"
        $header_printed || {
            print "$pshead $fdhead"
            header_printed=:
        }
        while read fdline
        do
            # A new ::ps header marks the start of the next process
            if [[ "$fdline" = S* ]]
            then
                read psline
                break
            fi
            parsefd "$fdline" || continue
            fdgrep "$@" || continue
            print "$psline $fdline"
        done
        ;;
    esac
done
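The socket-parsing step of the script can be tried without a live kernel. The sketch below is a standalone rewrite of that step in plain sh, not the attached script itself. The function name parse_sock_info and the sample line are illustrative, and the "socket: AF addr port remote: AF addr port" layout is inferred from what parsefd expects rather than taken from mdb documentation.

```shell
#!/bin/sh
# Hypothetical standalone version of the socket-parsing step.
# Input ($1): the INFO part of a SOCK line, assumed to look like
#   socket: AF_INET 10.0.0.5 22 remote: AF_INET 10.0.0.9 51515
# Output: sets af, saddr, sport, daddr, dport (empty when absent).
parse_sock_info() {
    af= saddr= sport= daddr= dport=
    # Rely on word splitting to turn the line into positional parameters.
    set -- $1
    while [ $# -gt 0 ]; do
        which=$1
        shift
        [ $# -gt 0 ] || break
        af=$1
        shift
        addr= port=
        # Address and port are optional; stop at the next section label.
        if [ $# -gt 0 ] && [ "$1" != remote: ]; then addr=$1; shift; fi
        if [ $# -gt 0 ] && [ "$1" != remote: ]; then port=$1; shift; fi
        case $which in
            socket:) saddr=$addr; sport=$port ;;
            remote:) daddr=$addr; dport=$port ;;
        esac
    done
}
```

After a call such as parse_sock_info "socket: AF_INET 0.0.0.0 80", the variables can be tested with ordinary shell comparisons, which is roughly what the script's port=, sport= and dport= terms do with glob patterns.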
Nicolas Williams writes:
> But the point is that kmdb scripting and lsof are on the same footing.

Yes, it's exactly the same thing by slightly different means.

The difference, and it's not much, is that a user of mdb looking at the kernel likely knows that unless the target is stopped (as with mdb -K), the data structures are volatile, and that the contents of most of the structures is undocumented territory, so you need to have cscope open in another window while you try this trick.

The user of 'lsof', on the other hand, is likely an administrator who believes he's getting the truth about what his system is actually doing. Unfortunately, that faith is at least a little misplaced due to the nature of reading from live kernel memory.

--
James Carlson, Solaris Networking              <james.d.carlson at sun.com>
Sun Microsystems / 35 Network Drive        71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677
On Fri, Dec 12, 2008 at 11:59:43AM -0500, James Carlson wrote:
> Nicolas Williams writes:
> > But the point is that kmdb scripting and lsof are on the same footing.
>
> Yes, it's exactly the same thing by slightly different means.
>
> The difference, and it's not much, is that a user of mdb looking at
> the kernel likely knows that unless the target is stopped (as with mdb
> -K), the data structures are volatile, and that the contents of most
> of the structures is undocumented territory, so you need to have
> cscope open in another window while you try this trick.
>
> The user of 'lsof', on the other hand, is likely an administrator who
> believes he's getting the truth about what his system is actually
> doing. Unfortunately, that faith is at least a little misplaced due
> to the nature of reading from live kernel memory.

Of course. However, I think in general administrators will prefer not-quite-the-truth fast over the-truth-seconds-ago slow. (See my follow-up to myself, which includes the script and timing.)

Nico
--
COW snap-shots for kernel memory. :-)))))

Chip

> -----Original Message-----
> From: dtrace-discuss-bounces at opensolaris.org [mailto:dtrace-discuss-bounces at opensolaris.org] On Behalf Of James Carlson
> Sent: Friday, December 12, 2008 11:00 AM
> To: Nicolas Williams
> Cc: dtrace-discuss at opensolaris.org; Lars Bruun-Hansen
> Subject: Re: [dtrace-discuss] lsof vs Dtrace
>
> Nicolas Williams writes:
> > But the point is that kmdb scripting and lsof are on the same footing.
>
> Yes, it's exactly the same thing by slightly different means.
> [...]
BTW, in case I wasn't clear, I agree that we need interfaces for listing open resources that use locking to produce correct output. I don't think stopping processes to produce that output is a good idea though.

Nico
--
Lars Bruun-Hansen wrote:
> pfiles
> It is true that you can find the same information as what lsof is
> producing by using a script that uses the Solaris 'pfiles' command.
> One such example is "PCP" (http://www.unix.ms/pcp/). I have to warn
> against this. _pfiles will stop the process while it is being
> investigated. Many processes will not like this._ I've screwed up
> quite a few daemons by executing a 'pfiles' against them while they
> were running. Bottom line: pfiles is not a solution to the problem
> on a production system. Period !

It would be really useful if you could elaborate on how pfiles screwed up your daemons. Though pfiles will stop a process to get its information, it does so for a very short period of time and in a fashion that should be undetectable to the target process. Unless a process has real-time scheduling constraints, it shouldn't be possible for it to tell that pfiles has stopped it.

If you are seeing some problem where processes are malfunctioning because of pfiles, it is a bug in pfiles that needs to be fixed.

Dave
Nicolas Williams wrote:
> On Fri, Dec 12, 2008 at 09:45:08AM -0500, James Carlson wrote:
>> Chip Bennett writes:
>>> Pfiles should be rewritten to not stop processes. I had to go look at
>>> the code to make sure you were right on this. If lsof can gather open
>>> file info without stopping processes, why can't pfiles do that.
>>
>> lsof does it because it reads the volatile kernel structures on the
>> running system. Often that works because things aren't changing right
>> at the moment when you look at them. But it's also possible that you
>> get back garbage.
>>
>> pfiles stops the process because it uses the debugging interfaces,
>> just as (say) mdb or gdb.
>
> But not kmdb. mdb -k can do everything that pfiles can do but without
> stopping processes. The only problem with th kmdb approach is that not
> all sockets are (were, now that Volo has integrated?) associated with
> file structs, so finding open sockets for kernel-land services required
> quite a bit more work than merely walking the process table.

For the record: 'kmdb' and 'mdb -k' are different beasts. kmdb == mdb -K == "boot with -k" => "console only" == "stop the kernel in its tracks". mdb -k is "have a gander at the still-running kernel".
David Powell wrote:
> Lars Bruun-Hansen wrote:
>> pfiles
>> It is true that you can find the same information as what lsof is
>> producing by using a script that uses the Solaris 'pfiles' command.
>> One such example is "PCP" (http://www.unix.ms/pcp/). I have to warn
>> against this. _pfiles will stop the process while it is being
>> investigated. Many processes will not like this._ I've screwed up
>> quite a few daemons by executing a 'pfiles' against them while they
>> were running. Bottom line: pfiles is not a solution to the problem
>> on a production system. Period !
>
> It would be really useful if you could elaborate on how pfiles
> screwed up your daemons. Though pfiles will stop a process to get
> its information, it does so for a very short period of time and in a
> fashion that should be undetectable to the target process. Unless a
> process has real-time scheduling constraints, it shouldn't be
> possible for it to tell that pfiles has stopped it.
>
> If you are seeing some problem where processes are malfunctioning
> because of pfiles, it is bug in pfiles that needs to be fixed.

+1. I'd like to know what the problem(s) are, specifically, too.
On Fri, Dec 12, 2008 at 02:00:25PM -0800, Dan Mick wrote:
> For the record: 'kmdb' and 'mdb -k' are different beasts. kmdb == mdb
> -K == "boot with -k" => "console only" == "stop the kernel in its
> tracks". mdb -k is "have a gander at the still-running kernel".

I knew I'd get in trouble over that nit :-) I should have said "mdb -k". (The script does use mdb -k, of course -- you'd not want it to drop the system into kmdb!)
> > It would be really useful if you could elaborate on how pfiles
> > screwed up your daemons. Though pfiles will stop a process to get
> > its information, it does so for a very short period of time and in a
> > fashion that should be undetectable to the target process. Unless a
> > process has real-time scheduling constraints, it shouldn't be
> > possible for it to tell that pfiles has stopped it.
> >
> > If you are seeing some problem where processes are malfunctioning
> > because of pfiles, it is bug in pfiles that needs to be fixed.

I've seen pfiles take more than 10 seconds to execute. Whether the process was stopped for that whole period I do not know, but as I said, the daemon gets screwed up. We are of course talking about daemons that communicate heavily with the world around them. "Screwed up" means that the daemon would no longer be communicating with resource X, presumably because resource X thinks that it has gone away; that is, the resources the daemon communicates with have simply not been programmed to expect such long outages in the communication (my guess). All I can say is that the daemon would need to be restarted after this.

I admit that I've also run pfiles many times against heavily communicating daemons where this has not been an issue. But to me it feels like "do you feel lucky today?" when executing pfiles.

The man page for the p-tools (e.g. man pfiles) actually contains a warning about pfiles, pldd and pstack, so others must have had my experience, I reckon.

Lars
Hey Kristof,

sorry if your question has already been answered, but I saw this post and decided to respond anyway. I had a similar question, which resulted in the script below. I hope this helps you.

-------------------------------begin script
#!/bin/ksh
# get process id for port

line='-------------------------------------------------------------------------'
pids=`/usr/bin/ps -ef | sed 1d | awk '{print $2}'`

if [ $# -eq 0 ]; then
    read ans?"Enter port you would like to know the pid for: "
else
    ans=$1
fi

# Run pfiles against every process and report those with a socket on the port.
# The grep pattern is a prefix match, so port 80 also matches 8080.
for f in $pids
do
    /usr/proc/bin/pfiles $f 2>/dev/null | /usr/xpg4/bin/grep -q "port: $ans"
    if [ $? -eq 0 ] ; then
        echo "$line\nPort: $ans is being used by PID: \c"
        /usr/bin/ps -o pid -o args -p $f | sed 1d
    fi
done
exit 0
----------------------------end script
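One caveat with grepping for "port: $ans" is that the pattern also matches longer port numbers that begin with the same digits, so a search for 80 reports processes bound to 8080 as well. A minimal sketch of an exact comparison follows, assuming the same "port: N" layout in pfiles output. The helper name has_port is hypothetical and not part of the script above, and on Solaris the awk used would need to be one that accepts -v (nawk or /usr/xpg4/bin/awk).

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the pfiles-style text on stdin
# contains a "port:" field whose value is exactly $1, and fail otherwise.
has_port() {
    awk -v want="$1" '
        {
            for (i = 1; i < NF; i++)
                if ($i == "port:" && $(i + 1) == want)
                    found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Inside the loop of the script above, the grep stage could then become:
#   /usr/proc/bin/pfiles $f 2>/dev/null | has_port "$ans"
```

Comparing whole fields rather than matching a substring avoids the false positives without depending on where the port sits on the line.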