FYI, I posted a blog a few days ago about a DTrace provider for NFS that is currently in development:

blogs.sun.com/roller/page/samf?entry=a_dtrace_provider_for_nfs

Let's discuss any questions, comments, etc. here. I also advertised this on nfs-discuss at opensolaris.org. Naturally, I would expect the discussion here to be more on the specifics of DTrace, and the discussion on nfs-discuss to be more about NFS. Feel free to join either or both discussions.

- Sam
On Mon, Jan 02, 2006 at 09:48:08AM -0700, Sam Falkner wrote:
> FYI, I posted a blog a few days ago about a DTrace provider for NFS
> that is currently in development:
>
> blogs.sun.com/roller/page/samf?entry=a_dtrace_provider_for_nfs
>
> Let's discuss any questions, comments, etc. here. I also advertised
> this on nfs-discuss at opensolaris.org. Naturally, I would expect the
> discussion here to be more on the specifics of DTrace, and the
> discussion on nfs-discuss to be more about NFS. Feel free to join
> either or both discussions.

This is way cool!

Things to add to args[0]:

 - cred_t (server-side)
 - RPC credentials:
    - RPCSEC_GSS info:
       - RPCSEC_GSS handle
       - sec triple (mechanism OID, QoP, GSS protection services)
       - client principal display name
       - server principal display name

The actual buffer containing an operation's octets would be nice.

Nico
--
On Jan 2, 2006, at 10:44 AM, Nicolas Williams wrote:
> On Mon, Jan 02, 2006 at 09:48:08AM -0700, Sam Falkner wrote:
>> FYI, I posted a blog a few days ago about a DTrace provider for NFS
>> that is currently in development:
>>
>> blogs.sun.com/roller/page/samf?entry=a_dtrace_provider_for_nfs
>>
>> Let's discuss any questions, comments, etc. here. I also advertised
>> this on nfs-discuss at opensolaris.org. Naturally, I would expect the
>> discussion here to be more on the specifics of DTrace, and the
>> discussion on nfs-discuss to be more about NFS. Feel free to join
>> either or both discussions.
>
> This is way cool!

Thanks!

> Things to add to args[0]:
>
>  - cred_t (server-side)
>  - RPC credentials:
>     - RPCSEC_GSS info:
>        - RPCSEC_GSS handle
>        - sec triple (mechanism OID, QoP, GSS protection services)
>        - client principal display name
>        - server principal display name

Yes -- these seem very good.

> The actual buffer containing an operation's octets would be nice.

Do you mean "everything sent over the channel", i.e. over-the-wire except without any encryption? I'll have a look at how this would be implemented. I'll let you know...

- Sam
On Mon, Jan 02, 2006 at 12:55:43PM -0700, Sam Falkner wrote:
> On Jan 2, 2006, at 10:44 AM, Nicolas Williams wrote:
> >The actual buffer containing an operation's octets would be nice.
>
> Do you mean "everything sent over the channel", i.e. over-the-wire
> except without any encryption? I'll have a look at how this would be
> implemented. I'll let you know...

You can get at said octets in the clear because where you'd get at them you're above RPCSEC_GSS (and well above SSHv2 and/or ESP/AH [IPsec]).

So, getting the data in the clear is not the problem so much as finding the right places to put the probes.

On the client-side rfs4call() looks like the right place.

On the server-side rfs4_compound() looks like the right place. And rfs4_compound() is where you'd get the cred_t, client principal and sec triple info.

Nico
--
On Mon, Jan 02, 2006 at 02:11:37PM -0600, Nicolas Williams wrote:
> On Mon, Jan 02, 2006 at 12:55:43PM -0700, Sam Falkner wrote:
> > On Jan 2, 2006, at 10:44 AM, Nicolas Williams wrote:
> > >The actual buffer containing an operation's octets would be nice.
> >
> > Do you mean "everything sent over the channel", i.e. over-the-wire
> > except without any encryption? I'll have a look at how this would be
> > implemented. I'll let you know...

And BTW, one reason for this is so you can see the actual RPCs in the clear even if you're using privacy protection. Hooks can be provided so that privacy protected RPCs can be decoded properly (e.g., ethereal has functionality of this sort for Kerberos V, though you'd need better hooks than what it uses if the mechanism provided PFS) -- but it should be easier to just get the cleartext from the right place via dtrace, no?

Or is this just not a problem because privacy or no privacy should make no difference in the otw ops? You could always debug without privacy protection I suppose...
On Jan 2, 2006, at 1:33 PM, Nicolas Williams wrote:
> On Mon, Jan 02, 2006 at 02:11:37PM -0600, Nicolas Williams wrote:
>> On Mon, Jan 02, 2006 at 12:55:43PM -0700, Sam Falkner wrote:
>>> On Jan 2, 2006, at 10:44 AM, Nicolas Williams wrote:
>>>> The actual buffer containing an operation's octets would be nice.
>>>
>>> Do you mean "everything sent over the channel", i.e. over-the-wire
>>> except without any encryption? I'll have a look at how this would be
>>> implemented. I'll let you know...
>
> And BTW, one reason for this is so you can see the actual RPCs in the
> clear even if you're using privacy protection. Hooks can be provided so
> that privacy protected RPCs can be decoded properly (e.g., ethereal has
> functionality of this sort for Kerberos V, though you'd need better
> hooks than what it uses if the mechanism provided PFS) -- but it should
> be easier to just get the cleartext from the right place via dtrace, no?
>
> Or is this just not a problem because privacy or no privacy should make
> no difference in the otw ops? You could always debug without privacy
> protection I suppose...

No, not always. I've heard of one story where a problem only occurred with krb5p, and the poor fellows working on it (you know who you are!) couldn't use snoop(1m).

So yeah, this is a great idea. In fact, one example script I want to eventually write is a DTrace script that more or less simulates snoop(1m). It'll probably be NFSv4 only, and not showing any lower layers, but it should be useful for those times when privacy is in effect.

- Sam
Hey Sam,

This is obviously great stuff -- thanks for undertaking it!

This might be a dumb question, but your blog seems to focus exclusively on NFSv4. Would supporting a legacy (or future) version of NFS require a new version of the provider? If that's the case, then perhaps the provider name should contain a 4; otherwise, it would be good to provide some documentation on how it might support older versions and how it might be extended to support subsequent versions.

Regarding arguments, I'm not sure that we really want to expose types such as READ4args which are (I believe) currently just implementation details. Would it make sense to invent a new structure or just enumerate the arguments to the operation in args[1..N]?

I've always been (and continue to be) a little confused by compound operations. My understanding is that a compound operation is a single command that can invoke many operations at one go. I can imagine someone enabling the nfs::op-create:start probe (or whatever) and drawing the wrong conclusions because the create operations were mostly part of compound operations. Assuming I haven't constructed a completely farcical scenario, would it make sense to have a compound operation fire the op-compound probe as well as any probes for operations collected by that compound operation?

Presumably this new provider requires some changes to the nfs and nfssrv kernel modules as well as the addition of a new kernel module for the provider itself. It would be great if you could release those binaries and source so we could start to experiment with them on our own systems.

Thanks.

Adam

--
Adam Leventhal, Solaris Kernel Development       blogs.sun.com/ahl
On Jan 2, 2006, at 7:07 PM, Adam Leventhal wrote:
> Hey Sam,
>
> This is obviously great stuff -- thanks for undertaking it!

No prob!

> This might be a dumb question, but your blog seems to focus exclusively
> on NFSv4. Would supporting a legacy (or future) version of NFS require
> a new version of the provider? If that's the case, then perhaps the
> provider name should contain a 4; otherwise, it would be good to provide
> some documentation on how it might support older versions and how it
> might be extended to support subsequent versions.

I wasn't going to try to support older versions in the near term, but I didn't want to exclude the possibility either. I hadn't really thought about whether a different provider would be a good idea, but now I think that it might. An NFSv3 DTrace provider's probes would probably fall into a different pattern than v4, so it should probably be separated into another provider.

For now, I'll add a "4" to the end of the providers' names.

> Regarding arguments, I'm not sure that we really want to expose types
> such as READ4args which are (I believe) currently just implementation
> details. Would it make sense to invent a new structure or just enumerate
> the arguments to the operation in args[1..N]?

One reason I favor a structure is that it looks very difficult to me for a DTrace provider to give more than five arguments. So OPEN4args, which has six arguments, would be tough to break into args[1..6].

So I guess there are two questions here. First, is it a good idea to use structures, and second, if it is a good idea, how do we do it?

Structures such as READ4args are defined by RFC 3530 (the document that describes NFSv4). As implemented by OpenSolaris, there may be some extra members in those structures, but someone understanding RFC 3530 could just use the "real" members only. The extra members of the structures would potentially be useful, but it does raise problems as to their stability.

Of course, translators could easily translate them to the "vanilla RFC 3530" versions of the structures. Maybe that's the way to go.

Future minor versions of NFSv4 (e.g. the upcoming 4.1) will introduce new operations, arguments, results, etc. And future minor versions may even deprecate old operations (and hence their arguments). But as long as OpenSolaris supports NFSv4.0, READ4args should be valid.

> I've always been (and continue to be) a little confused by compound
> operations. My understanding is that a compound operation is a single
> command that can invoke many operations at one go.

Yes, this is right.

> I can imagine someone
> enabling the nfs::op-create:start probe (or whatever) and drawing the
> wrong conclusions because the create operations were mostly part of
> compound operations.

Hopefully this wouldn't be confusing, but let's see how it plays out.

> Assuming I haven't constructed a completely farcical
> scenario, would it make sense to have a compound operation fire the
> op-compound probe as well as any probes for operations collected by
> that compound operation?

Yes, that's exactly how it's implemented! There's a hook in the nfs kernel module that corresponds to op-compound. op-compound drives any enabled subordinate probes, e.g. op-setattr. Things like op-setattr can further drive attr-* (attribute based) probes, e.g. attr-size. So, doing a truncate() on the client can drive the client to fire the probes op-compound, op-setattr, and attr-size (and potentially many more).

> Presumably this new provider requires some changes to the nfs and
> nfssrv kernel modules as well as the addition of a new kernel module
> for the provider itself. It would be great if you could release those
> binaries and source so we could start to experiment with them on our
> own systems.

I've got three requests (so far) for this. I'll be working on it! :-)

> Thanks.

Thank you!

- Sam
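[Editorial sketch] The translator idea in the message above (hand script writers only the "vanilla RFC 3530" members and hide anything implementation-specific) can be illustrated outside of D. The Python below is a minimal sketch under stated assumptions: the three READ4args members (stateid, offset, count) come from RFC 3530, but `ImplRead4Args`, its `impl_private` member, and `translate_read4args` are hypothetical names invented for this example and are not the OpenSolaris definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read4Args:
    """The stable READ4args members as defined by RFC 3530."""
    stateid: bytes
    offset: int
    count: int


@dataclass
class ImplRead4Args:
    """Hypothetical in-kernel variant that carries one extra member."""
    stateid: bytes
    offset: int
    count: int
    impl_private: object = None  # assumption: stands in for any unstable extra


def translate_read4args(impl: ImplRead4Args) -> Read4Args:
    """Copy only the RFC-defined members; drop everything else."""
    return Read4Args(stateid=impl.stateid, offset=impl.offset, count=impl.count)


if __name__ == "__main__":
    raw = ImplRead4Args(b"\x00" * 16, offset=4096, count=8192,
                        impl_private="cache hint")
    print(translate_read4args(raw))
```

A script written against `Read4Args` would keep working if the extra members changed or disappeared, which is the stability argument made in the message.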
On Tue, Jan 03, 2006 at 01:17:32PM -0700, Sam Falkner wrote:
> >This might be a dumb question, but your blog seems to focus exclusively
> >on NFSv4. Would supporting a legacy (or future) version of NFS require
> >a new version of the provider? If that's the case, then perhaps the
> >provider name should contain a 4; otherwise, it would be good to provide
> >some documentation on how it might support older versions and how it
> >might be extended to support subsequent versions.
>
> I wasn't going to try to support older versions in the near term, but
> I didn't want to exclude the possibility either. I hadn't really thought
> about whether a different provider would be a good idea, but now I think
> that it might. An NFSv3 DTrace provider's probes would probably fall
> into a different pattern than v4, so it should probably be separated
> into another provider.
>
> For now, I'll add a "4" to the end of the providers' names.

Making it the nfs4 provider seems a little awkward. Would it be worth doing the investigation to see how it might be extended to nfsv3 before finalizing that naming change?

> >Regarding arguments, I'm not sure that we really want to expose types
> >such as READ4args which are (I believe) currently just implementation
> >details. Would it make sense to invent a new structure or just enumerate
> >the arguments to the operation in args[1..N]?
>
> One reason I favor a structure is that it looks very difficult to me
> for a DTrace provider to give more than five arguments. So OPEN4args,
> which has six arguments, would be tough to break into args[1..6].

It's a little tricky, but not actually that difficult to have more than 5 arguments. If you need a hand, feel free to ask. First you need to figure out which interface makes the most sense -- structures or enumerated argument lists.

> Structures such as READ4args are defined by RFC 3530 (the document that
> describes NFSv4). As implemented by OpenSolaris, there may be some extra
> members in those structures, but someone understanding RFC 3530 could just
> use the "real" members only. The extra members of the structures would
> potentially be useful, but it does raise problems as to their stability.
>
> Of course, translators could easily translate them to the "vanilla
> RFC 3530" versions of the structures. Maybe that's the way to go.

I didn't realize those were defined by the standards. In that case I suggest you create a translator as you suggest. I'd also like to see prefixes for the structure members unless the standard also defines those names.

> I've got three requests (so far) for this. I'll be working on it! :-)

Very cool. I look forward to playing with it.

Adam

--
Adam Leventhal, Solaris Kernel Development       blogs.sun.com/ahl
On Mon, Jan 02, 2006 at 06:07:26PM -0800, Adam Leventhal wrote:
> I've always been (and continue to be) a little confused by compound
> operations. My understanding is that a compound operation is a single
> command that can invoke many operations at one go.
                                          ^^^^^^^^^

Yes, though the ops are evaluated in sequence, and there's what you might call variables (two: current and saved filehandles) which are referenced and can be changed by various ops. So COMPOUND is like a very limited programming language.

> I can imagine someone
> enabling the nfs::op-create:start probe (or whatever) and drawing the
> wrong conclusions because the create operations were mostly part of
> compound operations.

They can only be part of a COMPOUND.

> Assuming I haven't constructed a completely farcical
> scenario, would it make sense to have a compound operation fire the
> op-compound probe as well as any probes for operations collected by that
> compound operation?

That's what I thought this prototype did.

Note that each op has to fire in the order it's processed.

Nico
--
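[Editorial sketch] The "very limited programming language" view of COMPOUND described above can be made concrete with a toy interpreter. The Python below is only a model under stated assumptions, not Solaris code: filehandles are opaque in the real protocol but are modelled here as path strings, only four ops (PUTFH, SAVEFH, LOOKUP, RESTOREFH) are handled, and the probe names other than op-compound are assumed to follow the op-<name> pattern mentioned in the thread. It shows the two "variables" (current and saved filehandle), in-order evaluation, and evaluation stopping at the first failing op.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class CompoundState:
    current_fh: Optional[str] = None   # the "current filehandle" variable
    saved_fh: Optional[str] = None     # the "saved filehandle" variable
    fired: List[str] = field(default_factory=list)  # probes, in firing order


def run_compound(ops: List[Tuple[str, Optional[str]]]) -> Tuple[str, CompoundState]:
    """Evaluate ops in sequence; stop at the first op that fails."""
    state = CompoundState()
    state.fired.append("op-compound")            # the enclosing probe fires first
    for name, arg in ops:
        state.fired.append("op-" + name.lower())  # each op fires in processing order
        if name == "PUTFH":
            state.current_fh = arg
        elif state.current_fh is None and name in ("SAVEFH", "LOOKUP"):
            return "NFS4ERR_NOFILEHANDLE", state
        elif name == "SAVEFH":
            state.saved_fh = state.current_fh
        elif name == "LOOKUP":
            state.current_fh = f"{state.current_fh}/{arg}"
        elif name == "RESTOREFH":
            if state.saved_fh is None:
                return "NFS4ERR_RESTOREFH", state
            state.current_fh = state.saved_fh
        else:
            return "NFS4ERR_OP_ILLEGAL", state   # op not modelled in this toy
    return "NFS4_OK", state


if __name__ == "__main__":
    status, st = run_compound([("PUTFH", "/export"), ("SAVEFH", None),
                               ("LOOKUP", "home"), ("RESTOREFH", None)])
    print(status, st.current_fh, st.fired)
```

Because every op is recorded as it is processed, a script that only watches one op-* probe sees that op in isolation, while op-compound gives the enclosing context.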
On Tue, Jan 03, 2006 at 12:23:51PM -0800, Adam Leventhal wrote:
> On Tue, Jan 03, 2006 at 01:17:32PM -0700, Sam Falkner wrote:
> > For now, I'll add a "4" to the end of the providers' names.
>
> Making it the nfs4 provider seems a little awkward. Would it be worth doing
> the investigation to see how it might be extended to nfsv3 before finalizing
> that naming change?

The protocols are quite different though. An NFSv3 server that translates NFSv3 RPCs to NFSv4 compounds might be feasible, but it's not how Solaris does it. The reverse is not feasible. So there's not likely to be any similarity between an NFSv4 and an NFSv2/3 DTrace provider.

NFSv3 and NFSv2, at least, are fairly similar to each other.

Now, the underlying VOPs done by the server to satisfy a client request could be traced separately. Perhaps we ought to have a VFS provider.

Hmmmm! How about having the NFS provider have at least one generic probe pair that fires whenever any RPC is received/replied? Between that and a VFS provider one could then come up with useful DTrace scripts that work with all NFS versions.

Nico
--
G'Day Sam,

On Mon, 2 Jan 2006, Sam Falkner wrote:
> FYI, I posted a blog a few days ago about a DTrace provider for NFS
> that is currently in development:
>
> blogs.sun.com/roller/page/samf?entry=a_dtrace_provider_for_nfs
>
> Let's discuss any questions, comments, etc. here. I also advertised
> this on nfs-discuss at opensolaris.org. Naturally, I would expect the
> discussion here to be more on the specifics of DTrace, and the
> discussion on nfs-discuss to be more about NFS. Feel free to join
> either or both discussions.

This is good news. :-)

At the moment it's easy to fetch NFS client I/O activity from the io provider, but I'd like to trace server activity as well. I wrote a script called nfswizard.d in the DTraceToolkit to do something useful with io:nfs:: (I should rename it nfsclientwizard.d), its output is,

---
# nfswizard.d
Sampling... Hit Ctrl-C to end.
^C
NFS Client Wizard. 2005 Dec  2 14:59:07 -> 2005 Dec  2 14:59:14

Read:   4591616 bytes (4 Mb)
Write:  0 bytes (0 Mb)

Read:   640 Kb/sec
Write:  0 Kb/sec

NFS I/O events:    166
Avg response time: 8 ms
Max response time: 14 ms

Response times (us):
           value  ------------- Distribution ------------- count
             128 |                                         0
             256 |                                         1
             512 |@@@                                      14
            1024 |@                                        4
            2048 |@@@@@@@                                  30
            4096 |@@@@@                                    20
            8192 |@@@@@@@@@@@@@@@@@@@@@@@                  97
           16384 |                                         0

Top 25 files accessed (bytes):
   PATHNAME                                     BYTES
   /net/mars/var/tmp/adm/vold.log                4096
   /net/mars/var/tmp/adm/uptime                  4096
   /net/mars/var/tmp/adm/mail                    4096
   /net/mars/var/tmp/adm/authlog.5               4096
   /net/mars/var/tmp/adm/ftpd                   12288
   /net/mars/var/tmp/adm/spellhist              16384
   /net/mars/var/tmp/adm/messages               16384
   /net/mars/var/tmp/adm/utmpx                  20480
   /net/mars/var/tmp/adm/ftpd.2                 20480
   /net/mars/var/tmp/adm/ftpd.3                 20480
   /net/mars/var/tmp/adm/ftpd.1                 24576
   /net/mars/var/tmp/adm/ftpd.0                 24576
   /net/mars/var/tmp/adm/lastlog                28672
   /net/mars/var/tmp/adm/ipf                    61440
   /net/mars/var/tmp/adm/loginlog               69632
   /net/mars/var/tmp/adm/ipf.4                  73728
   /net/mars/var/tmp/adm/messages.20040906      81920
   /net/mars/var/tmp/adm/ipf.3                 102400
   /net/mars/var/tmp/adm/ipf.1                 110592
   /net/mars/var/tmp/adm/ipf.5                 114688
   /net/mars/var/tmp/adm/ipf.2                 114688
   /net/mars/var/tmp/adm/ipf.0                 122880
   /net/mars/var/tmp/adm/route.log             266240
   /net/mars/var/tmp/adm/pppd.log              425984
   /net/mars/var/tmp/adm/wtmpx                2842624
---

You may find the details I choose to examine interesting. Anyway, this sort of information would be great (and much more useful) from a server perspective.

cheers,

Brendan

[Sydney, Australia]
A tiny niggle... If you have the option, capture the latency and RTT as well.

The latency is the measure of how much time passes before the first byte arrives, and consists of
 - one round-trip-time (RTT),
 - the time the request sits in queue before it's serviced, and
 - the time the program has to spend thinking before it has anything to send.

If you have them separate, you can do analysis on the response time without queuing and without the RTT, and predict
 - the response time under increasing load (and queuing)
 - the response time with a longer or slower network

It also helps when computing throughput in bytes per second (instead of in transactions per second), as it's usually better to do bytes/transfer_time than bytes/RT, where transfer time is RT - latency. The latter number can be compared directly to the network throughput, to see what percentage we're using up.

I'd just list the averages under the existing ones, and leave the graph the same:

Read:   4591616 bytes (4 Mb)
Write:  0 bytes (0 Mb)

Read:   640 Kb/sec  -- this might change
Write:  0 Kb/sec

NFS I/O events:    16
Avg response time: 8 ms
Max response time: 14 ms
Avg latency:       1 ms
Avg RTT:           1 ms

--dave (who's writing a paper on this in his Copious Spare Time) c-b

--
David Collier-Brown,      | Always do right. This will gratify
Sun Microsystems, Toronto | some people and astonish the rest
davecb at canada.sun.com  |                      -- Mark Twain
(416) 263-5733 (x65733)   |
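[Editorial sketch] The arithmetic in Dave's message (latency as RTT plus queue time plus think time, transfer time as response time minus latency, and throughput as bytes over transfer time) can be written out as a few lines of Python. This is a minimal sketch that follows the definitions exactly as stated in the message; the function names and the sample numbers are assumptions made for illustration, not measurements.

```python
def latency_s(rtt_s: float, queue_s: float, think_s: float) -> float:
    """Time before the first byte arrives: one RTT + queueing + think time."""
    return rtt_s + queue_s + think_s


def transfer_time_s(response_time_s: float, latency: float) -> float:
    """Transfer time as defined in the message: RT - latency."""
    t = response_time_s - latency
    if t <= 0:
        raise ValueError("response time must exceed latency")
    return t


def throughput_bytes_per_s(nbytes: int, response_time_s: float,
                           latency: float) -> float:
    """bytes / transfer_time, rather than bytes / response time."""
    return nbytes / transfer_time_s(response_time_s, latency)


if __name__ == "__main__":
    # Hypothetical single 4096-byte read with an 8 ms response time.
    lat = latency_s(rtt_s=0.001, queue_s=0.0005, think_s=0.0005)
    print("latency (s):", lat)
    print("bytes/RT:", 4096 / 0.008)
    print("bytes/transfer_time:", throughput_bytes_per_s(4096, 0.008, lat))
```

Keeping the three latency components as separate inputs is what allows the "what if the queue grows" or "what if the network is slower" predictions the message describes: change one input and recompute.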
G'Day Dave,

On Fri, 6 Jan 2006, David Collier-Brown wrote:
> A tiny niggle... If you have the option, capture the latency
> and RTT as well.

Please niggle away. :)

> The latency is the measure of how much time passes before
> the first byte arrives, and consists of
> - one round-trip-time (RTT),
> - the time the request sits in queue before it's serviced, and
> - the time the program has to spend thinking before it
>   has anything to send.

This makes sense to me. So, rather than measuring the specific overheads when initialising an NFS request (which could get out of hand), we calculate latency as a useful and close enough estimate.

Starting with the obvious, I could use,

   time(latency) = time(1st nfs_read:return) - time(nfs_open:entry)
   time(RTT) = time(nfs_read:return) - time(nfs_read:entry)

so that latency includes work performed from the nfs_open to the first byte received, and RTT is a read time. RTT could be calculated as an average for all RTTs measured.

But. nfs_read can return straight from the cache, undercounting latency and RTT. I could look at nfs_bio to guarantee a network event - measuring nfs_open:entry to the 1st nfs_bio:return. This would be better, but then could overcount latency if there were several cached reads before the first nfs_bio. Hmm.

Being a bit creative,

   time(latency) = time(1st nfs_read:entry) - time(nfs_open:entry) +
                   time(1st nfs_bio:return) - time(1st nfs_bio:entry)
   time(RTT) = time(nfs_bio:return) - time(nfs_bio:entry)

So that latency includes the nfs_open to 1st event overhead, plus the first RTT. And now to include NFS writes,

   time(latency) = MIN(time(1st nfs_read:entry), time(1st nfs_write:entry)) -
                   time(nfs_open:entry) +
                   time(1st nfs_bio:return) - time(1st nfs_bio:entry)

... I could measure RTT closer to the network driver by using io:nfs::done and io:nfs::start. I should also include other ops apart from read/write.

> If you have them separate, you can do analysis on the
> response time without queuing and without the RTT, and predict
> - the response time under increasing load (and queuing)
> - the response time with a longer or slower network
>
> It also helps when computing throughput in bytes per second
> (instead of in transactions per second), as it's usually better
> to do bytes/transfer_time than bytes/RT, where transfer time
> is RT - latency.

I understand this point - so that we know what the network interface has been asked to do. No big deal, but if latency includes one RTT, then wouldn't transfer time be RT - latency + 1 RTT?

> The latter number can be compared directly
> to the network throughput, to see what percentage we're using up.

... To give an estimate for the percentage consumed - this wouldn't take account of TCP/IP/Ethernet headers, TCP retransmits, etc.

Anyway, using bytes/transfer_time may be fine for understanding maximum throughput and predicting response times - but I'm not sure a throughput percentage is that meaningful in terms of overall network utilisation (100% utilised may be fine, for short bursts).

> I'd just list the averages under the existing ones, and leave the
> graph the same:
>
> Read:   4591616 bytes (4 Mb)
> Write:  0 bytes (0 Mb)
>
> Read:   640 Kb/sec  -- this might change

This is total read Kb per second for that sample. It would certainly change if I'm reporting maximum Kb/sec on the network interface, or some other statistic based on the transfer time. This isn't based on transfer times.

> Write:  0 Kb/sec
>
> NFS I/O events:    16
> Avg response time: 8 ms
> Max response time: 14 ms
> Avg latency:       1 ms
> Avg RTT:           1 ms

Ok, looks good to me.

> --dave (who's writing a paper on this in his Copious Spare Time) c-b

The following is the output of a work in progress tool to experiment with these statistics. I only have fbt::: to play with - good thing the nfs code is well written. :)

# ./nfsrtt.d
[...]
opened /net/mars/var/tmp/creatbyproc_example.txt
        latency 9 ms
        RTT 8 ms
opened /net/mars/var/tmp/crypt_3rot13.c
        latency 1 ms
        RTT 1 ms
opened /net/mars/var/tmp/cstyle
        latency 13 ms
        RTT 13 ms
opened /net/mars/var/tmp/cswstat.d
        latency 11 ms
        RTT 11 ms
^C
Latency (ns):
           value  ------------- Distribution ------------- count
          524288 |                                         0
         1048576 |@                                        1
         2097152 |@@                                       2
         4194304 |@@@@@@@@@                                10
         8388608 |@@@@@@@@@@@@@@@@@@                       19
        16777216 |@@@@@@@@@                                10
        33554432 |@                                        1
        67108864 |                                         0

RTT (ns):
           value  ------------- Distribution ------------- count
          262144 |                                         0
          524288 |@@@                                      28
         1048576 |@@@@@@@@@@                               81
         2097152 |@@@@@@@@@                                74
         4194304 |@@@@@@@@@@@@                             96
         8388608 |@@@                                      23
        16777216 |@@@                                      23
        33554432 |                                         2
        67108864 |                                         0

The script is attached, which is for NFSv2,3,4. Don't take it too seriously yet - I only just started work on this a couple of hours ago. It's a stateful monster with many tentacles. And remember I'm still just looking at client activity from the client server itself.

Next step is to see where else Latency and RTT are useful.

   by process/file/filesystem?
   predicted max transactions = 1 sec / latency?
   predicted max throughput = packet size * (1 sec / RTT)?
   ... I should also print more statistics on packet size.
   ...

thanks for your email,

Brendan

[Sydney, Australia]

-------------- next part --------------
#!/usr/sbin/dtrace -s
/*
 * nfsrtt.d - NFS RTT and Latency statistics. Work in progress.
 *
 * I've keyed on the vnode addr, and assumed that transactions are processed
 * in the same thread. I need to check whether both of these were wise -
 * I'd guess that this needs to be reworked to work correctly on a multi-CPU
 * server.
 */

#pragma D option quiet

inline int DEBUG = 1;

dtrace:::BEGIN
{
        trace("Sampling...\n");
}

/* Record the open time, keyed on the vnode address. */
fbt:nfs:nfs_open:entry,
fbt:nfs:nfs3_open:entry,
fbt:nfs:nfs4_open:entry
{
        self->opened[(uint64_t)*args[0]] = timestamp;
        self->firstread[(uint64_t)*args[0]] = 1;
        DEBUG ? printf("opened %s\n",
            stringof(((struct vnode *)*args[0])->v_path)) : 1;
}

/* First read or write after the open: save the open-to-first-op time. */
fbt:nfs:nfs_read:entry,
fbt:nfs:nfs_write:entry,
fbt:nfs:nfs3_read:entry,
fbt:nfs:nfs3_write:entry,
fbt:nfs:nfs4_read:entry,
fbt:nfs:nfs4_write:entry
/self->firstread[arg0] && self->opened[arg0]/
{
        self->initial[arg0] = timestamp - self->opened[arg0];
        self->firstread[arg0] = 0;
}

/* Start of a network I/O for an opened vnode. */
fbt:nfs:nfs_bio:entry,
fbt:nfs:nfs3_bio:entry,
fbt:nfs:nfs4_bio:entry
/self->opened[(uint64_t)args[0]->b_vp]/
{
        self->rttstart[(uint64_t)args[0]->b_vp] = timestamp;
        self->vn = (uint64_t)args[0]->b_vp;
}

/* First I/O completion after the open: latency = initial + first RTT. */
fbt:nfs:nfs_bio:return,
fbt:nfs:nfs3_bio:return,
fbt:nfs:nfs4_bio:return
/self->initial[self->vn]/
{
        this->latency = self->initial[self->vn] + timestamp -
            self->rttstart[self->vn];
        @Latency = quantize(this->latency);
        self->initial[self->vn] = 0;
        DEBUG ? printf("\tlatency %d ms\n", this->latency / 1000000) : 1;
}

/* Every I/O completion: record the RTT. */
fbt:nfs:nfs_bio:return,
fbt:nfs:nfs3_bio:return,
fbt:nfs:nfs4_bio:return
/self->rttstart[self->vn]/
{
        this->rtt = timestamp - self->rttstart[self->vn];
        @RTT = quantize(this->rtt);
        self->rttstart[self->vn] = 0;
        self->vn = 0;
        DEBUG ? printf("\tRTT %d ms\n", this->rtt / 1000000) : 1;
}

fbt:nfs:nfs_close:entry,
fbt:nfs:nfs3_close:entry,
fbt:nfs:nfs4_close:entry
{
        self->opened[arg0] = 0;
}

dtrace:::END
{
        printf("Latency (ns):");
        printa(@Latency);
        printf("RTT (ns):");
        printa(@RTT);
}
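[Editorial sketch] The "creative" latency formula and the two tentative predictions in Brendan's message can be checked with plain arithmetic. The Python below is a minimal sketch under stated assumptions: timestamps are nanoseconds as D's `timestamp` variable reports them, the function names are invented for this example, and the two prediction formulas are the ones the message poses as open questions, not established results.

```python
NS_PER_SEC = 1_000_000_000


def latency_ns(open_entry: int, first_rw_entry: int,
               first_bio_entry: int, first_bio_return: int) -> int:
    """Open-to-first-read/write overhead plus the first network round trip."""
    return (first_rw_entry - open_entry) + (first_bio_return - first_bio_entry)


def rtt_ns(bio_entry: int, bio_return: int) -> int:
    """Round-trip time of a single nfs_bio call."""
    return bio_return - bio_entry


def predicted_max_transactions_per_s(latency: int) -> float:
    """Tentative: 1 sec / latency."""
    return NS_PER_SEC / latency


def predicted_max_throughput_bytes_per_s(packet_bytes: int, rtt: int) -> float:
    """Tentative: packet size * (1 sec / RTT)."""
    return packet_bytes * (NS_PER_SEC / rtt)


if __name__ == "__main__":
    # Hypothetical timestamps: open at 0, first read 1 ms later,
    # first nfs_bio from 1.5 ms to 9.5 ms.
    lat = latency_ns(0, 1_000_000, 1_500_000, 9_500_000)
    rtt = rtt_ns(1_500_000, 9_500_000)
    print("latency ms:", lat // 1_000_000, "RTT ms:", rtt // 1_000_000)
    print("max transactions/s:", predicted_max_transactions_per_s(lat))
    print("max bytes/s:", predicted_max_throughput_bytes_per_s(8192, rtt))
```

With these made-up timestamps the sketch yields a latency of 9 ms and an RTT of 8 ms, the same shape as the first "opened" entry in the sample output above; the cached reads between the open and the first nfs_bio are deliberately excluded, which is the point of the formula.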
Sam:

I have been practicing DTrace, and I have a few things to share with you. Pointers are very much appreciated!

After working on different platforms with different kernel modules, sometimes even with grid systems and resource virtualization (processors, zones, file systems, etc.), I have realized that the stability of my D code, in Solaris engineering terms, or its portability and reusability, in software engineering terms, is where I need to spend time. The challenges are:

(1) different versions of kernel modules, such as file systems, and different versions of in-kernel function calls

(2) different platform architecture designs and implementations, such as CPU kernel structures. I felt lucky dealing with VM and VFS since they do the abstraction for me. However, CPU is one of the exceptions.

(3) normal host-based instrumentation vs. virtualized environments, such as zone-specific instrumentation.

This may not apply only to NFS, but NFS is one of the use cases I need to keep thinking about. However, the answer may not always be to introduce more kernel abstraction and encapsulation; some work can still be done at the D code level so that my D code is more extensible to new platforms, kernel modules, and kernel functions. Different versions of kernel modules and functions are the real challenge for my D code, since I need to keep my D code in sync with the release life cycle of your modules and functions. Pointers are very much appreciated!

Thanks

This message posted from opensolaris.org
I'm not sure I understand everything that you're saying, but I will try to summarize the thoughts that come to mind when reading your email.

A DTrace provider for NFS doesn't really give you much that you couldn't do with the standard fbt provider. But what it does give you is a layer of abstraction and stability over using fbt.

When I first started thinking about a provider, it was because I was a bit frustrated when trying to debug a somewhat complex problem. I was very tired, and didn't feel like discovering all of the paths through the functions, and the arguments to the functions, that could lead to the event I was trying to trace.

A DTrace provider gives you much in the way of independence from the current implementation of the OpenSolaris NFS code. You can write a script to the NFSv4 protocol, and not care about which functions are being called to implement the protocol. If there are massive changes in the NFSv4 client code between Solaris 10 and the next release, a script written exclusively to the NFS provider won't be broken. Thus, your script will be more stable.

I hope that the NFS providers and other providers help to give your scripts the stability that you need.

- Sam

On Jan 9, 2006, at 7:45 AM, ttoulliu2002 wrote:
> Sam:
>
> I have been practicing DTrace, one of a few things to share
> with you, pointers are very appreciated !
>
> After I have worked on different platforms with different kernel
> modules, some times even with grid systems and resource virtualization
> such as processors, zone, file systems etc. I have realized that
> my D code stability in terms of solaris engineering or portability
> and reusability in terms of software engineering is where
> I need to spend time on.
>
> (1) different versions of kernel modules such as files systems
>     and different versions of in kernel function calls
>
> (2) different platform architecture design and implementation such
>     as cpu kernel structures etc. I felt lucky to deal with VM and
>     VFS since they do the abstraction for me. However, CPU is
>     one of the exception
>
> (3) normal host based instrumentation vs virtualization environment
>     such as zone specific.
>
> This may not only for NFS, but NFS is one of the use cases to I need
> to continue to put my thoughts on. However, It may not always
> introduce more kernel abstraction and encapsulation and some work
> still can be done at D code level so that my D code can be more
> extendable to the new platforms, kernel modules and kernel functions
> as for the different versions of kernel modules and functions, they
> are really the challenges for my D code since it means I need to
> keep my D code to sync with the life cycle of your module and
> functions releases. pointers are very appreciated !
>
> Thanks
> This message posted from opensolaris.org
debabrata das
2009-Dec-03 18:24 UTC
[dtrace-discuss] Trapping nfs client calls again particular mounted filesystem
Hi Sam,

As part of an I/O tuning process, I am trying to capture the different NFS client calls against one of our many mounted file systems. We are using NFS version 3. Could you please let me know a script for doing this? I am not an expert in DTrace. I would like to know the time spent in seconds and also the number of calls.

Thanks
Deba

-- This message posted from opensolaris.org