Hi all,

We just got a shiny new T5440, and one of the first things I noticed (besides the insane number of hardware contexts) is that dtrace takes 30 seconds to a minute to fire up scripts with 5-6 probes, where our T5220 usually takes under a second. Shutdown is similar, though slightly faster.

This is mostly annoying because it takes so long to attach that it's hard to tell when I've collected enough data to be useful. I've started using BEGIN and END probes to prun stopped instances of 'echo' so I can see when probing actually starts and ends...

Any idea why this should be the case? I'm running S10 (142900-03).

Thanks,
Ryan
Just for the record, I had an offline exchange with Ryan on this. The startup time when using dtrace(1M) on systems with many CPUs, such as the T5440, can be large owing to the default principal and aggregation buffer sizes (4MB). To reduce the startup latency, tune these down (if possible) by setting the `bufsize` and `aggsize` tunables accordingly. Obviously, if you set them too low then you'll see drops reported.

Really, dtrace(1M) needs to be smarter about how much memory it's asking for in these large configs.

Jon.

> _______________________________________________
> dtrace-discuss mailing list
> dtrace-discuss at opensolaris.org
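To put rough numbers on the buffer sizes mentioned above, here is a back-of-the-envelope sketch. It assumes one principal and one aggregation buffer per CPU; the CPU counts (256 hardware threads for a fully populated T5440, 64 for a T5220) and the `estimate_mb` helper are assumptions for illustration, not something stated in this thread. The 4k/32k pair is the tuned setting Ryan reports in his follow-up.

```shell
#!/usr/bin/env bash
# Rough estimate of the DTrace buffer memory requested at startup.
# Assumption: one principal and one aggregation buffer per CPU. Buffer
# switching may keep a second copy of each, so real usage can be higher.

# estimate_mb NCPU BUFSIZE_KB AGGSIZE_KB  (hypothetical helper)
# Prints the total in MB, using integer division.
estimate_mb() {
    echo $(( $1 * ($2 + $3) / 1024 ))
}

# CPU counts below are assumptions, not measured values.
echo "T5440, 4m/4m defaults: $(estimate_mb 256 4096 4096) MB"
echo "T5220, 4m/4m defaults: $(estimate_mb 64 4096 4096) MB"
echo "T5440, 4k/32k tuned:   $(estimate_mb 256 4 32) MB"
```

Under these assumptions the default sizes ask for about four times as much memory on the T5440 as on the T5220, spread across four sockets, which is at least consistent with the slower startup.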
The T5440 is at the extreme end of T-series NUMA latency. The T5220, being just one chip, is at the other. Do you have all your vcpus in one domain? If so, that's a lot more cross calls too! Very few apps can scale sensibly on that beast and cope with both factors. I guess DTrace is one of them.

Phil

On 20 Feb 2010, at 09:25, Ryan Johnson <ryanjohn at ece.cmu.edu> wrote:
> We just got a shiny new T5440, and one of the first things I noticed
> (besides the insane number of hardware contexts), is that dtrace
> takes 30 seconds to a minute to fire up scripts with 5-6 probes,
> where our T5220 usually takes under a second. Shutdown is similar,
> though slightly faster.
Following up (the offline exchange wasn't intended): cutting the principal buffer down to 4k gave a startup time of around 10 seconds, and cutting the agg buffer down to 32k in addition cut it down to seven. That's close enough to 4*(T5220 behavior) to satisfy me (though it would be very nice if it were more scalable...).

Regards,
Ryan

On 2/20/2010 11:24 AM, Jon Haslam wrote:
> To reduce the startup latency, tune these down (if possible) by
> setting the `bufsize` and `aggsize` tunables accordingly. Obviously,
> if you set them too low then you'll see drops reported.
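For anyone wanting to try the same settings, a minimal dry-run sketch follows. It only prints the dtrace(1M) command line rather than running it; the script name `myprobes.d` and the `dtrace_cmd` helper are hypothetical, and the sizes are simply the ones reported above.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the dtrace(1M) command line instead of running it.
BUFSIZE=4k          # per-CPU principal buffer size (value from this thread)
AGGSIZE=32k         # per-CPU aggregation buffer size (value from this thread)
SCRIPT=myprobes.d   # hypothetical script name, substitute your own

# dtrace_cmd BUFSIZE AGGSIZE SCRIPT  (hypothetical helper)
# -x sets a DTrace runtime option; bufsize and aggsize are the two
# tunables discussed in this thread.
dtrace_cmd() {
    echo "dtrace -x bufsize=$1 -x aggsize=$2 -s $3"
}

dtrace_cmd "$BUFSIZE" "$AGGSIZE" "$SCRIPT"
```

The same options can also be set inside the D script itself with `#pragma D option bufsize=4k` and `#pragma D option aggsize=32k`. If drops are reported, raise the sizes again.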
On 2/20/2010 11:30 AM, Phil Harman wrote:
> The T5440 is at the extreme end of T-series NUMA latency. The T5220,
> being just one chip, is at the other. Do you have all your vcpus in
> one domain? If so, that's a lot more cross calls too! Very few apps
> can scale sensibly on that beast and cope with both factors. I guess
> DTrace is one of them.

Yes, I'm starting to notice that with our other software also... this is probably off-topic, but is there a way to hint to Solaris that you'd prefer it to map your process's physical memory from a certain socket? Processor sets don't seem to do it (they do kill the crosstalk from x-calls), but I haven't tried domains yet.

Regards,
Ryan
James Litchfield
2010-Feb-20 17:13 UTC
[dtrace-discuss] Dtrace starts very slowly on T5440
Try looking at liblgrp(3LIB). Solaris tries to recognize the various latency domains today (S10 and later) and sets a process's home locality group (lgrp) to the most lightly loaded one at the time of process startup. After that, it tries to allocate memory pages from that lgrp. System V shared memory and mmapped anonymous memory over a certain limit are exceptions. See madvise(3C) for ways to modify memory allocation policies. The Programming Interfaces Guide in the documentation set does a good job of explaining these.

Jim

On 02/20/10 02:35 AM, Ryan Johnson wrote:
> Yes, I'm starting to notice that with our other software also... this
> is probably off-topic, but is there a way to hint at Solaris that
> you'd prefer it to map your process physical memory from a certain
> socket? Processor sets don't seem to do it (they do kill the crosstalk
> from x-calls), but I haven't tried domains yet.