Steve Gonczi
2010-Apr-21 19:51 UTC
[dtrace-discuss] Very sluggish system, yet mpstat/vmstat shows little activity.
I have a fast 8-core x 2-hyperthread Nehalem system with 48 GB of memory. During network throughput testing (multiple ftp server instances running, transferring 200 megabytes/sec over 2x GigE interfaces), the system periodically becomes extremely sluggish. (I can barely type commands at the console, and network throughput drops to nothing.) Once the system gets into this state, it will not recover unless we kill the network load.

I see run queue values of 21, 10, 5 and several hundred minor faults (mf) in vmstat, but otherwise little activity. Occasionally I see 100% kernel/system activity for seconds at a time.

I am trying to figure out who is using CPU in the kernel via dtrace profiling scripts for 10 seconds at a time, e.g.:

  dtrace -n 'profile:::profile-3456 /arg0/ { @[stack(1)] = count(); }'

but I get the dtrace watchdog abort "Abort due to systemic unresponsiveness". I tried to force the script to run via -w, and I just see a very low count (1-2 max) of seemingly random functions sampled.

I wonder if there is perhaps a hardware issue that prevents the dtrace sampling interrupts from being run. I tried to see what is going on with a variety of other tools (all the various *stat commands) but fail to see anything obvious, other than the run queue and the occasional 100% kernel. I typically see an almost idle system: no lock contention, no I/O wait, low system call, context switch, and smtx counts.

Any suggestions regarding what tool / dtrace script to use, or where to look to get to the bottom of the sluggishness, would be much appreciated.

TIA

Steve
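For anyone reproducing the measurement, the one-liner above can also be written as a standalone script with a timed exit (a sketch only; the 3456 Hz rate and the 10-second window are taken from the post):

  #!/usr/sbin/dtrace -s
  /* Sample on-CPU stacks at 3456 Hz; the /arg0/ predicate keeps only
     samples taken while the CPU was running in the kernel. */
  profile:::profile-3456
  /arg0/
  {
          @[stack(1)] = count();
  }

  /* Stop after 10 seconds; the aggregation prints on exit. */
  tick-10s
  {
          exit(0);
  }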
John Higgins - Oracle Corporation
2010-Apr-21 20:09 UTC
[dtrace-discuss] Very sluggish system, yet mpstat/vmstat shows little activity.
You might consider using processor sets and psradm to segregate the CPUs handling network interrupts. That should allow you to instrument further (dtrace/guds/sar) - though once you've freed up some cycles from constantly handling interrupts, you may be done.

--
John Higgins | Principal Support Engineer
Phone: 858.449.5087
Oracle Global Customer Services, North America
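For concreteness, the segregation might look something like this (a hedged sketch with hypothetical CPU ids; the idea is to leave the NIC interrupts on the CPUs outside the set):

  # Create a processor set from four CPUs; the command prints the new
  # set id (e.g. "created processor set 1").
  psrset -c 12 13 14 15

  # Stop those CPUs from servicing device interrupts; interrupt load
  # moves to the CPUs outside the set.
  psradm -i 12 13 14 15

  # Bind the current shell (and any dtrace it spawns) to set 1 so the
  # tools run on the interrupt-free CPUs.
  psrset -b 1 $$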
Steve Gonczi
2010-Apr-21 21:28 UTC
[dtrace-discuss] Very sluggish system, yet mpstat/vmstat shows little activity.
Thank you very much for the suggestion.

I have shut off interrupt processing on 4 out of the 16 CPUs, and now I can run dtrace profiling without the watchdog killing dtrace.

The machine still goes into a bogged-down state. Running

  dtrace -n 'profile:::profile-3456 /arg0/ { @[stack(4)] = count(); }'

for a minute or so while in the "bad state" fingers these stacks as most frequent. Looks mostly idle to me. Yet, as I type, I have to wait seconds for my text to appear on the console, and network throughput is down to nothing.

Ideas, anyone?

  unix`disp_getwork+0xb6
  unix`disp+0x1c2
  unix`swtch+0xa4
  genunix`cv_wait+0x61
    3742

  zfs`lzjb_compress+0xee
  zfs`zio_compress_data+0x8e
  zfs`zio_write_bp_init+0x216
  zfs`zio_execute+0x8d
    3808

  unix`ddi_get32+0x14
  mac`mac_hwring_disable_intr+0x1d
  mac`mac_rx_srs_drain+0x3a2
  mac`mac_rx_srs_process+0x1db
    3819

  unix`0xfffffffffb85074a
  genunix`uiomove+0xe9
  sockfs`socopyoutuio+0x68
  sockfs`so_dequeue_msg+0x4e9
    4886

  unix`i86_monitor+0x10
  unix`cpu_idle_mwait+0xbe
  unix`cpu_acpi_idle+0x8d
  unix`cpu_idle_adaptive+0x19
    6028

  unix`acpi_cpu_cstate+0x2f0
  unix`cpu_acpi_idle+0x82
  unix`cpu_idle_adaptive+0x19
  unix`idle+0x114
    9725

  unix`atomic_and_64+0x4
  unix`acpi_cpu_cstate+0x2d9
  unix`cpu_acpi_idle+0x82
  unix`cpu_idle_adaptive+0x19
    17888

  unix`acpi_cpu_cstate+0x2ae
  unix`cpu_acpi_idle+0x82
  unix`cpu_idle_adaptive+0x19
  unix`idle+0x114
    86695

  unix`i86_mwait+0xd
  unix`cpu_idle_mwait+0xf1
  unix`cpu_acpi_idle+0x8d
  unix`cpu_idle_adaptive+0x19
    1645528
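One hedged follow-up idea, not from the thread: since the aggregate profile looks idle while the box feels wedged, it may be worth checking how the samples are distributed across CPUs:

  # Count profile samples per CPU, split by kernel vs. user context
  # (arg0 is the kernel PC and is zero when a sample lands in user mode;
  # the idle loop counts as kernel).
  dtrace -n 'profile-997 { @[cpu, arg0 != 0 ? "kernel" : "user"] = count(); }'

A CPU that rarely or never appears in the output is not taking the profile interrupt at all, which bears directly on the earlier question about sampling interrupts not being run.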
Jim Mauro
2010-Apr-22 14:03 UTC
[dtrace-discuss] Very sluggish system, yet mpstat/vmstat shows little activity.
Not knowing anything about what your hardware is, or what version of Solaris you are running... the i86_mwait in the stack set off an alarm. There was a change (bug? erratum?) on some Intel processors that caused huge memory latencies. The workaround was to put

  set idle_cpu_prefer_mwait = 0

in /etc/system. Not knowing anything else, you could give it a try before we dig deeper.
Richard Skelton
2010-Apr-22 15:11 UTC
[dtrace-discuss] Very sluggish system, yet mpstat/vmstat shows little
Hi Jim,

If I set "set idle_cpu_prefer_mwait = 0" in /etc/system on an X2270 running Solaris 10 10/09 s10x_u8wos_08a X86, I get a kernel panic upon reboot :-(

Is this setting only for recent versions of OpenSolaris?

Cheers
Richard.
Steve Gonczi
2010-Apr-22 16:35 UTC
[dtrace-discuss] Very sluggish system, yet mpstat/vmstat shows little
Hi Guys,

I have just tried the set idle_cpu_prefer_mwait = 0 setting as well, and was unable to boot up.

I am running OpenSolaris build 134, with a couple of igb (Intel 1 Gbit) NICs and 2 physical CPUs (Xeon 5520), 4 cores each, 2 hyperthreads per core. The NICs are Intel 82576 on-motherboard chips.

Jim was asking for hardware/configuration detail; what other info would be helpful?

Steve
Jim Mauro
2010-Apr-22 17:23 UTC
[dtrace-discuss] Very sluggish system, yet mpstat/vmstat shows little
Groan. I'm such an idiot. I should have been more precise in terms of where and when you set this. Sorry folks.
Steve Gonczi
2010-Apr-22 18:12 UTC
[dtrace-discuss] Very sluggish system, yet mpstat/vmstat shows little
OK.. so what then is the recommended action? Set this from mdb at boot?
Steve Gonczi
2010-Apr-22 19:21 UTC
[dtrace-discuss] Very sluggish system, yet mpstat/vmstat shows little
If I am reading bug 6588054 correctly, these are the two things that need to be done:

1) set cpuid_feature_ecx_exclude=8 in /etc/system
2) boot into kmdb and set the variable idle_cpu_prefer_mwait to zero

I am trying this in a minute...

Steve
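A sketch of what step 2 might look like (hedged; the exact prompt and boot flags depend on the release): in GRUB, append -kd to the kernel$ line so the system stops in kmdb early in boot, then at the kmdb prompt:

  [0]> idle_cpu_prefer_mwait/W 0
  [0]> :c

Here /W writes a 32-bit value into the variable and :c resumes the boot.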
Steve Gonczi
2010-Apr-22 21:44 UTC
[dtrace-discuss] Very sluggish system, yet mpstat/vmstat shows little..
Just an update: setting either of the following variables in /etc/system

  cpuid_feature_ecx_exclude
  idle_cpu_prefer_mwait

causes the machine to go into endless reboot cycles. Both of these variables can be set via mdb -kw just fine.
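For reference, setting them on the live kernel would look roughly like this (a sketch; the poster reports the writes succeed, though whether the cpuid exclusion has any effect after boot-time CPU feature detection is a separate question):

  # /W writes a 32-bit value into the named kernel variable
  echo 'idle_cpu_prefer_mwait/W 0' | mdb -kw
  echo 'cpuid_feature_ecx_exclude/W 8' | mdb -kw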
Steve Gonczi
2010-Apr-22 22:20 UTC
[dtrace-discuss] Very sluggish system, yet mpstat/vmstat shows little..
Another update: the two settings mentioned seem to be beneficial. I no longer see the sluggishness running the same network load as before. I am extending the test duration to overnight to see if this really solved the issue for good.

Thanks to everybody for helping out.
Srihari Venkatesan
2010-Apr-23
Hello..

In the probe entry/return statements, the provider:module:function tuple is used. Say the module is libc; then:

  provider:libc.so.1::entry
  {
  }

If I need to trace function entry points in all other modules but libc, what is the easiest way?

Thanks
Srihari
Hi Srihari:

Maybe you can try something like

  pid$target:::entry
  /probemod != "libc.so.1"/
  {
  }

-Angelo
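As a usage sketch (hypothetical PID, not from the original reply): the predicate filters out libc probes after they fire, and pid$target:::entry instruments every function in the process, which can be expensive in a large binary.

  # Count function entries outside libc in an already-running process
  dtrace -n 'pid$target:::entry /probemod != "libc.so.1"/ { @[probemod, probefunc] = count(); }' -p 1234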
Do these posts have some connection to the thread topic?

BTW, the mdb settings recommended by Jim Mauro actually did not make a difference; I jumped to the wrong conclusion, based on an incorrect test setup.

My sense is that interrupts are not being serviced. Suppose most CPUs have been given a cli instruction and are just ignoring interrupts most of the time.

Does anyone have a suggestion on how to prove/disprove this?
Jim Mauro
2010-Apr-28
Interrupts not being serviced? Do you mean device interrupts - NIC and HBA? Or higher-level interrupts, like errors? That seems a wee bit extreme, but intrstat should help here. Or use dtrace and determine which interrupt service routines are running. If you're referring to device interrupts, the service routines all have "intr" in the function name.

  jimm at pae-g4-nv:~# dtrace -n 'fbt::*intr*:entry { @[probefunc] = count(); }'
  dtrace: description 'fbt::*intr*:entry ' matched 339 probes
  ^C

The above probably seems extreme, but just run it for a few seconds and see which probes actually fire. Fine-tune your probe spec from there.
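A complementary sketch for the specific "CPUs ignoring interrupts" hypothesis (hedged; it relies on the sdt interrupt-start probe that intrstat itself is built on):

  # Count device-interrupt dispatches per CPU; a CPU that never shows up
  # here is not servicing device interrupts at all.
  dtrace -n 'sdt:::interrupt-start { @[cpu] = count(); }'

Comparing this against the intr column of mpstat over the same interval should show whether some CPUs have gone quiet.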