Steve Gonczi
2010-Apr-21 19:51 UTC
[dtrace-discuss] Very sluggish system, yet mpstat/vmstat shows little activity.
I have a fast 8-core x 2-hyperthread Nehalem system with 48 GB of memory. During network throughput testing (multiple ftp server instances running, transferring 200 megabytes/sec over 2x GigE interfaces), the system periodically becomes extremely sluggish. (I can barely type commands at the console, and network throughput drops to nothing.) Once the system gets into this state, it will not recover unless we kill the network load.

I see run queue values of 21, 10, 5 and several hundred minor faults (mf) in vmstat, but otherwise little activity. Occasionally I see 100% kernel/system activity for seconds at a time.

I am trying to figure out who is using CPU in the kernel via dtrace profiling scripts for 10 seconds at a time, e.g.:

  dtrace -n 'profile:::profile-3456 /arg0/ { @[stack(1)] = count(); }'

but I get the dtrace watchdog abort "Abort due to systemic unresponsiveness". I tried to force the script to run via -w, and I just see a very low count (1-2 max) of seemingly random functions sampled.

I wonder if there is perhaps a hardware issue that prevents the dtrace sampling interrupts from being run. I tried to see what is going on with a variety of other tools (all the various *stat commands) but fail to see anything obvious, other than the run queue and the occasional 100% kernel. I typically see an almost idle system: no lock contention, no I/O wait, low system call, context switch, and smtx counts.

Any suggestions regarding what tool / dtrace script to use, or where to look to get to the bottom of the sluggishness, would be much appreciated.

TIA

Steve
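For anyone reproducing the measurement, the one-liner above can also be written as a standalone script with a timed exit (a sketch only; the 3456 Hz rate and the 10-second window are taken from the post):

  #!/usr/sbin/dtrace -s
  /* Sample on-CPU stacks at 3456 Hz; the /arg0/ predicate keeps only
     samples taken while the CPU was running in the kernel. */
  profile:::profile-3456
  /arg0/
  {
          @[stack(1)] = count();
  }

  /* Stop after 10 seconds; the aggregation prints on exit. */
  tick-10s
  {
          exit(0);
  }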
John Higgins - Oracle Corporation
2010-Apr-21 20:09 UTC
[dtrace-discuss] Very sluggish system, yet mpstat/vmstat shows little activity.
You might consider using processor sets and psradm to segregate the CPUs handling network interrupts. That should allow you to instrument further (dtrace/guds/sar) - though once you've freed up some cycles from constantly handling interrupts, you may be done.

--
John Higgins | Principal Support Engineer
Phone: 858.449.5087
Oracle Global Customer Services, North America
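For concreteness, the segregation might look something like this (a hedged sketch with hypothetical CPU ids; the idea is to leave the NIC interrupts on the CPUs outside the set):

  # Create a processor set from four CPUs; the command prints the new
  # set id (e.g. "created processor set 1").
  psrset -c 12 13 14 15

  # Stop those CPUs from servicing device interrupts; interrupt load
  # moves to the CPUs outside the set.
  psradm -i 12 13 14 15

  # Bind the current shell (and any dtrace it spawns) to set 1 so the
  # tools run on the interrupt-free CPUs.
  psrset -b 1 $$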
Steve Gonczi
2010-Apr-21 21:28 UTC
[dtrace-discuss] Very sluggish system, yet mpstat/vmstat shows little activity.
Thank you very much for the suggestion.

I have shut off interrupt processing on 4 out of the 16 CPUs, and now I can run dtrace profiling without the watchdog killing dtrace.

The machine still goes into a bogged-down state. Running

  dtrace -n 'profile:::profile-3456 /arg0/ { @[stack(4)] = count(); }'

for a minute or so while in the "bad state" fingers these stacks as most frequent. Looks mostly idle to me. Yet, as I type, I have to wait seconds for my text to appear on the console, and network throughput is down to nothing.

Ideas, anyone?

  unix`disp_getwork+0xb6
  unix`disp+0x1c2
  unix`swtch+0xa4
  genunix`cv_wait+0x61
    3742

  zfs`lzjb_compress+0xee
  zfs`zio_compress_data+0x8e
  zfs`zio_write_bp_init+0x216
  zfs`zio_execute+0x8d
    3808

  unix`ddi_get32+0x14
  mac`mac_hwring_disable_intr+0x1d
  mac`mac_rx_srs_drain+0x3a2
  mac`mac_rx_srs_process+0x1db
    3819

  unix`0xfffffffffb85074a
  genunix`uiomove+0xe9
  sockfs`socopyoutuio+0x68
  sockfs`so_dequeue_msg+0x4e9
    4886

  unix`i86_monitor+0x10
  unix`cpu_idle_mwait+0xbe
  unix`cpu_acpi_idle+0x8d
  unix`cpu_idle_adaptive+0x19
    6028

  unix`acpi_cpu_cstate+0x2f0
  unix`cpu_acpi_idle+0x82
  unix`cpu_idle_adaptive+0x19
  unix`idle+0x114
    9725

  unix`atomic_and_64+0x4
  unix`acpi_cpu_cstate+0x2d9
  unix`cpu_acpi_idle+0x82
  unix`cpu_idle_adaptive+0x19
    17888

  unix`acpi_cpu_cstate+0x2ae
  unix`cpu_acpi_idle+0x82
  unix`cpu_idle_adaptive+0x19
  unix`idle+0x114
    86695

  unix`i86_mwait+0xd
  unix`cpu_idle_mwait+0xf1
  unix`cpu_acpi_idle+0x8d
  unix`cpu_idle_adaptive+0x19
    1645528
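One hedged follow-up idea, not from the thread: since the aggregate profile looks idle while the box feels wedged, it may be worth checking how the samples are distributed across CPUs:

  # Count profile samples per CPU, split by kernel vs. user context
  # (arg0 is the kernel PC and is zero when a sample lands in user mode;
  # the idle loop counts as kernel).
  dtrace -n 'profile-997 { @[cpu, arg0 != 0 ? "kernel" : "user"] = count(); }'

A CPU that rarely or never appears in the output is not taking the profile interrupt at all, which bears directly on the earlier question about sampling interrupts not being run.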
Jim Mauro
2010-Apr-22 14:03 UTC
[dtrace-discuss] Very sluggish system, yet mpstat/vmstat shows little activity.
Not knowing anything about what your hardware is, or what version of Solaris you are running... the i86_mwait in the stack set off an alarm. There was a change (bug? erratum?) on some Intel processors that caused huge memory latencies. The workaround was to put

  set idle_cpu_prefer_mwait = 0

in /etc/system. Not knowing anything else, you could give it a try before we dig deeper.
Richard Skelton
2010-Apr-22 15:11 UTC
[dtrace-discuss] Very sluggish system, yet mpstat/vmstat shows little
Hi Jim,

If I set "set idle_cpu_prefer_mwait = 0" in /etc/system on an X2270 running Solaris 10 10/09 s10x_u8wos_08a X86, I get a kernel panic upon reboot :-(

Is this setting only for recent versions of OpenSolaris?

Cheers
Richard.
Steve Gonczi
2010-Apr-22 16:35 UTC
[dtrace-discuss] Very sluggish system, yet mpstat/vmstat shows little
Hi Guys,

I have just tried the set idle_cpu_prefer_mwait = 0 setting as well, and was unable to boot up.

I am running OpenSolaris build 134, with a couple of igb (Intel 1 Gbit) NICs and 2 physical CPUs (Xeon 5520), 4 cores each, 2 hyperthreads per core. The NICs are Intel 82576 on-motherboard chips.

Jim was asking for hardware/configuration detail; what other info would be helpful?

Steve
Jim Mauro
2010-Apr-22 17:23 UTC
[dtrace-discuss] Very sluggish system, yet mpstat/vmstat shows little
Groan. I'm such an idiot. I should have been more precise in terms of where and when you set this. Sorry folks.
Steve Gonczi
2010-Apr-22 18:12 UTC
[dtrace-discuss] Very sluggish system, yet mpstat/vmstat shows little
OK.. so what then is the recommended action? Set this from mdb at boot?
Steve Gonczi
2010-Apr-22 19:21 UTC
[dtrace-discuss] Very sluggish system, yet mpstat/vmstat shows little
If I am reading bug 6588054 correctly, these are the two things that need to be done:

1) set cpuid_feature_ecx_exclude=8 in /etc/system
2) boot into kmdb and set the variable idle_cpu_prefer_mwait to zero

I am trying this in a minute...

Steve
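A sketch of what step 2 might look like (hedged; the exact prompt and boot flags depend on the release): in GRUB, append -kd to the kernel$ line so the system stops in kmdb early in boot, then at the kmdb prompt:

  [0]> idle_cpu_prefer_mwait/W 0
  [0]> :c

Here /W writes a 32-bit value into the variable and :c resumes the boot.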
Steve Gonczi
2010-Apr-22 21:44 UTC
[dtrace-discuss] Very sluggish system, yet mpstat/vmstat shows little..
Just an update: setting either of the following variables in /etc/system

  cpuid_feature_ecx_exclude
  idle_cpu_prefer_mwait

causes the machine to go into endless reboot cycles. Both of these variables can be set via mdb -kw just fine.
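For reference, setting them on the live kernel would look roughly like this (a sketch; the poster reports the writes succeed, though whether the cpuid exclusion has any effect after boot-time CPU feature detection is a separate question):

  # /W writes a 32-bit value into the named kernel variable
  echo 'idle_cpu_prefer_mwait/W 0' | mdb -kw
  echo 'cpuid_feature_ecx_exclude/W 8' | mdb -kw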
Steve Gonczi
2010-Apr-22 22:20 UTC
[dtrace-discuss] Very sluggish system, yet mpstat/vmstat shows little..
Another update: the two settings mentioned seem to be beneficial. I no longer see the sluggishness running the same network load as before. I am extending the test duration to overnight to see if this really solved the issue for good.

Thanks to everybody for helping out.
Srihari Venkatesan
2010-Apr-23
Hello..

In the probe entry/return statements, the provider:module:function tuple is used. Say the module is libc; then:

  provider:libc.so.1::entry
  {
  }

If I need to trace function entry points in all other modules but libc, what is the easiest way?

Thanks
Srihari
Hi Srihari:

Maybe you can try something like

  pid$target:::entry
  /probemod != "libc.so.1"/
  {
  }

-Angelo
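As a usage sketch (hypothetical PID, not from the original reply): the predicate filters out libc probes after they fire, and pid$target:::entry instruments every function in the process, which can be expensive in a large binary.

  # Count function entries outside libc in an already-running process
  dtrace -n 'pid$target:::entry /probemod != "libc.so.1"/ { @[probemod, probefunc] = count(); }' -p 1234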
Do these posts have some connection to the thread topic?

BTW, the mdb settings recommended by Jim Mauro actually did not make a difference; I jumped to the wrong conclusion, based on an incorrect test setup.

My sense is that interrupts are not being serviced. Suppose most CPUs have been given a cli instruction and are just ignoring interrupts most of the time.

Does anyone have a suggestion on how to prove/disprove this?
Jim Mauro
2010-Apr-28
Interrupts not being serviced? Do you mean device interrupts - NIC and HBA? Or higher-level interrupts, like errors? That seems a wee bit extreme, but intrstat should help here. Or use dtrace and determine which interrupt service routines are running. If you're referring to device interrupts, the service routines all have "intr" in the function name.

  jimm at pae-g4-nv:~# dtrace -n 'fbt::*intr*:entry { @[probefunc] = count(); }'
  dtrace: description 'fbt::*intr*:entry ' matched 339 probes
  ^C

The above probably seems extreme, but just run it for a few seconds and see which probes actually fire. Fine-tune your probe spec from there.
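A complementary sketch for the specific "CPUs ignoring interrupts" hypothesis (hedged; it relies on the sdt interrupt-start probe that intrstat itself is built on):

  # Count device-interrupt dispatches per CPU; a CPU that never shows up
  # here is not servicing device interrupts at all.
  dtrace -n 'sdt:::interrupt-start { @[cpu] = count(); }'

Comparing this against the intr column of mpstat over the same interval should show whether some CPUs have gone quiet.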