Sašo Kiselkov
2012-Jun-06 07:48 UTC
[zfs-discuss] Occasional storm of xcalls on segkmem_zio_free
So I have this dual 16-core Opteron Dell R715 with 128G of RAM attached to a SuperMicro disk enclosure with 45 2TB Toshiba SAS drives (via two LSI 9200 controllers and MPxIO) running OpenIndiana 151a4, and I'm occasionally seeing a storm of xcalls on one of the 32 VCPUs (>100000 xcalls a second). The machine is pretty much idle, only receiving a bunch of multicast video streams and dumping them to the drives (at a rate of ~40MB/s). At an interval of roughly 1-2 minutes I get a storm of xcalls that completely eats one of the CPUs, so the mpstat line for the CPU looks like:

CPU minf mjf   xcal intr ithr  csw icsw migr smtx srw syscl usr sys  wt idl
 31    0   0 102191    1    0    0    0    0    0   0     0   0 100   0   0

100% busy in the system processing cross-calls. When I tried dtracing this issue, I found that this is the most likely culprit:

dtrace -n 'sysinfo:::xcalls {@[stack()]=count();}'

              unix`xc_call+0x46
              unix`hat_tlb_inval+0x283
              unix`x86pte_inval+0xaa
              unix`hat_pte_unmap+0xed
              unix`hat_unload_callback+0x193
              unix`hat_unload+0x41
              unix`segkmem_free_vn+0x6f
              unix`segkmem_zio_free+0x27
              genunix`vmem_xfree+0x104
              genunix`vmem_free+0x29
              genunix`kmem_slab_destroy+0x87
              genunix`kmem_slab_free+0x2bb
              genunix`kmem_magazine_destroy+0x39a
              genunix`kmem_depot_ws_reap+0x66
              genunix`taskq_thread+0x285
              unix`thread_start+0x8
          3221701

This happens in the sched (pid 0) process. My fsstat output looks like this:

# fsstat /content 1
 new  name   name  attr  attr lookup rddir  read read  write write
 file remov  chng   get   set    ops   ops   ops bytes    ops bytes
    0     0     0   664     0    952     0     0     0    664 38.0M /content
    0     0     0   658     0    935     0     0     0    656 38.6M /content
    0     0     0   660     0    946     0     0     0    659 37.8M /content
    0     0     0   677     0    969     0     0     0    676 38.5M /content

What's even more puzzling is that this apparently happens entirely because of some factor other than userland, since I see no changes to the CPU usage of processes in prstat(1M) when this xcall storm happens, only an increase of loadavg of +1.00 (the busy CPU).

I Googled and found that http://mail.opensolaris.org/pipermail/dtrace-discuss/2009-September/008107.html seems to have been an issue identical to mine; however, it remained unresolved at the time, and it worries me about putting this kind of machine into production use.

Could some ZFS guru please tell me what's going on in segkmem_zio_free? When I disable the writers to the /content filesystem, this issue goes away, so it obviously has something to do with disk I/O. Thanks!

Cheers,
--
Saso
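A quick way to see which CPU each storm lands on, sampled once per second (a sketch; the tick interval and output format are arbitrary):

# dtrace -n 'sysinfo:::xcalls { @[cpu] = count(); }
             tick-1sec { printa("cpu %d: %@d xcalls\n", @); trunc(@); }'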
Richard Elling
2012-Jun-06 14:55 UTC
[zfs-discuss] Occasional storm of xcalls on segkmem_zio_free
On Jun 6, 2012, at 12:48 AM, Sašo Kiselkov wrote:

> So I have this dual 16-core Opteron Dell R715 with 128G of RAM attached
> to a SuperMicro disk enclosure with 45 2TB Toshiba SAS drives (via two
> LSI 9200 controllers and MPxIO) running OpenIndiana 151a4, and I'm
> occasionally seeing a storm of xcalls on one of the 32 VCPUs (>100000
> xcalls a second).

That isn't much of a storm, I've seen > 1M xcalls in some cases...

> The machine is pretty much idle, only receiving a
> bunch of multicast video streams and dumping them to the drives (at a
> rate of ~40MB/s). At an interval of roughly 1-2 minutes I get a storm of
> xcalls that completely eats one of the CPUs [...]
>
> What's even more puzzling is that this apparently happens entirely
> because of some factor other than userland, since I see no changes to
> the CPU usage of processes in prstat(1M) when this xcall storm happens,
> only an increase of loadavg of +1.00 (the busy CPU).

What exactly is the workload doing? Local I/O, iSCSI, NFS, or CIFS?

> I Googled and found that
> http://mail.opensolaris.org/pipermail/dtrace-discuss/2009-September/008107.html
> seems to have been an issue identical to mine; however, it remained
> unresolved at the time, and it worries me about putting this kind of
> machine into production use.
>
> Could some ZFS guru please tell me what's going on in segkmem_zio_free?

It is freeing memory.

> When I disable the writers to the /content filesystem, this issue goes
> away, so it obviously has something to do with disk I/O. Thanks!

Not directly related to disk I/O bandwidth. It can be directly related to other use, such as deletions -- something that causes frees. Depending on the cause, there can be some tuning that applies for large memory machines, where large is >= 96 GB.

 -- richard

--
ZFS Performance and Training
Richard.Elling at RichardElling.com
+1-760-896-4422
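For reference, the ARC's current size and targets can be read from the arcstats kstat (assuming the stock zfs:0:arcstats kstat names on OpenIndiana):

# kstat -p zfs:0:arcstats:size zfs:0:arcstats:c zfs:0:arcstats:c_min zfs:0:arcstats:c_max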
Sašo Kiselkov
2012-Jun-06 15:01 UTC
[zfs-discuss] Occasional storm of xcalls on segkmem_zio_free
On 06/06/2012 04:55 PM, Richard Elling wrote:

> On Jun 6, 2012, at 12:48 AM, Sašo Kiselkov wrote:
>
>> So I have this dual 16-core Opteron Dell R715 with 128G of RAM attached
>> to a SuperMicro disk enclosure with 45 2TB Toshiba SAS drives (via two
>> LSI 9200 controllers and MPxIO) running OpenIndiana 151a4, and I'm
>> occasionally seeing a storm of xcalls on one of the 32 VCPUs (>100000
>> xcalls a second).
>
> That isn't much of a storm, I've seen > 1M xcalls in some cases...

Well, it does make one of the cores 100% busy for around 10-15 seconds, so it is processing at the maximum rate the core can handle. I'd call that a sign of "something bad(tm)" going on.

>> [...]
>>
>> What's even more puzzling is that this apparently happens entirely
>> because of some factor other than userland, since I see no changes to
>> the CPU usage of processes in prstat(1M) when this xcall storm happens,
>> only an increase of loadavg of +1.00 (the busy CPU).
>
> What exactly is the workload doing?

As I wrote above, just receiving multicast video streams and writing them to disk files, nothing else. The fsstat lines above show that -- a pure write load.

> Local I/O, iSCSI, NFS, or CIFS?

Purely local I/O via the two LSI SAS controllers, nothing else.

>> I Googled and found that
>> http://mail.opensolaris.org/pipermail/dtrace-discuss/2009-September/008107.html
>> seems to have been an issue identical to mine [...]
>>
>> Could some ZFS guru please tell me what's going on in segkmem_zio_free?
>
> It is freeing memory.

Yes, but why is this causing a ton of cross-calls?

>> When I disable the writers to the /content filesystem, this issue goes
>> away, so it obviously has something to do with disk I/O. Thanks!
>
> Not directly related to disk I/O bandwidth. It can be directly related to other
> use, such as deletions -- something that causes frees.

When I'm not writing to disk it doesn't happen, so my guess is that it indeed has something to do with (perhaps) ZFS freeing txg buffers or something...

> Depending on the cause, there can be some tuning that applies for large
> memory machines, where large is >= 96 GB.
> -- richard

I'll try and load the machine with dd(1) to the max to see if the access patterns of my software have something to do with it.

Cheers,
--
Saso
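Something along these lines is the kind of load generator meant here (the file name and count are illustrative only; the block size uses the k suffix so the native Solaris dd accepts it):

# dd if=/dev/zero of=/content/dd.test bs=1024k count=102400    # ~100G of sequential writes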
Sašo Kiselkov
2012-Jun-06 15:22 UTC
[zfs-discuss] Occasional storm of xcalls on segkmem_zio_free
On 06/06/2012 05:01 PM, Sašo Kiselkov wrote:

> I'll try and load the machine with dd(1) to the max to see if the access
> patterns of my software have something to do with it.

Tried and tested: any and all write I/O to the pool causes this xcall storm issue, and writing more data to it only exacerbates it (i.e. it occurs more often). I still get storms of over 100k xcalls completely draining one CPU core, but now they happen at 20-30s intervals rather than every 1-2 minutes. Writing to the rpool, however, does not, so I suspect it has something to do with MPxIO and how ZFS is pumping data into the twin LSI 9200 controllers. Each is attached to a different CPU I/O bridge (since the system has two Opterons, it has two I/O bridges, each handling roughly half of the PCI-e links). I did this in the hope of improving performance (since the HT links to the I/O bridges will be more evenly loaded). Any idea if this might be the cause of the issue?

The whole system diagram is:

CPU --(ht)-- IOB --(pcie)-- LSI 9200 --(sas)-,
 |                                            \
(ht)                                        == JBOD
 |                                            /
CPU --(ht)-- IOB --(pcie)-- LSI 9200 --(sas)-'

Cheers,
--
Saso
Richard Elling
2012-Jun-06 15:34 UTC
[zfs-discuss] Occasional storm of xcalls on segkmem_zio_free
On Jun 6, 2012, at 8:01 AM, Sašo Kiselkov wrote:

> On 06/06/2012 04:55 PM, Richard Elling wrote:
>> That isn't much of a storm, I've seen > 1M xcalls in some cases...
>
> Well, it does make one of the cores 100% busy for around 10-15 seconds,
> so it is processing at the maximum rate the core can handle. I'd call
> that a sign of "something bad(tm)" going on.

It is unpleasant, for sure...

> [...]
>
>> What exactly is the workload doing?
>
> As I wrote above, just receiving multicast video streams and writing
> them to disk files, nothing else. The fsstat lines above show that --
> a pure write load.
>
>> Local I/O, iSCSI, NFS, or CIFS?
>
> Purely local I/O via the two LSI SAS controllers, nothing else.
>
>>> Could some ZFS guru please tell me what's going on in segkmem_zio_free?
>>
>> It is freeing memory.
>
> Yes, but why is this causing a ton of cross-calls?

To better understand xcalls, see Adam's excellent blog on the topic:
https://blogs.oracle.com/ahl/entry/debugging_cross_calls_on_opensolaris
The scenario described there probably isn't your problem, but the explanation of xcalls is quite good.

>>> When I disable the writers to the /content filesystem, this issue goes
>>> away, so it obviously has something to do with disk I/O. Thanks!
>>
>> Not directly related to disk I/O bandwidth. It can be directly related to other
>> use, such as deletions -- something that causes frees.
>
> When I'm not writing to disk it doesn't happen, so my guess is that it
> indeed has something to do with (perhaps) ZFS freeing txg buffers or
> something...
>
>> Depending on the cause, there can be some tuning that applies for large
>> memory machines, where large is >= 96 GB.
>> -- richard
>
> I'll try and load the machine with dd(1) to the max to see if the access
> patterns of my software have something to do with it.

OK, the next step is to determine whether this is part of the normal ARC resizing or not. If it is, then tuning can help. I've got a few scripts to do this that aren't posted. Fortunately, Brendan posted his, so look for the arcreap.d script at
http://dtrace.org/blogs/brendan/2012/01/09/activity-of-the-zfs-arc/
If you are seeing lots of xcall activity related to arc_shrink, then this can be tuned.

 -- richard

--
ZFS Performance and Training
Richard.Elling at RichardElling.com
+1-760-896-4422
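A minimal way to check for that correlation without the full script, assuming the fbt probe for arc_shrink exists under that name in this kernel build, is something like:

# dtrace -n 'fbt::arc_shrink:entry { @[stack()] = count(); }
             tick-5sec { printa(@); trunc(@); }'

If the counts spike in step with the xcall storms, the ARC shrinking is the thing to tune.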
Richard Elling
2012-Jun-06 16:35 UTC
[zfs-discuss] Occasional storm of xcalls on segkmem_zio_free
On Jun 6, 2012, at 8:22 AM, Sašo Kiselkov wrote:

> On 06/06/2012 05:01 PM, Sašo Kiselkov wrote:
>> I'll try and load the machine with dd(1) to the max to see if the access
>> patterns of my software have something to do with it.
>
> Tried and tested: any and all write I/O to the pool causes this xcall
> storm issue [...] Each is attached to a different CPU I/O
> bridge (since the system has two Opterons, it has two I/O bridges, each
> handling roughly half of the PCI-e links). I did this in the hope of
> improving performance (since the HT links to the I/O bridges will be
> more evenly loaded). Any idea if this might be the cause of the issue?

No, this is definitely not the cause of the issue. Looks like a reasonable config to me.
 -- richard

> The whole system diagram is: [...]

--
ZFS Performance and Training
Richard.Elling at RichardElling.com
+1-760-896-4422
Jim Mauro
2012-Jun-06 19:43 UTC
[zfs-discuss] Occasional storm of xcalls on segkmem_zio_free
I can't help but be curious about something, which perhaps you verified but did not post.

What the data here shows is:
- CPU 31 is buried in the kernel (100% sys).
- CPU 31 is handling a moderate-to-high rate of xcalls.

What the data does not prove empirically is that the 100% sys time of CPU 31 is spent in xcall handling.

What's the hot stack when this occurs and you run this?

dtrace -n 'profile-997hz /cpu == 31/ { @[stack()] = count(); }'

On Jun 6, 2012, at 3:48 AM, Sašo Kiselkov wrote:

> So I have this dual 16-core Opteron Dell R715 with 128G of RAM attached
> to a SuperMicro disk enclosure with 45 2TB Toshiba SAS drives (via two
> LSI 9200 controllers and MPxIO) running OpenIndiana 151a4, and I'm
> occasionally seeing a storm of xcalls on one of the 32 VCPUs (>100000
> xcalls a second). [...]
>
> Could some ZFS guru please tell me what's going on in segkmem_zio_free?
> When I disable the writers to the /content filesystem, this issue goes
> away, so it obviously has something to do with disk I/O. Thanks!
>
> Cheers,
> --
> Saso
Sašo Kiselkov
2012-Jun-06 20:27 UTC
[zfs-discuss] Occasional storm of xcalls on segkmem_zio_free
On 06/06/2012 09:43 PM, Jim Mauro wrote:

> I can't help but be curious about something, which perhaps you verified but
> did not post.
>
> What the data here shows is:
> - CPU 31 is buried in the kernel (100% sys).
> - CPU 31 is handling a moderate-to-high rate of xcalls.
>
> What the data does not prove empirically is that the 100% sys time of
> CPU 31 is spent in xcall handling.
>
> What's the hot stack when this occurs and you run this?
>
> dtrace -n 'profile-997hz /cpu == 31/ { @[stack()] = count(); }'

Thanks for pointing this out. I ran the probe you specified and attached are the results (I had to chase the xcalls around a bit, because they were jumping around cores as I was trying to insert the probes). As I suspected, the most numerous stack trace is the one which causes cross-calls via the segkmem_zio_free+0x27 code path. While this was going on, I was getting between 80k and 300k xcalls on the core in question. The next most common stack was the one ending in mach_cpu_idle, so I'm not sure why the CPU reported 100% busy (perhaps the xcalls were very expensive in CPU time compared with the 1273 idles).

Cheers,
--
Saso

[Attachment: xc_call.txt.bz2 (application/x-bzip, 7992 bytes) -- <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20120606/f9eaa233/attachment-0001.bin>]
Sašo Kiselkov
2012-Jun-12 13:57 UTC
[zfs-discuss] Occasional storm of xcalls on segkmem_zio_free
Seems the problem is somewhat more egregious than I thought. The xcall storm causes my network drivers to stop receiving IP multicast packets and subsequently my recording applications record bad data, so ultimately, this kind of isn't workable... I need to somehow resolve this... I'm running four on-board Broadcom NICs in an LACP aggregation. Any ideas on why this might be a side-effect? I'm really kind of out of ideas here...

Cheers,
--
Saso
Sašo Kiselkov
2012-Jun-12 15:13 UTC
[zfs-discuss] Occasional storm of xcalls on segkmem_zio_free
On 06/12/2012 03:57 PM, Sašo Kiselkov wrote:

> Seems the problem is somewhat more egregious than I thought. The xcall
> storm causes my network drivers to stop receiving IP multicast packets
> and subsequently my recording applications record bad data, so
> ultimately, this kind of isn't workable... I need to somehow resolve
> this... I'm running four on-board Broadcom NICs in an LACP
> aggregation. Any ideas on why this might be a side-effect? I'm really
> kind of out of ideas here...
>
> Cheers,
> --
> Saso

Just as another datapoint, though I'm not sure if it's going to be much use: I found (via arcstat.pl) that the storms always start when ARC downsizing starts. E.g. I would see the following in "./arcstat.pl 1":

    Time  read  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c
16:29:45    21     0    0     0    0     0    0   111G  111G
16:29:46     0     0    0     0    0     0    0   111G  111G
16:29:47     1     0    0     0    0     0    0   111G  111G
16:29:48     0     0    0     0    0     0    0   111G  111G
16:29:49    5K     0    0     0    0     0    0   111G  111G
(this is where the problem starts)
16:29:50    36     0    0     0    0     0    0   109G  107G
16:29:51    51     0    0     0    0     0    0   107G  107G
16:29:52    10     0    0     0    0     0    0   107G  107G
16:29:53   148     0    0     0    0     0    0   107G  107G
16:29:54    5K     0    0     0    0     0    0   107G  107G
(and after a while, around 10-15 seconds, it stops)

(I omitted the miss and miss% columns to make the rows fit.)

During that time, the network stack is dropping incoming IP multicast UDP packets like crazy, so I see my network input drop by about 30-40%. Truly strange behavior...

Cheers,
--
Saso
Matt Breitbach
2012-Jun-12 15:21 UTC
[zfs-discuss] Occasional storm of xcalls on segkmem_zio_free
I saw this _exact_ problem after I bumped RAM from 48GB to 192GB. Low memory pressure seemed to be the culprit. It happened usually during storage vmotions or something like that which effectively nullified the data in the ARC (sometimes 50GB of data would be purged from the ARC). The system was so busy that it would drop 10Gbit LACP portchannels from our Nexus 5k stack.

I never got a good solution to this other than to set arc_c_min to something that was close to what I wanted the system to use -- I settled on setting it at ~160GB. It still dropped the arcsz, but it didn't try to adjust arc_c and resulted in significantly fewer xcalls.

-Matt Breitbach

-----Original Message-----
From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Sašo Kiselkov
Sent: Tuesday, June 12, 2012 10:14 AM
To: Richard Elling
Cc: zfs-discuss
Subject: Re: [zfs-discuss] Occasional storm of xcalls on segkmem_zio_free

On 06/12/2012 03:57 PM, Sašo Kiselkov wrote:

> Seems the problem is somewhat more egregious than I thought. [...]

Just as another datapoint, though I'm not sure if it's going to be much use: I found (via arcstat.pl) that the storms always start when ARC downsizing starts. [...] During that time, the network stack is dropping incoming IP multicast UDP packets like crazy, so I see my network input drop by about 30-40%. Truly strange behavior...

Cheers,
--
Saso
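A sketch of how that floor would typically be pinned on OpenIndiana -- via the zfs_arc_min tunable in /etc/system rather than patching arc_c_min directly; the value is only an example for a ~160GB floor and a reboot is needed for it to take effect:

* example only: pin the ARC minimum at ~160GB (value in bytes)
set zfs:zfs_arc_min = 0x2800000000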
Roch Bourbonnais
2012-Jun-12 15:37 UTC
[zfs-discuss] Occasional storm of xcalls on segkmem_zio_free
So the xcalls are a necessary part of memory reclaiming, when one needs to tear down the TLB entries mapping the physical memory (which can from here on be repurposed). So the xcalls are just part of this. They should not cause trouble, but they do: they consume a CPU for some time.

That in turn can cause infrequent latency bubbles on the network. A certain root cause of these latency bubbles is that network threads are bound by default, and if the xcall storm ends up on the CPU that a network thread is bound to, it will wait for the storm to pass.

So try unbinding the mac threads; it may help you here.

On 12 June 2012, at 09:57, Sašo Kiselkov wrote:

> Seems the problem is somewhat more egregious than I thought. The xcall
> storm causes my network drivers to stop receiving IP multicast packets
> and subsequently my recording applications record bad data, so
> ultimately, this kind of isn't workable... I need to somehow resolve
> this... I'm running four on-board Broadcom NICs in an LACP
> aggregation. Any ideas on why this might be a side-effect? I'm really
> kind of out of ideas here...
>
> Cheers,
> --
> Saso
Sašo Kiselkov
2012-Jun-12 15:39 UTC
[zfs-discuss] Occasional storm of xcalls on segkmem_zio_free
On 06/12/2012 05:21 PM, Matt Breitbach wrote:

> I saw this _exact_ problem after I bumped RAM from 48GB to 192GB. Low
> memory pressure seemed to be the culprit. It happened usually during storage
> vmotions or something like that which effectively nullified the data in the
> ARC (sometimes 50GB of data would be purged from the ARC). The system was
> so busy that it would drop 10Gbit LACP portchannels from our Nexus 5k stack.
> I never got a good solution to this other than to set arc_c_min to something
> that was close to what I wanted the system to use -- I settled on setting it
> at ~160GB. It still dropped the arcsz, but it didn't try to adjust arc_c
> and resulted in significantly fewer xcalls.

Hmm, how do I do that? I don't have that kind of symbol in the kernel. I'm running OpenIndiana build 151a. My system indeed runs at low memory pressure; I'm simply running a bunch of writers writing files linearly with data they received over IP/UDP multicast sockets.

--
Saso
Sašo Kiselkov
2012-Jun-12 15:52 UTC
[zfs-discuss] Occasional storm of xcalls on segkmem_zio_free
On 06/12/2012 05:37 PM, Roch Bourbonnais wrote:

> So the xcalls are a necessary part of memory reclaiming, when one needs to tear down the TLB entries mapping the physical memory (which can from here on be repurposed).
> So the xcalls are just part of this. They should not cause trouble, but they do: they consume a CPU for some time.
>
> That in turn can cause infrequent latency bubbles on the network. A certain root cause of these latency bubbles is that network threads are bound by default, and
> if the xcall storm ends up on the CPU that a network thread is bound to, it will wait for the storm to pass.

I understand, but the xcall storm only eats up a single core out of a total of 32, plus it's not a single specific one, it tends to change, so what are the odds of hitting the same core as the one on which the mac thread is running?

> So try unbinding the mac threads; it may help you here.

How do I do that? All I can find on interrupt fencing and the like is to simply set certain processors to no-intr, which moves all of the interrupts, and it doesn't prevent the xcall storm from choosing to affect these CPUs either...

--
Saso
Jim Klimov
2012-Jun-12 16:00 UTC
[zfs-discuss] Occasional storm of xcalls on segkmem_zio_free
2012-06-12 19:52, Sašo Kiselkov wrote:

>> So try unbinding the mac threads; it may help you here.
>
> How do I do that? All I can find on interrupt fencing and the like is to
> simply set certain processors to no-intr, which moves all of the
> interrupts, and it doesn't prevent the xcall storm from choosing to affect
> these CPUs either...

I wonder if creating a CPU set not assigned to any active user, and setting that CPU set to process (networking) interrupts, would work or help (see psrset, psradm)?

My 2c,
//Jim Klimov
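A sketch of that approach with psrset(1M)/psradm(1M); the CPU IDs and set ID below are illustrative, and note that creating a processor set by itself does not retarget interrupts onto it:

# psrset -c 28 29 30 31      # create a set from four CPUs; prints the new set id, e.g. 1
# psrset -b 1 <pid>          # bind a process (e.g. the recorder) to that set
# psradm -i 28 29 30 31      # alternatively, mark CPUs no-intr so they stop taking interrupts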
Jim Mauro
2012-Jun-12 16:06 UTC
[zfs-discuss] Occasional storm of xcalls on segkmem_zio_free
>>> So try unbinding the mac threads; it may help you here.
>>
>> How do I do that? All I can find on interrupt fencing and the like is to
>> simply set certain processors to no-intr, which moves all of the
>> interrupts, and it doesn't prevent the xcall storm from choosing to affect
>> these CPUs either...

In /etc/system:

set mac:mac_soft_ring_thread_bind=0
set mac:mac_srs_thread_bind=0

Reboot required. Verify after reboot with mdb:

echo "mac_soft_ring_thread_bind/D" | mdb -k
Sašo Kiselkov
2012-Jun-12 16:11 UTC
[zfs-discuss] Occasional storm of xcalls on segkmem_zio_free
On 06/12/2012 06:06 PM, Jim Mauro wrote:

>>> So try unbinding the mac threads; it may help you here.
>>
>> How do I do that? [...]
>
> In /etc/system:
>
> set mac:mac_soft_ring_thread_bind=0
> set mac:mac_srs_thread_bind=0
>
> Reboot required. Verify after reboot with mdb:
>
> echo "mac_soft_ring_thread_bind/D" | mdb -k

Trying that right now... thanks!

--
Saso
Sašo Kiselkov
2012-Jun-12 16:17 UTC
[zfs-discuss] Occasional storm of xcalls on segkmem_zio_free
On 06/12/2012 05:58 PM, Andy Bowers - Performance Engineering wrote:

> find where your nics are bound to
>
> mdb -k
> ::interrupts
>
> create a processor set including those cpus [ so just the nic code will
> run there ]
>
> andy

Tried it and it didn't help, unfortunately. I'm still seeing drops. What's even funnier is that I'm seeing drops when the machine is syncing the txg to the zpool. So looking at a little UDP receiver I can see the following input stream bandwidth (the stream is constant bitrate, so this shouldn't happen):

4.396151 Mbit/s   <- drop
5.217205 Mbit/s
5.144323 Mbit/s
5.150227 Mbit/s
5.144150 Mbit/s
4.663824 Mbit/s   <- drop
5.178603 Mbit/s
5.148681 Mbit/s
5.153835 Mbit/s
5.141116 Mbit/s
4.532479 Mbit/s   <- drop
5.197381 Mbit/s
5.158436 Mbit/s
5.141881 Mbit/s
5.145433 Mbit/s
4.605852 Mbit/s   <- drop
5.183006 Mbit/s
5.150526 Mbit/s
5.149324 Mbit/s
5.142306 Mbit/s
4.749443 Mbit/s   <- drop

(The txg timeout on my system is the default 5s.)

It isn't just a slight delay in the arrival of the packets, because then I should be seeing a rebound in the bitrate, sort of like this:

  ^
  |-----,  ,^----,  ,^-----,  ,^----
B |      v        v         v
  |
  +------------------------------ t ->

Instead, what I'm seeing is simply:

  ^
  |-----,  ,-----,  ,------,  ,-----
B |      v        v         v
  |
  +------------------------------ t ->

(The missing spikes after the drops mean that there were lost packets on the NIC.)

--
Saso
Mike Gerdts
2012-Jun-12 16:52 UTC
[zfs-discuss] Occasional storm of xcalls on segkmem_zio_free
On Tue, Jun 12, 2012 at 11:17 AM, Sašo Kiselkov <skiselkov.ml at gmail.com> wrote:

> On 06/12/2012 05:58 PM, Andy Bowers - Performance Engineering wrote:
>> find where your nics are bound to
>>
>> mdb -k
>> ::interrupts
>>
>> create a processor set including those cpus [ so just the nic code will
>> run there ]
>>
>> andy
>
> Tried it and it didn't help, unfortunately. I'm still seeing drops. What's
> even funnier is that I'm seeing drops when the machine is syncing the
> txg to the zpool. So looking at a little UDP receiver I can see the
> following input stream bandwidth (the stream is constant bitrate, so
> this shouldn't happen):

If processing in interrupt context (use intrstat) is dominating CPU usage, you may be able to use pcitool to cause the device generating all of those expensive interrupts to be moved to another CPU.

--
Mike Gerdts
http://mgerdts.blogspot.com/
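For example (the interval and count are arbitrary), intrstat shows per-device interrupt counts and the time each device's interrupts consume on every CPU:

# intrstat 1 5      # five 1-second samples; look for a NIC or HBA pinned to a single CPU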
Roch Bourbonnais
2012-Jun-12 17:19 UTC
[zfs-discuss] Occasional storm of xcalls on segkmem_zio_free
Try with these /etc/system tunings:

set mac:mac_soft_ring_thread_bind=0
set mac:mac_srs_thread_bind=0
set zfs:zio_taskq_batch_pct=50

On 12 June 2012, at 11:37, Roch Bourbonnais wrote:

> So the xcalls are a necessary part of memory reclaiming, when one needs to tear down the TLB entries mapping the physical memory (which can from here on be repurposed).
> So the xcalls are just part of this. They should not cause trouble, but they do: they consume a CPU for some time.
>
> That in turn can cause infrequent latency bubbles on the network. A certain root cause of these latency bubbles is that network threads are bound by default, and
> if the xcall storm ends up on the CPU that a network thread is bound to, it will wait for the storm to pass.
>
> So try unbinding the mac threads; it may help you here.
>
> On 12 June 2012, at 09:57, Sašo Kiselkov wrote:
>
>> Seems the problem is somewhat more egregious than I thought. [...]
Sašo Kiselkov
2012-Jun-12 20:32 UTC
[zfs-discuss] Occasional storm of xcalls on segkmem_zio_free
On 06/12/2012 07:19 PM, Roch Bourbonnais wrote:

> Try with these /etc/system tunings:
>
> set mac:mac_soft_ring_thread_bind=0
> set mac:mac_srs_thread_bind=0
> set zfs:zio_taskq_batch_pct=50

Thanks for the recommendations, I'll try and see whether it helps, but this is going to take me a while (especially since the reboot means I'll have a clear ARC and need to record up again around 120G of data, which takes a while to accumulate).

Cheers,
--
Saso
Sašo Kiselkov writes:

> On 06/12/2012 05:37 PM, Roch Bourbonnais wrote:
>
>> So the xcalls are a necessary part of memory reclaiming, when one needs to tear down the TLB entries mapping the physical memory (which can from here on be repurposed).
>> So the xcalls are just part of this. They should not cause trouble, but they do: they consume a CPU for some time.
>>
>> That in turn can cause infrequent latency bubbles on the network. A certain root cause of these latency bubbles is that network threads are bound by default, and
>> if the xcall storm ends up on the CPU that a network thread is bound to, it will wait for the storm to pass.
>
> I understand, but the xcall storm only eats up a single core out
> of a total of 32, plus it's not a single specific one, it tends to
> change, so what are the odds of hitting the same core as the one on
> which the mac thread is running?

That's easy :-) : 1/32 each time it needs to run. So depending on how often it runs (which depends on how much churn there is in the ARC) and how often you see the latency bubbles, that may or may not be it.

What is zio_taskq_batch_pct on your system? That is another stormy bit of code which causes bubbles. Setting it down to 50 (versus an older default of 100) should help if it's not done already.

-r

>> So try unbinding the mac threads; it may help you here.
>
> How do I do that? All I can find on interrupt fencing and the like is to
> simply set certain processors to no-intr, which moves all of the
> interrupts, and it doesn't prevent the xcall storm from choosing to affect
> these CPUs either...
>
> --
> Saso
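Following the mdb pattern already used in this thread, the current value can be checked with (assuming the symbol exists in this build, which the suggested tuning above implies):

# echo "zio_taskq_batch_pct/D" | mdb -k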
Sašo Kiselkov
2012-Jun-17 21:41 UTC
[zfs-discuss] Occasional storm of xcalls on segkmem_zio_free
On 06/13/2012 03:43 PM, Roch wrote:

> That's easy :-) : 1/32 each time it needs to run. So depending on how often it runs
> (which depends on how much churn there is in the ARC) and how often you see the
> latency bubbles, that may or may not be it.
>
> What is zio_taskq_batch_pct on your system? That is another stormy bit of code which
> causes bubbles. Setting it down to 50 (versus an older default of 100) should help if
> it's not done already.
>
> -r

So I tried all of the suggestions above (mac unbinding, zio_taskq tuning) and none helped. I'm beginning to suspect it has something to do with the networking cards. When I try to snoop filtered traffic from one interface into a file ("snoop -o /tmp/dump -rd vlan935 host a.b.c.d"), my multicast reception throughput plummets to about 1/3 of the original. I'm running a link aggregation of 4 on-board Broadcom NICs:

# dladm show-aggr -x
LINK     PORT     SPEED  DUPLEX  STATE  ADDRESS            PORTSTATE
aggr0    --       1000Mb full    up     d0:67:e5:fc:bd:38  --
         bnx1     1000Mb full    up     d0:67:e5:fc:bd:38  attached
         bnx2     1000Mb full    up     d0:67:e5:fc:bd:3a  attached
         bnx3     1000Mb full    up     d0:67:e5:fc:bd:3c  attached
         bnx0     1000Mb full    up     d0:67:e5:fc:bd:36  attached

# dladm show-vlan
LINK            VID     OVER         FLAGS
vlan49          49      aggr0        -----
vlan934         934     aggr0        -----
vlan935         935     aggr0        -----

Normally, I'm getting around 46MB/s on vlan935; however, once I run any snoop command which puts the network interfaces into promiscuous mode, my throughput plummets to around 20MB/s. During that I can see context switches skyrocket on 4 CPU cores, with each around 75% busy. Now I understand that snoop has some probe effect, but this is definitely too large. I've never seen this kind of bad behavior before on any of my other Solaris systems (with similar load). Are there any tunings I can make to my network to track down the issue? My module for bnx is:

# modinfo | grep bnx
169 fffffffff80a7000  63ba0 197   1  bnx (Broadcom NXII GbE 6.0.1)

Regards,
--
Saso
Sašo Kiselkov
2012-Jun-18 19:45 UTC
[zfs-discuss] Occasional storm of xcalls on segkmem_zio_free
On 06/18/2012 12:05 AM, Richard Elling wrote:

> You might try some of the troubleshooting techniques described in Chapter 5
> of the DTrace book by Brendan Gregg and Jim Mauro. It is not clear from your
> description that you are seeing the same symptoms, but the technique should
> apply.
> -- richard

Thanks for the advice, I'll try it. In the meantime, I'm beginning to suspect I'm hitting some PCI-e issue on the Dell R715 machine. Looking at:

# mdb -k
> ::interrupts
IRQ  Vect IPL Bus  Trg Type  CPU Share APIC/INT# ISR(s)
[snip]
91   0x82 7   PCI  Edg MSI   5   1     -         pcieb_intr_handler
[snip]

In mpstat I can see that during normal operation, CPU 5 is nearly floored:

CPU minf mjf xcal intr ithr  csw icsw migr smtx srw syscl usr sys  wt idl
  5    0   0    0  512    0 1054    0    0  870   0     0   0  93   0   7

Then, when anything hits which disturbs the PCI-e bus (e.g. a txg flush or the xcall storm), the CPU goes to 100% utilization and my networking throughput drops accordingly. The issue can be softened by lowering the input bandwidth from ~46MB/s to below 20MB/s -- at that point I'm getting only about 10% utilization on the core in question, and no xcall storm or txg flush can influence my network (though I do see the CPU get about 70% busy during the process, but still enough is left to avoid packet loss).

So it seems I'm hitting some hardware design issue, or something... I'll try moving my network card to the second PCI-e I/O bridge tomorrow (which seems to be bound to CPU 6). Any other ideas on what I might try to get the PCI-e I/O bridge bandwidth back? Or how to fight the starvation of the CPU by other activities in the system (xcalls and/or txg flushes)? I already tried putting the CPUs in question into an empty processor set, but that isn't enough, it seems.

--
Saso
Sašo Kiselkov
2012-Jun-19 15:12 UTC
[zfs-discuss] Occasional storm of xcalls on segkmem_zio_free
On 06/19/2012 11:05 AM, Sašo Kiselkov wrote:

> On 06/18/2012 07:50 PM, Roch wrote:
>>
>> Are we hitting:
>> 7167903 Configuring VLANs results in single threaded soft ring fanout
>
> Confirmed, it is definitely this.

Hold the phone, I just tried unconfiguring all of the VLANs in the system and went to pure interfaces, and it didn't help. So while the issue stems from the soft ring fanout, it's probably not caused by VLANs. Thanks for the pointers anyway, though.

--
Saso