Jim Leonard
2009-Sep-22 18:30 UTC
[dtrace-discuss] How to drill down cause of cross-calls in the kernel? (output provided)
We have a 16-core x86 system that, at seemingly random intervals, will completely stop responding for several seconds. Running "mpstat 1" showed something horrifying:

CPU minf mjf xcal    intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0 1004691  397  170    0    0    0    5    0     0    0 100   0   0
(rest of CPUs omitted)

That's over a million cross-calls a second. Seeing them on CPU 0 made me nervous that they were kernel-related, so I wrote a DTrace script to print out xcalls per second aggregated by PID, to see if a specific process was the culprit. Here's the output during another random system outage:

2009 Sep 22 12:51:49, load average: 5.90, 5.35, 5.39   xcalls: 637511

  PID  XCALLCOUNT
 6164          15
 6165          15
28339          26
    0      637455

PID 0 is "sched" (aka the kernel).

At this point I'm completely stumped as to what could be causing this. Any hints or ideas?
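(The per-PID script itself isn't included in the thread. A minimal sketch of that kind of counter - assuming the sysinfo provider's xcalls probe and hypothetical output formatting, not the exact script that was used - might look like this:)

    #!/usr/sbin/dtrace -qs

    /* Count cross-calls per second, aggregated by the PID of the
     * thread on CPU when the xcall is counted. Sketch only - the
     * original script also printed load averages. */

    sysinfo:::xcalls
    {
            @x[pid] = count();
    }

    tick-1sec
    {
            printf("%Y\n", walltimestamp);
            printf("%8s %12s\n", "PID", "XCALLCOUNT");
            printa("%8d %@12u\n", @x);
            clear(@x);
    }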
Jim Mauro
2009-Sep-22 18:45 UTC
[dtrace-discuss] How to drill down cause of cross-calls in the kernel? (output provided)
dtrace -n ':::xcalls { @s[stack()] = count() } tick-1sec { trunc(@s,10); printa(@s); clear(@s); }'

That will tell us where the xcalls are coming from in the kernel, and we can go from there.

Thanks,
/jim
James Litchfield
2009-Sep-22 19:20 UTC
[dtrace-discuss] How to drill down cause of cross-calls in the kernel? (output provided)
Jim Mauro has provided an excellent starting point. Keep in mind that kernel threads will show up as PID 0, so you may be seeing a kernel thread causing the activity.

Jim L
Jim Leonard
2009-Sep-22 19:29 UTC
[dtrace-discuss] How to drill down cause of cross-calls in the kernel? (output provided)
Thanks for the awesome two-liner - I'd been struggling with 1-second intervals without a full-blown script.

I modified it to output walltime so that I could zoom in on the problem, and here it is:

              unix`xc_do_call+0x8f
              unix`xc_wait_sync+0x36
              unix`x86pte_invalidate_pfn+0x135
              unix`hat_pte_unmap+0xa9
              unix`hat_unload_callback+0x109
              unix`hat_unload+0x2a
              unix`segkmem_free_vn+0x82
              unix`segkmem_zio_free+0x10
              genunix`vmem_xfree+0xee
              genunix`vmem_free+0x28
              genunix`kmem_slab_destroy+0x80
              genunix`kmem_slab_free+0x1be
              genunix`kmem_magazine_destroy+0x54
              genunix`kmem_depot_ws_reap+0x4d
              genunix`taskq_thread+0xbc
              unix`thread_start+0x8
           418560

Any ideas?
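(The modified one-liner isn't shown in the thread. One plausible form of the walltime change - an assumption, not the exact command used - is to print walltimestamp at the top of each one-second tick:)

    dtrace -n ':::xcalls { @s[stack()] = count() }
        tick-1sec { printf("%Y\n", walltimestamp); trunc(@s, 10); printa(@s); clear(@s); }'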
Dan Mick
2009-Sep-22 22:26 UTC
[dtrace-discuss] How to drill down cause of cross-calls in the kernel? (output provided)
zfs is busy?
Jim Mauro
2009-Sep-22 23:36 UTC
[dtrace-discuss] How to drill down cause of cross-calls in the kernel? (output provided)
As Dan said, it looks like ZFS is busy.

How much RAM is on this system?
What release of Solaris?
Do you have any ZFS tweaks in /etc/system (like clamping the ARC size)?
Is the system memory constrained?

The xcalls are due to the page unmaps out of what I'm assuming is the ZFS ARC (although I'm not 100% sure about that - I need to check the source for segkmem_zio_free()).

Can you gather some ZFS IO statistics, like "fsstat zfs 1", for a minute or so?

Thanks,
/jim
Jim Mauro
2009-Sep-22 23:40 UTC
[dtrace-discuss] How to drill down cause of cross-calls in the kernel? (output provided)
It would also be interesting to see some snapshots of the ZFS ARC kstats:

kstat -n arcstats

Thanks
Jim Leonard
2009-Sep-23 12:34 UTC
[dtrace-discuss] How to drill down cause of cross-calls in the kernel? (output provided)
> As Dan said, it looks like ZFS is busy.

That's very odd, as the system isn't doing anything I/O heavy. It has only a single zpool of five devices in a raidz serving a single filesystem, and that filesystem only writes logs at a rate of about 10MB/s. ZFS compression is turned off.

> How much RAM is on this system?
> What release of Solaris?

The system is a Sun Fire X4440 with 16 cores (4 x Quad-Core AMD Opteron 8356 processors running at 2.3GHz) and 64G of RAM. It is running sol10 update 5, kernel patch level 138889-06.

> Do you have any ZFS tweaks in /etc/system?
> (like clamping the ARC size...)
> Is the system memory constrained?

Nothing in /etc/system. System memory is not constrained (swap is never touched).

> The xcalls are due to the page unmaps out of
> what I'm assuming is the ZFS ARC (although
> I'm not 100% sure about that - I need to check
> the source for segkmem_zio_free())
>
> Can you gather some ZFS IO statistics, like
> "fsstat zfs 1" for a minute or so.

I'll have to wait until later in the day when this usually starts happening, and then I'll post back with results. Thanks to everyone for their help thus far. It will be very interesting to see what fsstat shows.
Jim Leonard
2009-Sep-23 13:22 UTC
[dtrace-discuss] How to drill down cause of cross-calls in the kernel? (output provided)
> It would also be interesting to see some snapshots
> of the ZFS arc kstats
>
> kstat -n arcstats

Here you go, although I didn't see anything jump out (massive amounts of cache misses or something). Any immediate trouble spot?

# kstat -n arcstats
module: zfs                             instance: 0
name:   arcstats                        class:    misc
        c                               53490612870
        c_max                           67636535296
        c_min                           8454566912
        crtime                          212.955493179
        deleted                         7267003
        demand_data_hits                179708165
        demand_data_misses              189797
        demand_metadata_hits            9959277
        demand_metadata_misses          194228
        evict_skip                      1709
        hash_chain_max                  9
        hash_chains                     205513
        hash_collisions                 9372169
        hash_elements                   851634
        hash_elements_max               886509
        hdr_size                        143082240
        hits                            198822346
        l2_abort_lowmem                 0
        l2_cksum_bad                    0
        l2_evict_lock_retry             0
        l2_evict_reading                0
        l2_feeds                        0
        l2_free_on_write                0
        l2_hdr_size                     0
        l2_hits                         0
        l2_io_error                     0
        l2_misses                       0
        l2_rw_clash                     0
        l2_size                         0
        l2_writes_done                  0
        l2_writes_error                 0
        l2_writes_hdr_miss              0
        l2_writes_sent                  0
        memory_throttle_count           0
        mfu_ghost_hits                  236508
        mfu_hits                        165895558
        misses                          388618
        mru_ghost_hits                  70149
        mru_hits                        24777390
        mutex_miss                      6094
        p                               49175731760
        prefetch_data_hits              7993639
        prefetch_data_misses            370
        prefetch_metadata_hits          1161265
        prefetch_metadata_misses        4223
        recycle_miss                    7149
        size                            53490565328
        snaptime                        5759009.53378144
Jim Mauro
2009-Sep-23 14:11 UTC
[zfs-discuss] [dtrace-discuss] How to drill down cause of cross-calls in the kernel? (output provided)
I'm cross-posting to zfs-discuss, as this is now more of a ZFS query than a dtrace query at this point, and I'm not sure if all the ZFS experts are listening on dtrace-discuss (although they probably are... :^).

The only thing that jumps out at me is the ARC size - 53.4GB, or most of your 64GB of RAM. This in-and-of-itself is not necessarily a bad thing - if there are no other memory consumers, let ZFS cache data in the ARC. But if something is coming along to flush dirty ARC pages periodically....

I found what looked to be an applicable bug:

CR 6699438 zfs induces crosscall storm under heavy mapped sequential read workload

but the stack signature for the above bug is different than yours, and it doesn't sound like your workload is doing mmap'd sequential reads. That said, I would be curious to know if your workload uses mmap() versus read/write.

For the ZFS folks just seeing this, here's the stack frame:

              unix`xc_do_call+0x8f
              unix`xc_wait_sync+0x36
              unix`x86pte_invalidate_pfn+0x135
              unix`hat_pte_unmap+0xa9
              unix`hat_unload_callback+0x109
              unix`hat_unload+0x2a
              unix`segkmem_free_vn+0x82
              unix`segkmem_zio_free+0x10
              genunix`vmem_xfree+0xee
              genunix`vmem_free+0x28
              genunix`kmem_slab_destroy+0x80
              genunix`kmem_slab_free+0x1be
              genunix`kmem_magazine_destroy+0x54
              genunix`kmem_depot_ws_reap+0x4d
              genunix`taskq_thread+0xbc
              unix`thread_start+0x8

Let's see what the fsstat and zpool iostat data looks like when this starts happening.

Thanks,
/jim
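(For anyone wanting to correlate ARC behavior with the stalls, the standard kstat CLI can also sample individual arcstats fields over time - a convenience sketch, not something requested in the thread; the field names are the arcstats statistics shown above:)

    # Watch the ARC size (bytes) every 5 seconds while the stalls occur
    kstat -p zfs:0:arcstats:size 5

    # Or dump the full arcstats set every 5 seconds, 12 samples
    kstat -n arcstats 5 12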
Jim Leonard
2009-Sep-23 14:22 UTC
[dtrace-discuss] How to drill down cause of cross-calls in the kernel? (output provided)
> > Can you gather some ZFS IO statistics, like
> > "fsstat zfs 1" for a minute or so.

Here is a snapshot from when it is exhibiting the behavior:

 new  name   name  attr  attr lookup rddir  read read  write write
 file remov  chng   get   set    ops   ops   ops bytes   ops bytes
    0     0     0    75     0      0     0     0     0    10 1.25M zfs
    0     0     0    83     0      0     0     0     0     7  896K zfs
    0     0     0    78     0      0     0     0     0    13 1.62M zfs
    0     0     0   229     0      0     0     0     0    29 3.62M zfs
    0     0     0   217     0      0     0     0     0    28 3.37M zfs
    0     0     0   212     0      0     0     0     0    26 3.03M zfs
    0     0     0   151     0      0     0     0     0    18 2.07M zfs
    0     0     0   184     0      0     0     0     0    31 3.41M zfs
    0     0     0   187     0      0     0     0     0    32 2.74M zfs
    0     0     0   219     0      0     0     0     0    24 2.61M zfs
    0     0     0   222     0      0     0     0     0    29 3.29M zfs
    0     0     0   206     0      0     0     0     0    29 3.26M zfs
    0     0     0   205     0      0     0     0     0    19 2.26M zfs

Unless attr_get is ludicrously costly, I can't see any issues...?
Jim Mauro
2009-Sep-23 14:41 UTC
[zfs-discuss] [dtrace-discuss] How to drill down cause of cross-calls in the kernel? (output provided)
(posted to zfs-discuss)

Hmmm... this is nothing in terms of load.

So you say that the system becomes sluggish/unresponsive periodically, and you noticed the xcall storm when that happens, correct? Refresh my memory - what is the frequency and duration of the sluggish cycles?

Could you capture a kernel profile during a sluggish cycle:

# dtrace -n 'profile-997hz / arg0 && curthread->t_pri != -1 / { @[stack()]=count(); } tick-1sec { trunc(@,10); printa(@); clear(@); }'

And/or:

# lockstat -i997 -kIW -s 10 sleep 30 > lockstat.kprof.out

And:

# lockstat -Cc sleep 30 > lockstat.locks.out

Thanks,
/jim
Jim Leonard
2009-Sep-23 15:26 UTC
[dtrace-discuss] How to drill down cause of cross-calls in the kernel? (output provided)
> The only thing that jumps out at me is the ARC size -
> 53.4GB, or most of your 64GB of RAM. This in-and-of-itself
> is not necessarily a bad thing - if there are no other memory
> consumers, let ZFS cache data in the ARC. But if something is
> coming along to flush dirty ARC pages periodically....

The workload is a set of 50 Python processes, each receiving a stream of data via TCP/IP. The processes run until they notice something interesting in the stream (sorry I can't be more specific), then they connect to a server via TCP/IP and issue a command or two. Log files are written that take up about 50M per day per process. It's relatively low-traffic.

> I found what looked to be an applicable bug;
> CR 6699438 zfs induces crosscall storm under heavy mapped sequential read workload
> but the stack signature for the above bug is different than yours,
> and it doesn't sound like your workload is doing mmap'd sequential
> reads. That said, I would be curious to know if your workload used
> mmap(), versus read/write?

I asked and they couldn't say. It's Python, so I think it's unlikely.

> Let's see what the fsstat and zpool iostat data looks like when
> this starts happening.

Both are unremarkable, I'm afraid. Here's the fsstat from when it starts happening:

 new  name   name  attr  attr lookup rddir  read read  write write
 file remov  chng   get   set    ops   ops   ops bytes   ops bytes
    0     0     0    75     0      0     0     0     0    10 1.25M zfs
    0     0     0    83     0      0     0     0     0     7  896K zfs
    0     0     0    78     0      0     0     0     0    13 1.62M zfs
    0     0     0   229     0      0     0     0     0    29 3.62M zfs
    0     0     0   217     0      0     0     0     0    28 3.37M zfs
    0     0     0   212     0      0     0     0     0    26 3.03M zfs
    0     0     0   151     0      0     0     0     0    18 2.07M zfs
    0     0     0   184     0      0     0     0     0    31 3.41M zfs
    0     0     0   187     0      0     0     0     0    32 2.74M zfs
    0     0     0   219     0      0     0     0     0    24 2.61M zfs
    0     0     0   222     0      0     0     0     0    29 3.29M zfs
    0     0     0   206     0      0     0     0     0    29 3.26M zfs
    0     0     0   205     0      0     0     0     0    19 2.26M zfs
Jim Leonard
2009-Nov-13 21:27 UTC
[dtrace-discuss] How to drill down cause of cross-calls in the kernel? (output provided)
Sorry for the ludicrously late reply, but I eventually figured out what the problem was. While it was showing up in the ZFS code, the problem was actually fsflush. The machine has 64G of RAM, and when memory started to get active, the scanning took a solid 7+ seconds to complete. Setting "autoup = 600" in /etc/system appeared to clear up the problem.

Thanks to everyone for the help!
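(For reference, a sketch of the tuning described above as it would appear in /etc/system - autoup is the fsflush page-aging cycle in seconds, the default is 30, and a reboot is required for /etc/system changes to take effect:)

    * Stretch the fsflush scan cycle from the default 30 seconds to 600,
    * spreading the page scan of a 64G system over a much longer period.
    set autoup=600

(The value currently in effect can be checked from the running kernel with mdb: echo autoup/D | mdb -k)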