Jim Leonard
2009-Sep-23 15:26 UTC
[zfs-discuss] [dtrace-discuss] How to drill down cause of cross-calls in the kernel? (output provided)
> The only thing that jumps out at me is the ARC size - 53.4GB, or
> most of your 64GB of RAM. This in-and-of-itself is not necessarily
> a bad thing - if there are no other memory consumers, let ZFS cache
> data in the ARC. But if something is coming along to flush dirty
> ARC pages periodically....

The workload is a set of 50 python processes, each receiving a stream
of data via TCP/IP. The processes run until they notice something
interesting in the stream (sorry I can't be more specific), then they
connect to a server via TCP/IP and issue a command or two. Log files
are written that take up about 50M per day per process. It's
relatively low-traffic.

> I found what looked to be an applicable bug;
>   CR 6699438 zfs induces crosscall storm under heavy mapped
>   sequential read workload
> but the stack signature for the above bug is different than yours,
> and it doesn't sound like your workload is doing mmap'd sequential
> reads. That said, I would be curious to know if your workload used
> mmap(), versus read/write?

I asked and they couldn't say. It's python, so I think it's unlikely.

> For the ZFS folks just seeing this, here's the stack frame;
>
>   unix`xc_do_call+0x8f
>   unix`xc_wait_sync+0x36
>   unix`x86pte_invalidate_pfn+0x135
>   unix`hat_pte_unmap+0xa9
>   unix`hat_unload_callback+0x109
>   unix`hat_unload+0x2a
>   unix`segkmem_free_vn+0x82
>   unix`segkmem_zio_free+0x10
>   genunix`vmem_xfree+0xee
>   genunix`vmem_free+0x28
>   genunix`kmem_slab_destroy+0x80
>   genunix`kmem_slab_free+0x1be
>   genunix`kmem_magazine_destroy+0x54
>   genunix`kmem_depot_ws_reap+0x4d
>   genunix`taskq_thread+0xbc
>   unix`thread_start+0x8
>
> Let's see what the fsstat and zpool iostat data looks like when this
> starts happening..

Both are unremarkable, I'm afraid. Here's the fsstat from when it
starts happening:

 new  name   name  attr  attr lookup rddir  read read  write write
 file remov  chng   get   set    ops   ops   ops bytes   ops bytes
    0     0     0    75     0      0     0     0     0    10 1.25M zfs
    0     0     0    83     0      0     0     0     0     7  896K zfs
    0     0     0    78     0      0     0     0     0    13 1.62M zfs
    0     0     0   229     0      0     0     0     0    29 3.62M zfs
    0     0     0   217     0      0     0     0     0    28 3.37M zfs
    0     0     0   212     0      0     0     0     0    26 3.03M zfs
    0     0     0   151     0      0     0     0     0    18 2.07M zfs
    0     0     0   184     0      0     0     0     0    31 3.41M zfs
    0     0     0   187     0      0     0     0     0    32 2.74M zfs
    0     0     0   219     0      0     0     0     0    24 2.61M zfs
    0     0     0   222     0      0     0     0     0    29 3.29M zfs
    0     0     0   206     0      0     0     0     0    29 3.26M zfs
    0     0     0   205     0      0     0     0     0    19 2.26M zfs
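On the mmap() question, one way to settle it directly might be to watch
the syscalls with DTrace while the workload is running. This is only a
sketch - the execname "python" is an assumption (the interpreter may
show up under a versioned name), so the predicate may need adjusting:

  # do the worker processes ever call mmap()/munmap() at all?
  dtrace -n 'syscall::mmap*:entry, syscall::munmap:entry
      /execname == "python"/ { @[execname, probefunc] = count(); }'

  # count cross-calls by the execname of the initiating thread
  # (kernel taskq threads will show up as sched)
  dtrace -n 'sysinfo:::xcalls { @[execname] = count(); } tick-30s { exit(0); }'

I can run something like that during the next episode if it would help.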
Jim Mauro
2009-Sep-26 02:20 UTC
[zfs-discuss] [dtrace-discuss] How to drill down cause of cross-calls in the kernel? (output provided)
Hey Jim -

There's something we're missing here. There does not appear to be
enough ZFS write activity to cause the system to pause regularly.

Were you able to capture a kernel profile during the pause period?

Thanks,
/jim
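If not, something along these lines should be enough - just a sketch,
with an arbitrary 997 Hz sample rate and 60-second window; run it while
the pause is actually happening:

  # sample kernel stacks system-wide; arg0 is non-zero only when the CPU is in the kernel
  dtrace -n 'profile-997 /arg0/ { @[stack()] = count(); }
             tick-60s { trunc(@, 20); exit(0); }'

The aggregation prints when it exits, so the top 20 kernel stacks by
sample count should tell us whether the xc_do_call / kmem_depot_ws_reap
path is where the CPUs are spending their time during the pause.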