Dave Pratt
2007-Nov-06 22:08 UTC
[zfs-code] High sys utilization with zfs mirror (on file create?)
Sorry if this is not the correct place for this question/inquiry, but I'm now stuck in a hard place: I can't get the I/O performance I want using UFS, and ZFS seems to cause blocking on network operations during even moderate levels of file creation. At any rate, here goes:

Running Sol10 8/07 with the full complement of 10_Recommended patches on 6-core T1000s. Running our custom email server application which, not surprisingly, hits local disk very hard with many small file creates, writes, moves, and deletes. On traditional UFS (on solid state disks with DiskSuite-mirrored filesystems), we had no system problems at full load.

On ZFS (on 144GB SAS drives), with a mirrored pool like this:

  pool: base
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        base          ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0t0d0s3  ONLINE       0     0     0
            c0t1d0s3  ONLINE       0     0     0

errors: No known data errors
-----------------
NAME          USED  AVAIL  REFER  MOUNTPOINT
base         1.20G   105G  1.50K  none
base/spool    184M   105G   184M  legacy
base/usr      989M   105G   989M  legacy
base/var     48.1M   105G  48.1M  legacy
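(For clarity, that layout is roughly equivalent to the following setup commands -- just a sketch reconstructed from the output above, with the base/spool -> /var/spool mapping inferred from the fsstat output below:)

   zpool create base mirror c0t0d0s3 c0t1d0s3
   zfs set mountpoint=none base
   zfs create base/spool
   zfs create base/usr
   zfs create base/var
   zfs set mountpoint=legacy base/spool
   zfs set mountpoint=legacy base/usr
   zfs set mountpoint=legacy base/var
   # legacy datasets are then mounted through /etc/vfstab, e.g.:
   #   base/spool  -  /var/spool  zfs  -  yes  -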
We start showing serious kernel hiccups at low load -- pauses of 1-40 seconds where there's no external activity -- network nonresponsive, console nonresponsive, and all statistics just show a gap (e.g. vmstat skips n seconds of data). Right before this happens, kernel CPU time pegs at 100%. There may be a correlation with file creates just before it happens.

fsstat looks like:

  new  name   name  attr  attr lookup rddir  read  read write write
 file remov   chng   get   set    ops   ops   ops bytes   ops bytes
  163    80    322 2.82K   177  4.91K    10   426  629K   969 2.28M /var/spool
    0    56    108 2.64K     1  3.39K    47   465  727K   164   164 /var/spool
    0    62      0 1.00K     0  1.63K     4   430 1.43M    91    91 /var/spool
    0    37      0   519     0    828     8   164  579K    49    49 /var/spool
   32     9     18   180     0    397     2     2 1.18K    16 1.63K /var/spool
   96    57    160 2.84K    86  4.54K    77   435 1.05M   645 1.48M /var/spool
    0    49     38 1.18K     9  1.77K    11   320  898K   101 26.6K /var/spool
    1    12      1   324     1    463     0     2 4.45K    25   534 /var/spool
   65     3     42   191     3    491     6    13 4.03K    21 8.76K /var/spool
 1014    71    302 1.70K    24  5.65K    19    58 85.4K   113 86.5K /var/spool
   73   415    964 2.71K   161  8.55K    12   183  300K   594  959K /var/spool
   73    76    136 2.25K   412  3.32K     9   273  460K 1.32K 2.24M /var/spool
   46    42     82 1.58K   155  2.10K    12   242  530K   976 2.32M /var/spool
   25    83    115 2.03K   170  3.08K     0   376  629K 1.25K 3.52M /var/spool
   24    68    152 1.87K    17  3.11K     4   838 2.23M   841 2.69M /var/spool
   65    57    144 6.65K    49  6.45K    24   490  812K   969  890K /var/spool

vmstat looks like:

 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s0 s2 -- --   in   sy   cs us sy id
 19 0 0 19502744 3999456  366  9912 2194 0 0 0 0   0   0 0 0 24868 112987  58937 49  49  2
  5 0 0 19499696 4001416  345  3951 3501 0 0 0 0   0   0 0 0 36075 168337  96305 57  40  3
  1 0 0 19492128 3993160  450  4542 2275 0 0 0 0   0   0 0 0 38505 128766  82580 25  46 29
 14 0 0 19488608 3989880   30   207    0 0 0 0 0   0   0 0 0  3042   3932   3106  1  75 24
214 0 0 19488800 3990048    0     2    5 0 0 0 0   5   5 0 0  2647   1717   1767  0  97  3
242 0 0 19489568 3990776    0     1    0 0 0 0 0   0   0 0 0  2548   1057    387  0 100  0
[many-second lacuna here]
298 0 0 19489568 3990776    0     0    0 0 0 0 0   1   0 0 0  2523   1086    587  0 100  0
 43 0 0 19481184 3980456 1289 31472 2499 0 0 0 0 144 145 0 0 32371 179144  97796 44  54  2
  2 0 0 19481872 3974600 2125  9766 5922 0 0 0 0  96  96 0 0 40929 170587 112274 37  55  7
  0 0 0 19486568 3974504  551  1110 2632 0 0 0 0 107 107 0 0 35579  95488  77269 17  31 52
  0 0 0 19485136 3969520  354  1310 2183 0 0 0 0 125 125 0 0 24527  56481  46909 11  17 72

Network activity completely stops, like so:

64 bytes from w.x.y.z: icmp_seq=2.  time=0.    ms
64 bytes from w.x.y.z: icmp_seq=3.  time=60.   ms
64 bytes from w.x.y.z: icmp_seq=4.  time=6497. ms
64 bytes from w.x.y.z: icmp_seq=5.  time=5560. ms
64 bytes from w.x.y.z: icmp_seq=6.  time=4619. ms
64 bytes from w.x.y.z: icmp_seq=7.  time=3692. ms
64 bytes from w.x.y.z: icmp_seq=8.  time=2713. ms
64 bytes from w.x.y.z: icmp_seq=9.  time=1725. ms
64 bytes from w.x.y.z: icmp_seq=10. time=743.  ms
64 bytes from w.x.y.z: icmp_seq=11. time=0.    ms
64 bytes from w.x.y.z: icmp_seq=12. time=0.    ms

As you can see, no packets are lost... but they all get processed after that 6-second delay. Again, the only software change is moving from UFS to ZFS, and under very light load we start seeing this behavior. I'm not even sure what further telemetry/diagnostic testing I can do to figure out what the problem is, especially since it appears that even utilities like vmstat stop working when the load goes to 100%. Any insight would be greatly appreciated.

-- 
This message posted from opensolaris.org