Andrey Dmitriev
2008-Aug-07 20:47 UTC
[zfs-discuss] Poor ZFS performance when file system is close to full
All,

We had a situation where write speeds to a ZFS pool consisting of two 7TB RAID5 LUNs came to a crawl. We spent a good 100 man-hours troubleshooting the issue and eliminating HW problems. In the end, when we whacked about 2TB out of 14, performance went back to normal (300+ MB/s vs. 3 MB/s when it was poor). I would like some understanding as to why this happens with ZFS, as well as what threshold to look out for.

This was the layout when performance was 3 MB/s:

Filesystem               size   used  avail capacity  Mounted on
beast                    130G    37K   130G     1%    /mnt/backup1
beast/customer1          130G    29K   130G     1%    /mnt/backup1/customer1
beast/customer1/bacula   222G    93G   130G    42%    /mnt/backup1/customer1/bacula
beast/customer1/db       2.0T   1.8T   130G    94%    /mnt/backup1/customer1/db
beast/customer1/fs       2.1T   1.9T   130G    94%    /mnt/backup1/customer1/filesystem
beast/customer5          130G    29K   130G     1%    /mnt/backup1/customer5
beast/customer5/bacula   221G    92G   130G    42%    /mnt/backup1/customer5/bacula
beast/customer5/db       130G    25K   130G     1%    /mnt/backup1/customer5/db
beast/customer5/fs       172G    42G   130G    25%    /mnt/backup1/customer5/filesystem
beast/bacula             130G    15M   130G     1%    /mnt/backup1/bacula
beast/bacula/spool       130G    34K   130G     1%    /mnt/backup1/bacula/spool
beast/customer6          130G    29K   130G     1%    /mnt/backup1/customer6
beast/customer6/bacula   210G    81G   130G    39%    /mnt/backup1/customer6/bacula
beast/customer6/db       3.7T   3.6T   130G    97%    /mnt/backup1/customer6/db
beast/customer6/fs       130G    25K   130G     1%    /mnt/backup1/customer6/filesystem
beast/customer2          133G   3.6G   130G     3%    /mnt/backup1/customer2
beast/customer2/bacula   1.5T   1.4T   130G    92%    /mnt/backup1/customer2/bacula
beast/customer2/db       194G    65G   130G    34%    /mnt/backup1/customer2/db
beast/customer2/fs       221G    92G   130G    42%    /mnt/backup1/customer2/filesystem
beast/customer4          130G    29K   130G     1%    /mnt/backup1/customer4
beast/customer4/bacula   1.3T   1.2T   130G    90%    /mnt/backup1/customer4/bacula
beast/customer4/db       1.6T   1.5T   130G    92%    /mnt/backup1/customer4/db
beast/customer4/fs       130G    25K   130G     1%    /mnt/backup1/customer4/filesystem
beast/customer3          130G    26K   130G     1%    /mnt/backup1/customer3
beast/customer3/bacula   2.8T   2.6T   130G    96%    /mnt/backup1/customer3/bacula

zpool iostat output at the time (from the original post):

                                            capacity     operations    bandwidth
pool                                       used  avail   read  write   read  write
----------------------------------------  -----  -----  -----  -----  -----  -----
beast                                     14.1T   366G      0    155      0  3.91M
  c7t6000402002FC424F6CF5317A00000000d0   7.07T   183G      0     31      0  16.2K
  c7t6000402002FC424F6CF5318F00000000d0   7.07T   183G      0    124      0  3.90M
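If it helps anyone chasing a similar problem, the pool-level numbers above are the kind of thing you get from watching the pool while a test write runs; a minimal sketch (interval is arbitrary):

  # overall pool size, used space and capacity percentage
  zpool list beast

  # per-LUN operations and bandwidth, sampled every 5 seconds
  zpool iostat -v beast 5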
Bob Friesenhahn
2008-Aug-07 21:19 UTC
[zfs-discuss] Poor ZFS performance when file system is close to full
On Thu, 7 Aug 2008, Andrey Dmitriev wrote:

> We had a situation where write speeds to a ZFS pool consisting of two
> 7TB RAID5 LUNs came to a crawl. We spent a good 100 man-hours
> troubleshooting the issue and eliminating HW problems. In the end,
> when we whacked about 2TB out of 14, performance went back to normal
> (300+ MB/s vs. 3 MB/s when it was poor). I would like some
> understanding as to why this happens with ZFS, as well as what
> threshold to look out for.

Are you sure that you don't have some slow disk drives in your RAID5 LUNs? It seems like your two RAID5 LUNs are performing quite differently.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
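One quick way to check for a slow device is to watch per-device service times while the slow writes are running; assuming standard Solaris iostat, something along these lines would show whether one LUN has a much higher asvc_t or %b than the other:

  # extended per-device statistics, 5-second samples
  iostat -xn 5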
Andrey Dmitriev
2008-Aug-07 21:50 UTC
[zfs-discuss] Poor ZFS performance when file system is close to full
I am sure. Nothing but this box ever accessed them, and all NFS access to the box was stopped. The RAID sets are identical (9-drive RAID5). We tested the file system almost non-stop for close to two days, and I never got it to write above 4 MB/s (on average it was below 3 MB/s). The second I destroyed a file system with 1.8TB on it (the third one from the top), it started writing 300 MB/s. This system has been in production for over a year, and it has always performed top notch (we use it for D2D backups).

All tests were done with dd and variable block sizes (8k and 1MB for the most part).
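For reference, the dd runs were plain sequential writes along these lines (the target path and counts here are only illustrative, not the exact commands):

  # 8k writes, roughly 8GB total
  dd if=/dev/zero of=/mnt/backup1/ddtest.8k bs=8k count=1000000

  # 1MB writes, roughly 10GB total
  dd if=/dev/zero of=/mnt/backup1/ddtest.1m bs=1024k count=10000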
Bob Friesenhahn
2008-Aug-07 22:00 UTC
[zfs-discuss] Poor ZFS performance when file system is close to full
On Thu, 7 Aug 2008, Andrey Dmitriev wrote:

> I am sure. Nothing but this box ever accessed them, and all NFS access
> to the box was stopped. The RAID sets are identical (9-drive RAID5).
> We tested the file system almost non-stop for close to two days, and
> I never got it to write above 4 MB/s (on average it was below 3 MB/s).
> The second I destroyed a file system with 1.8TB on it (the third one
> from the top), it started writing 300 MB/s. This system has been in
> production for over a year, and it has always performed top notch
> (we use it for D2D backups).

I see. When you say "close to full", how much space was left, and what was the percentage full?

Bad things happen when any filesystem gets full. ZFS has to search harder for free blocks, and the remaining free blocks may be in very poor locations, causing lots of fragmentation. The disks then spend most of their time seeking rather than transferring data.

The UFS filesystem "solves" this by not allowing the filesystem to become full: 100% is not really 100%, and only the root user is allowed to take the filesystem to the artificial "100%".

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
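If you want to enforce a comparable safety margin on a ZFS pool, one option (a sketch only; the dataset name and size are arbitrary) is to park a reservation on an empty dataset that nobody writes to, so the rest of the pool can never be filled completely:

  # hold back roughly 5% of a ~14.5TB pool as permanently free space
  zfs create beast/reserved
  zfs set reservation=700G beast/reserved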
Andrey Dmitriev
2008-Aug-07 23:09 UTC
[zfs-discuss] Poor ZFS performance when file system is close to full
There were 130G left on the zpool. The df -h output from before one of the file systems was destroyed is in the original post. Some file systems showed up as 1% full, others as 94-97% (and some others with fairly random numbers), which is another mystery to me as well. Shouldn't all file systems under the same zpool have shown the same amount?

The reason for such a file system breakdown is partly organizational and partly to be able to compress some data sets (via ZFS).
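For what it's worth, the accounting is a bit easier to read from zfs list than from df, since it lists each dataset's own usage alongside the shared pool-wide free space; for example:

  # default columns: NAME, USED, AVAIL, REFER, MOUNTPOINT
  zfs list -r beast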
Lance
2008-Aug-09 00:04 UTC
[zfs-discuss] Poor ZFS performance when file system is close to full
> We had a situation where write speeds to a ZFS
> consisting of 2 7TB RAID5 LUNs came to a crawl.

Sounds like you've hit bug 6596237, "Stop looking and start ganging". We ran into the same problem on our X4500 Thumpers: write throughput dropped to 200 KB/s. We now keep utilization under 90% to help combat the problem until a patch is available.

It seems to be worse if you have millions of files within the zpool. We had over 25M files in a 16TB zpool; at 95% full, it was virtually unusable for writing files. This was on a host running vanilla S10U4 x86.
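A simple way to stay on the right side of that 90% line is a cron-able check against zpool list; this is only a sketch (pool name, threshold and mail recipient are whatever fits your site):

  #!/bin/sh
  # warn when the pool crosses 90% of capacity
  POOL=beast
  CAP=`zpool list -H -o capacity $POOL | tr -d '%'`
  if [ "$CAP" -ge 90 ]; then
      echo "WARNING: pool $POOL is ${CAP}% full" | mailx -s "zpool capacity" root
  fi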