Hi,

We have approximately 3 million active users and a storage capacity of 300 TB in ZFS zpools. The ZFS filesystems are mounted on a Sun Cluster of 3 x T2000 servers connected over FC to SAN storage. Each zpool is a LUN in the SAN, which already provides RAID, so we're not doing raidz on top of it. We started using ZFS about 2.5 years ago, on Solaris 10 U3 or U4 (I can't recall). Our storage growth is roughly 4 TB a month. The zpool sizes range from 2 TB up to the biggest at 32 TB. We're using ZFS to store mail headers (less than 4K) and attachments (1K to 12 MB). Currently the Sun Cluster handles approximately 20K NFS ops.

File sizing:

1. 10 million files of less than 4K a day.
2. In addition to those 10 million, another 10 million files of various sizes:
   - 20% less than 4K
   - 25% between 4K and 8K
   - 50% between 9K and 100K
   - 5% above 100K, up to 12 MB

Total: 20 million new files a day.

We're using two file hierarchies for storing files.

For the mail headers (less than 4K):

/FF/YYMM/DDHH/SS/ABCDEFGH

Explanation: the first directory, under the mount point, is 00..FF (up to 256 directories); the second is year and month; the third is day and hour; the fourth is seconds. At the end there is a gzipped file of up to 1K.

For the mail objects:

We do single-instancing/de-duplication in our application (meaning no maildir or mbox). Mail objects can be 1K up to 12 MB. The directory structure is as follows:

/FF/FF/FF/FF/FF/FF/FF/FF/FF/file

Explanation: the first directory holds 256 directories, 00 to FF, and each of the other levels holds up to 256 directories, with the lower branches holding fewer directories than the higher ones. At the end of the hierarchy there is a single file.

Mail operation:

When a new mail arrives we split it into objects: the header, plus each attachment as an object (even the text body). The header files are stored by timestamp (/FF/YYMM/DDHH/SS/file), which can be an advantage for reads: when users read their mail the same day, the metadata of those directories and files is likely to be in cache. That is not the case for the attachments, which are stored in directories named by their hex value.

Our main issue over the last 2.5 years of using ZFS: when a zpool becomes full, write operations become significantly slower. At first this happened around 90% zpool capacity; now, after 30-40 zpools, it happens around 80%. For us this means that if we define a 4 TB zpool, we can only use 3.2 TB (80%) of it effectively.

Is there a "best practice" from Sun/ZFS for building directory hierarchies that hold a huge number of files (20 million a day)? Also, how can we avoid the performance degradation when we reach 80% of zpool capacity?

Regards,
Yariv
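For concreteness, here is a minimal Python sketch of the two layouts described in the message above. The function names, the choice of SHA-1 as the hex digest for mail objects, and the way the 00..FF header bucket is picked are assumptions for illustration only; the post does not say how the production application does any of these.

#!/usr/bin/env python
# Sketch of the two on-disk layouts described in the post.
# header_path, object_path, the SHA-1 digest, and the explicit `bucket`
# argument are illustrative assumptions, not the production code.

import hashlib
import os
import time


def header_path(mount_root, ts, file_name, bucket):
    """Build /<bucket>/YYMM/DDHH/SS/<file_name> from a Unix timestamp.

    `bucket` is the 00..FF top-level directory; the post does not say how
    it is chosen, so it is passed in explicitly here.
    """
    t = time.gmtime(ts)
    return os.path.join(
        mount_root,
        "%02X" % bucket,            # 00..FF directory under the mount point
        time.strftime("%y%m", t),   # YYMM
        time.strftime("%d%H", t),   # DDHH
        time.strftime("%S", t),     # SS
        file_name,                  # e.g. the "ABCDEFGH" gzipped header
    )


def object_path(mount_root, data, levels=9):
    """Build the content-addressed path for a mail object (attachment/body).

    Assumes the hex value is a SHA-1 digest of the object: the first
    `levels` pairs of hex digits become directories (256-way fanout per
    level), and the remaining digits become the file name.
    """
    digest = hashlib.sha1(data).hexdigest().upper()
    parts = [digest[i * 2:i * 2 + 2] for i in range(levels)]  # FF/FF/.../FF
    parts.append(digest[levels * 2:])                         # leftover digits as file name
    return os.path.join(mount_root, *parts)


if __name__ == "__main__":
    print(header_path("/pool1", time.time(), "ABCDEFGH", bucket=0x3F))
    print(object_path("/pool1", b"example attachment bytes"))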
William D. Hathaway
2009-Dec-17 01:12 UTC
[zfs-discuss] ZFS performance issues over 2.5 years.
Hi Yariv -

It is hard to say without more data, but perhaps you might be a victim of "Stop looking and start ganging":

http://bugs.opensolaris.org/view_bug.do?bug_id=6596237

It looks like this was fixed in S10u8, which was released last month. If you open a support ticket (or search for this bug ID on the web), I think you should be able to get some DTrace scripts to determine if that bug is impacting you.

--
This message posted from opensolaris.org
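Not from the original thread, but until the S10u8 fix is in place, something like the sketch below can watch pool fill levels and warn before a pool crosses the ~80% point where the original poster sees writes slow down. The 80% threshold and the report format are assumptions; `zpool list -H -o name,capacity` is standard ZFS tooling.

#!/usr/bin/env python
# Minimal capacity-watch sketch (an assumption, not part of the thread):
# warn when any pool approaches the fill level where writes reportedly
# degrade, so data can be rebalanced or the pool grown in time.

import subprocess

THRESHOLD = 80  # percent full at which the poster sees writes slow down


def pool_capacities():
    """Return {pool_name: percent_full} parsed from `zpool list`."""
    out = subprocess.check_output(
        ["zpool", "list", "-H", "-o", "name,capacity"]
    ).decode()
    caps = {}
    for line in out.splitlines():
        name, cap = line.split("\t")       # -H gives tab-separated fields
        caps[name] = int(cap.rstrip("%"))  # capacity is reported like "47%"
    return caps


if __name__ == "__main__":
    for name, pct in sorted(pool_capacities().items()):
        flag = "WARNING" if pct >= THRESHOLD else "ok"
        print("%-20s %3d%%  %s" % (name, pct, flag))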