Hi,

Can anyone please tell me the maximum number of files that can be in one folder in Solaris with the ZFS file system?

I am working on an application in which I have to support 1 million users. The application uses MySQL MyISAM, and MyISAM creates 3 files for each table. The application architecture gives each user a separate table, so the expected number of files in the database folder is 3 million.

I have read that each OS has a limit on how many files it can create in a folder.
On Tue, 30 Sep 2008, Ram Sharma wrote:

> Can anyone please tell me the maximum number of files that can be in
> one folder in Solaris with the ZFS file system?

By folder, I assume you mean directory and not, say, pool. In any case, the 'limit' is 2^48, but that's effectively no limit at all.

Regards,
markm
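[For scale: 2^48 is 281,474,976,710,656 directory entries, so the 3 million files the original poster needs would consume roughly a hundred-millionth of that limit.]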
ZFS has no limit for snapshots and filesystems either, but try to create "a lot" of snapshots and filesystems and you will also wait "a lot" for your pool to import... ;-)

I think you should not think about the "limits", but about performance. Any filesystem with *too many* entries per directory will suffer. So, my advice is to configure your app to create a better hierarchy.

Leal.
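[A minimal sketch, in shell, of the kind of hierarchy Leal is suggesting; the /data/users path, the 256-way fan-out, and numeric user IDs are assumptions for illustration, not anything ZFS or MySQL requires:]

#!/bin/sh
# Bucket per-user data into 256 subdirectories so that no single
# directory ever accumulates millions of entries.
USER_ID=$1
BUCKET=`expr $USER_ID % 256`      # e.g. user 1202899 lands in bucket 211
DIR=/data/users/$BUCKET/$USER_ID
mkdir -p $DIR
echo "files for user $USER_ID go under $DIR"

[With 1 million users this gives roughly 4,000 entries per bucket, comfortably below the ~100K-files-per-directory comfort zone Nathan mentions later in the thread.]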
On 30-Sep-08, at 7:50 AM, Ram Sharma wrote:

> I am working on an application in which I have to support 1 million
> users. The application uses MySQL MyISAM, and MyISAM creates 3 files
> for each table. The application architecture gives each user a
> separate table, so the expected number of files in the database
> folder is 3 million.

That sounds like a disastrous schema design. Apart from that, you're going to run into problems on several levels, including O/S resources (file descriptors) and filesystem scalability.

--Toby
Actually, the one that'll hurt most is ironically the most closely related to bad database schema design... With a zillion files in the one directory, if someone does an 'ls' in that directory, it'll not only take ages, but steal a whole heap of memory and compute power...

Provided the only things that'll be doing *anything* in that directory are using indexed methods, there is no real problem from a ZFS perspective, but if something decides to list (or worse, list and sort) that directory, it won't be that pleasant.

Oh - that's of course assuming you have sufficient memory in the system to cache all that metadata somewhere... If you don't, then that's another zillion I/Os you need to deal with each time you list the entire directory.

An ls -1rt on a directory with about 1.2 million files with names like afile1202899 takes minutes to complete on my box, and we see 'ls' get to in excess of 700MB RSS... (and that's not including the memory ZFS is using to cache whatever it can.) My box has the ARC limited to about 1GB, so it's obviously undersized for such a workload, but it still gives you an indication...

I generally look to keep directories to a size that allows the utilities that work on and in them to perform at a reasonable rate... which for the most part is around 100K files or less...

Perhaps you are using larger hardware than I am for some of this stuff? :)

Nathan.

On 1/10/08 07:29 AM, Toby Thain wrote:

> That sounds like a disastrous schema design. Apart from that, you're
> going to run into problems on several levels, including O/S resources
> (file descriptors) and filesystem scalability.

--
//////////////////////////////////////////////////////////////////
// Nathan Kroenert               nathan.kroenert at sun.com      //
// Senior Systems Engineer       Phone: +61 3 9869 6255         //
// Global Systems Engineering    Fax:   +61 3 9869 6288         //
// Level 7, 476 St. Kilda Road                                  //
// Melbourne 3004   Victoria   Australia                        //
//////////////////////////////////////////////////////////////////
On Wed, 1 Oct 2008, Nathan Kroenert wrote:

> An ls -1rt on a directory with about 1.2 million files with names like
> afile1202899 takes minutes to complete on my box, and we see 'ls' get to
> in excess of 700MB RSS... (and that's not including the memory ZFS is
> using to cache whatever it can.)

A million files in ZFS is no big deal:

% ptime ls -1rt > /dev/null

real       17.277
user        8.992
sys         8.231

% ptime ls -1rt | wc -l

real       17.045
user        8.607
sys         8.413
1000000

Maybe the problem is that you need to increase your screen's scroll rate. :-)

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Interesting. Heh - I was piping to tail -10, so output rate was not an issue.

That being said, there is a large delta between your results and mine... If I get a chance, I'll look into it... I suspect it's a cached versus I/O issue...

Nathan.

On 1/10/08 10:02 AM, Bob Friesenhahn wrote:

> A million files in ZFS is no big deal:
>
> % ptime ls -1rt > /dev/null
>
> real       17.277
> user        8.992
> sys         8.231
On Wed, 1 Oct 2008, Nathan Kroenert wrote:

> That being said, there is a large delta between your results and mine...
> If I get a chance, I'll look into it...
>
> I suspect it's a cached versus I/O issue...

The first time I posted was the first time the directory had been read in well over a month, so it was not already cached.

You might find this interesting, since it shows that the 'rt' options are taking most of the time (sorting by mtime means stat()ing every entry and sorting a million names in memory before anything is printed):

% ptime ls -1 | wc -l

real        5.497
user        4.825
sys         0.654
1000000

I will certainly agree that huge directories can cause problems for many applications, particularly ones that access the files over a network.

Bob
On Tue, Sep 30, 2008 at 6:30 PM, Nathan Kroenert <Nathan.Kroenert at sun.com> wrote:

> Provided the only things that'll be doing *anything* in that directory
> are using indexed methods, there is no real problem from a ZFS
> perspective, but if something decides to list (or worse, list and sort)
> that directory, it won't be that pleasant.
>
> An ls -1rt on a directory with about 1.2 million files with names like
> afile1202899 takes minutes to complete on my box, and we see 'ls' get to
  ^^^^^^^^^^^
Here's your problem!

> in excess of 700MB RSS... (and that's not including the memory ZFS is
> using to cache whatever it can.)

I've seen this problem where Solaris has issues with many files created with this type of file naming pattern - for example, the naming pattern produced by tmpfile(3C). I saw it originally on a tmpfs, and it can be easily reproduced as follows. [Note: I'm writing this from memory - so don't beat me up over specific details.]

1) Pick a number of files to test with (try different numbers - start with 1,500 and then increase it). Call this test#.
2) cd /tmp
3) IMPORTANT: make a test directory for this experiment - let's call it temp.
4) cd /tmp/temp (your playground).
5) Using your favorite language, generate your test# of files with a pattern similar to the one above by (ultimately) calling tmpfile().
6) ptime ls -al - it will be quick the first time.
7) ptime rm * - it will be quick the first time.
8) Repeat steps 5, 6 and 7. Your ptimes will be a little slower.
9) Repeat steps 5, 6 and 7. Your ptimes will be much slower.
10) Repeat steps 5, 6 and 7. Your ptimes will be *really* slow. Now you'll understand that you have a problem.
11) Repeat steps 5, 6 and 7 a couple more times. Notice how bad your ptimes are now!
12) Look at the size of /tmp/temp using ls -ald /tmp/temp and you'll notice that it has grown substantially. The larger this directory grows, the slower the filesystem operations will get.

This behavior is common to tmpfs and UFS, and I tested it on early ZFS releases. I have no idea why - I have not made the time to figure it out. What I have observed is that all operations on your (victim) test directory will max out (100% utilization) one CPU or one CPU core - and all directory operations become single-threaded and limited by the performance of one CPU (or core).

Now for the weird part: the *only* way to return everything to normal performance levels (that I've found) is to rmdir the (victim) directory. This is why I recommend you perform this experiment in a subdirectory.
If you do it in /tmp, you'll have to reboot the box to get reasonable performance back - and you don't want to do it in your home directory either!!

I'll try to set aside some time tomorrow to re-run this experiment. But I'm nearly sure this is why your directory-related file ops are so slow - *dramatically* slower than they should be.

This problem/bug is insidious, because using tmpfile() in /tmp is a very common practice, and the application(s) using /tmp will slow down dramatically while maxing out (100% utilization) one CPU (or core). And if your system only has a single CPU... :(

Let me know what you find out. I know that the file name pattern is what causes this bug to bite big-time - and not so much the number of files you use to test it. I *suspect* that there might be something like a hash table degenerating into a singly linked list as the root cause of this issue. But this is only my WAG.

Regards,

--
Al Hopper  Logical Approach Inc, Plano, TX.  al at logical-approach.com
           Voice: 972.379.2133  Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
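[A minimal shell rendering of steps 4-7 and 12 above; the sequentially numbered afileN names stand in for real tmpfile() output, and the default count of 1,500 matches step 1:]

#!/bin/sh
# Create, list and remove a batch of files in a throwaway directory.
# Run this script several times in a row and compare the ptime output:
# on an affected filesystem each pass gets slower as the directory
# file itself bloats with stale entries.
COUNT=${1:-1500}
mkdir -p /tmp/temp
cd /tmp/temp || exit 1
i=0
while [ $i -lt $COUNT ]; do
    touch afile$i                 # step 5: common-prefix names
    i=`expr $i + 1`
done
ptime ls -al > /dev/null          # step 6: listing speed
ptime rm -f afile*                # step 7: removal speed
ls -ald /tmp/temp                 # step 12: watch the directory grow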
Bob Friesenhahn wrote:

> A million files in ZFS is no big deal:

But how similar were your file names?

Ian
On Tue, Sep 30, 2008 at 09:44:21PM -0500, Al Hopper wrote:

> This behavior is common to tmpfs and UFS, and I tested it on early ZFS
> releases. I have no idea why - I have not made the time to figure it
> out. What I have observed is that all operations on your (victim)
> test directory will max out (100% utilization) one CPU or one CPU core
> - and all directory operations become single-threaded and limited by
> the performance of one CPU (or core).

And sometimes it's just a little bug. E.g., with a recent version of Solaris (i.e. >= snv_95 || >= S10U5) on UFS:

SunOS graf 5.10 Generic_137112-07 i86pc i386 i86pc (X4600, S10U5)
============================================================================
admin.graf /var/tmp > time sh -c 'mkfile 2g xx ; sync'
0.05u 9.78s 0:29.42 33.4%
admin.graf /var/tmp > time sh -c 'mkfile 2g xx ; sync'
0.05u 293.37s 5:13.67 93.5%
admin.graf /var/tmp > rm xx
admin.graf /var/tmp > time sh -c 'mkfile 2g xx ; sync'
0.05u 9.92s 0:31.75 31.4%
admin.graf /var/tmp > time sh -c 'mkfile 2g xx ; sync'
0.05u 305.15s 5:28.67 92.8%
admin.graf /var/tmp > time dd if=/dev/zero of=xx bs=1k count=2048
2048+0 records in
2048+0 records out
0.00u 298.40s 4:58.46 99.9%
admin.graf /var/tmp > time sh -c 'mkfile 2g xx ; sync'
0.05u 394.06s 6:52.79 95.4%

SunOS kaiser 5.10 Generic_137111-07 sun4u sparc SUNW,Sun-Fire-V440 (S10, U5)
============================================================================
admin.kaiser /var/tmp > time mkfile 1g xx
0.14u 5.24s 0:26.72 20.1%
admin.kaiser /var/tmp > time mkfile 1g xx
0.13u 64.23s 1:25.67 75.1%
admin.kaiser /var/tmp > time mkfile 1g xx
0.13u 68.36s 1:30.12 75.9%
admin.kaiser /var/tmp > rm xx
admin.kaiser /var/tmp > time mkfile 1g xx
0.14u 5.79s 0:29.93 19.8%
admin.kaiser /var/tmp > time mkfile 1g xx
0.13u 66.37s 1:28.06 75.5%

SunOS q 5.11 snv_98 i86pc i386 i86pc (U40, S11b98)
============================================================================
elkner.q /var/tmp > time mkfile 2g xx
0.05u 3.63s 0:42.91 8.5%
elkner.q /var/tmp > time mkfile 2g xx
0.04u 315.15s 5:54.12 89.0%

SunOS dax 5.11 snv_79a i86pc i386 i86pc (U40, S11b79)
============================================================================
elkner.dax /var/tmp > time mkfile 2g xx
0.05u 3.09s 0:43.09 7.2%
elkner.dax /var/tmp > time mkfile 2g xx
0.05u 4.95s 0:43.62 11.4%

Regards,
jel.
--
Otto-von-Guericke University    http://www.cs.uni-magdeburg.de/
Department of Computer Science  Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany        Tel: +49 391 67 12768
Hi guys,

Thanks for so many good comments. Perhaps I got even more than what I asked for!

I am targeting 1 million users for my application. My DB will be on a Solaris machine. The reason I am making one table per user is that it is a simpler design than keeping all the data in a single table. With a single table I would need to worry about things like horizontal partitioning, which in turn requires a higher level of management.

So for storing 1 million MyISAM tables (MyISAM being a good performer when it comes to not very large data), I need to save 3 million data files in a single folder on disk. This is the way MyISAM saves data.

I will never need to do an ls on this folder. This folder (~database) will be used just by the MySQL engine to execute my SQL queries and fetch results.

And now that ZFS allows me to do this easily, I believe I can go forward with this design. Correct me if I am missing something.
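[To make the three-files-per-table point concrete, a hypothetical annotated listing of a MyISAM database directory; the datadir path and table names are invented, but the .frm/.MYD/.MYI layout is how MyISAM stores tables:]

$ ls /var/mysql/data/mydb
user_0000001.frm    # table definition
user_0000001.MYD    # row data
user_0000001.MYI    # indexes
user_0000002.frm
user_0000002.MYD
user_0000002.MYI
...

[All tables of a database normally live in that one directory, so one table per user times 1 million users is the 3 million files under discussion.]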
On Tue, 30 Sep 2008, Al Hopper wrote:

> I *suspect* that there might be something like a hash table
> degenerating into a singly linked list as the root cause of this
> issue. But this is only my WAG.

That seems to be a reasonable conclusion. BTW, my million-file test directory uses this sort of file naming, but it has only been written once.

When making data multi-access safe, it is often easiest to mark old data entries as unused while retaining the allocation. At some later time, when it is convenient to do so, these old entries may be made available for reuse. It seems like your algorithm is causing the directory to grow quite large, with many stale entries.

Another possibility is that the directory is becoming fragmented due to the limitations of block size. The original directory was contiguous, but the updated directory is now fragmented.

Bob
On Wed, 1 Oct 2008, Ian Collins wrote:

>> A million files in ZFS is no big deal:
>
> But how similar were your file names?

The file names are like:

image.dpx[0000000]
image.dpx[0000001]
image.dpx[0000002]
image.dpx[0000003]
image.dpx[0000004]
. . .

So they will surely trip up Al Hopper's bad algorithm. It is pretty common for images arranged in sequences to have the common part up front so that sorting works.

Bob
On Wed, 1 Oct 2008, Ram Sharma wrote:

> So for storing 1 million MyISAM tables (MyISAM being a good performer
> when it comes to not very large data), I need to save 3 million data
> files in a single folder on disk. This is the way MyISAM saves data.
> I will never need to do an ls on this folder. This folder (~database)
> will be used just by the MySQL engine to execute my SQL queries and
> fetch results.

As long as you do not need to list the files in the directory, I think that you will be ok with ZFS:

First access:

% ptime ls -l 'image.dpx[0000666]'
-r--r--r-- 8001 bfriesen home 12754944 Jun 16  2005 image.dpx[0000666]

real        0.023
user        0.000
sys         0.002

Second access:

% ptime ls -l 'image.dpx[0000666]'
-r--r--r-- 8001 bfriesen home 12754944 Jun 16  2005 image.dpx[0000666]

real        0.003
user        0.000
sys         0.002

Access to a file in a small directory:

% ptime ls -l .zprofile
-rwxr-xr-x 1 bfriesen home 236 Dec 30  2007 .zprofile

real        0.003
user        0.000
sys         0.002

Bob
On 1-Oct-08, at 1:56 AM, Ram Sharma wrote:

> I am targeting 1 million users for my application. My DB will be on a
> Solaris machine. The reason I am making one table per user is that it
> is a simpler design than keeping all the data in a single table.

You have a green light from ZFS experts, but there is no way you'd get that schema past a good DBA. This design will fail you long before you get near a million users.

--Toby