EKC
2006-Jun-27 13:43 UTC
[Lustre-discuss] Runtime Option To Allocate Files Based on OST Utilization?
Hello,

The Lustre manual refers to a runtime option to allocate files to OSTs based on the free space available on each OST:

<snip>
In the short term (4Q05-1Q06) Lustre will include a runtime option that will create proportionally more new files on OSTs with more room available. Although this won't help if you need to write new data to an existing file on a completely full OST, it will help to keep a system from getting too far out of balance in the first place, and help bring it back into balance more quickly.
<snip>

source: https://mail.clusterfs.com/wikis/attachments/LustreManual.html#4.1_Expanding_the_File_System_by_Adding_OSTs

Question: Has this runtime option been implemented? If so, what is it?

Without this option, how does Lustre allocate files when a new OST is added? If I have 9 OSTs with <100 MB free on each of them and I add a new OST, is it possible that Lustre will fail to write a 10 GB file because the first 9 OSTs are full? (I'm assuming Lustre just uses round-robin when allocating files -- maybe I'm wrong?)

The problem that I am trying to solve is as follows: I need to distribute three identical copies of a file (with a maximum size of 5 GB) across three different OSTs. My client program writes to and reads from these files concurrently. I have enabled the "--failout" option for each OST so that the client program will detect immediately when a copy of the file is unavailable and automatically create another copy on a different OST. I am doing this in lieu of setting up DRBD and/or using high-availability SANs.

I am using "lfs setstripe <FILENAME> 0 -1 0" for each file now. However, that means that Lustre could place all three files on the same OST. Also, Lustre does not know the size of the file before setstripe is called, so it may pick a suboptimal OST.

So, how can I have "lfs setstripe" select three different OSTs, each with at least X MB of free space? Or do I have to parse /proc/fs/lustre/osc/* manually?

Thanks again,
eser
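The manual approach raised in the last question could look roughly like the sketch below. It covers only the selection step: given per-OST free space, pick distinct OSTs that each have at least the required room. The function name `pick_osts` and its parameters are hypothetical, and the free-space figures are assumed to have been gathered by the caller (for example from the entries under /proc/fs/lustre/osc/*, whose exact layout is not given in this thread).

```python
from typing import Dict, List


def pick_osts(free_mb: Dict[int, int], copies: int = 3,
              min_free_mb: int = 5 * 1024) -> List[int]:
    """Return `copies` distinct OST indexes with at least `min_free_mb` MB free.

    `free_mb` maps an OST index to its free space in MB. How that mapping is
    obtained is left to the caller (an assumption of this sketch).
    """
    # Keep only the OSTs that can hold one full copy of the file.
    candidates = [idx for idx, free in free_mb.items() if free >= min_free_mb]
    if len(candidates) < copies:
        raise ValueError(
            f"only {len(candidates)} OST(s) have {min_free_mb} MB free, "
            f"need {copies}"
        )
    # Prefer the emptiest OSTs; break ties by index so the result is stable.
    candidates.sort(key=lambda idx: (-free_mb[idx], idx))
    return candidates[:copies]


if __name__ == "__main__":
    # Made-up figures: two nearly full OSTs and four with more room.
    free = {0: 80, 1: 90, 2: 12000, 3: 7000, 4: 6000, 5: 4000}
    print(pick_osts(free))  # -> [2, 3, 4]
```

Each returned index could then replace the -1 in the "lfs setstripe" call above, assuming that middle argument is the starting OST index and that a stripe count of 1 is used. The check does not reserve anything, so free space can still change between the selection and the write.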
Andreas Dilger
2006-Jun-27 14:40 UTC
[Lustre-discuss] Runtime Option To Allocate Files Based on OST Utilization?
On Jun 27, 2006 13:43 -0600, EKC wrote:
> The Lustre manual refers to a runtime option to allocate files to
> OSTs based on the free space available on each OST:
>
> Question: Has this runtime option been implemented? If so, what is it?

It is part of the upcoming 1.6 release (beta now available).

> Without this option, how does Lustre allocate files when a new OST is
> added? If I have 9 OSTs with <100 MB free on each of them and I add a
> new OST, is it possible that Lustre will fail to write a 10 GB file
> because the first 9 OSTs are full? (I'm assuming Lustre just uses
> round-robin when allocating files -- maybe I'm wrong?)

Currently it is _mostly_ round-robin, but if an OST is very full it is skipped. The 1.6 support will correct this behaviour as soon as there is an imbalance of free space.

> So, how can I have "lfs setstripe" select three different OSTs, each
> with at least X MB of free space? Or do I have to parse
> /proc/fs/lustre/osc/* manually?

A little-used feature: if you mknod() a new file and then truncate() it to the maximum file size (i.e. don't open the file at all), the MDS will pick enough OSTs to stripe the file over. In 1.4 this is only useful for files larger than 2TB (i.e. num_stripes = size / 2TB). In 1.6 this will add enough stripes on OSTs with enough free space to hold the expected file size (though it won't actually reserve this space, so parallel creates/writes may still fail).

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
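The mknod()-then-truncate() sequence described above can be sketched in a few lines. This is only an illustration of the system-call order, not a tested Lustre recipe: the helper name `precreate_sized_file` is hypothetical, and whether the MDS chooses stripes as described depends on the Lustre version, as noted in the reply.

```python
import os
import stat


def precreate_sized_file(path: str, expected_size: int, mode: int = 0o644) -> None:
    """Create `path` with mknod() and set its size with truncate().

    The file is never opened: it is created as a regular-file node and then
    sized by path.
    """
    # S_IFREG requests a regular file; fails with FileExistsError if the
    # path already exists.
    os.mknod(path, mode | stat.S_IFREG)
    # Truncating by path (not by file descriptor) avoids an open() call.
    os.truncate(path, expected_size)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "copy1.dat")
        precreate_sized_file(target, 1024 * 1024)
        print(os.stat(target).st_size)
```

On an ordinary local file system this just produces a sparse file of the requested size. As the reply points out, the space is not reserved, so the expected size is a hint rather than a guarantee.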