Hello again guys,

I keep reading that Lustre is not very good with "small files". However, there is no definition of a small file from Lustre's point of view. Could you draw some boundaries on what is considered a small file? For example, is a 10 MB file considered small? Should a directory holding such files be striped across a few OSTs, or should it not be striped at all?

Cheers,
_______________________________________________
Lustre-discuss mailing list
Lustre-discuss-aLEFhgZF4x6X6Mz3xDxJMA@public.gmane.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
With a 1 MB default stripe size, anything that is <1 MB is a "small" file. In general, I would not stripe 10 MB files across more than one OST (each).

Kevin

On Jul 3, 2013, at 3:11 AM, Nikolay Kvetsinski wrote:
> [...] Would you be able to draw some borders on what is considered to be a small file, for example is a 10 MB considered a small file, and should a directory holding such files be striped on few OSTs, and should it be striped at all.

This e-mail (and any attachments) is confidential and may be privileged. Any unauthorized use, copying, disclosure or dissemination of this communication is prohibited. If you are not the intended recipient, please notify the sender immediately and delete all copies of the message and its attachments.
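A rough way to see where the 1 MB threshold comes from: Lustre lays file data out round-robin in stripe-size chunks across the file's objects, so a file smaller than one stripe only ever puts data in its first object, whatever the stripe count. The sketch below models just that arithmetic; the function name and the sample sizes are illustrative assumptions, and it does not talk to a real file system.

```python
MIB = 1024 * 1024


def objects_holding_data(file_size: int, stripe_size: int, stripe_count: int) -> int:
    """Return how many of a file's objects (one per OST) actually hold data.

    Models plain round-robin striping only: chunk k of the file goes to
    object k % stripe_count. Sizes are in bytes.
    """
    if file_size <= 0:
        return 0
    chunks = -(-file_size // stripe_size)  # ceiling division
    return min(chunks, stripe_count)


if __name__ == "__main__":
    # Hypothetical file sizes, 1 MiB stripe size, stripe count of 4.
    for size in (100 * 1024, 1 * MIB, 10 * MIB, 4096 * MIB):
        used = objects_holding_data(size, MIB, 4)
        print(f"{size:>12} bytes -> data on {used} object(s)")
```

With a stripe count of 1, every file in this model lands on a single object, which is the layout suggested above for 10 MB files.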
Thanks mate, what you say makes sense. Anyway, I'm dealing with different file sizes, varying from KBs to GBs. Luckily, some of the files are grouped in folders: for example, in /fs/X all files will be 9 MB, which I'll not stripe, and in /fs/Y all files will be GBs, which I'll stripe across all OSTs. I guess the problem now is to figure out the optimal stripe size and stripe count for files in the range from a couple of hundred MB to a GB. On top of that, what I said about files of similar size being kept in their own directory might not always be true. Unfortunately, I can't "teach" users to think before they generate their files.

Cheers,

On Wed, Jul 3, 2013 at 12:18 PM, Kevin Van Maren <KVanMaren-5c4llco8/ftWk0Htik3J/w@public.gmane.org> wrote:
> With a 1MB default stripe size, anything that is <1 MB is a "small" file.
> In general, I would not stripe 10MB files across more than one OST (each).
>
> Kevin
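The per-directory plan described above (/fs/X unstriped, /fs/Y striped over all OSTs) maps onto default layouts set with `lfs setstripe` on each directory. Below is a minimal sketch that only builds the command lines and does not run them. The helper name and the policy table are assumptions for illustration; `-c` is the stripe count (`-1` means all OSTs) and `-S` is the stripe-size option on recent releases (older ones used `-s`), so check `lfs help setstripe` on your system.

```python
from typing import Dict, List, Optional


def setstripe_cmd(directory: str, stripe_count: int,
                  stripe_size: Optional[str] = None) -> List[str]:
    """Build (but do not run) an `lfs setstripe` command for a directory.

    stripe_count: 1 = no striping, -1 = stripe over all OSTs.
    stripe_size:  optional size string such as "4M"; option letter is an
                  assumption about the installed Lustre version.
    """
    cmd = ["lfs", "setstripe", "-c", str(stripe_count)]
    if stripe_size is not None:
        cmd += ["-S", stripe_size]
    cmd.append(directory)
    return cmd


if __name__ == "__main__":
    # Directories from the thread; the counts follow the plan described there.
    policies: Dict[str, int] = {"/fs/X": 1, "/fs/Y": -1}
    for directory, count in policies.items():
        print(" ".join(setstripe_cmd(directory, count)))
```

New files inherit the layout of the directory they are created in, so files that users drop into the "wrong" directory simply get that directory's default.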
Worth noting: unless your IO requirements are quite strict, in most cases you won't need a large number of striping policies. The "best" stripe for any large IO task usually depends on the particular hardware, network, and workload involved. If you are saturating part of the system, such as the network or the client, then adding stripes won't increase IO, so the largest useful stripe size can usually be determined by creating bigger and bigger stripes and measuring where the performance plateaus.

Generally, most people get by with two policies: a single-stripe area for small files, and a larger-stripe area where the size of the stripe is determined by the best IO performance on that specific hardware.

cliffw

From: Nikolay Kvetsinski <nkvecinski-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date: Wednesday, July 3, 2013 4:51 AM
To: Kevin Van Maren <KVanMaren-5c4llco8/ftWk0Htik3J/w@public.gmane.org>
Cc: <lustre-discuss-aLEFhgZF4x6X6Mz3xDxJMA@public.gmane.org>
Subject: Re: [Lustre-discuss] Small files

> [...] I guess the problem now is to figure out the optimal stripe size/stripe count for the files in the range from a couple of hundreds of MBs to GB [...]
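The "grow the stripe until performance plateaus" procedure can be sketched as a small search over benchmark results. The function name, the 5% gain threshold, and the throughput numbers below are all made up for illustration; real figures would come from running your own IO test at each setting on your own hardware.

```python
from typing import Dict


def plateau_point(measurements: Dict[int, float], min_gain: float = 0.05) -> int:
    """Return the largest setting that still improved throughput noticeably.

    measurements maps a stripe setting (e.g. stripe count) to measured
    throughput. Settings are walked in increasing order; the search stops
    at the first one that fails to beat the previous best by min_gain.
    """
    if not measurements:
        raise ValueError("need at least one measurement")
    ordered = sorted(measurements.items())
    best_setting, best_rate = ordered[0]
    for setting, rate in ordered[1:]:
        if rate < best_rate * (1 + min_gain):
            break  # gain too small: performance has plateaued
        best_setting, best_rate = setting, rate
    return best_setting


if __name__ == "__main__":
    # Hypothetical MB/s figures per stripe count, not real benchmark data.
    sample = {1: 400.0, 2: 780.0, 4: 1450.0, 8: 1500.0, 16: 1490.0}
    print("largest useful stripe count:", plateau_point(sample))
```

The same search works whether the quantity being varied is stripe count or stripe size; the result then becomes the layout for the "large file" area, with a single stripe for everything else.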
On 2013/03/07 5:51 AM, "Nikolay Kvetsinski" <nkvecinski-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> [...] I guess the problem now is to figure out the optimal stripe size/stripe count for the
> files in the range from a couple of hundreds of MBs to GB [...]

Note that there is also no benefit to striping files over multiple OSTs if there is already parallelism at the application level (i.e. multiple threads reading/writing separate files in parallel from one or more client nodes). One thread per CPU can saturate the network interface of the client, and if this IO goes to multiple OSTs per file, it creates unnecessary contention and overhead (more locking, RPCs, etc. to manage multiple objects per file). You should look at the concurrency of the file access, not just the file size, to decide what to stripe.

For really large files (e.g. hundreds of GB or more, or anything over 5% or so of the total OST size), you should probably stripe over multiple OSTs anyway, just to balance the space usage; the extra metadata overhead isn't noticeable at this size.

Cheers, Andreas
--
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division
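The two criteria above (access concurrency, and size relative to the OST) can be written down as a simple rule of thumb. This is only a sketch: the function and parameter names are hypothetical, the 5% threshold is the approximate figure from the message, and treating "many threads sharing one file" as the case that benefits from striping is an inference from the advice, not something stated outright.

```python
def should_stripe(file_size: int, ost_size: int,
                  shared_by_many_threads: bool,
                  large_fraction: float = 0.05) -> bool:
    """Heuristic: should this file span more than one OST?

    file_size, ost_size: bytes.
    shared_by_many_threads: True if many threads/clients access this
        single file concurrently; False if parallelism comes from each
        thread working on its own separate file.
    """
    # Very large files: stripe anyway to balance space across OSTs.
    if file_size > large_fraction * ost_size:
        return True
    # Otherwise striping only pays off when one file is accessed in parallel.
    return shared_by_many_threads


if __name__ == "__main__":
    TB = 10**12
    GB = 10**9
    MB = 10**6
    # Hypothetical 8 TB OSTs.
    print(should_stripe(10 * MB, 8 * TB, shared_by_many_threads=False))
    print(should_stripe(500 * GB, 8 * TB, shared_by_many_threads=False))
    print(should_stripe(1 * GB, 8 * TB, shared_by_many_threads=True))
```

The rule deliberately returns only yes or no; how many OSTs to use in the "yes" case is still best found by measurement on the actual hardware.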