On 03/09/2010 09:36 AM, Charles Riley wrote:
> Sorry, I meant to send this to the list, not just Ric.
>
>
> ----- Forwarded Message -----
> From: "Charles Riley"<criley at erad.com>
> To: "Ric Wheeler"<rwheeler at redhat.com>
> Sent: Tuesday, March 9, 2010 9:34:25 AM GMT -05:00 US/Canada Eastern
> Subject: Re: problems with large directories?
>
>
>
>
> ----- "Ric Wheeler"<rwheeler at redhat.com> wrote:
>
>> On 03/08/2010 08:23 PM, Mitch Trachtenberg wrote:
>>> Hi,
>>>
>>> I have an application that deals with 100,000 to 1,000,000 image files.
>>>
>>> I initially structured it to use multiple directories, so that file
>>> 123456 would be stored in /12/34/123456. I'm now wondering if that's
>>> pointless, as it would simplify things to simply store the file in
>>> /123456.
>>>
>>> Can anyone indicate whether I'm gaining anything by using smaller
>>> directories in ext3/ext4? Thanks.
>>>
>>> Mitch
>>>
>>
>> I think that breaking up your files into subdirectories makes it easier
>> to navigate the tree and find files from a human point of view. Even
>> better if the bytes reflect something like year/month/day/hour/min
>> (assuming your pathname has a date-based GUID or similar encoding).
>>
>> You can have a million files in one large directory, but be careful to
>> iterate and copy them in sorted order (sorted by inode) to avoid nasty
>> performance issues that are side effects of the way we hash file names
>> in ext3/4.
>>
>> Good luck!
>>
>> Ric
>>
>
> Hi Ric,
>
> Can you elaborate on the performance issues you mention above?
>
> We use RHEL4/ext3 on our PACS (medical imaging) servers.
> We ran into ext3's 32k subdirectory limit a couple of years back when our
> first customer hit the 31,999th study, at which point we implemented a
> directory hashing algorithm. Now we store images for a given patient's
> study in a path something like:
> aa/ab/ac/1.2.3/
>
> where 1.2.3 is the DICOM study instance UID (a worldwide-unique ID for a
> medical study) and aa/ab/ac/ is the directory hash we derived from that
> study instance UID.
>
> The above is a simplified example for illustration purposes only; 1.2.3
> does not really hash to aa/ab/ac/.
> Within aa/ab/ac/1.2.3/ there can be anywhere from three to a couple of
> thousand DICOM object files.
> Images are initially created in a non-hashed temporary directory and then
> copied to their permanent home in, e.g., aa/ab/ac/1.2.3/.
>
> In this context, would we gain filesystem performance by sorting by inode
> before copying?
> Do the performance issues you refer to apply only to the copy process
> itself, or do they contribute to long-term filesystem performance?
>
> Thanks for any insight you can provide,
>
> Charles
>
Hi Charles,

The big issue with touching a lot of files (reading, stat'ing, or unlinking
them) in ext3/4 is that readdir() gives us back the list in effectively
random order. This makes the accesses very seeky.

That's not an issue with a handful of files (say a couple of hundred), but
when you get to thousands (or millions) of files, performance really tanks.

To avoid that, you can sort the list returned by readdir() into ascending
order by inode, in reasonably large batches, and get your performance up.
Several core tools have been looking at doing this automatically, but it is
important for any home-grown applications as well.
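
The core of it is simple: read all the names first, sort them by the d_ino
field that readdir() already hands you, then do the per-file work in that
sorted order. An untested sketch of the pattern (error handling trimmed):

#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

struct ent {
    ino_t ino;
    char  name[NAME_MAX + 1];
};

/* qsort comparator: ascending inode number */
static int by_ino(const void *a, const void *b)
{
    ino_t ia = ((const struct ent *)a)->ino;
    ino_t ib = ((const struct ent *)b)->ino;
    return (ia > ib) - (ia < ib);
}

int main(int argc, char **argv)
{
    DIR *d = opendir(argc > 1 ? argv[1] : ".");
    struct ent *list = NULL;
    size_t n = 0, cap = 0;
    struct dirent *de;

    if (!d) {
        perror("opendir");
        return 1;
    }
    while ((de = readdir(d)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        if (n == cap) {                     /* grow the array as needed */
            cap = cap ? cap * 2 : 1024;
            list = realloc(list, cap * sizeof(*list));
            if (!list) {
                perror("realloc");
                return 1;
            }
        }
        list[n].ino = de->d_ino;            /* inode straight from the dirent */
        strncpy(list[n].name, de->d_name, NAME_MAX);
        list[n].name[NAME_MAX] = '\0';
        n++;
    }
    closedir(d);

    /* Sort by inode so the stat()/open()/copy pass that follows walks
     * the inode table mostly sequentially instead of seeking all over. */
    if (n)
        qsort(list, n, sizeof(*list), by_ino);

    for (size_t i = 0; i < n; i++)
        printf("%llu\t%s\n", (unsigned long long)list[i].ino, list[i].name);

    free(list);
    return 0;
}

Replace the final printf() loop with whatever per-file work you do (stat,
open, copy), and process in batches if a million entries is too much to
hold in memory at once.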
In your scenario with the directory hierarchy, I suspect that you won't hit
this. If you had one very large directory, you certainly would.
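
As an aside, a two-characters-per-level hash layout like the one you
describe is typically derived along these lines. This is purely
hypothetical (you didn't post your real algorithm, and the hash function
here is just djb2 picked for illustration):

#include <stdio.h>
#include <string.h>

/* djb2 string hash -- an arbitrary choice for this example */
static unsigned long djb2(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Hypothetical: write e.g. "xq/rk/fz/1.2.3/" into buf for UID "1.2.3",
 * using three levels of two lowercase letters each. */
static void hash_path(const char *uid, char *buf, size_t len)
{
    unsigned long h = djb2(uid);
    snprintf(buf, len, "%c%c/%c%c/%c%c/%s/",
             'a' + (int)(h % 26),            'a' + (int)((h / 26) % 26),
             'a' + (int)((h / 676) % 26),    'a' + (int)((h / 17576) % 26),
             'a' + (int)((h / 456976) % 26), 'a' + (int)((h / 11881376) % 26),
             uid);
}

int main(void)
{
    char path[256];
    hash_path("1.2.3", path, sizeof(path));
    printf("%s\n", path);
    return 0;
}

The nice property of a layout like that is that sibling counts stay small
and predictable at every level, which is exactly what keeps you away from
both the 32k subdirectory limit and the huge-directory readdir problem.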
Best regards,
Ric