thr3ads.net - samba - [Samba] Large numbers of files in a directory

Jeremy Allison

2005-Feb-03 19:22 UTC

[Samba] Large numbers of files in a directory - take #2 :-)

OK, second attempt, now that I'm sure the code is working :-).

JohnT - if you want to turn this into a HOWTO or part of
the book, be my guest. Remember it'll be in 3.0.12, not
3.0.11 or below.

-----------------------------------------------------------
I've been working (inspired by James Peach of SGI) on the
problem of using Samba3 with applications that need large
numbers of files (100,000 or more) per directory.

I think the current code in SVN in the SAMBA_3_0 branch
may hold the fix for this problem, so I'd like to ask
people who need this functionality to give it a try.

The key was fixing the directory handling to read only
the entries currently requested instead of the old (up to 3.0.11)
behaviour of reading the entire directory into memory before
doling out names. Normally this would have broken OS/2
applications which have *very* strange delete semantics :-),
but by stealing logic from Samba4 (thanks tridge) I think
the current code in SVN handles this correctly.
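To make the difference concrete, here is a rough Python sketch. Samba itself is C, and this is not smbd's code. It assumes a client that asks for names in fixed-size batches; `eager_listing`, `incremental_listing` and `batch_size` are hypothetical names, and the sketch ignores the OS/2 delete semantics mentioned above.

```python
import os
from itertools import islice
from typing import Iterator, List


def eager_listing(path: str, batch_size: int) -> Iterator[List[str]]:
    """Old style (hypothetical model): load every name, then dole them out."""
    names = os.listdir(path)  # the whole directory is held in memory up front
    for start in range(0, len(names), batch_size):
        yield names[start:start + batch_size]


def incremental_listing(path: str, batch_size: int) -> Iterator[List[str]]:
    """New style (hypothetical model): read only what the current request needs."""
    with os.scandir(path) as it:  # entries are pulled from the OS lazily
        while True:
            batch = [entry.name for entry in islice(it, batch_size)]
            if not batch:
                break
            yield batch
```

Both hand out the same names; the difference is that the incremental version never holds more than one batch of entries at a time, which matters once a directory has 100,000+ of them.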

So here's how to set up an application that needs a large
number of files per directory in a way that doesn't damage
performance.

Firstly, you need to canonicalize all the files in the
directory to have one case, upper or lower - take your
pick (I chose upper as all my files were already upper
case names). Then set up a new custom share for the
application as follows:
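One way to do the canonicalizing is a small script like this minimal Python sketch. It is a hypothetical helper, not part of Samba: it upper-cases every file and directory name under the share path (but not the path itself), refuses to rename anything if two names in the same directory would collide (e.g. "foo" and "FOO"), and only reports what it would do unless `dry_run=False`. Try it on a copy first, and make sure nothing is writing to the directory while it runs.

```python
import os
from typing import List, Tuple


def plan_uppercase(names: List[str]) -> List[Tuple[str, str]]:
    """Return (old, new) renames for one directory; raise on any collision."""
    seen = {}
    for name in names:
        upper = name.upper()
        if upper in seen:
            raise ValueError(f"{seen[upper]!r} and {name!r} both become {upper!r}")
        seen[upper] = name
    return sorted((old, new) for new, old in seen.items() if old != new)


def apply_uppercase(root: str, dry_run: bool = True) -> List[Tuple[str, str]]:
    """Upper-case every name under `root` (hypothetical helper, not Samba code)."""
    # Plan everything first, so a collision anywhere aborts before any change.
    # os.walk(topdown=False) yields children before their parents, so renaming
    # in this order never invalidates a path that is still waiting to be renamed.
    plan = []
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for old, new in plan_uppercase(dirnames + filenames):
            plan.append((os.path.join(dirpath, old), os.path.join(dirpath, new)))
    if not dry_run:
        for src, dst in plan:
            os.rename(src, dst)
    return plan
```

Run it once with the default `dry_run=True` and read the list of planned renames before doing it for real.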

[bigshare]
        path = /home/jeremy/tmp/manyfilesdir
        read only = no
        case sensitive = True
        default case = upper
        preserve case = no
        short preserve case = no

Of course, use your own path and settings, but set the
case options to match the case of all the files in your
directory. The path should point at the large directory
needed for the application - any new files created in
there and in any paths under it will be forced by smbd
into upper case - but smbd will no longer have to scan
the directory for names - it knows that if a file doesn't
exist in upper case then it doesn't exist at all.

The secret to this is really in the "case sensitive = True"
line - it tells smbd never to scan for case-insensitive
versions of names. So if an application asks for a file
called "FOO", and it can't be found by a simple stat call,
then smbd will return file not found immediately without
scanning the containing directory for a version of a different
case. The other "xxx case xxx" lines make this work by forcing
a consistent case on all files created by smbd.
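The two lookup strategies described above can be sketched roughly as follows. This is a simplified Python model, not smbd's actual name-resolution code: real Samba also deals with mangled 8.3 names and character-set rules, which are ignored here, and `str.lower()` stands in for its case comparison. The function names are hypothetical.

```python
import os
from typing import Optional


def lookup_scanning(dirpath: str, name: str) -> Optional[str]:
    """Case-insensitive share (model): exact stat first, then scan on a miss."""
    candidate = os.path.join(dirpath, name)
    if os.path.exists(candidate):  # cheap: a single stat call
        return candidate
    wanted = name.lower()
    with os.scandir(dirpath) as it:  # expensive: reads every entry in the directory
        for entry in it:
            if entry.name.lower() == wanted:
                return entry.path
    return None


def lookup_case_sensitive(dirpath: str, name: str) -> Optional[str]:
    """'case sensitive = True' (model): one stat; a miss means 'not found'."""
    candidate = os.path.join(dirpath, name)
    return candidate if os.path.exists(candidate) else None
```

The cost difference is the scan: `lookup_scanning` has to read the whole directory on every miss, which is what hurts with 100,000+ entries, while `lookup_case_sensitive` only ever pays for a single stat.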

Remember, all files and directories under the "path" directory
must be in upper case with this smb.conf stanza, as smbd won't
be able to find lower case filenames with these settings. Also
note this is done on a per-share basis, allowing this to be set
only for a share servicing an application with this problematic
behaviour (using large numbers of entries in a directory) - the
rest of your smbd shares don't need to be affected.
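Only names created through smbd are forced to upper case; files dropped into the directory by local processes or other services are not. A hypothetical audit script such as this minimal Python sketch can list anything under the share path that smbd would no longer be able to find with these settings.

```python
import os
from typing import List


def non_uppercase_entries(root: str) -> List[str]:
    """List every path under `root` whose own name is not already upper case."""
    bad = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if name != name.upper():  # e.g. "Mixed.txt" or "lower"
                bad.append(os.path.join(dirpath, name))
    return sorted(bad)
```

Running something like this periodically (or after any bulk copy into the share) catches stray mixed-case names before the application trips over them.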

This makes smbd *much* faster when dealing with large directories.
My test case has over 100,000 files and smbd now deals with this
very efficiently.

So please give this a test if you have problems with
Samba and large directories. Remember this is in SVN code
only; it isn't in the 3.0.11 pre-releases or release candidates,
as we need to ensure this new code is correct. If you
can help me test it, it'll be in 3.0.12 (security problems
notwithstanding :-).

Cheers,

	Jeremy.

John H Terpstra

2005-Feb-03 20:03 UTC

[Samba] Large numbers of files in a directory - take #2 :-)

Folks,

This will go into the docs as soon as 3.0.11 is out.

- John T.

-- 
John H Terpstra
Samba-Team Member
Phone: +1 (650) 580-8668

Author:
The Official Samba-3 HOWTO & Reference Guide, ISBN: 0131453556
Samba-3 by Example, ISBN: 0131472216
Hardening Linux, ISBN: 0072254971
Other books in production.

Michael Lueck

2005-Feb-03 20:22 UTC

[Samba] Re: Large numbers of files in a directory - take #2 :-)

Jeremy Allison wrote:
> The secret to this is really in the "case sensitive = True"
> line - it tells smbd never to scan for case-insensitive
> versions of names. So if an application asks for a file
> called "FOO", and it can't be found by a simple stat call,
> then smbd will return file not found immediately without
> scanning the containing directory for a version of a different
> case. The other "xxx case xxx" lines make this work by forcing
> a consistent case on all files created by smbd.

Hang on here... A Windows app asks for file "Foo", and under this
proposal it will not be found?

If so, could this create an issue where a Windows app writes "Foo"
successfully, yet when it goes back to read it, is told it is not there?

This code is starting to sound like the M$ FTPD, which is case sensitive only for
the DIR and LS commands, while GET and PUT are case insensitive. Thus you can get
into situations where, if "FOO" is the existing file on the server and you upload
"Foo" over top of it, then dir/ls for "Foo" to check that the size matches, you
are told the file is not there. MOST ANNOYING to code around M$ FTPD, to say the
least.

I recently was dealing with this case issue and came up with the following
scheme which I will try for a while.

1) Upper case the entire directory/file name
2) CRC32 that string
3) Store that hash along with file data

When a request comes in for a file, again 1) upper case the entire
directory/file name and 2) CRC32 that string, then check it against the list of
known files for a match, stepping through hash collisions as needed. Scanning
my own desktop, I already hit one collision with this scheme, but one on a full
hard drive is not bad! ;-)
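Michael's three steps, plus the lookup with collision stepping, could be sketched in Python roughly like this. It assumes names are encoded as UTF-8 before hashing and that the "file data" is just the stored name, kept in an in-memory dict of hash buckets; `name_key` and `CaseInsensitiveIndex` are hypothetical names, and Python's `str.upper()` is only an approximation of Windows' case-insensitivity rules.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Optional


def name_key(path: str) -> int:
    """Steps 1 and 2: upper-case the whole directory/file name, then CRC32 it."""
    return zlib.crc32(path.upper().encode("utf-8"))  # UTF-8 is an assumption


class CaseInsensitiveIndex:
    """Step 3 (model): keep each name under its hash; colliding names share a bucket."""

    def __init__(self) -> None:
        self._buckets: Dict[int, List[str]] = defaultdict(list)

    def add(self, path: str) -> None:
        self._buckets[name_key(path)].append(path)

    def find(self, requested: str) -> Optional[str]:
        # Hash the request the same way, then step through any collisions by
        # comparing full upper-cased names rather than trusting the hash alone.
        wanted = requested.upper()
        for stored in self._buckets.get(name_key(requested), []):
            if stored.upper() == wanted:
                return stored
        return None
```

Comparing the full upper-cased names inside a bucket is what makes collisions harmless: two different names with the same CRC32 simply end up in the same list, and the lookup picks the one that actually matches.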

Anyway, I came up with the above to avoid developing something like the case
guessing code found in Samba. (Still considering how to deal with M$ FTPD when I
get to the FTP I/O part of my program.)

-- 
Michael Lueck
Lueck Data Systems
