Дмитрий Глушенок
2017-Feb-08 11:49 UTC
[Gluster-users] Slow performance on samba with small files
For _every_ file copied, Samba performs readdir() to get all entries of the destination folder. The list is then searched for the filename (to prevent name collisions, as SMB shares are not case sensitive). The more files in the folder, the longer readdir() takes. It is a lot worse on Gluster, because the contents of a single folder are distributed among many servers, and Gluster has to join many directory listings (requested over the network) into one and return it to the caller.

Rsync does not perform readdir(); it just checks for file existence with stat(), IIRC. And as modern Gluster versions by default look for a file only at its expected destination (when the volume is balanced), the check is relatively fast.

You can hack Samba to skip such checks if your goal is just to get files copied faster (and you are sure the files you are copying do not already exist at the destination). But try running 'ls -l' on a _not_ cached folder with thousands of files: it will take tens of seconds. This is time your users will waste browsing shares.

On 8 Feb 2017, at 13:17, Gary Lloyd <g.lloyd at keele.ac.uk> wrote:
>
> Thanks for the reply.
>
> I've just done a bit more testing. If I use rsync from a gluster client to copy the same files to the mount point it only takes a couple of minutes.
> For some reason it's very slow on samba though (version 4.4.4).
>
> I have tried various samba tweaks / settings and have yet to get acceptable write speed on small files.
>
> Gary Lloyd
> ________________________________________________
> I.T. Systems:Keele University
> Finance & IT Directorate
> Keele:Staffs:IC1 Building:ST5 5NB:UK
> +44 1782 733063
> ________________________________________________
>
> On 8 February 2017 at 10:05, Дмитрий Глушенок <glush at jet.msk.su> wrote:
>> Hi,
>>
>> There are a number of tweaks/hacks to make it better, but IMHO overall performance with small files is still unacceptable for folders with thousands of entries.
>>
>> If your shares are not too large to be placed on a single filesystem and you still want to use Gluster, it is possible to run a VM on top of Gluster. Inside that VM you can create a ZFS/NTFS filesystem to be shared.
>>
>>> On 8 Feb 2017, at 12:10, Gary Lloyd <g.lloyd at keele.ac.uk> wrote:
>>>
>>> Hi
>>>
>>> I am currently testing gluster 3.9 replicated/distributed on CentOS 7.3 with samba/ctdb.
>>> I have been able to get it all up and running, but writing small files is really slow.
>>>
>>> If I copy large files from gluster backed samba I get almost wire speed (we only have 1Gb at the moment). I get around half that speed if I copy large files to the gluster backed samba system, which I am guessing is due to it being replicated (this is acceptable).
>>>
>>> Small file write performance seems really poor for us though:
>>> As an example I have an Eclipse IDE workspace folder that is 6MB in size and has around 6000 files in it. A lot of these files are <1k in size.
>>>
>>> If I copy this up to gluster backed samba it takes almost one hour to get there.
>>> With our basic samba deployment it only takes about 5 minutes.
>>>
>>> Both systems reside on the same disks/SAN.
>>>
>>> I was hoping that we would be able to move away from using a proprietary SAN to house our network shares and use gluster instead.
>>>
>>> Does anyone have any suggestions of anything I could tweak to make it better?
>>>
>>> Many Thanks
>>>
>>> Gary Lloyd

--
Dmitry Glushenok
Jet Infosystems
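To make the readdir() versus stat() point from the message above concrete, here is a minimal Python sketch. It is only an illustration on a local temporary directory, not Samba's or Gluster's actual code, and the function names (exists_case_insensitive, exists_exact, copy_simulation) are hypothetical. It counts how many directory entries a case-insensitive collision check has to examine when files are added one by one.

```python
import os
import tempfile


def exists_case_insensitive(directory, name):
    """Scan the directory listing for a case-insensitive match.

    Returns (found, entries_examined). This mimics a readdir()-style
    check; it is an illustration, not Samba's implementation.
    """
    wanted = name.lower()
    examined = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            examined += 1
            if entry.name.lower() == wanted:
                return True, examined
    return False, examined


def exists_exact(directory, name):
    """Single lookup by exact name, comparable to a stat()-based check."""
    return os.path.exists(os.path.join(directory, name))


def copy_simulation(n_files):
    """Create n_files empty files, checking for a collision before each one.

    Returns the total number of directory entries examined by the
    case-insensitive checks across the whole run.
    """
    total_examined = 0
    with tempfile.TemporaryDirectory() as dest:
        for i in range(n_files):
            name = f"file_{i:05d}.txt"
            found, examined = exists_case_insensitive(dest, name)
            total_examined += examined
            # The exact-name check is one lookup, whatever the folder size.
            assert not found and not exists_exact(dest, name)
            open(os.path.join(dest, name), "w").close()
    return total_examined


if __name__ == "__main__":
    print(copy_simulation(100))
```

Because each new file is checked against everything already in the folder, the scan work grows as n(n-1)/2: for the 6000-file workspace mentioned in the thread that is 6000 * 5999 / 2 = 17,997,000 entries examined, while the exact-name check stays at one lookup per file. On Gluster each of those listings would additionally have to be assembled over the network, which this local sketch does not model.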
Gary Lloyd
2017-Feb-09 16:18 UTC
[Gluster-users] Slow performance on samba with small files
I was just reading the small file section of the 3.9 release notes: http://blog.gluster.org/2016/11/announcing-gluster-3-9/

Setting these options does seem to increase transfer speeds on small files by quite a lot:

# gluster volume set <volname> features.cache-invalidation on
# gluster volume set <volname> features.cache-invalidation-timeout 600
# gluster volume set <volname> performance.stat-prefetch on #This one seemed to have the biggest impact on small file performance for me
# gluster volume set <volname> performance.cache-invalidation on
# gluster volume set <volname> performance.md-cache-timeout 600

Setting

# gluster volume set <volname> performance.cache-samba-metadata on # Only for SMB access.

results in my client repeatedly losing the state of the server: the shares often disappear or become inaccessible, and I can only get them back if I log off and log on to the machine again. This is with the distro Samba 4.4.4.

Has anyone here had the same issue? Does the version of samba need to be newer to support the feature?

Thanks

Gary Lloyd
________________________________________________
I.T. Systems:Keele University
Finance & IT Directorate
Keele:Staffs:IC1 Building:ST5 5NB:UK
+44 1782 733063
________________________________________________
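The volume options reported as helpful in the message above could be applied in one pass with a small helper. The following is a hedged Python sketch that only builds and prints the command lines (a dry run); the volume name "myvol" and the function name build_commands are assumptions for illustration, while the option names and values are copied from the message. performance.cache-samba-metadata is deliberately left out because of the share-disconnect problem described with Samba 4.4.4.

```python
import shlex

# Option names and values as listed in the message above.
MD_CACHE_OPTIONS = [
    ("features.cache-invalidation", "on"),
    ("features.cache-invalidation-timeout", "600"),
    ("performance.stat-prefetch", "on"),
    ("performance.cache-invalidation", "on"),
    ("performance.md-cache-timeout", "600"),
]


def build_commands(volname):
    """Return the 'gluster volume set' command lines for one volume.

    Nothing is executed here; the caller decides whether to run them.
    """
    commands = []
    for option, value in MD_CACHE_OPTIONS:
        parts = ["gluster", "volume", "set", volname, option, value]
        # Quote each part so an unusual volume name cannot break the line.
        commands.append(" ".join(shlex.quote(p) for p in parts))
    return commands


if __name__ == "__main__":
    # "myvol" is a placeholder volume name.
    for command in build_commands("myvol"):
        print(command)
```

Printing rather than executing keeps the sketch safe to try on any machine; the printed lines can be reviewed and then run as root on a Gluster node.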