Michael Peek
2014-Apr-16 13:50 UTC
[Gluster-users] Would there be a use for cluster-specific filesystem tools?
Hi guys,

(I'm new to this, so pardon me if my shenanigans turn out to be a waste of your time.)

I have been experimenting with Gluster by copying and deleting large numbers of files of all sizes. What I found was that when deleting a large number of small files, the deletion process takes a good chunk of time -- in some cases a significant percentage of the time that it took to copy the files to the cluster to begin with. I'm guessing that the reason is a combination of find and rm -fr processing files serially and having to wait on the packets to travel back and forth over the network. With a clustering filesystem, that serial processing is the bottleneck: each operation waits on network packets when it doesn't have to.

So I decided to try an experiment. Instead of using /bin/rm to delete files serially, I wrote my own quick-and-dirty recursive rm (and recursive ls) that uses pthreads (listed as "cluster-rm" and "cluster-ls" in the table below).

Methods:

1) This was done on a Linux system. I suspect that Linux (or any modern OS) caches filesystem information. For example, after setting up a directory, rm -fr on that directory completes sooner if I first run find on the same directory. So to avoid this caching effect, each command was run on its own test directory. (I.e. find was never run on the same directory as rm -fr or cluster-rm.) This approach seemed to prevent inconsistencies resulting from any caching behavior, giving run times that were more consistent.

2) Each test directory contained the exact same data for each of the four commands tested (find, cluster-ls, rm, cluster-rm) for each test run.

3) All commands were run on a client machine and not one of the cluster nodes.

Results:

Data Size  Command      Time  Test #1      Test #2      Test #3      Test #4
49GB       find -print  real  6m45.066s    6m18.524s    5m45.301s    5m58.577s
                        user  0m0.172s     0m0.140s     0m0.156s     0m0.132s
                        sys   0m0.748s     0m0.508s     0m0.484s     0m0.480s

           cluster-ls   real  2m32.770s    2m21.376s    2m40.511s    2m36.202s
                        user  0m0.208s     0m0.164s     0m0.184s     0m0.172s
                        sys   0m1.876s     0m1.568s     0m1.488s     0m1.412s

49GB       rm -fr       real  16m36.264s   16m16.795s   15m54.503s   16m10.037s
                        user  0m0.232s     0m0.248s     0m0.204s     0m0.168s
                        sys   0m1.724s     0m1.528s     0m1.396s     0m1.448s

           cluster-rm   real  1m50.717s    1m44.803s    2m6.250s     2m6.367s
                        user  0m0.236s     0m0.192s     0m0.224s     0m0.224s
                        sys   0m1.820s     0m2.100s     0m2.200s     0m2.316s

97GB       find -print  real  11m39.990s   11m21.018s   11m33.257s   11m4.867s
                        user  0m0.380s     0m0.380s     0m0.288s     0m0.332s
                        sys   0m1.428s     0m1.224s     0m0.924s     0m1.244s

           cluster-ls   real  4m46.829s    5m15.538s    4m52.075s    4m43.134s
                        user  0m0.504s     0m0.408s     0m0.364s     0m0.452s
                        sys   0m3.228s     0m3.736s     0m3.004s     0m3.140s

97GB       rm -fr       real  29m34.138s   28m11.000s   28m37.154s   28m41.724s
                        user  0m0.520s     0m0.556s     0m0.412s     0m0.380s
                        sys   0m3.908s     0m3.480s     0m2.756s     0m4.184s

           cluster-rm   real  3m30.750s    4m20.195s    4m45.206s    4m26.894s
                        user  0m0.524s     0m0.456s     0m0.444s     0m0.436s
                        sys   0m4.932s     0m5.316s     0m4.584s     0m4.732s

145GB      find -print  real  16m26.498s   16m53.047s   15m10.704s   15m53.943s
                        user  0m0.520s     0m0.596s     0m0.364s     0m0.456s
                        sys   0m2.244s     0m1.740s     0m1.748s     0m1.764s

           cluster-ls   real  6m52.006s    7m7.361s     7m4.109s     6m37.229s
                        user  0m0.644s     0m0.804s     0m0.652s     0m0.656s
                        sys   0m5.664s     0m5.432s     0m4.800s     0m4.652s

145GB      rm -fr       real  40m10.396s   42m17.851s   39m6.493s    39m52.047s
                        user  0m0.624s     0m0.844s     0m0.484s     0m0.496s
                        sys   0m5.492s     0m4.872s     0m4.868s     0m4.980s

           cluster-rm   real  6m49.769s    8m34.644s    6m3.563s     6m31.808s
                        user  0m0.708s     0m0.852s     0m0.636s     0m0.664s
                        sys   0m6.440s     0m8.345s     0m5.844s     0m5.996s

1.1TB      find -print  real  62m4.043s    61m11.584s   65m37.389s   63m51.822s
                        user  0m1.300s     0m1.204s     0m1.708s     0m3.096s
                        sys   0m5.448s     0m5.172s     0m4.276s     0m9.869s

           cluster-ls   real  73m12.463s   68m37.846s   72m56.417s   69m3.575s
                        user  0m2.472s     0m2.080s     0m2.516s     0m4.316s
                        sys   0m19.289s    0m18.625s    0m18.601s    0m35.986s

1.1TB      rm -fr       real  188m1.925s   190m21.850s  200m25.712s  196m12.686s
                        user  0m2.240s     0m2.372s     0m5.840s     0m4.916s
                        sys   0m21.705s    0m18.885s    0m46.363s    0m41.519s

           cluster-rm   real  85m46.463s   90m29.055s   88m16.063s   77m42.096s
                        user  0m2.512s     0m2.600s     0m4.456s     0m2.464s
                        sys   0m30.478s    0m30.382s    0m51.667s    0m31.638s

Conclusions:

1) Once I had a threaded version of rm, a threaded version of ls was easy to make, so I included it in the tests (listed above as cluster-ls). Performance looked spiffy up until the 1.1TB range, when cluster-ls started taking more time than find. Right now I can't explain why. 1.1TB takes a long time to set up and process (about a day for each set of four commands), so it could be that regular nightly backups are interfering with performance. If that's the case, then it calls into question the usefulness of my threaded approach. Also, naturally the output from cluster-ls is out of order, so grep and sed would most likely be used in conjunction with something like that, and I haven't yet time-tested 'cluster-ls | some-other-command' against using plain old find by itself.

2) Results from cluster-rm look pretty good to me across the board.
Again, performance seems to fall off in the 1.1TB tests, and the reasons are not clear to me at this time, but cluster-rm still takes less than half the time of rm -fr. Run times fluctuate more than in the previous tests, but I suppose that's to be expected. But since performance does drop, it makes me wonder how well this approach scales on larger sets of data.

3) My threaded cluster-rm/ls commands are not clever. While traversing directories, each subdirectory found gets a new thread to process it, up until a hard-coded limit is reached (for the above results, 100 threads were used). After the thread count limit is reached, directories are processed using plain old recursion until a thread exits, freeing up a thread to process another subdirectory.

Further Research:

A) I would like to test further with larger data sets.

B) I would like to implement a smarter algorithm for determining how many threads to use to maximize performance. Rather than a hard-coded maximum, a better approach might be to measure the number of inodes processed per second, and use that to judge the effectiveness of adding more threads until a local maximum is reached.

C) How do these numbers change if the commands are run on one of the cluster nodes instead of a client?

I have some ideas of smarter things to try, but I am at best an inexperienced (if enthusiastic) dabbler in the programming arts. A professional would likely do a much better job.

But if this data looks at all interesting or useful, then maybe there would be a call for a handful of cluster-specific filesystem tools?

Michael Peek
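
For readers who want to see the traversal strategy from conclusion 3 in concrete form, here is a minimal sketch in Python. It is not the pthreads code used for the measurements above, which was not posted; the function name cluster_rm and the max_threads parameter are assumptions made for illustration. Each subdirectory gets its own thread while a slot is free, and the walk falls back to plain recursion once the limit is reached.

```python
import os
import threading


def cluster_rm(root, max_threads=100):
    """Recursively delete `root`, using at most `max_threads` helper threads.

    Hypothetical sketch of the strategy described in the post, not the
    original tool. The default of 100 mirrors the limit quoted there.
    """
    # One permit per helper thread that may be alive at the same time.
    slots = threading.BoundedSemaphore(max_threads)

    def remove_tree(path):
        # Read the whole listing first so nothing is unlinked mid-iteration.
        with os.scandir(path) as it:
            entries = list(it)
        workers = []
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                # Try to claim a slot without blocking.
                if slots.acquire(blocking=False):
                    t = threading.Thread(target=in_thread, args=(entry.path,))
                    t.start()
                    workers.append(t)
                else:
                    # Limit reached: handle this subdirectory by recursion.
                    remove_tree(entry.path)
            else:
                # Regular files and symlinks are removed directly.
                os.unlink(entry.path)
        # A directory can only be removed once its subtrees are gone.
        for t in workers:
            t.join()
        os.rmdir(path)

    def in_thread(path):
        try:
            remove_tree(path)
        finally:
            # Free the slot so another subdirectory can get a thread.
            slots.release()

    remove_tree(root)
```

Calling cluster_rm("/mnt/gluster/testdir", max_threads=100) would remove that (hypothetical) directory tree. Threads can help here even in Python because the work is dominated by filesystem calls that wait on the network, and CPython releases its interpreter lock during those calls. Note that a thread keeps its slot while it waits for its children to finish, so the effective parallelism can be lower than the limit on deep trees.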
Joe Julian
2014-Apr-16 14:31 UTC
[Gluster-users] Would there be a use for cluster-specific filesystem tools?
Excellent! I've been toying with the same concept in the back of my mind for a long while now. I'm sure there is an unrealized desire for such tools.

When you're ready, please put such a toolset on forge.gluster.org.

On April 16, 2014 6:50:48 AM PDT, Michael Peek <peek at nimbios.org> wrote:
<snip>
> But if this data looks at all interesting or useful, then maybe there
> would be a call for a handful of cluster-specific filesystem tools?
Justin Clift
2014-Apr-17 14:26 UTC
[Gluster-users] Would there be a use for cluster-specific filesystem tools?
On 16/04/2014, at 2:50 PM, Michael Peek wrote:
<snip>
> I have some ideas of smarter things to try, but I am at best an inexperienced (if enthusiastic) dabbler in the programming arts. A professional would likely do a much better job.

Don't be stressed about coding standards yet. Just do the best you can, as it sounds like it'll be useful regardless of perfection level. :)

More experienced coders can turn up to help out later on, as it gets used... :)

> But if this data looks at all interesting or useful, then maybe there would be a call for a handful of cluster-specific filesystem tools?

As Joe mentioned, there's definitely Community interest in it. :)

As an additional thought, would you be interested in turning your email about this into a blog post for the Gluster site, or maybe creating a reference Howto/article type thing? (optimal combo would be having a project created on the Gluster Forge with your initial tools, + reference Howto article + blog post for attention)

:)

Regards and best wishes,

Justin Clift

--
Open Source and Standards @ Red Hat

twitter.com/realjustinclift
Michael Peek
2014-May-05 13:46 UTC
[Gluster-users] Would there be a use for cluster-specific filesystem tools?
Okay, so interest seems to be there. What tools would be useful? So far my list consists of:

1) du -sk or -s --si
2) rm -fr
3) find (or at least find -print)

What else would you add to this list? What things do you do with your cluster that you think might benefit from this approach?

I can't promise to be able to reproduce the full complexity of these tools in my limited time, but I'd like to get something useful out there.

Michael

On 04/17/2014 10:26 AM, Justin Clift wrote:
<snip>
> As Joe mentioned, there's definitely Community interest in it. :)