I reckon that to quickly copy one glusterfs volume to another, I will need a multi-threaded 'cp'. That is, something which will take the list of files from readdir() and copy batches of N of them in parallel. This is so I can keep all the component spindles busy.

Question 1: does such a thing exist already in the open source world?

Question 2: for a DHT volume, does readdir() return the files in a round-robin fashion, i.e. one from brick 1, one from brick 2, one from brick 3 etc? Or does it return all the results from one brick, followed by all the results from the second brick, and so on? Or something indeterminate?

Alternatively: is it possible to determine for each file which brick it resides on? (I don't think it's in an extended attribute; I tried 'getfattr -d' on a file, both on the GlusterFS mount and on the underlying brick, and couldn't see anything.)

Thanks,

Brian.

P.S. I did look in the source, and I couldn't figure out how dht_do_readdir works. But it does have a slightly disconcerting comment:

    /* TODO: do proper readdir */
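To make the idea of a multi-threaded 'cp' concrete, here is a minimal sketch in Python. It is not an existing tool: the function names (`list_files`, `copy_one`, `parallel_copy`) and the default of 8 workers are made up for illustration, and it assumes both volumes are mounted as ordinary directories. It walks the source tree and keeps up to N copies in flight at once using a thread pool.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def list_files(src_root):
    """Yield the paths of all files under src_root, relative to src_root."""
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            yield os.path.relpath(os.path.join(dirpath, name), src_root)


def copy_one(src_root, dst_root, rel_path):
    """Copy one file, creating the destination directory if it is missing."""
    dst = os.path.join(dst_root, rel_path)
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.copy2(os.path.join(src_root, rel_path), dst)  # data plus timestamps/mode
    return rel_path


def parallel_copy(src_root, dst_root, workers=8):
    """Copy every file under src_root to dst_root, up to `workers` at a time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        copied = pool.map(lambda rel: copy_one(src_root, dst_root, rel),
                          list_files(src_root))
        # Consuming the iterator re-raises any exception from a worker thread.
        return sorted(copied)
```

Threads are enough here because the work is I/O-bound. Whether this actually keeps every spindle busy depends on the order in which the files are submitted, which is what Question 2 is about.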
On Sun, Feb 5, 2012 at 4:41 AM, Brian Candler <B.Candler at pobox.com> wrote:

> I reckon that to quickly copy one glusterfs volume to another, I will need a
> multi-threaded 'cp'. That is, something which will take the list of files
> from readdir() and copy batches of N of them in parallel. This is so I can
> keep all the component spindles busy.
>
> Question 1: does such a thing exist already in the open source world?

Not aware of one. Please post to this thread if you find one.

> Question 2: for a DHT volume, does readdir() return the files in a
> round-robin fashion, i.e. one from brick 1, one from brick 2, one from brick
> 3 etc? Or does it return all the results from one brick, followed by all the
> results from the second brick, and so on? Or something indeterminate?

It returns all entries from the first brick, then only non-directories from the second, and so on (sequentially).

> Alternatively: is it possible to determine for each file which brick it
> resides on?

Yes, there is the virtual extended attribute "trusted.glusterfs.pathinfo" which gives you the location (hostname) of a file.

> (I don't think it's in an extended attribute; I tried 'getfattr -d' on a
> file, both on the GlusterFS mount and on the underlying brick, and couldn't
> see anything)
>
> Thanks,
>
> Brian.
>
> P.S. I did look in the source, and I couldn't figure out how dht_do_readdir
> works. But it does have a slightly disconcerting comment:
>
> /* TODO: do proper readdir */

That comment is only for corner cases when the backend filesystem is inconsistent. It is not relevant to the algorithm you were enquiring about.

Avati
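Since readdir() returns entries brick by brick, a parallel copier that consumes the list in order would mostly hit one brick at a time. The sketch below, in Python, shows one way to reorder a file list round-robin across servers using "trusted.glusterfs.pathinfo". Several things here are assumptions and not confirmed in this thread: the regular expression guesses that the value contains a segment shaped like `<POSIX(...):host:/path>` or `<POSIX:host:/path>`, the helper names are invented, and `os.getxattr` is Linux-only. A virtual attribute like this is probably not listed by 'getfattr -d', so it has to be requested by name, e.g. `getfattr -n trusted.glusterfs.pathinfo <file>` on the GlusterFS mount.

```python
import os
import re
from collections import defaultdict
from itertools import zip_longest

PATHINFO_XATTR = "trusted.glusterfs.pathinfo"

# ASSUMPTION: the value embeds "<POSIX(...):host:/path>" or "<POSIX:host:/path>".
# Check the real output on your GlusterFS version before relying on this.
_POSIX_RE = re.compile(r"<POSIX(?:\([^)]*\))?:([^:>]+):([^>]+)>")


def read_pathinfo(path):
    """Fetch the virtual xattr by name from a file on the GlusterFS mount."""
    return os.getxattr(path, PATHINFO_XATTR).decode("utf-8", "replace")


def host_of(pathinfo):
    """Return the first hostname found in a pathinfo string, or None."""
    match = _POSIX_RE.search(pathinfo)
    return match.group(1) if match else None


def interleave_by_host(paths, lookup=read_pathinfo):
    """Reorder paths so consecutive entries come from different hosts."""
    groups = defaultdict(list)
    for path in paths:
        groups[host_of(lookup(path))].append(path)
    ordered = []
    # Take one path from each host in turn until every group is exhausted.
    for batch in zip_longest(*groups.values()):
        ordered.extend(p for p in batch if p is not None)
    return ordered
```

Grouping by hostname only separates servers; if one server exports several bricks, the brick path inside the attribute would need to be part of the grouping key as well. The `lookup` parameter exists so the ordering logic can be tried without a GlusterFS mount.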
Don't you run your bricks in replication mode? Then you do not have to copy anything by hand or in batches. Example:

    gluster volume create yourvol replica 2 transport tcp xxx.xxx.xxx.xxx:/glusterfs/export yyy.yyy.yyy.yyy:/glusterfs/export

-----------------------------------------------
EDV
Daniel Müller
Head of IT
Tropenklinik Paul-Lechler-Krankenhaus
Paul-Lechler-Str. 24
72076 Tübingen
Tel.: 07071/206-463, Fax: 07071/206-499
eMail: mueller at tropenklinik.de
Internet: www.tropenklinik.de
-----------------------------------------------

-----Original Message-----
From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Brian Candler
Sent: Sunday, 5 February 2012 00:12
To: gluster-users at gluster.org
Subject: [Gluster-users] Parallel cp?