Our clusters have more than one file system, and we occasionally need to copy large amounts (terabytes) of data from one file system to another. A parallel copy utility (one that runs in parallel on multiple nodes of the cluster) could be helpful, and I'm wondering if you folks have any recommendations.

Milt Clauser
Sandia Labs
Try GridFTP from the Globus project ... they have a client called UberFTP that supports the protocol.

Cheers
john

http://dims.ncsa.uiuc.edu/set/uberftp/
John, Thanks for the pointer, but apparently my question wasn't clear: I'm looking for a parallel cp utility rather than a parallel ftp. Rather than transferring data between file systems on separate clusters, I want to copy large sets of data from one file system to another file system in the same cluster, or make a second copy of the data on the same file system. Both file systems are mounted on all nodes of the cluster, so all the nodes could be used in parallel to copy the data.

Thanks,
Milt
Hi Milt,

How many files and what total volume of data would you like to copy, and roughly how often? How are the files that you wish to copy selected? We are talking with someone else about this kind of problem for backup purposes.

- Peter -
Hi Peter,

Most immediately, I'm looking for an easy way for users to copy their large datasets. I suppose this could occur as often as a few times a week. Typically a dataset consists of many files, rather than a single large file. To the extent that there's a typical dataset, I would say, very roughly, 1000 files and 1 TB total volume of data. But I've seen much larger datasets. (However, I don't think I want to make it easy for a user to make multiple copies of a 100-TB dataset!) And of course, datasets can be much smaller, but we don't need a parallel copy for small ones.

Using wildcards (e.g. xyz*) to select files would be desirable, though copying all the files in a directory and subdirectories would meet many immediate needs.

Milt
On Thu, Sep 14, 2006 at 12:37:05PM -0600, Clauser, Milton wrote:
> A parallel copy utility (that runs in parallel on multiple nodes of the
> cluster) could be helpful, and I'm wondering if you folks have any
> recommendations.

It's not too hard to write one in MPI-IO. Even a naive approach should perform better than standard cp(1):

    MPI_File_open()        /* src  */
    MPI_File_open()        /* dest */

    until done
        MPI_File_read_at()     /* src  */
        MPI_File_write_at()    /* dest */

    MPI_File_close()       /* src  */
    MPI_File_close()       /* dest */

There are some other MPI tricks you can play, but this simple approach will probably get you acceptable performance.

==rob
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Labs, IL USA
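Purely as an illustration of the naive approach sketched above (this is not code from the original post), a minimal MPI-IO copier in C might look like the following; the 4 MB block size, the block-cyclic layout, and the command-line interface are assumptions made for the example:

    /* Hypothetical sketch of the naive MPI-IO copy loop described above.
     * Each rank copies a disjoint block-cyclic slice of one file.
     * Build (assumed): mpicc -o pcp pcp.c ; run: mpirun -np N ./pcp src dst */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define BLK (4 * 1024 * 1024)   /* 4 MB per read/write, an arbitrary choice */

    int main(int argc, char **argv)
    {
        int rank, nprocs;
        MPI_File src, dst;
        MPI_Offset fsize, off;
        char *buf;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        if (argc != 3) {
            if (rank == 0) fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
            MPI_Abort(MPI_COMM_WORLD, 1);
        }

        MPI_File_open(MPI_COMM_WORLD, argv[1], MPI_MODE_RDONLY,
                      MPI_INFO_NULL, &src);
        MPI_File_open(MPI_COMM_WORLD, argv[2],
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &dst);

        MPI_File_get_size(src, &fsize);
        MPI_File_set_size(dst, fsize);      /* truncate/extend dest to match */

        buf = malloc(BLK);
        if (!buf) MPI_Abort(MPI_COMM_WORLD, 1);

        /* block-cyclic distribution: rank r handles blocks r, r+nprocs, ... */
        for (off = (MPI_Offset)rank * BLK; off < fsize;
             off += (MPI_Offset)nprocs * BLK) {
            int len = (fsize - off < BLK) ? (int)(fsize - off) : BLK;
            MPI_File_read_at(src, off, buf, len, MPI_BYTE, MPI_STATUS_IGNORE);
            MPI_File_write_at(dst, off, buf, len, MPI_BYTE, MPI_STATUS_IGNORE);
        }

        free(buf);
        MPI_File_close(&src);
        MPI_File_close(&dst);
        MPI_Finalize();
        return 0;
    }

This splits the bytes of a single large file across ranks; for datasets made of many files, distributing whole files across processes (as discussed below) is probably the better fit.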
Do you see any tricks in the MPI-controlled job for distributing the list of files across the processes? If the range of file sizes is constrained, and the data copying can be trusted to not suffer anomalous failures, simply running -np 10, and having proc 0 cp *0, proc 1 cp *1, etc., would work. Probably 95% of our data-set files have this name convention--the last character will be [0-9]. However, our individual file sizes can normally range from 1 MB to 1 GB.

I am almost done with C-coding for a special binary, and a driver script that will generate a shareable list of the files to be copied; then, with locking, next-available distribution of the list entries over N cp-type processes, all the files get copied. This also offers some added reliability, since, if a copy does not succeed, because a node crashes, or loses its FS mount, or for whatever reason, the item remains on the list, unlocked, so another process can take it.

I already have a utility like this working for the case where the two FS are not both mounted to the cluster, so that, instead, a network-data-copy operation must be employed. To accomplish this, I am using my own MPSCP:

http://www.sandia.gov/MPSCP/mpscp_design.htm

With it, I can cheat and add my own table look-up, lock, and remove-or-unlock routine, for each process to get its next target file.

To control the cluster job launch of this many, arbitrarily-sized, load-balanced, parallel-distribution, file-copy utility, I use ssh. This is mainly because, in our case, the high-performance external network nodes, while on the backplane interconnect, are not part of the computing partition, and the nodes of the computing partition do not have external IP interfaces. Is this convoluted enough?

The last trick is to verify, at least with respect to name and ls-reported size, that all the files got copied. So far, after a user went for days running a single series of cp's--a total of 6 TB, in 54 thousand files--I found seven that either never made it, or were short on the byte-count.

Marty Barnaby
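Marty's binary and driver script are not shown in the thread, so purely for illustration, here is a rough C sketch of the "locking, next-available" idea he describes: each worker claims the next entry of a shared list under an advisory lock and shells out to cp. The list and cursor file names, their formats, and the retry handling are invented for the example, and the cursor file is assumed to exist (even if empty) before the workers start:

    /* Illustrative sketch only: next-available distribution of a shared
     * file list using an advisory lock, in the spirit of the scheme
     * described above.  File names and formats are made up. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/file.h>   /* flock() */

    /* Claim the next unread line of 'listpath', tracked by a byte offset
     * stored in 'curpath'.  Returns 1 and fills 'path' on success, 0 when
     * the list is exhausted. */
    static int claim_next(const char *listpath, const char *curpath,
                          char *path, size_t pathlen)
    {
        FILE *cur  = fopen(curpath, "r+");   /* assumed to already exist */
        FILE *list = fopen(listpath, "r");
        long off = 0;
        int got = 0;

        if (!cur || !list) exit(1);
        flock(fileno(cur), LOCK_EX);         /* serialize claimers */

        if (fscanf(cur, "%ld", &off) != 1) off = 0;
        fseek(list, off, SEEK_SET);
        if (fgets(path, (int)pathlen, list)) {
            path[strcspn(path, "\n")] = '\0';
            rewind(cur);
            fprintf(cur, "%ld\n", ftell(list));  /* advance shared cursor */
            fflush(cur);
            got = 1;
        }

        flock(fileno(cur), LOCK_UN);
        fclose(list);
        fclose(cur);
        return got;
    }

    int main(void)
    {
        char src[4096], cmd[8192];

        /* Each worker (launched on its own node, e.g. via ssh) loops until
         * the shared list is empty, shelling out to cp for each entry. */
        while (claim_next("copylist.txt", "copylist.cursor", src, sizeof(src))) {
            snprintf(cmd, sizeof(cmd), "cp -p -- '%s' /scratch2/dest/", src);
            if (system(cmd) != 0)
                fprintf(stderr, "copy failed, leave for retry: %s\n", src);
        }
        return 0;
    }

Because workers pull work as they finish, slow nodes or very large files balance themselves out, which is the main attraction over a static split by file name.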
Hi,

We are looking for something really similar, but perhaps with a few small differences. For backups, for example to other file systems, we would have special (scalable) software prepare a list of files and directories which we would like to split up and distribute to multiple Lustre clients which execute the copies in parallel. Taking into account file size is attractive. I don't think we foresee a need to synchronize a single file in parallel for most of our commercial customers. But some of the entries in the list would be directory files which we would like to synchronize with a different program like rsync to achieve, for example, recursive deletions or directory creations etc. Could that easily be done with your software?

May I ask what makes your solution more attractive than running something like a pdsh command? Is it the ability for parallel copies on a single file?

Thanks for this interesting email!

- Peter -
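As an aside on the size-aware splitting mentioned above (again, not code from the original thread), a prepared list could be partitioned across clients by always assigning the next file to the least-loaded client. A hypothetical sketch, where the input format, the client count, and the output naming are all invented:

    /* Illustrative sketch only: split a "size path" list across N clients,
     * taking file size into account with a greedy least-loaded assignment.
     * Input lines could come from, e.g., find ... -printf "%s %p\n". */
    #include <stdio.h>

    #define NCLIENTS 8   /* arbitrary number of Lustre clients */

    int main(void)
    {
        unsigned long long load[NCLIENTS] = {0};
        unsigned long long size;
        char path[4096];

        while (scanf("%llu %4095[^\n]%*c", &size, path) == 2) {
            int best = 0;
            for (int i = 1; i < NCLIENTS; i++)
                if (load[i] < load[best]) best = i;
            load[best] += size;
            printf("client%02d %s\n", best, path);   /* one sub-list per client */
        }

        for (int i = 0; i < NCLIENTS; i++)
            fprintf(stderr, "client%02d: %llu bytes\n", i, load[i]);
        return 0;
    }

The resulting per-client sub-lists could then be handed to whatever actually performs the copies (cp, rsync, or a pull-based worker as above).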
Am I the only one that is having the following error trying to install the latest redhat 2.4 lustre rpm?

    # rpm -ivh lustre-1.4.7-2.4.21_40.EL_lustre.1.4.7smp.i686.rpm
    error: Failed dependencies:
            libblkid.so.1 is needed by lustre-1.4.7-2.4.21_40.EL_lustre.1.4.7smp
    #

Thanks!
Stephen Simms
On Sep 21, 2006 11:30 -0500, Stephen Simms wrote:
> error: Failed dependencies:
>         libblkid.so.1 is needed by lustre-1.4.7-2.4.21_40.EL_lustre.1.4.7smp

You need to install the CFS e2fsprogs, or at least any e2fsprogs > 1.36.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.