(adding lustre-devel, dropping Bojanic from distro list; if anyone else wants off, let me know.)

Hua Huang and Andreas wrote:
> Nathan,
>
> Thanks for the write-up. A few questions and comments.
>
> SAM-QFS only runs on Solaris, so it is always remotely mounted on a Lustre client via a network connection, right?

QFS has a Linux native client (http://www.sun.com/download/products.xml?id=4429c1d1), so the copy nodes would be Linux nodes acting as clients for both Lustre and QFS. This would generally result in two network hops for the data, but by placing the clients on OST nodes and having the coordinator choose wisely, we can probably save one of the network hops most of the time. This may or may not be a good idea, depending on the load imposed on the OST. The copytool would also require us to pump the data from kernel to userspace and back, potentially resulting in significant bus loading. We could memory map the Lustre side.

> Nathaniel Rutman wrote:
>> Hi all -
>> So we all have a common starting point, I'm going to jump right in and describe the current plan for integrating Lustre's HSM feature (in development) with SAM-QFS and ADM.
>>
>> HSM for Lustre can be broken into two major components, both of which will live in userspace: the policy engine, which decides when files are archived (copied to (logical) tape), punched (removed from OSTs), or deleted; and the copytool, which moves file data to and from tape. A third component that we call the coordinator lives in kernel space and is responsible for relaying HSM requests to various client nodes.
>
> s/tape/the archive/

Yes, I knew my "(logical) tape" statement needed to be clarified :)

>> The policy engine collects filesystem info, maintains a database of files it is interested in, and makes archive and punch decisions that are then communicated back to Lustre. Note that the database is only used to make policy decisions, and is specifically _not_ a database of file/storage location information. Periodically, the policy engine gives a list of file identifiers and operations (via the coordinator) to any number of Lustre clients running copytools.
>
> This work will be done by CEA as part of the HPSS HSM solution.
> This work is generic in the sense that it could be SAM-QFS or any other tape backend on the remote side for archival, right?

Yes. The issue here is that the policy engine is a big part of the "brains" of the HSM, and could be a key differentiator for customers. That's why the ADM integration would likely replace the HPSS policy engine with ADM's Event Manager -- presumably we'll be able to get enhanced features by doing this. The actual benefits need to be investigated.

> Is it expected that a given copytool would be given multiple files to archive at one time? This would allow optimizing the archiving operations to e.g. aggregate small files into a single archive object, but would make identifying and extracting these files from the aggregate harder.

I do expect the coordinator to hand a list of files to each copytool. But SAM-QFS would actually handle small-file aggregation "underneath" the copytool itself; we don't have to worry about identification/extraction.

>> The copytool will take the list of files and perform the requested operation: archive, delete, or restore. (It is potentially possible to have finer-grained archive commands passed from the policy engine, e.g. archive_level_3.)
>> It will then copy the files off to tape/storage using whatever hardware/software-specific commands are necessary. Note that the file identifiers are opaque 16-byte strings. Files are requested using the same identifiers; "paths may change, but the fids remain the same" is the basic philosophy. The copytool may hash the fids into dirs/subdirs to relieve problems with a flat namespace, but this is invisible to Lustre. Having said that, additional information such as the full path name, EAs, etc. may be added by the copytool (using a tar wrapper, for example) for disaster recovery or striping recovery.
>>
>> The initial version of the copytool and policy engine will be written targeting HPSS, but it is likely that the SAM-QFS integration will use the same pieces. Perhaps calling it the "Lustre policy engine" would be more appropriate.
>
> So the initial version will be done by CEA as part of the HPSS.

Part of the "HPSS-compatible Lustre HSM solution", which is our initial target, yes.

> You mentioned other details above, which can be SAM-QFS specific?
> I am trying to figure out if the full version of the copytool used in Lustre/SAM-QFS integration will be implemented specifically for SAM-QFS from the Lustre side.

There are two items that I can think of that may be archive-specific:
1. hash the fids into dirs/subdirs to avoid a big flat namespace
2. inclusion of file extended attributes (EAs)
But in fact, I don't know enough about HPSS to say we don't need these items anyhow. CEA, can you comment? I think current versions of HPSS are able to store EAs automatically, and QFS is not, so that may be one difference.

>> Integration with SAM-QFS
>> The SAM policy engine is tightly tied to the QFS filesystem, and for this reason it is not possible to replace the HPSS policy engine with SAM. However, SAM policies could be layered in at the copytool level. The split as we envision it is this: the existing Lustre policy engine decides which files should be archived and punched and when, and SAM-QFS decides how and where to archive them. The copytool in this case
>
> SAM-QFS already does all these, i.e., "how and where".

Yes. SAM policies would likely have to be written without reference to specific filenames/directories, since that info will not be readily available. If this proves to be performance-limiting (maybe certain file extensions (.mpg) should be stored in a different manner than another (.txt)), then we can probably find a way to pass the full pathname through to SAM, but this would require SAM code changes.

>> is simply the unix "cp" command (or perhaps tar as mentioned above), which copies the file from the Lustre mount point to the QFS mount point on one (of many) clients that has both filesystems mounted. SAM-QFS's file staging and small-file aggregation (as well as parallel operation) would all be used "out of the box" to provide the best performance possible.
>
> The one thing that should be taken into account is that the files being moved from Lustre to SAM are losing the "age" information. This might cause SAM some heartburn, because all of the files being added will be considered "new" but there will be a large enough influx of files that it will need to archive and purge files within hours.
> It may be that the SAM copytool will need to be modified to allow it to pass on some "age" information (if that is something other than atime and mtime) so the SAM policy engine can treat these files sensibly. Alternately, it may be that the SAM copytool will need to be smart enough to mark the new files as "archive & purge immediately" in some manner.

We will just use cp -a to preserve timestamps, ownership, perms, etc.; I don't see what any additional age info could be. As to the heartburn problem, QFS has the disk cache as the first level of archive; as that fills, files are moved off to secondary storage automatically. We can adjust these watermarks to aggressively move files off to tape. If something backs up, the cp command will simply block. It would be nice to have some visibility when this situation occurs, but in fact it's not at all clear what we should do besides change our archiving policy. This is a general issue, not QFS specific.

> Again, SAM-QFS already does all of these. Correct?
> So no code changes are expected at the SAM-QFS side, right?

Correct. As I see it today, no SAM-QFS code changes are necessary, and the QFS copytool will likely be identical or almost identical to the HPSS copytool.

> For Lustre/SAM-QFS integration, could you point out specifically which area (in this write-up) can be done by U.Minn students?

I don't actually see any work to be done at this point. There's the pathname pass-through potential, but I'm not convinced it's at all necessary.

>> Integration with ADM
>> ADM's Event Manager would replace the HPSS policy engine. It would need some minor modifications to be integrated with the Lustre changelogs (instead of DMAPI) and the ioctl interface to the coordinator. It also produces a similar list of files and actions. The ADM core would be the copytool, consuming the list and sending files to tape. We would also need a bit of work to pass communications between ADM's Archive Information Manager and the policy engine and copytools. ADM integration is dependent upon having a Linux ADM implementation, or a Solaris Lustre implementation (potentially Lustre client only).
>>
>> Feel free to question, correct, criticize.
>> Nathan
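As a concrete illustration of the fid-hashing idea mentioned above (a copytool spreading archive objects over a directory tree instead of one flat directory), here is a minimal sketch in C. The archive root, the two-level fan-out, and the FID string format shown are assumptions for illustration only, not part of the design discussed in this thread.

    /* Sketch: map an opaque Lustre FID string onto a two-level directory
     * tree so the archive filesystem never sees one huge flat directory.
     * The archive root, fan-out, and FID format are illustrative
     * assumptions only. */
    #include <errno.h>
    #include <limits.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <sys/types.h>

    #define ARCHIVE_ROOT "/archive/lustre_hsm"  /* hypothetical QFS mount */
    #define FANOUT       256                    /* 256 x 256 subdirectories */

    /* Simple string hash (djb2); any stable hash works here. */
    static unsigned long hash_str(const char *s)
    {
        unsigned long h = 5381;
        while (*s)
            h = h * 33 + (unsigned char)*s++;
        return h;
    }

    /* Build "<root>/<hh>/<ll>/<fid>" from a FID string such as
     * "0x200000400:0x1a3:0x0" (illustrative format only). */
    static int fid_to_archive_path(const char *fid, char *out, size_t outlen)
    {
        unsigned long h = hash_str(fid);
        unsigned int d1 = h % FANOUT;
        unsigned int d2 = (h / FANOUT) % FANOUT;
        char dir[PATH_MAX];

        snprintf(dir, sizeof(dir), "%s/%02x", ARCHIVE_ROOT, d1);
        if (mkdir(dir, 0750) && errno != EEXIST)
            return -errno;
        snprintf(dir, sizeof(dir), "%s/%02x/%02x", ARCHIVE_ROOT, d1, d2);
        if (mkdir(dir, 0750) && errno != EEXIST)
            return -errno;
        snprintf(out, outlen, "%s/%s", dir, fid);
        return 0;
    }

    int main(int argc, char **argv)
    {
        char path[PATH_MAX];

        if (argc != 2) {
            fprintf(stderr, "usage: %s <fid>\n", argv[0]);
            return 1;
        }
        if (fid_to_archive_path(argv[1], path, sizeof(path)) == 0)
            printf("%s\n", path);
        return 0;
    }

With a fan-out of 256x256, even hundreds of millions of archive objects stay well below the per-directory counts that cause trouble on most filesystems, while the mapping remains purely a function of the FID.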
On Jan 22, 2009 12:46 -0800, Nathaniel Rutman wrote:
> QFS has a Linux native client, so the copy nodes would be Linux nodes acting as clients for both Lustre and QFS. This would generally result in two network hops for the data, but by placing the clients on OST nodes and having the coordinator choose wisely, we can probably save one of the network hops most of the time. This may or may not be a good idea, depending on the load imposed on the OST. The copytool would also require us to pump the data from kernel to userspace and back, potentially resulting in significant bus loading. We could memory map the Lustre side.

I was just wondering to myself if we couldn't make an optimized "cp" command that would do the work in the kernel and be able to use newer APIs like "splice", or just a read-write loop that avoids kernel-user-kernel data copies. Unfortunately, I don't think mmap IO is very fast with Lustre, or memcpy() from mmap Lustre to mmap QFS would give us a single memcpy() operation (which is the best I think we can do).

> There are two items that I can think of that may be archive-specific:
> 1. hash the fids into dirs/subdirs to avoid a big flat namespace
> 2. inclusion of file extended attributes (EAs)
> But in fact, I don't know enough about HPSS to say we don't need these items anyhow. CEA, can you comment?
> I think current versions of HPSS are able to store EAs automatically, and QFS is not, so that may be one difference.

I got a paper from CEA that indicated HPSS was going to implement (or may have already implemented) EA support, but it isn't at all clear if that version of the software would be available at all sites, since AFAIK it is relatively new.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
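Andreas's "optimized cp" idea could look roughly like the sketch below: on Linux, splice(2) moves data between a file descriptor and a pipe without staging it in a user-space buffer, so a copytool can chain two splice calls through a pipe. This is only a sketch under the assumption that the filesystems involved handle splice sensibly, which, as noted above, may not hold for Lustre or QFS; measuring it against a plain read/write loop would be the first step.

    /* Sketch: copy src to dst via a pipe using splice(2), avoiding an
     * explicit user-space data buffer.  Whether this is actually faster
     * than a plain read/write loop on Lustre/QFS is an open question. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    static int splice_copy(const char *src, const char *dst)
    {
        int in = open(src, O_RDONLY);
        int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        int pfd[2];
        ssize_t n;

        if (in < 0 || out < 0 || pipe(pfd) < 0) {
            perror("setup");
            return -1;
        }
        for (;;) {
            /* source file -> pipe */
            n = splice(in, NULL, pfd[1], NULL, 1 << 20, SPLICE_F_MOVE);
            if (n <= 0)
                break;                      /* 0 = EOF, <0 = error */
            /* pipe -> destination file; drain whatever entered the pipe */
            while (n > 0) {
                ssize_t m = splice(pfd[0], NULL, out, NULL, n, SPLICE_F_MOVE);
                if (m <= 0) {
                    perror("splice out");
                    n = -1;
                    break;
                }
                n -= m;
            }
            if (n < 0)
                break;
        }
        close(pfd[0]); close(pfd[1]); close(in); close(out);
        return n < 0 ? -1 : 0;
    }

    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
            return 1;
        }
        return splice_copy(argv[1], argv[2]) ? 1 : 0;
    }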
Nathan,

On Jan 22, 2009, at 2:46 PM, Nathaniel Rutman wrote:
>>> Integration with SAM-QFS
>>> The SAM policy engine is tightly tied to the QFS filesystem, and for this reason it is not possible to replace the HPSS policy engine with SAM. However, SAM policies could be layered in at the copytool level. The split as we envision it is this: the existing Lustre policy engine decides which files should be archived and punched and when, and SAM-QFS decides how and where to archive them. The copytool in this case
>>
>> SAM-QFS already does all these, i.e., "how and where".
>
> Yes. SAM policies would likely have to be written without reference to specific filenames/directories, since that info will not be readily available. If this proves to be performance-limiting (maybe certain file extensions (.mpg) should be stored in a different manner than another (.txt)), then we can probably find a way to pass the full pathname through to SAM, but this would require SAM code changes.

SAM supports classification policy rules for files: (1) the number of copies, up to 4; (2) where to put the copies (on which VSN pools - disk and/or tape, local and/or remote); and (3) when to make the copies (time-based archiving). You specify the policy in the archiver.cmd file. You can group files for a policy rule by pathname, owner, group, size, wildcard, and access time.

>>> is simply the unix "cp" command (or perhaps tar as mentioned above), which copies the file from the Lustre mount point to the QFS mount point on one (of many) clients that has both filesystems mounted. SAM-QFS's file staging and small-file aggregation (as well as parallel operation) would all be used "out of the box" to provide the best performance possible.
>>
>> The one thing that should be taken into account is that the files being moved from Lustre to SAM are losing the "age" information. This might cause SAM some heartburn, because all of the files being added will be considered "new" but there will be a large enough influx of files that it will need to archive and purge files within hours.
>>
>> It may be that the SAM copytool will need to be modified to allow it to pass on some "age" information (if that is something other than atime and mtime) so the SAM policy engine can treat these files sensibly. Alternately, it may be that the SAM copytool will need to be smart enough to mark the new files as "archive & purge immediately" in some manner.

There is an option to release files from the disk cache after all archive copies have been made. You may want to set this in the archiver.cmd file. The releasing is done automatically. It depends on how you are going to use SAM: if it is just for backup, then, yes, set this. However, in your mail above you are also managing your disk cache; in that case it will be faster to retrieve files that are in the disk cache.

This brings up the question of restore. In case of a Lustre disk failure, how are you going to restore your Lustre file system?

> We will just use cp -a to preserve timestamps, ownership, perms, etc.; I don't see what any additional age info could be. As to the heartburn problem, QFS has the disk cache as the first level of archive; as that fills, files are moved off to secondary storage automatically. We can adjust these watermarks to aggressively move files off to tape. If something backs up, the cp command will simply block.
> It would be nice to have some visibility when this situation occurs, but in fact it's not at all clear what we should do besides change our archiving policy. This is a general issue, not QFS specific.

You will want to set your disk cache thresholds based on the rate of influx of data into the disk cache. We default to a high watermark of 80% and a low of 70%, which means that when the disk cache reaches 80%, we release the oldest archived files until the disk cache reaches 70%. Some of our oil customers set the thresholds to 60%/50% because of the heavy influx. Of course, if SAM does reach 100%, we block the writers until we have space, so this is transparent to the application.

>> Again, SAM-QFS already does all of these. Correct?
>> So no code changes are expected at the SAM-QFS side, right?
>
> Correct. As I see it today, no SAM-QFS code changes are necessary, and the QFS copytool will likely be identical or almost identical to the HPSS copytool.

Agree. I don't see any SAM-QFS code changes required. The Lustre copytool will write to HPSS using the HPSS APIs and write to SAM-QFS with an ftp or pftp interface. This requires minimal changes.

>> For Lustre/SAM-QFS integration, could you point out specifically which area (in this write-up) can be done by U.Minn students?
>
> I don't actually see any work to be done at this point. There's the pathname pass-through potential, but I'm not convinced it's at all necessary.

I do see work to switch the HPSS APIs to ftp or pftp. If this is already supported by HPSS, then, yes, no changes are required.

- Harriet

Harriet G. Coverston
Solaris, Storage Software               | Email: harriet.coverston at sun.com
Sun Microsystems, Inc.                  | AT&T: 651-554-1515
1270 Eagan Industrial Rd., Suite 160    | Fax: 651-554-1540
Eagan, MN 55121-1231
Looks like HPSS will support EAs in 7.1.2.0, June 2009. I have asked Vicky here at ORNL to dig a bit into what the EA features will look like. Do we have a set of requirements for EAs for HSM integration?

- Galen

-----Original Message-----
From: Andreas.Dilger at sun.com on behalf of Andreas Dilger
Sent: Thu 1/22/2009 5:55 PM
To: Nathaniel Rutman
Cc: Hua Huang; lustre-hsm-core-ext at sun.com; lustre-devel at lists.lustre.org; Karen Jourdenais; Erica Dorenkamp; Harriet.Coverston at sun.com; Rick Matthews; karl at tacc.utexas.edu
Subject: Re: SAM-QFS, ADM, and Lustre HSM

[...]
Nathaniel and all,

Thanks for putting this together. Having a mover to put data into QFS is a great idea, and it can easily use the QFS Linux client. I don't think you would necessarily get QFS policy for native Lustre files unless the "moved" files retained the Lustre attributes from which you want policy decisions made. There may be ways to do this. You would automatically gain the file gathering of QFS and its efficient tape handling. I also think there is an "archive then release (purge)" policy that can be established.

The applicable Lustre namespace would be essentially duplicated in the QFS space, and (I think) QFS classification and policy occur on that namespace. Doing so gives you access to rich QFS policy. This also allows QFS to migrate data to/from archive media without I/O or compute load on any Linux clients.

--
---------------------------------------------------------------------
Rick Matthews                      email: Rick.Matthews at sun.com
Sun Microsystems, Inc.             phone: +1 (651) 554-1518
1270 Eagan Industrial Road         phone (internal): 54418
Suite 160                          fax: +1 (651) 554-1540
Eagan, MN 55121-1231 USA           main: +1 (651) 554-1500
---------------------------------------------------------------------
On Jan 23, 2009 13:02 -0600, Rick Matthews wrote:
> Having a mover to put data into QFS is a great idea, and it can easily use the QFS Linux client. I don't think you would necessarily get QFS policy for native Lustre files unless the "moved" files retained the Lustre attributes from which you want policy decisions made.

There will not necessarily be HSM policy data stored with every file from Lustre, though there is a desire to store Lustre layout data in the archive. Is it possible to store extended attributes with each file in QFS?

> The applicable Lustre namespace would be essentially duplicated in the QFS space, and (I think) QFS classification and policy occur on that namespace. Doing so gives you access to rich QFS policy. This also allows QFS to migrate data to/from archive media without I/O or compute load on any Linux clients.

The current Lustre HSM design will not export any of the filesystem namespace to the archive, so that we don't have to track renames in the archive. The archive objects will only be identified by a Lustre FID (128-bit file identifier). IIRC, the HSM-specific copytool would be given the file name (though not necessarily the full pathname) in order to perform the copyout, but the filesystem will be retrieving the file from the archive by FID. Nathan, can you confirm that is right?

Does QFS have name-based policies? Are these policies only on the filename, or on the whole pathname?

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
On Jan 23, 2009 10:46 -0600, Harriet G. Coverston wrote:
> SAM supports classification policy rules for files: (1) the number of copies, up to 4; (2) where to put the copies (on which VSN pools - disk and/or tape, local and/or remote); and (3) when to make the copies (time-based archiving). You specify the policy in the archiver.cmd file. You can group files for a policy rule by pathname, owner, group, size, wildcard, and access time.
>
> This brings up the question of restore. In case of a Lustre disk failure, how are you going to restore your Lustre file system?

The initial HSM implementation is focussed mainly on the space management issues, rather than backup/restore, though of course there is a lot of overlap between the two and we have discussed backup aspects in the past.

There are two main issues that would need to be addressed:

- a Lustre-level policy on the minimum file size that should be sent to the archive. For Lustre, there would be minimal space savings if a small file is moved to the archive, so that would only be useful in the archive-as-backup case.

  We would need to decide whether the HPSS implementation can/should handle aggregating multiple small files into a single archive object. I think that is useful, and this is one reason I advocate being able to pass multiple files at once from the coordinator to the agent.

- since the archive does not contain a copy of the namespace (it only has 128-bit FIDs as identifiers for the files), we would need to make a separate backup of the MDS filesystem (which is all namespace). There are already several mechanisms to do this, either using the ext2 "dump" program to read from the raw device, or making an LVM snapshot and using e.g. tar to make a filesystem-level backup. Both of these need to include a backup of the extended attributes.

> Agree. I don't see any SAM-QFS code changes required. The Lustre copytool will write to HPSS using the HPSS APIs and write to SAM-QFS with an ftp or pftp interface. This requires minimal changes.

We weren't thinking of using an FTP interface to SAM, though I guess this is possible. Rather, we were thinking of just mounting both QFS and Lustre on a Linux client and using "cp" or an equivalent tool. Depending on the performance requirements, it might make sense to use a smarter tool that avoids the kernel-user-kernel memory copies.

> I do see work to switch the HPSS APIs to ftp or pftp. If this is already supported by HPSS, then, yes, no changes are required.

I think CEA is planning on writing a copytool using the HPSS APIs directly. There is also "htar", which is a tar-like interface to HPSS, but I don't think that was anyone's intention to use.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
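The minimum-file-size policy mentioned above is easy to picture as a filter in front of the copytool list: a stat() check drops files below some threshold before they are ever handed out for archiving. The 1 MB value below is purely an assumed example, not a number from this discussion, and the line-per-path interface is a hypothetical one chosen for the sketch.

    /* Sketch: filter an archive candidate list by size.  Reads one path
     * per line on stdin and echoes only regular files at least
     * MIN_ARCHIVE_SIZE bytes long.  The threshold is an arbitrary
     * example value, not part of any agreed policy. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>

    #define MIN_ARCHIVE_SIZE (1024 * 1024)  /* 1 MB, illustrative only */

    int main(void)
    {
        char line[4096];
        struct stat st;

        while (fgets(line, sizeof(line), stdin)) {
            line[strcspn(line, "\n")] = '\0';       /* strip newline */
            if (stat(line, &st) == 0 && S_ISREG(st.st_mode) &&
                st.st_size >= MIN_ARCHIVE_SIZE)
                printf("%s\n", line);
        }
        return 0;
    }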
On Jan 23, 2009 12:39 -0500, Shipman, Galen M. wrote:
> Looks like HPSS will support EAs in 7.1.2.0, June 2009.
> I have asked Vicky here at ORNL to dig a bit into what the EA features will look like. Do we have a set of requirements for EAs for HSM integration?

As yet we don't have a hard requirement for EAs in HSM. We would ideally keep the LOV EA for the file layout in the HSM, so that the file gets (approximately) the same layout when it is restored. This is only really needed for files that were not allocated using the default layout, and we might consider saving e.g. "stripe over all OSTs" instead of "stripe over N OSTs", so that if the number of OSTs increases between when the file was archived and when it is restored, the new file gets the full performance.

In the absence of EAs in the HSM we could fall back to using a tar file format that supports EAs (as in RHEL5.x and star) to store the layout information. We are also considering keeping the layout information on the MDS, but that doesn't help in the "backup" use case where the file was deleted or the MDS is lost.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Andreas Dilger wrote:
> On Jan 23, 2009 10:46 -0600, Harriet G. Coverston wrote:
>> SAM supports classification policy rules for files: (1) the number of copies, up to 4; (2) where to put the copies (on which VSN pools - disk and/or tape, local and/or remote); and (3) when to make the copies (time-based archiving). You specify the policy in the archiver.cmd file. You can group files for a policy rule by pathname, owner, group, size, wildcard, and access time.

My point about this is that files will be stored using the FID as the file name, so name-based policies at the copytool level are worthless. Unless we a.) add the path/filename back to the file (EA?), and b.) modify the SAM policy engine to use the "real" path/filename instead of the FID.

>> This brings up the question of restore. In case of a Lustre disk failure, how are you going to restore your Lustre file system?
>
> ...
>
> - since the archive does not contain a copy of the namespace (it only has 128-bit FIDs as identifiers for the files), we would need to make a separate backup of the MDS filesystem (which is all namespace). There are already several mechanisms to do this, either using the ext2 "dump" program to read from the raw device, or making an LVM snapshot and using e.g. tar to make a filesystem-level backup. Both of these need to include a backup of the extended attributes.

Or include the path/filename in each file, and the restore process uses this to repopulate the filesystem.

>> Agree. I don't see any SAM-QFS code changes required. The Lustre copytool will write to HPSS using the HPSS APIs and write to SAM-QFS with an ftp or pftp interface. This requires minimal changes.
>
> We weren't thinking of using an FTP interface to SAM, though I guess this is possible. Rather, we were thinking of just mounting both QFS and Lustre on a Linux client and using "cp" or an equivalent tool.

Harriet already knew this, she just forgot :)
Andreas Dilger wrote:
> On Jan 23, 2009 13:02 -0600, Rick Matthews wrote:
>> Having a mover to put data into QFS is a great idea, and it can easily use the QFS Linux client. I don't think you would necessarily get QFS policy for native Lustre files unless the "moved" files retained the Lustre attributes from which you want policy decisions made.
>
> There will not necessarily be HSM policy data stored with every file from Lustre, though there is a desire to store Lustre layout data in the archive. Is it possible to store extended attributes with each file in QFS?

We can always store EAs, either natively or as "poor-man's EAs" via mini-tarballs.

>> The applicable Lustre namespace would be essentially duplicated in the QFS space, and (I think) QFS classification and policy occur on that namespace. Doing so gives you access to rich QFS policy. This also allows QFS to migrate data to/from archive media without I/O or compute load on any Linux clients.
>
> The current Lustre HSM design will not export any of the filesystem namespace to the archive, so that we don't have to track renames in the archive. The archive objects will only be identified by a Lustre FID (128-bit file identifier). IIRC, the HSM-specific copytool would be given the file name (though not necessarily the full pathname) in order to perform the copyout, but the filesystem will be retrieving the file from the archive by FID. Nathan, can you confirm that is right?

There is a mechanism to get the current full pathname for a given fid from userspace, so an HSM-specific copytool could find it out, but a central tenet of the design here is that, as far as the HSM is concerned, the entire Lustre FS is a flat namespace of FIDs. You can get a full pathname if you want to for catastrophe recovery, but Lustre itself will only speak to the HSM with FIDs. As I said in the other email, although SAM-QFS can do name-based policies, the "name" as far as QFS is concerned is just the FID, so name-based policies at the copytool level are worthless. Unless we a.) add the path/filename back to the file (EA, or use a tarball wrapper), and b.) modify the SAM policy engine to use the "real" path/filename instead of the FID.

But in the bigger-picture sense, note that all this is simply an optimization to allow SAM-QFS filename-based policies, which ultimately only influences where SAM-QFS stores files, not whether or when the files are archived by Lustre. These "top-level" policy decisions are made by the Lustre policy manager, and so perhaps there is no real need to spend any effort getting b.) above working. Note that a.) is still useful for disaster recovery.
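One minimal variant of the "poor-man's EA" idea above (simpler than a full tar wrapper) is for the copytool to read the striping xattr from the Lustre file and drop it next to the archived data object as a sidecar, so a restore or disaster-recovery pass can recreate a similar layout. The xattr name "lustre.lov" is the striping attribute as exposed on Lustre clients, but treat the exact name, the ".lov" sidecar convention, and the fixed buffer size as assumptions for this sketch rather than settled design.

    /* Sketch: store the Lustre striping xattr as a sidecar file beside
     * the archived copy.  Xattr name, sidecar naming, and buffer size
     * are illustrative assumptions only. */
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/xattr.h>

    #define LOV_XATTR "lustre.lov"   /* striping layout as seen on a client */

    static int save_layout_sidecar(const char *lustre_path,
                                   const char *archive_path)
    {
        char buf[4096];
        char sidecar[4096];
        ssize_t len;
        FILE *f;

        len = getxattr(lustre_path, LOV_XATTR, buf, sizeof(buf));
        if (len < 0) {
            perror("getxattr");      /* no layout EA: nothing to save */
            return -1;
        }

        snprintf(sidecar, sizeof(sidecar), "%s.lov", archive_path);
        f = fopen(sidecar, "wb");
        if (!f) {
            perror("fopen");
            return -1;
        }
        if (fwrite(buf, 1, (size_t)len, f) != (size_t)len) {
            fclose(f);
            return -1;
        }
        fclose(f);
        return 0;
    }

    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s <lustre_file> <archive_file>\n",
                    argv[0]);
            return 1;
        }
        return save_layout_sidecar(argv[1], argv[2]) ? 1 : 0;
    }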
Andreas,

On Jan 26, 2009, at 1:47 PM, Andreas Dilger wrote:
> On Jan 23, 2009 10:46 -0600, Harriet G. Coverston wrote:
>> [...]
>> This brings up the question of restore. In case of a Lustre disk failure, how are you going to restore your Lustre file system?
>
> The initial HSM implementation is focussed mainly on the space management issues, rather than backup/restore, though of course there is a lot of overlap between the two and we have discussed backup aspects in the past.

I do see below that you are dumping your metadata, which maps the 128-bit FIDs to the full pathnames. In this case, you would be able to restore your Lustre file system from the archive. If you don't have this restore feature, then you would just be using the archive as a disk extender and you would also need a conventional backup.

> There are two main issues that would need to be addressed:
>
> - a Lustre-level policy on the minimum file size that should be sent to the archive. For Lustre, there would be minimal space savings if a small file is moved to the archive, so that would only be useful in the archive-as-backup case.
>
>   We would need to decide whether the HPSS implementation can/should handle aggregating multiple small files into a single archive object.

Last I knew, they still don't build a container for small files. They write a tape mark between each file. This means they are starting/stopping the tape for small files. A lot of sites use SRB, which builds a tar container.

>   I think that is useful, and this is one reason I advocate being able to pass multiple files at once from the coordinator to the agent.

If you decide to build a container, then that will work for both HPSS and SAM.

> - since the archive does not contain a copy of the namespace (it only has 128-bit FIDs as identifiers for the files), we would need to make a separate backup of the MDS filesystem (which is all namespace). There are already several mechanisms to do this, either using the ext2 "dump" program to read from the raw device, or making an LVM snapshot and using e.g. tar to make a filesystem-level backup. Both of these need to include a backup of the extended attributes.
>
>> Agree. I don't see any SAM-QFS code changes required. The Lustre copytool will write to HPSS using the HPSS APIs and write to SAM-QFS with an ftp or pftp interface. This requires minimal changes.
>
> We weren't thinking of using an FTP interface to SAM, though I guess this is possible. Rather, we were thinking of just mounting both QFS and Lustre on a Linux client and using "cp" or an equivalent tool. Depending on the performance requirements, it might make sense to use a smarter tool that avoids the kernel-user-kernel memory copies.

Yes, we support Linux clients, and you can use the datamover architecture. You benefit from direct access to the storage from both the Lustre file system and the SAM file system, with no OTW performance penalty. I would not recommend cp since it is mmap I/O (on Solaris; not sure about Linux). You will want to use direct I/O to avoid the useless data copy.
If you use ftp/pftp/gridftp, that is just a loopback move on the datamover(s); however, any standard file system interface will work to SAM.

>> I do see work to switch the HPSS APIs to ftp or pftp. If this is already supported by HPSS, then, yes, no changes are required.
>
> I think CEA is planning on writing a copytool using the HPSS APIs directly. There is also "htar", which is a tar-like interface to HPSS, but I don't think that was anyone's intention to use.

If they decide to use the non-standard HPSS APIs, then yes, there would be changes required to use a standard file system interface for SAM.

Best regards,
- Harriet

Harriet G. Coverston
Solaris, Storage Software               | Email: harriet.coverston at sun.com
Sun Microsystems, Inc.                  | AT&T: 651-554-1515
1270 Eagan Industrial Rd., Suite 160    | Fax: 651-554-1540
Eagan, MN 55121-1231
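Harriet's direct-I/O suggestion could be sketched as the loop below: both files opened with O_DIRECT and a page-aligned buffer, so the page cache is bypassed on each side of the copy. O_DIRECT alignment and size restrictions vary by filesystem (and a non-aligned tail usually needs a buffered fallback), so this is strictly a sketch of the idea, not a drop-in replacement for cp.

    /* Sketch: a direct-I/O copy loop.  O_DIRECT bypasses the page cache
     * at the cost of alignment requirements; whether Lustre and QFS
     * accept these exact flags and sizes is filesystem-dependent. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define BUF_SIZE (4 * 1024 * 1024)   /* 4 MB, multiple of page size */

    static int direct_copy(const char *src, const char *dst)
    {
        int in = open(src, O_RDONLY | O_DIRECT);
        int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
        void *buf = NULL;
        ssize_t n;
        int rc = 0;

        if (in < 0 || out < 0 || posix_memalign(&buf, 4096, BUF_SIZE)) {
            perror("setup");
            return -1;
        }
        while ((n = read(in, buf, BUF_SIZE)) > 0) {
            /* Note: a partial tail that is not a multiple of the required
             * alignment may need a non-O_DIRECT fallback on some systems. */
            if (write(out, buf, n) != n) {
                perror("write");
                rc = -1;
                break;
            }
        }
        if (n < 0)
            rc = -1;
        free(buf);
        close(in);
        close(out);
        return rc;
    }

    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
            return 1;
        }
        return direct_copy(argv[1], argv[2]) ? 1 : 0;
    }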
Nathan, On Jan 26, 2009, at 4:13 PM, Nathaniel Rutman wrote:> Andreas Dilger wrote: >> On Jan 23, 2009 13:02 -0600, Rick Matthews wrote: >> >>> Having a mover to put data into QFS is a great idea, and can >>> easily use the QFS Linux client. I don''t think you would >>> necessarily get QFS >>> policy for native Lustre files unless the "moved" files retained the >>> Lustre attributes, from which you want policy decisions made. >>> >> >> There will not necessarily be HSM policy data stored with every file >> from Lustre, though there is a desire to store Lustre layout data in >> the archive. Is it possible to store extended attributes with each >> file in QFS? >> > We can always store EA''s, either natively or "poor-man''s EA''s" via > mini-tarballs. >> >>> The applicable Lustre namespace would be essentially duplicated in >>> the >>> QFS space, and (I think) QFS classification and policy occur on >>> that name space. Doing so gives you access to rich QFS policy. >>> This also >>> allows QFS to migrate data to/from archive media without I/O or >>> compute load on any Linux clients. >>> >> >> The current Lustre HSM design will not export any of the filesystem >> namespace to the archive, so that we don''t have to track renames in >> the archive. The archive objects will only be identified by a Lustre >> FID (128-bit file identifier). IIRC, the HSM-specific copy tool >> would >> be given the file name (though not necessarily the full pathname) in >> order to perform the copyout, but the filesystem will be retrieving >> the >> file from the archive by FID. Nathan, can you confirm that is right? >> > There is a mechanism to get the current full pathname for a given > fid from userspace, so an HSM-specific copytool could find it out, > but a central tenet of the design here is that as far as the HSM is > concerned, the entire Lustre FS is a flat namespace of FIDs.Be careful here. We are a file system. We don''t have a limit on # of files in one directory, but we don''t recommend more than 500,000 files in one single directory or you will start to see some performance problems. You will have to create a tree, not use a flat namespace.> You can get a full pathname if you want to for catastrophe > recovery, but Lustre itself will only speak to the HSM with FIDs. > As I said in the other email, although SAM-QFS can do name-based > policies, the "name" as far as QFS is concerned is just the FID, so > name-based policies at the copytool level are worthless. Unless we > a.) add the path/filename back to the file (EA, or use a tarball > wrapper), and b.) modify the SAM policy engine to use the "real" > path/filename instead of the FID.Currently, we don''t support policy using EA (extended attributes are in 5.0). We have had lots of requests for this, especially from our digital preservation customers.> > > But in the bigger picture sense, note that all this is simply an > optimization to allow SAM-QFS filename-based policies, which > ultimately only influences where SAM-QFS stores files, not whether > or when the files are archived by Lustre. These "top-level" policy > decisions are made by the Lustre policy manager, and so perhaps > there is no real need to spend any effort getting b.) above > working. Note that a.) is still useful for disaster recovery.Agree. We have lots of customer with only one archive set. This means all files are archived with the same policy -- very simple. - Harriet Harriet G. Coverston Solaris, Storage Software | Email: harriet.coverston at sun.com Sun Microsystems, Inc. 
Hi,

AFAIK, the HPSS distribution includes pftp and gridftp support (also available here for download).

At CEA, we are using our own copytool that directly uses the HPSS API. It already exists and has been in production for years. I think only a few modifications will be needed to adapt it to the Lustre-HSM purpose (basically, add a fid <-> HSM id mapping and backup of attributes, path, stripe...).

Thomas
CEA/DAM

Andreas Dilger wrote:
> I do see work to switch the HPSS APIs to ftp or pftp. If this is
> already supported by HPSS, then, yes, no changes are required.
> I think CEA is planning on writing a copytool using the HPSS APIs
> directly. There is also "htar" which is a tar-like interface to
> HPSS, but I don't think that was anyone's intention to use.
>
> Cheers, Andreas
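For readers less familiar with what that adaptation implies, here is a minimal sketch (Python with sqlite3; the table and column names are purely illustrative and are not CEA's copytool) of the kind of fid <-> HSM id mapping plus attribute backup Thomas is describing:

    # Sketch only: a fid <-> HSM-id mapping with backed-up attributes, of the
    # kind a copytool adaptation would need. Schema and names are illustrative.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE hsm_map (
            lustre_fid  TEXT PRIMARY KEY,   -- opaque 16-byte FID, as text
            hsm_id      TEXT NOT NULL,      -- identifier of the object in HPSS
            path        TEXT,               -- pathname at archive time (recovery only)
            stripe_ea   BLOB,               -- saved striping/layout attributes
            archived_at INTEGER
        )""")

    def record_archive(fid, hsm_id, path, stripe_ea, when):
        conn.execute("INSERT OR REPLACE INTO hsm_map VALUES (?, ?, ?, ?, ?)",
                     (fid, hsm_id, path, stripe_ea, when))

    record_archive("0x200000401:0x5:0x0", "/hpss/lustre/00/0x200000401_0x5_0x0",
                   "/lustre/scratch/foo", b"\x00" * 56, 1233360000)
    print(conn.execute("SELECT hsm_id FROM hsm_map WHERE lustre_fid=?",
                       ("0x200000401:0x5:0x0",)).fetchone())

The only requirement the thread actually imposes is that the mapping is keyed by the Lustre FID and that the backed-up path/stripe data is never needed in normal operation, only for recovery.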
Comments on several messages; I am slowly catching up.> I do see work to switch the HPSS APIs to ftp or pftp. If this is > already supported by HPSS, then, yes, no changes are required. >HPSS supports ftp and pftp. However, this seems to be a moot point as Thomas points out that CEA is using the HPSS client API library for their copy tool:> At CEA, we are using our own copytool that directly uses HPSS API. > This already exists and is in production for years. > I think there will be few modifications to adapt it to Lustre-HSM purpose > (basically, add fid <-> HSM id mapping and backup of attributes, path, > stripe...)> There is also "htar" which is a tar-like interface to > HPSS, but I don''t think that was anyone''s intention to use.htar is a well proven and valuable tool for aggregation to HPSS. It is widely used at HPSS sites as a stand-alone utility and has been incorporated into other interfaces.> Looks like HPSS will support EA in 7.1.2.0, June 2009 > I have asked Vicky here at ORNL to dig a bit into what the EA features will look like. >The last draft of this design I saw was from November. Work on this is picking up right now and has been bumped to a high priority, due for release this June, as Galen says. I am trying to find out if there is a later design and how much about it I can share.> Do we have a set of requirements for EAs for HSM integration?I never saw an answer to Galen''s question above; did I miss it? Now is the time to speak up if we need to influence the design of the HPSS EAs.> > We would need to decide whether the HPSS implementation can/should > > handle aggregating multiple small files into a single archive object. > > I think that is useful, and this is one reason I advocate being able > > to pass multiple files at once from the coordinator to the agent. > Last I knew, they still don''t build a container for small files. They > write > a tape mark between each file. This means they are start/stopping the > tape for small files. A lot of sites use SRB which builds a tar container.As of HPSS 7.1, we build a container for small files before copying them to tape. It''s called Tape Aggregation and we call the container an aggregate. Tape Aggregation is controlled via the HPSS migration policy, where the sysadm can configure whether or not to aggregate, the minimum and maximum files to place in each aggregate, and the maximum size of each aggregate. Vicky
>> Looks like HPSS will support EA in 7.1.2.0, June 2009
>> I have asked Vicky here at ORNL to dig a bit into what the EA
>> features will look like.
>
> The last draft of this design I saw was from November. Work on this
> is picking up right now and has been bumped to a high priority, due
> for release this June, as Galen says. I am trying to find out if
> there is a later design and how much about it I can share.

There is a more recent draft, though the main change seems to be renaming "Extended Attributes" to "User Defined Attributes" (UDAs).

The gist of the current draft is that a new database table would be added to the HPSS schema, consisting of two columns: an object ID and an XML document. The XML document would define all the UDAs for some HPSS name space object (file, directory, symlink, hard link, etc.) in some key/value format. It would take advantage of the new capability in version 9 of DB2 of handling XML columns and being able to index and query them as XML, not just as a text string. The object ID column of the new table would hold the ID of the HPSS name space object to which the extended attribute(s) apply.

The design is intended to handle small UDAs, up to 512 bytes in length for the total XML document, in order to be able to store the data in the same row; larger documents will be accepted but would have to be stored in a large object (LOB) area external to the main table, reducing efficiency. This is something to keep in mind if we start talking about putting full (or even relative) pathnames in as UDAs.

I understand that the CEA folks have a copy of this draft of the design and are in communication with its authors.

Vicky
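To make the 512-byte constraint concrete, here is a minimal sketch (Python; the element and attribute names are entirely hypothetical, since the real UDA schema is defined by the HPSS draft and not by this list) that builds a key/value XML document for one archive object and checks whether it would fit inline or spill into a LOB:

    # Sketch only: hypothetical UDA layout, not the actual HPSS 7.2 schema.
    import xml.etree.ElementTree as ET

    HPSS_UDA_INLINE_LIMIT = 512  # bytes for the whole XML document, per the draft

    def build_uda_doc(attrs):
        """Build a flat key/value XML document for one HPSS namespace object."""
        root = ET.Element("uda")                  # hypothetical root element
        for key, value in attrs.items():
            ET.SubElement(root, "attr", name=key).text = value
        return ET.tostring(root, encoding="utf-8")

    doc = build_uda_doc({
        "lustre_fid": "0x200000400:0x2:0x0",      # Lustre FID rendered as text
        "stripe_count": "4",
        "path": "/lustre/scratch/user/model/output-000123.dat",
    })

    print(len(doc), "bytes;",
          "inline" if len(doc) <= HPSS_UDA_INLINE_LIMIT else "spills to a LOB")

Even a modest pathname plus a few layout fields eats a large fraction of the 512 bytes, which is exactly the concern about putting pathnames into UDAs rather than into the archive object itself.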
Andreas Dilger wrote:> On Jan 23, 2009 12:39 -0500, Shipman, Galen M. wrote: > >> Looks like HPSS will support EA in 7.1.2.0, June 2009 >> I have asked Vicky here at ORNL to dig a bit into what the EA features will >> look like. Do we have a set of requirements for EAs for HSM integration? >> > > As yet we don't have a hard requirement for EAs in HSM. We would ideally > keep the LOV EA for the file layout in the HSM, so that the file gets > (approximately) the same layout when it is restored. This is only really > needed for files that were not allocated using the default layout, and > we might consider saving e.g. "stripe over all OSTs" instead of "stripe > over N OSTs" so that if the number of OSTs increases from when the file > was archived until it is restored the new file gets the full performance. >Also, if for some reason the number of OSTs decreases, the stripe all could just use the new available value.>> I got a paper from CEA that indicated HPSS was going to (or may have >> already) implemented EA support, but it isn't at all clear if that >> version of software would be available at all sites, since AFAIK it >> is relatively new. >> >> Cheers, Andreas >>This version of the HPSS software with the EA support (now called UDA for User Defined Attributes) will be available in the baseline HPSS code, available to all sites. Target availability date is summer 2009. Vicky
>> Looks like HPSS will support EA in 7.1.2.0, June 2009
>> I have asked Vicky here at ORNL to dig a bit into what the EA
>> features will look like.
>
> The last draft of this design I saw was from November. Work on this
> is picking up right now and has been bumped to a high priority, due
> for release this June, as Galen says. I am trying to find out if
> there is a later design and how much about it I can share.

I just realized the June date is internal. That's when HPSS developers are to have their code unit tested. After that comes integration and system testing. This feature will likely not be released until around September.

Vicky
LEIBOVICI Thomas wrote:
> At CEA, we are using our own copytool that directly uses HPSS API.
> This already exists and is in production for years.
> I think there will be few modifications to adapt it to Lustre-HSM purpose
> (basically, add fid <-> HSM id mapping and backup of attributes, path,
> stripe...)

So then the QFS copytool will indeed be a new tool, and should be scheduled accordingly. Features:
1. "cp --preserve"-like functionality (include metadata attributes in the copy)
2. add EAs (create mini-tarball)
3. implement FID hash to subdivide the namespace (see the sketch below)
4. periodic status reporting (via ioctl on the file)

Harriet G. Coverston wrote:
>> There is a mechanism to get the current full pathname for a given fid
>> from userspace, so an HSM-specific copytool could find it out, but a
>> central tenet of the design here is that as far as the HSM is
>> concerned, the entire Lustre FS is a flat namespace of FIDs.
>
> Be careful here. We are a file system. We don't have a limit on # of
> files in one directory, but we don't recommend more than 500,000 files
> in one single directory or you will start to see some performance
> problems. You will have to create a tree, not use a flat namespace.

Yes, a tree based on a hash of the fid.

The other option is to use the actual filename for storage, but from Lustre's point of view this gets extremely tricky. For example: send /foo/bar to the archive. Client A opens /foo/bar. Client B renames /foo/bar to /abc/xyz, but this change hasn't propagated to the archive yet. Client A now tries to read its open file handle, which tells Lustre to read the offline file FID 123, which it translates to /abc/xyz currently, which the archive doesn't know about yet. Not just xyz -- renames on any ancestor path element cause similar misses. Since the FID remains constant throughout the life of a file, we don't have to worry about any namespace changes (file or parents).

If there were an alternate way of bypassing the archive's namespace to directly access a file, we could conceivably store e.g. an archive-specific identifier within the Lustre stripe EA, and pass this down to the copytool when reading an offline file, but this presupposes that such a thing exists, is of reasonable size, has a userspace method to access it, etc.

>> You can get a full pathname if you want to for catastrophe recovery,
>> but Lustre itself will only speak to the HSM with FIDs.
>> As I said in the other email, although SAM-QFS can do name-based
>> policies, the "name" as far as QFS is concerned is just the FID, so
>> name-based policies at the copytool level are worthless. Unless we
>> a.) add the path/filename back to the file (EA, or use a tarball
>> wrapper), and b.) modify the SAM policy engine to use the "real"
>> path/filename instead of the FID.
>
> Currently, we don't support policy using EA (extended attributes are
> in 5.0). We have had lots of requests for this, especially from our
> digital preservation customers.

Ah, policy based on EAs would be the general case, yes.
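To make feature 3 above concrete, here is a minimal sketch (Python; the helper names and fan-out numbers are my own, not taken from any existing copytool) of hashing an opaque FID string into a shallow directory tree so that no archive directory ever approaches the ~500,000-entry range Harriet warns about:

    # Sketch only: hashing an opaque Lustre FID into a shallow directory tree.
    # The fan-out numbers are illustrative, not taken from any real copytool.
    import hashlib
    import os

    FANOUT = 256  # 256 x 256 = 65,536 leaf directories

    def fid_to_archive_path(archive_root, fid):
        """Map an opaque FID string to <root>/<hh>/<hh>/<fid> in the archive."""
        digest = hashlib.sha1(fid.encode("ascii")).digest()
        d1 = format(digest[0] % FANOUT, "02x")
        d2 = format(digest[1] % FANOUT, "02x")
        return os.path.join(archive_root, d1, d2, fid)

    # Example: 1 billion archived files spread over 65,536 leaves is roughly
    # 15,000 entries per directory, comfortably below the 500,000 guideline.
    print(fid_to_archive_path("/qfs/lustre_archive", "0x200000401:0x5:0x0"))

Since Lustre always presents the same FID for the same file, this mapping is stable across renames, which is exactly the property the flat-FID design is after.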
Nathan, On Jan 30, 2009, at 6:21 PM, Nathaniel Rutman wrote:> LEIBOVICI Thomas wrote: >> At CEA, we are using our own copytool that directly uses HPSS API. >> This already exists and is in production for years. >> I think there will be few modifications to adapt it to Lustre-HSM >> purpose >> (basically, add fid <-> HSM id mapping and backup of attributes, >> path, stripe...) > So then the QFS copytool will indeed be a new tool, and should be > scheduled accordingly. > Features: > 1. "cp --preserve" like functionality (include metadata attributes > in cp) > 2. add EA''s (create mini-tarball) > 3. implement FID hash to subdivide namespace > 4. periodic status reporting (via ioctl on file) > > > Harriet G. Coverston wrote: >>> There is a mechanism to get the current full pathname for a given >>> fid from userspace, so an HSM-specific copytool could find it out, >>> but a central tenet of the design here is that as far as the HSM >>> is concerned, the entire Lustre FS is a flat namespace of FIDs. >> >> Be careful here. We are a file system. We don''t have a limit on # >> of files in one directory, but we don''t recommend more than 500,000 >> files in one single directory or you will start to see some >> performance problems. You will have to create a tree, not use a >> flat namespace. > Yes, a tree based on a hash of the fid. > The other option is to use the actual filename for storage, but from > Lustre''s point of view this gets extremely tricky. For example: > Send /foo/bar to archive. Client A opens /foo/bar. Client B > renames /foo/bar to /abc/xyz, but this change hasn''t propagated to > the archive yet. Client A now tries to read its open file handle, > which tells Lustre to read the offline file FID 123, which it > translates to /abc/xyz currently, which the archive doesn''t know > about yet. Not just xyz, but renames on any ancestor path element > cause similar misses. Since the FID remains constant throughout the > life of a file, we don''t have to worry about any namespace changes > (file or parents). If there was an alternate way of bypassing the > archive''s namespace to directly access a file, we could conceivably > store e.g. an archive-specific identifier within the Lustre stripe > EA, and pass this down to the copytool when reading an offline file, > but this presupposes that such a thing exists, is of reasonable > size, has a userspace method to access it, etc.Yes, we have a FID like concept in SAM-QFS. It is called the file ID. It is 64 bits and consists of the inode/generation number. It is unique. You can store it. You can issue an ioctl to open the ID. You can issue an ioctl to do an ID stat, etc. It is much more efficient than using the filename (expensive lookup). This means if you store and use the ID, you can cover the rename window and still be guaranteed that you will get the right file. Note, we don''t rearchive on a rename. I really think a replicated namespace will be much more intuitive and solves restore. If you prefer to build a tar container, that is OK, too. The tar file can have a suffix and then you know it is tar and you can tar it back.> > >> >>> You can get a full pathname if you want to for catastrophe >>> recovery, but Lustre itself will only speak to the HSM with FIDs. >>> As I said in the other email, although SAM-QFS can do name-based >>> policies, the "name" as far as QFS is concerned is just the FID, >>> so name-based policies at the copytool level are worthless. >>> Unless we a.) 
add the path/filename back to the file (EA, or use a >>> tarball wrapper), and b.) modify the SAM policy engine to use the >>> "real" path/filename instead of the FID. >> >> Currently, we don''t support policy using EA (extended attributes >> are in 5.0). We have had lots of requests for this, especially from >> our digital preservation customers. > Ah, policy based on EAs would be the general case, yes.Yes, this would be a nice feature for us. - Harriet Harriet G. Coverston Solaris, Storage Software | Email: harriet.coverston at sun.com Sun Microsystems, Inc. | AT&T: 651-554-1515 1270 Eagan Industrial Rd., Suite 160 | Fax: 651-554-1540 Eagan, MN 55121-1231
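For those less familiar with SAM-QFS internals, the file ID Harriet describes is just a 64-bit value built from the inode and generation numbers. A tiny sketch of how a copytool might pack and unpack such an ID before handing it to Lustre for storage (the 32/32 field split is an assumption on my part, not taken from SAM-QFS headers):

    # Sketch only: packing an inode/generation pair into one 64-bit identifier.
    # The 32/32 split is an assumption for illustration, not the SAM-QFS layout.

    def pack_file_id(inode, generation):
        assert inode < 2**32 and generation < 2**32
        return (inode << 32) | generation

    def unpack_file_id(file_id):
        return file_id >> 32, file_id & 0xFFFFFFFF

    fid = pack_file_id(123456, 7)
    print(hex(fid), unpack_file_id(fid))  # stored opaquely by Lustre, handed back on stage-in

Whatever the exact layout, the operational point from the thread stands: because samfsrestore can change the ID, anything Lustre stores must be treated as a hint the archive may later rewrite (see the syncing discussion further down).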
Harriet G. Coverston wrote: Hi,> Nathan, > > On Jan 30, 2009, at 6:21 PM, Nathaniel Rutman wrote: > >> LEIBOVICI Thomas wrote: >>> At CEA, we are using our own copytool that directly uses HPSS API. >>> This already exists and is in production for years. >>> I think there will be few modifications to adapt it to Lustre-HSM >>> purpose >>> (basically, add fid <-> HSM id mapping and backup of attributes, >>> path, stripe...) >> So then the QFS copytool will indeed be a new tool, and should be >> scheduled accordingly. >> Features: >> 1. "cp --preserve" like functionality (include metadata attributes in >> cp) >> 2. add EA''s (create mini-tarball) >> 3. implement FID hash to subdivide namespace >> 4. periodic status reporting (via ioctl on file) >> >> >> Harriet G. Coverston wrote: >>>> There is a mechanism to get the current full pathname for a given >>>> fid from userspace, so an HSM-specific copytool could find it out, >>>> but a central tenet of the design here is that as far as the HSM is >>>> concerned, the entire Lustre FS is a flat namespace of FIDs. >>> >>> Be careful here. We are a file system. We don''t have a limit on # of >>> files in one directory, but we don''t recommend more than 500,000 >>> files in one single directory or you will start to see some >>> performance problems. You will have to create a tree, not use a flat >>> namespace. >> Yes, a tree based on a hash of the fid. >> The other option is to use the actual filename for storage, but from >> Lustre''s point of view this gets extremely tricky. For example: >> Send /foo/bar to archive. Client A opens /foo/bar. Client B renames >> /foo/bar to /abc/xyz, but this change hasn''t propagated to the >> archive yet. Client A now tries to read its open file handle, which >> tells Lustre to read the offline file FID 123, which it translates to >> /abc/xyz currently, which the archive doesn''t know about yet. Not >> just xyz, but renames on any ancestor path element cause similar >> misses. Since the FID remains constant throughout the life of a >> file, we don''t have to worry about any namespace changes (file or >> parents). If there was an alternate way of bypassing the archive''s >> namespace to directly access a file, we could conceivably store e.g. >> an archive-specific identifier within the Lustre stripe EA, and pass >> this down to the copytool when reading an offline file, but this >> presupposes that such a thing exists, is of reasonable size, has a >> userspace method to access it, etc. > > Yes, we have a FID like concept in SAM-QFS. It is called the file ID. > It is 64 bits and consists of the inode/generation number. It is > unique. You can store it. You can issue an ioctl to open the ID. You > can issue an ioctl to do an ID stat, etc. It is much more efficient > than using the filename (expensive lookup). This means if you store > and use the ID, you can cover the rename window and still be > guaranteed that you will get the right file. Note, we don''t rearchive > on a rename.I believe this facility only exist on the Meta Data Server Node and not on the Linux/Solaris clients. Am I correct? Thanks. colin> > I really think a replicated namespace will be much more intuitive and > solves restore. If you prefer > to build a tar container, that is OK, too. The tar file can have a > suffix and then you know it is tar and > you can tar it back. >> >> >>> >>>> You can get a full pathname if you want to for catastrophe >>>> recovery, but Lustre itself will only speak to the HSM with FIDs. 
>>>> As I said in the other email, although SAM-QFS can do name-based >>>> policies, the "name" as far as QFS is concerned is just the FID, >>>> so name-based policies at the copytool level are worthless. >>>> Unless we a.) add the path/filename back to the file (EA, or use a >>>> tarball wrapper), and b.) modify the SAM policy engine to use the >>>> "real" path/filename instead of the FID. >>> >>> Currently, we don''t support policy using EA (extended attributes are >>> in 5.0). We have had lots of requests for this, especially from our >>> digital preservation customers. >> Ah, policy based on EAs would be the general case, yes. > Yes, this would be a nice feature for us. > > - Harriet > > Harriet G. Coverston > Solaris, Storage Software | Email: harriet.coverston at sun.com > Sun Microsystems, Inc. | AT&T: 651-554-1515 > 1270 Eagan Industrial Rd., Suite 160 | Fax: 651-554-1540 > Eagan, MN 55121-1231 > > > >
Colin, On Feb 2, 2009, at 8:56 AM, Colin Ngam wrote:>>> >>> >>> Harriet G. Coverston wrote: >>>>> There is a mechanism to get the current full pathname for a >>>>> given fid from userspace, so an HSM-specific copytool could find >>>>> it out, but a central tenet of the design here is that as far as >>>>> the HSM is concerned, the entire Lustre FS is a flat namespace >>>>> of FIDs. >>>> >>>> Be careful here. We are a file system. We don''t have a limit on # >>>> of files in one directory, but we don''t recommend more than >>>> 500,000 files in one single directory or you will start to see >>>> some performance problems. You will have to create a tree, not >>>> use a flat namespace. >>> Yes, a tree based on a hash of the fid. >>> The other option is to use the actual filename for storage, but >>> from Lustre''s point of view this gets extremely tricky. For >>> example: >>> Send /foo/bar to archive. Client A opens /foo/bar. Client B >>> renames /foo/bar to /abc/xyz, but this change hasn''t propagated to >>> the archive yet. Client A now tries to read its open file handle, >>> which tells Lustre to read the offline file FID 123, which it >>> translates to /abc/xyz currently, which the archive doesn''t know >>> about yet. Not just xyz, but renames on any ancestor path element >>> cause similar misses. Since the FID remains constant throughout >>> the life of a file, we don''t have to worry about any namespace >>> changes (file or parents). If there was an alternate way of >>> bypassing the archive''s namespace to directly access a file, we >>> could conceivably store e.g. an archive-specific identifier within >>> the Lustre stripe EA, and pass this down to the copytool when >>> reading an offline file, but this presupposes that such a thing >>> exists, is of reasonable size, has a userspace method to access >>> it, etc. >> >> Yes, we have a FID like concept in SAM-QFS. It is called the file >> ID. It is 64 bits and consists of the inode/generation number. It >> is unique. You can store it. You can issue an ioctl to open the ID. >> You >> can issue an ioctl to do an ID stat, etc. It is much more efficient >> than using the filename (expensive lookup). This means if you store >> and use the ID, you can cover the rename window and still be >> guaranteed that you will get the right file. Note, we don''t >> rearchive on a rename. > I believe this facility only exist on the Meta Data Server Node and > not on the Linux/Solaris clients. Am I correct?It is supported on the MDS and the Solaris client nodes, but currently not on Linux. I thought about this a bit. After we do a samfsrestore (reload the metadata after a crash of the SAM-QFS disk cache), the ID is not the same. Therefore, you would not be able to use this after a SAM restore unless the ID that you are storing is updated. We really need to think about this. - Harriet> > > Thanks. > > colin >> >> I really think a replicated namespace will be much more intuitive >> and solves restore. If you prefer >> to build a tar container, that is OK, too. The tar file can have a >> suffix and then you know it is tar and >> you can tar it back. >>> >>> >>>> >>>>> You can get a full pathname if you want to for catastrophe >>>>> recovery, but Lustre itself will only speak to the HSM with FIDs. >>>>> As I said in the other email, although SAM-QFS can do name-based >>>>> policies, the "name" as far as QFS is concerned is just the FID, >>>>> so name-based policies at the copytool level are worthless. >>>>> Unless we a.) 
add the path/filename back to the file (EA, or use >>>>> a tarball wrapper), and b.) modify the SAM policy engine to use >>>>> the "real" path/filename instead of the FID. >>>> >>>> Currently, we don''t support policy using EA (extended attributes >>>> are in 5.0). We have had lots of requests for this, especially >>>> from our digital preservation customers. >>> Ah, policy based on EAs would be the general case, yes. >> Yes, this would be a nice feature for us. >> >> - Harriet >> >> Harriet G. Coverston >> Solaris, Storage Software | Email: harriet.coverston at sun.com >> Sun Microsystems, Inc. | AT&T: >> 651-554-1515 >> 1270 Eagan Industrial Rd., Suite 160 | Fax: 651-554-1540 >> Eagan, MN 55121-1231 >> >> >> >> >- Harriet Harriet G. Coverston Solaris, Storage Software | Email: harriet.coverston at sun.com Sun Microsystems, Inc. | AT&T: 651-554-1515 1270 Eagan Industrial Rd., Suite 160 | Fax: 651-554-1540 Eagan, MN 55121-1231
Hi Nathan,

I wrote up what I think can be done in SAMQFS to support the Lustre HSM effort. These are talking points that will help us move forward quickly. It needs to be sanitized...

Thanks.

colin

[Attachment: Osam-proposal -- http://lists.lustre.org/pipermail/lustre-devel/attachments/20090202/03d17db7/attachment-0001.ksh]
Colin Ngam wrote:
> Before I forget - can someone point me to the HPSS API?

See http://www.hpss-collaboration.org/hpss/users/user_doc.jsp. You would want the most recent (6.2) version of the HPSS Programmer's Reference Guide, Volume 1. The 7.1 version should be out very soon.

Vicky
> 6. No namepace. No namespace. Lustre pathnames can be stored as Extended > Attributes.I realize you are talking about SAMQFS, but do we want to keep the design consistent with what we do for hpss? HPSS will support Extended Attributes in release 7.2, to be available September 2009, but it will be expensive to use these EAs for pathnames, though it can be done. The current design is to specify EAs in XML, and for the most efficient storage of the EA in the database, the containing XML document needs to be 512 bytes or fewer so that it can be stored inline. Larger XML objects will have to be stored externally as LOBs (large objects), which will make queries cost more. So we need to think about what that cost will be when we are considering repositories with millions or billions of files. Vicky
Vicky White wrote:> >> 6. No namepace. No namespace. Lustre pathnames can be stored as >> Extended >> Attributes. > > > I realize you are talking about SAMQFS, but do we want to keep the > design consistent with what we do for hpss? > > HPSS will support Extended Attributes in release 7.2, to be available > September 2009, but it will be expensive to use these EAs for > pathnames, though it can be done. The current design is to specify > EAs in XML, and for the most efficient storage of the EA in the > database, the containing XML document needs to be 512 bytes or fewer > so that it can be stored inline. Larger XML objects will have to be > stored externally as LOBs (large objects), which will make queries > cost more. > > So we need to think about what that cost will be when we are > considering repositories with millions or billions of files. > > VickyHi Vicky, I do not see why we need to query these EAs in normal operation. These EAs will only be accessed when we need to perform Ultimate Disaster Recovery - when you have lost all data on disks and all you have are tapes. I was thinking about XML - but it is "opaque" to Object SAMQFS so, it is up to the Lustre side. Whatever it is, the Applications - Lustre-Restore for example, is the one that has to understand the format. I am not a Tar Header expert - but I assume that these EAs can go with the file in the tar ball. I do not expect to keep any of it on line on disk cache, on the SAMFS side. I see no reason. With respect to whether it should be consistent with HPSS - I would say if that is all we need and it is sufficient - why not. Otherwise, let''s make it better than HPSS. It must be in SUN''s best interest to sell SAMQFS? I do apologize Vicky, are you a SUN employee? Thanks. colin
Colin Ngam wrote:> I do not see why we need to query these EAs in normal operation. > These EAs will only be accessed when we need to perform Ultimate > Disaster Recovery - when you have lost all data on disks and all you > have are tapes.That would help. I don''t know what it would cost to store the EA in a separate object to begin with, though, and that would be incurred on every file. Plus you have to consider the space it takes up.> I was thinking about XML - but it is "opaque" to Object SAMQFS so, it > is up to the Lustre side. Whatever it is, the Applications - > Lustre-Restore for example, is the one that has to understand the > format. I am not a Tar Header expert - but I assume that these EAs > can go with the file in the tar ball.I think what you put in the tar ball is up to you. Putting the EAs in there regardless of what the hsm was might simplify the design, so you wouldn''t have to extract the EA in a different way for each hsm. I was just trying to keep the hpss EA design in front of folks so that if we were considering using that, we knew all the tradeoffs.> I do not expect to keep any of it on line on disk cache, on the SAMFS > side. I see no reason. > > With respect to whether it should be consistent with HPSS - I would > say if that is all we need and it is sufficient - why not. Otherwise, > let''s make it better than HPSS. It must be in SUN''s best interest to > sell SAMQFS?Oh, I''m sure it is.> I do apologize Vicky, are you a SUN employee?No. Were you going to feel sorry for me if I worked for Sun? ;) Vicky
Vicky White wrote: Hi Vicky,> Colin Ngam wrote: >> I do not see why we need to query these EAs in normal operation. >> These EAs will only be accessed when we need to perform Ultimate >> Disaster Recovery - when you have lost all data on disks and all you >> have are tapes. > > > That would help. I don''t know what it would cost to store the EA in a > separate object to begin with, though, and that would be incurred on > every file. Plus you have to consider the space it takes up. > > >> I was thinking about XML - but it is "opaque" to Object SAMQFS so, it >> is up to the Lustre side. Whatever it is, the Applications - >> Lustre-Restore for example, is the one that has to understand the >> format. I am not a Tar Header expert - but I assume that these EAs >> can go with the file in the tar ball. > > > I think what you put in the tar ball is up to you. Putting the EAs > in there regardless of what the hsm was might simplify the design, so > you wouldn''t have to extract the EA in a different way for each hsm. > > I was just trying to keep the hpss EA design in front of folks so that > if we were considering using that, we knew all the tradeoffs.Good point. I guess EA can be anywhere in the tar file, but, best if somehow it is put together for fast access. But then, we do not need it unless it is for Ultimate Disaster Recovery .. that should never happen :-)) With respect to space - how about compression? The problem is, I always thought space is cheap. Does HPSS ever scrub/recycle? Policy driven? If EAs are going to be in a Database, I can see it can be a problem. Path name does not need to be in the EA. It is needed for the tar header only. I guess the EA will consist of everything that Lustre needs to restore a file, completely.> > >> I do not expect to keep any of it on line on disk cache, on the SAMFS >> side. I see no reason. >> >> With respect to whether it should be consistent with HPSS - I would >> say if that is all we need and it is sufficient - why not. >> Otherwise, let''s make it better than HPSS. It must be in SUN''s best >> interest to sell SAMQFS? > > > Oh, I''m sure it is. > > >> I do apologize Vicky, are you a SUN employee? > > > No. Were you going to feel sorry for me if I worked for Sun? ;)No, it''s kind of fun to design with .. and I want to say competitor, but I guess you do not really fall into that category. Say hi to Kim K. or Dave W(Cray folks) for me if they cross your path.> > > Vicky >PS-The programmer''s reference is close to 400 pages! Perhaps I should start with User''s Guide.
> PS-The programmer''s reference is close to 400 pages!Alas, yes. Take two steps backward from it. Think of chapter 2 as "posix on hpss", because that''s basically what it is - a client api interface to map all the posix calls into corresponding hpss calls. The first half of the chapter describes the functions and the second half the relevant data structures. The other chapters are gravy - additional kinds of features you can use but wouldn''t have to right off the bat, and some of which you''d never use. Funny...I thought there used to be some programming examples in the back, but maybe that was in another book.> Perhaps I should start with User''s Guide.I always think of that as just an explanation of the standard user interfaces like ftp and vfs, but you''re right, it does talk about some hpss concepts that would be a useful intro. Vicky
Colin Ngam wrote:

Is OSAM available on Linux?

Object SAMQFS - HSM for Lustre
------------------------------

0. We're basically looking at the HSM as a Repository, right?

yes

2. Object SAMQFS meta data (inodes) is used as a database for files that are
archived etc.

You mean, store the Lustre metadata attributes in these inodes? Or rather that these inodes just keep track of the objects in the archive (like block pointers)?

3. This database can be dumped and restored really quickly using normal meta
data backup of the HSM. The inodes are kept in 1 file. This is not a Lustre
dump but rather a dump of Object SAMQFS. No file data dump is required. Files
not archived yet are irrelevant... Incrementals can be obtained by comparing
2 full dumps and just keeping the diffs. The persistent Object SAMQFS file id
can be preserved if we restore a complete version of the dump. Otherwise,
it can be different. We can update Lustre with the new file id for the given
Lustre File ID. Consider this error recovery path...

If we're already storing archive-specific opaque data (the SamFID), I see no reason why we couldn't allow the archive to modify that value at will. We'd need to put a lock around it...

4. Object SAMQFS should have very simple policies - archive immediate, number
of copies and when copies are to be made, etc. This can actually be passed by
Lustre and executed by Object SAMQFS. The last thing we want to do is to have
to configure 2 Policy engines.

I was envisioning the Lustre "action list" as a list of files and actions. The actions could be semi-complex (e.g. "archive at level 4") which would mean something to the archive.

5. Lustre will store a 16-byte Object SAMQFS identifier: an 8-byte unique
file system ID and an 8-byte Object SAMQFS File ID. An Object SAMQFS can only
support a 32-bit number of files. This will be less if we use inodes for
extended attributes etc. The file system ID will allow us to create multiple
Object SAMQFS "mat" file systems - providing an effectively unlimited number
of files that can be supported.

Do separate filesystems need separate disks? This opens up an inodecount/filesize relation, or we have to create new OSAM filesystems on demand (ENOSPC, create new fs, store file -- hmm, not so hard).

6. No namespace. Lustre pathnames can be stored as Extended Attributes.

No problem except for the disaster recovery scenario. And even in that case we don't need EAs if we're storing mini-tarballs already - just add an empty file to the tarball with the actual filename.

7. Files to be archived and staged in together (associative archiving) to be
given in a list by Lustre. Object SAMQFS will figure out a way to link these
files together and put them on the same tarball - this is not for free.

It's actually not clear that this is useful for Lustre. If the point of Lustre HSM is to extend the filesystem space, it makes little sense to bother archiving small files. Anyhow, this can be a future optimization.

Basic Object SAMQFS - HSM for Lustre Archive Events
---------------------------------------------------

Lustre calls with the following Information:

1. Luster FID
2. Luster Opaque Meta Data
3. Luster Tar File required Data e.g. Path Name
4. Luster Archiving Policy for this file - must be simple.

Lustre gets back:

1. Object SAMQFS Identifier.

Depending on asynchronous or synchronous archiving:

1. Lustre can status with the given "Object SAMQFS Identifier"

Sounds fine. Lustre will always use asynchronous archiving, as far as I can see. (A sketch of this request/response exchange follows below.)

Basic Object SAMQFS - HSM for Lustre Stage In Events (bring data back)
----------------------------------------------------------------------

1. Lustre just reads the file with the given "Object SAMQFS Identifier"

Basic Object SAMQFS - HSM for Lustre status Events (check state)

1. Lustre performs an "sls" command on the Object SAMQFS Client.

PS - We can have both User level command and API capabilities.

Well, technically, Lustre calls with the following information:
1. Luster FID
2. Luster Opaque Meta Data
(BTW, that's Lustre, not Luster)
OSAM ignores the fid and just uses the OSAM identifier.

Basic Object SAMQFS - HSM for Lustre Delete Event
-------------------------------------------------

1. Lustre can effectively do an "rm" on the Object SAMQFS Identifier or
calls an API.

Object SAMQFS Dump and Restore
------------------------------

Independent Administrative event.

Lustre Dump and Restore
-----------------------

Can be an Independent Lustre event.
However, this does have an impact on when we can actually delete a file from
tape if a Lustre Dump has a reference to this file, e.g.
1. Archive file.
2. Dump Lustre.
3. Delete file.
Now you want to restore the deleted file.

Dumping the Lustre metadata isn't something we've really talked about before - or, rather, the restore part isn't :) Effectively, the Lustre metadata is (all the data on) the entire MDT disk. I'm not sure it makes any sense to try to be any more elaborate than that, but maybe. It would be nice to be able to e.g. dump the disk to a regular (big!) file store in OSAM, so we've got everything on 1 set of tapes...

Ultimate Disaster Recovery - Directly from Tapes
------------------------------------------------

Requires the Tar File to be complete with Lustre Meta Data.
Since this is a recreation of both the Lustre FS and the Object SAMQFS "mat" FS,
I would be inclined to believe that at a minimum, we will not require the
Object SAMQFS identifier to be persistent from the previous incarnation. I am
also inclined to believe that if you take regular Object SAMQFS dumps, both
full and also incrementals, and store these safely on tape - you may not need
this procedure... but then, that's why we call it Ultimate Recovery.

If everything is wiped out except the tapes, we would just repopulate a new Lustre fs anyhow. Once the OSAM fs is regenerated, we walk all the objects and create object placeholders in the new Lustre fs referencing the new OSAM fids and marking everything as punched. As users start using files they are pulled back in automatically.

Syncing Object SAMQFS with Lustre
---------------------------------

The Lustre File Identifier and the Object SAMQFS Identifier can get out of
sync - shit happens. We need syncing capabilities.

Only if we stored enough information to mismatch :) If Lustre asks for a FID, and it gets back the wrong file, it doesn't / can't know. Unless we store the FID inside the file it gets back and we verify it.

Object SAMQFS - Freeing space on tapes
--------------------------------------

We will need a way to determine with Lustre - conclusively - that an archive
is no longer needed.

If the Lustre policy manager says "rm", then Lustre has no way to ever get that file back. There's no time-machine-like old versions of directories. Would be a cool feature though. Maybe the archive says "ok" to the rm, but secretly holds on to the file for some time in a special "recently deleted" dir?
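Pulling the "Archive Events" exchange above together, here is a minimal sketch (Python; all field and type names are hypothetical, this is not an existing Lustre or SAM-QFS interface) of the data that would flow in each direction for an asynchronous archive request:

    # Sketch only: illustrative request/response records for the archive-event
    # exchange discussed above. Field names are hypothetical.
    from dataclasses import dataclass

    @dataclass
    class ArchiveRequest:            # Lustre -> OSAM copytool
        lustre_fid: str              # opaque 16-byte FID, here as a hex string
        opaque_metadata: bytes       # Lustre layout EA etc., not interpreted by OSAM
        tar_info: dict               # e.g. {"path": "/lustre/scratch/foo"} for the tar header
        policy: str                  # simple hint, e.g. "archive_now_2_copies"

    @dataclass
    class ArchiveResponse:           # OSAM -> Lustre, returned immediately (asynchronous)
        osam_id: bytes               # 8-byte fs ID + 8-byte OSAM file ID
        state: str = "queued"        # Lustre polls status later using osam_id

    req = ArchiveRequest("0x200000401:0x5:0x0", b"\x00" * 56,
                         {"path": "/lustre/scratch/foo"}, "archive_now_2_copies")
    resp = ArchiveResponse(osam_id=b"\x01" * 16)
    print(resp.osam_id.hex(), resp.state)

The only value Lustre has to persist from this exchange is the 16-byte OSAM identifier, which it later presents on stage-in, status, and delete.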
On Feb 3, 2009, at 6:41 PM, Nathaniel Rutman wrote: Hi, If these are all agreeable, lets start drawing up the Spec.> Colin Ngam wrote: > > Is OSAM available on Linux?Can be access from a Linux Client. It is another file system type to SAMQFS. We have inserted software restrictions to prevent it from being used as a Shared QFS file system type. This is one of those, code is there, needs testing. I did the code so ... Keep in mind that the Meta Data Server is still only Solaris.> > > Object SAMQFS - HSM for Lustre > ------------------------------ > > 0. We re basically looking at the HSM as a Repository right? > > yes > > > 2. Object SAMQFS meta data(inodes) is used as a database for files > that are > archived etc. > > You mean, store the Lustre metadata attributes in these inodes? Or > rather that these inodes just keep track of the objects in the > archive (like block pointers)Inodes on the OSAM nodes are for managing the files in archive and the link to Lustre. I expect to store Lustre Meta Data as EA in tar file. I am assuming that we do not need Lustre Meta data on disk cache. Lustre already has it in Lustre .. only need to access these for Ultimate Disaster Recovery from tape.> > > 3. This database can be dumped and restored really quick using > normal meta > data backup of the HSM. The inodes are kept in 1 file. This is not > a Lustre > dump but rather a dump of Object SAMQFS. No file data dump is > required. Files > not archived yet are irrelevent .. Incrementals can be obtained by > comparing > 2 full dumps and just keeping the diffs. Persistent Object SAMQFS > file id > can be preserved if we restore a complete version of the dump. > Otherwise, > it can be different. We can update Lustre with the new file id for > the given > Lustre File ID. Consider this error recovery path .. > > If we''re already storing archive-specific opaque data (the SamFID), > I see no reason why we couldn''t allow the archive to modify that > value at will. We''d need to put a lock around it...Yes we can. It is just a matter of how do we initiate this change between the archive and Lustre.> > > > 4. Object SAMQFS should have very simple policies - archive > immediate, number > of copies and when copies to be made etc.. This can actually be > passed by > Lustre and executed by Object SAMQFS. Last thing we want to do is > to have to > configure 2 Policy engines. > > I was envisioning the Lustre "action list" as a list of files and > actions. The actions could be semi-complex (e.g. "archive at level > 4") which would mean something to the archive.Yes, this needs to be defined. This should include future action like "made 2nd copy after 24 hours etc. SAMQFS has a standard set of Policies .. if you want to deviate we will have to provide new code. We need to define these actions.> > > > 5. Lustre will store a 16 Bytes Object SAMQFS identifier. A 8 > bytes unique > file system ID and a 8 bytes Object SAMQFS File ID. An Object > SAMQFS can only > support 32 bits number of files. This will be less if we use inodes > for > extended attributes etc. The file system ID will allow us to create > multiple > Object SAMQFS "mat" file system - provide infinite number of files > that can > be supported. > > Do separate filesystems need separate disks? This opens up a > inodecount/filesize relation, or we have to create new OSAM > filesystems on demand (ENOSPC, create new fs, store file -- hmm, not > so hard).No, a file system is configured using slices/partitions. More than 1 FS can reside on the same disk. 
There will not be any inodecount/ filesize relationship because on the SAMQFS node we will release file data space as needed after the file is on Tape. We also do the "punch". Yes FS can be created on demand.> > > > 6. No namepace. Lustre pathnames can be stored as Extended > Attributes. > > No problem except for the disaster recovery scenario. And even in > that case we don''t need EAs if we''re storing mini-tarballs already - > just add an empty file to the tarball with the actual filename.OK.> > > > 7. Files to be archived and staged in together(associative > archiving) to be > given in a list by Lustre. Object SAMQFS will figure out a way to > link these > files together and put them on the same tarball - this is not for > free. > > It''s actually not clear that this is useful for Lustre. If the > point of Lustre HSM is to extend the filesystem space, it makes > little sense to bother archiving small files. Anyhow, this can be a > future optimization.Lustre''s call.> > > > > Basic Object SAMQFS - HSM for Lustre Archive Events > ------------------------------------------- > > Lustre calls with the following Information: > > 1. Luster FID > 2. Luster Opaque Meta Data > 3. Luster Tar File required Data e.g. Path Name > 4. Luster Archiving Policy for this file - must be simple. > > Lustre gets back: > > 1. Object SAMQFS Identifier. > > Depending on asynchronous or synchronous archiving: > > 1. Lustre can status with the given "Object SAMQFS Identifier" > > Sounds fine. Lustre will always use asynchronous archiving, as far > as I can see.Okay.> > > > > Basic Object SAMQFS - HSM for Lustre Stage In Events(bring data back) > --------------------------------------------------------------------- > > 1. Lustre just reads the file with the given "Object SAMQFS > Identifier" > > > Basic Object SAMQFS - HSM for Lustre status Events(check state) > > 1. Lustre perform "sls" command on Object SAMQFS Client. > > PS - We can have both User level command and API capabilities. > > well technically, Lustre calls with the following information > 1. Luster FID > 2. Luster Opaque Meta Data > (BTW, that''s Lustre, not Luster) > OSAM ignores fid and just uses OSAM identifierRight, Fiber/Fibre :-) I am missing something here .. Stage-In is to get a file from archive .. why do we need Item 2? Or is 2 OSAM Identifier? If so, great. I like it. In this case, we should trust Lustre FID. The OSAM ID is for a very fast search - direct index.> > > > > Basic Object SAMQFS - HSM for Lustre Delete Event > ------------------------------------------------- > > 1. Lustre can effectively do an "rm" on the Object SAMQFS > Identifier or > calls an API. > > > Object SAMQFS Dump and Restore > ------------------------------ > > Independent Administrative event. > > Lustre Dump and Restore > ----------------------- > > Can be an Independent Lustre event. > However, this does have impact on when we can actually delete a file > from > tape if a Lustre Dump has a reference to this file e.g. > 1. Archive file. > 2. Dump Lustre. > 3. Delete file. > > Now you want to restore the deleted file. > > Dumping the Lustre metadata isn''t something we''ve really talked > about before - or, rather, the restore part isn''t :) > Effectively, the Lustre metadata is (all the data on) the entire MDT > disk. I''m not sure it makes any sense to try to be any more > elaborate than that, but maybe. It would be nice to be able to e.g. > dump the disk to a regular (big!) 
file store in OSAM, so we''ve got > everything on 1 set of tapes...Lustre''s call.> > > > > Ultimate Disaster Recovery - Directly from Tapes > ------------------------------------------------ > > Requires Tar File to be complete with Lustre Meta Data. > Since this is a recreation of both the Lustre FS and Object SAMQFS > "mat" FS > I would be incline to believe that at a minimum, we will not require > the > Object SAMQFS identifier to be persistent from previous > incantation. I am also > incline to believe that if you take regular Object SAMQFS dumps, > both full and > also incrementals and store this safely on tape - you may not need > this > procedure .. but then, that''s why we call it Ultimate Recovery. > > If everything is wiped out except the tapes, we would just > repopulate a new Lustre fs anyhow. Once the OSAM fs is regenerated, > we walk all the objects and create object placeholders in the new > Lustre fs referencing the new OSAM fids and marking everything as > punched. As users start using files they are pulled back in > automatically.Yes. The chances of both a Lustre and OSAM collapse at the same time is not very good.> > > > > Syncing Object SAMQFS with Lustre > --------------------------------- > > Lustre File Identifier and Object SAMQFS Identifier can get out of > sync - shit > happens. We need syncing capabilities. > > Only if we stored enough information to mismatch :) If Lustre asks > for a FID, and it gets back the wrong file, it doesn''t / can''t > know. Unless we store the FID inside the file it gets back and we > verify it.If you always call with Lustre ID and OSAM ID, if we find that the Lustre ID does not match the OSAM ID, because perhaps we have done OSAM recovery and we are using a different OSAM ID to hold the Lustre ID now, we can search for the inode that match the Lustre ID, fetch the file and also update Lustre with the new OSAM ID.> > > > Object SAMQFS - Freeing space on tapes > -------------------------------------- > > We will need a way to determine with Lustre - conclusively that an > archive is > no longer needed. > > If Lustre policy manager says "rm", then Lustre has no way to ever > get that file back. There''s no time-machine like old versions of > directories. Would be a cool feature though. Maybe archive says > "ok" to the rm, but secretly holds on to the file for some time in a > special "recently deleted" dir?No namespace - no dir. If Lustre removes the file, we can delay the scrub. If Lustre can come back with the Lustre ID and OSAM ID, if it has not been scrubbed, you can get it back. Thanks. colin> > > >
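Since item 6 appears settled as "mini-tarball, with an empty member carrying the real filename", here is a minimal sketch (Python; the member names and layout are my own assumptions, not an agreed-on format) of what such a per-file archive object could look like:

    # Sketch only: one archive object per Lustre file, wrapped in a small tarball.
    # Member names and layout are illustrative, not an agreed-on format.
    import io
    import tarfile

    def build_archive_object(fid, data, lustre_path, layout_ea):
        buf = io.BytesIO()
        with tarfile.open(fileobj=buf, mode="w") as tar:
            # 1. the file data itself, stored under its FID
            info = tarfile.TarInfo(name=fid)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
            # 2. the Lustre layout EA, for restoring striping on stage-in
            ea = tarfile.TarInfo(name=fid + ".lov_ea")
            ea.size = len(layout_ea)
            tar.addfile(ea, io.BytesIO(layout_ea))
            # 3. an empty member whose *name* records the original pathname,
            #    purely for disaster recovery (no EA support needed in the archive)
            marker = tarfile.TarInfo(name="path/" + lustre_path.lstrip("/"))
            marker.size = 0
            tar.addfile(marker)
        return buf.getvalue()

    blob = build_archive_object("0x200000401:0x5:0x0", b"hello world",
                                "/lustre/scratch/foo/bar.dat", b"\x00" * 56)
    print(len(blob), "bytes in the tarball")

On restore, the FID member supplies the data, the .lov_ea member the striping, and the path/ member is only consulted during the "ultimate disaster recovery" walk.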
Colin Ngam wrote:> If these are all agreeable, lets start drawing up the Spec. >sure>> Basic Object SAMQFS - HSM for Lustre status Events(check state) >> >> 1. Lustre perform "sls" command on Object SAMQFS Client. >> >> PS - We can have both User level command and API capabilities. >> >> well technically, Lustre calls with the following information >> 1. Luster FID >> 2. Luster Opaque Meta Data >> (BTW, that''s Lustre, not Luster) >> OSAM ignores fid and just uses OSAM identifier > > Right, Fiber/Fibre :-) > > I am missing something here .. Stage-In is to get a file from archive > .. why do we need Item 2? Or is 2 OSAM Identifier? If so, great. I > like it.Yes, item 2 is archive-specific (e.g. OSAM) identifier.> > In this case, we should trust Lustre FID. The OSAM ID is for a very > fast search - direct index.Agreed, the Lustre FID will be the authoritative value. OSAM would internally use the OSAM identifier for the fast lookup, then verify the Lustre FID matches. If not, we would have to do a slow search for the Lustre FID...>> Syncing Object SAMQFS with Lustre >> --------------------------------- >> >> Lustre File Identifier and Object SAMQFS Identifier can get out of >> sync - shit >> happens. We need syncing capabilities. >> >> Only if we stored enough information to mismatch :) If Lustre asks >> for a FID, and it gets back the wrong file, it doesn''t / can''t >> know. Unless we store the FID inside the file it gets back and we >> verify it. > If you always call with Lustre ID and OSAM ID, if we find that the > Lustre ID does not match the OSAM ID, because perhaps we have done > OSAM recovery and we are using a different OSAM ID to hold the Lustre > ID now, we can search for the inode that match the Lustre ID, fetch > the file and also update Lustre with the new OSAM ID.Perfect. We will also need a way of modifying the Lustre FID stored in the archive for the ultimate disaster recovery, which is equivalent to "pre-populating" a Lustre FS with files from the archive. We create an empty file, mark it as "in archive", add the OSAM fid, and need to set the archive''s Lustre FID to match the new empty file.> >> >> >> >> Object SAMQFS - Freeing space on tapes >> -------------------------------------- >> >> We will need a way to determine with Lustre - conclusively that an >> archive is >> no longer needed. >> >> If Lustre policy manager says "rm", then Lustre has no way to ever >> get that file back. There''s no time-machine like old versions of >> directories. Would be a cool feature though. Maybe archive says >> "ok" to the rm, but secretly holds on to the file for some time in a >> special "recently deleted" dir? > No namespace - no dir. > > If Lustre removes the file, we can delay the scrub. If Lustre can > come back with the Lustre ID and OSAM ID, if it has not been scrubbed, > you can get it back. >Let''s not worry about this for V1. Once a file is rm''ed from Lustre, no way to get back Lustre ID or OSAM ID.
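The lookup-with-verification protocol Colin and Nathan converge on here can be summarized in a few lines. A sketch (Python; the in-memory "archive" and every helper name are placeholders for whatever the real OSAM client API ends up being):

    # Sketch only: stage-in lookup that verifies the Lustre FID and falls back
    # to a slow search, as discussed above. All names here are placeholders.
    from dataclasses import dataclass

    @dataclass
    class ArchivedObject:
        lustre_fid: str
        osam_id: bytes
        data: bytes

    class ToyArchive:
        def __init__(self):
            self.by_osam_id = {}     # fast, direct-index path
            self.by_fid = {}         # slow search path, keyed by Lustre FID

        def store(self, obj):
            self.by_osam_id[obj.osam_id] = obj
            self.by_fid[obj.lustre_fid] = obj

    def stage_in(archive, lustre_fid, osam_id):
        """Fetch by OSAM ID, verify the FID, fall back to a search if they disagree."""
        obj = archive.by_osam_id.get(osam_id)
        if obj is not None and obj.lustre_fid == lustre_fid:
            return obj.data, osam_id                  # common case: still in sync
        obj = archive.by_fid.get(lustre_fid)          # OSAM ID stale, e.g. after samfsrestore
        if obj is None:
            raise FileNotFoundError(lustre_fid)
        return obj.data, obj.osam_id                  # caller updates Lustre with the new ID

    arc = ToyArchive()
    arc.store(ArchivedObject("0x200000401:0x5:0x0", b"\x02" * 16, b"payload"))
    data, fresh_id = stage_in(arc, "0x200000401:0x5:0x0", b"\x99" * 16)  # stale OSAM ID
    print(data, fresh_id.hex())

The same shape also covers the "pre-populating" disaster-recovery case Nathan mentions: whichever side finds the IDs out of sync hands back the corrected identifier, and the other side updates its stored copy.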