Hi,

Further to previous discussions titled "your opinion about testing" I'd like to propose a metadata format for the test script files, and would obviously welcome people's input.

Effectively each test in the scripts is represented by a function and a call to run_test, so we have

test_function() {
  ...code
}

run_test function "Description of the function"

I'd like to propose that above every function a here document is placed that contains yaml v1.2 encoded data (yaml.org) with 2 characters for the indent. The block will start with << TEST_METADATA and be terminated with TEST_METADATA. We might want to place it in a comment block, but this is not really required. The block will also be wrapped at 80 characters for readability.

The compulsory elements of the data will be:

Name: Name of the function; this ensures pairing between function and comments is not just file relative.

Summary: Will often be the description after the run_test, but not always, as the tense will change.

Description: A full description of the function; the more information here the better.

Components: This is the component described in the commit message (http://wiki.whamcloud.com/display/PUB/Commit+Comments). To make this useful we will need to come up with a defined set of components that will need to be enforced in the commit message. The format of this entry will be a yaml array.

Prerequisites: Prerequisite tests that must be run before this test can be run. This is again an array, which presumes a test may have multiple prerequisites, but the data should not contain a chain of prerequisites, i.e. if A requires B and B requires C, the prerequisites of A are B, not B & C.

TicketIDs: This is an array of ticket numbers that this test explicitly tests. In theory we should aim for the state where every ticket has a test associated with it, and in future we should be able to carry out a gap analysis.

As time goes on we may well expand this compulsory list, but this is, I believe, a sensible starting place.

Being part of the source, this data will be subject to the same review process as any other change, and so we cannot store dynamic data here, such as pass rates etc.

Do people think that additional data fields should be permitted on an ad hoc basis, or should a controlled list of permitted data elements be kept? I'm tempted to say that ad hoc additional fields should be allowed, although this could lead to name clashes if people are not careful.

Below is a simple example.

======================================================================
<<TEST_METADATA
Name:
  before_upgrade_create_data
Summary:
  Copies lustre source into a node specific directory and then creates
  a tarball using that directory
Description:
  This should be called prior to upgrading Lustre and creates a set of
  data on the Lustre partition which can be accessed and checked after
  the upgrade has taken place. Several methods are used, including
  tar'ing directories so they can later be untar'ed and compared, along
  with creating sha1's of stored data.
Components:
  - lnet
  - recovery
Prerequisites:
  - before_upgrade_clear_filesystem
TicketIDs:
  - LU-123
  - LU-432
TEST_METADATA

test_before_upgrade_create_data() {
  ...
}

run_test before_upgrade_create_data "Copying lustre source into a
directory $IOP_DIR1, creating and then using source to create a tarball"
======================================================================

As I said, comments, inputs and thoughts much appreciated.

Thanks

Chris
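The proposal does not spell out how the here document is anchored so that the shell ignores it; below is a minimal sketch of one possible embedding, feeding the block to the ':' no-op builtin. The test name and body are placeholders invented for the sketch, not real Lustre tests.

# One possible embedding of the proposed block: the ':' builtin reads the
# here document and discards it, so the YAML costs nothing at run time but
# is easy to locate from outside the script. Quoting the delimiter
# (<<'TEST_METADATA') would additionally stop bash expanding $variables
# inside the block.
: <<TEST_METADATA
Name:
  metadata_example
Summary:
  Placeholder test used only to show where the metadata block sits
Description:
  Placeholder description.
Components:
  - lnet
Prerequisites: []
TicketIDs:
  - LU-123
TEST_METADATA

test_metadata_example() {
    true    # real test body goes here
}
run_test metadata_example "Placeholder test showing the metadata block"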
Roman Grigoryev
2012-Apr-30 18:15 UTC
[Lustre-discuss] Metadata storage in test script files
Hi Chris,

I'm glad to read further emails in this direction. Please don't consider this as criticism, I would just like to get more clarity: what is the target of adding this metadata? Do you have plans to use the metadata in other scripts? How? Does this metadata go into the results?

Also please see more of my comments inline:

On 04/30/2012 08:50 PM, Chris wrote:
> Hi,
>
> Further to previous discussions titled "your opinion about testing" I'd like to propose a metadata format for the test script files, and would obviously welcome people's input.
>
> Effectively each test in the scripts is represented by a function and a call to run_test, so we have
>
> test_function() {
>   ...code
> }
>
> run_test function "Description of the function"
>
> I'd like to propose that above every function a here document is placed that contains yaml v1.2 encoded data (yaml.org) with 2 characters for the indent. The block will start with << TEST_METADATA and be terminated with TEST_METADATA. We might want to place it in a comment block, but this is not really required. The block will also be wrapped at 80 characters for readability.
>
> The compulsory elements of the data will be:
>
> Name: Name of the function; this ensures pairing between function and comments is not just file relative.
> Summary: Will often be the description after the run_test, but not always, as the tense will change.
> Description: A full description of the function; the more information here the better.
> Components: This is the component described in the commit message (http://wiki.whamcloud.com/display/PUB/Commit+Comments). To make this useful we will need to come up with a defined set of components that will need to be enforced in the commit message. The format of this entry will be a yaml array.
> Prerequisites: Prerequisite tests that must be run before this test can be run. This is again an array, which presumes a test may have multiple prerequisites, but the data should not contain a chain of prerequisites, i.e. if A requires B and B requires C, the prerequisites of A are B, not B & C.

At which step do you want to check chains? And what is the logical basis for these prerequisites, other than the case where current tests have hidden dependencies? I don't see any difference between one test whose body is built from tests a, b, c and this prerequisites definition. Could you please explain more why we need this field?

> TicketIDs: This is an array of ticket numbers that this test explicitly tests. In theory we should aim for the state where every ticket has a test associated with it, and in future we should be able to carry out a gap analysis.

I suggest adding keywords (Components could be treated as keywords too) and a test type (stress, benchmark, load, functional, negative, etc.) for quick filtering. For example, SLOW could be transformed into a keyword.

Also, I would like to mention that we have 3 different logical types of data:
1) just human-readable descriptions
2) filtering and targeting fields (Components, and keywords if you agree with my suggestion)
3) framework directives (Prerequisites)

> As time goes on we may well expand this compulsory list, but this is, I believe, a sensible starting place.
>
> Being part of the source, this data will be subject to the same review process as any other change, and so we cannot store dynamic data here, such as pass rates etc.

What do you think, maybe it is a good idea to keep the metadata separately? This could be useful for simplifying mass modification of the data via script, as well as for adding tickets, pass rates and execution times on 'gold' configurations.

Thanks,
Roman

> Do people think that additional data fields should be permitted on an ad hoc basis, or should a controlled list of permitted data elements be kept? I'm tempted to say that ad hoc additional fields should be allowed, although this could lead to name clashes if people are not careful.
>
> Below is a simple example.
>
> ======================================================================
> <<TEST_METADATA
> Name:
>   before_upgrade_create_data
> Summary:
>   Copies lustre source into a node specific directory and then creates
>   a tarball using that directory
> Description:
>   This should be called prior to upgrading Lustre and creates a set of
>   data on the Lustre partition which can be accessed and checked after
>   the upgrade has taken place. Several methods are used, including
>   tar'ing directories so they can later be untar'ed and compared, along
>   with creating sha1's of stored data.
> Components:
>   - lnet
>   - recovery
> Prerequisites:
>   - before_upgrade_clear_filesystem
> TicketIDs:
>   - LU-123
>   - LU-432
> TEST_METADATA
>
> test_before_upgrade_create_data() {
>   ...
> }
>
> run_test before_upgrade_create_data "Copying lustre source into a
> directory $IOP_DIR1, creating and then using source to create a tarball"
> ======================================================================
>
> As I said, comments, inputs and thoughts much appreciated
>
> Thanks
>
> Chris
On 30/04/2012 19:15, Roman Grigoryev wrote:
> Hi Chris,
> I'm glad to read further emails in this direction. Please don't consider this as criticism, I would just like to get more clarity: what is the target of adding this metadata? Do you have plans to use the metadata in other scripts? How? Does this metadata go into the results?
>
> Also please see more of my comments inline:

The metadata can be used in a multitude of ways; for example, we can create dynamic test sets based on the changes made or the target area of testing. What we are doing here is creating an understanding of the tests that we have, so that we can improve our processes and testing capabilities in the future.

The metadata does not go into the results. The metadata is a database in its own right, and should metadata about a test be required, it would be accessed from the source (database) itself.

> On 04/30/2012 08:50 PM, Chris wrote:
... snip ...
>> Prerequisites: Prerequisite tests that must be run before this test can be run. This is again an array, which presumes a test may have multiple prerequisites, but the data should not contain a chain of prerequisites, i.e. if A requires B and B requires C, the prerequisites of A are B, not B & C.
> At which step do you want to check chains? And what is the logical basis for these prerequisites, other than the case where current tests have hidden dependencies? I don't see any difference between one test whose body is built from tests a, b, c and this prerequisites definition. Could you please explain more why we need this field?

As I said, we can mine this data at any time and in any way that we want, and the purpose of this discussion is the data, not how we use it. But as an example, something that dynamically built test sets would need to know prerequisites.

The suffix of a, b, c could be used to generate prerequisite information, but it is firstly inflexible - for example, I bet 'b', 'c' and 'd' are often dependent on 'a' but not on each other - and secondly, and more importantly, we want a standard form for storing metadata because we want to introduce order and knowledge into the test scripts that we have today.

>> TicketIDs: This is an array of ticket numbers that this test explicitly tests. In theory we should aim for the state where every ticket has a test associated with it, and in future we should be able to carry out a gap analysis.
> I suggest adding keywords (Components could be treated as keywords too) and a test type (stress, benchmark, load, functional, negative, etc.) for quick filtering. For example, SLOW could be transformed into a keyword.

This seems like a reasonable idea, although we need a name that describes what it is, and we will need to define that set of possible words as we need to with the Components elements.

What should this field be called - we should not reduce the value of this data by genericizing it into 'keywords'.

> Also, I would like to mention that we have 3 different logical types of data:
> 1) just human-readable descriptions
> 2) filtering and targeting fields (Components, and keywords if you agree with my suggestion)
> 3) framework directives (Prerequisites)
>
>> As time goes on we may well expand this compulsory list, but this is, I believe, a sensible starting place.
>>
>> Being part of the source, this data will be subject to the same review process as any other change, and so we cannot store dynamic data here, such as pass rates etc.
> What do you think, maybe it is a good idea to keep the metadata separately? This could be useful for simplifying mass modification of the data via script, as well as for adding tickets, pass rates and execution times on 'gold' configurations.

It would be easier to store the data separately, and we could use Maloo, but it's very important that this data becomes part of the Lustre 'source' so that everybody can benefit from it. Adding tickets is not a problem, as part of resolving an issue is to ensure that at least one test exercises the problem and proves it has been fixed; the fact that this assurance process requires active interaction by an engineer with the scripts is a positive.

As for pass rate, execution time and gold configurations, this information is just not one-dimensional enough to store in the source.

Chris
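As a concrete illustration of the "dynamically built test sets" point: once the Prerequisites arrays are pulled out of the scripts they reduce to "prerequisite test" pairs, which coreutils tsort(1) turns into a valid execution order (or an error if the declared prerequisites form a cycle). The pairs below are written out by hand purely for the example; only the first two test names come from the proposal, the third is invented.

# tsort prints an ordering in which every prerequisite precedes the test
# that names it; a dependency cycle produces an error message instead.
tsort <<EOF
before_upgrade_clear_filesystem before_upgrade_create_data
before_upgrade_create_data after_upgrade_check_data
EOF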
Roman Grigoryev
2012-May-02 03:23 UTC
[Lustre-discuss] Metadata storage in test script files
Hi Chris,

On 05/01/2012 08:17 PM, Chris wrote:
> On 30/04/2012 19:15, Roman Grigoryev wrote:
>> Hi Chris,
>> I'm glad to read further emails in this direction. Please don't consider this as criticism, I would just like to get more clarity: what is the target of adding this metadata? Do you have plans to use the metadata in other scripts? How? Does this metadata go into the results?
>>
>> Also please see more of my comments inline:
> The metadata can be used in a multitude of ways; for example, we can create dynamic test sets based on the changes made or the target area of testing. What we are doing here is creating an understanding of the tests that we have, so that we can improve our processes and testing capabilities in the future.

I think that when we are defining a tool we should state its purpose. E.g. a good description and summary are not needed for creating dynamic test sets. I think it is very important to say how we will use it. For the continuation of this idea please read below.

> The metadata does not go into the results. The metadata is a database in its own right, and should metadata about a test be required, it would be accessed from the source (database) itself.

I think fields like title, summary and possibly description should be present in the results too. It can be very helpful for quickly understanding test results.

>> On 04/30/2012 08:50 PM, Chris wrote:
> ... snip ...
>>> Prerequisites: Prerequisite tests that must be run before this test can be run. This is again an array, which presumes a test may have multiple prerequisites, but the data should not contain a chain of prerequisites, i.e. if A requires B and B requires C, the prerequisites of A are B, not B & C.
>> At which step do you want to check chains? And what is the logical basis for these prerequisites, other than the case where current tests have hidden dependencies? I don't see any difference between one test whose body is built from tests a, b, c and this prerequisites definition. Could you please explain more why we need this field?
> As I said, we can mine this data at any time and in any way that we want, and the purpose of this discussion is the data, not how we use it. But as an example, something that dynamically built test sets would need to know prerequisites.
>
> The suffix of a, b, c could be used to generate prerequisite information, but it is firstly inflexible - for example, I bet 'b', 'c' and 'd' are often dependent on 'a' but not on each other - and secondly, and more importantly, we want a standard form for storing metadata because we want to introduce order and knowledge into the test scripts that we have today.

Why I asked about the way of usage: if we want to use this information in scripts and in other automated ways, we must strictly specify the logic of the items and provide a tool to check it.

E.g. we will use it when building the test execution queue. We have a chain like this: test C has prerequisite B, test B has prerequisite A. Test A doesn't have a prerequisite. One day test A becomes excluded. Is it possible to execute test C? But if we will not use it in scripting, there is no big logical problem.

(My opinion: I don't like this situation and think that test dependencies should be used only in very specific and rare cases.)

>>> TicketIDs: This is an array of ticket numbers that this test explicitly tests. In theory we should aim for the state where every ticket has a test associated with it, and in future we should be able to carry out a gap analysis.
>> I suggest adding keywords (Components could be treated as keywords too) and a test type (stress, benchmark, load, functional, negative, etc.) for quick filtering. For example, SLOW could be transformed into a keyword.
> This seems like a reasonable idea, although we need a name that describes what it is, and we will need to define that set of possible words as we need to with the Components elements.

I mean that 'keywords' should be separate from Components, but Components could be logically included in them. I think 'Components' is a special type of keyword.

> What should this field be called - we should not reduce the value of this data by genericizing it into 'keywords'.
>
>> Also, I would like to mention that we have 3 different logical types of data:
>> 1) just human-readable descriptions
>> 2) filtering and targeting fields (Components, and keywords if you agree with my suggestion)
>> 3) framework directives (Prerequisites)
>>
>>> As time goes on we may well expand this compulsory list, but this is, I believe, a sensible starting place.
>>>
>>> Being part of the source, this data will be subject to the same review process as any other change, and so we cannot store dynamic data here, such as pass rates etc.
>> What do you think, maybe it is a good idea to keep the metadata separately? This could be useful for simplifying mass modification of the data via script, as well as for adding tickets, pass rates and execution times on 'gold' configurations.
> It would be easier to store the data separately, and we could use Maloo, but it's very important that this data becomes part of the Lustre 'source' so that everybody can benefit from it. Adding tickets is not a problem, as part of resolving an issue is to ensure that at least one test exercises the problem and proves it has been fixed; the fact that this assurance process requires active interaction by an engineer with the scripts is a positive.
>
> As for pass rate, execution time and gold configurations, this information is just not one-dimensional enough to store in the source.

It was not by accident that in my previous letter I talked about groups of fields. All metadata may be separated into rarely and often changed fields. E.g. Summary will not change very often. But the test timeout on a golden configuration (I mean that this timeout would be set as a default based on the 'gold' configuration and could be overridden for a specific configuration) could be more variable (and possibly more important for testing).

Using separate files provides more flexibility, and nobody stops us from committing them to the lustre repo, at which point they become Lustre 'source'. In separate files we can use whatever format we want, and all the information will be available without parsing the shell script or running it. Moreover, in the great future, it gives us a very simple migration path from shell to another language.

A few words on how we solved this task in our wrapper test framework (see the attached sample yaml): the file contains a set of tags. The main entity is a test; in this sample each <id> element in the <Tests> array defines the logical entity 'test'. Every test inherits values from the common description (fields described outside the <Tests> array). A test can override any field or add new fields.

<groupname>, <executor>, <description>, <reference>, <roles>, <tags> are common fields. All others are executor-specific and used in the executors.

--
Thanks,
Roman

-------------- next part --------------
A non-text attachment was scrubbed...
Name: conf-sanity_tests.yaml
Type: application/x-yaml
Size: 1546 bytes
Desc: not available
Url : http://lists.lustre.org/pipermail/lustre-discuss/attachments/20120502/ad6182fe/attachment.bin
Andreas Dilger
2012-May-02 04:14 UTC
[Lustre-discuss] Metadata storage in test script files
On 2012-05-01, at 9:23 PM, Roman Grigoryev wrote:
> On 05/01/2012 08:17 PM, Chris wrote:
>> The metadata can be used in a multitude of ways; for example, we can create dynamic test sets based on the changes made or the target area of testing. What we are doing here is creating an understanding of the tests that we have, so that we can improve our processes and testing capabilities in the future.
>
> I think that when we are defining a tool we should state its purpose. E.g. a good description and summary are not needed for creating dynamic test sets. I think it is very important to say how we will use it. For the continuation of this idea please read below.
>
>> The metadata does not go into the results. The metadata is a database in its own right, and should metadata about a test be required, it would be accessed from the source (database) itself.
>
> I think fields like title, summary and possibly description should be present in the results too. It can be very helpful for quickly understanding test results.

I think what Chris was suggesting is the opposite of what you state here. He was writing that the "test metadata" under discussion here is the static description of the test, to be stored with the test itself. Chris is specifically excluding any runtime data from being stored with the test, not (as you suggest) excluding the display of this description in the test results.

>>> On 04/30/2012 08:50 PM, Chris wrote:
>>>> Prerequisites: Prerequisite tests that must be run before this test can be run. This is again an array, which presumes a test may have multiple prerequisites, but the data should not contain a chain of prerequisites, i.e. if A requires B and B requires C, the prerequisites of A are B, not B & C.
>>> At which step do you want to check chains? And what is the logical basis for these prerequisites, other than the case where current tests have hidden dependencies? I don't see any difference between one test whose body is built from tests a, b, c and this prerequisites definition. Could you please explain more why we need this field?
>> As I said, we can mine this data at any time and in any way that we want, and the purpose of this discussion is the data, not how we use it. But as an example, something that dynamically built test sets would need to know prerequisites.
>>
>> The suffix of a, b, c could be used to generate prerequisite information, but it is firstly inflexible - for example, I bet 'b', 'c' and 'd' are often dependent on 'a' but not on each other - and secondly, and more importantly, we want a standard form for storing metadata because we want to introduce order and knowledge into the test scripts that we have today.
>
> Why I asked about the way of usage: if we want to use this information in scripts and in other automated ways, we must strictly specify the logic of the items and provide a tool to check it.

I think it is sufficient to have a well-structured repository of test metadata, and then multiple uses can be found for this data. Even for human use, a good description of what the test is supposed to check, and why this test exists, would be a good start.

The test metadata format is extensible, so should we need more fields in the future it will be possible to add them. I think the hardest work will be to get good text descriptions of the tests, not mechanical issues like dependencies and such.

> E.g. we will use it when building the test execution queue. We have a chain like this: test C has prerequisite B, test B has prerequisite A. Test A doesn't have a prerequisite. One day test A becomes excluded. Is it possible to execute test C? But if we will not use it in scripting, there is no big logical problem.
>
> (My opinion: I don't like this situation and think that test dependencies should be used only in very specific and rare cases.)
>
>>>> TicketIDs: This is an array of ticket numbers that this test explicitly tests. In theory we should aim for the state where every ticket has a test associated with it, and in future we should be able to carry out a gap analysis.
>>> I suggest adding keywords (Components could be treated as keywords too) and a test type (stress, benchmark, load, functional, negative, etc.) for quick filtering. For example, SLOW could be transformed into a keyword.
>> This seems like a reasonable idea, although we need a name that describes what it is, and we will need to define that set of possible words as we need to with the Components elements.
>
> I mean that 'keywords' should be separate from Components, but Components could be logically included in them. I think 'Components' is a special type of keyword.
>
>> What should this field be called - we should not reduce the value of this data by genericizing it into 'keywords'.
>>
>>> Also, I would like to mention that we have 3 different logical types of data:
>>> 1) just human-readable descriptions
>>> 2) filtering and targeting fields (Components, and keywords if you agree with my suggestion)
>>> 3) framework directives (Prerequisites)
>>>
>>>> As time goes on we may well expand this compulsory list, but this is, I believe, a sensible starting place.
>>>>
>>>> Being part of the source, this data will be subject to the same review process as any other change, and so we cannot store dynamic data here, such as pass rates etc.
>>> What do you think, maybe it is a good idea to keep the metadata separately? This could be useful for simplifying mass modification of the data via script, as well as for adding tickets, pass rates and execution times on 'gold' configurations.
>> It would be easier to store the data separately, and we could use Maloo, but it's very important that this data becomes part of the Lustre 'source' so that everybody can benefit from it. Adding tickets is not a problem, as part of resolving an issue is to ensure that at least one test exercises the problem and proves it has been fixed; the fact that this assurance process requires active interaction by an engineer with the scripts is a positive.
>>
>> As for pass rate, execution time and gold configurations, this information is just not one-dimensional enough to store in the source.
>
> It was not by accident that in my previous letter I talked about groups of fields. All metadata may be separated into rarely and often changed fields. E.g. Summary will not change very often. But the test timeout on a golden configuration (I mean that this timeout would be set as a default based on the 'gold' configuration and could be overridden for a specific configuration) could be more variable (and possibly more important for testing).

I think this is something that needs to live outside the test metadata being described here. A "golden configuration" is hard to define, and depends heavily on factors that change from one environment to the next.

Ideally, tests will be written so that they can run under a wide range of configurations (number of clients, servers, virtual and real nodes). A further goal might be to allow many non-destructive functional subtests to be run in parallel, which would further skew the time taken, but would allow much more efficient use of test resources.

> Using separate files provides more flexibility, and nobody stops us from committing them to the lustre repo, at which point they become Lustre 'source'. In separate files we can use whatever format we want, and all the information will be available without parsing the shell script or running it. Moreover, in the great future, it gives us a very simple migration path from shell to another language.

I think the metadata format should be chosen so that it is trivial to extract the test metadata without having to execute or parse the shell (or other) test language itself. Simple filtering and regexp should be enough.

> A few words on how we solved this task in our wrapper test framework (see the attached sample yaml): the file contains a set of tags. The main entity is a test; in this sample each <id> element in the <Tests> array defines the logical entity 'test'. Every test inherits values from the common description (fields described outside the <Tests> array). A test can override any field or add new fields.
>
> <groupname>, <executor>, <description>, <reference>, <roles>, <tags> are common fields. All others are executor-specific and used in the executors.
>
> --
> Thanks,
> Roman
> <conf-sanity_tests.yaml>

Cheers, Andreas
--
Andreas Dilger
Whamcloud, Inc.
Principal Lustre Engineer
http://www.whamcloud.com/
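To make the "simple filtering and regexp" point concrete: assuming the delimiters are exactly as in Chris's example, a one-liner along these lines already pulls every metadata block out of a script (sanity.sh is just a stand-in file name for the sketch).

# Print the body of every TEST_METADATA here document, and nothing else
# (GNU sed; the delimiter lines themselves are dropped).
sed -n '/<< *TEST_METADATA/,/^TEST_METADATA$/{/TEST_METADATA/d;p;}' sanity.sh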
On 02/05/2012 04:23, Roman Grigoryev wrote:
> Hi Chris,
>
> On 05/01/2012 08:17 PM, Chris wrote:
>> The metadata can be used in a multitude of ways; for example, we can create dynamic test sets based on the changes made or the target area of testing. What we are doing here is creating an understanding of the tests that we have, so that we can improve our processes and testing capabilities in the future.
> I think that when we are defining a tool we should state its purpose. E.g. a good description and summary are not needed for creating dynamic test sets. I think it is very important to say how we will use it. For the continuation of this idea please read below.

The purpose is to enable us to develop and store knowledge/information about the tests; the information should be in a canonical form, objective and correct. If we do this then the whole community can make use of it as they see fit. I want to ensure that the initial set of stored variables describes the tests as completely as reasonably possible. The canonical description of each test is not affected by the usage to which the data is put.

>> The metadata does not go into the results. The metadata is a database in its own right, and should metadata about a test be required, it would be accessed from the source (database) itself.
> I think fields like title, summary and possibly description should be present in the results too. It can be very helpful for quickly understanding test results.

They can be presented as part of the results, but I would not store them with the results; if for example Maloo presents the description, it will fetch it from the correct version of the source. We should not be making copies of data.

I cannot say whether you should store this information with your results because I have no insight into your private testing practices.

>>> On 04/30/2012 08:50 PM, Chris wrote:
>> ... snip ...
>>
>> As I said, we can mine this data at any time and in any way that we want, and the purpose of this discussion is the data, not how we use it. But as an example, something that dynamically built test sets would need to know prerequisites.
>>
>> The suffix of a, b, c could be used to generate prerequisite information, but it is firstly inflexible - for example, I bet 'b', 'c' and 'd' are often dependent on 'a' but not on each other - and secondly, and more importantly, we want a standard form for storing metadata because we want to introduce order and knowledge into the test scripts that we have today.
> Why I asked about the way of usage: if we want to use this information in scripts and in other automated ways, we must strictly specify the logic of the items and provide a tool to check it.
>
> E.g. we will use it when building the test execution queue. We have a chain like this: test C has prerequisite B, test B has prerequisite A. Test A doesn't have a prerequisite. One day test A becomes excluded. Is it possible to execute test C? But if we will not use it in scripting, there is no big logical problem.
>
> (My opinion: I don't like this situation and think that test dependencies should be used only in very specific and rare cases.)

I don't think people should introduce dependencies either, but they have, and we have to deal with that fact. In your example, if C is dependent on A and A is removed then C cannot be run.

>>> I suggest adding keywords (Components could be treated as keywords too) and a test type (stress, benchmark, load, functional, negative, etc.) for quick filtering. For example, SLOW could be transformed into a keyword.
>> This seems like a reasonable idea, although we need a name that describes what it is, and we will need to define that set of possible words as we need to with the Components elements.
> I mean that 'keywords' should be separate from Components, but Components could be logically included in them. I think 'Components' is a special type of keyword.

I don't think of Components as a keyword; I think of it as a factual piece of data, and if we want to add the test purpose then we should call it that. The use of keywords in data is generally a typeless catch-all. All of this metadata should be clear and well defined, which does not in my opinion allow scope for a keywords element.

I would suggest that we add a variable called Purposes which is an array containing a set of predefined elements like stress, benchmark, load, functional etc. For example

Purposes:
  - stress
  - load

>> It would be easier to store the data separately, and we could use Maloo, but it's very important that this data becomes part of the Lustre 'source' so that everybody can benefit from it. Adding tickets is not a problem, as part of resolving an issue is to ensure that at least one test exercises the problem and proves it has been fixed; the fact that this assurance process requires active interaction by an engineer with the scripts is a positive.
>>
>> As for pass rate, execution time and gold configurations, this information is just not one-dimensional enough to store in the source.
> It was not by accident that in my previous letter I talked about groups of fields. All metadata may be separated into rarely and often changed fields. E.g. Summary will not change very often. But the test timeout on a golden configuration (I mean that this timeout would be set as a default based on the 'gold' configuration and could be overridden for a specific configuration) could be more variable (and possibly more important for testing).

What exactly is a gold configuration? Lustre has such a breadth of possibilities that gold configurations would be a matrix of distro/architecture/distro version/interconnect/cpu speed/memory/storage/oss count/client count/... . To try and summarise this into some useful single value does not make any sense to me.

> Using separate files provides more flexibility, and nobody stops us from committing them to the lustre repo, at which point they become Lustre 'source'. In separate files we can use whatever format we want, and all the information will be available without parsing the shell script or running it. Moreover, in the great future, it gives us a very simple migration path from shell to another language.

This data is valuable and needs to be treated with the same respect and discipline as we treat the source; to imagine we can have a 'free for all' where people just update it at will does not work. The controls on what goes into the Lustre tree are there for very good reason and we are not going to circumvent those controls. We have to invest in this as we do with all the test infrastructure; it cannot be done on the cheap.

Parsing the scripts for the data is easy because computers are really good at it. I would expect someone will write a library to access and modify the data as required, and I'd also expect them to publish that library.

If tests were re-written then this data will probably change, and the cost of migrating unchanged data will be insignificant compared to the cost of re-writing the test itself.

Chris
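A first sketch of the sort of access library Chris anticipates could be as small as the shell function below; lustre_test_meta, its argument convention and the crude name match are invented for the sketch, not an existing interface.

# Usage: lustre_test_meta <script> <test name>
# Prints each TEST_METADATA block in <script> that mentions <test name>.
lustre_test_meta() {
    local script=$1 want=$2
    awk -v want="$want" '
        /<< *TEST_METADATA/ { grab = 1; block = ""; next }
        grab && /^TEST_METADATA$/ {
            grab = 0
            # crude match: keep any block that mentions the test name at all
            if (index(block, want)) printf "%s", block
            next
        }
        grab { block = block $0 "\n" }
    ' "$script"
}

# Example call against a hypothetical script containing such blocks.
lustre_test_meta sanity.sh before_upgrade_create_data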
Roman Grigoryev
2012-May-02 15:53 UTC
[Lustre-discuss] Metadata storage in test script files
Hi Andreas,

On 05/02/2012 08:14 AM, Andreas Dilger wrote:
> On 2012-05-01, at 9:23 PM, Roman Grigoryev wrote:
>>>> On 04/30/2012 08:50 PM, Chris wrote:
>>>>> Prerequisites: Prerequisite tests that must be run before this test can be run. This is again an array, which presumes a test may have multiple prerequisites, but the data should not contain a chain of prerequisites, i.e. if A requires B and B requires C, the prerequisites of A are B, not B & C.
>>>> At which step do you want to check chains? And what is the logical basis for these prerequisites, other than the case where current tests have hidden dependencies? I don't see any difference between one test whose body is built from tests a, b, c and this prerequisites definition. Could you please explain more why we need this field?
>>> As I said, we can mine this data at any time and in any way that we want, and the purpose of this discussion is the data, not how we use it. But as an example, something that dynamically built test sets would need to know prerequisites.
>>>
>>> The suffix of a, b, c could be used to generate prerequisite information, but it is firstly inflexible - for example, I bet 'b', 'c' and 'd' are often dependent on 'a' but not on each other - and secondly, and more importantly, we want a standard form for storing metadata because we want to introduce order and knowledge into the test scripts that we have today.
>>
>> Why I asked about the way of usage: if we want to use this information in scripts and in other automated ways, we must strictly specify the logic of the items and provide a tool to check it.
>
> I think it is sufficient to have a well-structured repository of test metadata, and then multiple uses can be found for this data. Even for human use, a good description of what the test is supposed to check, and why this test exists, would be a good start.

I absolutely agree that a good description, summary and the other fields are very important.

> The test metadata format is extensible, so should we need more fields in the future it will be possible to add them. I think the hardest work will be to get good text descriptions of the tests, not mechanical issues like dependencies and such.

I think this work will take quite a long time, and I suggest requiring it only for new and changed tests. In this case, the possibility of having some kind of description inheritance is a good solution.

>> E.g. we will use it when building the test execution queue. We have a chain like this: test C has prerequisite B, test B has prerequisite A. Test A doesn't have a prerequisite. One day test A becomes excluded. Is it possible to execute test C? But if we will not use it in scripting, there is no big logical problem.
>>
>> (My opinion: I don't like this situation and think that test dependencies should be used only in very specific and rare cases.)
>>
>>>>> TicketIDs: This is an array of ticket numbers that this test explicitly tests. In theory we should aim for the state where every ticket has a test associated with it, and in future we should be able to carry out a gap analysis.
>>>> I suggest adding keywords (Components could be treated as keywords too) and a test type (stress, benchmark, load, functional, negative, etc.) for quick filtering. For example, SLOW could be transformed into a keyword.
>>> This seems like a reasonable idea, although we need a name that describes what it is, and we will need to define that set of possible words as we need to with the Components elements.
>>
>> I mean that 'keywords' should be separate from Components, but Components could be logically included in them. I think 'Components' is a special type of keyword.
>>
>>> What should this field be called - we should not reduce the value of this data by genericizing it into 'keywords'.
>>>
>>>> Also, I would like to mention that we have 3 different logical types of data:
>>>> 1) just human-readable descriptions
>>>> 2) filtering and targeting fields (Components, and keywords if you agree with my suggestion)
>>>> 3) framework directives (Prerequisites)
>>>>
>>>>> As time goes on we may well expand this compulsory list, but this is, I believe, a sensible starting place.
>>>>>
>>>>> Being part of the source, this data will be subject to the same review process as any other change, and so we cannot store dynamic data here, such as pass rates etc.
>>>> What do you think, maybe it is a good idea to keep the metadata separately? This could be useful for simplifying mass modification of the data via script, as well as for adding tickets, pass rates and execution times on 'gold' configurations.
>>> It would be easier to store the data separately, and we could use Maloo, but it's very important that this data becomes part of the Lustre 'source' so that everybody can benefit from it. Adding tickets is not a problem, as part of resolving an issue is to ensure that at least one test exercises the problem and proves it has been fixed; the fact that this assurance process requires active interaction by an engineer with the scripts is a positive.
>>>
>>> As for pass rate, execution time and gold configurations, this information is just not one-dimensional enough to store in the source.
>>
>> It was not by accident that in my previous letter I talked about groups of fields. All metadata may be separated into rarely and often changed fields. E.g. Summary will not change very often. But the test timeout on a golden configuration (I mean that this timeout would be set as a default based on the 'gold' configuration and could be overridden for a specific configuration) could be more variable (and possibly more important for testing).
>
> I think this is something that needs to live outside the test metadata being described here. A "golden configuration" is hard to define, and depends heavily on factors that change from one environment to the next.

We could separate dynamic and static metadata. But it would be good if both sets of data used one engine and storage type, just with different sources.

> Ideally, tests will be written so that they can run under a wide range of configurations (number of clients, servers, virtual and real nodes). A further goal might be to allow many non-destructive functional subtests to be run in parallel, which would further skew the time taken, but would allow much more efficient use of test resources.

It would be very good if we had a big enough set of fully independent tests.

>> Using separate files provides more flexibility, and nobody stops us from committing them to the lustre repo, at which point they become Lustre 'source'. In separate files we can use whatever format we want, and all the information will be available without parsing the shell script or running it. Moreover, in the great future, it gives us a very simple migration path from shell to another language.
>
> I think the metadata format should be chosen so that it is trivial to extract the test metadata without having to execute or parse the shell (or other) test language itself. Simple filtering and regexp should be enough.

Why do you want to do 'filtering and regexp', with some probability of error when selecting the data, and also inject special code into the shell scripts, when we can avoid it? It is a good chance to start the work of moving away from shell here.

If the question is one of developer comfort, I would prefer to provide tools for checking metadata completeness rather than have code and metadata in one file.

Also, I don't see a good way to use the 'metadata inheritance' approach in shell without adding pretty unclear shell code, so the switch to metadata usage should either happen in one step, or the test framework will just ignore it and the metadata becomes just static text for external scripts.

--
Thanks,
Roman
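The "tools for checking metadata completeness" Roman mentions could start out very small; the sketch below simply warns when a block lacks one of the compulsory keys from Chris's list (sanity.sh is again only an example target).

# Flag any TEST_METADATA block that is missing a compulsory key.
awk '
    /<< *TEST_METADATA/ { nblock++; block = ""; grab = 1; next }
    grab && /^TEST_METADATA$/ {
        grab = 0
        split("Name Summary Description Components Prerequisites TicketIDs", keys, " ")
        for (i = 1; i in keys; i++)
            if (block !~ keys[i] ":")
                printf "block %d: missing %s\n", nblock, keys[i]
        next
    }
    grab { block = block $0 "\n" }
' sanity.sh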
On 02/05/2012 16:44, Roman wrote:
>> I think this is something that needs to live outside the test metadata being described here. A "golden configuration" is hard to define, and depends heavily on factors that change from one environment to the next.
> We could separate dynamic and static metadata. But it would be good if both sets of data used one engine and storage type, just with different sources.

I think we all understand the static metadata, and I believe that the data in my original examples is static data. This data relates to a version of the test scripts and so can live as part of the test script, managed using the same git mechanisms.

Could you explain what you mean by dynamic data, so that we can all understand exactly what you are suggesting we store?

> Also, I don't see a good way to use the 'metadata inheritance' approach in shell without adding pretty unclear shell code, so the switch to metadata usage should either happen in one step, or the test framework will just ignore it and the metadata becomes just static text for external scripts.

I'm not sure if there is a place for inheritance in this particular situation, but if there is then we need to be clear about one thing: there can be no implicit inheritance for these scripts, i.e. we can't have a single attribute at the top of a file that applies to all tests.

The reason for this is that one major purpose of having metadata is that it makes us collect the data properly; each test needs to have the data explicitly captured. If a test does not have the data captured then we do not have any data - and no data is a fact (data) in itself. If a test inherits data from another test then that must be explicitly set. We cannot allow sweeping inheritance that allows us to imagine we have learnt something when actually we've just taken a short cut to give the impression of knowledge.

Chris
Roman Grigoryev
2012-May-02 16:35 UTC
[Lustre-discuss] Metadata storage in test script files
Hi,

On 05/02/2012 01:25 PM, Chris wrote:
> On 02/05/2012 04:23, Roman Grigoryev wrote:
>> Hi Chris,
>>
>> On 05/01/2012 08:17 PM, Chris wrote:
>>> The metadata can be used in a multitude of ways, for example we can
>>> create dynamic test sets based on the changes made or the target area
>>> of testing. What we are doing here is creating an understanding of the
>>> tests that we have so that we can improve our processes and testing
>>> capabilities in the future.
>>
>> I think that when we are defining a tool we should state its purpose.
>> For example, a good description and summary are not needed for creating
>> dynamic test sets. I think it is very important to say how we will use
>> it. This idea continues below.
>
> The purpose is to enable us to develop and store knowledge/information
> about the tests; the information should be in a canonical form,
> objective and correct. If we do this then the whole community can make
> use of it as they see fit. I want to ensure that the initial set of
> stored variables describes the tests as completely as reasonably
> possible. The canonical description of each test is not affected by the
> usage to which the data is put.
>
>>> The metadata does not go to the results. The metadata is a database in
>>> its own right, and should metadata about a test be required it would
>>> be accessed from the source (database) itself.
>>
>> I think fields like title, summary and possibly description should be
>> present in the results too. That can be very helpful for quickly
>> understanding test results.
>
> They can be presented as part of the results, but I would not store them
> with the results; if for example Maloo presents the description it will
> fetch it from the correct version of the source. We should not be making
> copies of data.

ok, good.

> I cannot say whether you should store this information with your
> results, because I have no insight into your private testing practices.

I just want to have the info not only in Maloo or other big systems but in
the default test harness. Developers can run tests by hand, and a tester
should also be able to execute them in a specific environment. If we can
provide some helpful info, I think that is good; a few kilobytes is not as
much as the logs, but it can help in some cases.

>>>> On 04/30/2012 08:50 PM, Chris wrote:
>>> ... snip ...
>>>
>>> As I said, we can mine this data any time and any way that we want,
>>> and the purpose of this discussion is the data, not how we use it. But
>>> as an example, something that dynamically built test sets would need
>>> to know the prerequisites.
>>>
>>> The suffix of a, b, c could be used to generate prerequisite
>>> information, but it is firstly inflexible (for example, I bet 'b', 'c'
>>> and 'd' are often dependent on 'a' but not on each other), and
>>> secondly, and more importantly, we want a standard form for storing
>>> metadata because we want to introduce order and knowledge into the
>>> test scripts that we have today.
>>
>> This is why I asked about the way it will be used: if we want to use
>> this information in scripts and in other automated ways, we must
>> strictly specify the logic of the items and provide tools to check it.
>>
>> For example, we will use it when building a test execution queue. We
>> have a chain like this: test C has prerequisite B, test B has
>> prerequisite A, and test A has no prerequisite. One fine day test A
>> becomes excluded. Is it still possible to execute test C? If we do not
>> use it in scripting, though, there is no big logical problem.
>>
>> (My opinion: I don't like this situation and think that test
>> dependencies should be used only in very specific and rare cases.)
>
> I don't think people should introduce dependencies either, but they
> have, and we have to deal with that fact. In your example, if C is
> dependent on A and A is removed, then C cannot be run.

Maybe I'm wrong, but fighting the dependencies looks more important than
adding descriptions.

>>>> I suggest adding keywords (Components could be treated as keywords
>>>> too) and a test type (stress, benchmark, load, functional, negative,
>>>> etc.) for quick filtering. For example, SLOW could become a keyword.
>>>
>>> This seems like a reasonable idea, although we need a name that
>>> describes what it is, and we will need to define that set of possible
>>> words as we do with the Components elements.
>>
>> I mean that 'keywords' should be separate from Components, although
>> Components could logically be included; I think 'Components' is a
>> special type of keyword.
>
> I don't think of Components as a keyword, I think of it as a factual
> piece of data, and if we want to add the test purpose then we should
> call it that. The use of keywords in data is generally a typeless
> catch-all. All of this metadata should be clear and well defined, which
> in my opinion does not allow scope for a keywords element.

Agreed, Components are not keywords.

> I would suggest that we add a variable called Purposes which is an array
> containing a set of predefined elements like stress, benchmark, load,
> functional etc.
>
> For example:
>
> Purposes:
>   - stress
>   - load

What about SLOW (which should really be called smoke or sanity) and
negative? Those are not purposes; they are mostly about the test type.

>>> It would be easier to store the data separately and we could use
>>> Maloo, but it is very important that this data becomes part of the
>>> Lustre 'source' so that everybody can benefit from it. Adding tickets
>>> is not a problem, as part of resolving an issue is to ensure that at
>>> least one test exercises the problem and proves it has been fixed; the
>>> fact that this assurance process requires active interaction by an
>>> engineer with the scripts is a positive.
>>>
>>> As for pass rate, execution time and gold configurations, this
>>> information is just not one-dimensional enough to store in the source.
>>
>> It was not by accident that I talked about groups of fields in my
>> previous letter. All metadata can be split into rarely and frequently
>> changing fields. For example, Summary will not change very often, but a
>> test timeout for the golden configuration (I mean a timeout set as a
>> default based on the 'gold' configuration and overridable in a specific
>> configuration) could be more variable, and possibly more important for
>> testing.
>
> What exactly is a gold configuration? Lustre has such breadth of
> possibilities that gold configurations would be a matrix of
> distro/architecture/distro version/interconnect/cpu
> speed/memory/storage/oss count/client count/... . To try and summarise
> this into some useful single value does not make any sense to me.

I used the phrase 'gold configuration' incorrectly; 'development
configuration' or maybe 'default configuration' would be more accurate. I
absolutely agree that some test characteristics are relative to the
configuration. But for many tests it is possible (and for many people it
could be very helpful) to have, for example, a suggested timeout which
indicates the assumed upper limit of the execution time on a commonly used
configuration.

In this case the 'default configuration' should, I think, be 4 VM nodes or
a 4-node real cluster in one subnet. To cover other configurations we can
use a 'timeout multiplier' option in the test framework, and I think that
option lets us cover 100% of configurations. Currently I use 300 seconds
per test in my scripts; it is overkill for most tests, and only for a few
tests do I set a longer time. (I also think this holds for most
configurations, excluding, for example, configurations with complex lnet
routing or systems under high load.)

>> Using separate files provides more flexibility, and nobody stops us
>> from committing them to the lustre repo, at which point they become
>> "Lustre 'source'". In separate files we can use whatever format we
>> want, and all the information is available without parsing the shell
>> scripts or running them. Moreover, in the longer term, it gives us a
>> very simple migration from shell to another language.
>
> This data is valuable and needs to be treated with the same respect and
> discipline as we treat the source; to imagine we can have a 'free for
> all' where people just update it at will does not work. The controls on
> what goes into the Lustre tree are there for very good reason and we are
> not going to circumvent those controls. We have to invest in this as we
> do with all the test infrastructure; it cannot be done on the cheap.

I don't really understand why separate YAML sources cannot be covered by
the same discipline as the shell code. Moreover, most of the YAML checking
can be done by automated tools. I am not saying this data should be a
'free for all'. But I think it is a good idea (mostly for test developers)
to provide a way for a user to override the main metadata with his own via
a simple switch.

> Parsing the scripts for the data is easy because computers are really
> good at it. I would expect someone will write a library to access and
> modify the data as required, and I'd also expect them to publish that
> library.

Having one more library just to cut the YAML data out of the shell scripts
before it can be handed to a YAML reader does not look so simple. The
question is not CPU time, but the overall complexity of bash, external
utilities and libraries, and the tests for them.

> If tests were re-written then this data will probably change, and the
> cost of migrating unchanged data will be insignificant compared to the
> cost of re-writing the test itself.

I'm not sure what you mean there, could you please explain?

Thanks,
        Roman
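For what it is worth, the extraction step itself needs very little
machinery. Below is a minimal sketch, assuming only the
<<TEST_METADATA ... TEST_METADATA delimiters from the proposal; the script
is illustrative, not an existing tool.

======================================================================
#!/bin/bash
# Sketch: print every TEST_METADATA block in a test script as plain YAML
# on stdout, using the delimiters proposed in this thread.
script=${1:?usage: $0 <test-script.sh>}

# Print only the lines strictly between the delimiters; a real tool would
# also record which test function each block belongs to.
sed -n '/^<< *TEST_METADATA/,/^TEST_METADATA$/{
  /TEST_METADATA/d
  p
}' "$script"
======================================================================

The output of such a script could then be fed straight into any YAML
reader or schema checker.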
Roman Grigoryev
2012-May-02 17:05 UTC
[Lustre-discuss] Metadata storage in test script files
Hi Chris,

On 05/02/2012 08:06 PM, Chris wrote:
> On 02/05/2012 16:44, Roman wrote:
>>> I think this is something that needs to live outside the test metadata
>>> being described here. The definition of "golden configuration" is hard
>>> to pin down, and depends heavily on factors that change from one
>>> environment to the next.
>>
>> We could separate dynamic and static metadata. But it would be good if
>> both sets of data used one engine and storage type, just with different
>> sources.
>
> I think we all understand the static metadata, and I believe that the
> data in my original examples is static data. This data relates to a
> version of the test scripts and so can live as part of the test script,
> managed using the same git mechanisms.
>
> Could you explain what you mean by dynamic data so that we can all
> understand exactly what you are suggesting we store.

The only truly dynamic data I can think of right now is tickets, and I am
not sure how important it is to keep them in the test sources; I think an
umbrella over the old bugzilla, the WC jira and maybe other bug sources is
more important. But I can imagine situations where we want to update the
metadata in many tests at once, e.g. somebody has done a test-coverage
analysis and wants to add it to the metadata.

>> Also, I don't see a good way to use 'metadata inheritance' in shell
>> without adding fairly unclear shell code, so the switch to using
>> metadata should happen in one step, or the test framework will simply
>> ignore it and the metadata becomes just static text for external
>> scripts.
>
> I'm not sure if there is a place for inheritance in this particular
> situation, but if there is then we need to be clear on one thing: there
> can be no implicit inheritance for these scripts. I.e. we can't have a
> single attribute at the top of a file that applies to all tests. The
> reason is that one major purpose of having metadata is that we cause the
> data to be collected properly; each test needs to have the data
> explicitly captured. If a test does not have the data captured then we
> do not have any data, and no data is a fact (data) in itself. If a test
> inherits data from another test then that must have been explicitly set.
>
> We cannot allow sweeping inheritance that allows us to imagine we have
> learnt something when actually we've just taken a short cut to give the
> impression of knowledge.

Yes, I mean inheritance from a "single attribute at the top of a file",
with overriding where it is defined at the detailed level. Why can't we
have a single attribute at the top which provides default values? Going
over all tests manually is a very big task. Going back to your original
definition, all tests from lustre-rsync, for example, should probably be in
one component (as I understand it), so there is no big reason to duplicate
the Components entry.

--
Thanks,
        Roman
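To make the file-level default concrete, it could look something like the
sketch below. SUITE_METADATA is an invented delimiter and the test shown is
made up; only TEST_METADATA and the field names come from the proposal, and
whether anything would honour such a block is exactly the open question.

======================================================================
# Hypothetical sketch of a suite-wide default block which individual
# tests may override; nothing here is an agreed format.
: << SUITE_METADATA
Components:               # default for every test in this file
  - lustre-rsync
SUITE_METADATA

: << TEST_METADATA
Name:
  lrsync_basic_copy
Summary:
  Copies a directory tree with lustre_rsync and compares it to the source
TicketIDs:                # Components is inherited from the suite block
  - LU-123
TEST_METADATA
test_lrsync_basic_copy() {
  true # test body elided in this sketch
}
run_test lrsync_basic_copy "basic lustre_rsync copy and compare"
======================================================================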
Andreas Dilger
2012-May-02 19:01 UTC
[Lustre-discuss] Metadata storage in test script files
I'm chopping out most of the discussion to try and focus on the core issues
here.

On 2012-05-02, at 10:35 AM, Roman Grigoryev wrote:
> On 05/02/2012 01:25 PM, Chris wrote:
>> I cannot say whether you should store this information with your
>> results, because I have no insight into your private testing practices.
>
> I just want to have the info not only in Maloo or other big systems but
> in the default test harness. Developers can run tests by hand, and a
> tester should also be able to execute them in a specific environment. If
> we can provide some helpful info, I think that is good; a few kilobytes
> is not as much as the logs, but it can help in some cases.

I don't think you two are in disagreement here. We want the test
descriptions and other metadata with the tests, open for any usage (human,
test scripts, different test harnesses, etc).

>> I don't think people should introduce dependencies either, but they
>> have, and we have to deal with that fact. In your example, if C is
>> dependent on A and A is removed, then C cannot be run.
>
> Maybe I'm wrong, but fighting the dependencies looks more important than
> adding descriptions.

For the short term. However, finding dependencies is easily done through
simple mechanical steps (e.g. try to run each subtest independently). Since
the policy in the past was to make all tests independent, I expect that not
very many tests will actually have dependencies.

However, the main reason for having good descriptions of the tests is to
gain an understanding of what part of the code the tests are trying to
exercise, what problem they were written to verify, and what value they
provide. We cannot reasonably rewrite or modify tests safely if we don't
have a good understanding of what they are doing today. Also, this helps
people running and debugging the tests and their failures for the long
term.

Cheers, Andreas
--
Andreas Dilger                       Whamcloud, Inc.
Principal Lustre Engineer            http://www.whamcloud.com/
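The "simple mechanical steps" could be as small as the loop sketched below.
It assumes the ONLY= switch of the current test-framework behaves as it
does today, and a failure in isolation is only a hint of a dependency, not
proof of one.

======================================================================
#!/bin/bash
# Sketch: run every listed subtest of a suite on its own and flag the
# ones that do not pass in isolation as dependency candidates.
# Usage (hypothetical): ./find_deps.sh sanity.sh 1 2 3a 3b ...
suite=$1; shift

for t in "$@"; do
    if ONLY=$t bash "$suite" > "only-$t.log" 2>&1; then
        echo "PASS alone: $t"
    else
        # The test may depend on an earlier subtest, or it may simply be
        # broken in isolation; the log has to be read either way.
        echo "FAIL alone: $t (dependency candidate, see only-$t.log)"
    fi
done
======================================================================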
Roman Grigoryev
2012-May-03 09:17 UTC
[Lustre-discuss] Metadata storage in test script files
Hi,

On 05/02/2012 11:01 PM, Andreas Dilger wrote:
> I'm chopping out most of the discussion to try and focus on the core
> issues here.
>
> On 2012-05-02, at 10:35 AM, Roman Grigoryev wrote:
>> ... snip ...
>
> I don't think you two are in disagreement here. We want the test
> descriptions and other metadata with the tests, open for any usage
> (human, test scripts, different test harnesses, etc).

I absolutely agree. My point is just about form: machine usage needs a
formal description of the fields and tools to easily check it.

>> Maybe I'm wrong, but fighting the dependencies looks more important
>> than adding descriptions.
>
> For the short term. However, finding dependencies is easily done through
> simple mechanical steps (e.g. try to run each subtest independently).
> Since the policy in the past was to make all tests independent, I expect
> that not very many tests will actually have dependencies.

I am working on exactly this task right now.

> However, the main reason for having good descriptions of the tests is to
> gain an understanding of what part of the code the tests are trying to
> exercise, what problem they were written to verify, and what value they
> provide. We cannot reasonably rewrite or modify tests safely if we don't
> have a good understanding of what they are doing today. Also, this helps
> people running and debugging the tests and their failures for the long
> term.

I absolutely agree with the common goal and with text descriptions for
humans. I just don't really see why test refactoring and test understanding
(writing the summaries and descriptions) cannot be combined into one task.
(I also have a feeling that developers will find many errors when they go
through a test to write its description. I have some experience with
similar tasks and can say that a fresh look at old tests often finds
problems.)

--
Thanks,
        Roman
Chris Gearing
2012-May-04 14:46 UTC
[Lustre-discuss] Metadata storage in test script files
Hi Roman,

I think we may have rat-holed here and perhaps it's worth just re-stating
what I'm trying to achieve.

We have a need to be able to test in a more directed and targeted manner,
to be able to focus on a unit of code like lnet or an attribute of
capability like performance. However, since starting work on the Lustre
test infrastructure it has become clear to me that knowledge about the
capability, functionality and purpose of individual tests is very general
and held in the heads of Lustre engineers. Because we are talking about
targeting tests, we require knowledge about the capability, functionality
and purpose of the tests, not the outcome of running the tests; or, to put
it another way, what the tests can do, not what they have done.

One key fact about cataloguing the capabilities of the tests is that for
almost every imaginable case the capability of a test only changes if the
test itself changes, so the rate of change of the data in the catalogue is
at most the same as, and in practice much less than, the rate of change of
the test code itself. The only exception could be that a test suddenly
discovers a new bug which has to have a new ticket attached to it, although
this should be very rare if we manage our development process properly.

This requirement leads to the conclusion that we need to catalogue all of
the tests within the current test-framework, and a catalogue equates to a
database; hence we need a database of the capability, functionality and
purpose of the individual tests. With this requirement in mind it would be
easy to create a database using something like mysql that could be used by
applications like the Lustre test system, but that approach would make the
database very difficult to share and even harder to attach the knowledge
to the Lustre tree, which is where it belongs.

So the question I want to solve is how to catalogue the capabilities of the
individual tests in a database, store that data as part of the Lustre
source and, as a bonus, make the data readable and even carefully editable
by people as well as machines. To focus on the last point, I do not think
we should constrain ourselves to something that can be read by machine
using just bash; we do have access to structured languages and should make
use of that fact.

The solution to all of this seemed to be to store the catalogue about the
tests as part of the tests themselves. This provides for human and machine
accessibility, implicit version control and certainty that whatever happens
to the Lustre source the data goes with it. It is also the case that by
keeping the catalogue with the subject, the maintenance of the catalogue is
more likely to occur than if the two are separate.

My original use of the term test metadata is intended as a more modern term
for catalogue or the [test] library.

So to refresh everybody's mind, I'd like to suggest that we place test
metadata in the source code itself using the following format, where the
here document is inserted into the code above the test function itself.

======================================================================
<<TEST_METADATA
Name:
  before_upgrade_create_data
Summary:
  Copies lustre source into a node specific directory and then creates
  a tarball using that directory
Description:
  This should be called prior to upgrading Lustre and creates a set of
  data on the Lustre partition which can be accessed and checked after
  the upgrade has taken place. Several methods are used, including
  tar'ing directories so they can later be untar'ed and compared, along
  with creating sha1s of the stored data.
Components:
  - lnet
  - recovery
Prerequisites:
  - before_upgrade_clear_filesystem
TicketIDs:
  - LU-123
  - LU-432
Purposes:
  - upgrade
TEST_METADATA

test_before_upgrade_create_data() {
  ...
}

run_test before_upgrade_create_data "Copying lustre source into a directory $IOP_DIR1, creating and then using source to create a tarball"
======================================================================

Again, thoughts and input very much appreciated.

Chris
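If this format is adopted, checking that every block carries the compulsory
fields needs only a small script at review time. The sketch below is
illustrative only: it assumes the delimiters shown above and merely checks
that the top-level keys are present, rather than validating the YAML
itself.

======================================================================
#!/bin/bash
# Sketch: report TEST_METADATA blocks that are missing a compulsory key.
# A real check would also run each block through a YAML parser and a
# schema (e.g. an Rx schema, as mentioned elsewhere in this thread).
required="Name Summary Description Components Prerequisites TicketIDs"
status=0

for script in "$@"; do
    block="" in_block=0
    while IFS= read -r line; do
        case $line in
        "<<TEST_METADATA"*|"<< TEST_METADATA"*)
            in_block=1; block="" ;;
        "TEST_METADATA")
            in_block=0
            for key in $required; do
                printf '%s' "$block" | grep -q "^$key:" ||
                    { echo "$script: block missing $key:"; status=1; }
            done ;;
        *)
            [ "$in_block" -eq 1 ] && block+="$line"$'\n' ;;
        esac
    done < "$script"
done
exit $status
======================================================================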
Nathan Rutman
2012-May-07 18:33 UTC
[Lustre-discuss] Metadata storage in test script files
On May 4, 2012, at 7:46 AM, Chris Gearing wrote:
> Hi Roman,
>
> I think we may have rat-holed here and perhaps it's worth just
> re-stating what I'm trying to achieve.
>
> ... snip ...
>
> So the question I want to solve is how to catalogue the capabilities of
> the individual tests in a database, store that data as part of the
> Lustre source and, as a bonus, make the data readable and even carefully
> editable by people as well as machines. To focus on the last point, I do
> not think we should constrain ourselves to something that can be read by
> machine using just bash; we do have access to structured languages and
> should make use of that fact.

I think we all agree 100% on the above...

> The solution to all of this seemed to be to store the catalogue about
> the tests as part of the tests themselves

... but not necessarily that conclusion.

> , this provides for human and machine accessibility, implicit version
> control and certainty that whatever happens to the Lustre source the
> data goes with it. It is also the case that by keeping the catalogue
> with the subject, the maintenance of the catalogue is more likely to
> occur than if the two are separate.

I agree with all those. But there are some difficulties with this as well:

1. bash isn't a great language to encapsulate this metadata

2. this further locks us in to the current test implementation - there's
   not much possibility to start writing tests in another language if
   we're parsing through looking for bash-formatted metadata. Sure,
   multiple parsers could be written...

3. difficulty changing the md of groups of tests en masse - e.g. adding a
   "slow" keyword to a set of tests

4. no inheritance of characteristics - each test must explicitly list
   every piece of md. This not only blows up the amount of md, it is also
   a source for typos, etc. to cause problems.

5. no automatic modification of characteristics. In particular, one piece
   of md I would like to see is "maximum allowed test time" for each test.
   Ideally, this could be measured and adjusted automatically based on
   historical and ongoing run data. But it would be dangerous to allow
   automatic modification of the script itself.

To address those problems, I think a database-type approach is exactly
right, or perhaps a YAML file with hierarchical inheritance. To some
degree, this is an "evolution vs revolution" question, and I prefer to come
down on the revolution-enabling design, despite the problems you list.
Basically, I believe the separated MD model allows for the replacement of
test-framework, and this, to my mind, is the majority driver for adding the
MD at all.

> ... snip ...
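For comparison, the separated-file model with hierarchical inheritance
might look something like the sketch below; the file name, the
defaults/tests layout and the MaxTestTime field are all invented for
illustration, not an agreed format.

======================================================================
# sanity_catalogue.yaml - hypothetical separate catalogue with suite-level
# defaults, per-test overrides and the automatically maintained time limit
# mentioned above.
defaults:
  Components:
    - vfs
  MaxTestTime: 300          # seconds; could be refreshed from run history
tests:
  before_upgrade_create_data:
    Summary: Copies lustre source into a node specific directory and
      creates a tarball from it
    TicketIDs: [LU-123, LU-432]
    MaxTestTime: 900        # overrides the suite default
======================================================================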
Chris Gearing
2012-May-07 21:34 UTC
[Lustre-discuss] Metadata storage in test script files
On Mon, May 7, 2012 at 7:33 PM, Nathan Rutman <nrutman at gmail.com> wrote:
> ... snip ...
>
> I agree with all those. But there are some difficulties with this as
> well:
>
> 1. bash isn't a great language to encapsulate this metadata

The thing to focus on, I think, is the data captured, not the format. The
parser for yaml encapsulated in the source or anywhere else is a small
amount of effort compared to capturing the data in the first place. If we
capture the data and it is machine readable, then changing the format is
easy.

There are many advantages today to keeping the source and the metadata in
the same place, one being that when reviewing new or updated tests the
reviewers can, and will be encouraged by the locality to, ensure the
metadata matches the new or revised test. If the two are not together they
have very little chance of being kept in sync.

> 2. this further locks us in to the current test implementation - there's
> not much possibility to start writing tests in another language if we're
> parsing through looking for bash-formatted metadata. Sure, multiple
> parsers could be written...

I don't think it is a lock-in at all: the data is machine readable and
moving to a new format when and should we need it will be easy. Let's focus
on capturing the data so we increase our knowledge; once we have the data
we can manipulate it however we want. Keeping the data and the tests
together in my opinion increases the chance of capturing and updating the
data given today's methods and tools.

> 3. difficulty changing the md of groups of tests en masse - e.g. adding
> a "slow" keyword to a set of tests

The data can be read and written by machine and the libraries/applications
to do this would be written. Referring back to the description of the
metadata, we would not be making sweeping changes to test metadata because
the metadata should only change when the test changes [exceptions will
always apply but we should not optimize for exceptions].

Also I don't think 'slow' would be part of the metadata because it is not
an attribute of the test, it is an attribute of how the test is used. We
need to be strict and clear here. The metadata describes the functionality
of the test code, and slow is not a test code function; if we want to be
able to select 'slow' then we need to understand what code functionality of
a test causes it to be a 'slow' test and ensure those attributes are
captured.

> 4. no inheritance of characteristics - each test must explicitly list
> every piece of md. This not only blows up the amount of md, it is also a
> source for typos, etc. to cause problems.

I'm not against inheritance, but the inheritance must be explicit, not
implicit. We want to draw out knowledge about the tests; if we just allow
people to say 'all 200 tests in this file are X, Y, Z' then that is what
will happen, no one will check each test to make sure it is true, and our
data will be corrupted before we start.

So explicit inheritance might make sense, and please do propose an
inheritance model for the data; we can discuss the storage format later,
but today let's just understand how inheritance relates to our bash tests.

> 5. no automatic modification of characteristics. In particular, one
> piece of md I would like to see is "maximum allowed test time" for each
> test. Ideally, this could be measured and adjusted automatically based
> on historical and ongoing run data. But it would be dangerous to allow
> automatic modification of the script itself.

I really do not think maximum test time as a measurement is a piece of test
metadata. Metadata describes the functionality of the test that is
encapsulated within the test code itself; if the code said 'run for 60
minutes and no more' then maximum time would be an attribute.

Maybe there are a set of useful attributes like amount of storage used, or
minimum clients, or minimum osts etc. etc; again, these can only be
metadata if they are implicit in the test code, and for most tests they
would not be definable, and the variability might be impossible to
systematically capture, although I do think it's worth having a go.

> To address those problems, I think a database-type approach is exactly
> right, or perhaps a YAML file with hierarchical inheritance. To some
> degree, this is an "evolution vs revolution" question, and I prefer to
> come down on the revolution-enabling design, despite the problems you
> list. Basically, I believe the separated MD model allows for the
> replacement of test-framework, and this, to my mind, is the majority
> driver for adding the MD at all.

A database is good, and I believe metadata in the source fulfils that
objective whilst being something that we can manage manually with what we
have today, whilst easily creating tools for some automation. When we do
begin work on a new test framework approach we will have all the data at
hand to be manipulated in any way that we want, including, if we want,
separating it and storing it somewhere else.

I don't think creating the metadata is linked with a new test-framework,
however; creating the metadata is required because today we do not know
what we have, and we need to know what we have today whatever strategy we
use for the future.

Chris
Roman Grigoryev
2012-May-08 13:51 UTC
[Lustre-discuss] Metadata storage in test script files
Hi,

On 05/08/2012 01:34 AM, Chris Gearing wrote:
> On Mon, May 7, 2012 at 7:33 PM, Nathan Rutman <nrutman at gmail.com
> <mailto:nrutman at gmail.com>> wrote:
>> ... snip ...
>>
>> 1. bash isn't a great language to encapsulate this metadata
>
> The thing to focus on, I think, is the data captured, not the format.
> The parser for yaml encapsulated in the source or anywhere else is a
> small amount of effort compared to capturing the data in the first
> place. If we capture the data and it is machine readable, then changing
> the format is easy.
>
> There are many advantages today to keeping the source and the metadata
> in the same place, one being that when reviewing new or updated tests
> the reviewers can, and will be encouraged by the locality to, ensure the
> metadata matches the new or revised test. If the two are not together
> they have very little chance of being kept in sync.

I also have more than one concern here. You are suggesting embedding in
bash a structure which has its own formal description. Who will check that
an embedded structure is correct, and when? A formal structure must be
checked by tools, not by eye. For example, I use the Rx tools with a schema
definition for the YAML; having to extract the YAML data and check it
separately makes such tools less comfortable to use.

To be honest, I don't see a big difference between using two files and one
file from the developer's point of view. This is more a question of
discipline than of comfort: exactly the same developer can ignore a
description that is placed nearby. (From my experience of the test life
cycle, descriptions become good after a few cycles of adding, changing and
reviewing them, often as a result of developer-user interaction. As a
result, the most problematic tests have the best descriptions.)

>> 2. this further locks us in to the current test implementation -
>> there's not much possibility to start writing tests in another
>> language if we're parsing through looking for bash-formatted metadata.
>> Sure, multiple parsers could be written...
>
> I don't think it is a lock-in at all: the data is machine readable and
> moving to a new format when and should we need it will be easy. Let's
> focus on capturing the data so we increase our knowledge; once we have
> the data we can manipulate it however we want. Keeping the data and the
> tests together in my opinion increases the chance of capturing and
> updating the data given today's methods and tools.
>
>> 3. difficulty changing the md of groups of tests en masse - e.g.
>> adding a "slow" keyword to a set of tests
>
> The data can be read and written by machine and the
> libraries/applications to do this would be written. Referring back to
> the description of the metadata, we would not be making sweeping
> changes to test metadata because the metadata should only change when
> the test changes [exceptions will always apply but we should not
> optimize for exceptions].
>
> Also I don't think 'slow' would be part of the metadata because it is
> not an attribute of the test, it is an attribute of how the test is
> used. We need to be strict and clear here. The metadata describes the
> functionality of the test code, and slow is not a test code function;
> if we want to be able to select 'slow' then we need to understand what
> code functionality of a test causes it to be a 'slow' test and ensure
> those attributes are captured.

Do you suggest having separate metadata about how a test is used? There is
some logical vagueness here: test metadata can turn into test usage
metadata and back. Where is the border? For example, Components from your
suggestion can also be seen as test usage metadata. "SLOW", in general, is
the set of tests with large coverage and short run time; if we put
information about its coverage into a test, does that become test metadata?

>> 4. no inheritance of characteristics - each test must explicitly list
>> every piece of md. This not only blows up the amount of md, it is also
>> a source for typos, etc. to cause problems.
>
> I'm not against inheritance, but the inheritance must be explicit, not
> implicit. We want to draw out knowledge about the tests; if we just
> allow people to say 'all 200 tests in this file are X, Y, Z' then that
> is what will happen, no one will check each test to make sure it is
> true, and our data will be corrupted before we start.

Exactly the same behaviour is possible with copy-paste when adding info to
every test. And I don't see a problem with implicit inheritance of, for
example, the Components field: in some test suites it is entirely possible
that all tests share one Components set. Moreover, I think it is possible
to derive some test Components automatically from test coverage. Maybe we
can solve this by enabling implicit inheritance for a limited list of
fields?

> So explicit inheritance might make sense, and please do propose an
> inheritance model for the data; we can discuss the storage format
> later, but today let's just understand how inheritance relates to our
> bash tests.

What would 'explicit inheritance' be in the case of your suggestion, and
why is it needed?

>> 5. no automatic modification of characteristics. In particular, one
>> piece of md I would like to see is "maximum allowed test time" for
>> each test. Ideally, this could be measured and adjusted automatically
>> based on historical and ongoing run data. But it would be dangerous to
>> allow automatic modification of the script itself.
>
> I really do not think maximum test time as a measurement is a piece of
> test metadata.

If we want to provide some help and advice to a new user, where should we
store this data? What is the difference between TicketIDs, Components,
Purposes and an 'assumed execution time'? All the fields are not only
precise descriptions, they are also advice.

> Metadata describes the functionality of the test that is encapsulated
> within the test code itself; if the code said 'run for 60 minutes and
> no more' then maximum time would be an attribute.

That would be a different field: 60 min is the 'max time', 45 min is the
'assumed execution time'. There is no conflict there.

> Maybe there are a set of useful attributes like amount of storage used,
> or minimum clients, or minimum osts etc. etc; again, these can only be
> metadata if they are implicit in the test code, and for most tests they
> would not be definable, and the variability might be impossible to
> systematically capture, although I do think it's worth having a go.

But this data is 1) helpful and 2) something I already use. Where could we
store it?

Thanks,
        Roman

> ... snip ...