On Jul 12, 2012, at 7:30 AM, John Carrier wrote:

> A more strategic solution is to do more testing of a feature release
> candidate _before_ it is released. Even if a Community member has no
> interest in using a feature release in production, early testing with
> pre-release versions of feature releases will help identify
> instabilities created by the new feature with their workloads and
> hardware before the release is official.

Taking together a few threads that have been discussed recently -- the stability of certain releases versus others, what the maintenance branches are, what testing was done, and "which branch should I use":

These questions, I think, should not need to be asked. Which version of MacOS should I use? The latest one, period. Why can't Lustre do the same thing? The answer, I think, lies in testing, which becomes a chicken-and-egg problem. I'm only going to use a "stable" release, meaning the release that was tested with my applications. I know acceptance-small was run, and passed, on Master, otherwise it wouldn't have been released. Hopefully it even ran on a big system like Hyperion. (Do we learn anything more from running acc-sm on other big systems? Probably not much.) But it certainly wasn't tested with my application, because I didn't test it -- because it wasn't released yet. Chicken and egg. Only after enough others make the leap am I willing to.

So, it seems, we need to test pre-release versions of Lustre, aka Master, with my applications. To that end, how willing are people to set aside a day, say once every two months, to be "filesystem beta day"? Scientists, run your codes; users, do your normal work; but bear in mind there may be filesystem instabilities on that day. Make sure your data is backed up. Make sure it's not the middle of a critical week-long run. Accept that you might have to re-run it tomorrow in the worst case. Report any problems you have.

What you get out of it is a much more stable Master, and an end to the question of "which version should I run". When it is released, you have confidence that you can move up, get the great new features and performance, and know it runs your applications. More people are on the same release, so it sees even more testing. The maintenance branch is always the latest branch, and you can pull in point releases with more bug fixes with ease. No more rolling your own Lustre with Frankenstein sets of patches. Latest and greatest and most stable.

Pipe dream?
On Jul 12, 2012, at 12:48 PM, Bruce Korb wrote:

> Hi Nathan,
>
> On 2012-07-12, at 20:37, Nathan Rutman <nathan_rutman at xyratex.com> wrote:
>
>> On Jul 12, 2012, at 7:30 AM, John Carrier wrote:
>>
>>> A more strategic solution is to do more testing of a feature release
>>> candidate _before_ it is released. [...]
>>
>> [...]
>>
>> Pipe dream?
>
> _I_ think so. You might get a few customers to say, "yes" but
> never be able to find the appropriate round tuit. A more fruitful
> approach might be to solicit customer acceptance tests. Presumably,
> they've written them to hit the wrinkles that they tend to stub
> their toes on. And there may be exceptions, too. (e.g. Cray might
> well actually do some pre-testing -- they, too, have paying customers.)
I have no aversion to customers writing and supplying their own acceptance tests, but I think that approach doesn't work for many of the cases:
- acceptance tests may not exist; acceptance may simply be testing with large production codes
- tests that run in a particular environment need to be significantly generalized
- tests may not be sharable for various legal reasons

This also doesn't have to be an all-or-nothing proposition -- interested parties will be able to use the latest features, will help contribute to the stability of Master, and will help reduce the "spread" of deployed systems, in a positive feedback loop.

Yes, absolutely, this is effort on the part of Lustre users. But it can be balanced by the savings of effort in roll-your-own, and by risk reduction.
There are certainly examples of this working for other products. For example (it's been a good number of years), at one time the main QA benchmark for the Oracle database was a customer-furnished test (the 'Churchill' test) which exercised the database thoroughly.

It would also be useful to have data from those using standard IO tests (IOR, iozone, etc.), as we could easily expand the existing tests with different parameter sets. However, in the HPC space, I suspect obtaining/generating the data sets needed to replicate some customer situations would be a challenge.

cliffw

On Thu, Jul 12, 2012 at 1:07 PM, Nathan Rutman <nathan_rutman at xyratex.com> wrote:

> [...]
>
> Yes, absolutely, this is effort on the part of Lustre users. But it can
> be balanced by the savings of effort in roll-your-own, and by risk reduction.

--
cliffw
Support Guy
WhamCloud, Inc.
www.whamcloud.com
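Cliff's idea of expanding the standard IO tests with different parameter sets is easy to script. The sketch below is only an illustration of what such a sweep driver could look like; it assumes IOR and mpirun are in PATH, and the /mnt/lustre mount point, task count, and block/transfer sizes are placeholders. Check the option letters against your own IOR build before trusting them.

# Hypothetical IOR parameter sweep for "filesystem beta day" style runs.
# Assumptions: IOR and mpirun are in PATH, /mnt/lustre is the client
# mount point, and the -a/-b/-t/-F/-w/-r/-o options match your IOR build.
import itertools
import subprocess

MOUNT = "/mnt/lustre"          # assumed Lustre client mount
NPROCS = 64                    # assumed MPI task count

apis = ["POSIX", "MPIIO"]
block_sizes = ["1g", "4g"]
transfer_sizes = ["64k", "1m", "4m"]

def run_sweep(dry_run=True):
    for api, bsize, tsize in itertools.product(apis, block_sizes, transfer_sizes):
        cmd = [
            "mpirun", "-np", str(NPROCS), "ior",
            "-a", api,             # I/O API
            "-b", bsize,           # per-task block size
            "-t", tsize,           # transfer size
            "-F",                  # file per process
            "-w", "-r",            # write, then read back
            "-o", f"{MOUNT}/ior-{api}-{bsize}-{tsize}",
        ]
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=False)   # keep sweeping even if one case fails

if __name__ == "__main__":
    run_sweep()

Running it with the default dry_run=True just prints the command lines, which is a convenient way to review a sweep before handing it to a batch scheduler.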
On 2012-07-12, at 1:37 PM, Nathan Rutman wrote:

> On Jul 12, 2012, at 7:30 AM, John Carrier wrote:
>> A more strategic solution is to do more testing of a feature release
>> candidate _before_ it is released. [...]
>
> Taking together a few threads that have been discussed recently, regarding the stability of certain releases vs others, what maintenance branches are, what testing was done, and "which branch should I use":
>
> These questions, I think, should not need to be asked. Which version of MacOS should I use? The latest one, period.

Interesting... I _don't_ run the latest version of MacOS, and I distinctly recall people having a variety of issues with 10.7.0 when it was released. Does that mean the MacOS testing was insufficient? Partly, but it is unrealistic to test every possible usage pattern, so testing has to be "optimized" to cover the most common use cases in order to be finished within both time and cost constraints.

> Why can't Lustre do the same thing? The answer I think lies in testing, which becomes a chicken-and-egg problem. I'm only going to use a "stable" release, which is the release which was tested with my applications. I know acceptance-small was run, and passed, on Master, otherwise it wouldn't have been released. Hopefully it even ran on a big system like Hyperion. (Do we learn anything more from running acc-sm on other big systems? Probably not much.)

Right. I don't think that acc-sm is the end-all of testing frameworks, and I freely admit that there is a lot more testing that could be done, both in scale and in the types of loads that are used. The acceptance-small.sh script is intended to be an "optimized" test set that can run in a few hours to give some reasonable confidence in a particular change.

> But it certainly wasn't tested with my application, because I didn't test it. Because it wasn't released yet. Chicken and egg. Only after enough others make the leap am I willing to.

There are all kinds of other load/stress tests (including applications) that can/should be run after the "basic" tests have been run to find new defects. When those defects are found they should be distilled down to a simple and specific test that gets added to the regular regression suite. I think it is this kind of testing that is needed moving forward.

> So, it seems, we need to test pre-release versions of Lustre, aka Master, with my applications.

I would caveat this to say: only test on tags which we know to be at least reasonably stable, since a lot of testing time will be wasted otherwise.

> To that end, how willing are people to set aside a day, say once every two months, to be "filesystem beta day"? Scientists, run your codes; users, do your normal work; but bear in mind there may be filesystem instabilities on that day. Make sure your data is backed up. Make sure it's not the middle of a critical week-long run. Accept that you might have to re-run it tomorrow in the worst case. Report any problems you have.

I'm not sure that users will be willing to do this, though some "friendly" users are known to make the leap onto new systems in order to get early/free CPU cycles on new clusters.
There are also "feature tests" that need to be run at scale to validate new features, to ensure they are functional at scale, don't impact performance, and survive the kinds of race conditions that only scale testing exposes.

> What you get out of it is a much more stable Master, and an end to the question of "which version should I run". When it is released, you have confidence that you can move up, get the great new features and performance, and know it runs your applications. More people are on the same release, so it sees even more testing. The maintenance branch is always the latest branch, you can pull in point releases with more bug fixes with ease. No more rolling your own Lustre with Frankenstein sets of patches. Latest and greatest and most stable.
>
> Pipe dream?

I hope not. When I see users taking a specific release of Lustre, testing it, and then applying a patch series to their branch, the unfortunate result is more effort for the user (vendor/site, not end users) to maintain their patches, and more effort for support to determine whether some _other_ bug is already fixed, or to debug a problem that appears only with a specific combination of patches applied, and then craft a different fix for that branch than for the mainline.

A better use case would be for users to start testing _before_ a major release is made, find/fix bugs, and merge the fixes into mainline, so that when it appears in a maintenance release it will already be quite stable. This keeps the user patchset much smaller, everyone benefits from fixes from other testing before the release, and hopefully fewer bugs are found in the field. It also avoids the issue of each user testing some cross-product of patches, and not really leveraging each other's testing.

Then, any bugs found in the field go into the maintenance branch and master, but there is much less of a need to "test" the maintenance branch, since the changes there should be relatively small. I think this is a reasonable approach, given that we no longer land features on maintenance branches. That means the risk of following maintenance releases is much smaller than it was in the 1.6 and 1.8 days (1.8.x only really entered "maintenance" mode with 1.8.6 or so).

We've been trying to follow this model with LLNL. One issue is that 2.1.0 didn't really receive as much up-front testing as it could have, so it is getting more fixes than it should. We are working hard to land all of the LLNL (and other) bugfix patches into master and the next 2.1.x release. There is a parallel effort to test orion (the 2.4 development branch) so that by the time 2.4 rolls around (including features that are not in master or orion yet) it will be relatively stable and will not need its own "test effort".

Are we at this nirvana yet? Not quite, but I think we are closer than ever before, and we have the chance to get there with a coordinated effort of the community.

Cheers, Andreas
--
Andreas Dilger
Whamcloud, Inc.
Principal Lustre Engineer
http://www.whamcloud.com/
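For reference, the acceptance-small.sh script Andreas mentions lives in lustre/tests/ and is steered almost entirely through environment variables. The wrapper below is a minimal sketch of how a site might kick off a subset of the suites from an automation job; the install path and the NAME/ACC_SM_ONLY/SLOW variables reflect common conventions of that era's test framework, but treat them as assumptions and verify them against the tree you actually run.

# Hedged sketch: drive lustre/tests/acceptance-small.sh from Python.
# Assumptions: the Lustre test scripts are installed under LUSTRE_TESTS,
# and the script honours NAME (picks cfg/<NAME>.sh), ACC_SM_ONLY (subset
# of suites) and SLOW -- verify these against your checkout.
import os
import subprocess

LUSTRE_TESTS = "/usr/lib64/lustre/tests"   # assumed install location

def run_acc_sm(suites=("sanity", "sanityn"), slow=False, config="local"):
    env = dict(os.environ)
    env["NAME"] = config                      # test configuration name
    env["ACC_SM_ONLY"] = " ".join(suites)     # limit the run to these suites
    env["SLOW"] = "yes" if slow else "no"
    return subprocess.run(
        ["bash", os.path.join(LUSTRE_TESTS, "acceptance-small.sh")],
        env=env,
        check=False,                          # inspect returncode/logs ourselves
    ).returncode

if __name__ == "__main__":
    rc = run_acc_sm()
    print("acceptance-small exited with", rc)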
On 07/12/2012 12:37 PM, Nathan Rutman wrote:

> On Jul 12, 2012, at 7:30 AM, John Carrier wrote:
>> A more strategic solution is to do more testing of a feature release
>> candidate _before_ it is released. [...]
>
> Taking together a few threads that have been discussed recently, regarding the stability of certain releases vs others, what maintenance branches are, what testing was done, and "which branch should I use":
> These questions, I think, should not need to be asked. Which version of MacOS should I use? The latest one, period. Why can't Lustre do the same thing?

Because we're an open source project where all of our dirty laundry is out in public. I'm sure that Apple has all kinds of internal deadlines and testing tags and things that we don't see in the outside world, because it is a closed-source proprietary product with vast resources to develop and test internally.

The every-six-month cadence is a good thing in my opinion. It forces us developers to regularly address the stability of the changes we are introducing. It provides a clear, explicit time in the schedule for developers to stop writing new bugs and focus their effort on fixing bugs.

I believe that the maintenance branch _is_ the place that you go when the question is "which version should I use?" We just need to have a decent web page that says "Want Lustre? Here's the latest stable release!" We need to increase the exposure of the maintenance releases, and hide the "feature" releases off on a developers page.

> The answer I think lies in testing, which becomes a chicken-and-egg problem. I'm only going to use a "stable" release, which is the release which was tested with my applications. [...] Only after enough others make the leap am I willing to.
> So, it seems, we need to test pre-release versions of Lustre, aka Master, with my applications. To that end, how willing are people to set aside a day, say once every two months, to be "filesystem beta day"? [...] Report any problems you have.
> What you get out of it is a much more stable Master, and an end to the question of "which version should I run". [...] No more rolling your own Lustre with Frankenstein sets of patches.
> Latest and greatest and most stable.

We can do a great deal more testing, and find a seriously large number of bugs that we have been missing, by getting more testing personnel allocated to Lustre. I think that's the major gap in Lustre right now.

One day every two months is, I think, insufficient for validating any software product, let alone something as complex as Lustre. Not that I am opposed to the idea. If you can arrange that, go for it! But that isn't good enough by itself, by a long shot.

We need full-time personnel working on testing Lustre. I would think that all of the vendors out there selling products to customers already have a lot of experience testing hardware and other software bits. Let's apply some of that know-how to Lustre! And I think these testing personnel need to be made known to the community, so they can talk to each other, so that developers can guide their efforts, so we know what our testing coverage looks like, etc.

Testing needs to be a CONTINUAL process, not just something we do at the end for a specific release number. By the time we tag 2.4, it should already have been tested so frequently all along the master development cycle that the final testing will look like a formality to us. We should still do it, of course, but we should have confidence long before that happens.

LLNL is trying to do that with the master branch as it moves to 2.4. Our coverage is mainly on zfs backends for now, but as the rest of orion lands on master, and Sequoia goes into limited production use, we'll have both zfs and ldiskfs filesystems in our testbed, and we'll test regularly all the way up to, and beyond, 2.4.

The gaps in testing are NOT all an issue of insufficient scale testing, although there is admittedly a constant issue there. We need much better testing at small scale as well.

And let me be really clear: when I say testing, I mean a real human being thinking up new tests all of the time. Looking at logs all of the time (so that even when the test app succeeded, we'll catch the timeouts and reconnections and things that should not be happening, and are symptoms of bugs). Powering things off randomly. Literally pulling cables out while an evil, pathologically bad IO workload is running. We need real people to test all of the things that are really easy for a human to do, and would take years for developers to automate with any reliability.

The automated regression suite that we use is great. We should continue to improve it over time. But I would contend that it is not, and never will be, sufficient to tell us whether Lustre is stable.

I would argue that the regression tests are, in fact, a very low bar. Lustre is just too complicated, networks are too complicated, and we have too few developers to ever come up with an automated suite that gives anything but a relatively low level of confidence in the stability of the software.

And human testers are given a very different set of goals than developers. A developer's job is to make things work. A tester's is to do whatever they can to break it, and then create a good report of how they broke it so the developers can fix it.

I also agree that I don't want to continue in this mode of "we'll only run it when LLNL/ORNL runs it and says it's good". So we need more human testers.

And to get back to the topic of making every single release a "stable" release: that ignores the fact that we have roughly a decade of seriously buggy, undocumented code that we're dealing with. It just will not happen. Period.
We have to accept that and move forward. We can strive from this point on to make every release better than the last. But developers are human. Every time we add new features, we're going to add new bugs. We'll also fix bugs, but we're going to add new ones as well.

So we deal with that by having "maintenance" releases. A maintenance release is maintained for a "long" period of time, but adds NO new features. No new support for new kernels. No fantastic new performance improvements. Just bug fixes.

The maintenance release is what vendors should build products upon, because that is where we'll land only bug fixes. So it is far more likely to only improve with time, whereas "master" (and therefore the "feature" releases, which are just tags on master every 6 months) will also introduce destabilizing new features. We'll endeavour to make the new features as stable as we are capable of doing, and we can do better if we have more testers, but we have to be pragmatic. "Every tag should be completely stable" is impossible. "Every tag on the maintenance branch should be more stable than the last" is an achievable goal.

Chris
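The kind of human-driven fault injection Chris describes (powering servers off under a punishing workload) can at least be semi-scripted, so that testers spend their time watching recovery and reading logs rather than walking to the machine room. The loop below is a purely hypothetical sketch: the node names, the ./power_cycle helper, and the workload command are placeholders a site would have to supply; nothing here is part of the Lustre test suite.

# Hypothetical fault-injection loop for beta-day style testing.
# Everything site-specific is a placeholder: NODES, the ./power_cycle
# helper (a wrapper around your PDU/IPMI tooling), and the workload
# command must be supplied by the tester.
import random
import subprocess
import time

NODES = ["oss01", "oss02", "oss03", "mds01"]   # hypothetical server names
WORKLOAD = ["mpirun", "-np", "64", "ior", "-w", "-r", "-o", "/mnt/lustre/faulttest"]

def inject_faults(rounds=5, min_wait=300, max_wait=900):
    """Start a punishing workload, then kill power to random servers at random times."""
    load = subprocess.Popen(WORKLOAD)          # background I/O load
    try:
        for _ in range(rounds):
            time.sleep(random.randint(min_wait, max_wait))
            victim = random.choice(NODES)
            print(f"power-cycling {victim}; watch recovery and client logs")
            subprocess.run(["./power_cycle", victim], check=False)
    finally:
        load.wait()    # the workload should survive recovery, or its failure becomes the bug report
        print("workload exit status:", load.returncode)

if __name__ == "__main__":
    inject_faults()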
Hi Christopher,

.....

> The automated regression suite that we use is great. We should continue
> to improve it over time. But I would contend that it is not, and
> never will be, sufficient to tell us whether Lustre is stable.
>
> I would argue that the regression tests are, in fact, a very low bar.
> Lustre is just too complicated, networks are too complicated, and we
> have too few developers to ever come up with an automated suite that
> gives anything but a relatively low level of confidence in the
> stability of the software.
>
> And human testers are given a very different set of goals than
> developers. A developer's job is to make things work. A tester's is to
> do whatever they can to break it, and then create a good report of how
> they broke it so the developers can fix it.

.............

Just to support your statement that executing the automated regression suite (acc-small) is not enough on its own to establish quality, I would like to share the coverage summary which we got. 958 tests were executed:

             Hit     Total     Coverage
Lines:       79691   128935    61.8 %
Functions:   6206    7935      78.2 %
Branches:    49287   113914    43.3 %

Thanks,
Roman
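For readers who want to reproduce the arithmetic, the percentages in Roman's summary follow directly from the hit/total pairs he reports; the snippet below simply recomputes them from those figures.

# Recompute the coverage percentages from the hit/total pairs above.
coverage = {
    "Lines":     (79691, 128935),
    "Functions": (6206, 7935),
    "Branches":  (49287, 113914),
}

for metric, (hit, total) in coverage.items():
    # prints 61.8 %, 78.2 % and 43.3 %, matching the quoted summary
    print(f"{metric:<10} {hit:>7}/{total:<7} = {100.0 * hit / total:.1f} %")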
On Thu, 2012-07-12 at 15:37 -0400, Nathan Rutman wrote:

> On Jul 12, 2012, at 7:30 AM, John Carrier wrote:
>
>> A more strategic solution is to do more testing of a feature release
>> candidate _before_ it is released. Even if a Community member has no
>> interest in using a feature release in production, early testing with
>> pre-release versions of feature releases will help identify
>> instabilities created by the new feature with their workloads and
>> hardware before the release is official.

...

> So, it seems, we need to test pre-release versions of Lustre, aka
> Master, with my applications. To that end, how willing are people to
> set aside a day, say once every two months, to be "filesystem beta
> day"? Scientists, run your codes; users, do your normal work; but
> bear in mind there may be filesystem instabilities on that day. Make
> sure your data is backed up. Make sure it's not the middle of a
> critical week-long run. Accept that you might have to re-run it
> tomorrow in the worst case. Report any problems you have.
>
> What you get out of it is a much more stable Master, and an end to the
> question of "which version should I run". When it is released, you have
> confidence that you can move up, get the great new features and
> performance, and know it runs your applications. More people are on the
> same release, so it sees even more testing. The maintenance branch is
> always the latest branch, and you can pull in point releases with more
> bug fixes with ease. No more rolling your own Lustre with Frankenstein
> sets of patches. Latest and greatest and most stable.
>
> Pipe dream?

Since people are now moving to help test out the current master branch for Whamcloud, I would like to propose posting a general summary of the testing results people are seeing. I personally finished a first run at testing 2.2.91 this last week and would gladly share the results. Anyone else care to share? :-)
On Jul 20, 2012, at 7:13 AM, James A Simmons wrote:

> On Thu, 2012-07-12 at 15:37 -0400, Nathan Rutman wrote:
>> [...]
>>
>> Pipe dream?
>
> Since people are now moving to help test out the current master branch
> for Whamcloud, I would like to propose posting a general summary of the
> testing results people are seeing. I personally finished a first run at
> testing 2.2.91 this last week and would gladly share the results. Anyone
> else care to share? :-)

I started a page on the OpenSFS Wiki for everyone to share their test results in a free-form format. Note that the Wiki itself is still in its infancy -- I call on the community to help populate it.