On Jul 12, 2012, at 7:30 AM, John Carrier wrote:

> A more strategic solution is to do more testing of a feature release
> candidate _before_ it is released. Even if a Community member has no
> interest in using a feature release in production, early testing with
> pre-release versions of feature releases will help identify
> instabilities created by the new feature with their workloads and
> hardware before the release is official.

Taking together a few threads that have been discussed recently -- the stability of certain releases versus others, what the maintenance branches are, what testing was done, and "which branch should I use":

These questions, I think, should not need to be asked. Which version of MacOS should I use? The latest one, period. Why can't Lustre do the same thing? The answer, I think, lies in testing, which becomes a chicken-and-egg problem. I'm only going to use a "stable" release, meaning the release that was tested with my applications. I know acceptance-small was run, and passed, on Master, otherwise it wouldn't have been released. Hopefully it even ran on a big system like Hyperion. (Do we learn anything more from running acc-sm on other big systems? Probably not much.) But it certainly wasn't tested with my application, because I didn't test it -- because it wasn't released yet. Chicken and egg. Only after enough others make the leap am I willing to.

So, it seems, we need to test pre-release versions of Lustre, aka Master, with my applications. To that end, how willing are people to set aside a day, say once every two months, to be "filesystem beta day"? Scientists, run your codes; users, do your normal work; but bear in mind there may be filesystem instabilities on that day. Make sure your data is backed up. Make sure it's not the middle of a critical week-long run. Accept that you might have to re-run it tomorrow in the worst case. Report any problems you have.

What you get out of it is a much more stable Master, and an end to the question of "which version should I run". When it is released, you have confidence that you can move up, get the great new features and performance, and know it runs your applications. More people are on the same release, so it sees even more testing. The maintenance branch is always the latest branch, and you can pull in point releases with more bug fixes with ease. No more rolling your own Lustre with Frankenstein sets of patches. Latest and greatest and most stable.

Pipe dream?
On Jul 12, 2012, at 12:48 PM, Bruce Korb wrote:

> Hi Nathan,
>
> On 2012-07-12, at 20:37, Nathan Rutman <nathan_rutman at xyratex.com> wrote:
>
>> On Jul 12, 2012, at 7:30 AM, John Carrier wrote:
>>
>>> A more strategic solution is to do more testing of a feature release
>>> candidate _before_ it is released. [...]
>>
>> [...]
>>
>> Pipe dream?
>
> _I_ think so. You might get a few customers to say, "yes" but
> never be able to find the appropriate round tuit. A more fruitful
> approach might be to solicit customer acceptance tests. Presumably,
> they've written them to hit the wrinkles that they tend to stub
> their toes on. And there may be exceptions, too. (e.g. Cray might
> well actually do some pre-testing -- they, too, have paying customers.)
I have no aversion to customers writing and supplying their own acceptance tests, but I think that approach doesn't work for many of the cases:
- acceptance tests may not exist; acceptance may simply be testing with large production codes
- tests that run in a particular environment need to be significantly generalized
- tests may not be sharable for various legal reasons

This also doesn't have to be an all-or-nothing proposition -- interested parties will be able to use the latest features, will help contribute to the stability of Master, and will help reduce the "spread" of deployed systems, in a positive feedback loop.

Yes, absolutely, this is effort on the part of Lustre users. But it can be balanced by the savings of effort in roll-your-own, and by risk reduction.
There are certainly examples of this working for other products. For example (it's been a good number of years), at one time the main QA benchmark for the Oracle database was a customer-furnished test (the 'Churchill' test) which exercised the database thoroughly.

It would also be useful to have data from those using standard IO tests (IOR, iozone, etc.), as we could easily expand the existing tests with different parameter sets. However, in the HPC space, I suspect obtaining/generating the data sets needed to replicate some customer situations would be a challenge.

cliffw

On Thu, Jul 12, 2012 at 1:07 PM, Nathan Rutman <nathan_rutman at xyratex.com> wrote:

> [...]
>
> Yes, absolutely, this is effort on the part of Lustre users. But it can
> be balanced by the savings of effort in roll-your-own, and by risk reduction.

--
cliffw
Support Guy
WhamCloud, Inc.
www.whamcloud.com
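Cliff's idea of expanding the standard IO tests with different parameter sets is easy to script. The sketch below is only an illustration of what such a sweep driver could look like; it assumes IOR and mpirun are in PATH, and the /mnt/lustre mount point, task count, and block/transfer sizes are placeholders. Check the option letters against your own IOR build before trusting them.

# Hypothetical IOR parameter sweep for "filesystem beta day" style runs.
# Assumptions: IOR and mpirun are in PATH, /mnt/lustre is the client
# mount point, and the -a/-b/-t/-F/-w/-r/-o options match your IOR build.
import itertools
import subprocess

MOUNT = "/mnt/lustre"          # assumed Lustre client mount
NPROCS = 64                    # assumed MPI task count

apis = ["POSIX", "MPIIO"]
block_sizes = ["1g", "4g"]
transfer_sizes = ["64k", "1m", "4m"]

def run_sweep(dry_run=True):
    for api, bsize, tsize in itertools.product(apis, block_sizes, transfer_sizes):
        cmd = [
            "mpirun", "-np", str(NPROCS), "ior",
            "-a", api,             # I/O API
            "-b", bsize,           # per-task block size
            "-t", tsize,           # transfer size
            "-F",                  # file per process
            "-w", "-r",            # write, then read back
            "-o", f"{MOUNT}/ior-{api}-{bsize}-{tsize}",
        ]
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=False)   # keep sweeping even if one case fails

if __name__ == "__main__":
    run_sweep()

Running it with the default dry_run=True just prints the command lines, which is a convenient way to review a sweep before handing it to a batch scheduler.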
On 2012-07-12, at 1:37 PM, Nathan Rutman wrote:

> On Jul 12, 2012, at 7:30 AM, John Carrier wrote:
>> A more strategic solution is to do more testing of a feature release
>> candidate _before_ it is released. [...]
>
> Taking together a few threads that have been discussed recently, regarding the stability of certain releases vs others, what maintenance branches are, what testing was done, and "which branch should I use":
>
> These questions, I think, should not need to be asked. Which version of MacOS should I use? The latest one, period.

Interesting... I _don't_ run the latest version of MacOS, and I distinctly recall people having a variety of issues with 10.7.0 when it was released. Does that mean the MacOS testing was insufficient? Partly, but it is unrealistic to test every possible usage pattern, so testing has to be "optimized" to cover the most common use cases in order to be finished within both time and cost constraints.

> Why can't Lustre do the same thing? The answer I think lies in testing, which becomes a chicken-and-egg problem. I'm only going to use a "stable" release, which is the release which was tested with my applications. I know acceptance-small was run, and passed, on Master, otherwise it wouldn't have been released. Hopefully it even ran on a big system like Hyperion. (Do we learn anything more from running acc-sm on other big systems? Probably not much.)

Right. I don't think that acc-sm is the end-all of testing frameworks, and I freely admit that there is a lot more testing that could be done, both in scale and in the types of loads that are used. The acceptance-small.sh script is intended to be an "optimized" test set that can run in a few hours to give some reasonable confidence in a particular change.

> But it certainly wasn't tested with my application, because I didn't test it. Because it wasn't released yet. Chicken and egg. Only after enough others make the leap am I willing to.

There are all kinds of other load/stress tests (including applications) that can/should be run after the "basic" tests have been run to find new defects. When those defects are found they should be distilled down to a simple and specific test that gets added to the regular regression suite. I think it is this kind of testing that is needed moving forward.

> So, it seems, we need to test pre-release versions of Lustre, aka Master, with my applications.

I would caveat this to say: only test on tags which we know to be at least reasonably stable, since a lot of testing time will be wasted otherwise.

> To that end, how willing are people to set aside a day, say once every two months, to be "filesystem beta day"? Scientists, run your codes; users, do your normal work; but bear in mind there may be filesystem instabilities on that day. Make sure your data is backed up. Make sure it's not the middle of a critical week-long run. Accept that you might have to re-run it tomorrow in the worst case. Report any problems you have.

I'm not sure that users will be willing to do this, though some "friendly" users are known to make the leap onto new systems in order to get early/free CPU cycles on new clusters.
There are also "feature tests" that need to be run at scale to validate new features, to ensure they are functional at scale, don't impact performance, and survive the kinds of race conditions that only scale testing exposes.

> What you get out of it is a much more stable Master, and an end to the question of "which version should I run". When it is released, you have confidence that you can move up, get the great new features and performance, and know it runs your applications. More people are on the same release, so it sees even more testing. The maintenance branch is always the latest branch, you can pull in point releases with more bug fixes with ease. No more rolling your own Lustre with Frankenstein sets of patches. Latest and greatest and most stable.
>
> Pipe dream?

I hope not. When I see users taking a specific release of Lustre, testing it, and then applying a patch series to their branch, the unfortunate result is more effort for the user (vendor/site, not end users) to maintain their patches, and more effort for support to determine whether some _other_ bug is already fixed, or to debug a problem that appears only with a specific combination of patches applied, and then craft a different fix for that branch than for the mainline.

A better use case would be for users to start testing _before_ a major release is made, find/fix bugs, and merge the fixes into mainline, so that when it appears in a maintenance release it will already be quite stable. This keeps the user patchset much smaller, everyone benefits from fixes from other testing before the release, and hopefully fewer bugs are found in the field. It also avoids the issue of each user testing some cross-product of patches, and not really leveraging each other's testing.

Then, any bugs found in the field go into the maintenance branch and master, but there is much less of a need to "test" the maintenance branch, since the changes there should be relatively small. I think this is a reasonable approach, given that we no longer land features on maintenance branches. That means the risk of following maintenance releases is much smaller than it was in the 1.6 and 1.8 days (1.8.x only really entered "maintenance" mode with 1.8.6 or so).

We've been trying to follow this model with LLNL. One issue is that 2.1.0 didn't really receive as much up-front testing as it could have, so it is getting more fixes than it should. We are working hard to land all of the LLNL (and other) bugfix patches into master and the next 2.1.x release. There is a parallel effort to test orion (the 2.4 development branch) so that by the time 2.4 rolls around (including features that are not in master or orion yet) it will be relatively stable and will not need its own "test effort".

Are we at this nirvana yet? Not quite, but I think we are closer than ever before, and we have the chance to get there with a coordinated effort of the community.

Cheers, Andreas
--
Andreas Dilger
Whamcloud, Inc.
Principal Lustre Engineer
http://www.whamcloud.com/
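For reference, the acceptance-small.sh script Andreas mentions lives in lustre/tests/ and is steered almost entirely through environment variables. The wrapper below is a minimal sketch of how a site might kick off a subset of the suites from an automation job; the install path and the NAME/ACC_SM_ONLY/SLOW variables reflect common conventions of that era's test framework, but treat them as assumptions and verify them against the tree you actually run.

# Hedged sketch: drive lustre/tests/acceptance-small.sh from Python.
# Assumptions: the Lustre test scripts are installed under LUSTRE_TESTS,
# and the script honours NAME (picks cfg/<NAME>.sh), ACC_SM_ONLY (subset
# of suites) and SLOW -- verify these against your checkout.
import os
import subprocess

LUSTRE_TESTS = "/usr/lib64/lustre/tests"   # assumed install location

def run_acc_sm(suites=("sanity", "sanityn"), slow=False, config="local"):
    env = dict(os.environ)
    env["NAME"] = config                      # test configuration name
    env["ACC_SM_ONLY"] = " ".join(suites)     # limit the run to these suites
    env["SLOW"] = "yes" if slow else "no"
    return subprocess.run(
        ["bash", os.path.join(LUSTRE_TESTS, "acceptance-small.sh")],
        env=env,
        check=False,                          # inspect returncode/logs ourselves
    ).returncode

if __name__ == "__main__":
    rc = run_acc_sm()
    print("acceptance-small exited with", rc)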
On 07/12/2012 12:37 PM, Nathan Rutman wrote:

> On Jul 12, 2012, at 7:30 AM, John Carrier wrote:
>> A more strategic solution is to do more testing of a feature release
>> candidate _before_ it is released. [...]
>
> Taking together a few threads that have been discussed recently, regarding the stability of certain releases vs others, what maintenance branches are, what testing was done, and "which branch should I use":
> These questions, I think, should not need to be asked. Which version of MacOS should I use? The latest one, period. Why can't Lustre do the same thing?

Because we're an open source project where all of our dirty laundry is out in public. I'm sure that Apple has all kinds of internal deadlines and testing tags and things that we don't see in the outside world, because it is a closed-source proprietary product with vast resources to develop and test internally.

The every-six-month cadence is a good thing in my opinion. It forces us developers to regularly address the stability of the changes we are introducing. It provides a clear, explicit time in the schedule for developers to stop writing new bugs and focus their effort on fixing bugs.

I believe that the maintenance branch _is_ the place that you go when the question is "which version should I use?" We just need to have a decent web page that says "Want Lustre? Here's the latest stable release!" We need to increase the exposure of the maintenance releases, and hide the "feature" releases off on a developers page.

> The answer I think lies in testing, which becomes a chicken-and-egg problem. I'm only going to use a "stable" release, which is the release which was tested with my applications. [...] Only after enough others make the leap am I willing to.
> So, it seems, we need to test pre-release versions of Lustre, aka Master, with my applications. To that end, how willing are people to set aside a day, say once every two months, to be "filesystem beta day"? [...] Report any problems you have.
> What you get out of it is a much more stable Master, and an end to the question of "which version should I run". [...] No more rolling your own Lustre with Frankenstein sets of patches.
> Latest and greatest and most stable.

We can do a great deal more testing, and find a seriously large number of bugs that we have been missing, by getting more testing personnel allocated to Lustre. I think that's the major gap in Lustre right now.

One day every two months is, I think, insufficient for validating any software product, let alone something as complex as Lustre. Not that I am opposed to the idea. If you can arrange that, go for it! But that isn't good enough by itself, by a long shot.

We need full-time personnel working on testing Lustre. I would think that all of the vendors out there selling products to customers already have a lot of experience testing hardware and other software bits. Let's apply some of that know-how to Lustre! And I think these testing personnel need to be made known to the community, so they can talk to each other, so that developers can guide their efforts, so we know what our testing coverage looks like, etc.

Testing needs to be a CONTINUAL process, not just something we do at the end for a specific release number. By the time we tag 2.4, it should already have been tested so frequently all along the master development cycle that the final testing will look like a formality to us. We should still do it, of course, but we should have confidence long before that happens.

LLNL is trying to do that with the master branch as it moves to 2.4. Our coverage is mainly on zfs backends for now, but as the rest of orion lands on master, and Sequoia goes into limited production use, we'll have both zfs and ldiskfs filesystems in our testbed, and we'll test regularly all the way up to, and beyond, 2.4.

The gaps in testing are NOT all an issue of insufficient scale testing, although there is admittedly a constant issue there. We need much better testing at small scale as well.

And let me be really clear: when I say testing, I mean a real human being thinking up new tests all of the time. Looking at logs all of the time (so that even when the test app succeeded, we'll catch the timeouts and reconnections and things that should not be happening, and are symptoms of bugs). Powering things off randomly. Literally pulling cables out while an evil, pathologically bad IO workload is running. We need real people to test all of the things that are really easy for a human to do, and would take years for developers to automate with any reliability.

The automated regression suite that we use is great. We should continue to improve it over time. But I would contend that it is not, and never will be, sufficient to tell us whether Lustre is stable.

I would argue that the regression tests are, in fact, a very low bar. Lustre is just too complicated, networks are too complicated, and we have too few developers to ever come up with an automated suite that gives anything but a relatively low level of confidence in the stability of the software.

And human testers are given a very different set of goals than developers. A developer's job is to make things work. A tester's is to do whatever they can to break it, and then create a good report of how they broke it so the developers can fix it.

I also agree that I don't want to continue in this mode of "we'll only run it when LLNL/ORNL runs it and says it's good". So we need more human testers.

And to get back to the topic of making every single release a "stable" release: that ignores the fact that we have roughly a decade of seriously buggy, undocumented code that we're dealing with. It just will not happen. Period.
We have to accept that and move forward. We can strive from this point on to make every release better than the last. But developers are human. Every time we add new features, we're going to add new bugs. We'll also fix bugs, but we're going to add new ones as well.

So we deal with that by having "maintenance" releases. A maintenance release is maintained for a "long" period of time, but adds NO new features. No new support for new kernels. No fantastic new performance improvements. Just bug fixes.

The maintenance release is what vendors should build products upon, because that is where we'll land only bug fixes. So it is far more likely to only improve with time, whereas "master" (and therefore the "feature" releases, which are just tags on master every 6 months) will also introduce destabilizing new features. We'll endeavour to make the new features as stable as we are capable of doing, and we can do better if we have more testers, but we have to be pragmatic. "Every tag should be completely stable" is impossible. "Every tag on the maintenance branch should be more stable than the last" is an achievable goal.

Chris
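The kind of human-driven fault injection Chris describes (powering servers off under a punishing workload) can at least be semi-scripted, so that testers spend their time watching recovery and reading logs rather than walking to the machine room. The loop below is a purely hypothetical sketch: the node names, the ./power_cycle helper, and the workload command are placeholders a site would have to supply; nothing here is part of the Lustre test suite.

# Hypothetical fault-injection loop for beta-day style testing.
# Everything site-specific is a placeholder: NODES, the ./power_cycle
# helper (a wrapper around your PDU/IPMI tooling), and the workload
# command must be supplied by the tester.
import random
import subprocess
import time

NODES = ["oss01", "oss02", "oss03", "mds01"]   # hypothetical server names
WORKLOAD = ["mpirun", "-np", "64", "ior", "-w", "-r", "-o", "/mnt/lustre/faulttest"]

def inject_faults(rounds=5, min_wait=300, max_wait=900):
    """Start a punishing workload, then kill power to random servers at random times."""
    load = subprocess.Popen(WORKLOAD)          # background I/O load
    try:
        for _ in range(rounds):
            time.sleep(random.randint(min_wait, max_wait))
            victim = random.choice(NODES)
            print(f"power-cycling {victim}; watch recovery and client logs")
            subprocess.run(["./power_cycle", victim], check=False)
    finally:
        load.wait()    # the workload should survive recovery, or its failure becomes the bug report
        print("workload exit status:", load.returncode)

if __name__ == "__main__":
    inject_faults()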
Hi Christopher,

.....

> The automated regression suite that we use is great. We should continue
> to improve it over time. But I would contend that it is not, and
> never will be, sufficient to tell us whether Lustre is stable.
>
> I would argue that the regression tests are, in fact, a very low bar.
> Lustre is just too complicated, networks are too complicated, and we
> have too few developers to ever come up with an automated suite that
> gives anything but a relatively low level of confidence in the
> stability of the software.
>
> And human testers are given a very different set of goals than
> developers. A developer's job is to make things work. A tester's is to
> do whatever they can to break it, and then create a good report of how
> they broke it so the developers can fix it.

.............

Just to support your statement that executing the automated regression suite (acc-small) is not enough on its own to establish quality, I would like to share the coverage summary which we got. 958 tests were executed:

             Hit     Total     Coverage
Lines:       79691   128935    61.8 %
Functions:   6206    7935      78.2 %
Branches:    49287   113914    43.3 %

Thanks,
Roman
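For readers who want to reproduce the arithmetic, the percentages in Roman's summary follow directly from the hit/total pairs he reports; the snippet below simply recomputes them from those figures.

# Recompute the coverage percentages from the hit/total pairs above.
coverage = {
    "Lines":     (79691, 128935),
    "Functions": (6206, 7935),
    "Branches":  (49287, 113914),
}

for metric, (hit, total) in coverage.items():
    # prints 61.8 %, 78.2 % and 43.3 %, matching the quoted summary
    print(f"{metric:<10} {hit:>7}/{total:<7} = {100.0 * hit / total:.1f} %")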
On Thu, 2012-07-12 at 15:37 -0400, Nathan Rutman wrote:

> On Jul 12, 2012, at 7:30 AM, John Carrier wrote:
>
>> A more strategic solution is to do more testing of a feature release
>> candidate _before_ it is released. Even if a Community member has no
>> interest in using a feature release in production, early testing with
>> pre-release versions of feature releases will help identify
>> instabilities created by the new feature with their workloads and
>> hardware before the release is official.

...

> So, it seems, we need to test pre-release versions of Lustre, aka
> Master, with my applications. To that end, how willing are people to
> set aside a day, say once every two months, to be "filesystem beta
> day"? Scientists, run your codes; users, do your normal work; but
> bear in mind there may be filesystem instabilities on that day. Make
> sure your data is backed up. Make sure it's not the middle of a
> critical week-long run. Accept that you might have to re-run it
> tomorrow in the worst case. Report any problems you have.
>
> What you get out of it is a much more stable Master, and an end to the
> question of "which version should I run". When it is released, you have
> confidence that you can move up, get the great new features and
> performance, and know it runs your applications. More people are on the
> same release, so it sees even more testing. The maintenance branch is
> always the latest branch, and you can pull in point releases with more
> bug fixes with ease. No more rolling your own Lustre with Frankenstein
> sets of patches. Latest and greatest and most stable.
>
> Pipe dream?

Since people are now moving to help test out the current master branch for Whamcloud, I would like to propose posting a general summary of the testing results people are seeing. I personally finished a first run at testing 2.2.91 this last week and would gladly share the results. Anyone else care to share? :-)
On Jul 20, 2012, at 7:13 AM, James A Simmons wrote:

> On Thu, 2012-07-12 at 15:37 -0400, Nathan Rutman wrote:
>> [...]
>>
>> Pipe dream?
>
> Since people are now moving to help test out the current master branch
> for Whamcloud, I would like to propose posting a general summary of the
> testing results people are seeing. I personally finished a first run at
> testing 2.2.91 this last week and would gladly share the results. Anyone
> else care to share? :-)

I started a page on the OpenSFS Wiki for everyone to share their test results in a free-form format. Note that the Wiki itself is still in its infancy -- I call on the community to help populate it.