Peter Bojanic
2006-Nov-15 22:18 UTC
[Lustre-discuss] Lustre.org Community Forum - 2006-11-13
Thanks to all the Lustre users who joined us for this meeting in Tampa, Florida. Please find attached slides from the discussion. I've also attached Jody McIntyre's notes. I do a lot of talking in these meetings, but I expect that to change in the future with more participation by Lustre community members and other CFS engineers. If you have any suggested topics for future Lustre.org meetings, please let me know.

Thanks also to Brent Gorda, LLNL, for hosting an interesting and informative Lustre BOF on Tuesday morning. It was well worth getting up early to attend ;)

Cheers,
Peter

P.S. I'm fairly certain that "blue shirt guy, front" is Brent Gorda, LLNL. If anyone else cares to self-identify based on their shirt colour and dialog, please feel free.

Lustre.org Community Forum, 2006-11-13
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

"Open Source Strategy"

pbojanic: Any thoughts on what is needed to support this type of development?
blue golf shirt guy: Open bug tracking, so users can tell if anyone else has seen a bug before.
kevinc: Customers would like the ability to comment on design as we go along.
pbojanic: hears about private comments (and the fact that we use them) all the time.
white golf shirt guy: It would be nice to know what other people are thinking about regarding needed features.
Adam Boggs: This will prevent duplicated effort.
Evan Felix: We would like this to occur on a public list, rather than on an internal list with a feature that shows up at the end sometime.
Adam Boggs: Having seen a lot of the code, one of the difficulties is the complexity and intricacy of the codebase, which implies a high level of commitment to making contributions and means a lot of testing is required - there should be some sort of testing to assure the quality of outside contributions.

--

"Release Model"

plaid shirt guy in front right: How are you going to do testing without hardware?
pbojanic: Customers will help with testing.
plaid shirt guy in front right: Is this what customers want?
blue golf shirt guy: Yes - running a test cluster is part of having a reliable production system.
Lee Ward: How do you get customers to sign up for testing and make sure they do this?
pbojanic: Our current testing infrastructure is woefully inadequate - deploying ltest externally is practically impossible. We are re-engineering a test environment. We will share our requirements and get input, aiming for a unified test framework that can run on developers' workstations, the CFS test clusters, and on customers' systems for system-wide and system-scale testing.
blue shirt guy, front: Can you walk us through a release? How much testing does it get?
pbojanic: What we do today is not what we want to be doing in the future. 1.6.0 is where we've drawn the line. It hasn't been released yet because we want it to be tested differently: it has to have been tested at LLNL, on an XT3, etc. We haven't been able to coordinate this yet. Today: a series of regression tests on our test cluster.
blue shirt guy, front: What model are we going to? What if Sandia or LLNL do not have time to test a release?
pbojanic: Yes, the release will be held up. We need to test it before putting out a major release.
blue shirt guy, front: That's an interesting dependency.
Lee Ward: I think this is a mistake - because customers cannot do this on your schedule.
Adam Boggs: The challenge will be if things blow up; tracking down the problem could be a nightmare for the customer.
pink shirt guy, front row: What are the key issues for testing?
pbojanic: The key issue is the hardware. We are willing to allocate the resources to do the testing - it's a lot easier to hire a US citizen who can log in to an XT3 than it is for us to obtain access to an XT3.
Lee Ward: Allow your customers to "sign" a release, to give confidence to others.
pbojanic: Good suggestion - perhaps if LLNL doesn't have time but 3 major partners have signed it, this might be OK for most.
Adam Boggs: Having it be web-observable would be useful.
white golf shirt guy: It would be nice to know what tests have passed/failed - visible on the web.

--

"Test System"

Lee Ward: This policy is counterproductive given your other slides - it is antagonistic, since you will not open your test system.
pbojanic: The tests themselves will be completely open.
Lee Ward: If the test framework is not open, the bar is much higher for anyone who wants to do testing but does not have access.
pbojanic: Not sure how to address this point...
Lee Ward: You're shooting yourself in the foot - your value isn't in QA. If I can't easily run everyone's tests, I will only run mine.
pbojanic: If the tool for collecting results were not open, would that be OK?
Lee Ward: ...maybe.
very pink shirt guy: (I missed his point)
pbojanic: Thanks for the input.
Adam Boggs: Will the tests be packaged with the Lustre source tree? Will they have the same development model?
pbojanic: Yes.
Adam Boggs: Good - that will eliminate the need to match test versions against Lustre versions.
LLNL guy next to Evan: Will performance be part of the test suite?
pbojanic: Yes.

--

"Benchmarking Web Site"

Adam Boggs: It could be useful to have some statistical confidence with benchmarks - is there dramatic variation across repeated runs of the same benchmark? (See the sketch after these notes.)
blue golf shirt guy: This would be helpful for us to know if we're getting reasonable results from our hardware.
Adam Boggs: Potential bragging rights.
Lee Ward: Will you winnow the results?
kevinc: Unlikely.
Lee Ward: Your competitors also have bugs - if you're going to put up graphs, they're not reasonable if you throw out the ones you don't like.
Adam Boggs: Statistical confidence may help with this issue.
kevinc: In general, people want to see how well things run.
Lee Ward: There are two ways to use this system - for marketing, or for a potential customer to see how Lustre will perform on their configuration.
pbojanic: We aim to generate a library, perhaps with commentary on negative results.
Lee Ward: This will just generate work for you. I have been generating 20 graphs a month, just from Sandia. Would you be comfortable with this?
pbojanic: Yes, we are. If there are bad points, we want to know why.

--

"CFS Strategic Priorities"

pbojanic: (described zerocopy fixes)
pink t-shirt guy: Which upstream kernel version introduced this bug?
pbojanic: Sometime in 2.4; well documented in bug 10089.
Lee Ward: Does CFS have multiple teams?
pbojanic: ...
Lee Ward: How many people do these teams represent?
pbojanic: ~60 engineers, including QA.

--

(conclusion)

Adam Boggs: What timeframe are you looking at for these changes?
pbojanic: ASAP, based on sysadmins; however, this will be a bit of a culture shift internally for us. No timeframe on the SVN repository.
pink shirt guy behind me: Will you still maintain the kernel patches if metadata performance remains faster on patched systems?
pbojanic: Unknown. We are quite confident we will be able to close the gap.
SFS guy in the back: To the people in the room: please contribute success stories, as well as problems, openly.
Lee Ward: Well, it _runs_ on a 10,000 node cluster... The fact that the national labs run Lustre and not something else says a lot.
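(Editorial sketch, not part of the original notes.) To illustrate the statistical-confidence point from the "Benchmarking Web Site" discussion, here is a minimal Python example of how repeated runs of one benchmark could be summarized as a mean with a rough 95% confidence interval. The run values below are made up for illustration only and are not real Lustre measurements or a CFS tool.

    import math
    import statistics

    # Hypothetical aggregate write bandwidth (MB/s) from eight repeated
    # runs of the same benchmark on the same configuration.
    runs = [412.0, 398.5, 420.3, 405.1, 415.8, 401.2, 409.9, 418.4]

    mean = statistics.mean(runs)
    stdev = statistics.stdev(runs)             # sample standard deviation
    stderr = stdev / math.sqrt(len(runs))      # standard error of the mean
    ci95 = 1.96 * stderr                       # ~95% interval, normal approximation

    print("runs  : %d" % len(runs))
    print("mean  : %.1f MB/s" % mean)
    print("95%% CI: +/- %.1f MB/s (%.1f .. %.1f MB/s)"
          % (ci95, mean - ci95, mean + ci95))

Reporting the interval alongside each published graph would let readers judge whether a difference between two configurations is larger than the run-to-run noise, which also bears on the "winnowing" concern raised above.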
-------------- next part --------------
Attachment: Lustre.org-SC06DiscussionSlides-061113.pdf (application/pdf, 49731 bytes)
Url: http://mail.clusterfs.com/pipermail/lustre-discuss/attachments/20061115/d2ce9ef0/Lustre.org-SC06DiscussionSlides-061113-0001.pdf