It comes up on this list from time to time that there's not sufficient documentation on troubleshooting. I assume that's what some people mean when they refer to disappointing documentation as the current documentation is far more detailed and useful than it was 3 years ago when I got started. I'm not really sure what's being asked for here, nor am I sure how one would document how to troubleshoot. In my mind, if there's a trouble that can be documented with a clear path to resolution, then a bug report should be filed and that should be fixed. Any other cases that cannot be coded for require human intervention and are already documented. Please tell me your thoughts.
Joe Julian <joe at julianfamily.org> writes:

> It comes up on this list from time to time that there's not sufficient
> documentation on troubleshooting. I assume that's what some people
> mean when they refer to disappointing documentation as the current
> documentation is far more detailed and useful than it was 3 years ago
> when I got started. I'm not really sure what's being asked for here,
> nor am I sure how one would document how to troubleshoot. In my mind,
> if there's a trouble that can be documented with a clear path to
> resolution, then a bug report should be filed and that should be
> fixed. Any other cases that cannot be coded for require human
> intervention and are already documented.

It is true that the documentation has gotten better. However, since the switch to the new release cycle, bugs don't seem to get fixed (within a release), and the documentation could do a better job of listing some of the holes that new users starting with the current GA will likely fall into.

Examples:

- Don't use ext4 (https://bugzilla.redhat.com/show_bug.cgi?id=838784)
- Don't use fix-layout after adding a brick (https://bugzilla.redhat.com/show_bug.cgi?id=913699); maybe fixed by 10617e6cbc73329f259b471327d88375352042b0 in 3.3.1, but:
- Don't upgrade from 3.3 to 3.3.1 if you need NFS (https://bugzilla.redhat.com/show_bug.cgi?id=893778)

1. Perhaps a wiki entry like "Known Issues" with links to all these bugs?
2. Copying the excellent info about gluster's xattrs from this blog post (http://cloudfs.org/2011/04/glusterfs-extended-attributes/) into the admin guide would be a start.
3. A brief guide on how to collect info on problematic files (permissions, xattrs, client log, brick log) would probably help generate more helpful bug reports and help users sort out many of their own problems.

It's all stuff you pick up after you've been in the game for a while, but it must really flummox new users.
-- Shawn Nock (OpenPGP: 0x65118FA5)
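The kind of per-file collection step suggested in item 3 above (permissions and xattrs for a problematic file) could be sketched roughly as follows. This is only an illustration, not an official Gluster tool: the function names `collect_file_info` and `format_report` are made up for this example, it relies on `os.listxattr`/`os.getxattr` (available in Python on Linux only), and it assumes it is run as root against the file's path on a brick, since the `trusted.*` attributes are not visible to ordinary users. Client and brick logs would still have to be gathered separately.

```python
import os
import stat


def collect_file_info(path):
    """Gather ownership, mode and extended attributes for one path.

    Hypothetical helper: meant to be pointed at a file on a brick,
    as root, so that trusted.* attributes are readable.
    """
    st = os.lstat(path)
    info = {
        "path": path,
        "mode": stat.filemode(st.st_mode),  # e.g. "-rw-r--r--"
        "uid": st.st_uid,
        "gid": st.st_gid,
        "size": st.st_size,
        "xattrs": {},
    }
    try:
        names = os.listxattr(path, follow_symlinks=False)
    except OSError as exc:
        # Filesystem without xattr support, or permission denied.
        info["xattr_error"] = str(exc)
        return info
    for name in names:
        try:
            value = os.getxattr(path, name, follow_symlinks=False)
        except OSError as exc:
            info["xattrs"][name] = "<unreadable: %s>" % exc
            continue
        # Hex output is easy to paste into a list post or bug report.
        info["xattrs"][name] = "0x" + value.hex()
    return info


def format_report(info):
    """Render the collected data as plain text for a bug report."""
    lines = [
        "path:  %s" % info["path"],
        "mode:  %s" % info["mode"],
        "owner: %d:%d" % (info["uid"], info["gid"]),
        "size:  %d" % info["size"],
    ]
    if "xattr_error" in info:
        lines.append("xattrs: could not be listed (%s)" % info["xattr_error"])
    for name in sorted(info["xattrs"]):
        lines.append("xattr: %s=%s" % (name, info["xattrs"][name]))
    return "\n".join(lines)
```

Running `print(format_report(collect_file_info(p)))` for the same file on each brick gives output that can be compared side by side and pasted into a report.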
On Tue, Mar 05, 2013 at 08:33:28AM -0800, Joe Julian wrote:

> It comes up on this list from time to time that there's not
> sufficient documentation on troubleshooting. I assume that's what
> some people mean when they refer to disappointing documentation as
> the current documentation is far more detailed and useful than it
> was 3 years ago when I got started. I'm not really sure what's being
> asked for here, nor am I sure how one would document how to
> troubleshoot. In my mind, if there's a trouble that can be
> documented with a clear path to resolution, then a bug report should
> be filed and that should be fixed. Any other cases that cannot be
> coded for require human intervention and are already documented.

When people come to this list and say "I am seeing split brain errors" or "ls shows question marks for file attributes" or "I need to replace a failed server with a new one" or "probing a server fails", I don't think there's any official documentation to help them.

"Documenting how to troubleshoot" would include what log messages you should look for and what they mean, what xattrs you should expect to see on the bricks and what they mean (for each case of distributed, replicated etc).

Given a basic checklist of these things, it would be easy for users to report to the list "I checked A, B and C and the output from B was XXXX when the docs say it should be YYYY on a working system", which is at least a starting point.

As far as I'm aware, the official admin guide is completely oblivious to internals like this. Users may be able to find suggestions by perusing mailing list archives, or by trying gluster 2.x wiki documentation (which may be stale), or some blog postings.
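As one example of what "what xattrs mean" documentation might let a user check, here is a rough sketch that decodes a replication changelog attribute of the form `trusted.afr.<volume>-client-<N>`. It assumes the layout described in the extended-attributes blog post linked earlier in this thread: a 12-byte value holding three 32-bit big-endian counters for pending data, metadata and entry operations, where all zeros means nothing is pending. That layout is an assumption here, not something the admin guide states, and the function names and the attribute name in the usage note are invented for illustration.

```python
import struct


def decode_afr_changelog(hex_value):
    """Decode a trusted.afr.* value given as hex (with or without "0x").

    Assumed layout: three unsigned 32-bit big-endian counters, in the
    order data, metadata, entry.
    """
    text = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    raw = bytes.fromhex(text)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


def describe_afr_changelog(name, hex_value):
    """Turn one attribute into a one-line, human-readable summary."""
    counts = decode_afr_changelog(hex_value)
    pending = [kind for kind, count in counts.items() if count]
    if not pending:
        return "%s: clean (no pending operations)" % name
    return "%s: pending %s operations" % (name, ", ".join(pending))
```

For instance, `describe_afr_changelog("trusted.afr.test-volume-client-1", "0x000000020000000000000000")` reports pending data operations, which is the sort of "B was XXXX when the docs say YYYY" detail a checklist could ask for.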
On 03/05/2013 08:51 AM, Jason Villalta wrote:

> I think some people would benefit from more recipe examples,
> especially around using with Virtualization. (KVM, Openstack,
> Cloudstack, OpenNebula). I know it is not the job of Gluster to tell
> people how to configure these other systems but maybe a quick list of
> will work, sorta work and not work at all. I know from my personal
> experience I have spent a lot time testing different
> configuration/combinations of these virtualization systems and Gluster
> before find what seems to work acceptably for my uses.

I don't suppose you added the results of your testing to the wiki somewhere?

> Admittedly most of the issues are around this stubborn
> FUSE/Direct-IO/ODirect support in most distributions (Ubuntu/CentOS).
> I think if these FUSE mounting IO issues were resolve/(made better)
> people would be A LOT happier to use Gluster. I think there will
> always be weird hardware related issues that will crop up in Gluster
> that all the documentation in the world can't fix but getting a clear
> supported path for (Cloud/KVM/Gluster/FS of choice) would
> help tremendously or at least allow people to determine if Gluster is
> the right fit.

That O_DIRECT issue has been resolved in EL (aka RHEL, CentOS, Scientific Linux, etc) 6.3.

> My 2 cents.

Thanks.

On that note, as well, I'd like to remind everybody that open source isn't about being a producer or a consumer. If you know something, share it. Add it to the wiki, or blog, email, Facebook, Google+, etc.

You don't have to be a coder to be part of any open source project. I haven't written anything in C for over 20 years, but I'm still a part of the Gluster Community. I'm also becoming more involved in puppet, logstash, and OpenStack.

I know that before I got involved with Gluster I always felt like there was "them" that made the software, and "us" that used whatever they gave us. I knew I didn't have the time to contribute code so I quietly used free software.
Since I've gotten involved, I've realized how easy it is. Get involved. It's fun. :)
On 03/05/2013 05:33 PM, Joe Julian wrote:

> It comes up on this list from time to time that there's not sufficient documentation on
> troubleshooting. I assume that's what some people mean when they refer to disappointing
> documentation as the current documentation is far more detailed and useful than it was 3 years ago
> when I got started. I'm not really sure what's being asked for here, nor am I sure how one would
> document how to troubleshoot. In my mind, if there's a trouble that can be documented with a clear
> path to resolution, then a bug report should be filed and that should be fixed. Any other cases that
> cannot be coded for require human intervention and are already documented.
>
> Please tell me your thoughts.

There is no online documentation for v3.3. Every Google search points to http://gluster.org/community/documentation/index.php/Gluster_3.2_Filesystem_Administration_Guide

There is an Admin guide, but it's in pdf and gzipped html? Is it fun? :)

Also, I'm afraid this guide is RH specific. For example, there is this section:

"If you are using Samba to access GlusterFS FUSE mount, then POSIX ACLs are enabled by default. Samba has been compiled with the --with-acl-support option, so no special flags are required when accessing or mounting a Samba share."

How do you know about that?

Also, there is no good cifs documentation. I suggest adding a full smb.conf. I asked here on the list for tuning help, with no answer, then later I found the solution via Google. For me, max/min protocol = SMB2 was the trick. But there can be locking issues (for example with Adobe Premiere and Photoshop). I don't blame the list, but the online, searchable documentation should be more helpful. Of course, this is a general samba issue at the same time.

And I think this guide should be aimed more or less at experts, not absolute beginners. If someone is looking for the right information, it's hard to find, because the guide is too verbose:

"Chapter 6.
Accessing Data - Setting Up GlusterFS Client 28

6.3.1.2. Manually Mounting Volumes Using CIFS

You can manually mount Gluster volumes using CIFS on Microsoft Windows-based client machines.

To manually mount a Gluster volume using CIFS

1. Using Windows Explorer, choose Tools > Map Network Drive... from the menu. The Map Network Drive window appears.
2. Choose the drive letter using the Drive drop-down list.
3. Click Browse, select the volume to map to the network drive, and click OK.
4. Click Finish.

The network drive (mapped to the volume) appears in the Computer window.

Alternatively, to manually mount a Gluster volume using CIFS.

- Click Start > Run and enter the following: \\SERVERNAME\VOLNAME

For example: \\server1\test-volume"

This whole cifs section is obviously only an example :)

Cheers,
tamas
On 06/03/13 03:33, Joe Julian wrote:

> It comes up on this list from time to time that there's not sufficient
> documentation on troubleshooting. I assume that's what some people mean
> when they refer to disappointing documentation as the current
> documentation is far more detailed and useful than it was 3 years ago
> when I got started. I'm not really sure what's being asked for here, nor
> am I sure how one would document how to troubleshoot. In my mind, if
> there's a trouble that can be documented with a clear path to
> resolution, then a bug report should be filed and that should be fixed.
> Any other cases that cannot be coded for require human intervention and
> are already documented.
>
> Please tell me your thoughts.

So I've asked things like this and received no help:
http://www.gluster.org/pipermail/gluster-users/2012-June/033398.html

I've found answers relating to old versions of glusterfs and asked if they still apply, to no response:
http://www.gluster.org/pipermail/gluster-users/2012-June/033478.html

I've seen other people ask about split-brain issues to no response:
http://thr3ads.net/gluster-users/2012/06/1962160-managing-split-brain-in-3.3

I've tried to figure things out about the self-heal daemon, and found no documentation (but at least received some responses on list):
http://thr3ads.net/gluster-users/2012/05/1919111-3.3-beta3-When-should-the-self-heal-daemon-be-triggered

I found that the official documentation for the 3.2->3.3 upgrade path was in fact erroneous and did not work:
http://www.mail-archive.com/gluster-users at gluster.org/msg10890.html
(This turned out to be because the blog the wiki had copied the commands from had turned hyphens into mdashes)

I've gone looking for info on the options available to simply mounting a glusterfs volume, and would you believe the latest version available with docs on the website is 3.2, not 3.3?
(At least as far as a google search is concerned):
https://www.google.com.au/search?q=glusterfs+mount+volume

So, yeah, from my point of view GlusterFS's documentation fails at covering (a) simple, day-to-day actions, (b) upgrade paths, (c) handling failures.

I note there is also this site:
http://community.gluster.org/t/glusterfs/
It's full of people asking questions, and almost completely empty of people receiving useful replies. I know it's a *community* driven thing, but still.. it doesn't give a good impression of community support.

The mailing list is often a source of good, useful information, and I appreciate everyone's help. Thanks for continuing to provide that support! It would just be great if some of the collective's knowledge was available online in an easily-searched manner, and kept up to date.

Cheers,
Toby
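The upgrade-instructions problem mentioned above (hyphens silently turned into typographic dashes when commands were copied from a blog) is the kind of thing a small check can catch before documentation is published. The following is a minimal sketch, not part of any Gluster tooling; the function name `find_typographic_dashes` and the command in the usage note are made up for illustration.

```python
import unicodedata

# Characters that look like "-" but are not ASCII hyphen-minus, and that
# blog or wiki software commonly substitutes into pasted commands.
SUSPECT_DASHES = {
    "\u2010",  # HYPHEN
    "\u2011",  # NON-BREAKING HYPHEN
    "\u2012",  # FIGURE DASH
    "\u2013",  # EN DASH
    "\u2014",  # EM DASH
    "\u2015",  # HORIZONTAL BAR
    "\u2212",  # MINUS SIGN
}


def find_typographic_dashes(text):
    """Return (line, column, unicode_name) for each suspect dash.

    Line and column numbers are 1-based.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, char in enumerate(line, start=1):
            if char in SUSPECT_DASHES:
                hits.append((lineno, col, unicodedata.name(char)))
    return hits
```

Calling `find_typographic_dashes("example-tool \u2014force")` flags the dash before `force`, while the same hypothetical command written with a plain `--` returns an empty list.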