Each of the two bricks has just one XFS file system w/ inode64 enabled
on a 10TB LVM LV. Each of the volumes is less than 40% full and inode
counts look reasonable.

I'm working to get a test environment going so I can reproduce this
off-production and add the trace translator. I've sent in a bunch of
trace data and opened a bug:
bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=1040

-Brian

--
Brian Smith
Senior Systems Administrator
IT Research Computing, University of South Florida
4202 E. Fowler Ave. ENB204
Office Phone: +1 813 974-1467
Organization URL: rc.usf.edu

> Date: Wed, 30 Jun 2010 15:33:24 -0400
> From: Brian Smith <brs at usf.edu>
> Subject: Re: [Gluster-users] Revisit: FORTRAN Codes and File I/O
> To: Harshavardhana <harsha at gluster.com>
> Cc: Gluster General Discussion List <gluster-users at gluster.org>
> Message-ID: <1277926404.2687.60.camel at localhost.localdomain>
> Content-Type: text/plain; charset="UTF-8"
>
> Spoke too soon. Same problem occurs minus all performance translators.
> Debug logs on the server show
>
> [2010-06-30 15:30:54] D [server-protocol.c:2104:server_create_cbk]
> server-tcp: create(/b/brs/Si/CHGCAR) inode (ptr=0x2aaab00e05b0,
> ino=2159011921, gen=5488651098262601749) found conflict
> (ptr=0x2aaab40cca00, ino=2159011921, gen=5488651098262601749)
> [2010-06-30 15:30:54] D [server-resolve.c:386:resolve_entry_simple]
> server-tcp: inode (pointer: 0x2aaab40cca00 ino:2159011921) found for
> path (/b/brs/Si/CHGCAR) while type is RESOLVE_NOT
> [2010-06-30 15:30:54] D [server-protocol.c:2132:server_create_cbk]
> server-tcp: 72: CREATE (null) (0) ==> -1 (File exists)
>
> -Brian
>
> On Wed, 2010-06-30 at 13:06 -0400, Brian Smith wrote:
> > I received these in my debug output during a run that failed:
> >
> > [2010-06-30 12:34:25] D [read-ahead.c:468:ra_readv] readahead:
> > unexpected offset (8192 != 1062) resetting
> > [2010-06-30 12:34:25] D [read-ahead.c:468:ra_readv] readahead:
> > unexpected offset (8192 != 1062) resetting
> > [2010-06-30 12:34:25] D [read-ahead.c:468:ra_readv] readahead:
> > unexpected offset (8192 != 1062) resetting
> > [2010-06-30 12:34:25] D [read-ahead.c:468:ra_readv] readahead:
> > unexpected offset (8192 != 1062) resetting
> >
> > I disabled the read-ahead translator as well as the three other
> > performance translators commented out in my vol file (I'm on GigE; the
> > docs say I can still reach link max anyway) and my processes appear to
> > be running smoothly. I'll go ahead and submit the bug report with
> > tracing enabled as well.
> >
> > -Brian
>
> Date: Wed, 30 Jun 2010 21:17:59 -0400
> From: Jeff Darcy <jdarcy at redhat.com>
> Subject: Re: [Gluster-users] Revisit: FORTRAN Codes and File I/O
> To: gluster-users at gluster.org
> Message-ID: <4C2BECC7.5010402 at redhat.com>
> Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
>
> On 06/30/2010 03:33 PM, Brian Smith wrote:
> > Spoke too soon. Same problem occurs minus all performance translators.
> > Debug logs on the server show
> >
> > [2010-06-30 15:30:54] D [server-protocol.c:2104:server_create_cbk]
> > server-tcp: create(/b/brs/Si/CHGCAR) inode (ptr=0x2aaab00e05b0,
> > ino=2159011921, gen=5488651098262601749) found conflict
> > (ptr=0x2aaab40cca00, ino=2159011921, gen=5488651098262601749)
> > [2010-06-30 15:30:54] D [server-resolve.c:386:resolve_entry_simple]
> > server-tcp: inode (pointer: 0x2aaab40cca00 ino:2159011921) found for
> > path (/b/brs/Si/CHGCAR) while type is RESOLVE_NOT
> > [2010-06-30 15:30:54] D [server-protocol.c:2132:server_create_cbk]
> > server-tcp: 72: CREATE (null) (0) ==> -1 (File exists)
>
> The first line almost looks like a create attempt for a file that
> already exists at the server. The second and third lines look like *yet
> another* create attempt, failing this time before the request is even
> passed to the next translator. This might be a good time to drag out
> the debug/trace translator, and sit it on top of brick1 to watch the
> create calls. That will help nail down the exact sequence of events as
> the server sees them, so we don't go looking in the wrong places. It
> might even be useful to do the same on the client side, but perhaps not
> yet. Instructions are here:
>
> gluster.com/community/documentation/index.php/Translators/debug/trace
>
> In the meantime, to further identify which code paths are most likely
> to be relevant, it would be helpful to know a couple more things.
>
> (1) Is each storage/posix volume using just one local filesystem, or is
> it possible that the underlying directory tree spans more than one?
> This could lead to inode-number duplication, which requires extra handling.
>
> (2) Are either of the server-side volumes close to being full? This
> could result in creating an extra "linkfile" on the subvolume/server
> where we'd normally create the file, pointing to where we really created
> it due to space considerations.
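
Question (1) above, whether a brick's export directory spans more than one local filesystem, can be checked directly on each server. The following is only a rough sketch in Python and is not part of GlusterFS: the function name devices_under is made up for illustration, and the export path is whatever directory the storage/posix volume points at. It compares the st_dev device number of every directory under the export root. Only directories are examined, on the assumption that mount points are directories, so a bind-mounted single file would be missed.

```python
import os


def devices_under(root):
    """Map each device number (st_dev) found under root to the first
    directory seen on that device. More than one key means the tree
    spans more than one filesystem."""
    seen = {os.lstat(root).st_dev: root}
    # os.walk does not follow symlinks by default, but it does descend
    # into mount points, which is what we want to detect here.
    for dirpath, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            path = os.path.join(dirpath, name)
            try:
                dev = os.lstat(path).st_dev
            except OSError:
                continue  # directory vanished or is unreadable; skip it
            seen.setdefault(dev, path)
    return seen
```

Calling devices_under() on the export directory and checking that the returned dict has exactly one entry would confirm the single-filesystem answer; any extra entry shows the first directory found on another device. On a multi-terabyte brick the walk may take a while.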
Harshavardhana
2010-Jul-01 20:18 UTC
[Gluster-users] Gluster-users Digest, Vol 27, Issue 1
On 07/01/2010 12:29 PM, Brian Smith wrote:
> Each of the two bricks has just one XFS file system w/ inode64 enabled
> on a 10TB LVM LV. Each of the volumes is less than 40% full and inode
> counts look reasonable.
>
> I'm working to get a test environment going so I can reproduce this
> off-production and add the trace translator. I've sent in a bunch of
> trace data and opened a bug:
> bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=1040
>
> -Brian

Thanks a lot Brian, your help is much appreciated.

Regards
--
Harshavardhana
Gluster Inc - gluster.com
+1(408)-770-1887, Ext-113
+1(408)-480-1730