Harry Mangalam
2012-Aug-02 01:20 UTC
[Gluster-users] gluster server overload; recovers, now "Transport endpoint is not connected" for some files
Hi All,

I'm using 3.3-1 on IPoIB servers that are serving to native gluster clients, mostly over GbE:

$ gluster volume info

Volume Name: gli
Type: Distribute
Volume ID: 76cc5e88-0ac4-42ac-a4a3-31bf2ba611d4
Status: Started
Number of Bricks: 4
Transport-type: tcp,rdma
Bricks:
Brick1: pbs1ib:/bducgl
Brick2: pbs2ib:/bducgl
Brick3: pbs3ib:/bducgl
Brick4: pbs4ib:/bducgl
Options Reconfigured:
performance.write-behind-window-size: 1024MB
performance.flush-behind: on
performance.cache-size: 268435456
nfs.disable: off
performance.io-thread-count: 64
performance.quick-read: on
performance.io-cache: on

Everything was working fine until a few hours ago, when a user noticed that gluster performance was very slow. It turned out that one (and only one) of the 4 gluster servers had a load of >30 and its glusterfsd was consuming >3GB of RAM:

  PID USER  PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
20953 root  20  0 3862m 3.2g 1240 R  382 20.7 19669:20 glusterfsd
  443 root  20  0     0    0    0 S    4  0.0  6017:19 md127_raid5
...

The init.d scripts are meant for RH distros, so they wouldn't initiate a restart on this Ubuntu server, and rather than go through the rewrite process right then, I killed off the offending glusterfsd and restarted it manually (perhaps ill-advised). The node immediately re-joined the others:

root@pbs1:~
730 $ gluster peer status
Number of Peers: 3

Hostname: pbs4ib
Uuid: 2a593581-bf45-446c-8f7c-212c53297803
State: Peer in Cluster (Connected)

Hostname: pbs2ib
Uuid: 26de63bd-c5b7-48ba-b81d-5d77a533d077
State: Peer in Cluster (Connected)

Hostname: pbs3ib
Uuid: c79c4084-d6b9-4af9-b975-40dd6aa99b42
State: Peer in Cluster (Connected)

and the volume did appear to be reconstituted. The load did indeed drop almost to zero and responsiveness immediately improved, but almost immediately users started reporting locked or missing files. As you might guess, the files are on the bricks, but they appear to be locked on the client, even when they're not visible per se:

$ cp -vR DeNovoGenome/* /gl/iychang/DeNovoGenome/
`DeNovoGenome/contigs_k51.fa' -> `/gl/iychang/DeNovoGenome/contigs_k51.fa'
`DeNovoGenome/L7-17-1-Sequences.fastq' -> `/gl/iychang/DeNovoGenome/L7-17-1-Sequences.fastq'
`DeNovoGenome/L7-9-1-Sequences.fastq' -> `/gl/iychang/DeNovoGenome/L7-9-1-Sequences.fastq'
`DeNovoGenome/y_lipolitica.1.ebwt' -> `/gl/iychang/DeNovoGenome/y_lipolitica.1.ebwt'
`DeNovoGenome/y_lipolitica.2.ebwt' -> `/gl/iychang/DeNovoGenome/y_lipolitica.2.ebwt'
cp: cannot create regular file `/gl/iychang/DeNovoGenome/y_lipolitica.2.ebwt': Transport endpoint is not connected
`DeNovoGenome/y_lipolitica.3.ebwt' -> `/gl/iychang/DeNovoGenome/y_lipolitica.3.ebwt'
`DeNovoGenome/y_lipolitica.4.ebwt' -> `/gl/iychang/DeNovoGenome/y_lipolitica.4.ebwt'
`DeNovoGenome/y_lipolitica.fa' -> `/gl/iychang/DeNovoGenome/y_lipolitica.fa'
`DeNovoGenome/y_lipolitica.rev.1.ebwt' -> `/gl/iychang/DeNovoGenome/y_lipolitica.rev.1.ebwt'
`DeNovoGenome/y_lipolitica.rev.2.ebwt' -> `/gl/iychang/DeNovoGenome/y_lipolitica.rev.2.ebwt'

In the case of the problem file above,

$ ls -l /gl/iychang/DeNovoGenome/y_lipolitica.2.ebwt
ls: /gl/iychang/DeNovoGenome/y_lipolitica.2.ebwt: No such file or directory

it doesn't appear to exist on the gluster fs (/gl), yet the replacement file can't be copied there. That file DOES exist on the bricks, though:

root@pbs3://bducgl/iychang/DeNovoGenome
295 $ ls -l y_lipolitica.2.ebwt
-rw-rw-r-- 2 7424 7424 2528956 2012-08-01 17:38 y_lipolitica.2.ebwt

so I think the problem is a messed-up volume index or something like it. How do you resolve this issue?
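(Aside, in case it helps anyone diagnosing the same symptom: on a pure Distribute volume, "Transport endpoint is not connected" generally means the client has lost its connection to one brick, so anything hashed to that brick vanishes from the mount. The commands below are only a sketch of the checks, assuming the default 3.3 client log name for a /gl mount; the getfattr path is the example file from above, and the remount is illustrative rather than a confirmed fix.)

# on any gluster server: do all four bricks show as online?
gluster volume status gli

# on the client: which subvolume does the mount think is disconnected?
grep -i disconnect /var/log/glusterfs/gl.log | tail

# on the brick holding the "missing" file: does it still carry its
# gfid/DHT xattrs?
getfattr -d -m . -e hex /bducgl/iychang/DeNovoGenome/y_lipolitica.2.ebwt

# if only the client-side connection is wedged, remounting the client
# may re-establish it (pbs1ib:/gli matches the volume info above)
umount /gl && mount -t glusterfs pbs1ib:/gli /gl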
I read that gluster 3.3 does self-healing, and in fact we had an instance where an entire gluster node went down for several hours (at night, in a low-usage environment); when we brought it back, it immediately re-joined the gluster group and we did not see any such problems (although that may have been during an idle time).

The client log is filled with lines like this:

[2012-08-01 18:00:11.976198] W [client3_1-fops.c:2630:client3_1_lookup_cbk] 0-gli-client-2: remote operation failed: Transport endpoint is not connected. Path: /fsaizpoy/input/emim-tf2n/consecutive/8thcSi/output/libc.so.6 (00000000-0000-0000-0000-000000000000)
[2012-08-01 18:00:50.809517] W [client3_1-fops.c:763:client3_1_statfs_cbk] 0-gli-client-2: remote operation failed: Transport endpoint is not connected
[2012-08-01 18:01:30.885114] W [client3_1-fops.c:763:client3_1_statfs_cbk] 0-gli-client-2: remote operation failed: Transport endpoint is not connected
[2012-08-01 18:02:10.964532] W [client3_1-fops.c:763:client3_1_statfs_cbk] 0-gli-client-2: remote operation failed: Transport endpoint is not connected

As a best guess, I initiated a 'fix-layout', and while the 'status' printout says that no files have been rebalanced (expected, since no bricks have been added), there have been lots of fixes:

[2012-08-01 18:04:31.149116] I [dht-common.c:2337:dht_setxattr] 0-gli-dht: fixing the layout of /nlduong/benchmarks/ALPBench/Face_Rec/data/EBGM_CSUNG
[2012-08-01 18:04:31.462275] I [dht-common.c:2337:dht_setxattr] 0-gli-dht: fixing the layout of /nlduong/benchmarks/ALPBench/Face_Rec/data/EBGM_CSU_FG
[2012-08-01 18:04:31.778421] I [dht-common.c:2337:dht_setxattr] 0-gli-dht: fixing the layout of /nlduong/benchmarks/ALPBench/Face_Rec/data/csuScrapShots
[2012-08-01 18:04:31.885009] I [dht-common.c:2337:dht_setxattr] 0-gli-dht: fixing the layout of /nlduong/benchmarks/ALPBench/Face_Rec/data/csuScrapShots/normSep2002sfi
[2012-08-01 18:04:32.337981] I [dht-common.c:2337:dht_setxattr] 0-gli-dht: fixing the layout of /nlduong/benchmarks/ALPBench/Face_Rec/data/csuScrapShots/source
[2012-08-01 18:04:32.441383] I [dht-common.c:2337:dht_setxattr] 0-gli-dht: fixing the layout of /nlduong/benchmarks/ALPBench/Face_Rec/data/csuScrapShots/source/pgm
[2012-08-01 18:04:32.558827] I [dht-common.c:2337:dht_setxattr] 0-gli-dht: fixing the layout of /nlduong/benchmarks/ALPBench/Face_Rec/data/faceGraphsWiskott
[2012-08-01 18:04:32.617823] I [dht-common.c:2337:dht_setxattr] 0-gli-dht: fixing the layout of /nlduong/benchmarks/ALPBench/Face_Rec/data/novelGraphsWiskott

Unfortunately, I'm also seeing this:

[2012-08-01 18:07:26.104859] I [dht-layout.c:593:dht_layout_normalize] 0-gli-dht: found anomalies in /nlduong/benchmarks/SPEC2K6-org/benchspec/CPU2006/403.gcc/data/test/input. holes=1 overlaps=0
[2012-08-01 18:07:26.104910] W [dht-selfheal.c:875:dht_selfheal_directory] 0-gli-dht: 1 subvolumes down -- not fixing
[2012-08-01 18:07:26.104996] I [dht-common.c:2337:dht_setxattr] 0-gli-dht: fixing the layout of /nlduong/benchmarks/SPEC2K6-org/benchspec/CPU2006/403.gcc/data/test/input
[2012-08-01 18:07:26.189403] I [dht-layout.c:593:dht_layout_normalize] 0-gli-dht: found anomalies in /nlduong/benchmarks/SPEC2K6-org/benchspec/CPU2006/403.gcc/data/test/output. holes=1 overlaps=0
[2012-08-01 18:07:26.189457] W [dht-selfheal.c:875:dht_selfheal_directory] 0-gli-dht: 1 subvolumes down -- not fixing

which implies that some of the errors are not fixable. Is there a best-practices solution for this problem?
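(The "1 subvolumes down -- not fixing" warning suggests that the node running the fix-layout still sees one brick as disconnected, so those directories cannot be healed until that connection comes back. The following is only a sketch of what to check, not a known-good recovery procedure; the volume name is the one above.)

# confirm all four bricks and their ports show as online
gluster volume status gli

# once everything is connected, re-run the layout fix and watch it
gluster volume rebalance gli fix-layout start
gluster volume rebalance gli status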
I suspect this is one of the most common problems to affect an operating gluster fs.

hjm

--
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine [m/c 2225] / 92697
Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
Joe Landman
2012-Aug-02 02:25 UTC
[Gluster-users] gluster server overload; recovers, now "Transport endpoint is not connected" for some files
On 08/01/2012 09:20 PM, Harry Mangalam wrote:
[...]
> which implies that some of the errors are not fixable.
>
> Is there a best-practices solution for this problem? I suspect this
> is one of the most common problems to affect an operating gluster fs.

This is one of the issues that caused us to develop some tools to find missing files and add them back into the file system. We've had to do this with the 3.2.x series. We haven't had many 3.3 deployments; I had hoped that this would have been fixed.

Search back in the archives for some of our tools, from last year, in the June/July time period.

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman@scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
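(For readers following up: the tools Joe mentions are in the list archives. The script below is NOT those tools; it is only a rough sketch of the general idea of walking a brick backend and flagging files that are not reachable through the client mount. The brick path /bducgl and the mount point /gl come from this thread; everything else is illustrative.)

#!/bin/bash
# Sketch: report files present on a brick backend that cannot be
# stat'ed through the gluster client mount.
BRICK=/bducgl    # brick backend path (from the thread)
MOUNT=/gl        # gluster client mount point (from the thread)

# skip the .glusterfs metadata tree on 3.3 bricks, walk regular files
find "$BRICK" -path "$BRICK/.glusterfs" -prune -o -type f -print |
while read -r f; do
    rel=${f#$BRICK/}
    # a file that errors out here is missing or unreachable via the client
    if ! stat "$MOUNT/$rel" >/dev/null 2>&1; then
        echo "not visible via client: $rel"
    fi
done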