Markus Fröhlich
2011-Jun-06 15:03 UTC
[Gluster-users] uninterruptible processes writing to glusterfs share
hi!

Sometimes we have hanging, uninterruptible processes on some of our client
servers ("ps aux" shows state "D"), and on one of them the CPU I/O wait
climbs to 100% within a few minutes. You cannot kill such processes - even
"kill -9" doesn't work - and when you attach to one with "strace", you see
no output and cannot detach again.

There are only two ways out: killing the glusterfs process (unmounting the
GFS share) or rebooting the server.

The only log entry I found was on one client - just a single line:

[2011-06-06 10:44:18.593211] I [afr-common.c:581:afr_lookup_collect_xattr]
0-office-data-replicate-0: data self-heal is pending for
/pc-partnerbet-public/Promotionaktionen/Mailakquise_2009/Webmaster_2010/HTML/bilder/Thumbs.db

One of the client servers is a Samba server, the other a backup server
based on rsync with millions of small files.

gfs-servers + gfs-clients: SLES11 x86_64, glusterfs 3.2.0

And here are the configs from server and client.

Server config
"/etc/glusterd/vols/office-data/office-data.gfs-01-01.GFS-office-data02.vol":

    volume office-data-posix
        type storage/posix
        option directory /GFS/office-data02
    end-volume

    volume office-data-access-control
        type features/access-control
        subvolumes office-data-posix
    end-volume

    volume office-data-locks
        type features/locks
        subvolumes office-data-access-control
    end-volume

    volume office-data-io-threads
        type performance/io-threads
        subvolumes office-data-locks
    end-volume

    volume office-data-marker
        type features/marker
        option volume-uuid 3c6e633d-a0bb-4c52-8f05-a2db9bc9c659
        option timestamp-file /etc/glusterd/vols/office-data/marker.tstamp
        option xtime off
        option quota off
        subvolumes office-data-io-threads
    end-volume

    volume /GFS/office-data02
        type debug/io-stats
        option latency-measurement off
        option count-fop-hits off
        subvolumes office-data-marker
    end-volume

    volume office-data-server
        type protocol/server
        option transport-type tcp
        option auth.addr./GFS/office-data02.allow *
        subvolumes /GFS/office-data02
    end-volume

--------------

Client config "/etc/glusterd/vols/office-data/office-data-fuse.vol":

    volume office-data-client-0
        type protocol/client
        option remote-host gfs-01-01
        option remote-subvolume /GFS/office-data02
        option transport-type tcp
    end-volume

    volume office-data-replicate-0
        type cluster/replicate
        subvolumes office-data-client-0
    end-volume

    volume office-data-write-behind
        type performance/write-behind
        subvolumes office-data-replicate-0
    end-volume

    volume office-data-read-ahead
        type performance/read-ahead
        subvolumes office-data-write-behind
    end-volume

    volume office-data-io-cache
        type performance/io-cache
        subvolumes office-data-read-ahead
    end-volume

    volume office-data-quick-read
        type performance/quick-read
        subvolumes office-data-io-cache
    end-volume

    volume office-data-stat-prefetch
        type performance/stat-prefetch
        subvolumes office-data-quick-read
    end-volume

    volume office-data
        type debug/io-stats
        option latency-measurement off
        option count-fop-hits off
        subvolumes office-data-stat-prefetch
    end-volume

--
Kind regards

Markus Fröhlich
Technician

________________________________________________________

Xidras GmbH
Stockern 47
3744 Stockern
Austria

Tel:   +43 (0) 2983 201 30503
Fax:   +43 (0) 2983 201 305039
Email: markus.froehlich at xidras.com
Web:   http://www.xidras.com

FN 317036 f | Landesgericht Krems | ATU64485024
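A quick way to see where such D-state processes are stuck is to ask the
kernel directly. A minimal sketch, assuming a kernel that exposes
/proc/<pid>/wchan (and, on kernels built with CONFIG_STACKTRACE, roughly
2.6.29 and later, /proc/<pid>/stack - the SLES11 GA kernel may lack the
latter):

    # List all processes in uninterruptible sleep together with the
    # kernel function they are blocked in (wchan).
    ps -eo state,pid,wchan:32,cmd | awk '$1 == "D"'

    # For a given PID, dump its kernel stack (needs root; only on
    # kernels that provide /proc/<pid>/stack).
    cat /proc/<pid>/stack

    # Alternatively, have the kernel log all blocked tasks to the
    # kernel ring buffer (requires the magic SysRq key to be enabled).
    echo w > /proc/sysrq-trigger
    dmesg | tail -n 50

If the stacks end in FUSE request-wait functions, the processes are
waiting on the glusterfs client, which would match the symptom that only
killing the glusterfs process releases them.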
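The "data self-heal is pending" message suggests the replica still has
unhealed files. The 3.2.x series has no "gluster volume heal" command yet
(that arrived in 3.3), so the usual way to force a self-heal was to stat
every file through a client mount. A minimal sketch, assuming the volume
is mounted at /mnt/office-data (a hypothetical mount point):

    # Stat'ing each entry makes the replicate translator check and,
    # where needed, heal every file it touches.
    find /mnt/office-data -noleaf -print0 | xargs -0 stat > /dev/null 2>&1

On a tree with millions of small files this generates considerable lookup
traffic, so off-hours is advisable.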
Mohit Anchlia
2011-Jun-06 19:13 UTC
[Gluster-users] uninterruptible processes writing to glusterfs share
Is there anything in the server logs? Does it follow any particular
pattern before going into this mode? Did you upgrade GlusterFS, or is this
a new install?

2011/6/6 Markus Fröhlich <markus.froehlich at xidras.com>:
> [...]
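For the server side, a minimal sketch of where to look, assuming the
default log location (/var/log/glusterfs; the brick log filename below is
an assumption - it is usually the brick path with slashes replaced by
dashes):

    # Warnings ([W]) and errors ([E]) from the brick log of
    # /GFS/office-data02, around the time of the hang.
    grep -E '\] (W|E) \[' /var/log/glusterfs/bricks/GFS-office-data02.log | tail -n 100

    # glusterd's own log, for peer- and volume-level problems.
    tail -n 100 /var/log/glusterfs/etc-glusterfs-glusterd.vol.log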
Tomasz Chmielewski
2011-Jun-07 07:55 UTC
[Gluster-users] uninterruptible processes writing to glusterfs share
On 06.06.2011 16:03, Markus Fröhlich wrote:

> Sometimes we have hanging, uninterruptible processes on some of our
> client servers ("ps aux" shows state "D") [...] There are only two ways
> out: killing the glusterfs process (unmounting the GFS share) or
> rebooting the server.
>
> gfs-servers + gfs-clients: SLES11 x86_64, glusterfs 3.2.0

I've seen such "hangs" as well with 3.2.0; 3.1.4 works fine.

--
Tomasz Chmielewski
http://wpkg.org
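If downgrading to 3.1.4 is the way out, on SLES this can be done with rpm
directly. A minimal sketch, assuming you have the 3.1.4 RPMs at hand - the
package names and the init script path below are assumptions and may
differ depending on where your build comes from:

    # Stop the daemon, downgrade, restart. --oldpackage lets rpm
    # install a version older than the one currently installed.
    /etc/init.d/glusterd stop
    rpm -Uvh --oldpackage glusterfs-core-3.1.4*.rpm glusterfs-fuse-3.1.4*.rpm
    /etc/init.d/glusterd start

    # Verify on both servers and clients.
    glusterfs --version

Clients and servers are usually kept on the same version, and existing
mounts need to be remounted to pick up the downgraded client.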