anthony garnier
2012-Apr-27 11:41 UTC
[Gluster-users] remote operation failed: No space left on device
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20120427/3c152c36/attachment.html>
Gerald Brandt
2012-Apr-27 11:51 UTC
[Gluster-users] remote operation failed: No space left on device
Hi,

1. What version of GlusterFS are you running?
2. Do an "lsof | grep users98". Do you see a lot of files in the (deleted) state?

Gerald

----- Original Message -----
> From: "anthony garnier" <sokar6012 at hotmail.com>
> To: gluster-users at gluster.org
> Sent: Friday, April 27, 2012 6:41:43 AM
> Subject: [Gluster-users] remote operation failed: No space left on device
>
> Hi all,
>
> I've got an issue: it seems that the size reported by df -h grows
> indefinitely. Any help would be appreciated.
> [...]
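For reference, the check Gerald suggests can be run directly on the client; a minimal sketch (assuming the volume is mounted on /users98 and lsof is installed) would be:

lsof -n -P | grep users98 | grep -i '(deleted)'

The -n and -P flags only skip host and port name lookups so the listing returns faster; any file still shown with "(deleted)" has been unlinked but is held open, so its blocks are not yet released.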
anthony garnier
2012-Apr-27 12:41 UTC
[Gluster-users] remote operation failed: No space left on device
After some research, I found what's going on:
It seems that the glusterfs process has a lot of FDs open on each brick:
lsof -n -P | grep deleted
ksh 644 cdui 2w REG 253,10 952 190 /users/mmr00/log/2704mmr.log (deleted)
glusterfs 2357 root 10u REG 253,6 0 28 /tmp/tmpfnMwo2I (deleted)
glusterfs 2361 root 10u REG 253,6 0 35 /tmp/tmpfnBlK2I (deleted)
glusterfs 2365 root 10u REG 253,6 0 41 /tmp/tmpfHZG51I (deleted)
glusterfs 2365 root 12u REG 253,6 1011 13 /tmp/tmpfPGJjje (deleted)
glusterfs 2365 root 13u REG 253,6 1013 20 /tmp/tmpf4ITi6m (deleted)
glusterfs 2365 root 17u REG 253,6 1012 25 /tmp/tmpfBwwE1h (deleted)
glusterfs 2365 root 18u REG 253,6 1011 43 /tmp/tmpfsoNSmV (deleted)
glusterfs 2365 root 19u REG 253,6 1011 19 /tmp/tmpfDmMruu (deleted)
glusterfs 2365 root 21u REG 253,6 1012 47 /tmp/tmpfE4SpVM (deleted)
glusterfs 2365 root 22u REG 253,6 1012 48 /tmp/tmpfHjjdXw (deleted)
glusterfs 2365 root 24u REG 253,6 1011 49 /tmp/tmpfwOoX6F (deleted)
glusterfs 2365 root 26u REG 253,13 13509969920 1829 /users3/poolsave/yval9000/test/tmp/23-04-18-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 27u REG 253,13 13538048000 1842 /users3/poolsave/yval9000/test/tmp/24-04-07-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 28u REG 253,13 13607956480 1737 /users3/poolsave/yval9000/test/tmp/22-04-01-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 30u REG 253,13 13519441920 1337 /users3/poolsave/yval9000/test/tmp/16-04-14-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 31u REG 253,13 13530081280 1342 /users3/poolsave/yval9000/test/tmp/16-04-16-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 34u REG 253,13 13559777280 1347 /users3/poolsave/yval9000/test/tmp/16-04-20-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 35u REG 253,13 13581772800 1352 /users3/poolsave/yval9000/test/tmp/16-04-23-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 36u REG 253,13 13513922560 1357 /users3/poolsave/yval9000/test/tmp/17-04-04-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 37u REG 253,13 13513338880 1362 /users3/poolsave/yval9000/test/tmp/17-04-07-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 38u REG 253,13 13520199680 1367 /users3/poolsave/yval9000/test/tmp/17-04-08-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 39u REG 253,13 13576509440 1372 /users3/poolsave/yval9000/test/tmp/18-04-00-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 40u REG 253,13 13591869440 1377 /users3/poolsave/yval9000/test/tmp/18-04-02-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 41u REG 253,13 13592371200 1382 /users3/poolsave/yval9000/test/tmp/18-04-04-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 42u REG 253,13 13512202240 1387 /users3/poolsave/yval9000/test/tmp/18-04-11-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 43u REG 253,13 13528012800 1392 /users3/poolsave/yval9000/test/tmp/18-04-19-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 44u REG 253,13 13547735040 1397 /users3/poolsave/yval9000/test/tmp/18-04-22-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 45u REG 253,13 13574236160 1402 /users3/poolsave/yval9000/test/tmp/19-04-03-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 46u REG 253,13 13553295360 1407 /users3/poolsave/yval9000/test/tmp/19-04-05-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 47u REG 253,13 13542963200 1416 /users3/poolsave/yval9000/test/tmp/19-04-13-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 48u REG 253,13 13556346880 1421 /users3/poolsave/yval9000/test/tmp/19-04-15-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 49u REG 253,13 13517445120 1426 /users3/poolsave/yval9000/test/tmp/19-04-16-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 50u REG 253,13 13515581440 1443 /users3/poolsave/yval9000/test/tmp/19-04-17-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 51u REG 253,13 13575065600 1448 /users3/poolsave/yval9000/test/tmp/20-04-05-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 52u REG 253,13 13578659840 1453 /users3/poolsave/yval9000/test/tmp/20-04-06-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 53u REG 253,13 13575741440 1458 /users3/poolsave/yval9000/test/tmp/20-04-10-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 54u REG 253,13 13580052480 1463 /users3/poolsave/yval9000/test/tmp/20-04-11-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 55u REG 253,13 13578004480 1468 /users3/poolsave/yval9000/test/tmp/20-04-15-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 56u REG 253,13 13597194240 1473 /users3/poolsave/yval9000/test/tmp/20-04-18-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 57u REG 253,13 13558159360 1478 /users3/poolsave/yval9000/test/tmp/20-04-20-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 58u REG 253,13 13583390720 1483 /users3/poolsave/yval9000/test/tmp/21-04-01-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 59u REG 253,13 13542072320 1504 /users3/poolsave/yval9000/test/tmp/21-04-10-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 60u REG 253,13 13563269120 1509 /users3/poolsave/yval9000/test/tmp/21-04-15-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 61u REG 253,13 13565573120 1514 /users3/poolsave/yval9000/test/tmp/21-04-16-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 62u REG 253,13 13570037760 1519 /users3/poolsave/yval9000/test/tmp/21-04-17-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 63u REG 253,13 13576038400 1711 /users3/poolsave/yval9000/test/tmp/21-04-18-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 64u REG 253,13 13599846400 1724 /users3/poolsave/yval9000/test/tmp/22-04-00-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 65u REG 253,13 13554862080 1746 /users3/poolsave/yval9000/test/tmp/22-04-03-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 67u REG 253,13 13537761280 1759 /users3/poolsave/yval9000/test/tmp/22-04-16-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 68u REG 253,13 13539768320 1772 /users3/poolsave/yval9000/test/tmp/22-04-18-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 69u REG 253,13 13539614720 1781 /users3/poolsave/yval9000/test/tmp/23-04-02-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 70u REG 253,13 13544693760 1794 /users3/poolsave/yval9000/test/tmp/23-04-05-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 71u REG 253,13 13488281600 1807 /users3/poolsave/yval9000/test/tmp/23-04-10-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 72u REG 253,13 13504133120 1816 /users3/poolsave/yval9000/test/tmp/23-04-11-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 73u REG 253,13 13488353280 1851 /users3/poolsave/yval9000/test/tmp/24-04-14-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 74u REG 253,13 13647964160 1864 /users3/poolsave/yval9000/test/tmp/24-04-15-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 75u REG 253,13 14598379520 1877 /users3/poolsave/yval9000/test/tmp/24-04-16-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 76u REG 253,13 13529159680 1886 /users3/poolsave/yval9000/test/tmp/25-04-02-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 77u REG 253,13 13544314880 1899 /users3/poolsave/yval9000/test/tmp/25-04-14-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 78u REG 253,13 13547100160 1912 /users3/poolsave/yval9000/test/tmp/25-04-15-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 79u REG 253,13 13549813760 1921 /users3/poolsave/yval9000/test/tmp/25-04-17-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 80u REG 253,13 13538140160 1934 /users3/poolsave/yval9000/test/tmp/25-04-23-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 81u REG 253,13 13538641920 1947 /users3/poolsave/yval9000/test/tmp/26-04-16-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 82u REG 253,13 13543915520 1956 /users3/poolsave/yval9000/test/tmp/26-04-17-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 83u REG 253,13 13551063040 1969 /users3/poolsave/yval9000/test/tmp/26-04-18-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 84u REG 253,13 13561364480 1982 /users3/poolsave/yval9000/test/tmp/26-04-20-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 85u REG 253,13 13486448640 1991 /users3/poolsave/yval9000/test/tmp/27-04-07-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 86u REG 253,13 13492623360 2004 /users3/poolsave/yval9000/test/tmp/27-04-08-00_test_glusterfs_save.tar (deleted)
glusterfs 2365 root 87u REG 253,13 8968806400 2017 /users3/poolsave/yval9000/test/tmp/27-04-09-00_test_glusterfs_save.tar (deleted)
ksh 18908 u347750 3u REG 253,6 28 45 /tmp/ast3c.bh3 (deleted)
Each tar archive is around 15 GB.
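To gauge how much of the df/du gap these held-open files account for, one rough approach is to total the SIZE column of the lsof output above; a sketch (assuming the default lsof column layout, where the size is the seventh field):

lsof -n -P | grep '(deleted)' | awk '{ sum += $7 } END { printf "%.1f GB held by deleted files\n", sum / 1e9 }'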
Restarting the daemon seems to solve the problem:
ylal3510:/users3 # du -s -h .
129G .
ylal3510:/users3 # df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/users-users3vol
858G 130G 729G 16% /users3
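Which daemon gets restarted isn't spelled out above; on a GlusterFS 3.x setup, one way to make the brick process drop its stale descriptors would be to bounce the volume from any of the servers (a sketch only; stopping the volume interrupts all clients):

gluster volume stop poolsave
gluster volume start poolsave

Restarting just the management daemon (for example with /etc/init.d/glusterd restart) may not restart the glusterfsd brick processes that are actually holding the deleted files open.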
But my main concern is: "Is this normal behavior for GlusterFS?"
From: sokar6012 at hotmail.com
To: gluster-users-request at gluster.org
Subject: remote operation failed: No space left on device
Date: Fri, 27 Apr 2012 11:41:01 +0000
Hi all,
I've got an issue: it seems that the size reported by df -h grows
indefinitely. Any help would be appreciated.
Some details:
On the client:
yval9000:/users98 # df -h .
Filesystem Size Used Avail Use% Mounted on
ylal3510:/poolsave/yval9000
1.7T 1.7T 25G 99% /users98
yval9000:/users98 # du -ch .
5.1G /users98
My logs are full of:
[2012-04-27 12:14:32.402972] I [client3_1-fops.c:683:client3_1_writev_cbk]
0-poolsave-client-1: remote operation failed: No space left on device
[2012-04-27 12:14:32.426964] I [client3_1-fops.c:683:client3_1_writev_cbk]
0-poolsave-client-1: remote operation failed: No space left on device
[2012-04-27 12:14:32.439424] I [client3_1-fops.c:683:client3_1_writev_cbk]
0-poolsave-client-1: remote operation failed: No space left on device
[2012-04-27 12:14:32.441505] I [client3_1-fops.c:683:client3_1_writev_cbk]
0-poolsave-client-0: remote operation failed: No space left on device
This is my volume config:
Volume Name: poolsave
Type: Distributed-Replicate
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: ylal3510:/users3/poolsave
Brick2: ylal3530:/users3/poolsave
Brick3: ylal3520:/users3/poolsave
Brick4: ylal3540:/users3/poolsave
Options Reconfigured:
nfs.enable-ino32: off
features.quota-timeout: 30
features.quota: off
performance.cache-size: 6GB
network.ping-timeout: 60
performance.cache-min-file-size: 1KB
performance.cache-max-file-size: 4GB
performance.cache-refresh-timeout: 2
nfs.port: 2049
performance.io-thread-count: 64
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
Space left on servers:
ylal3510:/users3 # df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/users-users3vol
858G 857G 1.1G 100% /users3
ylal3510:/users3 # du -ch /users3 | grep total
129G total
---
ylal3530:/users3 # df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/users-users3vol
858G 857G 1.1G 100% /users3
ylal3530:/users3 # du -ch /users3 | grep total
129G total
---
ylal3520:/users3 # df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/users-users3vol
858G 835G 24G 98% /users3
ylal3520:/users3 # du -ch /users3 | grep total
182G total
---
ylal3540:/users3 # df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/users-users3vol
858G 833G 25G 98% /users3
ylal3540:/users3 # du -ch /users3 | grep total
181G total
This issue appeared after the two scripts below had been running for two weeks.
test_save.sh runs every hour: it compresses a bunch of data into a tar archive
(in REP_SAVE_TEMP) and then moves it into a folder (REP_SAVE) that the
netback.sh script scans every 30 minutes.
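The scheduling mechanism isn't shown; if cron drives the two scripts, the entries might look roughly like this (a sketch, with hypothetical script paths):

0 * * * *    /path/to/test_save.sh
0,30 * * * * /path/to/netback.sh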
#!/usr/bin/ksh
# ________________________________________________________________________
#             |
# Name        | test_save.sh
# ____________|___________________________________________________________
#             |
# Description | GlusterFS test
# ____________|___________________________________________________________
UNIXSAVE=/users98/test
REP_SAVE_TEMP=${UNIXSAVE}/tmp
REP_SAVE=${UNIXSAVE}/gluster
LOG=/users/glusterfs_test
f_tar_mv()
{
    echo "\n"
    ARCHNAME=${REP_SAVE_TEMP}/`date +%d-%m-%H-%M`_${SUBNAME}.tar

    tar -cpvf ${ARCHNAME} ${REPERTOIRE}
    echo "creation of ${ARCHNAME}"

    # mv ${REP_SAVE_TEMP}/*_${SUBNAME}.tar ${REP_SAVE}
    mv ${REP_SAVE_TEMP}/* ${REP_SAVE}
    echo "Moving archive in ${REP_SAVE} "
    echo "\n"

    return $?
}
REPERTOIRE="/users2/"
SUBNAME="test_glusterfs_save"
f_tar_mv >$LOG/save_`date +%d-%m-%Y-%H-%M`.log 2>&1
#!/usr/bin/ksh
# ________________________________________________________________________
#             |
# Name        | netback.sh
# ____________|___________________________________________________________
#             |
# Description | GlusterFS backup test
# ____________|___________________________________________________________
UNIXSAVE=/users98/test
REP_SAVE_TEMP=${UNIXSAVE}/tmp
REP_SAVE=${UNIXSAVE}/gluster
LOG=/users/glusterfs_test
f_net_back()
{
    if [[ `find ${REP_SAVE} -type f | wc -l` -eq 0 ]]
    then
        echo "nothing to save"
    else
        echo "Simulation netbackup, tar in /dev/null"
        tar -cpvf /dev/null ${REP_SAVE}/*
        echo "deletion archive"
        rm ${REP_SAVE}/*
    fi
    return $?
}
f_net_back >${LOG}/netback_`date +%d-%m-%H-%M`.log 2>&1
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://supercolony.gluster.org/pipermail/gluster-users/attachments/20120427/17e81daf/attachment.html>