Hi everyone,

I am trying to run multiple jobs with the fio benchmark on a replica volume with 3 bricks, but some hours later the warning message "W [socket.c:195:__socket_rwv] 0-tcp.ida-server: readv failed (Connection timed out)" appears in the brick logs. I think this warning may be due to the high workload: under heavy load glusterfs cannot respond on the socket in time. So I added code in rpc/rpc-transport/socket/src/socket.c to raise the socket timeout thresholds (SO_RCVTIMEO is now 180s and the keepalive time is 300s) and ran the workload again, but it did not help.

My test environment is as follows:

Three nodes work as a gluster cluster. Each node has 16GB of memory, an 8-core 3.3GHz CPU, two 10000baseT/full and one 1000baseT/full network cards, and uses 16 x 2TB disks in RAID5 as its brick. The glusterfs version is 3.3.1.

I created a 1*3 replica volume on these three nodes. Every node mounts the volume with FUSE over a 10000baseT/full network card. At the same time, every node mounts the FUSE mount point via CIFS over the other 10000baseT/full card.

Each node runs two fio scripts, a read job and a write job. Both scripts operate on the CIFS mount point. The scripts are as follows:

write jobs:

    while true
    do
        mkdir -p ${DIR}_write_${i}
        /usr/local/bin/fio --ioengine=libaio --iodepth=256 --numjobs=100 --rw=write --bs=1k --size=1000m --directory=${DIR}_write_${i} --name=job01_1k_write >> ${DIR}_write_${i}/job01_1k_write.log
        i=`expr $i + 1`
    done

read jobs:

    while true
    do
        mkdir -p ${DIR}_read_${i}
        /usr/local/bin/fio --ioengine=libaio --iodepth=256 --numjobs=100 --rw=read --bs=1k --size=1000m --directory=${DIR}_read_${i} --name=job01_1k_read >> ${DIR}_read_${i}/job01_1k_read.log
        i=`expr $i + 1`
    done

I have also changed iodepth from 256 to 16 and numjobs from 100 to 25, but it still does not help. Has anybody else looked into this problem?
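A side note on the timeout itself (not from the original post): SO_RCVTIMEO normally makes a blocked read return EAGAIN rather than ETIMEDOUT, so the "Connection timed out" that readv reports usually comes from the kernel's TCP layer (retransmission or keepalive giving up), which a patch to socket.c may not reach. A rough sketch of the system-wide knobs that cover the same ground is below; the 300s value mirrors the post, the other numbers are illustrative assumptions (two are simply the kernel defaults), and glusterfs may still set its own per-socket keepalive values on top of these:

    # Sketch only: kernel-level TCP keepalive / retransmission tuning, run on each node.
    sysctl -w net.ipv4.tcp_keepalive_time=300    # idle seconds before the first keepalive probe (mirrors the 300s from the post)
    sysctl -w net.ipv4.tcp_keepalive_intvl=30    # seconds between keepalive probes (illustrative value)
    sysctl -w net.ipv4.tcp_keepalive_probes=9    # failed probes before the connection is dropped (kernel default)
    sysctl -w net.ipv4.tcp_retries2=15           # retransmission attempts before an established connection times out (kernel default)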
Pranith Kumar Karampuri
2014-Nov-11 05:58 UTC
[Gluster-users] socket time out in fio benchmark
On 11/11/2014 11:21 AM, shuau li wrote:
> Hi everyone,
>
> I am trying to run multiple jobs with the fio benchmark on a replica volume
> with 3 bricks, but some hours later the warning message "W
> [socket.c:195:__socket_rwv] 0-tcp.ida-server: readv failed (Connection timed
> out)" appears in the brick logs. I think this warning may be due to the high
> workload: under heavy load glusterfs cannot respond on the socket in time.
> So I added code in rpc/rpc-transport/socket/src/socket.c to raise the socket
> timeout thresholds (SO_RCVTIMEO is now 180s and the keepalive time is 300s)
> and ran the workload again, but it did not help.
>
> My test environment is as follows:
>
> Three nodes work as a gluster cluster. Each node has 16GB of memory, an
> 8-core 3.3GHz CPU, two 10000baseT/full and one 1000baseT/full network cards,
> and uses 16 x 2TB disks in RAID5 as its brick. The glusterfs version is 3.3.1.

Just wanted to know if there is a possibility of using newer versions. No new releases are being made from 3.3.x. Could you try the case on newer versions and let us know your findings? If for some reason you can't upgrade, then we can probably start debugging this on 3.3.x, but I highly recommend you use a version that is active, i.e. at least 3.4.x.

Pranith

> I created a 1*3 replica volume on these three nodes. Every node mounts the
> volume with FUSE over a 10000baseT/full network card. At the same time,
> every node mounts the FUSE mount point via CIFS over the other
> 10000baseT/full card.
>
> Each node runs two fio scripts, a read job and a write job. Both scripts
> operate on the CIFS mount point. The scripts are as follows:
>
> write jobs:
>
>     while true
>     do
>         mkdir -p ${DIR}_write_${i}
>         /usr/local/bin/fio --ioengine=libaio --iodepth=256 --numjobs=100 --rw=write --bs=1k --size=1000m --directory=${DIR}_write_${i} --name=job01_1k_write >> ${DIR}_write_${i}/job01_1k_write.log
>         i=`expr $i + 1`
>     done
>
> read jobs:
>
>     while true
>     do
>         mkdir -p ${DIR}_read_${i}
>         /usr/local/bin/fio --ioengine=libaio --iodepth=256 --numjobs=100 --rw=read --bs=1k --size=1000m --directory=${DIR}_read_${i} --name=job01_1k_read >> ${DIR}_read_${i}/job01_1k_read.log
>         i=`expr $i + 1`
>     done
>
> I have also changed iodepth from 256 to 16 and numjobs from 100 to 25, but
> it still does not help. Has anybody else looked into this problem?

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
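An aside for anyone reproducing this on a newer release (not from the thread): the installed version and volume layout can be reported with the standard CLI, and GlusterFS exposes its own timeout tunable as a volume option, which avoids patching socket.c. The volume name ida below is an assumption taken from the 0-tcp.ida-server log prefix, and network.ping-timeout is a client-side setting, so it may not change the brick-side readv error:

    glusterfs --version                              # confirm the installed version (3.3.1 in the original report)
    gluster volume info ida                          # volume layout; "ida" is assumed from the log prefix
    gluster volume set ida network.ping-timeout 180  # client-side timeout in seconds; the default is 42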