Kingsley
2015-Apr-10 08:31 UTC
[Gluster-users] 3.6.2, file-write events out of order - data missing temporarily
Hi,

We're running gluster 3.6.2 on CentOS 7, using a replicate-only volume with 4-way replication.

We have 10 hosts mounting the volume: 6 running CentOS 6 that submit jobs to a "to-process" directory on the gluster volume, and 4 running CentOS 7 that process entries from that directory.

So that the 4 "processor" machines don't read partly written files, the submitting machines first write each job file to a tmpspool subdirectory (a subdirectory of the to-process directory on the gluster volume) and then, once it is fully written, move it into the main to-process directory, e.g.:

    cp /localdir/job1234.txt /mnt/gv0/to_process/tmpspool
    mv /mnt/gv0/to_process/tmpspool/job1234.txt /mnt/gv0/to_process

These job files are small (less than 500 bytes).

However, if one of the processor machines picks up a file shortly after it appears, it sees a shorter (i.e. not fully written) file. If it waits a few seconds and tries again, the file is complete.

Is this a known bug that might be fixed in 3.6.3, or is it a new issue?

In one case I saw recently, a 441-byte file was moved from tmpspool into to_process by the client machine, but one of the processing machines read it from to_process as a 391-byte file with the last 2 lines missing. Read again 3 seconds later, the file had all of its data in place.

Curiously, when data is missing, it's always whole lines; the temporarily short file never seems to end halfway through a line of text.

Cheers,
Kingsley.
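The submit pattern described above, and the "wait a few seconds and read again" behaviour on the processor side, can be sketched as two bash functions. This is a minimal sketch, assuming bash and GNU coreutils on the clients; the function names submit_job and wait_until_stable are hypothetical, and the paths default to the ones in the post. wait_until_stable is a diagnostic heuristic, not something gluster provides.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the stage-then-rename submit step and a
# processor-side size check. Paths default to the ones used in the post.

submit_job() {
    # usage: submit_job LOCAL_FILE [TO_PROCESS_DIR]
    local src=$1 dir=${2:-/mnt/gv0/to_process}
    local name
    name=$(basename -- "$src")
    # 1. Write the complete file into the staging subdirectory.
    cp -- "$src" "$dir/tmpspool/$name" || return
    # 2. Publish it. tmpspool is on the same mount, so mv renames the file
    #    instead of copying it; the intent is that readers of to_process
    #    only ever see the completed file under its final name.
    mv -- "$dir/tmpspool/$name" "$dir/$name"
}

wait_until_stable() {
    # usage: wait_until_stable FILE [INTERVAL_SECONDS] [MAX_CHECKS]
    # Prints the size once two consecutive reads agree; returns 1 if the
    # size is still changing after MAX_CHECKS reads.
    local file=$1 interval=${2:-1} max=${3:-5}
    local prev cur i
    prev=$(wc -c < "$file") || return
    for ((i = 1; i < max; i++)); do
        sleep "$interval"
        cur=$(wc -c < "$file") || return
        if [ "$cur" -eq "$prev" ]; then
            echo "$cur"
            return 0
        fi
        # Log the kind of change reported in the post (e.g. 391 -> 441 bytes).
        echo "size changed: $prev -> $cur bytes" >&2
        prev=$cur
    done
    return 1
}
```

For example, a processor could run `size=$(wait_until_stable /mnt/gv0/to_process/job1234.txt 1 5)` before reading a job. This only narrows the window in which a short file is seen: a file that stays short for longer than the check interval (the post mentions one that was still short for some time before being complete 3 seconds later) would still pass, and it does not address whatever is making the data appear late.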
Krutika Dhananjay
2015-Apr-10 11:03 UTC
[Gluster-users] 3.6.2, file-write events out of order - data missing temporarily
Hi,

So are the "submitter" clients running exactly the commands you pasted, i.e.

    cp /localdir/job1234.txt /mnt/gv0/to_process/tmpspool

followed by

    mv /mnt/gv0/to_process/tmpspool/job1234.txt /mnt/gv0/to_process

in a loop? Or are they perhaps executing a hand-written program that open()s the file, write()s to it, and then calls rename()?

Also, is this issue hit if you turn off flush-behind (by running `gluster volume set <VOLNAME> performance.flush-behind off`)?

-Krutika
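The flush-behind test suggested above can be wrapped so that it is easy to undo afterwards. This is a minimal sketch in bash; set_flush_behind, reset_flush_behind and the DRY_RUN switch are hypothetical helpers, not part of gluster, and the volume name gv0 in the usage line is only a guess from the /mnt/gv0 mount point (the thread gives just <VOLNAME>). As we understand the option, flush-behind lets the write-behind translator acknowledge the flush issued on close() before it has been sent to the bricks, so turning it off should make close() (and therefore cp) wait for it.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for toggling performance.flush-behind during an A/B test.
# Assumption: run on a gluster server node by a user allowed to change volume
# options. Set DRY_RUN=1 to print the commands instead of executing them.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"   # dry run: show the command only
    else
        "$@"
    fi
}

set_flush_behind() {
    # usage: set_flush_behind VOLNAME on|off
    local vol=$1 state=$2
    if [ -z "$vol" ]; then
        echo "usage: set_flush_behind VOLNAME on|off" >&2
        return 2
    fi
    case $state in
        on|off) ;;
        *) echo "usage: set_flush_behind VOLNAME on|off" >&2; return 2 ;;
    esac
    run gluster volume set "$vol" performance.flush-behind "$state"
}

reset_flush_behind() {
    # usage: reset_flush_behind VOLNAME  (restores the option's default value)
    local vol=$1
    if [ -z "$vol" ]; then
        echo "usage: reset_flush_behind VOLNAME" >&2
        return 2
    fi
    run gluster volume reset "$vol" performance.flush-behind
}
```

For example: `set_flush_behind gv0 off`, rerun the submit/process workload while watching for short reads, then `reset_flush_behind gv0` to go back to the default.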