Sabyasachi Ruj
2012-Jun-12 12:22 UTC
[Gluster-users] Coherency problem with file buffer cache
Hi,

I am having a problem with glusterfs and the kernel file buffer cache. I've been able to reduce the problem to a simple case that demonstrates the behaviour:

I have an app-1 on client1. This application reads page x of a file, and the contents of that page get stored in the kernel file buffer cache of client1. app-1 periodically checks for updates to the contents of page x. Now an app-2 on client2 updates page x of the same file and does an fsync. The problem is that app-1 never sees the update made by app-2. I guess this is because the cached page never gets marked dirty on client1.

Is there any way to guarantee that once fsync has been called from any client, all clients will see the latest contents?

I have written a C program to reproduce this problem if that helps. You can find the program here: http://snipt.org/vaId9

Download the program into a mounted gluster volume. You need at least two clients. Compile the program using:

# g++ -g -o test <filename.cpp>

Then from client1:

# test samplefile verify

It will print 'A' on the screen, which is the current content of the file samplefile. Then from client2 run:

# test samplefile update B

This will update the content of the file samplefile to 'B'. You will see that client1 still shows 'A'. This does not happen when you update the file from the same client where verify is running, that is client1.

I know that direct I/O mode can help to a certain extent, but it does not guarantee an atomic transaction either.

--
Sabyasachi
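P.S. In case the snipt.org link goes away: the verify/update logic boils down to roughly the sketch below. This is a condensed illustration, not the exact program from the link; the real test also takes file locks around each read and write, and the names here are made up.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define PAGE_SIZE 2048

int main(int argc, char **argv)
{
    char buf[PAGE_SIZE];

    if (argc < 3) {
        fprintf(stderr, "usage: %s <file> verify | %s <file> update <char>\n",
                argv[0], argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    if (strcmp(argv[2], "verify") == 0) {
        /* client1: keep re-reading the first page of the file */
        for (;;) {
            if (pread(fd, buf, PAGE_SIZE, 0) > 0)
                printf("%c\n", buf[0]);
            sleep(1);
        }
    } else if (strcmp(argv[2], "update") == 0 && argc > 3) {
        /* client2: overwrite the first page and flush it */
        memset(buf, argv[3][0], PAGE_SIZE);
        pwrite(fd, buf, PAGE_SIZE, 0);
        fsync(fd);   /* data reaches the brick, but client1 keeps
                        serving the stale page from its own cache */
    }

    close(fd);
    return 0;
}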
Sabyasachi Ruj
2012-Jun-12 12:26 UTC
[Gluster-users] Coherency problem with file buffer cache
The link to the C program is: http://snipt.org/vaId9

The full-stop at the end of the link in my previous mail was causing problems in some mail clients.
Brian Candler
2012-Jun-12 13:35 UTC
[Gluster-users] Coherency problem with file buffer cache
On Tue, Jun 12, 2012 at 05:52:37PM +0530, Sabyasachi Ruj wrote:
> This will update the content of the file samplefile to 'B'. You will
> see that client1 still shows 'A'. This does not happen when you
> update the file from the same client where verify is running, that
> is client1.
>
> I know that direct I/O mode can help to a certain extent, but it does
> not guarantee an atomic transaction either.

I can confirm the behaviour you're seeing (with 3.3.0). Running strace on the verify process shows it doing read(3) = 2048 every time; however, an strace on the glusterfs (FUSE) process shows only a single 2048+ byte transfer, the first time:

writev(7, [{"\20\10\0\0\0\0\0\0\367\1\0\0\0\0\0\0", 16}, {"DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD"..., 2048}], 2) = 2064

However, closing and opening the file again makes it see the new contents:

--- gtest.c.orig	2012-06-12 14:23:04.399365721 +0100
+++ gtest.c	2012-06-12 14:31:37.811383033 +0100
@@ -134,6 +134,8 @@
     if (strcmp(argv[2], "verify") == 0) {
         int i = 0;
         while(continue_loop) {
+            close(fd);
+            fd = open_file(argv[1], &is_new_file);
             lockFile(fd);
             page_read(fd, buffer_r);
             unlockFile(fd);

Regards,

Brian.
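P.S. The same workaround in isolation, without the rest of the test harness, looks roughly like this (plain open()/pread() here instead of the helpers from gtest.c; names and the 2048-byte page size are just illustrative):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    char buf[2048];

    if (argc < 2)
        return 1;

    for (;;) {
        int fd = open(argv[1], O_RDONLY);   /* fresh descriptor each pass */
        if (fd < 0) {
            perror("open");
            return 1;
        }
        if (pread(fd, buf, sizeof buf, 0) > 0)
            printf("%c\n", buf[0]);
        close(fd);                          /* drop it again before sleeping */
        sleep(1);
    }
}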
Sabyasachi Ruj
2012-Jun-12 14:08 UTC
[Gluster-users] Coherency problem with file buffer cache
I don't have any such file on the client, though I do have one on the server. Its contents are:

volume testvol-client-0
    type protocol/client
    option remote-host server1
    option remote-subvolume /bricks/0001
    option transport-type tcp
end-volume

volume testvol-io-cache
    type performance/io-cache
    subvolumes testvol-client-0
end-volume

volume testvol
    type debug/io-stats
    option latency-measurement off
    option count-fop-hits off
    subvolumes testvol-io-cache
end-volume

On 12 June 2012 19:22, Brian Candler <B.Candler at pobox.com> wrote:
> What does /var/lib/glusterd/vols/single1/<volname>-fuse.vol contain on the
> client? Is the performance/io-cache translator present?
>
> I'm afraid I don't know the 'official' way to disable this (e.g. mount
> option?)

--
Sabyasachi
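P.S. Since the io-cache translator does show up in the graph above, one thing I am going to try (I have not yet verified that it changes anything here, because the kernel page cache on the reading client may still be the real culprit) is switching that translator off from the server side, something like:

# gluster volume set testvol performance.io-cache off

with 'testvol' being the volume name taken from the volfile above.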
On 06/12/2012 08:22 AM, Sabyasachi Ruj wrote:
> I am having a problem with glusterfs and the kernel file buffer cache.
> I've been able to reduce the problem to a simple case that demonstrates
> the behaviour:
> ...

I tried running this with RHS 2.0+ (what you would know as GlusterFS 3.3+, from current git HEAD) under RHEL 6.2 on my machines. What I saw was that FUSE *doesn't even send us the read* in most cases, not even reliably when we poke /proc/sys/vm/drop_caches (which seems to me like it should *always* result in a read). It shows up in strace because that's before FUSE, but we're after FUSE. If I set a breakpoint on fuse_readv_resume, for example, I don't reach it until we do an open for another process - at which point the original process does see the new data.

This can be avoided by using attribute-timeout=0 and entry-timeout=0, which are unfortunately available in the "glusterfs" command but not in mount.glusterfs AFAICT. With these options we still don't see the change immediately, but we do get called when drop_caches is invoked.
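For reference, since mount.glusterfs doesn't seem to expose those options, the FUSE client can be started directly; the invocation should look roughly like this (the server name, volume id and mount point below are placeholders for your own setup):

# glusterfs --volfile-server=server1 --volfile-id=testvol \
      --attribute-timeout=0 --entry-timeout=0 /mnt/testvol

As noted above, even with these options the change still isn't visible immediately; they mainly ensure the client gets asked again once the kernel caches are dropped.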