Hi,

I am running several clients writing to local disk that are set up to replicate data to N glusterfs server processes that use NFS for storing this data. I have the following questions:

1. Does every write by a process to a local directory managed by the glusterfs client involve synchronous replication of data to the glusterfs server?

2. If NFS is slow on the glusterfs server side, is it possible that a process writing to local disk on the glusterfs client's machine would block?

Thanks.

Serge

Serge,

I am assuming you are not using source code from the repository. If you are using source code directly from the repository, the answers below might not hold. Please mention the version if you are building from the source repository.

On Thu, Nov 20, 2008 at 6:13 AM, Aleynikov, Serge <Serge.Aleynikov at gs.com> wrote:

> Hi,
>
> I am running several clients writing to local disk that are set up to
> replicate data to N glusterfs server processes that use NFS for storing this
> data. I have the following questions:
>
> 1. Does every write of a process to a local directory managed by glusterfs
> client involve synchronous replication of data to glusterfs server?

Yes, it involves synchronous replication. glusterfs--mailine--3.0--patch-621 (onwards) in our source repository has an AFR feature called quick-unwind, which would make replication asynchronous.

> 2. Is it possible that if NFS is slow on the glusterfs server side, then
> writing to local disk of a process on the glusterfs client's machine would
> block?

Yes, fops (file operations) on glusterfs will block.

> Thanks.
>
> Serge

--
gowda

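To make the blocking behaviour concrete, here is a minimal Python sketch of a synchronous replicated write. It is a toy simulation, not glusterfs code: the function name `replicated_write` and the per-replica delays are made up for illustration. It only shows that a caller waiting for every replica is held up by the slowest one.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def replicated_write(replica_delays):
    """Simulate one synchronous write fanned out to several replicas.

    replica_delays holds a hypothetical per-replica latency in seconds.
    The call returns only after every replica has acknowledged the write.
    """
    def write_to_replica(delay):
        time.sleep(delay)  # stand-in for the real network + disk write
        return delay

    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=len(replica_delays)) as pool:
        # list() forces us to wait for all acknowledgements.
        acks = list(pool.map(write_to_replica, replica_delays))
    elapsed = time.monotonic() - start
    return acks, elapsed


if __name__ == "__main__":
    # One fast replica and one slow (for example NFS-backed) replica.
    acks, elapsed = replicated_write([0.01, 0.2])
    print(f"acknowledged by {len(acks)} replicas after {elapsed:.2f}s")
```

The elapsed time tracks the slowest replica, which is why a slow backend on one server can stall writes issued on the client.
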
I'm now also getting a bunch of these types of log messages:

[posix.c:2493:posix_fxattrop] home1: 13: Numerical result out of range

One of the servers is 64-bit and the other is 32-bit. Is it possible there's a problem here? Is some serial number on an attribute larger than 32 bits?

Keith

At 01:21 AM 11/22/2008, Keith Freedman wrote:
>I'm running 1.4.0qa63
>
>it seems that the problems I reported long ago are still present.
>
>so I have 2 nodes AFR'ing each other.
>
>when one is offline the other basically doesn't work.
>
>I get error log entries with (Invalid Argument) for any file which
>doesn't exist when it does a lookup.
>
>this is a problem for the webserver because it tries to find a
>.htaccess file.. and instead of being told by gluster it doesn't
>exist, it produces some error, and so apache reports back
>forbidden until the other server goes back online.
>
>if I change the AFR brick to NOT include the other server, and
>remount then all is well in the world, but if the other (offline)
>server is in the afr subvolumes list then I get this problem.
>
>also, I'm noticing a LOT of messages on both servers about impunging
>and expunging directories. this seems to be happening a lot more
>frequently than makes sense, but I don't know what this does really
>but I know that sometimes this takes a VERY long time and things
>timeout. (mostly on large directories).

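The 32-bit versus 64-bit question can be illustrated with a small Python sketch. It only demonstrates the general idea behind the hypothesis (a value that fits in 64 bits may not fit in 32); the thread does not confirm that this is what posix_fxattrop is hitting, and the helper name `fits_in_signed_32` is made up for illustration.

```python
import errno
import os
import struct


def fits_in_signed_32(value):
    """Return True if value can be packed as a signed 32-bit integer."""
    try:
        struct.pack("<i", value)  # "<i" is a 4-byte signed integer
        return True
    except struct.error:
        return False


if __name__ == "__main__":
    counter = 2**31  # hypothetical attribute value written by a 64-bit host
    print("fits in 32 bits:", fits_in_signed_32(counter))
    print("fits in 64 bits:", len(struct.pack("<q", counter)) == 8)
    # ERANGE is the errno that the C library reports as an out-of-range result.
    print("ERANGE message:", os.strerror(errno.ERANGE))
```
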
Keith,

Can you mail me your log files, both server and client?

avati

2008/11/22 Keith Freedman <freedman at freeformit.com>

> I'm now also getting a bunch of these types of log messages:
> [posix.c:2493:posix_fxattrop] home1: 13: Numerical result out of range
>
> one of the servers is 64 bit and the other 32 bit.. is it possible
> there's a problem here? some serial number on an attribute is larger
> than 32 bits?
>
> Keith
>
> At 01:21 AM 11/22/2008, Keith Freedman wrote:
> >I'm running 1.4.0qa63
> >
> >it seems that the problems I reported long ago are still present.
> >
> >so I have 2 nodes AFR'ing each other.
> >
> >when one is offline the other basically doesn't work.
> >
> >I get error log entries with (Invalid Argument) for any file which
> >doesn't exist when it does a lookup.
> >
> >this is a problem for the webserver because it tries to find a
> >.htaccess file.. and instead of being told by gluster it doesn't
> >exist, it produces some error, and so apache reports back
> >forbidden until the other server goes back online.
> >
> >if I change the AFR brick to NOT include the other server, and
> >remount then all is well in the world, but if the other (offline)
> >server is in the afr subvolumes list then I get this problem.
> >
> >also, I'm noticing a LOT of messages on both servers about impunging
> >and expunging directories. this seems to be happening a lot more
> >frequently than makes sense, but I don't know what this does really
> >but I know that sometimes this takes a VERY long time and things
> >timeout. (mostly on large directories).

--
If I traveled to the end of the rainbow
As Dame Fortune did intend,
Murphy would be there to tell me
The pot's at the other end.

I'm also noticing a problem with file times using AFR.

It seems that the file times get set to the time the file was AFR'ed to the other server. The file times should (and used to, as far as I recall) be set to the time from the server the file originates from.

Here's what happens. We have a process which modifies a file at 1:17 on server1. This file gets AFR'ed to server 2, but it takes some time and the file gets there at 1:18. So the process which updated the file knows it was updated at 1:17; it now connects to the other server, sees that the file there is newer than it thinks it should be, and raises an error.

Also, I believe this is part of the problem with what I'm currently getting, which is a bunch of Input/Output errors from gluster itself. The error logs look like this:

[afr-self-heal-data.c:767:afr_sh_data_fix] home: Unable to resolve conflicting data of /XYZ/public_html/brokenfile. Please resolve manually by deleting the file /XYZ/public_html/brokenfile from all but the preferred subvolume
[fuse-bridge.c:605:fuse_fd_cbk] glusterfs-fuse: 3013026: OPEN() /XYZ/public_html/brokenfile => -1 (Input/output error)

The frustration is that in these cases both servers are on, active, and working, yet gluster seems to be causing its own problems. Again, I believe it's due to the timestamps on the underlying filesystem not being what is expected.

Please see comments/questions inline.

> I'm also noticing a problem with file times using AFR
>
> it seems that the file times get set to the time the file was AFR'ed
> to the other server.

Do you mean "heal"ed to the other server? In normal operation AFR modifies both servers together at the same time.

> here's what happens.
> we have a process which modifies a file at 1:17 on server1
> this file gets AFR'ed to server 2, but it takes some time and the
> file gets there at 1:18

What do you mean by 'a file is modified at 1:17 on server1'? Is it modifying the backend directly? Is it modifying from the mountpoint with server2 offline? Or are you just considering a network delay pushing the 'modification' to happen a minute late on server2?

> so, the process which updated the file knows it was updated at 1:17,
> it now connects to the other server and sees that the file there is
> newer than it thinks it should be so it raises an error.

As long as both servers are online, the times are returned from the first subvolume, so in both cases the process should see the mtime at 1:17.

> Also, I believe this is part of the problem with what I'm currently
> getting, which are a bunch of Input/Output errors from gluster itself.
> the error logs look like this:
> [afr-self-heal-data.c:767:afr_sh_data_fix] home: Unable to resolve
> conflicting data of /XYZ/public_html/brokenfile. Please resolve
> manually by deleting the file /XYZ/public_html/brokenfile from all
> but the preferred subvolume
> [fuse-bridge.c:605:fuse_fd_cbk] glusterfs-fuse: 3013026: OPEN()
> /XYZ/public_html/brokenfile => -1 (Input/output error)
>
> the frustration is that in these cases both servers are on and active
> and working yet, gluster seems to be causing its own
> problems. Again, I believe it's due to the timestamps on the
> underlying filesystem not being what is expected.

The EIO problem is unrelated to mtimes. We are already investigating the EIO problem.

avati

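One way to check whether the two copies of a file really carry different modification times is to compare their mtimes directly. The Python sketch below is only an illustration: `mtime_skew` is a hypothetical helper, and the demo fakes the 1:17 and 1:18 copies with temporary files. On a real setup the two arguments would be paths at which each server's copy of the file can be read.

```python
import os
import tempfile


def mtime_skew(path_a, path_b):
    """Return mtime(path_b) - mtime(path_a) in seconds."""
    return os.stat(path_b).st_mtime - os.stat(path_a).st_mtime


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        copy_on_server1 = os.path.join(tmp, "server1_copy")
        copy_on_server2 = os.path.join(tmp, "server2_copy")
        for path in (copy_on_server1, copy_on_server2):
            with open(path, "w") as handle:
                handle.write("same content\n")

        # Fake the scenario from the thread: the second copy is stamped
        # one minute later than the first.
        base = 1_227_000_000
        os.utime(copy_on_server1, (base, base))
        os.utime(copy_on_server2, (base + 60, base + 60))

        print("skew in seconds:", mtime_skew(copy_on_server1, copy_on_server2))
```

A nonzero skew between the copies would point at the copies themselves being stamped differently; a zero skew would point at whatever is reporting the time to the application.
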
At 10:17 AM 11/24/2008, Anand Avati wrote:
>Please see comments/questions inline.
>
>I'm also noticing a problem with file times using AFR
>
>it seems that the file times get set to the time the file was AFR'ed
>to the other server.
>
>
>Do you mean "heal"ed to the other server? In normal operation AFR
>modifies both servers together at the same time.

Well, I honestly don't know exactly what's happening. The servers shouldn't need to "heal" because they're never offline and never out of communication, so yes, it should be modifying both simultaneously, and yet I am constantly getting Input/Output errors.

>here's what happens.
>we have a process which modifies a file at 1:17 on server1
>this file gets AFR'ed to server 2, but it takes some time and the
>file gets there at 1:18
>
>
>What do you mean 'a file is modified at 1:17 on server1' ? Is it
>modifying the backend directly? Is it modifying from the mountpoint
>with server2 offline? Or are you just considering a network delay
>pushing the 'modification' to happen a minute late on server2?

I notice the problem most when I upload files with Dreamweaver via FTP. Later I'll update the same file and go to push it, and it will report back that the file was modified on the server at 1:18 while our last push to the server was at 1:17, so it thinks the file was modified.

Looking at the logs, it just seems things aren't working in general. These Input/Output errors shouldn't ever happen, and it's just odd. Everything is using the gluster mount point. The only time I touch the back end filesystem is when I delete because the log says "should be deleted from all but the preferred server", and since the servers are always communicating, I can't understand why this happens at all.

>so, the process which updated the file knows it was updated at 1:17,
>it now connects to the other server and sees that the file there is
>newer than it thinks it should be so it raises an error.
>
>
>As long as both the servers are online, the times are returned from
>the first subvolume, so in both the cases the process should see the
>mtime at 1:17.

It should, yes, but this is NOT what's happening.

>Also, I believe this is part of the problem with what I'm currently
>getting, which are a bunch of Input/Output errors from gluster itself.
>the error logs look like this:
>[afr-self-heal-data.c:767:afr_sh_data_fix] home: Unable to resolve
>conflicting data of /XYZ/public_html/brokenfile. Please resolve
>manually by deleting the file /XYZ/public_html/brokenfile from all
>but the preferred subvolume
>[fuse-bridge.c:605:fuse_fd_cbk] glusterfs-fuse: 3013026: OPEN()
>/XYZ/public_html/brokenfile => -1 (Input/output error)
>
>the frustration is that in these cases both servers are on and active
>and working yet, gluster seems to be causing its own
>problems. Again, I believe it's due to the timestamps on the
>underlying filesystem not being what is expected.
>
>
>The EIO problem is unrelated to mtimes. We are investigating the EIO
>problem already.

OK, it's good to know that it's not related. Hopefully this will be fixed soon too :)

>avati