Marcus Herou
2008-Sep-15 07:41 UTC
[Gluster-users] [Gluster-devel] GlusterFS drops mount point
Hi, posting to the user list as well since I think there are two different issues.

1. The mount point gets dropped after 12-24 hours. It seems like the connection goes stale, since if you do an ls or similar it hangs "forever".
2. A bad client config which spits out a lot of EIOs. We are onto this and will fix the config asap.

However, it is number 1 I'm really concerned about. We did heavy load tests with IOzone with the bad config (it actually works, but unify does not like it) and we got no errors from IOzone; on the contrary, we got quite nice throughput!

What would happen to glusterfs if the network between client and master went away for just a second once a day? I suspect that this could be the issue. It would be nice if glusterfs could "auto remount" like NFS.

Kindly

//Marcus

---------- Forwarded message ----------
From: Marcus Herou <marcus.herou at tailsweep.com>
Date: Sun, Sep 14, 2008 at 1:43 PM
Subject: Re: {Disarmed} Re: [Gluster-devel] GlusterFS drops mount point
To: "Amar S. Tumballi" <amar at zresearch.com>
Cc: Brian Taber <btaber at diversecg.com>, Gluster-devel at nongnu.org

Thanks a bunch! So this would mean that the mount point "loses" its connection?

Kindly

//Marcus

On Sat, Sep 13, 2008 at 7:49 PM, Amar S. Tumballi <amar at zresearch.com> wrote:

> this will always lead to EIO as it fails to meet the criteria for unify's
> functioning.
>
> Unify wants a file to be present on only one of its subvolumes, and in this
> case you have done afr of (v1 v2), (v2 v3), (v3 v1), which means that if a
> file is present on the (v1 v2) pair, it will be seen by the other two afrs
> too (v2 in the second pair, and v1 in the third pair). So unify sees the
> file present on all of its subvolumes, gets confused about which file to
> open, and returns EIO.
>
> the fix is, you need to export two volumes (instead of the currently
> present one) per server, and make pairs of (v1-1 v2-2), (v2-1 v3-2),
> (v3-1 v1-2). Hope I am clear.
>
> Regards,
>
>> Client:
>>
>> volume v1
>> type protocol/client
>> option transport-type tcp/client
>> option remote-host 192.168.10.30
>> option remote-subvolume home
>> end-volume
>>
>> volume v2
>> type protocol/client
>> option transport-type tcp/client
>> option remote-host 192.168.10.31
>> option remote-subvolume home
>> end-volume
>>
>> volume v3
>> type protocol/client
>> option transport-type tcp/client
>> option remote-host 192.168.10.32
>> option remote-subvolume home
>> end-volume
>>
>> volume afr-1
>> type cluster/afr
>> subvolumes v1 v2
>> end-volume
>>
>> volume afr-2
>> type cluster/afr
>> subvolumes v2 v3
>> end-volume
>>
>> volume afr-3
>> type cluster/afr
>> subvolumes v3 v1
>> end-volume
>>
>> volume ns1
>> type protocol/client
>> option transport-type tcp/client
>> option remote-host 192.168.10.30
>> option remote-subvolume home-namespace
>> end-volume
>>
>> volume ns2
>> type protocol/client
>> option transport-type tcp/client
>> option remote-host 192.168.10.31
>> option remote-subvolume home-namespace
>> end-volume
>>
>> volume ns3
>> type protocol/client
>> option transport-type tcp/client
>> option remote-host 192.168.10.32
>> option remote-subvolume home-namespace
>> end-volume
>>
>> volume namespace
>> type cluster/afr
>> subvolumes ns1 ns2 ns3
>> end-volume
>>
>> volume v
>> type cluster/unify
>> option scheduler rr
>> option namespace namespace
>> subvolumes afr-1 afr-2 afr-3
>> end-volume
>>
>> I really hope we have misconfigured something, since that is the easiest
>> fix :)
>>
>> Kindly
>>
>> //Marcus
>>
>> On Sat, Sep 13, 2008 at 12:50 AM, Amar S. Tumballi <amar at zresearch.com> wrote:
>>
>>> Also, which version of GlusterFS?
>>>
>>> Brian Taber <btaber at diversecg.com> wrote:
>>>
>>>> may be a configuration issue...
>>>> Let's start with the config: what does your config look like on client
>>>> and server?
>>>>
>>>> Marcus Herou wrote:
>>>>
>>>>> Lots of these on the server:
>>>>>
>>>>> 2008-09-12 20:48:14 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (192.168.10.4:1007)
>>>>> ...
>>>>> 2008-09-12 20:50:12 E [server-protocol.c:4153:server_closedir] server: not getting enough data, returning EINVAL
>>>>> ...
>>>>> 2008-09-12 20:50:12 E [server-protocol.c:4148:server_closedir] server: unresolved fd 6
>>>>> ...
>>>>> 2008-09-12 20:51:47 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (192.168.10.10:1015)
>>>>> ...
>>>>>
>>>>> And lots of these on the client:
>>>>>
>>>>> 2008-09-12 19:54:45 E [afr.c:2201:afr_open] home-namespace: self heal failed, returning EIO
>>>>> 2008-09-12 19:54:45 E [fuse-bridge.c:715:fuse_fd_cbk] glusterfs-fuse: 3954: (12) /rsyncer/.ssh/authorized_keys2 => -1 (5)
>>>>> 2008-09-12 19:54:45 E [fuse-bridge.c:715:fuse_fd_cbk] glusterfs-fuse: 3956: (12) /rsyncer/.ssh/authorized_keys2 => -1 (5)
>>>>> 2008-09-12 19:54:45 E [fuse-bridge.c:715:fuse_fd_cbk] glusterfs-fuse: 3958: (12) /rsyncer/.ssh/authorized_keys2 => -1 (5)
>>>>> 2008-09-12 19:54:45 E [fuse-bridge.c:715:fuse_fd_cbk] glusterfs-fuse: 3987: (12) /rsyncer/.ssh/authorized_keys2 => -1 (5)
>>>>> 2008-09-12 19:54:45 E [fuse-bridge.c:715:fuse_fd_cbk] glusterfs-fuse: 3989: (12) /rsyncer/.ssh/authorized_keys2 => -1 (5)
>>>>> 2008-09-12 19:54:45 E [fuse-bridge.c:715:fuse_fd_cbk] glusterfs-fuse: 3991: (12) /rsyncer/.ssh/authorized_keys2 => -1 (5)
>>>>> 2008-09-12 19:54:45 E [fuse-bridge.c:715:fuse_fd_cbk] glusterfs-fuse: 3993: (12) /rsyncer/.ssh/authorized_keys2 => -1 (5)
>>>>> 2008-09-12 19:54:54 C [client-protocol.c:212:call_bail] home3: bailing transport
>>>>> 2008-09-12 19:54:54 E [client-protocol.c:4827:client_protocol_cleanup] home3: forced unwinding frame type(2) op(5) reply=@0x809abb0
>>>>> 2008-09-12 19:54:54 E [client-protocol.c:4239:client_lock_cbk] home3: no proper reply from server, returning ENOTCONN
>>>>> 2008-09-12 19:54:54 E [afr.c:1933:afr_selfheal_lock_cbk] home-afr-3: (path=/rsyncer/.ssh/authorized_keys2 child=home3) op_ret=-1 op_errno=107
>>>>> 2008-09-12 19:54:54 E [afr.c:2201:afr_open] home-afr-3: self heal failed, returning EIO
>>>>> 2008-09-12 19:54:54 E [fuse-bridge.c:715:fuse_fd_cbk] glusterfs-fuse: 3970: (12) /rsyncer/.ssh/authorized_keys2 => -1 (5)
>>>>> [the same five-line sequence repeats for fuse requests 3971, 3972, 3974, 4001, 4002 and 4004]
>>>>> 2008-09-12 19:55:01 E [unify.c:335:unify_lookup] home: returning ESTALE for /rsyncer/.ssh/authorized_keys2: file count is 4
>>>>> 2008-09-12 19:55:01 E [unify.c:339:unify_lookup] home: /rsyncer/.ssh/authorized_keys2: found on home-namespace
>>>>> 2008-09-12 19:55:01 E [unify.c:339:unify_lookup] home: /rsyncer/.ssh/authorized_keys2: found on home-afr-2
>>>>> 2008-09-12 19:55:01 E [unify.c:339:unify_lookup] home: /rsyncer/.ssh/authorized_keys2: found on home-afr-1
>>>>> 2008-09-12 19:55:01 E [unify.c:339:unify_lookup] home: /rsyncer/.ssh/authorized_keys2: found on home-afr-3
>>>>>
>>>>> Both server and client are spitting out tons of these. I thought "E" was the Error level; it seems more like DEBUG?
>>>>>
>>>>> Kindly
>>>>>
>>>>> //Marcus
>>>>>
>>>>> On Fri, Sep 12, 2008 at 8:01 PM, Brian Taber <btaber at diversecg.com> wrote:
>>>>>
>>>>> What do you see in your server and client logs for gluster?
>>>>>
>>>>> -------------------------
>>>>> Brian Taber
>>>>> Owner/IT Specialist
>>>>> Diverse Computer Group
>>>>> Office: 774-206-5592
>>>>> Cell: 508-496-9221
>>>>> btaber at diversecg.com
>>>>>
>>>>> Marcus Herou wrote:
>>>>> > Hi.
>>>>> >
>>>>> > We have just recently installed a 3-node cluster with 16 SATA disks each.
>>>>> >
>>>>> > We are using Hardy and the glusterfs-3.10 Ubuntu package on both client(s) and server.
>>>>> >
>>>>> > We have only created one export (/home) yet, since we want to test it for a while before putting it into a live high-performance environment.
>>>>> >
>>>>> > The problem is currently that the client loses /home once a day or so. This is really bad, since this is the machine which all the others connect to with ssh keys, thus making them unable to log in.
>>>>> >
>>>>> > Anyone seen something similar?
>>>>> >
>>>>> > Kindly
>>>>> >
>>>>> > //Marcus
>>>>> > _______________________________________________
>>>>> > Gluster-devel mailing list
>>>>> > Gluster-devel at nongnu.org
>>>>> > http://lists.nongnu.org/mailman/listinfo/gluster-devel
>>>>>
>>>>> --
>>>>> Marcus Herou CTO and co-founder Tailsweep AB
>>>>> +46702561312
>>>>> marcus.herou at tailsweep.com
>>>>> http://www.tailsweep.com/
>>>>> http://blogg.tailsweep.com/
>>>>
>>>> _______________________________________________
>>>> Gluster-devel mailing list
>>>> Gluster-devel at nongnu.org
>>>> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>>>
>>> --
>>> Amar Tumballi
>>> Gluster/GlusterFS Hacker
>>> [bulde on #gluster/irc.gnu.org]
>>> http://www.zresearch.com - Commoditizing Super Storage!
>>
>> --
>> Marcus Herou CTO and co-founder Tailsweep AB
>> +46702561312
>> marcus.herou at tailsweep.com
>> http://www.tailsweep.com/
>> http://blogg.tailsweep.com/
>
> --
> Amar Tumballi
> Gluster/GlusterFS Hacker
> [bulde on #gluster/irc.gnu.org]
> http://www.zresearch.com - Commoditizing Super Storage!

--
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
marcus.herou at tailsweep.com
http://www.tailsweep.com/
http://blogg.tailsweep.com/
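The pairing fix suggested in the thread above (two exported volumes per server, combined as (v1-1 v2-2), (v2-1 v3-2), (v3-1 v1-2)) could look roughly like the client-side sketch below. The backend export names home1/home2 and the elided server-side posix/protocol-server blocks are assumptions for illustration, not something posted in the thread:

```
# Two protocol/client volumes per server; "home1"/"home2" are assumed
# names for the two exports each server would now provide.
volume v1-1
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.10.30
  option remote-subvolume home1
end-volume

volume v1-2
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.10.30
  option remote-subvolume home2
end-volume

# v2-1/v2-2 (192.168.10.31) and v3-1/v3-2 (192.168.10.32) would be
# defined the same way.

# Each backend export now belongs to exactly one replica pair.
volume afr-1
  type cluster/afr
  subvolumes v1-1 v2-2
end-volume

volume afr-2
  type cluster/afr
  subvolumes v2-1 v3-2
end-volume

volume afr-3
  type cluster/afr
  subvolumes v3-1 v1-2
end-volume
```

The namespace and unify volumes can stay as in the original config (unify over afr-1, afr-2, afr-3). Since no backend export appears in more than one AFR pair, unify's lookup should then find each file on exactly one subvolume instead of four, avoiding the EIO.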
Ok, I think I've narrowed it down to only being a problem with my home directory on my server (/home is bound to the cluster, which in turn is all set up to come up at startup). I've just installed 1.3.12 and I *think* I'm running gluster fuse (hopefully it was a kernel module; I haven't looked into this too closely). I can browse the normal directories of my server (i.e. the publicly accessible ones) fine, but when I try to access my home directory (either through /home or through the gluster mount) gluster freezes for what seems a really long time and then eventually crashes. It's not clear from the log files, but it looks like for some reason this is a version of the issue where my namespace brick would refuse to pick up certain files from my home directory, which I only fixed by manually copying a bunch of them around with gluster unmounted. An example of this should be in the log I posted earlier.

- Will
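On the "auto remount like NFS" wish from the first message: GlusterFS of that era had no such feature, but a cron-driven watchdog is one possible workaround. The sketch below is a hypothetical illustration, not anything from the thread: it probes a mount point with a timeout (a stale connection makes `ls` hang "forever", exactly as described above) and lazily unmounts and remounts on failure. The spec-file path, the demo default of /tmp, and the cron wiring are all assumptions.

```shell
#!/bin/sh
# Hypothetical GlusterFS mount watchdog -- a sketch, not a GlusterFS feature.
# Probes a mount point with a timeout and lazily unmounts + remounts it
# when the probe hangs or fails.

# probe_mount PATH TIMEOUT_SECONDS -> 0 if `ls` on PATH answers in time
probe_mount() {
    ls "$1" >/dev/null 2>&1 &
    _probe=$!
    # kill the probe if it is still hanging after the timeout
    ( sleep "$2"; kill "$_probe" 2>/dev/null ) >/dev/null 2>&1 &
    _killer=$!
    if wait "$_probe"; then
        kill "$_killer" 2>/dev/null
        return 0
    fi
    return 1
}

# Real usage would pass the glusterfs mount, e.g. /home; /tmp is only a
# safe default for demonstration.
MOUNTPOINT="${1:-/tmp}"

if probe_mount "$MOUNTPOINT" 5; then
    echo "mount responsive: $MOUNTPOINT"
else
    echo "mount hung or gone, remounting: $MOUNTPOINT"
    umount -l "$MOUNTPOINT" 2>/dev/null   # lazy-unmount the stale FUSE mount
    # 1.3-era client invocation; the spec-file path is an assumption
    glusterfs -f /etc/glusterfs/glusterfs-client.vol "$MOUNTPOINT"
fi
```

Run from cron, something like `* * * * * root /usr/local/sbin/gluster-watchdog /home` (a hypothetical install path) would re-establish the mount within a minute of it going stale, which is about the best approximation of NFS-style auto-remount available without support in glusterfs itself.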