Kelvin Westlake
2010-Apr-11 23:09 UTC
[Gluster-users] Replication destroying file content on Node crash
Hi Guys,

I've got 2 servers with a volume replicated between them, and each has a client connected to the volume. Everything works fine while both servers remain up: I can copy files between them and it works flawlessly. The same holds if I take one of the servers offline (i.e. simulating a crash). But the moment I bring the server back up, any new files (i.e. those that exist only on the live server) become inaccessible. If I edit one, I get "Input/output error" and the following appears in the Gluster client logs:

[2010-04-12 00:05:27] E [afr-self-heal-common.c:1237:sh_missing_entries_create] mirror-0: unknown file type: 01
[2010-04-12 00:05:27] W [fuse-bridge.c:858:fuse_fd_cbk] glusterfs-fuse: 168144: OPEN() /test2 => -1 (Input/output error)

I've also tried firing off a resync with "ls -lR", but this seems to have no effect.

Here are the vol files I'm using. I even tried disabling stat-prefetch in the client as suggested here (http://gluster.org/pipermail/gluster-users/2009-December/003636.html), but still no joy.

Server 1 - 192.168.100.29

## file auto generated by /usr/local/bin/glusterfs-volgen (export.vol)
# Cmd line:
# $ /usr/local/bin/glusterfs-volgen --name cluster3 --raid 1 192.168.100.31:/glusterfs/home 192.168.100.29:/glusterfs/home

volume posix1
  type storage/posix
  option directory /glusterfs/home
end-volume

volume locks1
  type features/locks
  subvolumes posix1
end-volume

volume brick1
  type performance/io-threads
  option thread-count 8
  subvolumes locks1
end-volume

volume server-tcp
  type protocol/server
  option transport-type tcp
  option auth.addr.brick1.allow *
  option transport.socket.listen-port 6996
  option transport.socket.nodelay on
  subvolumes brick1
end-volume

Server 2 - 192.168.100.31

## file auto generated by /usr/local/bin/glusterfs-volgen (export.vol)
# Cmd line:
# $ /usr/local/bin/glusterfs-volgen --name cluster3 --raid 1 192.168.100.31:/glusterfs/home 192.168.100.29:/glusterfs/home

volume posix1
  type storage/posix
  option directory /glusterfs/home
end-volume

volume locks1
  type features/locks
  subvolumes posix1
end-volume

volume brick1
  type performance/io-threads
  option thread-count 8
  subvolumes locks1
end-volume

volume server-tcp
  type protocol/server
  option transport-type tcp
  option auth.addr.brick1.allow *
  option transport.socket.listen-port 6996
  option transport.socket.nodelay on
  subvolumes brick1
end-volume

Client vol file, mounted on both servers at /glusterfs/home-mnt

## file auto generated by /usr/local/bin/glusterfs-volgen (mount.vol)
# Cmd line:
# $ /usr/local/bin/glusterfs-volgen --name cluster3 --raid 1 192.168.100.31:/glusterfs/home 192.168.100.29:/glusterfs/home

# RAID 1
# TRANSPORT-TYPE tcp
volume 192.168.100.31-1
  type protocol/client
  option transport-type tcp
  option remote-host 192.168.100.31
  option transport.socket.nodelay on
  option transport.remote-port 6996
  option remote-subvolume brick1
end-volume

volume 192.168.100.29-1
  type protocol/client
  option transport-type tcp
  option remote-host 192.168.100.29
  option transport.socket.nodelay on
  option transport.remote-port 6996
  option remote-subvolume brick1
end-volume

volume mirror-0
  type cluster/replicate
  subvolumes 192.168.100.31-1 192.168.100.29-1
end-volume

volume writebehind
  type performance/write-behind
  option cache-size 4MB
  subvolumes mirror-0
end-volume

volume readahead
  type performance/read-ahead
  option page-count 4
  subvolumes writebehind
end-volume

volume iocache
  type performance/io-cache
  option cache-size `echo $[ $(grep 'MemTotal' /proc/meminfo | sed 's/[^0-9]//g') / 5120 ]`MB
  option cache-timeout 1
  subvolumes readahead
end-volume

volume quickread
  type performance/quick-read
  option cache-timeout 1
  option max-file-size 64kB
  subvolumes iocache
end-volume

#volume statprefetch
#  type performance/stat-prefetch
#  subvolumes quickread
#end-volume

Any help or advice would be greatly appreciated.

Thanks,
Kelvin
Vijay Bellur
2010-Apr-12 17:26 UTC
[Gluster-users] Replication destroying file content on Node crash
Kelvin Westlake wrote:
> Client vol file, mount on both servers to /glusterfs/home-mnt
>
> volume mirror-0
>   type cluster/replicate
>   subvolumes 192.168.100.31-1 192.168.100.29-1
> end-volume
>
> volume writebehind
>   type performance/write-behind
>   option cache-size 4MB
>   subvolumes mirror-0
> end-volume
>
> volume readahead
>   type performance/read-ahead
>   option page-count 4
>   subvolumes writebehind
> end-volume
>
> volume iocache
>   type performance/io-cache
>   option cache-size `echo $[ $(grep 'MemTotal' /proc/meminfo | sed 's/[^0-9]//g') / 5120 ]`MB
>   option cache-timeout 1
>   subvolumes readahead
> end-volume
>
> volume quickread
>   type performance/quick-read
>   option cache-timeout 1
>   option max-file-size 64kB
>   subvolumes iocache
> end-volume
>
> #volume statprefetch
> #  type performance/stat-prefetch
> #  subvolumes quickread
> #end-volume

Hi Kelvin,

Can you please check whether the same behavior persists with the quick-read translator section commented out in the client? If it doesn't, this could be related to bug 815.

Thanks,
Vijay
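
For readers following along, the test Vijay asks for might look like the fragment below. This is only a sketch of the tail of the client vol file quoted in the thread, assuming everything above the iocache volume is left unchanged. With both the quickread and statprefetch sections commented out, iocache becomes the last volume defined in the file. The client would most likely need to be unmounted and remounted to pick up the edited file. The thread does not confirm whether this change resolves the Input/output error.

```
volume iocache
  type performance/io-cache
  option cache-size `echo $[ $(grep 'MemTotal' /proc/meminfo | sed 's/[^0-9]//g') / 5120 ]`MB
  option cache-timeout 1
  subvolumes readahead
end-volume

# quick-read disabled for the test suggested above
#volume quickread
#  type performance/quick-read
#  option cache-timeout 1
#  option max-file-size 64kB
#  subvolumes iocache
#end-volume

#volume statprefetch
#  type performance/stat-prefetch
#  subvolumes quickread
#end-volume
```

If the new files become readable after the downed server rejoins with this configuration, that would point toward the quick-read translator, as the reply suggests.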