Kelvin Westlake
2010-Apr-11 23:09 UTC
[Gluster-users] Replication destroying file content on Node crash
Hi Guys,
I've got 2 servers with a volume replicated between them, and each has a
client connected to the volume. Everything works fine while both
servers remain up: I can copy files between them and it works
flawlessly. The same is true if I take one of the servers offline (i.e.
simulating a crash). But the moment I bring the server back up, any new
files (i.e. those that exist only on the live server) become
inaccessible. If I edit one, I get "Input/Output error" and the
following appears in the Gluster client logs:
[2010-04-12 00:05:27] E [afr-self-heal-common.c:1237:sh_missing_entries_create] mirror-0: unknown file type: 01
[2010-04-12 00:05:27] W [fuse-bridge.c:858:fuse_fd_cbk] glusterfs-fuse: 168144: OPEN() /test2 => -1 (Input/output error)
I've also tried firing off a resync with "ls -lR", but this seems to
have no effect.
Here are the vol files I'm using. I even tried disabling
stat-prefetch in the client, as suggested here
(http://gluster.org/pipermail/gluster-users/2009-December/003636.html),
but still no joy.
Server 1 - 192.168.100.29
## file auto generated by /usr/local/bin/glusterfs-volgen (export.vol)
# Cmd line:
# $ /usr/local/bin/glusterfs-volgen --name cluster3 --raid 1 192.168.100.31:/glusterfs/home 192.168.100.29:/glusterfs/home
volume posix1
type storage/posix
option directory /glusterfs/home
end-volume
volume locks1
type features/locks
subvolumes posix1
end-volume
volume brick1
type performance/io-threads
option thread-count 8
subvolumes locks1
end-volume
volume server-tcp
type protocol/server
option transport-type tcp
option auth.addr.brick1.allow *
option transport.socket.listen-port 6996
option transport.socket.nodelay on
subvolumes brick1
end-volume
Server 2 - 192.168.100.31
## file auto generated by /usr/local/bin/glusterfs-volgen (export.vol)
# Cmd line:
# $ /usr/local/bin/glusterfs-volgen --name cluster3 --raid 1 192.168.100.31:/glusterfs/home 192.168.100.29:/glusterfs/home
volume posix1
type storage/posix
option directory /glusterfs/home
end-volume
volume locks1
type features/locks
subvolumes posix1
end-volume
volume brick1
type performance/io-threads
option thread-count 8
subvolumes locks1
end-volume
volume server-tcp
type protocol/server
option transport-type tcp
option auth.addr.brick1.allow *
option transport.socket.listen-port 6996
option transport.socket.nodelay on
subvolumes brick1
end-volume
Client vol file, mount on both servers to /glusterfs/home-mnt
## file auto generated by /usr/local/bin/glusterfs-volgen (mount.vol)
# Cmd line:
# $ /usr/local/bin/glusterfs-volgen --name cluster3 --raid 1 192.168.100.31:/glusterfs/home 192.168.100.29:/glusterfs/home
# RAID 1
# TRANSPORT-TYPE tcp
volume 192.168.100.31-1
type protocol/client
option transport-type tcp
option remote-host 192.168.100.31
option transport.socket.nodelay on
option transport.remote-port 6996
option remote-subvolume brick1
end-volume
volume 192.168.100.29-1
type protocol/client
option transport-type tcp
option remote-host 192.168.100.29
option transport.socket.nodelay on
option transport.remote-port 6996
option remote-subvolume brick1
end-volume
volume mirror-0
type cluster/replicate
subvolumes 192.168.100.31-1 192.168.100.29-1
end-volume
volume writebehind
type performance/write-behind
option cache-size 4MB
subvolumes mirror-0
end-volume
volume readahead
type performance/read-ahead
option page-count 4
subvolumes writebehind
end-volume
volume iocache
type performance/io-cache
option cache-size `echo $[ $(grep 'MemTotal' /proc/meminfo | sed 's/[^0-9]//g') / 5120 ]`MB
option cache-timeout 1
subvolumes readahead
end-volume
volume quickread
type performance/quick-read
option cache-timeout 1
option max-file-size 64kB
subvolumes iocache
end-volume
#volume statprefetch
# type performance/stat-prefetch
# subvolumes quickread
#end-volume
Any help or advice would be greatly appreciated.
Thanks
Kelvin
Vijay Bellur
2010-Apr-12 17:26 UTC
[Gluster-users] Replication destroying file content on Node crash
Kelvin Westlake wrote:
> Client vol file, mount on both servers to /glusterfs/home-mnt
>
> volume mirror-0
> type cluster/replicate
> subvolumes 192.168.100.31-1 192.168.100.29-1
> end-volume
>
> volume writebehind
> type performance/write-behind
> option cache-size 4MB
> subvolumes mirror-0
> end-volume
>
> volume readahead
> type performance/read-ahead
> option page-count 4
> subvolumes writebehind
> end-volume
>
> volume iocache
> type performance/io-cache
> option cache-size `echo $[ $(grep 'MemTotal' /proc/meminfo | sed 's/[^0-9]//g') / 5120 ]`MB
> option cache-timeout 1
> subvolumes readahead
> end-volume
>
> volume quickread
> type performance/quick-read
> option cache-timeout 1
> option max-file-size 64kB
> subvolumes iocache
> end-volume
>
> #volume statprefetch
> # type performance/stat-prefetch
> # subvolumes quickread
> #end-volume
>
Hi Kelvin,
Can you please check whether the same behavior persists with the
quick-read translator section commented out in the client? If it
doesn't, this could be related to bug 815.
Thanks,
Vijay
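For anyone repeating this test, the change Vijay asks for amounts to commenting out the quickread section at the end of the client vol file, in the same way the statprefetch section is already disabled. The fragment below is only a sketch based on the client vol file posted earlier in the thread. It assumes that nothing else in the file references quickread, and that the client is unmounted and remounted afterwards so the edited file is actually loaded.

```
# quick-read disabled for testing. With statprefetch already commented
# out, iocache is now the last volume defined in the file, which
# (assumption) makes it the top of the client translator stack.
#volume quickread
# type performance/quick-read
# option cache-timeout 1
# option max-file-size 64kB
# subvolumes iocache
#end-volume
```

The rest of the client vol file (the two protocol/client volumes, mirror-0, writebehind, readahead and iocache) stays as posted.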