Hi all,

I'm in the process of evaluating GlusterFS as a clustered file system. I like it very much because, among the other cool features, it's very easy to configure and it allows me to reuse the filesystems I already know as storage backends.

Before trying it on expensive hardware, I decided to try it on a very low-end HW configuration:

- 2 old PCs (one P4-class CPU, IDE drives, one 100 Mbps ethernet card) and a 100 Mbps switch.

The OS is Debian 'lenny' on both nodes. 'Lenny' comes with FUSE v2.7.2. I then compiled glusterfs 1.3.10 on both nodes and set up a server-side, single-process AFR configuration (the file content is reported below). I did NOT use the glusterfs-patched FUSE library.

On top of that I've put some VMware Server virtual machines. Each virtual machine image is split into a few 2 GB "vmdk" files (not preallocated). I was successful in starting up and running my virtual machines (with only an additional line in their configuration files), so I was very happy with it.

The problem: after putting a virtual machine under "intense" I/O, when I rebooted it today I found that its root filesystem (i.e. the vmdk) was corrupted. It lost some important directories (e.g. the kernel modules directory under /lib/modules).

To give you a little more detail about the behaviour under I/O: while the virtual machine is doing I/O to the VMDK file, iptraf shows the corresponding traffic on the ethernet link at about 15-50 Mbps, so it looks like only the modified portions of the file are being sent to the other AFR node. In fact, if I simulate a failure by powering off the other AFR node, at reboot I see 90 Mbps (link saturation) traffic as soon as I try to open the VMDK file, and that operation blocks until full synchronization has finished.

The glusterfs.log content is as follows:

[...]
2008-08-06 12:20:24 E [posix-locks.c:1148:pl_lk] gfs-ds-locks: returning EAGAIN
2008-08-06 12:20:24 E [afr.c:3190:afr_lk_cbk] gfs-ds-afr: (path=/logs-127.0.1.1/.terracotta-logging.lock child=gfs-ds-locks) op_ret=-1 op_errno=11
2008-08-06 12:42:25 E [posix-locks.c:1148:pl_lk] gfs-ds-locks: returning EAGAIN
2008-08-06 12:42:25 E [afr.c:3190:afr_lk_cbk] gfs-ds-afr: (path=/logs-127.0.1.1/.terracotta-logging.lock child=gfs-ds-locks) op_ret=-1 op_errno=11
2008-08-07 11:42:17 E [posix-locks.c:1180:pl_forget] gfs-ds-locks: Active locks found!

The above log does not seem to justify such file corruption... there is nothing related to the "vmdk" files.

Any suggestions?
Is the HW configuration way too slow for AFR to work reliably?
Are there mistakes in the configuration file?

Any help is really appreciated.
Kind regards,
Marco

---------- GlusterFS config file ----------

# dataspace on storage1
volume gfs-ds
  type storage/posix
  option directory /mnt/hda7/gfs-ds
end-volume

# posix locks
volume gfs-ds-locks
  type features/posix-locks
  subvolumes gfs-ds
end-volume

volume gfs-ds-threads
  type performance/io-threads
  option thread-count 1
  option cache-size 32MB
  subvolumes gfs-ds-locks
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  subvolumes gfs-ds-threads
  # storage network access only
  option auth.ip.gfs-ds-threads.allow *
  option auth.ip.gfs-ds-afr.allow *
end-volume

# dataspace on storage2
volume gfs-storage2-ds
  type protocol/client
  option transport-type tcp/client
  option remote-host <the other node's IP>   # storage network
  option remote-subvolume gfs-ds-threads
  option transport-timeout 10                # value in seconds; it should be set relatively low
end-volume

# automatic file replication translator for dataspace
volume gfs-ds-afr
  type cluster/afr
  subvolumes gfs-ds-locks gfs-storage2-ds    # local and remote dataspaces
end-volume

volume writebehind
  type performance/write-behind
  option aggregate-size 128kB
  subvolumes gfs-ds-afr
end-volume

volume readahead
  type performance/read-ahead
  option page-size 64kB
  option page-count 16
  subvolumes writebehind
end-volume
-------------
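Marco's client/mount spec is not shown above. As a point of reference, here is a minimal sketch of how a client might attach to the exported AFR volume in this kind of server-side setup; the file name, mount point and choice of remote-subvolume are assumptions, not taken from the actual configuration.

# Hypothetical client spec -- names and paths are assumptions.
cat > /etc/glusterfs/client.vol <<'EOF'
volume remote-afr
  type protocol/client
  option transport-type tcp/client
  option remote-host <storage node IP>
  option remote-subvolume gfs-ds-afr   # the server-side AFR volume permitted by auth.ip above
end-volume
EOF

# Mount it via FUSE (the exact option for passing the spec file may
# differ between glusterfs releases):
glusterfs -f /etc/glusterfs/client.vol /mnt/glusterfs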
Hi Marco,

There is nothing suspicious in the configuration. Can you attach the complete log files of both glusterfs processes?

regards,

On Thu, Aug 7, 2008 at 9:19 PM, Marco Trevisan <marco.trevisan at cardinis.com> wrote:
> [...]
--
Raghavendra G

A centipede was happy quite, until a toad in fun,
Said, "Prey, which leg comes after which?",
This raised his doubts to such a pitch,
He fell flat into the ditch,
Not knowing how to run.
-Anonymous
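A minimal sketch of one way to collect what is being asked for here, run from a third machine; the log path and the host names are hypothetical, adjust them to the actual setup:

# Pull the glusterfs log from each AFR node, then look at the errors together.
for host in storage1 storage2; do
    scp root@$host:/var/log/glusterfs/glusterfs.log glusterfs-$host.log
done
grep ' E \[' glusterfs-storage1.log glusterfs-storage2.log | less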
Hi Marco,

What is the backend filesystem you are using? Does it support extended attributes? You can check for extended attribute support using the setfattr/getfattr commands.

Also, the transport-timeout value is too low. Can you test again with transport-timeout set to the default value (42s)?

regards,

On Thu, Aug 7, 2008 at 9:19 PM, Marco Trevisan <marco.trevisan at cardinis.com> wrote:
> [...]
--
Raghavendra G

A centipede was happy quite, until a toad in fun,
Said, "Prey, which leg comes after which?",
This raised his doubts to such a pitch,
He fell flat into the ditch,
Not knowing how to run.
-Anonymous
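The extended-attribute check suggested above can be run directly on the backend export directory from Marco's config. A minimal sketch (run as root, since AFR keeps its replication metadata in trusted.* extended attributes):

# Run on the backend filesystem exported by storage/posix.
cd /mnt/hda7/gfs-ds
touch xattr-test
setfattr -n trusted.test -v working xattr-test   # errors out if the backend lacks xattr support
getfattr -n trusted.test xattr-test              # should echo the value back
rm xattr-test

# Dump whatever trusted.* attributes GlusterFS has already set on an existing file:
getfattr -d -m trusted -e hex <some-file-under-the-export>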
Keith Freedman wrote:
> At 01:29 PM 8/13/2008, Collin Douglas wrote:
>> How does GlusterFS' AFR translator work over a WAN connection? I know the simple answer is that it depends on the connection and on what's being replicated, but I'm interested in understanding how it works in a low-bandwidth situation.
>
> It works fine in my personal experience; you can probably move data at close to wire speed. I think you'll have different experiences depending on whether you're using server or client AFR. My personal preference is to use server-based AFR, only because it really simplifies client configuration; when you have multiple clients acting on the same data sets and you set up new clients, if they're not AFRing correctly you can get some out-of-sync data.
>
>> We are building a new storage model for our imaging system in which GlusterFS is a candidate.
>>
>> Current design ideas consist of a tier 1 storage where our current "pipeline" of active files would be stored and operated on. Once the files are in a relatively static state, they would be moved to tier 2 storage. It's this tier 2 storage that would need WAN replication. In this way we would somewhat limit the amount of data that has to be replicated, as the pipeline files are in an almost constant state of flux.
>>
>> The idea is to have an array on site (let's call it building 1), one next door in another building (building 2) and one a few hundred miles away. The local arrays will be connected via Infiniband or fiber and the remote one via a 10MB link.
>>
>> It seems logical that I would want to replicate from building 1 to building 2 and then have another AFR configuration at building 2 to replicate to the remote site. Does this fit best practices for GlusterFS, or should I use more of a hub-and-spoke method?
>
> I'm not sure I'm conceptualizing your setup quite right... if this is what you mean: for live active files that are being worked on, AFR building 1 to building 2. Let's call this /active, so the /active filesystem would be an AFR fs using servers in buildings 1 and 2. Your 10MB Infiniband should be more than sufficient. For the completed files, to have them replicated you move them to /backup, which is a volume AFR'ed from building 2 to offsite building C. This is probably over LAN speed? Hopefully faster than a T1, but since you're moving smaller amounts of data it should be fine.

I think you conceptualized it quite well. /active would only be mirrored from building 1 to building 2 over 20GB/sec Infiniband, while /backup would be replicated from building 1 to building 2 and remote, with remote being connected via a 10MB/sec fiber link.

> Remember AFR is real-time replication, so what you can't have is this: /active being AFR building 1 to building 2, while also having the building 2 server set up to AFR /active to building C.
>
> I believe this would cause the files, once they get updated on building 2, to get AFR'ed to building C. I *think* this would make the AFR to building 1 as slow as the building C AFR, but I'm not positive.

That's what I am unsure about as well.
I was hoping that a configuration like that (building 1 AFRs to building 2, which then AFRs to remote) would take place asynchronously, so that replication from building 1 to building 2 would be quick, and the data that arrives in building 2 would then be able to take its time getting to the remote site without causing problems with building 1's filesystem.

>> I am interested to hear what people think.
>>
>> Collin Douglas
>> Adfitech, Inc
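For illustration only, the cascaded layout being discussed (building 2 holding both the local replica and the WAN replica of /backup) could be expressed on a building 2 server with something like the sketch below. All volume names, paths and hosts are invented; the comments just restate the point raised above: AFR replicates synchronously, so any volume that has the WAN leg as a subvolume completes its writes at the speed of that link.

# Hypothetical fragment of a building-2 server spec -- names are invented.
cat > /etc/glusterfs/building2-backup.vol <<'EOF'
# local /backup storage in building 2
volume backup-local
  type storage/posix
  option directory /data/backup
end-volume

# the /backup export at the remote site, reached over the 10MB WAN link
volume backup-remote
  type protocol/client
  option transport-type tcp/client
  option remote-host <remote site IP>
  option remote-subvolume backup-export
end-volume

# AFR ties the two together; every write waits for both copies,
# so writes into this volume run at WAN speed.
volume backup-afr
  type cluster/afr
  subvolumes backup-local backup-remote
end-volume
EOF

Keeping the slow pair under a separate /backup volume, and only moving finished files into it, is what keeps that WAN latency out of the path that /active writes in building 1 have to wait on.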