Hi all,

I'm in the process of evaluating GlusterFS as a clustered file system. I like it very much because, among the other cool features, it's very easy to configure and it allows me to reuse the filesystems I already know as storage backends.

Before trying it on expensive hardware, I decided to try it on a very low-end HW configuration:

- 2 old PCs (one P4-class CPU, IDE drives, one 100 Mbps ethernet card) and a 100 Mbps switch.

The OS is Debian 'lenny' on both nodes. 'Lenny' comes with FUSE v2.7.2. I then compiled glusterfs 1.3.10 on both nodes and set up a server-side, single-process AFR configuration (the file content is reported below). I did NOT use the glusterfs-patched FUSE library.

On top of that I've put some VMware Server virtual machines. Each virtual machine image is split into a few 2 GB "vmdk" files (not preallocated). I was successful in starting up and running my virtual machines (with only an additional line in their configuration files), so I was very happy with it.

The problem: after putting a virtual machine under "intense" I/O, when I rebooted it today I found that its root filesystem (i.e. the vmdk) was corrupted. It lost some important directories (e.g. the kernel modules directory under /lib/modules).

To give you a little more detail about the behaviour under I/O: while the virtual machine is doing I/O to the VMDK file, iptraf shows the corresponding traffic on the ethernet link at about 15-50 Mbps, so it looks like only the modified portions of the file are being sent to the other AFR node. In fact, if I simulate a failure by powering off the other AFR node, at reboot I see 90 Mbps (link saturation) traffic as soon as I try to open the VMDK file, and that operation blocks until full synchronization has finished.

The glusterfs.log content is as follows:

[...]
2008-08-06 12:20:24 E [posix-locks.c:1148:pl_lk] gfs-ds-locks: returning EAGAIN
2008-08-06 12:20:24 E [afr.c:3190:afr_lk_cbk] gfs-ds-afr: (path=/logs-127.0.1.1/.terracotta-logging.lock child=gfs-ds-locks) op_ret=-1 op_errno=11
2008-08-06 12:42:25 E [posix-locks.c:1148:pl_lk] gfs-ds-locks: returning EAGAIN
2008-08-06 12:42:25 E [afr.c:3190:afr_lk_cbk] gfs-ds-afr: (path=/logs-127.0.1.1/.terracotta-logging.lock child=gfs-ds-locks) op_ret=-1 op_errno=11
2008-08-07 11:42:17 E [posix-locks.c:1180:pl_forget] gfs-ds-locks: Active locks found!

The above log does not seem to justify such file corruption... there is nothing related to the "vmdk" files.

Any suggestions?
Is the HW configuration way too slow for AFR to work reliably?
Are there mistakes in the configuration file?

Any help is really appreciated.
Kind regards,
Marco

---------- GlusterFS config file ----------

# dataspace on storage1
volume gfs-ds
  type storage/posix
  option directory /mnt/hda7/gfs-ds
end-volume

# posix locks
volume gfs-ds-locks
  type features/posix-locks
  subvolumes gfs-ds
end-volume

volume gfs-ds-threads
  type performance/io-threads
  option thread-count 1
  option cache-size 32MB
  subvolumes gfs-ds-locks
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  subvolumes gfs-ds-threads
  # storage network access only
  option auth.ip.gfs-ds-threads.allow *
  option auth.ip.gfs-ds-afr.allow *
end-volume

# dataspace on storage2
volume gfs-storage2-ds
  type protocol/client
  option transport-type tcp/client
  option remote-host <the other node's IP>   # storage network
  option remote-subvolume gfs-ds-threads
  option transport-timeout 10                # value in seconds; it should be set relatively low
end-volume

# automatic file replication translator for dataspace
volume gfs-ds-afr
  type cluster/afr
  subvolumes gfs-ds-locks gfs-storage2-ds    # local and remote dataspaces
end-volume

volume writebehind
  type performance/write-behind
  option aggregate-size 128kB
  subvolumes gfs-ds-afr
end-volume

volume readahead
  type performance/read-ahead
  option page-size 64kB
  option page-count 16
  subvolumes writebehind
end-volume
-------------
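Marco's client/mount spec is not shown above. As a point of reference, here is a minimal sketch of how a client might attach to the exported AFR volume in this kind of server-side setup; the file name, mount point and choice of remote-subvolume are assumptions, not taken from the actual configuration.

# Hypothetical client spec -- names and paths are assumptions.
cat > /etc/glusterfs/client.vol <<'EOF'
volume remote-afr
  type protocol/client
  option transport-type tcp/client
  option remote-host <storage node IP>
  option remote-subvolume gfs-ds-afr   # the server-side AFR volume permitted by auth.ip above
end-volume
EOF

# Mount it via FUSE (the exact option for passing the spec file may
# differ between glusterfs releases):
glusterfs -f /etc/glusterfs/client.vol /mnt/glusterfs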
Hi Marco,

There is nothing suspicious in the configuration. Can you attach the complete log files of both glusterfs processes?

regards,

On Thu, Aug 7, 2008 at 9:19 PM, Marco Trevisan <marco.trevisan at cardinis.com> wrote:
> [...]
--
Raghavendra G

A centipede was happy quite, until a toad in fun,
Said, "Prey, which leg comes after which?",
This raised his doubts to such a pitch,
He fell flat into the ditch,
Not knowing how to run.
-Anonymous
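A minimal sketch of one way to collect what is being asked for here, run from a third machine; the log path and the host names are hypothetical, adjust them to the actual setup:

# Pull the glusterfs log from each AFR node, then look at the errors together.
for host in storage1 storage2; do
    scp root@$host:/var/log/glusterfs/glusterfs.log glusterfs-$host.log
done
grep ' E \[' glusterfs-storage1.log glusterfs-storage2.log | less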
Hi Marco,

What is the backend filesystem you are using? Does it support extended attributes? You can check for extended attribute support using the setfattr/getfattr commands.

Also, the transport-timeout value is too low. Can you test again with transport-timeout set to the default value (42s)?

regards,

On Thu, Aug 7, 2008 at 9:19 PM, Marco Trevisan <marco.trevisan at cardinis.com> wrote:
> [...]
--
Raghavendra G

A centipede was happy quite, until a toad in fun,
Said, "Prey, which leg comes after which?",
This raised his doubts to such a pitch,
He fell flat into the ditch,
Not knowing how to run.
-Anonymous
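The extended-attribute check suggested above can be run directly on the backend export directory from Marco's config. A minimal sketch (run as root, since AFR keeps its replication metadata in trusted.* extended attributes):

# Run on the backend filesystem exported by storage/posix.
cd /mnt/hda7/gfs-ds
touch xattr-test
setfattr -n trusted.test -v working xattr-test   # errors out if the backend lacks xattr support
getfattr -n trusted.test xattr-test              # should echo the value back
rm xattr-test

# Dump whatever trusted.* attributes GlusterFS has already set on an existing file:
getfattr -d -m trusted -e hex <some-file-under-the-export>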
Keith Freedman wrote:
> At 01:29 PM 8/13/2008, Collin Douglas wrote:
>> How does GlusterFS' AFR translator work over a WAN connection? I know the simple answer is that it depends on the connection and on what's being replicated, but I'm interested in understanding how it works in a low-bandwidth situation.
>
> It works fine in my personal experience; you can probably move data at close to wire speed. I think you'll have different experiences depending on whether you're using server or client AFR. My personal preference is to use server-based AFR, only because it really simplifies client configuration; when you have multiple clients acting on the same data sets and you set up new clients, if they're not AFRing correctly you can get some out-of-sync data.
>
>> We are building a new storage model for our imaging system in which GlusterFS is a candidate.
>>
>> Current design ideas consist of a tier 1 storage where our current "pipeline" of active files would be stored and operated on. Once the files are in a relatively static state, they would be moved to tier 2 storage. It's this tier 2 storage that would need WAN replication. In this way we would somewhat limit the amount of data that has to be replicated, as the pipeline files are in an almost constant state of flux.
>>
>> The idea is to have an array on site (let's call it building 1), one next door in another building (building 2) and one a few hundred miles away. The local arrays will be connected via Infiniband or fiber and the remote one via a 10MB link.
>>
>> It seems logical that I would want to replicate from building 1 to building 2 and then have another AFR configuration at building 2 to replicate to the remote site. Does this fit best practices for GlusterFS, or should I use more of a hub-and-spoke method?
>
> I'm not sure I'm conceptualizing your setup quite right... if this is what you mean: for live active files that are being worked on, AFR building 1 to building 2. Let's call this /active, so the /active filesystem would be an AFR fs using servers in buildings 1 and 2. Your 10MB Infiniband should be more than sufficient. For the completed files, to have them replicated you move them to /backup, which is a volume AFR'ed from building 2 to offsite building C. This is probably over LAN speed? Hopefully faster than a T1, but since you're moving smaller amounts of data it should be fine.

I think you conceptualized it quite well. /active would only be mirrored from building 1 to building 2 over 20GB/sec Infiniband, while /backup would be replicated from building 1 to building 2 and remote, with remote being connected via a 10MB/sec fiber link.

> Remember AFR is real-time replication, so what you can't have is this: /active being AFR building 1 to building 2, while also having the building 2 server set up to AFR /active to building C.
>
> I believe this would cause the files, once they get updated on building 2, to get AFR'ed to building C. I *think* this would make the AFR to building 1 as slow as the building C AFR, but I'm not positive.

That's what I am unsure about as well.
I was hoping that a configuration like that (building 1 AFRs to building 2, which then AFRs to remote) would take place asynchronously, so that replication from building 1 to building 2 would be quick, and the data that arrives in building 2 would then be able to take its time getting to the remote site without causing problems with building 1's filesystem.

>> I am interested to hear what people think.
>>
>> Collin Douglas
>> Adfitech, Inc
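For illustration only, the cascaded layout being discussed (building 2 holding both the local replica and the WAN replica of /backup) could be expressed on a building 2 server with something like the sketch below. All volume names, paths and hosts are invented; the comments just restate the point raised above: AFR replicates synchronously, so any volume that has the WAN leg as a subvolume completes its writes at the speed of that link.

# Hypothetical fragment of a building-2 server spec -- names are invented.
cat > /etc/glusterfs/building2-backup.vol <<'EOF'
# local /backup storage in building 2
volume backup-local
  type storage/posix
  option directory /data/backup
end-volume

# the /backup export at the remote site, reached over the 10MB WAN link
volume backup-remote
  type protocol/client
  option transport-type tcp/client
  option remote-host <remote site IP>
  option remote-subvolume backup-export
end-volume

# AFR ties the two together; every write waits for both copies,
# so writes into this volume run at WAN speed.
volume backup-afr
  type cluster/afr
  subvolumes backup-local backup-remote
end-volume
EOF

Keeping the slow pair under a separate /backup volume, and only moving finished files into it, is what keeps that WAN latency out of the path that /active writes in building 1 have to wait on.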