Hi Udo, thanks for posting your volume info settings. Please note for the
following, I am not one of the devs, just a user, so unfortunately I have
no authoritative answers :(

I am running a very similar setup - Proxmox 4.0, three nodes, but using
ceph for our production storage. Am heavily testing gluster 3.7 on the
side. We find the performance of ceph slow on these small setups and
management of it a PITA.

Some more questions:

- How are your VM images being accessed by Proxmox? gfapi (the Proxmox
  Gluster storage type) or the fuse mount?
- What's your underlying filesystem (ext4, zfs, etc.)?
- Are you using the HA/Watchdog system in Proxmox?

On 07/12/15 21:03, Udo Giacomozzi wrote:
> Yesterday I had a strange situation where Gluster healing corrupted
> *all* my VM images.
>
> In detail:
> I had about 15 VMs running (in Proxmox 4.0) totaling about 600 GB of
> qcow2 images. Gluster is used as storage for those images in a
> replicate 3 setup (i.e. 3 physical servers replicating all data).
> All VMs were running on machine #1 - the two other machines (#2 and
> #3) were *idle*.
> Gluster was fully operating (no healing) when I rebooted machine #2.
> For other reasons I had to reboot machines #2 and #3 a few times, but
> since all VMs were running on machine #1 and nothing on the other
> machines was accessing Gluster files, I was confident that this
> wouldn't disturb Gluster.
> But anyway, this means that I rebooted Gluster nodes during a healing
> process.
>
> After a few minutes, Gluster files began showing corruption - up to
> the point that the qcow2 files became unreadable and all VMs stopped
> working.

:( sounds painful - my sympathies.

You're running 3.5.2 - that's getting rather old. I use the gluster
debian repos:

3.6.7 : http://download.gluster.org/pub/gluster/glusterfs/3.6/LATEST/Debian/
3.7.6 : http://download.gluster.org/pub/gluster/glusterfs/LATEST/Debian/jessie/

3.6.x is the latest stable, 3.7 is close to stable(?). 3.7 has some nice
new features such as sharding, which is very useful for VM hosting - it
enables much faster heal times.

Regarding what happened with your VMs, I'm not sure. Having two servers
down should have disabled the entire store, making it not readable or
writable. I note that you are missing some settings that need to be set
for VM stores - there will be corruption problems if you live migrate
without them:

quick-read=off
read-ahead=off
io-cache=off
stat-prefetch=off
eager-lock=enable
remote-dio=enable
quorum-type=auto
server-quorum-type=server

"stat-prefetch=off" is particularly important.
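For reference, a minimal sketch of setting those options from the CLI -
<VOLNAME> is a placeholder for your own volume name, and the prefixed
forms below are how these options are usually spelled for
"gluster volume set":

# performance translators that can serve stale data to qemu
gluster volume set <VOLNAME> performance.quick-read off
gluster volume set <VOLNAME> performance.read-ahead off
gluster volume set <VOLNAME> performance.io-cache off
gluster volume set <VOLNAME> performance.stat-prefetch off
# locking / direct-IO behaviour for VM images
gluster volume set <VOLNAME> cluster.eager-lock enable
gluster volume set <VOLNAME> network.remote-dio enable
# quorum, so a single surviving node cannot keep writing on its own
gluster volume set <VOLNAME> cluster.quorum-type auto
gluster volume set <VOLNAME> cluster.server-quorum-type server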
----- Original Message -----
> From: "Lindsay Mathieson" <lindsay.mathieson at gmail.com>
> To: gluster-users at gluster.org
> Sent: Tuesday, December 8, 2015 7:29:00 AM
> Subject: Re: [Gluster-users] Strange file corruption
>
> [...]
>
> I note that you are missing some settings that need to be set for VM
> stores - there will be corruption problems if you live migrate without
> them.
>
> quick-read=off
> read-ahead=off
> io-cache=off
> stat-prefetch=off
> eager-lock=enable
> remote-dio=enable
> quorum-type=auto
> server-quorum-type=server
>
> "stat-prefetch=off" is particularly important.

Perfectly put. I am one of the devs who works on the replicate module.
You can alternatively enable this configuration in one shot using the
following command for VM workloads:

# gluster volume set <VOLNAME> group virt

Like Lindsay mentioned, it is not clear why the brick on the first node
did not go down when the other two nodes were rebooted, given that server
quorum is enabled on your volume. The client-quorum option seems to be
missing from your volume. Once you enable it, at least two bricks per
replica set must be online for writes to happen on the volume. If two
nodes are down, the volume becomes read-only, causing the VMs to possibly
go into a paused state.
-Krutika
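A minimal sketch of that one-shot approach, assuming a volume named
<VOLNAME> as in the command above; the second command only reads back
the volume description so you can check which options were applied:

# apply the predefined option group for VM workloads
gluster volume set <VOLNAME> group virt
# confirm the result under "Options Reconfigured"
gluster volume info <VOLNAME>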
On 08.12.2015 02:59, Lindsay Mathieson wrote:
> [...]
>
> Some more questions:
>
> - How are your VM images being accessed by Proxmox? gfapi (the Proxmox
>   Gluster storage type) or the fuse mount?

Sorry, forgot to say that: I'm accessing the Gluster storage via NFS,
since (at least in version 3.4 of Proxmox) the gfapi method has some
problems with sockets.

> - What's your underlying filesystem (ext4, zfs, etc.)?

A dedicated ext4 partition.

> - Are you using the HA/Watchdog system in Proxmox?

I am now (watchdog HA), but Proxmox was running in non-HA mode at the
time of failure.

> On 07/12/15 21:03, Udo Giacomozzi wrote:
>> [...]
>> After a few minutes, Gluster files began showing corruption - up to
>> the point that the qcow2 files became unreadable and all VMs stopped
>> working.
>
> :( sounds painful - my sympathies.
>
> You're running 3.5.2 - that's getting rather old. I use the gluster
> debian repos:
>
> 3.6.7 : http://download.gluster.org/pub/gluster/glusterfs/3.6/LATEST/Debian/
> 3.7.6 : http://download.gluster.org/pub/gluster/glusterfs/LATEST/Debian/jessie/
>
> 3.6.x is the latest stable, 3.7 is close to stable(?). 3.7 has some
> nice new features such as sharding, which is very useful for VM
> hosting - it enables much faster heal times.

I'm using the most current version provided by the Proxmox APT sources.
Can 3.6 or even 3.7 Gluster nodes work together with a 3.5 node? If not,
I'm wondering how I could upgrade...

You might understand that I hesitate a bit to upgrade Gluster without
having some certainty that it won't make things even worse. I mean, this
is a production system...

> Regarding what happened with your VMs, I'm not sure. Having two servers
> down should have disabled the entire store, making it not readable or
> writable.

I'm not sure if both servers were down at the same time (could be,
though). I'm just sure that I rebooted them rather quickly in sequence.

Right now my credo is "never ever reboot/shutdown more than 1 node at a
time and, most importantly, always make sure that no Gluster healing is
in progress". For sure, I did not respect that when I crashed my storage.
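Along those lines, a minimal sketch of checking heal state before taking
a node down - <VOLNAME> is a placeholder, and the second command may not
be available on 3.5.x:

# list files/gfids still pending heal on each brick; this should be
# empty before another node is rebooted
gluster volume heal <VOLNAME> info
# newer releases also provide a per-brick pending-heal count
gluster volume heal <VOLNAME> statistics heal-count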
> I note that you are missing some settings that need to be set for VM
> stores - there will be corruption problems if you live migrate without
> them.
>
> quick-read=off
> read-ahead=off
> io-cache=off
> stat-prefetch=off
> eager-lock=enable
> remote-dio=enable
> quorum-type=auto
> server-quorum-type=server
>
> "stat-prefetch=off" is particularly important.

Thanks. Is there a document that explains the reasoning behind this
config? Does this apply only to volumes holding virtual HDD images? My
"docker-repo" volume is still "replicate 3" type but is used by the VMs
themselves, not by the hypervisor - I guess other settings apply there...

Thanks a lot,
Udo
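Not a document as such, but one way to see exactly which options the
"virt" group applies is the group file that ships with glusterfs-server;
the path below is typical Debian/RHEL packaging and is an assumption
about your particular install:

# list the options that "gluster volume set <VOLNAME> group virt" applies
cat /var/lib/glusterd/groups/virt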