Krutika Dhananjay
2016-Oct-20 11:13 UTC
[Gluster-users] [URGENT] Add-bricks to a volume corrupted the files
Thanks a lot, Lindsay! Appreciate the help.

It would be awesome if you could tell us whether you see the issue with FUSE as well, while we get around to setting up the environment and running the test ourselves.

-Krutika

On Thu, Oct 20, 2016 at 2:57 AM, Lindsay Mathieson <lindsay.mathieson at gmail.com> wrote:

> On 20/10/2016 7:01 AM, Kevin Lemonnier wrote:
>
>> Yes, you need to add a full replica set at once.
>> I don't remember, but according to my history, it looks like I used this:
>>
>> gluster volume add-brick VMs host1:/brick host2:/brick host3:/brick force
>>
>> (I have the same without force just before that, so I assume force is needed)
>
> Ok, I did a:
>
> gluster volume add-brick datastore1 \
>     vna.proxmox.softlog:/tank/vmdata/datastore1-2 \
>     vnb.proxmox.softlog:/tank/vmdata/datastore1-2 \
>     vng.proxmox.softlog:/tank/vmdata/datastore1-2
>
> I had added a 2nd Windows VM as well.
>
> It looked like it was going OK for a while, then blew up. The first Windows VM, which was running diskmark, died and won't boot; qemu-img check shows the image hopelessly corrupted. The 2nd VM has also crashed and is unbootable, though qemu-img shows the qcow2 file as OK.
>
> I have a sneaking suspicion it's related to active IO. VM1 was doing heavy IO compared to VM2; perhaps that's why its image was corrupted worse.
>
> The rebalance status looks odd to me:
>
> root at vna:~# gluster volume rebalance datastore1 status
>     Node                  Rebalanced-files   size     scanned   failures   skipped   status        run time in h:m:s
>     -------------------   ----------------   ------   -------   --------   -------   -----------   -----------------
>     localhost                            0   0Bytes         0          0         0   completed     0:0:1
>     vnb.proxmox.softlog                  0   0Bytes         0          0         0   completed     0:0:1
>     vng.proxmox.softlog                328   19.2GB      1440          0         0   in progress   0:11:55
>
> I don't know why vng is taking so much longer; the nodes are identical. But maybe this is normal?
>
> When I get time, I'll try again with:
>
> - all VMs shut down (no IO)
> - all VMs running off the gluster FUSE mount (no gfapi)
>
> cheers,
>
> --
> Lindsay Mathieson
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
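The "odd" rebalance output above can be checked mechanically rather than by eye. A minimal sketch in plain POSIX shell: the embedded sample rows are copied from the status table in this thread, and on a live cluster you would pipe the real `gluster volume rebalance datastore1 status` output in instead of the sample function.

```shell
# Sketch: flag rebalance participants whose status column is not
# "completed". The sample rows are the data lines from the status table
# quoted in this thread; replace sample_status with the real command
# output on a live cluster.
sample_status() {
cat <<'EOF'
localhost             0    0Bytes     0     0    0    completed      0:0:1
vnb.proxmox.softlog   0    0Bytes     0     0    0    completed      0:0:1
vng.proxmox.softlog   328  19.2GB     1440  0    0    in progress    0:11:55
EOF
}

# Print any node still working, with its progress so far.
sample_status | awk '!/completed/ {print $1 " still rebalancing: " $2 " files, " $3}'
```

On the sample data this reports only `vng.proxmox.softlog`, the node whose run time stands out.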
Lindsay Mathieson
2016-Oct-21 02:39 UTC
[Gluster-users] [URGENT] Add-bricks to a volume corrupted the files
And now that I have it all set up for logging etc., I can't reproduce the error :( Though I did manage to score a "volume rebalance: teststore1: failed: Another transaction is in progress for teststore1. Please try again after sometime" problem. No gluster commands would work after that; I had to restart the glusterfsd service.

On 20 October 2016 at 21:13, Krutika Dhananjay <kdhananj at redhat.com> wrote:

> Thanks a lot, Lindsay! Appreciate the help.
>
> It would be awesome if you could tell us whether you
> see the issue with FUSE as well, while we get around
> to setting up the environment and running the test ourselves.
>
> -Krutika
>
> [earlier quoted discussion snipped -- see the previous message in this thread]

--
Lindsay
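To make the corruption check from the earlier message repeatable, `qemu-img check` can be looped over every image in the datastore after each add-brick/rebalance run. A minimal sketch, with some assumptions: the directory layout is illustrative, and the `QEMU_IMG` variable is an override hook I've added so the loop logic can be exercised without qemu installed; `qemu-img check` itself exits 0 only when the image is clean.

```shell
# Sketch: run `qemu-img check` over every qcow2 image in a directory and
# report which ones are damaged. QEMU_IMG is overridable for testing;
# the directory layout is an assumption for illustration.
QEMU_IMG=${QEMU_IMG:-qemu-img}

check_images() {
    dir=$1
    bad=0
    for img in "$dir"/*.qcow2; do
        [ -e "$img" ] || continue            # no matches: glob stays literal
        if "$QEMU_IMG" check "$img" >/dev/null 2>&1; then
            echo "OK   $img"
        else
            echo "BAD  $img"
            bad=$((bad + 1))
        fi
    done
    return "$bad"
}
```

Run against the volume's mountpoint, e.g. `check_images /mnt/pve/datastore1` (path assumed); a non-zero return means at least one image failed the check.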
Lindsay Mathieson
2016-Oct-23 00:17 UTC
[Gluster-users] [URGENT] Add-bricks to a volume corrupted the files
On 20/10/2016 9:13 PM, Krutika Dhananjay wrote:

> It would be awesome if you could tell us whether you
> see the issue with FUSE as well, while we get around
> to setting up the environment and running the test ourselves.

I just managed to replicate the exact same error using the fuse mount.

--
Lindsay Mathieson
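For anyone reproducing the FUSE-mount variant of this test: the volume is mounted through the native glusterfs client instead of being accessed via gfapi. An illustrative /etc/fstab entry, using the volume and hosts from this thread; the mountpoint and the `backupvolfile-server` option are assumptions on my part, not taken from the thread.

```
# /etc/fstab -- native glusterfs (FUSE) mount; mountpoint is hypothetical
vna.proxmox.softlog:/datastore1  /mnt/pve/datastore1  glusterfs  defaults,_netdev,backupvolfile-server=vnb.proxmox.softlog  0  0
```

With the VM disks placed under this mountpoint, QEMU does its IO through the FUSE client rather than libgfapi, which is exactly the variable this retest isolates.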