thr3ads.net - Gluster users - [Gluster-users] Replicated striped data lose [Mar 2016]

David Gossage

2016-Mar-12 17:11 UTC

[Gluster-users] Replicated striped data lose

On Sat, Mar 12, 2016 at 10:21 AM, Mahdi Adnan
<mahdi.adnan at earthlinktele.com> wrote:
> Both servers have HBAs, no RAID, and I can set up a replicated or dispersed
> volume without any issues.
> The logs are clean; when I tried to migrate a VM and got the error, nothing
> showed up in the logs.
> I tried mounting the volume on my laptop and it mounted fine, but if I use
> dd to create a data file it just hangs. I can't cancel it or unmount the
> volume; I just have to reboot.
> The same servers have another volume on other bricks, in a
> distributed-replicate layout, and it works fine.
> I have even tried the same setup in a virtual environment (created two VMs,
> installed Gluster, and created a replicated striped volume) and got the
> same thing again: data corruption.
>
I'd look through the mail archives for a topic called "Shard in Production",
I think.  The shard portion may not be relevant, but it does discuss
certain settings that had to be applied to avoid corruption with VMs.
You may also want to try disabling performance.readdir-ahead.
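
As a rough sketch, the settings mentioned in this thread could be applied
with "gluster volume set". The volume name VOLNAME is a placeholder (the
thread never gives it), and the option list is simply the six settings quoted
further down plus performance.readdir-ahead off. The script only prints the
commands so they can be reviewed before anything is changed.

```shell
#!/bin/sh
# Print (do not run) the "gluster volume set" commands for the options
# discussed in this thread. The volume name is a placeholder argument;
# the option list is an assumption assembled from the thread.
print_vm_settings() {
    vol="$1"
    while read -r opt val; do
        echo "gluster volume set $vol $opt $val"
    done <<'EOF'
performance.quick-read off
performance.read-ahead off
performance.io-cache off
performance.stat-prefetch off
cluster.eager-lock enable
network.remote-dio on
performance.readdir-ahead off
EOF
}

print_vm_settings "${1:-VOLNAME}"
```

Once the printed commands look right for the volume in question, the output
can be piped to sh on one of the Gluster servers.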

>
> On 03/12/2016 07:02 PM, David Gossage wrote:
>
>
>
> On Sat, Mar 12, 2016 at 9:51 AM, Mahdi Adnan <
> mahdi.adnan at earthlinktele.com> wrote:
>
>> Thanks David,
>>
>> My settings are all defaults; I have just created the pool and started it.
>> I have set the settings as you recommended and it seems to be the same
>> issue:
>>
>> Type: Striped-Replicate
>> Volume ID: 44adfd8c-2ed1-4aa5-b256-d12b64f7fc14
>> Status: Started
>> Number of Bricks: 1 x 2 x 2 = 4
>> Transport-type: tcp
>> Bricks:
>> Brick1: gfs001:/bricks/t1/s
>> Brick2: gfs002:/bricks/t1/s
>> Brick3: gfs001:/bricks/t2/s
>> Brick4: gfs002:/bricks/t2/s
>> Options Reconfigured:
>> performance.stat-prefetch: off
>> network.remote-dio: on
>> cluster.eager-lock: enable
>> performance.io-cache: off
>> performance.read-ahead: off
>> performance.quick-read: off
>> performance.readdir-ahead: on
>>
>
>
> Is there a RAID controller perhaps doing any caching?
>
> Are any errors being reported in the gluster logs during the migration
> process?
> Since they aren't in use yet, have you tested making just mirrored bricks
> using different pairings of servers, two at a time, to see if the problem
> follows a certain machine or network port?
>
>
>
>>
>>
>>
>>
>>
>> On 03/12/2016 03:25 PM, David Gossage wrote:
>>
>>
>>
>> On Sat, Mar 12, 2016 at 1:55 AM, Mahdi Adnan <
>> mahdi.adnan at earthlinktele.com> wrote:
>>
>>> Dears,
>>>
>>> I have created a replicated striped volume with two bricks and two
>>> servers, but I can't use it because when I mount it in ESXi and try to
>>> migrate a VM to it, the data gets corrupted.
>>> Does anyone have any idea why this is happening?
>>>
>>> Dell 2950 x2
>>> Seagate 15k 600GB
>>> CentOS 7.2
>>> Gluster 3.7.8
>>>
>>> Appreciate your help.
>>>
>>
>> Most reports of this I have seen end up being settings related.  Post
>> gluster volume info. Below is what I have seen as the most commonly
>> recommended settings.
>> I'd hazard a guess you may have some of the read-ahead, cache, or
>> prefetch options on.
>>
>> quick-read=off
>> read-ahead=off
>> io-cache=off
>> stat-prefetch=off
>> eager-lock=enable
>> remote-dio=on
>>
>>>
>>> Mahdi Adnan
>>> System Admin
>>>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>
>>
>>
>>
>

Mahdi Adnan

2016-Mar-13 13:16 UTC

[Gluster-users] Replicated striped data lose

Okay, so I enabled shard on my test volume and it did not help. Stupidly
enough, I also enabled it on a production "Distributed-Replicate" volume
and it corrupted half of my VMs.
I have updated Gluster to the latest version and nothing seems to have
changed in my situation.
Below is the info for my volume:

Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: gfs001:/bricks/b001/vmware
Brick2: gfs002:/bricks/b004/vmware
Brick3: gfs001:/bricks/b002/vmware
Brick4: gfs002:/bricks/b005/vmware
Brick5: gfs001:/bricks/b003/vmware
Brick6: gfs002:/bricks/b006/vmware
Options Reconfigured:
performance.strict-write-ordering: on
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
performance.stat-prefetch: disable
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
cluster.eager-lock: enable
features.shard-block-size: 16MB
features.shard: on
performance.readdir-ahead: off
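
For comparing a listing like the one above against the settings suggested
earlier in the thread, a small filter along these lines might help. It is
only a sketch: the expected values are the ones from this thread, the
function name is made up for illustration, and "on"/"enable" and
"off"/"disable" are assumed to be interchangeable spellings, as the two
listings in this thread suggest.

```shell
#!/bin/sh
# Read "gluster volume info" output on stdin and report every option from
# the thread's list that is missing or set to a different value.
# Assumption: on/enable and off/disable are equivalent spellings.
check_vm_options() {
    awk '
    function norm(v) {
        v = tolower(v)
        if (v == "on" || v == "enable") return "on"
        if (v == "off" || v == "disable") return "off"
        return v
    }
    BEGIN {
        want["performance.quick-read"] = "off"
        want["performance.read-ahead"] = "off"
        want["performance.io-cache"] = "off"
        want["performance.stat-prefetch"] = "off"
        want["performance.readdir-ahead"] = "off"
        want["cluster.eager-lock"] = "on"
        want["network.remote-dio"] = "on"
    }
    {
        # Option lines look like "performance.io-cache: off".
        key = $1
        if (sub(/:$/, "", key) && (key in want)) seen[key] = norm($2)
    }
    END {
        for (k in want) {
            if (!(k in seen))
                print k ": not set (expected " want[k] ")"
            else if (seen[k] != want[k])
                print k ": " seen[k] " (expected " want[k] ")"
        }
    }' | sort
}
```

Usage would be something like "gluster volume info VOLNAME | check_vm_options",
with VOLNAME again a placeholder. The production volume listed above already
matches every entry, so an empty result from this check does not by itself
explain the corruption.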


