Hi,
On Mon, 4 Feb 2019 at 16:39, mohammad kashif <kashif.alig at gmail.com>
wrote:
> Hi Nithya
>
> Thanks for replying so quickly. It is very much appreciated.
>
> There are lots of "[No space left on device]" errors which I cannot
> understand, as there is plenty of space on all of the nodes.
>
This means that Gluster could not find sufficient space for the file. Would
you be willing to share your rebalance log file?
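If the full log is large, even the relevant error lines would help. A rough
sketch of how to pull them (this assumes the default log location under
/var/log/glusterfs/ and the volume name atlasglust from the thread below;
the exact filename and message text may differ on your version):

  # rebalance log for the volume (default location; adjust if your setup logs elsewhere)
  grep "No space left on device" /var/log/glusterfs/atlasglust-rebalance.log | head -n 20

  # rough count of failed file migrations (message text may vary by version)
  grep -c "migrate-data failed" /var/log/glusterfs/atlasglust-rebalance.log
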
Please provide the following information:
- The gluster version
- The gluster volume info for the volume
- How full are the individual bricks for the volume?
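For reference, a rough sketch of the commands that would gather this (the
brick path is taken from the remove-brick command quoted below and is only
an example; use the paths listed in your volume info):

  gluster --version
  gluster volume info atlasglust

  # on each node, check how full the brick filesystem is
  df -h /glusteratlas/brick007/gv0
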
> A little bit of background will be useful in this case. I had a cluster
> of seven nodes of varying capacity (73, 73, 73, 46, 46, 46, 46 TB). The
> cluster was almost 90% full, so every node had roughly 8 to 15 TB of free
> space. I added two new nodes with 100 TB each and ran fix-layout, which
> completed successfully.
>
> After that I started the remove-brick operation. I don't think that at
> any point any of the nodes was 100% full. Looking at my Ganglia graphs,
> there was always a minimum of 5 TB available on every node.
>
> I was keeping an eye on the remove-brick status; for a very long time
> there were no failures, and then at some point these 17000 failures
> appeared and the count stayed there.
>
> Thanks
>
> Kashif
>
>
> On Mon, Feb 4, 2019 at 5:09 AM Nithya Balachandran <nbalacha at redhat.com>
> wrote:
>
>> Hi,
>>
>> The status shows quite a few failures. Please check the rebalance logs
>> to see why that happened. We can decide what to do based on the errors.
>> Once you run a commit, the brick will no longer be part of the volume
>> and you will not be able to access those files via the client.
>> Do you have sufficient space on the remaining bricks for the files on
>> the removed brick?
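For reference, a sketch of the two commands being discussed, using the
volume and brick names from the original mail below ("nodename" is the
placeholder used there); status is safe to run at any time, while commit
should only be run once you are sure, since after a commit the brick is no
longer part of the volume:

  gluster volume remove-brick atlasglust nodename:/glusteratlas/brick007/gv0 status
  gluster volume remove-brick atlasglust nodename:/glusteratlas/brick007/gv0 commit
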
>>
>> Regards,
>> Nithya
>>
>> On Mon, 4 Feb 2019 at 03:50, mohammad kashif <kashif.alig at gmail.com>
>> wrote:
>>
>>> Hi
>>>
>>> I have a pure distributed gluster volume with nine nodes. To remove
>>> one node, I ran:
>>>
>>> gluster volume remove-brick atlasglust nodename:/glusteratlas/brick007/gv0 start
>>>
>>> It completed, but with around 17000 failures:
>>>
>>> Node       Rebalanced-files    size    scanned   failures   skipped     status   run time in h:m:s
>>> --------   ----------------  ------  ---------  ---------  --------  ---------  ------------------
>>> nodename            4185858  27.5TB    6746030      17488         0  completed           405:15:34
>>>
>>> I can see that there is still 1.5 TB of data on the node which I was
>>> trying to remove.
>>>
>>> I am not sure what to do now. Should I run the remove-brick command
>>> again so that the files which failed can be retried?
>>>
>>> Or should I run commit first and then try to remove the node again?
>>>
>>> Please advise, as I don't want to lose any files.
>>>
>>> Thanks
>>>
>>> Kashif
>>>
>>>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>